Discriminant analysis of the equicorrelated Gaussian observations

K. Dučinskas, J. Neverdauskaitė

In this paper the problem of classifying an observation into one of two Gaussian populations with different means and a common variance is considered for the case when an equicorrelated training sample is given. The unknown means and the common variance are estimated from the training sample, and these estimators are plugged into the Bayes discriminant function. Maximum likelihood estimators are used. An approximation of the expected error rate associated with the Bayes plug-in discriminant function is derived. A numerical analysis of the accuracy of that approximation for various values of the correlation is presented.

Keywords: equicorrelation, Bayes discriminant function, actual error rate, expected error rate.

Introduction

Discriminant analysis (DA), sometimes called supervised classification, traditionally assumes that the observations to be classified and the observations in the training sample are independent. However, in practical situations with temporally and spatially distributed data this is usually not the case. Data that are close together in time or space are likely to be correlated, at best equicorrelated [4, 5]. Equicorrelation arises naturally from physical and biological considerations [1, 2]. It is therefore important to include temporal or spatial dependencies in the classification problem.

In this paper, we consider the performance of the plug-in linear Bayes discriminant function (PBDF) when the parameters are estimated from training samples that are realizations of an equicorrelated Gaussian random process. We use the maximum likelihood (ML) estimators of the unknown means and common variance, assuming that the correlation is known.

1. The main concepts and definitions

The model of $Z(s)$ in population $\Omega_l$ is

$$Z(s) = \beta_l' x(s) + \varepsilon(s), \quad s \in D \subset I,$$

where $x(s)$ is a $q \times 1$ vector of non-random regressors whose first element is 1, $\beta_l$ is a $q \times 1$ vector of parameters, $l = 1, 2$, and $I$ is an index set.

The error term $\{\varepsilon(s): s \in D \subset I\}$ is a zero-mean stationary spatial Gaussian random process with covariance function defined, for all $s, u \in D$, by

$$\operatorname{cov}\{\varepsilon(s), \varepsilon(u)\} = \begin{cases} \sigma^2 & \text{if } s = u, \\ \sigma^2 \rho & \text{if } s \neq u, \end{cases}$$

where $\sigma^2$ is the constant variance and $\rho$ is the intraclass and the interclass correlation.

Consider the problem of classifying the observation $Z_0 = Z(s_0)$, with $s_0 \in D$, into one of the two populations specified above. Assume that a training sample $T$ is also given. Since the observation $Z_0$ is equicorrelated with the observations from the training sample, we have to deal with the conditional means and variance

$$\mu_{lT}(s_0; \beta) = E(Z_0 \mid T; \Omega_l), \quad \sigma^2_{0T}(\sigma^2) = V(Z_0 \mid T; \Omega_l), \quad l = 1, 2. \tag{1}$$

Suppose that we observe the training sample $T' = (T_1', T_2')$, where $T_l$ is the $n_l \times 1$ vector of $n_l$ observations of $Z(s)$ from $\Omega_l$, $l = 1, 2$. Then $T$ is an $n \times 1$ vector, where $n = n_1 + n_2$. Assume that the $2q \times 1$ parameter vector $\beta' = (\beta_1', \beta_2')$ and $\sigma^2$ are unknown and that $\rho$ is known. Let $\hat\beta$ and $\hat\sigma^2$ be the estimators of $\beta$ and $\sigma^2$, respectively, based on $T$. Denote the $(2q + 1) \times 1$ vector of parameters by $\Psi' = (\beta', \sigma^2)$ and the vector of their estimators by $\hat\Psi' = (\hat\beta', \hat\sigma^2)$. Let $\pi_1$, $\pi_2$ be the prior probabilities of $\Omega_1$ and $\Omega_2$.

The plug-in BDF is obtained by replacing the parameters in (1) with their estimators. The PBDF for the classification problem specified above is then

$$W(Z_0; \hat\Psi) = \Bigl(Z_0 - \tfrac{1}{2}(\hat\mu_{1T} + \hat\mu_{2T})\Bigr)(\hat\mu_{1T} - \hat\mu_{2T})/\hat\sigma^2_{0T} + \gamma, \tag{2}$$

where $\hat\mu_{lT} = \mu_{lT}(s_0; \hat\beta)$, $\hat\sigma^2_{0T} = \sigma^2_{0T}(\hat\sigma^2)$ and $\gamma = \ln(\pi_1/\pi_2)$.

In the considered case the actual error rate for $W(Z_0; \hat\Psi)$ can be rewritten as

$$P(\hat\Psi) = \sum_{l=1}^{2} \pi_l \Phi(\hat Q_l), \tag{3}$$

where $\Phi(\cdot)$ is the standard normal distribution function, and

$$\hat Q_l = (-1)^l \, \frac{\bigl(\mu_{lT} - \tfrac{1}{2}(\hat\mu_{1T} + \hat\mu_{2T})\bigr)(\hat\mu_{1T} - \hat\mu_{2T})/\hat\sigma^2_{0T} + \gamma}{\sqrt{(\hat\mu_{1T} - \hat\mu_{2T})^2 \, \sigma^2_{0T}/(\hat\sigma^2_{0T})^2}}, \quad l = 1, 2. \tag{4}$$

DEFINITION 1. The expectation of the actual error rate with respect to the distribution of $T$, designated as $E_T\{P(\hat\Psi)\}$, is called the expected error rate (EER).

Hence the EER for the considered problem of classifying $Z_0$ by the PBDF is

$$E_T\{P(\hat\Psi)\} = E_T\Bigl\{\sum_{l=1}^{2} \pi_l \Phi(\hat Q_l)\Bigr\}.$$

2. The proposed approximation

Suppose that the model of $T$ is

$$T = X\beta + E,$$

where $\beta' = (\beta_1', \beta_2')$ and $E$ is the $n$-vector of random errors that has the multivariate Gaussian distribution $N_n(0, \sigma^2 R)$.

The ML estimator of $\beta$ and the bias-adjusted ML estimator of $\sigma^2$ [3] are

$$\hat\beta = (X'R^{-1}X)^{-1} X'R^{-1} T, \tag{5}$$

$$\hat\sigma^2 = (T - X\hat\beta)' R^{-1} (T - X\hat\beta)/(n - 2q), \tag{6}$$

where $R$ is the correlation matrix of $T$.

The Mahalanobis distance between the conditional distributions of $Z_0$ specified by (1) is

$$\Delta_0 = \bigl|(\mu_{1T} - \mu_{2T})/\sigma_{0T}\bigr|.$$

Put $R_\beta = (X'R^{-1}X)^{-1}$ and $\rho_0 = \rho/(1 - \rho + n\rho)$. Denote the approximation of the EER $E_T\{P(\hat\Psi)\}$ by AER.

THEOREM 1. Suppose that the observation $Z_0$ is classified by the PBDF defined in (2), and let the ML estimators of the parameters specified in (5), (6) be used. Then the approximation of the EER based on the Taylor series expansion is

$$\mathrm{AER} = P_B + \pi_1 \varphi(-\Delta_0/2 - \gamma/\Delta_0) \bigl\{[\rho_0 X'1_n - \gamma_1 \times x_0]' R_\beta [\rho_0 X'1_n - \gamma_1 \times x_0] + 2\gamma^2/(n - 2q)\bigr\}/(2\Delta_0),$$

where $P_B$ is the Bayes error, $\varphi(\cdot)$ is the standard normal density function, $\gamma_1' = (\Delta_0/2 + \gamma/\Delta_0, \; \Delta_0/2 - \gamma/\Delta_0)$ and $x_0 = x(s_0)$.

Proof. The Taylor series expansion of the actual error rate given by formulas (3) and (4), up to the second-order derivatives about the true values of the parameters, is used. Taking its expectation with respect to the distribution of the training sample $T$ completes the proof.

Remark. In the case $\rho = 0$, the proposed approximation of the EER agrees with the asymptotic approximations of the EER for the case of independent Gaussian observations [5]. The order of the remainder term of the Taylor expansion depends on the sampling design of the training sample.

3. Numerical illustration and discussions

Numerical examples for comparing and evaluating the accuracy of the derived approximation of the EER are given for constant means, i.e., $q = 1$ and $x(s) = 1$. Then $\mu_l(s) = \beta_l$, $l = 1, 2$, and $X = 1_{n_1} \oplus 1_{n_2}$.

The Mahalanobis distance between the marginal distributions of $Z_0$ is specified by

$$\Delta = \bigl|(\beta_1 - \beta_2)/\sigma\bigr|.$$

With an insignificant loss of generality, only the cases with $n_1 = n_2 = n_0$ and $\pi_1 = 0.5$ are considered. The computed values of the proposed approximation AER are compared with theoretical values obtained by using the numerical integration procedures of Maple 9.5. Denote these theoretical values by TER.

Table 1. Values of AER, TER, AER/TER for the case $n_1 = n_2 = n_0$ and $\pi_1 = \pi_2 = 0.5$

| $\Delta$ | $\rho$ | AER ($n_0 = 4$) | TER ($n_0 = 4$) | AER/TER ($n_0 = 4$) | AER ($n_0 = 20$) | TER ($n_0 = 20$) | AER/TER ($n_0 = 20$) |
|---|---|---|---|---|---|---|---|
| 0.2 | 0 | 0.46265 | 0.49164 | 0.94105 | 0.46067 | 0.48140 | 0.95694 |
| 0.2 | 0.1 | 0.46047 | 0.49027 | 0.93922 | 0.45854 | 0.47779 | 0.95970 |
| 0.2 | 0.2 | 0.45805 | 0.48801 | 0.93860 | 0.45603 | 0.47252 | 0.96511 |
| 0.2 | 0.3 | 0.45515 | 0.48463 | 0.93917 | 0.45301 | 0.46516 | 0.97388 |
| 0.2 | 0.4 | 0.45157 | 0.47944 | 0.94187 | 0.44927 | 0.45462 | 0.98923 |
| 0.2 | 0.5 | 0.44697 | 0.47100 | 0.94897 | 0.44445 | 0.43898 | 1.01247 |
| 0.2 | 0.6 | 0.44075 | 0.45613 | 0.96629 | 0.43795 | 0.41503 | 1.05522 |
| 1 | 0 | 0.31954 | 0.34720 | 0.92034 | 0.31074 | 0.31100 | 0.99917 |
| 1 | 0.1 | 0.30986 | 0.33004 | 0.93886 | 0.30133 | 0.29359 | 1.02636 |
| 1 | 0.2 | 0.29917 | 0.30482 | 0.98147 | 0.29042 | 0.27081 | 1.07242 |
| 1 | 0.3 | 0.28661 | 0.27300 | 1.04991 | 0.27750 | 0.24280 | 1.14292 |
| 1 | 0.4 | 0.27142 | 0.23424 | 1.15875 | 0.26188 | 0.20800 | 1.25899 |
| 1 | 0.5 | 0.25250 | 0.18784 | 1.34425 | 0.24245 | 0.16456 | 1.47338 |
| 1 | 0.6 | 0.22803 | 0.13253 | 1.72056 | 0.21744 | 0.11128 | 1.95401 |

The values of AER, TER and AER/TER are given in Table 1 for various values of $\Delta$ and $\rho$ with $n_0 = 4$ and 20. From Table 1 it is evident that, for both $n_0 = 4$ and 20, the values of AER and TER decrease as $\Delta$ and $\rho$ increase. The reason for this effect is the increase of the Mahalanobis distance $\Delta_0$ when $\Delta$ and $\rho$ increase. It is also evident from Table 1 that the deviation of AER from TER, as measured by the ratio AER/TER, shows the high accuracy of the proposed approximation for $\Delta = 0.2$ and $\Delta = 1$ and all selected values of $\rho$.
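The growth of the conditional Mahalanobis distance with ρ, and the ρ = 0 rows of Table 1, can be checked with a short script. The following is a minimal sketch for the constant-means setting of Section 3 (q = 1, x(s) = 1, n1 = n2 = n0, π1 = π2 = 0.5, so γ = 0). The function names are illustrative and not from the paper. The closed form used for the conditional distance is an assumption of this sketch: it follows from standard Gaussian conditioning if s0 is not one of the training locations, so that Z0 has correlation ρ with every training observation. Only the ρ = 0 case of the AER formula is evaluated here.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def conditional_distance(delta, rho, n):
    """Conditional Mahalanobis distance for constant means (q = 1).

    Assumption of this sketch: s0 is not a training location, so the
    conditional variance of Z0 given T is sigma^2 * (1 - n * rho * rho0)
    with rho0 = rho / (1 - rho + n * rho), while the difference of the
    conditional means stays beta_1 - beta_2.
    """
    rho0 = rho / (1.0 - rho + n * rho)
    return delta / math.sqrt(1.0 - n * rho * rho0)


def aer_rho_zero(delta, n0):
    """AER of Theorem 1 for rho = 0, gamma = 0, q = 1, x0 = 1, n1 = n2 = n0."""
    bayes_error = std_normal_cdf(-delta / 2.0)
    g = delta / 2.0                      # both components of gamma_1 when gamma = 0
    quad = g * g / n0 + g * g / n0       # R_beta = diag(1/n0, 1/n0) when rho = 0
    return bayes_error + 0.5 * std_normal_pdf(-delta / 2.0) * quad / (2.0 * delta)


if __name__ == "__main__":
    for delta in (0.2, 1.0):
        for n0 in (4, 20):
            print(f"delta={delta}, n0={n0}, rho=0: AER={aer_rho_zero(delta, n0):.5f}")
    for rho in (0.0, 0.2, 0.4, 0.6):
        print(f"rho={rho}: conditional distance for delta=1, n=8 is "
              f"{conditional_distance(1.0, rho, 8):.4f}")
```

Under these assumptions the ρ = 0 values agree with the first row of each block of Table 1 to five decimals, and the conditional distance increases with ρ, which is the effect referred to in the discussion above.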

