Error rates in spatial classification of Gaussian data with random labeling

In spatial classification it is usually assumed that feature observations, given the labels, are independently distributed. We relax this assumption by proposing a stationary Gaussian random field model for the feature observations. The labels are assumed to follow a Discrete Random Field (DRF) model. A formula for the exact error rate based on the Bayes discriminant function (BDF) is derived. For the case of partial parametric uncertainty (the mean parameters and the variance are unknown), an approximation of the expected error rate associated with the plug-in BDF is also derived. The dependence of the considered error rates on the values of the range and clustering parameters is investigated numerically for training locations that are second-order neighbors of the location of the observation to be classified.


Introduction
Spatial supervised classification is the problem of classifying locations (sites) into several categories by learning from the feature observations and the adjacency relationships within a training sample. Switzer [5] was the first to treat classification of spatial data. It is usually assumed that feature observations are independent conditional on class labels (conditional independence) and normally distributed. This approach is widely used in image classification [3].
In the case of complete parametric certainty, the formula for the exact error rate due to the Bayes classification rule (BCR) under the assumptions described above was derived by Nishii and Eguchi [4]. In this paper we derive the analogous formula after relaxing the assumption of conditional independence: the feature observation to be classified is allowed to depend on the training sample.
A stationary Gaussian Random Field (GRF) model for the features and a DRF model for the class labels are considered. In the case of partial parametric uncertainty, an original approximation of the expected error rate associated with the plug-in BDF is proposed. This generalizes similar approximations derived for a training sample with a fixed sampling design and fixed labels [1]. A numerical analysis of the derived exact error rate and of the proposed approximation of the expected error rate is carried out for an isotropic exponential spatial correlation function among the feature observations. For the second-order neighborhood system, the influence of some statistical model parameters on the values of the considered error rates is evaluated numerically.

The main concepts and definitions
The main objective of this paper is to classify feature observations modeled by a stationary Gaussian random field {Z(s): s ∈ D ⊂ R²}.
The marginal model of the observation Z(s) in class Ω_l is

Z(s) = µ_l + ε(s), l = 1, 2,

where µ_l is a constant mean and the error term is generated by a zero-mean stationary Gaussian random field {ε(s): s ∈ D} with covariance function defined, for all s, u ∈ D, by

cov(ε(s), ε(u)) = σ² r(s − u).

Denote by R the matrix of spatial correlations among the components of Z. Suppose that S_n is fixed, but the labels are distributed randomly on it.
So for Y = y, S_n is partitioned into the union of two disjoint subsets, i.e., S_n = S_y^(1) ∪ S_y^(2), where S_y^(l) is the subset of S_n that contains the n_l locations with labels equal to l, l = 1, 2 (n_1 + n_2 = n).
Then the model of the vector Z for given Y = y is

Z = X_y µ + E,

where X_y is the n × 2 design matrix, µ′ = (µ_1, µ_2), and E is the n-vector of random errors that has the multivariate Gaussian distribution N_n(0, σ²R).
Consider the problem of classification (estimation of Y(s_0)) of the feature observation Z_0 = Z(s_0), s_0 ∈ D, s_0 ∉ S_n, with a given training sample T. Denote by r_0 the vector of spatial correlations between Z_0 and Z given in (1). So we have to deal with the conditional Gaussian distribution of Z_0 given T = t, with means

µ^0_lt = E(Z_0 | T = t; Y(s_0) = l) = µ_l + α′_0(z_n − X_y µ), l = 1, 2,

where α′_0 = r′_0 R⁻¹, and common conditional variance σ²_0 = σ²(1 − α′_0 r_0).
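A minimal numerical sketch of these conditional moments (the simple-kriging form above) may be helpful. The function name and all parameter values are illustrative, and the isotropic exponential correlation r(h) = exp{−|h|/α} used in the numerical example below is assumed:

```python
import numpy as np

def exp_corr(h, alpha):
    """Isotropic exponential correlation r(h) = exp(-|h|/alpha)."""
    return np.exp(-np.linalg.norm(h) / alpha)

def conditional_moments(s0, sites, mu_l, mean_vec, z, sigma2, alpha):
    """Conditional mean and variance of Z(s0) given the training
    observations z at the listed sites, for a class with marginal
    mean mu_l; mean_vec holds the marginal means of z (i.e. X_y mu)."""
    R = np.array([[exp_corr(np.subtract(u, v), alpha) for v in sites]
                  for u in sites])                     # correlations among Z
    r0 = np.array([exp_corr(np.subtract(s0, u), alpha) for u in sites])
    a0 = np.linalg.solve(R, r0)                        # alpha_0' = r_0' R^{-1}
    cond_mean = mu_l + a0 @ (z - mean_vec)             # mu_l + alpha_0'(z - X_y mu)
    cond_var = sigma2 * (1.0 - a0 @ r0)                # sigma^2 (1 - alpha_0' r_0)
    return cond_mean, cond_var
```

Since R is positive definite, conditioning on the training sample can only reduce the variance: cond_var is always strictly between 0 and sigma2.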

Error rates of spatial classification
We begin by specifying the DRF model for the class labels. Denote by {π(y) = P(Y = y)} the prior distribution of the label vector Y.
Under the assumption that the classes are completely specified, the Bayes discriminant function (BDF) [2] minimizing the probability of misclassification is formed by the logarithm of the ratio of the conditional densities described above. We call this situation the case of complete parametric certainty.
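For two Gaussian conditional densities with means µ^0_1t, µ^0_2t and common variance σ²_0, this log-ratio reduces to the familiar linear form; a sketch, writing γ for the logarithm of the prior odds (the notation is assumed, not taken from the source):

```latex
W_t(Z_0)
  = \ln\frac{p_1(Z_0 \mid t)}{p_2(Z_0 \mid t)} + \gamma
  = \frac{1}{\sigma_0^2}
    \Bigl( Z_0 - \frac{\mu^0_{1t} + \mu^0_{2t}}{2} \Bigr)
    \bigl( \mu^0_{1t} - \mu^0_{2t} \bigr) + \gamma ,
```

with Z_0 allocated to class 1 when W_t(Z_0) ≥ 0 and to class 2 otherwise.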
The conditional Mahalanobis distance given T = t is

∆_0n = |µ^0_1t − µ^0_2t| / σ_0 = |µ_1 − µ_2| / σ_0,

since the term α′_0(z_n − X_y µ) cancels in the difference of the conditional means. It is obvious that ∆_0n depends on S_n but does not depend on t.
Then the conditional Bayes error rate (for given T = t) is

P_B(t) = π_1 Φ(−∆_0n/2 − γ/∆_0n) + π_2 Φ(−∆_0n/2 + γ/∆_0n),

where Φ is the standard normal distribution function, π_l = P(Y(s_0) = l), and γ is the logarithm of the prior odds. The exact Bayes error rate for W_t(Z_0) is

P_B = E_T{P_B(T)},

where E_T denotes the expectation with respect to the distribution of T. Suppose now that the means {µ_l} and the variance σ² are unknown and have to be estimated from the training sample T.
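This standard-normal form of the Bayes error is easy to evaluate directly; a sketch assuming equal conditional variances in the two classes, with γ = ln(π_1/π_2) (function names are illustrative):

```python
from math import erf, sqrt, log

def std_normal_cdf(x):
    """Phi(x) computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bayes_error_rate(delta, pi1=0.5):
    """Gaussian Bayes error for Mahalanobis distance delta and prior pi1:
    pi1*Phi(-delta/2 - g/delta) + pi2*Phi(-delta/2 + g/delta),
    where g = ln(pi1/pi2)."""
    pi2 = 1.0 - pi1
    g = log(pi1 / pi2)
    return (pi1 * std_normal_cdf(-delta / 2.0 - g / delta)
            + pi2 * std_normal_cdf(-delta / 2.0 + g / delta))
```

With equal priors the expression collapses to Φ(−∆_0n/2), so the error rate decreases monotonically as the Mahalanobis distance grows.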
The plug-in BDF (PBDF) is obtained by replacing the parameters in the BDF with their estimates based on T = t. The PBDF for the classification problem specified above is W_t(Z_0; Ψ̂), where

μ̂^0_lt = µ̂_l + α′_0(z_n − X_y μ̂), l = 1, 2.

In the considered case the actual error rate [1] for W_t(Z_0; Ψ̂) is specified by the conditional probabilities of misclassification given T, evaluated at the estimated parameters, for l = 1, 2.

Definition 1. The expectation of the actual error rate with respect to the joint distribution of T, designated E_T{P(Ψ̂)}, is called the expected error rate (EER).
The EER is useful in providing a guide to the performance of the PBDF before it is actually formed from the training sample. Hence the EER for the considered problem of classifying Z_0 by the PBDF is E_T{P_T(Ψ̂)}. Set H = (1, 1)′, G = (1, −1)′. In the present paper we consider the increasing-domain asymptotic scheme for spatial sampling. Assume model (2) and let the assumptions of the theorem of [1] hold.
Proof. The proof of the lemma is based on a Taylor series expansion of P_T(Ψ̂), presented in (3), (4), about the points μ̂ = µ and σ̂² = σ². Taking the expectation of the leading term of this Taylor expansion completes the proof. For details see the proof of the theorem in [1].

Numerical example and conclusions
Here we analyze numerically the dependence of the exact error rate on some statistical model parameters. Suppose D is a two-dimensional rectangular lattice with unit spacing, s_0 = (0, 0), and S_8 is the set of second-order neighbors of s_0.
We consider the case of model (3) with constant means and the isotropic exponential spatial correlation function given by r(h) = exp{−|h|/α}, where α is the range parameter. Denote by ρ the clustering parameter, or granularity [4]. The non-negative parameter ρ determines the degree of spatial dependence of the class labels.
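Under these assumptions the exact error rate can be tabulated over the range parameter α; the sketch below treats the equal-prior case (all parameter values are illustrative, and note that with constant class means the Mahalanobis distance does not depend on the realized labels):

```python
import numpy as np
from math import erf, sqrt

def phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Second-order neighborhood of s_0 = (0, 0) on the unit lattice:
# the eight surrounding sites.
S8 = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0)]

def error_rate(alpha, mu1=1.0, mu2=0.0, sigma2=1.0):
    """Equal-prior Bayes error Phi(-Delta_0n/2), with
    Delta_0n = |mu1 - mu2| / sqrt(sigma2 * (1 - alpha_0' r_0))."""
    r = lambda h: np.exp(-np.linalg.norm(h) / alpha)
    R = np.array([[r(np.subtract(u, v)) for v in S8] for u in S8])
    r0 = np.array([r(u) for u in S8])
    a0 = np.linalg.solve(R, r0)              # alpha_0' = r_0' R^{-1}
    sigma0_sq = sigma2 * (1.0 - a0 @ r0)     # conditional variance of Z_0
    delta = abs(mu1 - mu2) / sqrt(sigma0_sq)
    return phi(-delta / 2.0)
```

A larger range α strengthens the spatial correlation between Z_0 and its neighbors, shrinking the conditional variance and hence the error rate.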