Application of spatial auto-beta models in statistical classification

In this paper, spatial data specified by auto-beta models are analysed by considering a supervised classification problem: classifying a feature observation into one of two populations. Two classification rules, based on the conditional Bayes discriminant function (BDF) and the linear discriminant function (LDF), are proposed. These classification rules are critically compared by the values of the actual error rates obtained through a simulation study.


Introduction
An approach to spatial classification using Bayes rules was introduced by Dučinskas [5]. This approach is based on the conditional distributions of the observations to be classified given the training sample, for a continuous spatial index. The case of a discrete spatial index for Gaussian Markov random fields is explored in [4,6,7]. A general statistical analysis of spatial non-Gaussian data associated with the exponential family and based on generalized linear models is given in [2,13]. Spatial discrimination based on the BDF for feature observations having elliptically contoured distributions is implemented in [1,8].
In this paper we focus on the auto-beta models introduced by Besag [2] for the case in which both the sufficient statistic and the canonical parameter are one-dimensional.
Møller [12] presented simulation algorithms for several spatial one-parameter auto-models. Specific attention is paid to the multi-parameter auto-models thoroughly studied in [10,9,3]. We consider a particular case of spatial auto-beta models for solving the classification problem of a feature observation by using plug-in discriminant functions.
This paper is organized as follows: the problem description and the spatial auto-beta model are presented in Section 1; discriminant functions and error rates are analysed in Section 2; numerical experiments are described in Section 3; conclusions are given in the last section.
We focus on the spatial auto-beta models (SABE) and the supervised classification problem with a fixed set of training locations (STL), when the feature observation Z_0 and the training sample T = (Z, Y) are given. Denote the full conditional density function of the feature observation Z_0 in population Ω_l under the SABE model by

f_{0l}(z_0) = z_0^{μ_{0l}φ_{0l} − 1} (1 − z_0)^{(1 − μ_{0l})φ_{0l} − 1} / B(μ_{0l}φ_{0l}, (1 − μ_{0l})φ_{0l}),  l = 1, 2,  (1)

where B(·,·) is the Euler Beta function. Spatial auto-beta models have recently been studied by several authors [11].
The conditional BDF for the SABE model then takes the form

W(Z_0; Ψ) = ln( f_{01}(Z_0) / f_{02}(Z_0) ) + γ_0(Ψ),  (2)

where γ_0(Ψ) = ln(u), with u = π_1^0 / π_2^0. The prior probabilities depend on the location of the focal observation and on the number of neighbours:

π_l^0 = ( Σ_{j ∈ NN_i^l} d_ij^{−1} ) / ( Σ_{j ∈ NN_i} d_ij^{−1} ),  l = 1, 2,

where d_ij is the distance between sites s_i and s_j, i, j = 1, …, n, NN_i = NN_i^1 + NN_i^2, and NN_i^l denotes the sites belonging to the nearest neighbourhood set of s_i in population Ω_l, l = 1, 2.
The BDF thus allocates the observation in the following way: classify the observation Z_0, given Z = z, to population Ω_1 if W(Z_0; Ψ) ≥ 0, and to population Ω_2 otherwise.
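The allocation rule can be sketched as a small function combining the log-density ratio with the log prior ratio. The tuples `mu`, `phi`, `prior` hold the two classes' estimated conditional means, precisions, and priors; these names and the mean/precision parametrisation are our assumptions for illustration.

```python
import math

def bdf_classify(z0, mu, phi, prior):
    """Allocate z0 by the plug-in BDF: population 1 if W(z0) >= 0,
    population 2 otherwise. mu, phi, prior are (class-1, class-2)
    tuples of estimated parameters (illustrative sketch)."""
    def logpdf(z, m, p):
        a, b = m * p, (1.0 - m) * p
        return ((a - 1.0) * math.log(z) + (b - 1.0) * math.log(1.0 - z)
                - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))
    w = (logpdf(z0, mu[0], phi[0]) - logpdf(z0, mu[1], phi[1])
         + math.log(prior[0] / prior[1]))            # gamma_0 = ln(pi_1/pi_2)
    return 1 if w >= 0 else 2
```

With equal priors and well-separated class means (e.g. 0.8 vs 0.2), an observation near 0.9 is sent to class 1 and one near 0.1 to class 2, as expected.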
We compare the BDF with the LDF for the SABE model in order to classify the testing samples. In this work a modified LDF is used, in which the class-conditional means and dispersions are used for estimation:

L(Z_0; Ψ) = (Z_0 − (μ_1 + μ_2)/2)(μ_1 − μ_2)/σ̄² + γ*_0(Ψ),

where σ̄² = (σ_1² + σ_2²)/2, γ*_0(Ψ) = ln(π_1^0 / π_2^0), π_l^0 are the prior probabilities, and μ_l, σ_l² are the conditional mean and variance of the Beta(μ_{0l}; φ_{0l}) distribution, l = 1, 2.
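A sketch of the modified LDF allocation, under the assumption (ours) that the two conditional variances are pooled by simple averaging:

```python
import math

def ldf_classify(z0, mu, var, prior):
    """Allocate z0 by the modified LDF: population 1 if L(z0) >= 0.
    mu, var, prior are (class-1, class-2) tuples of conditional means,
    conditional variances and priors (illustrative sketch; the pooled
    variance is an assumption)."""
    pooled = 0.5 * (var[0] + var[1])
    L = ((z0 - 0.5 * (mu[0] + mu[1])) * (mu[0] - mu[1]) / pooled
         + math.log(prior[0] / prior[1]))            # gamma*_0
    return 1 if L >= 0 else 2
```

Unlike the BDF, this rule uses only the first two conditional moments of the Beta distributions, which is why the two rules can rank differently across parametric structures.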
From statistical decision theory it is known that the Bayes discriminant function ensures the minimum probability of misclassification.
Definition 1. The Bayes error rate for the W(Z_0; Ψ) specified in (2) is defined as

P_B(Ψ) = Σ_{l=1}^{2} π_l^0 P_l(Ψ),

where P_l(Ψ) = E_lz{ H((−1)^l W(Z_0; Ψ)) } for l = 1, 2, with H(·) denoting the Heaviside step function, H(ν) := 1_{ν>0}, and the probability measure P_lz based on the conditional Beta distribution with pdf f_{0l} specified in (1).
In practice the parameter estimators are obtained by maximizing the pseudo-likelihood function, i.e.

Ψ̂ = arg max_Ψ Σ_{i=1}^{n} ln f(z_i | z_{N(i)}; Ψ),  (5)

where N(i) denotes the neighbourhood of site s_i. By replacing the parameters with their estimators in W(Z_0; Ψ) and L(Z_0; Ψ) we construct their plug-in versions, denoted by W(Z_0; Ψ̂) and L(Z_0; Ψ̂).
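The pseudo-likelihood objective can be sketched for a toy auto-beta model in which the conditional mean of site i is a logistic function of an intercept beta plus eta times the average of its neighbours' values, with common precision phi. This parametrisation is an illustrative stand-in for the paper's model, not its exact specification.

```python
import math

def neg_pseudo_loglik(params, z, neighbours):
    """Negative log-pseudo-likelihood of a toy auto-beta model:
    mu_i = logistic(beta + eta * mean of neighbours of i),
    z_i | neighbours ~ Beta(mu_i * phi, (1 - mu_i) * phi).
    z is a list of values in (0, 1); neighbours[i] lists the
    neighbour indices of site i (illustrative sketch)."""
    beta, eta, phi = params
    total = 0.0
    for i, zi in enumerate(z):
        m = sum(z[j] for j in neighbours[i]) / len(neighbours[i])
        mu = 1.0 / (1.0 + math.exp(-(beta + eta * m)))   # logit link
        a, b = mu * phi, (1.0 - mu) * phi
        total += ((a - 1.0) * math.log(zi) + (b - 1.0) * math.log(1.0 - zi)
                  - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))
    return -total
```

In practice Ψ̂ is obtained by passing such an objective to a numerical optimiser, e.g. `scipy.optimize.minimize`.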
The actual error rate for L(Z_0; Ψ̂), denoted by LAR(Ψ̂), is defined as in (6) with W(Z_0; Ψ̂) replaced by L(Z_0; Ψ̂).

Numerical experiments
To evaluate the proposed classification procedures, several scenarios were chosen that differ in model shape, as defined by different parameter values. The first-order neighbourhood scheme on S = [1, N] × [1, N] is used: each site i ∈ S has four neighbours, {i_e = i + (1, 0), i_w = i − (1, 0), i_n = i + (0, 1), i_s = i − (0, 1)}, with the obvious adjustments near the boundary. Conditional natural parameter expressions are chosen for each population; in this case the parameter vector is Ψ_l = (β_{l1}, β_{l2}, β_{l3}, β_{l4}, η_1, η_2).

First, based on the chosen parameter scenarios, 100 replications of data were generated. Each simulated data set was divided into a training sample (80%) and a testing sample (20%). In the learning stage, all feature values and the spatial dependency structure of the training sample are used to build the model; in the testing stage, one value is hidden and the classification rules are compared. At this stage the model parameter vector is considered unknown, and the parameters are estimated by the maximum pseudo-likelihood method described in (5). Simulations are conducted on a lattice of size n = 16 × 16.

Two types of parametric structures were chosen: one in which all parameters are fixed except the class-2 mean tendency parameter, and one in which the spatial dependency parameter describing the effect of the north-south neighbourhood points is varied. The chosen parameter values are presented in Table 1.
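The first-order neighbourhood scheme described above can be built explicitly for an N × N lattice with row-major site indexing; boundary sites simply keep fewer neighbours. Function and variable names here are our own.

```python
def first_order_neighbours(N):
    """First-order (rook) neighbourhood on an N x N lattice.
    Sites are indexed row-major: site i = r * N + c for row r, column c.
    Returns a dict mapping each site to the list of its east/west/
    north/south neighbours, dropping those outside the boundary."""
    nb = {}
    for r in range(N):
        for c in range(N):
            i = r * N + c
            nb[i] = [rr * N + cc
                     for rr, cc in ((r - 1, c), (r + 1, c),
                                    (r, c - 1), (r, c + 1))
                     if 0 <= rr < N and 0 <= cc < N]
    return nb
```

For the experiments above, N = 16 gives n = 256 sites, with corner sites having two neighbours, edge sites three, and interior sites four.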
The calculations were performed using three kinds of prior probabilities: 1) equal priors π_1^0 = π_2^0 = 0.5, with the actual error rate denoted AR_1; 2) priors from the inverse distance function over all training sample points, AR_2; 3) priors from the inverse distance function over neighbour points of up to fourth order, AR_3. The actual error rate ratios for the different priors are presented in Fig. 1.
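Methods 2 and 3 can be sketched as inverse-distance weighting of the labelled training sites. The names below are ours, and restricting "up to fourth order" by a plain distance cutoff is an approximation we assume for illustration.

```python
import math

def inverse_distance_priors(s0, sites, labels, cutoff=None):
    """Priors (pi_1, pi_2) for a focal site s0 from inverse-distance
    weights of labelled training sites. labels hold 1 or 2; with
    cutoff=None all training sites are used (method 2), while a finite
    cutoff restricts to nearby sites, approximating method 3
    (illustrative sketch)."""
    w = [0.0, 0.0]
    for (x, y), lab in zip(sites, labels):
        d = math.hypot(x - s0[0], y - s0[1])
        if d == 0.0:
            continue                      # skip the focal site itself
        if cutoff is not None and d > cutoff:
            continue                      # outside the neighbourhood
        w[lab - 1] += 1.0 / d
    total = w[0] + w[1]
    return (w[0] / total, w[1] / total)
```

For example, with training sites at distances 1 (class 1) and 2 (class 2), the weights are 1 and 0.5, giving priors (2/3, 1/3).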
In both cases, i.e. when the beta parameter changes and when the eta parameters change, the ratios AR_1/AR_3 and AR_2/AR_3 are greater than 1; that is, the smallest AR estimates are obtained when the priors are calculated by the third method. The actual error rate values of the different classification rules were then compared with the prior probabilities calculated by this third method.
The actual error rate ratio curves are presented in part A of Fig. 2. When the beta values are chosen less than 1, the ratio is greater than 1 and the LDF-based classification rule performs better. When β > 1 the ratio decreases and the BDF-based classification rule gains the advantage. In part B, when eta is chosen less than 10, the ratio is less than 1 and the BDF-based classification rule performs better. When the eta value is 25 or greater, the LDF-based classification rule gains the advantage.

Discussion and conclusions
In this paper we proposed two classification rules for non-Gaussian spatial data based on auto-beta models, in the frameworks of Bayes and linear discriminant functions. A simulation study was conducted to estimate and empirically compare the BDF classifier with the LDF classifier for various parametric structures and prior class probability models. The numerical analysis showed that:
1. When considering situations with different prior probabilities, better results are achieved by including up to fourth-order neighbours in the calculation of the prior probabilities than in the cases of equal priors and of priors based on all training points.
2. In the situation where all parameters are fixed except for the second-class mean tendency parameter, the results show that class separability increases as the mean tendency parameter increases. When β < 1 the LDF has the advantage, and when β > 1 the BDF performs better.
3. In the situation where all parameters are fixed except for the spatial dependency parameter common to both classes, the results show that class separability decreases as the spatial dependency parameter increases, and the actual error rate ratio approaches 1.
The results of the calculations performed in all examples provide a strong argument for modelling non-Gaussian spatial data directly, rather than first applying data normalization procedures.