Classification of Gaussian spatio-temporal data with stationary separable covariances

A novel approach to the classification of spatio-temporal data based on Bayes discriminant functions is developed. We focus on the problem of supervised classification of a spatio-temporal Gaussian random field (GRF) observation into one of two classes specified by different drift parameters, separable nonlinear covariance functions and a nonstationary label field. The performance of the proposed classification rule is validated by the values of local Bayes and empirical error rates obtained by a leave-one-out procedure. A simulation study is carried out for spatial covariance functions belonging to the powered-exponential family and temporal covariance functions of AR(1) models. The influence of the values of spatial and temporal covariance parameters on error rates is studied for several label field models. The results show that the proposed classification methodology can be applied successfully in practice with small error rates and can be a useful tool for discriminant analysis of spatio-temporal data.


Introduction
Spatial supervised classification is the problem of labeling observations based on feature information and on information about spatial adjacency relationships with the training sample. Switzer [25] was the first to treat classification of spatial data. Atkinson and Lewis [1] reviewed geostatistical techniques for classification of remotely sensed images. De Oliveira [20] proposed spatial classification techniques based on clipping of Gaussian random fields. Spatial contextual classification problems arising in the geospatial domain are considered by Shekhar et al. [22]. It is usually assumed that feature observations conditional on labels are independent (conditional independence) and normally distributed and that the labels follow a random field (RF) model. This approach is widely used in image classification rules for spatial Gaussian data by avoiding the assumption of conditional independence. A comprehensive overview of methods for statistical classification and discrimination of Gaussian spatial data is provided by Berrett and Calder [3]. A novel approach to classification of Gaussian Markov random field observations is developed by Dučinskas and Dreižienė [11]. A critical comparison of spatial linear mixed models for ecological data based on the correct classification rates is performed by Dreižienė and Dučinskas [8].
Some authors have investigated the performance of the Bayes classification rules (BCR) when training samples consist of temporally dependent observations (see, e.g., [16,18]).
Spatio-temporal data are often collected at discrete monitored time points at locations belonging to a continuous area. Such data sets are usually viewed as spatial time series (see, e.g., [7]).
Valid and practical covariance structures are needed to model these types of data sets in various disciplines such as environmental science, climatology and agriculture. Usually, in environmental and agricultural research, the data are recorded at regular time intervals (time lags) and at irregularly placed stations (locations) in a compact area (see, e.g., [14]). Recently, deep learning methods based on convolutional neural networks have been intensively explored and used in image analysis and spatial data mining (see, e.g., [2, 27-31]).
However, statistical discriminant analysis of spatio-temporal data has rarely been considered previously (see, e.g., [15]). Šaltytė-Benth and Dučinskas [26] considered classification of spatio-temporal data modeled by a GRF in the particular case when the feature observation at the focal location is uncorrelated with the training sample, which consists of interdependent feature variables.
In the present paper, avoiding this restriction, we focus on the classification of data modeled by random fields with separable spatio-temporal covariance structures specified by geostatistical spatial margins and discrete temporal margins (see, e.g., [6]). Separability of covariances was assumed for the sake of reduction of complexity due to interdependencies between features.
The main distinctive feature of the proposed approach is that the label field is allowed to be nonstationary in time at each location, i.e., the class label at each location can vary in time. This substantially widens the application area of the presented investigation.
To assess the performance of the classifiers, the values of the derived local Bayes error rates and of the empirical error rates are used. Empirical error rates are obtained by a modified leave-one-out method, in which all but one observation are used to construct the classification rule, and this rule is then used to classify the omitted observation (see, e.g., [12]). For numerical illustrations, two powered-exponential isotropic models for spatial covariance are considered. Temporal covariance is obtained by the Yule-Walker equations for AR(1) models. The performance of the proposed classification rule is compared for different parameters of pure spatial and temporal covariances and for different prior class probability models. This paper is organized as follows: the proposed spatio-temporal data models and conditional distributions are described in the next section; in Section 3, the conditional Bayes classification rule and its error rate are presented; in Section 4, numerical illustrations and simulations for various separable stationary spatio-temporal covariance and prior probability models are given; finally, conclusions are drawn in the last section.
http://www.journals.vu.lt/nonlinear-analysis

Spatio-temporal data models and conditional distributions
The main objective of this paper is to classify observations of a GRF $\{Z(s; t): s \in D \subset \mathbb{R}^2, t \in D_T = [0, \infty)\}$, where $s$ and $t$ define spatial and temporal coordinates, respectively. Let $\{Y(s; t): s \in D \subset \mathbb{R}^2, t \in D_T\}$ be a random field that represents the class label and takes only the values 0 or 1 (see, e.g., [23]).
In this study, we assume that for $l = 0, 1$, the model of the observation $Z(s; t)$ conditional on $Y(s; t) = l$ is $Z(s; t) = \mu_l(s; t) + \varepsilon(s; t)$, where $\mu_l(s; t)$ is a deterministic spatio-temporal trend. The error term is assumed to be generated by the univariate zero-mean GRF $\{\varepsilon(s; t): s \in D \subset \mathbb{R}^2, t \in T\}$ with covariance function defined by the model $\operatorname{cov}(\varepsilon(s; t), \varepsilon(u; r)) = C(s, u; t, r)$ for all $s, u \in D$ and $t, r \in T$.
In the present paper, we restrict our attention to the separable spatio-temporal covariance model $C(s, u; t, r) = C_S(s, u)\, C_T(t, r)$, where $C_S(s, u)$ denotes the pure spatial covariance between observations at locations $s$ and $u$, and $C_T(t, r)$ denotes the pure temporal covariance between observations at time points $t$ and $r$. Under this assumption, the spatio-temporal covariance structure factors into a purely spatial and a purely temporal component, which allows for computationally efficient estimation and inference. Consequently, separable covariance models have been popular even in situations in which they are not physically justifiable. Many statistical tests for separability have been proposed recently; they are based on parametric models (see, e.g., [5,13]) or spectral methods [21].
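The factorization above can be illustrated with a minimal sketch. It assumes, as one possible convention, that errors are stacked time block by time block, so that the full covariance matrix is the Kronecker product of the temporal and spatial matrices. The helper `kron`, the accessor `full_cov` and the toy matrices are hypothetical and only serve the illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]

# Hypothetical toy inputs: n = 2 locations, p = 3 time points.
C_S = [[1.0, 0.5],
       [0.5, 1.0]]
C_T = [[1.0, 0.3, 0.09],
       [0.3, 1.0, 0.3],
       [0.09, 0.3, 1.0]]

n, p = len(C_S), len(C_T)

# Assumed stacking order: time blocks outside, locations inside.
C = kron(C_T, C_S)

def full_cov(i, t, j, r):
    """Covariance between eps(s_i; t) and eps(s_j; r), 0-based indices."""
    return C[t * n + i][r * n + j]
```

Under this convention every entry of the $np \times np$ matrix is the product of one spatial and one temporal covariance, so only the $n \times n$ and $p \times p$ factors need to be stored or inverted.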
Let $S_n = \{s_i \in D, i = 1, \dots, n\}$ be a set of locations where observations are taken at times $t \in D_p = \{1, 2, \dots, p, p+1\}$. At every time moment $t \in D_p$, the set $S_n$ is split into two classes, $S_t^{(0)}$ and $S_t^{(1)}$. Denote by $n_{lt}$ the number of locations (out of $n$) at time $t$ that belong to class $l$; thus $n_{lt}$ is the number of points in the set $S_t^{(l)}$, and $n = n_{0t} + n_{1t}$ for every $t \in D_p$. Hence the sets of class labels at different time moments can differ in composition.
The joint training sample $Z$ is a stratified training sample specified by the $n \times p$ matrix $Z = (Z_1, \dots, Z_p)$, where $Z_t = (Z(s_1, t), \dots, Z(s_n, t))'$. This structure of data presentation is motivated by a model that assumes a multivariate (in space) time series. Denote by $z_t = (z_t^1, \dots, z_t^n)'$ and $y_t = (y_t^1, \dots, y_t^n)'$ the realized values of $Z_t$ and $Y_t = (Y(s_1, t), \dots, Y(s_n, t))'$, respectively.
In what follows, with an insignificant loss of generality, we focus on a linear drift that does not depend on time, $\mu_l(s; t) = \beta_l' x(s)$, where $x(s) = (x_1(s), \dots, x_q(s))'$ is the vector of spatial covariates and $\beta_l$ is a $q$-dimensional vector of parameters, $l = 0, 1$.
Then the model of the training sample is written in terms of $\operatorname{vec}(E)$, the $np \times 1$ vector of random errors, which has a normal distribution. In the present paper, we are concerned with the problem of classifying the observations $Z(s_i, p+1)$, $i = 1, \dots, n$, into one of two classes given the joint training sample $M$; in other words, based on the training sample information, we want to predict the label at the unobserved time point $t = p + 1$.
Set $c_T^{p+1} = (c_T^{p+1,1}, \dots, c_T^{p+1,p})'$, where $c_T^{p+1,r} = C_T(p+1, r)$, and let $e_i$ denote the $i$th row of the identity matrix $I_n$.
Under the spatio-temporal data model specification, we can conclude that for $l = 0, 1$, the conditional distribution of $Z(s_i, p+1)$ given $M = m$ and $Y(s_i, p+1) = l$ is Gaussian with mean $\mu^{p+1}_{li(m)}$ and variance $\Sigma_{p+1,i(m)}$. In this study, we assume that the conditional distribution of the label $Y(s_i, p+1)$, $i = 1, \dots, n$, given the joint training sample $M$ depends only on the class label values. This assumption is quite frequently used by image classification researchers (see, e.g., [19]). Set $P(Y(s_i, p+1) = l \mid M = m) = \pi_l(s_i, p+1)$, $l = 0, 1$, and, for brevity, call them prior class probabilities.
It is easy to deduce that the discriminant function $W(Z(s_i, p+1))$ is optimal under the criterion of minimum misclassification probability (see [18]).
Call the probability of misclassification for W (Z(s i , p + 1)) as local Bayes error rate and denote it by P i . Also, denote squared Mahalanobis distance between conditional distributions by

Lemma 1. The local Bayes error rate is
where Φ(x) is the standard normal cumulative distribution function.
Proof. It is easy to derive that the conditional distribution of $W(Z(s_i, p+1))$ given $M = m$, $Y(s_i, p+1) = l$ is a univariate Gaussian distribution. Using properties of the multivariate Gaussian distribution, we complete the proof.
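As a hedged illustration of the quantities in Lemma 1, the sketch below uses the textbook Bayes discriminant function for two univariate Gaussian classes with a common variance and the corresponding closed-form error rate. The exact notation and sign convention of the source are not recoverable here, so the function names, the argument names and the convention "positive $W$ favours class 1" are all assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def discriminant(z, mu0, mu1, var, pi0, pi1):
    """Textbook Bayes discriminant; assumed convention: W >= 0 -> class 1."""
    return (z - 0.5 * (mu0 + mu1)) * (mu1 - mu0) / var + math.log(pi1 / pi0)

def local_bayes_error(mu0, mu1, var, pi0, pi1):
    """Misclassification probability of the rule 'class 1 iff W >= 0'.

    delta is the Mahalanobis distance between the two conditional
    distributions; gamma is the log prior ratio. Requires mu0 != mu1.
    """
    delta = abs(mu1 - mu0) / math.sqrt(var)
    gamma = math.log(pi1 / pi0)
    return (pi1 * norm_cdf(-delta / 2.0 - gamma / delta)
            + pi0 * norm_cdf(-delta / 2.0 + gamma / delta))

# Hypothetical inputs: unit mean difference, unit variance, equal priors.
error_equal_priors = local_bayes_error(0.0, 1.0, 1.0, 0.5, 0.5)
```

With equal priors the expression reduces to $\Phi(-\Delta/2)$, so the error rate depends on the conditional means and variance only through the Mahalanobis distance.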
Error estimation is critical to classification because the validity of the resulting classifier model, composed of the classifier and its error estimate, depends on the accuracy of the error estimation procedure. Given a set of sample data, the data can be split into training and test data, with the classifier being designed on the training data and its error being validated on the test data. In this paper, the $p$ temporal observations are used for training, and the observations at the $(p+1)$th time moment are used for testing.
The performance of the classification rule based on $W(Z(s_i, p+1))$ could be evaluated by several methods (e.g., [12]). In the present study, we prefer the leave-one-out procedure, in which all observations except one (the test observation) are used to construct the classification rule, and this rule (based on CBDF) is then used to classify the omitted observation. This procedure consists of simulating a sample of $v$ independent values of $Z(s_i, p+1)$, denoted by $\{Z_j(s_i, p+1), j = 1, \dots, v\}$, drawn from the conditional distribution specified in (2) with prescribed labels $Y(s_i, p+1)$.
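The simulation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it draws $v$ values from a univariate Gaussian with a prescribed label, classifies each with a textbook Bayes discriminant function, and reports the fraction misclassified. All names, the seed handling and the sign convention are assumptions.

```python
import math
import random

def discriminant(z, mu0, mu1, var, pi0, pi1):
    """Textbook Bayes discriminant; assumed convention: W >= 0 -> class 1."""
    return (z - 0.5 * (mu0 + mu1)) * (mu1 - mu0) / var + math.log(pi1 / pi0)

def empirical_error_rate(label, mu0, mu1, var, pi0, pi1, v=30, seed=0):
    """Fraction of v simulated test observations that are misclassified.

    The observations are drawn from N(mu_label, var), which plays the
    role of the conditional distribution of the test observation.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    mean = mu1 if label == 1 else mu0
    sd = math.sqrt(var)
    errors = 0
    for _ in range(v):
        z = rng.gauss(mean, sd)
        predicted = 1 if discriminant(z, mu0, mu1, var, pi0, pi1) >= 0 else 0
        errors += int(predicted != label)
    return errors / v

# Hypothetical inputs matching the scale used later: unit mean difference.
rate = empirical_error_rate(label=1, mu0=0.0, mu1=1.0, var=1.0,
                            pi0=0.5, pi1=0.5, v=30)
```

With only $v = 30$ replications the estimate is coarse (multiples of $1/30$), which is worth keeping in mind when comparing empirical rates with the Bayes error rates.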

Numerical illustrations and simulations
For numerical illustrations of the obtained results, we considered the Gaussian spatio-temporal model with pure spatial covariances belonging to the family of powered-exponential isotropic models and with the pure temporal covariance of an AR(1) model. It is known that for this model $c_T^{1,1} = c_T^{t,t}$ for $t = 2, \dots, p+1$, the parameter $\alpha$ quantifies temporal dependency, and the inverse of the temporal covariance matrix $C_T$ is obtained by the Yule-Walker equations (see [4]).
It is easy to derive that $(c_T^{p+1})' C_T^{-1} = \alpha e_p$ and $\rho_{p+1} = \sigma_T^2$, where $e_p$ denotes the $p$th row of the identity matrix $I_p$.
Hence $\mu^{p+1}_{li(m)} = \beta_l' x_i + \alpha (e_p \otimes e_i) \operatorname{vec}(E)$ and $\Sigma_{p+1,i(m)} = c_S^{ii} \sigma_T^2$. Here $\alpha$ is the AR(1) model parameter that quantifies temporal dependency, and $\sigma_T^2$ is the white noise variance for this model.
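The two AR(1) identities can be checked numerically with a short sketch. It assumes the usual stationary AR(1) autocovariance $\sigma_T^2 \alpha^{|t-r|}/(1-\alpha^2)$ and takes $\rho_{p+1}$ to be the conditional (innovation) variance of the observation at time $p+1$ given times $1, \dots, p$. The identity $(c_T^{p+1})' C_T^{-1} = \alpha e_p$ is verified in the equivalent form $(c_T^{p+1})' = \alpha e_p C_T$, which avoids a matrix inversion. Function and variable names are hypothetical.

```python
def ar1_cov(size, alpha, sigma2):
    """Stationary AR(1) covariance matrix, assuming |alpha| < 1."""
    gamma0 = sigma2 / (1.0 - alpha ** 2)  # common variance c^{t,t}
    return [[gamma0 * alpha ** abs(t - r) for r in range(size)]
            for t in range(size)]

# Hypothetical parameters: p = 4 training time points.
p, alpha, sigma2 = 4, 0.3, 1.0

full = ar1_cov(p + 1, alpha, sigma2)       # times 1, ..., p + 1
C_T = [row[:p] for row in full[:p]]        # times 1, ..., p
c_next = full[p][:p]                       # cov of time p + 1 with 1..p

# c' C_T^{-1} = alpha * e_p  <=>  c' = alpha * (p-th row of C_T).
max_gap = max(abs(c_next[r] - alpha * C_T[p - 1][r]) for r in range(p))

# Conditional variance: c^{p+1,p+1} - (alpha * e_p) c = gamma0 - alpha * c_p.
rho_next = full[p][p] - alpha * c_next[p - 1]
```

Here `max_gap` is zero up to rounding and `rho_next` equals `sigma2`, which is why only the last training time point enters the conditional mean.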
In the study, two isotropic nuggetless spatial covariance structures belonging to the powered-exponential family are considered. Assuming that $C_S = \sigma_s^2 R$, where $R = (r_{ij})$ is the spatial correlation matrix, we consider the following two particular cases: (i) the exponential case with $r_{ij} = r(|s_i - s_j|) = e^{-|s_i - s_j|/\varphi}$; (ii) the squared-exponential case with $r_{ij} = r(|s_i - s_j|) = e^{-(|s_i - s_j|/\varphi)^2}$.
Here $\varphi$ is the so-called range parameter, which controls the spatial dependence. This choice of models is based on the smoothness level of sample paths: sample paths of a GRF with the exponential covariance function are not smooth, whereas the squared-exponential covariance model has smooth sample paths.
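Both correlation models are members of one family indexed by the exponent, which the following minimal sketch makes explicit. The function name, the argument names `phi` and `power`, and the toy coordinates are assumptions made for illustration.

```python
import math

def powered_exp_corr(locations, phi, power):
    """Correlation matrix r_ij = exp(-(|s_i - s_j| / phi) ** power).

    power = 1 gives the exponential model, power = 2 the
    squared-exponential model; phi is the range parameter.
    """
    return [[math.exp(-(math.dist(a, b) / phi) ** power) for b in locations]
            for a in locations]

# Hypothetical planar locations and the range value phi = 3.
locations = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
R_exp = powered_exp_corr(locations, phi=3.0, power=1)
R_sq = powered_exp_corr(locations, phi=3.0, power=2)
```

For distances shorter than $\varphi$ the squared-exponential correlation is larger than the exponential one, so nearby observations carry more information about each other under the smoother model.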
Two methods for specifying prior class probabilities are proposed. The first is based on the temporal weighted moving average (TWMA) method. The second, denoted by STWMA, additionally uses spatial correlations for weighting; here $i_0$ denotes the index of the nearest neighbor to $s_i$.
We have compared these four particular cases by calculating $P_i$ and $LO_i$ for $i = 1, \dots, n$, and the results are presented in tables.
Numerical illustrations are performed for 20 locations in a two-dimensional area, depicted in Fig. 1. Class labels for the 20 locations and 4 time points in the training sample are presented in Table 1.
Local Bayes error rates $P_i$ and their averages $AP = \sum_{i=1}^{20} P_i / 20$ for two cases of spatial covariances and two models for prior probabilities are presented in Table 2.
As can be seen from Table 2, for $\alpha = 0.1, 0.3$, classifiers with STWMA priors have an advantage over those with TWMA priors at the majority of locations for both spatial covariance models. For large values of $\alpha$, no significant difference between the two is observed. For $v = 30$ independent replications, local empirical error rates $LO_i$ and their averages $ALO = \sum_{i=1}^{20} LO_i / 20$ are presented in Table 3.
Table 3. Local and average empirical error rates for $\Delta = |\mu^{p+1}_{1i(m)} - \mu^{p+1}_{0i(m)}| = 1$, $\varphi = 3$ and various $\alpha$. Columns: $i$, $\pi_{1t}(s_i, p+1)$, $\pi_{1ts}(s_i, p+1)$.
As can be seen from Table 3, for all values of $\alpha$, classifiers with STWMA and TWMA priors have similar empirical error rates at the majority of locations for both spatial covariance models.
The last row of Tables 2 and 3 (i.e., AP and ALO) allows us to compare averages of Bayes and empirical error rates for various combinations of spatial covariance and prior class probability models and to make optimal decisions in the construction of classifiers for spatio-temporal Gaussian data.

Conclusions
In this paper, we propose an approach to the classification of spatio-temporal data in the framework of Bayes discriminant functions for the case of separable spatio-temporal covariances. Several simulation studies were conducted to estimate and compare empirically the classifiers for various separable stationary spatio-temporal covariance and prior class probability models. Numerical analysis showed that: (i) Bayes and empirical error rates increase when temporal correlation increases; (ii) incorporating spatial correlation into the class prior probabilities improves the performance of the classifiers; (iii) classifiers with spatial squared-exponential covariance have an advantage over classifiers with exponential covariance.
The results of the calculations in all examples give a strong argument for encouraging users not to ignore the spatial and temporal dependence and the locational information from the training sample when classifying spatio-temporal data, and to apply the proposed approach in deep learning for spatio-temporal data mining.