SOME GOODNESS OF FIT TESTS FOR RANDOM SEQUENCES

In this paper we make an attempt to apply results from the theory of square Gaussian random variables to construct goodness-of-fit tests for random sequences (time series). We consider two versions of such tests. The first is designed for testing the adequacy of hypotheses on the expectation and covariance function of a univariate non-centered sequence; the other is constructed for testing hypotheses on the covariance function of a multivariate centered sequence. Simulation results illustrate the behavior of these tests in some particular cases.


Introduction
The task of investigating the properties of a random sequence is very important for applications. Very often phenomena can be observed only at certain points in time. In some cases the values of continuous quantities, such as temperature and voltage, can be recorded only at discrete moments of time. And even if the observations can be recorded continuously, we can use only discrete data for computational purposes. That is why in practice we usually deal with random sequences, or time series. The latter term is used more frequently, but we prefer the former, as it designates the connection to random processes.
There is much literature devoted to this topic, in particular the classic books on the statistical analysis of time series written by Anderson [1], Box and Jenkins [2], and Brockwell and Davis [4].
To date, many goodness-of-fit tests in time series are residual-based. For example, the classic portmanteau test of Box and Pierce [3] and its improvement by Ljung and Box [15] are based on the sample autocorrelations of the residuals. In the context of goodness of fit of nonlinear time series models, the McLeod and Li [16] test is based on the sample autocorrelations of the squared residuals. Based on a spectral approach to the residuals, Chen and Deo [6] proposed some new diagnostic tests. More recently, perhaps influenced by the empirical distribution function approach in goodness-of-fit testing for independent observations, substantial developments for time series data have taken place in the form of tests based on empirical processes marked by certain residuals; see for instance Chen and Härdle [7] and Escanciano [8]. For more information and details see the references within the cited literature.
In the model-based approach to time series analysis, estimated residuals are computed once a fitted model has been obtained from the data, and these are then tested for "whiteness", i.e., it is determined whether they behave like white noise. Tests for residual whiteness generally postulate whiteness of the residuals as the null hypothesis, so that significant rejections indicate model inadequacy. These tests require the computation of residuals from the fitted model, which can be quite tedious when the model does not have a finite order autoregressive representation. Also, in such cases, the residuals are not uniquely defined.
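As an illustration of the residual-based approach discussed above (not of the tests constructed in this paper), the Ljung–Box portmanteau statistic can be sketched in a few lines. The function name and the plain-loop implementation are ours; the formula Q = n(n+2) Σ_{k=1}^{h} ρ̂_k²/(n−k) is the standard one.

```python
import numpy as np

def ljung_box(residuals, num_lags):
    """Ljung-Box portmanteau statistic Q for residual whiteness.

    Q = n(n+2) * sum_{k=1}^{h} rho_k^2 / (n - k), where rho_k is the
    lag-k sample autocorrelation of the residuals.  Under the null of
    white-noise residuals, Q is approximately chi-squared distributed
    (degrees of freedom adjusted for the number of fitted parameters).
    """
    x = np.asarray(residuals, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x * x)          # sum of squared centered residuals
    q = 0.0
    for k in range(1, num_lags + 1):
        rho_k = np.sum(x[:-k] * x[k:]) / denom   # lag-k autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q
```

A strongly autocorrelated residual sequence (e.g. a deterministic alternating one) yields a very large Q, leading to rejection of whiteness.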
In this paper we use another approach, based on the theory of square Gaussian random variables. This theory was developed in works by Kozachenko et al. [11], [12], [13] for the investigation of stochastic processes. In the book by Buldygin and Kozachenko [5] the properties of the space of square Gaussian random variables were studied and the connection with Orlicz spaces of random variables was established. We use this theory for the construction of goodness-of-fit tests on the expectation and covariance function of a non-centered univariate stationary Gaussian sequence, and on the covariance function of a centered multivariate stationary random sequence. Our tests do not require the computation of residuals and can be applied to infinite order representations. This paper is a continuation of the work started in [14], which was devoted to testing a centered univariate random sequence. The paper consists of 5 sections and 2 annexes. The second section is devoted to the theory of square Gaussian random variables and contains the main definitions and results; in particular, we derive an estimate for the distribution of the maximum of a quadratic form of square Gaussian random variables. Sections 3 and 4 apply the estimate obtained in Section 2 to construct different aggregate tests.
Lithuanian Statistical Association, Statistics Lithuania. ISSN 2029-7262 online.
The test in Section 3 was constructed for testing the aggregated hypothesis on the expectation and covariance function of a non-centered stationary Gaussian sequence. It is based on the approach used in the L_2 theory of stochastic processes, in which process identification is made on the basis of two main characteristics: the mathematical expectation and the covariance function.
In Section 4 we consider multivariate sequences. The residual-based approach dominates in the multivariate case too; see, for example, the papers by Hosking [9], [10], Mahdi and McLeod [17] and the references therein. The goodness-of-fit test we have constructed for the centered Gaussian multivariate stationary sequence is based on fitting the covariance function.
The power properties of our tests were studied through simulations. Section 5 draws some conclusions. Some necessary mathematical calculations are relegated to the annexes at the end.

Square Gaussian random variables
Let Ξ = {γ_i, i ∈ I} be a family of jointly Gaussian random variables for which E γ_i = 0 for all i ∈ I.

Definition 1. [13] The space SG_Ξ(Ω) is the space of square Gaussian random variables if any element ξ ∈ SG_Ξ(Ω) can be presented as

ξ = γ^T A γ − E γ^T A γ,   (1)

where γ^T = (γ_1, ..., γ_r), γ_i ∈ Ξ, i = 1, ..., r, and A is a real-valued matrix, or if ξ ∈ SG_Ξ(Ω) is the mean-square limit of a sequence {ξ_n, n ≥ 1} of variables of the form (1).

It was proved by Buldygin and Kozachenko in [5] that SG_Ξ(Ω) is a linear space. For square Gaussian random variables the following results hold true.

Theorem 1. [11] Let ξ^T = (ξ_1, ξ_2, ..., ξ_d) be a random vector such that ξ_i ∈ SG_Ξ(Ω), and let B be a symmetric semi-definite matrix. Then for all 0 < s < 1/√2 the following inequality is true, with the function R(y) defined therein.

Theorem 2. Let {η_m, 1 ≤ m ≤ M} be a sequence of random variables, each of which can be presented as a quadratic form of square Gaussian random variables (that is, η_m = ξ^T B ξ, where B is a symmetric semi-definite matrix). Then, for any x ≥ 0, the following bound holds, where the function R is defined in Theorem 1.
Choosing s close to the minimal point of the bound, we obtain (3). The theorem is proved.
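The quadratic-form construction behind Definition 1 can be checked numerically. A square Gaussian variable ξ = γ^T A γ − E γ^T A γ has mean zero by construction, and for a centered Gaussian vector γ with covariance Σ the centering constant is E γ^T A γ = tr(AΣ). The matrix A, the covariance Σ, and the Monte Carlo sample size below are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative symmetric matrix A and Gaussian covariance Sigma
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
Sigma = np.eye(3)

# Centering constant: E gamma^T A gamma = trace(A Sigma)
centering = np.trace(A @ Sigma)

# Many realizations of xi = gamma^T A gamma - E gamma^T A gamma
g = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
xi = np.einsum('ni,ij,nj->n', g, A, g) - centering

print(abs(xi.mean()))  # close to 0: the square Gaussian variable is centered
```

The sample mean of ξ stays near zero, consistent with SG_Ξ(Ω) being a space of centered random variables.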

Testing hypotheses on the expectation and covariance function of a univariate sequence
Using inequality (3), it is possible to test hypotheses on the expectation and covariance function of a non-centered univariate stationary Gaussian sequence. Hereinafter we consider stationarity in the strict sense.
Let us consider a stationary sequence {γ(n), n ≥ 1} for which E γ(n) = a and E(γ(n) − a)(γ(n + m) − a) = B(m), m ≥ 0, is its covariance function. We assume that we have N + M consecutive observations of this random sequence. We choose the estimators in the following way: the sample mean for the expectation and the sample covariance for the covariance function. We then introduce the random variables η_a(m) and η_B(m). It is easy to prove that these random variables are square Gaussian.
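The estimators just described can be sketched as follows. The displayed formulas did not survive extraction, so the 1/N normalization and the use of exactly N products at every lag are assumptions of this sketch.

```python
import numpy as np

def mean_estimator(x, N):
    """Sample mean over the first N observations: a_hat = (1/N) sum gamma(n)."""
    return float(np.mean(np.asarray(x, dtype=float)[:N]))

def covariance_estimator(x, N, m, a_hat):
    """Sample covariance at lag m >= 0 using N products:
    B_hat(m) = (1/N) sum_{n=1}^{N} (gamma(n) - a_hat)(gamma(n+m) - a_hat).
    Requires len(x) >= N + m, i.e. N + M observations with m <= M.
    """
    x = np.asarray(x, dtype=float)
    return float(np.mean((x[:N] - a_hat) * (x[m:N + m] - a_hat)))
```

With N + M observations, B_hat(m) can be computed for every lag m = 0, 1, ..., M from the same sample.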
is actually a quadratic form of square Gaussian random variables.
All the necessary calculations for the terms of E η(m) are included in the Annex to this section.

Remark 2. If for every m, B_m = C^{-1}(m) is the inverse of the matrix C(m) whose components are the covariances between the items of the vector η(m), then E η(m) = const for all m. In this case, however, one should be careful, since the matrices C(m) have to be invertible.

Criterion 1. If, for significance level α and the corresponding critical value ε_α, which can be found from the equation MW(ε_α) = α, the test statistic exceeds ε_α, then the hypothesis H_0 should be rejected; otherwise it is accepted.
Remark 3. The probability of a type I error for Criterion 1 is less than or equal to α.
Example 1. Let us consider the non-centered Gaussian sequence {γ(n), n ≥ 1} whose elements can be presented according to the expression where β(j) = e^{-λj}, j ≥ 0, λ > 0, and {ζ_k, k ∈ Z} is a sequence of independent random variables such that E ζ_k = 0 and E ζ_k^2 = 1 for all k. In this case E γ(n) = a for all n, and the covariance function has the form (5). Using a simulation study, we investigated how Criterion 1 works. We made 10 000 Monte Carlo simulations of the non-centered Gaussian stationary sequence γ(n), with a = E γ(n) and covariance function defined by (5) with fixed parameter λ. We used the simulation methods developed in [18].

1. Let us check the null hypothesis (H_0) stating that the stationary Gaussian sequence γ(n) has expectation a = 1 and covariance function defined by formula (5) with parameter λ = 1, versus the alternative hypothesis (H_a) that the stationary Gaussian sequence γ(n) has expectation a = 1 and covariance function defined by formula (5) with parameter λ = 0.5.
We simulated 10 000 realizations of the sequences defined by (4). For the simulated sequences we obtained an estimate of the probability of a type I error α = 0 and an estimate of the probability of a type II error β = 0.23.
2. Let us now check the null hypothesis (H 0 ) that states that the stationary Gaussian sequence γ(n) has expectation a = 1 and covariance function defined by the formula (5) with parameter λ = 1 versus the alternative hypothesis (H a ) implying that the stationary Gaussian sequence γ(n) has expectation a = 0 and covariance function defined by the formula (5) with parameter λ = 1.
We again used the simulated 10 000 realizations of the sequence defined by (4) with parameters a = 1 and λ = 1, and another sequence with parameters a = 0.25 and λ = 1.
The required constants are the same as previously. In this case we obtained an estimate of the probability of a type I error α = 0 and an estimate of the probability of a type II error β = 0.0055.
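The moving-average sequence of Example 1 can be simulated by truncating the infinite sum. The truncation level and the use of Gaussian innovations are our assumptions (the paper relies on the method of [18], not reproduced here). The closed form of the covariance, B(m) = e^{-λm}/(1 − e^{-2λ}), follows by summing the geometric series Σ_j e^{-λj} e^{-λ(j+m)} and is presumably what formula (5) states.

```python
import numpy as np

def simulate_ma(n_obs, a, lam, rng, truncation=200):
    """Truncated version of gamma(n) = a + sum_{j>=0} exp(-lam*j)*zeta(n-j)
    with i.i.d. standard normal innovations zeta.  The truncation level is
    an illustrative choice; the omitted tail weights are negligible."""
    beta = np.exp(-lam * np.arange(truncation + 1))
    zeta = rng.standard_normal(n_obs + truncation)
    # Full convolution: entry truncation+n equals sum_j beta[j]*zeta[truncation+n-j]
    return a + np.convolve(zeta, beta)[truncation:truncation + n_obs]

def cov_exponential_ma(m, lam):
    """B(m) = exp(-lam*m) / (1 - exp(-2*lam)) for weights beta(j) = exp(-lam*j)."""
    return np.exp(-lam * m) / (1.0 - np.exp(-2.0 * lam))
```

For λ = 1 the variance is B(0) = 1/(1 − e^{-2}) ≈ 1.157, which a long simulated path reproduces to within sampling error.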
Remark 4. It is evident that the more observations we have, the more sensitive the criterion is. Finding the number N for which the null and alternative hypotheses can be distinguished is the subject of our continuing investigation.

Testing hypotheses on the covariance function of a centered multivariate sequence
The inequality (3) can also be useful for testing a hypothesis on the covariance function of a centered multivariate random sequence.
Let us assume that the components of the multivariate random sequence γ(n), n ≥ 1, are jointly Gaussian, stationary (in the strict sense) sequences {γ_k(n), n ≥ 1}, k = 1, ..., K, for which E γ_k(n) = 0 and E γ_k(n)γ_l(n + m) = B_kl(m), m ≥ 0, is the covariance function of these sequences. It is worth mentioning that for k = l, B_kk is the ordinary autocovariance function of the k-th component, and for k ≠ l, B_kl are the joint covariances, sometimes called cross-covariances. Hereinafter we shall use the term covariance function of the sequence γ(n).
We suppose that the sequence γ(n) is observed at points 1, 2, ..., N + M (N, M > 1), and take the sample cross-covariance B̂_kl(m) as an estimator of the covariance function. The estimator B̂_kl(m) is unbiased. The random variables ∆_kl(m) = B̂_kl(m) − B_kl(m) are square Gaussian, since B̂_kl(m) can be presented as (γ_k(1), ..., γ_k(N))^T A (γ_l(m + 1), ..., γ_l(N + m)) for a suitable matrix A. Let ∆(m) be a vector with components ∆_kl(m).

Criterion 2. If, for significance level α and the corresponding critical value ε_α, which can be found from the equation MW(ε_α) = α, the test statistic exceeds ε_α, then the hypothesis H_0 should be rejected; otherwise it is accepted.
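The cross-covariance estimator for centered components can be sketched as follows; the 1/N normalization is an assumption of this sketch, since the displayed formula was lost. It is unbiased because E γ_k(n)γ_l(n+m) = B_kl(m) for every n.

```python
import numpy as np

def cross_covariance_estimator(gamma_k, gamma_l, N, m):
    """Estimate B_kl(m) = E gamma_k(n) gamma_l(n+m) for centered, jointly
    stationary components, using N products:
    B_hat_kl(m) = (1/N) sum_{n=1}^{N} gamma_k(n) gamma_l(n+m).
    Requires len(gamma_l) >= N + m.
    """
    gk = np.asarray(gamma_k, dtype=float)
    gl = np.asarray(gamma_l, dtype=float)
    return float(np.dot(gk[:N], gl[m:N + m]) / N)
```

For k = l this reduces to the ordinary autocovariance estimator of the k-th component.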
Remark 5. The probability of a type I error for Criterion 2 is less than or equal to α. We assume that each component can be presented as a moving average with coefficients β_k(j) = e^{-λ_k j}, λ_k > 0, k = 1, 2, j ≥ 0, and that the random variables ξ_j are independent with zero mean and unit variance.
If the components of γ(n) are not independent, then the covariance function of this sequence has the form given below; in the case k = l we obtain the covariance function of the k-th component. Let ∆^T(m) = (∆_11(m), ∆_12(m), ∆_21(m), ∆_22(m)), and let B = I_4 be the identity matrix of order 4. Using simulations we investigated how Criterion 2 works. We made 10 000 Monte Carlo simulations of two sequences with covariance functions defined by (6) and (7). For the simulations we used the methods described in the paper by Vasylyk et al. [18].
In this case we obtained an estimate of the probability of a type I error α = 0 and an estimate of the probability of a type II error β = 0.0104.
Remark 7. It is evident that the more observations we have, the more sensitive the test is.

Conclusions
In this paper we estimated the distribution of the maximum of a random sequence whose elements can be presented as quadratic forms of square Gaussian random variables. This result made it possible to build a criterion for testing a hypothesis on the expectation and covariance function of a non-centered univariate stationary Gaussian sequence, and a criterion for testing a hypothesis on the covariance function of a centered multivariate stationary Gaussian sequence. Simulation studies were also carried out.
The inequality obtained in Section 2 can also be useful for testing similar hypotheses for non-centered multivariate random sequences. Our test statistics are quite easy to compute and do not require the calculation of residuals from the fitted model. This is especially advantageous when the fitted model is not a finite order autoregressive model.
There is, of course, a lot of room for improvement of the tests. Comparison with other tests and determination of the number N for which the null and alternative hypotheses can be distinguished are also very important issues for further investigation.

Acknowledgments
We are grateful to two anonymous referees for their insightful comments that have significantly improved the paper.

Annex to section 3
This Annex includes the calculations required in Section 3 for E η_a^2(m) and E η_B^2(m). Using Isserlis' formula for the centered Gaussian random variables γ̃(n) := γ(n) − a, we obtain the expressions below. Let us then make the required calculation for E η_B^2(m).
Using Isserlis' formula we obtain that

Annex to section 4
In Section 4 we need to find the expectation of ∆_kl^2(m) in order to calculate the value of E(∆^T(m) B ∆(m)) (see formula (8)). Let us do it.