An Empirical Study for the Estimation of Autoregressive Hilbertian Processes by Wavelet Packet Method

In this paper wavelet packet bases are used for an estimation of the autoregressive Hilbertian processes operator. We assume that integral operator kernel can have some singular structures and estimate them by projecting functional processes on suitable bases. Linear methods for continuous-time prediction using Hilbert-valued autoregressive processes are compared with the suggested method on simulated data and on real-life data sets. Statistics of residual partial sums processes and Ex poste prediction are used to check the model.


Introduction
In many real life applications, the data and the model can be structured in several ways.In some time series it is possible to interpret the observed data as the realisation of the functional process.Recently there has been much interest in the possibility of using autoregressive Hilbertian model to predict the weather [1], harmonic levels in electrical networks [2] or cash flow in automatic teller machine networks [3].In all the mentioned papers, the projection on the finite number of principal components of the empirical covariance operator has been used to predict future functional observations.A comprehensive theoretical study of this method has been presented by Bosq [4].
Fourier transform and spline bases have been used in the above-mentioned papers.The authors used a rule of thumb to determine the number of bases.Further multivariate principal components analysis and cross-validation criteria have been used to reduce the dimensionality of the space.
Wavelet bases can be used instead of Fourier transform or spline bases for a less regular functions.Antoniadis and Sapatinas [5] in their paper proposed to exploit some wavelet bases regularity properties by suggesting three linear methods using wavelet In this paper we are concerned with the autoregressive Hilbertian process that has some singular structures in its integral kernel.For the Gaussian functional noise and autoregressive operator with some singularities, it can be assumed that the covariance operator principal components bases are not the best one for the estimation of the model parameters.
In order to deal with the autoregressive operator singularities we suggest to use wavelet packet bases (WPB) so that the most suitable bases from the dictionary of admissible bases can be found.
Recently, the best basis methods have appeared in a variety of papers in the area of image compression and denoising (see [6] for review).The arguments presented by these papers in favor of the wavelet packet bases can be applied to functional data with some singular structures as well.
The rest of the article is organised as follows.The functional autoregressive process and autoregressive operator kernel with singular structures are introduced in Section 2. The residual partial sums processes and a certain class of functionals of the residuals partial sums are introduced.These functionals serve as tests of stability and as a criterion for parameters estimation.The section concludes with the simulation results, which show that the proposed method improves the prediction results for a certain class of functional processes.
Section 3 presents the standard and non-standard forms of operators decomposition and discusses their extensions with the wavelet packet dictionaries.Beylkin et al. [7,8] have demonstrated that some operator can be presented in the non-standard form as the sparse matrix and there exist algorithms that apply those operators in order of O(N ) operations.The estimation of the operator of a first-order functional autoregressive process based on the wavelet packet transformation is presented in Section 4. The uniform moving residual sums statistics are used to test the stability and suitability of the model.
The real life application of the use of the wavelet packet transformation is presented in Section 5.The final section concludes with suggestions for future research.

Autoregressive Hilbertian and residual partial sums processes
Let H denote the Hilbert space L 2 (0, 1) with the norm x 2 = 1 0 x 2 (t)dt and with the inner product x, y = 1 0 x(t)y(t)dt.A sequence (ξ i = ξ i (t), t ∈ [0, 1]; 0 < i ≤ N ) of random variables with values in H is said to follow a Hilbert space valued autoregressive process of the first order (ARH(1)) associated with (ε, ρ) if it is stationary and such that where ) is an H white noise, and operator ρ : H → H is linear and compact [9].Row residuals ( ε i ) of the model (1) are defined in the usual way by where ρ is an estimate of the operator ρ based on ξ 1 , . . ., ξ N .
In order to test the model (1) for stability and suitability we consider the process where S(0) = 0 and Residual partial sums process weak convergence to a Brownian motion with values in the space H and in stronger topological framework e.g.Hölder space has been investigated in the paper by Laukaitis and Rackauskas [10].To test ARH(1) model stability, they suggested to use the uniform moving residual sums statistics where 0 < α < 1.The dependency of the statistic MS(N, α) from the choice of Hölder parameter α has been presented at [10].The authors demonstrated that statistics MS(N, α) is more sensitive to the changes of (1) when Hölder exponent approaches 1/2.In this paper the statistic MS(N, α) was used to test if the data are consistent with the model (1).Our method is based on the choice of the best wavelet basis and this statistic is used to test how well the model fits the data for each wavelet basis. Let We now consider three special examples of the simulation that explains the motivation behind this work.
1.The two randomly generated Dirac basis (peaks) on the surface of the kernel β (one peak on the diagonal).In that case, the autoregressive operator will have the form where {v 1 , v 2 , v 3 } is an orthonormal system in H and 0 < A < 1 and 0 < B < 1.
2. As the second example the kernels with singularities of the form log |t − s| are considered.Applications of such operators have been analyzed by Bradley [11].
For the simulation studies the following operator has been used 3. The Gaussian kernel as an example of a smooth kernel function In addition to the wavelet packet basis approach, two methods presented by Bosq [4] and Antoniadis et al. [5] are used in the simulation.
The estimation of the operator ρ is ill-posed.In order to handle it Bosq [4] suggested the projection method on the span of principal components of the covariance operator.
be the empirical covariance and cross-covariance operators and let, H iN be the span of k N eigenvectors of Γ 0 associated with the largest eigenvalues, and let π kN be the orthogonal projector on this subspace.The u ⊗ v defines the tensor product for two fixed elements u, v ∈ H and is the bounded linear operator form H to H, defined by Let us define the regularized covariance and cross-covariance estimates as follows: The regularized estimator will be Under certain assumptions on the covariance operator, this estimator is consistent(for details see [4]).For less regular space Antoniadis et al. [5] proposed the method based on the wavelet-vaguelette decomposition and which is defined as follows and where Γ * 0 and Γ * 1 are the adjoint operator of Γ 0 and Γ 1 .Autoregressive operator estimation based on the Wavelet-vaguelette decomposition shows similar prediction accuracy as the covariance operator principal component projection method [5].A recent study from Laukaitis and Rackauskas [3] confirms those findings.
Table 1 presents our simulation results for the model (1).We assume that the autoregressive operator has one of the three forms presented above.Wiener process is used as the noise component ε i in our simulations.The methods of the principal components ( 6), the wavelet-vaguelette decomposition (7) and the wavelet packet bases (see Section 4) are compared by increasing the number of samples from 10 2 to 10 5 and calculating the mean quadratic errors of the forecasted variables.All simulation has been carried in the MATLAB environment.As we can see from Table 1, while consistent, method of the principal components perform poorly when we have small number of samples and if the best predictors of the future evolution have little to do with the largest principal components.
The examples and arguments provided in the working paper of Kargin and Onatski [12] confirms that the method of functional principal components is not always the best way to estimate autoregressive operator.We see that the method of wavelet packet basis gives more accurate estimation when we deal with the low number of observations.

The standard and non-standard form of operator decomposition
Operators with singularities in the kernel has been considered in the paper by Beylkin et al. [7,8].They investigated operators representation in the so-called non-standard form and demonstrated that for some operators with singularities sparse representation with algorithm of order O(N ) can be achieved.
By using wavelet bases the function β(s, t) from ( 2) can be represented in standard and nonstandard forms.The standard form is the representation of an operator in the tensor product basis.Let {V j } j∈Z be a multiresolution approximation [6] of the function β(s, t).If there is the coarsest scale n, then we have where subspace W j+1 is supplementary of subspace V j+1 and is defined as: In the case of finitely many scales the autoregressive operator is a representation of ρ j = P j ρP j , where P j is the projection on subspace V j .Using standard form representation we can decompose autoregressive operator by the set of operators acting between subspaces of different scales: Alternatively, wavelet bases in L 2 (R 2 ) , may be constructed using the scaling function in addition to the wavelets.In that case, the triplet of functions ψ j,k (t) ψ j,l (s), ψ j,k (t)φ j,l (s), φ j,k (t)ψ j,l (s), where j, k, l ∈ Z , forms a basis of L 2 (R 2 ).Representing operators in these bases leads to the non-standard form [8] as a chain of triplets ρ = {A j B j Γ j } j∈Z acting on the subspaces V j and W j :A j : W j → W j , B j : V j → W j , Γ j : W j → V j .The operators {A j B j Γ j } j∈Z are defined as A j = Q j ρQ j , B j = Q j ρP j and Γ j = P j ρQ j .They admit a recursive definition via the relation where the operators ρ j ,ρ j : V j → V j are defined by ρ j = P j ρP j .If there is a coarsest scale n, then where ρ n = P n ρP n .If the number of the scales is finite, then the operators are organised as blocks of a matrix.
In this paper we extend the notion of the standard and non-standard integral operator representation by considering operator decomposition in the wavelet packet basis.Wavelet packet bases has been introduced by Coifman et al. [13] as a generalization of the wavelet transform basis.This generalization of the traditional wavelet filter bank structure permits the representation of a function by one of many bases, each of which is constructed by a unique ensemble of scalings and translations of the same wavelet/scaling filter pair.
The motivation for using wavelet packet bases follows from the possibility to have a whole ensemble of localized representations beyond the traditional wavelet transform.Wavelets can isolate functions behavior in both length (time) and frequency.
Following the same approach as in Section 3 both standard and non-standard forms of autoregressive operator can be introduced for wavelet packet bases (more on the functions decomposition by wavelet packet bases can be found in [13] or [6]).

Estimation of the autoregressive operator by wavelet packets
Let Γ 0 be the covariance operator of the random curves {ξ i } defined by Γ 0 = E ξ i ⊗ ξ i so that Γ 0 x, z = E ξ i , x ξ i , z for x, z ∈ H. Let Γ 1 be the cross-covariance operator for the curves {ξ i } and {ξ i−1 } defined by It is easy to see that by multiplying (1) by ξ i−1 and taking expectation we have that the following relationship holds: An intuitive way for ρ estimation is to substitute the covariance and cross-covariance operators with their estimates In the expression above we have that Γ 0 is singular and we face with an ill-posed problem.As a consequence, obtaining a consistent estimate of ρ requires regularization of the solution.One way to achieve that would be to use projection method on the span of principal components as mentioned above (6).Mallat et al. [14] suggested to estimate locally stationary process eigenfunctions of the covariance operator by searching for their close match in the dictionary of local cosines bases.The best orthogonal basis provides a means of quickly computing compact, adaptive function approximations.In the case of the functional autoregressive processes we have the possibility to chose the basis close to the principal components basis if the principal components basis is the best one or to find better if the principal components basis is not optimal basis for the prediction.
Let D = (φ γ ) γ∈N be a dictionary of waveforms.Wavelet packets and local cosine bases can be as examples of such dictionaries.
The algorithm begins by assuming that the operator kernel β can be sparsely approximated by one of the wavelet packet admissible tree bases.From the fast dynamic programming algorithm of Coifman and Wickerhauser [15] follows that the best bases can be found by a bottom-up progression.At the bottom of deeps J there is only one basis for consideration W p J because bottom nodes are not subdecomposed.In the space span by basis W p J we take the pairs (φ γ1 , φ γ2 ) k of bases that maximizes correlation of functional process {ξ i } from ( 1) Starting from k = 1 the operator ρ is estimated in the bases (φ γ1 , φ γ2 ) 1 .The statistic MS(N, α) presented above is used to check the model.If we reject the null hypothesis that the residual partial sums process behaves like white noise when we continue increasing the number of pairs (φ γ1 , φ γ2 ) k .The next step after the model (1) evaluation in the space W p J is to follow as in [15] by investigating the spaces {W p j } 0≤p<2 j .When we finish the search in the admissible tree bases we get the most sparse solution.
Also, as it has been shown in our simulation example above, there are special cases when we can have basis better then principal components one.The example below illustrates that we can find in the dictionary the basis that is close to principal components basis.

Estimation of the payment systems transactions intensity
The operational cost control as well as the financial liquidity control is the motivation for the following analysis.For the banks, it is important to forecast the next day cash flow at any given moment of time.That is why the functional data analysis is suitable for such applications.
We tackle the problem of the payments prediction intensity in Lithuania cards payment market at ATM ( automated teller machine is an electronic device that allows a bank's customers to make cash withdrawals) networks between 03.11.2003 and 10.03.2004.Fig. 2 left side displays the functional data.Each function represents the number of transactions.Fig. 2 right side displays bivariate time series that we received after fixing the values from each functional process at 12:00 AM and 24:00 PM.By using the observations of the functions Λ(t), we build the observations for their derivatives λ(t).Namely, we consider λ(t j ) = Λ(t j ) − Λ(t j−1 ), i = 1, . . ., n, j = 1, . . ., 1024.
In the process of the exploratory data analysis we used statistic MS to judge about the model ( 1) suitability.This analysis helped us to find that we must censor empirical process by eliminating two weeks of Christmas holiday period (for details see [3]).In Table 2 we show values of the statistic MS calculated for transactions processes before and after we censored data.The distribution of the statistic MS can be found in the paper by Laukaitis and Rackauskas [10].
From the results of Table 2 we can say that after adjusting and estimating autoregressive operator the residuals of transactions intensity process follows white noise behavior with 94 % confidential level.Finally, let us discuss the prediction capability of the model.Consider the prediction problem of the function λ n+1 (t), t ∈ [0, 1].For prediction we used the model (1).Different types of the forecasting errors can be considered.The most widely accepted measures of the performance of the estimator of ξ n+1 is it's mean integrated square error given by We evaluated the MISE error by choosing ten days not used in the parameters estimation(see Table 3).
It is clear that the predicted values depends on the number of principal components used (for principal components estimation method) or the number of selected bases (for wavelet packet bases estimation method) in the model.As we can see the most optimal subspace dimension for principal components method is 4 and 7 for wavelet packet bases method.

Concluding remarks
We suggested the wavelet packet bases method for functional autoregressive process estimation.As we have seen the method shows better prediction compared with the other known methods when autoregressive kernel exhibits some singular structures and when the number of observations is limited.More mathematical investigation is needed to investigate asymptotic properties of this method.We hope to provide them in the future research work.

Table 1 .
(1)n quadratic errors for one-step-ahead forecast of the ξ in model(1)

Table 2 .
Statistic MS for transactions double stochastic Poisson process data set.Case A -No censoring and no differentiation.Case B -Christmas period censored