Adapted SETAR model for Lithuanian HCPI time series

We present an adapted SETAR (self-exciting threshold autoregressive) model, which enables simultaneous estimation of nonlinearity and unobserved time series components. The model was tested on real Lithuanian harmonised consumer price index (HCPI) time series covering the period from January 1996 to December 2009. The results show that the adapted SETAR model is able to capture the features of real time series of a complex nature. For comparison, an ARIMA model was also fitted to the same time series. The estimated models and the results of the comparison are presented in this work.


Introduction
Social, economic, political and other changes leave structural breaks, dynamic changes, business cycle asymmetries and changes in mean in economic time series. Structural breaks may produce a short-term transient effect or a long-term change in the model structure, such as a change in mean. Short-term effects, that is, one or more outliers, can create problems for standard time series methods unless the outliers are adjusted or removed (e.g., by an intervention analysis) or robust methods are used that automatically downweight extreme observations (e.g., a Kalman filter). Long-term changes are more difficult to deal with, because they can affect all subsequent observations or change the dynamics of the time series. Such features cannot be captured by conventional linear models with constant parameters.
In recent years there has been considerable interest in non-linear modelling of economic time series. Examples of these models are threshold, smooth transition autoregressive, Markov-switching models and neural networks.
Multi-regime forecasting models, which allow for a smooth transition from one linear regime to another, were proposed by Bacon and Watts [6]. Threshold autoregressive (TAR) models were introduced by Tong [7] and extensively discussed in [8][9][10]. TAR models form one class of nonlinear time series models. A basic feature of these models is that they allow for some sort of regime switching, and they have been applied to describe the different dynamic behaviour of time series.
There are various forms of TAR models. Basically, they are linear autoregressive models in which the linear relationship varies over regimes depending on the threshold values. If the regime is determined by past values of the time series itself, the model is described as self-exciting. In this paper we explore this class of non-linear models, the self-exciting threshold autoregressive (SETAR) models. SETAR models are sufficiently flexible to allow different relationships to apply over separate regimes. These models are a good tool for modelling time series with unstable means and variances and a complicated structure.
In this paper an adapted SETAR model is proposed for real time series with a difficult structure, for which standard linear models do not give the expected results. We suggest testing such time series for non-linearity and, if it is confirmed, using the adapted SETAR model to model them.
The paper is organized as follows. In Section 2 we introduce the SETAR model. In Section 3 we provide details on the model specification and the parameter estimation procedure. Out-of-sample forecasting is described in Section 4. The time series used for modelling are reviewed in Section 5. That section also contains the empirical results of the non-linear modelling of Lithuanian macroeconomic indicators and the out-of-sample forecasting results. Finally, Section 6 contains some concluding remarks and propositions.

Self-exciting threshold autoregressive model
Suppose that a univariate series $\{y_t\} = \{y_t,\ t = 1, 2, \ldots\}$ follows the two-regime self-exciting threshold autoregressive model SETAR$(2; p_1, p_2)$:
$$y_t = \Big(\sum_{j=1}^{p_1} \alpha_{1,j}\, y_{t-j} + \varepsilon_{1,t}\Big)\big(1 - I(y_{t-d} > r)\big) + \Big(\sum_{j=1}^{p_2} \alpha_{2,j}\, y_{t-j} + \varepsilon_{2,t}\Big)\, I(y_{t-d} > r), \tag{1}$$
where $I(y_{t-d} > r) = 1$ if $y_{t-d} > r$ and zero otherwise, and $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are sequences of independent and identically distributed random variables. The positive integer $d$ is the delay parameter, and $y_{t-d}$ is the transition variable that governs changes in regime. $r$ is the threshold value. For a given threshold $r$, and depending on the position of $y_{t-d}$ with respect to this threshold, the time series $\{y_t\}$ follows an AR($p_1$) model or an AR($p_2$) model. The model parameters are $\alpha_{i,j}$, for $i = 1, 2$ and $j = 1, \ldots, p_i$, the delay $d$ and the threshold $r$.
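To make the regime mechanism concrete, the following minimal sketch simulates a two-regime SETAR recursion of the kind described above. It assumes no intercept terms and Gaussian innovations with a common standard deviation; the function names `setar_step` and `simulate_setar` and all numeric values are hypothetical and are not taken from the paper.

```python
import random


def setar_step(history, alpha1, alpha2, d, r, eps=0.0):
    """Return y_t given past values (most recent last).

    alpha1 / alpha2 are the AR coefficients of the lower / upper regime
    (assumption: no intercept). The regime is chosen by I(y_{t-d} > r).
    """
    upper = history[-d] > r                      # indicator I(y_{t-d} > r)
    alpha = alpha2 if upper else alpha1
    # AR part: sum_j alpha_j * y_{t-j}
    return sum(a * history[-j] for j, a in enumerate(alpha, start=1)) + eps


def simulate_setar(n, alpha1, alpha2, d, r, sigma=1.0, seed=0):
    """Simulate n observations, starting from zeros (illustrative choice)."""
    rng = random.Random(seed)
    burn = max(len(alpha1), len(alpha2), d)      # enough start-up values
    y = [0.0] * burn
    for _ in range(n):
        y.append(setar_step(y, alpha1, alpha2, d, r, rng.gauss(0.0, sigma)))
    return y[burn:]
```

For example, `simulate_setar(200, [0.6], [-0.4], d=1, r=0.0)` produces a series whose autoregressive coefficient switches sign whenever the previous value crosses zero.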

Adapted SETAR model specification and parameter estimation procedure
The class of threshold autoregressive (TAR) models has not been widely used in applications, because the main problems in the analysis of SETAR models were selecting the correct order of the model and the complicated identification of the threshold value and delay parameter. Some authors have proposed different ways to avoid these problems. Currently the Akaike information criterion (AIC) is usually used in practical research. For a two-regime SETAR model, the AIC is defined [9] as the sum of the AICs for the AR models in the two regimes. This approach is also used in this article. Other criteria can also be found in the literature: the Bayesian information criterion (BIC) (see [11]) and bootstrap criteria (see [12]). In this work an adapted SETAR model was used, which can assess the significance of unobserved time series components. The proposed adapted SETAR model can adequately capture non-linear features and eliminate the impact of "interfering" components.
An important task in adapted SETAR modelling is choosing an appropriate model from a large set of candidate models. The proposed algorithm for model selection is presented in Fig. 1. For simplicity of presentation, but without loss of generality, the details of the proposed algorithm are derived for a two-regime SETAR model in this section. The methodology used is as follows.
Steps 1-4. Before a series can be considered appropriate for modelling, several prior corrections or adjustments may be needed. Most real time series are affected by sudden unexpected changes, by structural variations that can only be observed over very long time periods, by fluctuations within the year that repeat themselves on a more or less regular basis from one year to the next, or by other effects that cannot be explained by the most commonly used time series models. First of all, we propose to separate the "interfering" time series components of the real time series $\{Y_t\} = \{Y_t,\ t = 1, 2, \ldots, N\}$:
$$Y_t = \omega_t \beta + C_t \eta + \sum_{j=1}^{k} \phi_j \lambda_j I_t(t_j) + \sum_{i} U_{t,i} + y_t.$$
Here $\omega_t = (\omega_{t,1}, \ldots, \omega_{t,n})$ denotes $n$ regression or intervention variables, $\beta = (\beta_1, \ldots, \beta_n)$ is a vector of regression coefficients, $C_t$ denotes the matrix whose columns are the calendar effect variables (trading day, Easter effect, leap year effect, holidays), and $\eta$ is a vector of associated coefficients. $I_t(t_j)$ is an indicator variable for the possible presence of an outlier at period $t_j$, $\lambda_j$ captures the transmission of the $j$th outlier effect and $\phi_j$ denotes the coefficient of the outlier in the multiple regression model with $k$ outliers. $\{U_{t,i}\}$ are the unobserved time series components (seasonal component, trend or cycle), and $\{y_t\}$ follows a SETAR process. Parametric [13] or nonparametric [14] methods can be used for detecting the seasonal, trend or cycle component. A comparative analysis of these methods was carried out using simulated series. It showed that both methods are suitable for detecting unobserved components. In order not to expand the scope of this article, the details of this analysis are not described here.
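As an illustration of the outlier regressors used in this kind of pre-adjustment, the sketch below builds the three commonly used indicator patterns: additive outlier (AO), level shift (LS) and transitory change (TC). The function name `outlier_regressor` is hypothetical, and the decay rate `delta = 0.7` for the transitory change is an assumed conventional value, not one stated in this paper.

```python
def outlier_regressor(n, t0, kind, delta=0.7):
    """Regression variable of length n for an outlier at index t0 (0-based).

    AO: a single spike at t0.
    LS: a permanent step starting at t0.
    TC: a spike at t0 that decays geometrically at rate delta (assumed value).
    """
    if kind == "AO":
        return [1.0 if t == t0 else 0.0 for t in range(n)]
    if kind == "LS":
        return [1.0 if t >= t0 else 0.0 for t in range(n)]
    if kind == "TC":
        return [delta ** (t - t0) if t >= t0 else 0.0 for t in range(n)]
    raise ValueError("kind must be 'AO', 'LS' or 'TC'")
```

Each such column would enter the pre-adjustment regression with its own coefficient, so that the estimated outlier effect can be subtracted before the SETAR model is fitted.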
Finally, before SETAR modelling, we suggest testing $\{Y_t\}$ for non-linearity and, if it is confirmed, checking for regime-switching nonlinearity.
Ramsey's regression equation specification error test (RESET) can be chosen for nonlinearity detection. The test is devised for a general form of misspecification. It is carried out by estimating the following model:
$$y = x\varphi + \varphi_1 \hat{y}^2 + \ldots + \varphi_k \hat{y}^{k+1} + \varepsilon,$$
where $x$ denotes the explanatory variables (autoregressive variables in our case), $\hat{y}$ denotes the fitted values of the linear model, and $\varphi, \varphi_1, \ldots, \varphi_k$ are parameters. The null hypothesis that $\varphi_1, \ldots, \varphi_k$ are all zero is then tested by means of an F-test. If the null hypothesis that all coefficients of the non-linear terms are zero is rejected, then the model is misspecified. RESET is a popular test for identifying a general form of nonlinearity, but it cannot answer the question of whether this nonlinearity is of threshold type. Other tests must be chosen for testing threshold nonlinearity. The class SETAR(1) is the class of linear autoregressions. Thus testing for linearity (within the SETAR class of models) is a test of the null hypothesis of SETAR(1) against the alternative of SETAR(m) for some m > 1. Testing linearity against the alternative of a SETAR model is discussed in [15] and [16]. A solution is to use the estimates of the SETAR model. An F-statistic based on the sums of squared residuals was proposed for testing the restrictions of the null hypothesis; here $S_j$ is the sum of squared residuals of the SETAR(j) model (under the null hypothesis of linearity) and $S_k$ is that of the SETAR(k) model. For more details see [16]. Similarly, the null hypothesis of a SETAR(2) model versus the alternative of SETAR(m) for some m > 2 can be tested in order to identify the right form of non-linearity. Scatterplots are also an informative tool for identifying nonlinearity and the number of regimes.
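The RESET idea for an autoregression can be sketched in a few lines. The code below is an illustration under stated assumptions, not the procedure used in the paper: the linear AR(p) model includes an intercept, the auxiliary regression adds powers 2 to k+1 of the fitted values, and only the F statistic is returned (a p-value would need the F distribution). The helper names `ols` and `reset_f` are hypothetical.

```python
def ols(X, y):
    """Least squares via the normal equations; returns (beta, sum of squared residuals).

    Uses Gauss-Jordan elimination with partial pivoting; a singular design
    matrix raises ZeroDivisionError.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    return beta, sum(e * e for e in resid)


def reset_f(y, p=1, k=2):
    """F statistic of a RESET-type test for an AR(p) model with intercept."""
    X = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(p, len(y))]
    target = y[p:]
    beta, ssr0 = ols(X, target)                       # restricted (linear) model
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    # Auxiliary regression: add fitted^2, ..., fitted^(k+1)
    Xa = [row + [f ** (m + 1) for m in range(1, k + 1)] for row, f in zip(X, fitted)]
    _, ssr1 = ols(Xa, target)                         # unrestricted model
    n, q = len(target), len(X[0])
    return ((ssr0 - ssr1) / k) / (ssr1 / (n - q - k))
```

A large value of the statistic relative to the F distribution with k and n - q - k degrees of freedom would point to a general form of misspecification, without indicating whether it is of threshold type.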
Steps 5-6. The user must fix the maximum model orders $(p_1, p_2)$ and the maximum delay $d$. They cannot be larger than $N - 1$, where $N$ is the length of the modelled time series, and must be such that enough observations remain for modelling. Moreover, the order and delay parameters can take only integer values. We recommend paying attention to the length of the time series before fixing the delay parameter: a fairly small $d$ is appropriate for time series that are not sufficiently long.
Steps 7-8. The threshold value $r$ must be selected. The set of allowable threshold values should be such that each regime contains enough observations for the estimator defined above to produce reliable estimates of the autoregressive parameters. A popular choice is to require that each regime contains at least a fraction $\pi$ of the observations, that is,
$$y_{([\pi N])} \le r \le y_{([(1-\pi) N])},$$
where $y_{(i)}$ denotes the $i$th smallest value of the threshold variable and $[\,\cdot\,]$ denotes the integer part. A safe choice for this fraction appears to be 0.15 [17].
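A minimal sketch of this trimming rule is given below. It takes the observed values of the threshold variable as the candidate thresholds, which is a common practical choice and an assumption here; the function name `threshold_candidates` is hypothetical.

```python
def threshold_candidates(z, pi=0.15):
    """Candidate thresholds that leave at least a fraction pi of the
    observations of the threshold variable z in each regime."""
    s = sorted(z)
    n = len(s)
    lo = max(int(pi * n), 1)          # integer part [pi * n], at least 1
    hi = int((1 - pi) * n)            # integer part [(1 - pi) * n]
    # Order statistics y_(lo), ..., y_(hi) in 1-based notation
    return s[lo - 1:hi]
```

With 20 distinct observations and `pi=0.25`, for instance, the candidates are the 5th to the 15th smallest values, so each regime keeps at least five observations.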
Steps 9-17. These steps locate the threshold value $r$ and the delay parameter $d$ among the candidate values fixed in the previous steps. First the threshold value $r$ varies over the set of possible values while the delay parameter $d$ remains fixed; then, vice versa, $r$ is fixed and $d$ varies. The parameters are identified by calculating the Akaike information criterion (AIC), which is also used for model selection. For a two-regime SETAR model, the AIC is defined [9] as the sum of the AICs for the AR models in the two regimes, each computed from $\hat{\sigma}_j^2$, $j = 1, 2$, the variance of the residuals in the $j$th regime. The AIC must attain its minimum value for the selected $r$ and $d$.
Steps 18-19. Once the threshold value and delay parameter are fixed, the SETAR model parameters can be estimated by using standard regression methods, for example the ordinary least squares (OLS) method.
The assumption that the means and variances of the variables are constant over time within each regime must be made (otherwise the OLS method may give misleading inferences). Therefore the unit root hypothesis must be tested within regimes before the model parameters are estimated. It is not necessary to test for unit roots in $Y_t$ and $y_t$, because a unit root can be mistakenly identified in the presence of threshold-determined regime switching (see [18]).
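The search over the threshold and delay can be sketched as follows. This is a simplified illustration, not the algorithm of Fig. 1: it uses a joint grid search over $(d, r)$ instead of the alternating scheme, fixes $p_1 = p_2 = 1$ without intercepts so that OLS has a closed form, and takes the AIC of each regime as $n_j \ln \hat{\sigma}_j^2 + 2p_j$. All function names are hypothetical.

```python
import math


def fit_regime(pairs):
    """OLS for y_t = a * y_{t-1} + e_t on (lag, value) pairs.

    Returns (a, number of observations, residual variance).
    """
    sxx = sum(x * x for x, _ in pairs)
    a = sum(x * y for x, y in pairs) / sxx
    ssr = sum((y - a * x) ** 2 for x, y in pairs)
    return a, len(pairs), ssr / len(pairs)


def setar_aic(y, d, r, start, min_obs):
    """Sum of the regime AICs for given delay d and threshold r, or None."""
    low = [(y[t - 1], y[t]) for t in range(start, len(y)) if y[t - d] <= r]
    high = [(y[t - 1], y[t]) for t in range(start, len(y)) if y[t - d] > r]
    if min(len(low), len(high)) < min_obs:
        return None                      # a regime is too thin to estimate
    aic, coefs = 0.0, []
    for pairs in (low, high):
        a, n, s2 = fit_regime(pairs)
        if s2 <= 0.0:
            return None                  # degenerate perfect fit
        aic += n * math.log(s2) + 2 * 1  # one AR coefficient per regime
        coefs.append(a)
    return aic, coefs


def select_setar(y, d_max=3, pi=0.15):
    """Return (AIC, d, r, [a_low, a_high]) minimising the AIC, or None."""
    n = len(y)
    m = max(int(pi * n), 2)
    grid = sorted(y)[m - 1:int((1 - pi) * n)]   # trimmed threshold candidates
    best = None
    for d in range(1, d_max + 1):
        for r in grid:
            # Common start index so that all delays use the same sample
            res = setar_aic(y, d, r, start=d_max, min_obs=m)
            if res is not None and (best is None or res[0] < best[0]):
                best = (res[0], d, r, res[1])
    return best
```

Using the same effective sample for every delay keeps the AIC values comparable across candidate models.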

Out-of-sample forecasting
Computing forecasts from nonlinear models is considerably more complicated than from linear models. Nevertheless, there are several approaches to out-of-sample forecasting with the nonlinear SETAR model: one-step-ahead, multi-step-ahead, the normal forecast error method, the Monte Carlo method, a special case of the Monte Carlo method called the skeleton method, the bootstrap method and others. A comprehensive presentation of these methods and their forecasting results would require a lot of space, so we briefly outline only two forecasting methods: the one-step-ahead and multi-step-ahead methods.
The one-step-ahead method uses only actual data for forecasting. This means that we do not have to re-estimate the model every time new data are added to the sample. Denote by $\hat{y}_{t+1|t}$ the optimal one-step-ahead forecast:
$$\hat{y}_{t+1|t} = E[y_{t+1} \mid \Omega_t] = F(\{y_t\}; \alpha).$$
Here $F(\{y_t\}; \alpha)$ is a nonlinear function which follows the SETAR process (1), and $\Omega_t$ is the history of the time series up to the observation at time $t$.
Computing forecasts more than one period ahead raises some problems, because the linear conditional expectation operator $E$ cannot be interchanged with the nonlinear operator $F$ (for more details see [17]).
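The difference between one-step and multi-step forecasting can be illustrated with a short sketch. It assumes a two-regime SETAR model without intercepts and Gaussian shocks, and it shows two of the approaches listed above: a recursion with future shocks set to zero (a skeleton-type forecast) and a Monte Carlo average over simulated paths. The function names are hypothetical, and the paper's own multi-step formula may differ.

```python
import random


def setar_mean(hist, alpha1, alpha2, d, r):
    """Conditional mean F of the next value given the history (latest last)."""
    alpha = alpha2 if hist[-d] > r else alpha1
    return sum(a * hist[-j] for j, a in enumerate(alpha, start=1))


def one_step_forecast(hist, alpha1, alpha2, d, r):
    """Forecast one period ahead: simply F evaluated at observed data."""
    return setar_mean(hist, alpha1, alpha2, d, r)


def skeleton_forecast(hist, h, alpha1, alpha2, d, r):
    """Iterate F h times with future shocks set to zero."""
    path = list(hist)
    for _ in range(h):
        path.append(setar_mean(path, alpha1, alpha2, d, r))
    return path[-1]


def monte_carlo_forecast(hist, h, alpha1, alpha2, d, r, sigma,
                         n_paths=2000, seed=0):
    """Average F over simulated paths; shocks enter the first h-1 steps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        path = list(hist)
        for step in range(h):
            mean = setar_mean(path, alpha1, alpha2, d, r)
            shock = rng.gauss(0.0, sigma) if step < h - 1 else 0.0
            path.append(mean + shock)
        total += path[-1]
    return total / n_paths
```

For h = 1 the three functions coincide. For h > 1 the skeleton and Monte Carlo values generally differ, because the average of F over possible intermediate values is not F evaluated at their average.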
The optimal $h$-step-ahead forecast $\hat{y}_{t+h|t}$ can be obtained as described in [18].

... Miscellaneous goods and services. The natural logarithms of the original series were analysed. The data were obtained from the databases of Statistics Lithuania. Permanent changes have taken place in Lithuania's economy over the past years. Most Lithuanian time series have outliers, turning points, structural changes and breaks, and significant seasonality. Accordingly, the time series of statistical indicators are complicated both in nature and in their evaluation methods. Another problem with Lithuanian time series is that, unfortunately, they are not sufficiently long.
Because of these problems, and as shown in the model selection algorithm, prior treatment of the time series is proposed before SETAR modelling. Omitting the pre-adjustment can lead to model misspecification and biased parameter estimates. Important pre-adjustments are outlier correction and the removal of calendar effects.
Most Lithuanian HCPI time series have significant outliers. All types of outliers (additive outliers, transitory changes, level shifts) were corrected by using specific regression variables (see [14]). The Easter effect was significant only for the index of Furnishings, household equipment and routine maintenance of the house. The working-day effect was not significant for all HCPI time series. All HCPI series had a significant seasonal component and, following the proposed adapted SETAR model algorithm, the time series were detrended and deseasonalized by using a parametric method (see [13]) in Step 3.

Empirical model selection results
In order to detect nonlinearities in the HCPI, we performed the RESET test, which is designed for the null hypothesis of linearity. It tests whether non-linear combinations of the estimated values help to explain the endogenous variable; if non-linear combinations of the explanatory variables have any power in explaining the endogenous variable, then the model is misspecified. Table 1 reports the results of this test: the value of the RESET statistic and its p-value, the powers (h) of the variables included in the test and the lag order under the null hypothesis of linearity. The RESET test was computed in its modified form, which requires that all the initial regressors enter the auxiliary regression linearly and up to a certain power h (for details, see [19]). Only the results with the most significant RESET statistic values are shown in Table 1.
As mentioned above, the RESET test is devised for a general form of misspecification and cannot detect threshold nonlinearity specifically. For this purpose the hypothesis of a linear model versus SETAR(2) was tested. The values of the F-statistic for the test of a linear model versus SETAR(2) are also presented in Table 1.
In order to compare the non-linear adapted SETAR model with a linear model, we chose the ARIMA model to fit the data.
The Box-Jenkins approach was used for building the ARIMA models. ARIMA model identification was based on the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Models were selected by using the Akaike information criterion. The AIC was chosen for comparability and because our sample is quite small: under unstable conditions such as a small sample and large noise levels, the Akaike information criterion outperforms the Bayesian information criterion (BIC) (see Acquah H., 2010). The residuals of the estimated models were tested for normality, the Ljung-Box and Box-Pierce tests were carried out, and the ACF and PACF were also checked.
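The two portmanteau statistics mentioned above can be computed directly from the residual autocorrelations. The sketch below returns only the statistics; comparing them with a chi-square distribution (with degrees of freedom reduced by the number of estimated ARMA parameters) is left out. The function names are hypothetical.

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    ck = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n))
    return ck / c0


def box_pierce(x, h):
    """Box-Pierce statistic: n * sum of squared autocorrelations up to lag h."""
    n = len(x)
    return n * sum(autocorr(x, k) ** 2 for k in range(1, h + 1))


def ljung_box(x, h):
    """Ljung-Box statistic: n (n + 2) * sum_k rho_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, h + 1))
```

The Ljung-Box version reweights each lag by 1 / (n - k), which matters most for short series such as the ones analysed here.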
The nonlinearity of the analysed time series was tested and the parameters of the SETAR and ARIMA models were estimated using R.
For model comparison we use the mean absolute percentage error (MAPE) and the root mean square prediction error (RMSPE):
$$\mathrm{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right|, \qquad \mathrm{RMSPE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2},$$
where $N$ is the number of observations and $\hat{y}_t$ is the estimated value. The results are presented in Table 2. According to these results, the preferred adapted SETAR model fits the data better than the preferred ARIMA model for most of the Lithuanian HCPI time series. The adapted SETAR model achieved better results even for some series for which nonlinearity was not confirmed by the RESET test. This is probably because the RESET test is devised for a generic form of misspecification. For a detailed analysis of SETAR-type nonlinearities other tests must be chosen.
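Both accuracy measures are straightforward to compute. The following sketch assumes the usual definitions, with MAPE expressed in percent and RMSPE as the square root of the mean squared error; the function names `mape` and `rmspe` are hypothetical.

```python
import math


def mape(actual, fitted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, fitted))


def rmspe(actual, fitted):
    """Root mean square prediction error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / n)
```

For actual values 100 and 200 with fitted values 110 and 190, the MAPE is 7.5 percent and the RMSPE is 10.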
In order not to extend the scope of this article, we show graphs of the estimated models for only two HCPI groups: Recreation and culture, and Miscellaneous goods and services. Figures 2 and 3 show the estimated adapted SETAR and ARIMA models for every observation within the sample. The adapted SETAR model describes the real data more accurately than the ARIMA model, in the sense that the deviations of the estimated values are smaller.
Graphs of other HCPI groups are presented in Appendix C.
Furthermore, analysis of the errors of the SETAR and ARIMA models showed that the SETAR model is relatively more stable than the ARIMA model for real time series with complex behaviour. The range of the SETAR model errors is significantly smaller than the range of the ARIMA model errors. Graphs of the errors are presented in Appendix D.

Out-of-sample forecasting results
As mentioned above, the one-step-ahead and multi-step-ahead methods were used for out-of-sample forecasting. Data were forecast 12 periods (one year) ahead. The results were compared with the actual monthly HCPI for 2010. ARIMA model forecasts were also computed for comparison. The MAPE and RMSPE of the out-of-sample forecasts are presented in Table 3. The lowest values of MAPE and RMSPE are in bold.
The results in Table 3 show that there are no strong differences between the errors of the out-of-sample forecasting methods, but ARIMA gave better forecasts for only one time series, Clothing and footwear. For the Alcoholic beverages, tobacco time series the MAPE of the forecasts is lower with ARIMA, but the RMSPE is lower with the SETAR multi-step-ahead method. The greatest difference is seen in the Miscellaneous goods and services time series: the errors obtained with the SETAR multi-step-ahead method are more than five times lower than with the ARIMA method. However, there is a clear dependence between the lowest MAPE (or RMSPE) and the F-statistic of the test of a linear model versus SETAR(2). Linear models show the best results for the time series with the lowest F-statistic values, the SETAR multi-step-ahead method is preferable for the time series with the highest F-statistic values, and the SETAR one-step-ahead method is useful for the remaining time series (with middle F-statistic values).

In this paper an adapted SETAR model was proposed for time series, which can assess the significance of unobserved time series components and capture the nonlinearity of the time series simultaneously. Furthermore, an algorithm and selection procedure for adapted SETAR modelling has been presented.
Considering the nonlinearities of the Lithuanian HCPI time series, the adapted SETAR model was proposed for modelling these time series. The estimated results were compared with a standard ARIMA model. The adapted SETAR model has a good in-sample and out-of-sample fit compared to linear models and gives more accurate modelling results for most of the analysed time series. A practical example shows that the proposed algorithm allows a relatively accurate description of time series with a difficult structure. The proposed model is appropriate for modelling real time series with complex behaviour.

Appendix A. Models specification
Appendix C. Evaluated adapted SETAR and ARIMA models
www.mii.lt/NA