Prediction of composite indicators using combined method of extreme learning machine and locally weighted regression

Abstract. In this paper, a method of artificial neural networks (NN) is proposed as an alternative tool for the one-step-ahead prediction of composite indicators (CIs) of Lithuania’s economy. The CI is composed of widely used social and economic indicators. The NN is applied for forecasting the CI during the financial crisis and later periods (2008–2010) on the basis of data of earlier years (1998–2007). In this work, the Extreme Learning Machine (ELM) algorithm is combined with locally weighted regression. The analysis shows that the prediction error of the testing sample is statistically smaller than that of the Levenberg–Marquardt or ELM methods.


Introduction
Recently, more and more artificial indicators that reflect various aspects of a country's economic development have been established in global academic and research fields. Composite indicators (CIs), which compare country performance, are a useful tool in setting policy priorities. Together with the widely used statistical indicators such as gross domestic product (GDP), price indices, the unemployment rate, and others, various artificial indicators are presented, e.g. the Global Competitiveness Index (World Economic Forum), the Index of Economic Freedom (Heritage Foundation and Wall Street Journal), the Summary Innovation Index (European Commission), and others. The use of CIs around the world is growing year after year [1]. This may be influenced by the growing number of studies supporting the proposition that GDP does not always reflect the real development of a country, quality of life, development of technologies, and other aspects [2]. Also, when economic conditions are unstable, there is a need to evaluate and analyze additional indexes, as separate statistical indicators do not always show the real economic situation. A comprehensive analysis of artificial indices and statistical data may give a more general view of the current situation.
© Vilnius University, 2012
This research is a further analysis of the construction of the CIs of Lithuania's economic development [3]. In the previous paper, a CI which shows the trends in Lithuania's economy was constructed. Different socioeconomic indicators, which are widely used and notably correlate with GDP, were used for the composition of the CI. The dynamics of the CI was compared to the trends in the main economic indicator, GDP. The analysis showed that the CI might be used as an additional tool for the investigation of the country's economic development. The impact of weights on the final results of the CI was also studied.
Short-term forecasting of macroeconomic time series is one of the main stages of the analysis of economic trends in the country. Hence, the forecast of such an index (CI) allows us to make an additional economic analysis and to detect possible changes in GDP trends earlier.
Several institutions publish official forecasts of various macroeconomic indicators in Lithuania. For example, at Statistics Lithuania, traditional econometric models are used for the short-term forecasting of Lithuania's GDP. These methods are based on regression analysis, where additional economic indicators (regressors) are used for the prediction of the value added of a particular activity [4]. The Bank of Lithuania regularly carries out research related to macroeconomic analysis and forecasts using the structured macroeconomic model. The model-based projections are then updated taking into account expert judgement with respect to structural changes and the latest available information about forthcoming economic shocks [5]. The recently proposed methods for the forecasting of GDP are bridge models (which use monthly data) and dynamic factor models [6]. The authors of [6] noted that the models that are appropriate for the euro area countries are not always suitable for the new Member States. For Lithuania, for instance, the results obtained in [6] are difficult to interpret.
It should be noted that researchers often model economic indicators using linear models [7], which are based on parametric techniques and rather strict conditions on the data distribution. If the conditions are fulfilled, the respective model may be used; otherwise, we cannot be sure of the quality of the statistical results obtained. Standard time series models can be applied to a process if it can be expressed by means of stationary sequences [7]. This assumption is likely to be violated during crisis periods.
In recent decades, artificial neural networks (NN) have become one of the most powerful tools for the analysis, simulation and prediction of trends in large systems in various fields [8]. The NN theory is based on Takens' theorem [9], which claims that it is possible to rebuild the dynamics of the process using correct time lags. The main difference, compared to the methods discussed above, is that the NN models are nonlinear models which are, by definition, more powerful since they give more possibilities in the choice of the input-output relation [10]. The NN have the universal approximation property: under mild conditions on the data, they can fit any data set with arbitrarily high precision, provided that there is a sufficient number of parameters in the model. However, when there are too many parameters (compared to the number of data points available), overfitting appears [10].
Researchers often choose the NN as an alternative method for the analysis and forecasting of the main economic indicators such as GDP or inflation. In this paper, we refer only to the authors who obtained significant results using the NN in economic fields.
Tkacz (Bank of Canada) proposed an NN which enables forecasting the growth of Canadian GDP. The analysis showed that the NN yield statistically lower forecast errors for the year-over-year growth rate of real GDP relative to linear and univariate models [11]. McNelis and McAdams (European Central Bank) investigated forecasts of inflation using the NN for the USA, Japan and the euro area. The authors showed that, in particular cases, the forecast errors are significantly smaller in comparison with the linear models [12].
Hence, in this paper, the NN methods were chosen for the prediction of CIs on the basis of the theoretical framework and the statistically significant results of foreign studies that forecast different macroeconomic indicators using this approach. The first study on the prediction of the growth of Lithuania's economy using the NN was carried out by Jakaitienė and Tamošiūnaitė [13]. They analyzed quarterly data for 1996-2002. The authors suggested using the NN as an additional tool for the forecasting of GDP growth. Still, they noted that higher prediction accuracy may be obtained with a larger set of observations (longer time series).
In practice, there are no comprehensive studies on the prediction (using the NN) of trends in Lithuania's economy after the financial-economic crisis. This can be explained by the fact that most of the official socioeconomic macro data are short time series starting from 1998 [4]. For example, if the data are of quarterly periodicity and the period 1998-2010 is analyzed, there are only 52 observations. This may complicate the implementation of the NN architecture, which usually requires large data sets for fitting.
On the other hand, there are published papers where the NN are used for the analysis of short time series. Zhang and Kline constructed NN and analyzed the characteristics of various economic time series of quarterly periodicity, where the size of the smallest samples varied from 16 to 28 observations [14].
The objective of the paper is to analyze the one-step-ahead predictions of the CI of Lithuania's economy using NN methods. The essential questions highlighted are: how the data transformation applied at the initial stage and the NN architecture (the number of hidden neurons) affect the accuracy of predictions; how the errors depend on the method of training and prediction; and whether the proposed combined NN method increases the CI forecasting accuracy. The novelty of the research is the proposed NN method: we combine the ELM algorithm (for the training of the NN) with locally weighted regression (for prediction). To our knowledge, there are no comprehensive studies analyzing the prediction accuracy of macroeconomic indicators using this combined algorithm. In this paper we also discuss the small-sample problem, as NN are usually applied to large samples.
The structure of the paper is as follows. The methodology of the construction of the CI is described in Section 2. Section 3 presents the methods chosen for the training and testing of the NN. Section 4 describes the implementation process and the characteristics of the NN models. A practical case concerning the prediction of the CI using the NN is described in Section 5. Section 6 gives concluding remarks.

Construction of the CI
In this section, the process of construction of the CI is described: the selection of preliminary data, data disaggregation, the selection of weights, and the aggregation of the CI.
We have defined the CI as an additional tool for the country's economic analysis. The CI is defined as follows:

$$CI(t) = \sum_{i=1}^{m} \phi_i X_i(t), \qquad (1)$$

where $t = 1, \dots, T$ is time, $X_i(t)$ is the $i$th standardized indicator, $\phi_i$ is its weight, and $m$ is the number of indicators.

In this research we have used the methodology of the construction of the CI of Lithuania's economy proposed in [3]. We briefly describe the main features of the methodology in order to present the theoretical background. In this paper, we have extended the period of analysis (1998-2010) and constructed the artificial index with a higher frequency, that is, monthly periodicity (the previous one was of quarterly periodicity and covered the period 1998-2009).
We have selected the data set X for the construction of the CI of monthly periodicity.Statistical indicators (the same variables as in the mentioned paper, m = 28) are from the following fields/subfields: population and social statistics (2 indicators), industry (4 ind.), construction (4 ind.), domestic trade (6 ind.), foreign trade (4 ind.), services (3 ind.), price indexes (5 ind.).
The process of the construction of the CI can be described in the following steps.

Data disaggregation
Some statistical indicators $X_i$, such as exports or industrial production, are officially published on a monthly and quarterly basis. But we also use time series (construction, services data) for which no monthly data are published (they are available only on a quarterly basis). Here we have to solve the problem of data disaggregation from low frequency (LF, quarterly periodicity) to higher frequency (HF, monthly periodicity). Two methods of time series disaggregation from LF to HF were chosen:
(i) The method of Barbone, Bodo and Visco [15] was used when there was a variable which strongly correlated with the series analyzed. The residuals of the regression model are described using an autoregressive model AR(1), and the parameters are estimated by minimizing the sum of squared residuals.
(ii) The Denton method [16] was used when there was no variable with similar trends and structure. It disaggregates the time series using only its own characteristics.

Preliminary analysis of the data
Economic time series are often affected by seasonal fluctuations, which may be observed each year at roughly the same time (in our case, in the same month). Such statistical data may give misleading information on the real dynamics of the time series. Therefore, seasonal adjustment (the TRAMO/SEATS method [17]) was applied to all time series.
The data were corrected for outliers (additive outliers, level shifts and transitory changes): the number of outliers in every time series should not exceed 5 per cent. Then all selected indicators $X_i$ were standardized (mean $\mu_i = 0$ and variance $\sigma_i^2 = 1$). These standardized indicators are the ones used in (1).

Selection of weights and aggregation
In the previous work, the indicators' weights $\phi$ were evaluated using factor analysis methods [18]. We chose the same set of weights $\phi$, which had been evaluated using the Varimax factor rotation method [19] and improved using the Nikoletti method [20]. In general, this method gives the highest weights to those indicators that strongly correlate with the corresponding factor. It is assumed that the chosen weights $\phi$ in (1) do not depend on the time $t$.
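The standardization and aggregation steps can be illustrated with a minimal sketch. It assumes that the CI is a weighted sum of the standardized indicators with time-invariant weights; the function names `standardize` and `composite_indicator` are hypothetical, the population variance is assumed in the standardization, and the equal weights in the toy example are made up rather than taken from the factor analysis of [3].

```python
import statistics


def standardize(series):
    """Rescale a series to zero mean and unit variance (population variance assumed)."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(x - mu) / sd for x in series]


def composite_indicator(indicators, weights):
    """Aggregate standardized indicators into CI(t) = sum_i phi_i * X_i(t).

    indicators: list of m series, each of length T (hypothetical input layout)
    weights:    list of m time-invariant weights phi_i
    """
    standardized = [standardize(s) for s in indicators]
    n_periods = len(standardized[0])
    return [sum(w * x[t] for w, x in zip(weights, standardized))
            for t in range(n_periods)]


# Toy data: two indicators observed over three periods, equal (made-up) weights.
ci = composite_indicator([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]], [0.5, 0.5])
```

In the toy example both indicators have the same standardized shape, so the resulting CI coincides with that shape and is zero in the middle period.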

Methods of training and testing of the NN
In this section, we describe the methods that are used for training and testing the NN and for the one-step-ahead prediction of the CI: Levenberg-Marquardt, Extreme Learning Machine (ELM) and the proposed combination of ELM and locally weighted regression.
Let us describe the approximation problem for single-hidden-layer feedforward NN (SLFNs). For $T$ arbitrary distinct samples $(z_t, y_t)$, $t = 1, \dots, T$, where $z_t = [z_{t1}, \dots, z_{tn}] \in R^n$ and $y_t = [y_{t1}, \dots, y_{tk}] \in R^k$, standard SLFNs with $N$ hidden neurons and activation function $g(\cdot)$ are mathematically modeled as

$$\sum_{\ell=1}^{N} \alpha_\ell\, g(w_\ell z_t' + b_\ell) = y_t, \quad t = 1, \dots, T, \qquad (2)$$

where $w_\ell = [w_{\ell 1}, \dots, w_{\ell n}]$ is the vector of weights connecting the $\ell$th hidden neuron and the input neurons, $\alpha_\ell = [\alpha_{\ell 1}, \dots, \alpha_{\ell k}]$ is the vector of weights connecting the $\ell$th hidden neuron and the output neurons, and $b_\ell$ is the threshold of the $\ell$th hidden neuron.
$T$ is the number of training samples and $g(\cdot): R \to R$ is a basic activation function. The symbol $'$ denotes matrix transposition.

Classical training method for backpropagation NN
For the implementation of the SLFNs, standard methods were chosen. For the training of the SLFNs we chose the Levenberg-Marquardt method [21]. It is a gradient-based method and has some disadvantages, e.g. the procedure is time consuming and may converge to a local minimum [22]. The initial weights and biases of the layer were calculated using the Nguyen-Widrow initialization method [23]. The training process was restricted to 300 epochs.

Extreme learning machine
ELM is a progressive learning algorithm for SLFNs which randomly chooses the input weights and analytically determines the output weights of the SLFNs [22,24]. Unlike traditional learning algorithms, this algorithm tends to reach not only the smallest training error but also the smallest norm of weights (among all the least-squares solutions). Bartlett's theory [25] on the generalization performance of feedforward NN states that "the size of the weights is more important than the size of the network". In theory, this algorithm tends to provide the best generalization performance and the minimum training error at an extremely fast learning speed.
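The idea of ELM training can be illustrated with a short, hedged sketch in plain Python. It is not the implementation used in the experiments: it assumes a single output ($k = 1$), draws the hidden-layer weights uniformly from $[-1, 1]$, and replaces the Moore-Penrose pseudoinverse by normal equations with a very small ridge term, which only approximates the minimum-norm least-squares solution. All function names and the `ridge` and `seed` parameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic activation g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_layer(Z, W, b):
    """Return the T x N matrix H with H[t][l] = g(w_l . z_t + b_l)."""
    return [[sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + bl)
             for w, bl in zip(W, b)] for z in Z]


def solve(A, c):
    """Solve the square system A x = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def elm_train(Z, y, n_hidden, ridge=1e-8, seed=0):
    """Random hidden layer, output weights by (ridge-stabilized) least squares."""
    rng = random.Random(seed)
    n_in = len(Z[0])
    # Hidden-layer weights and thresholds are drawn at random and never trained.
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = hidden_layer(Z, W, b)
    # Normal equations (H'H + ridge * I) alpha = H'y; the ridge term is an assumption.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    c = [sum(h[i] * yt for h, yt in zip(H, y)) for i in range(n_hidden)]
    return W, b, solve(A, c)


def elm_predict(Z, W, b, alpha):
    """Network output sum_l alpha_l * g(w_l . z + b_l) for each row z of Z."""
    return [sum(a * h for a, h in zip(alpha, row)) for row in hidden_layer(Z, W, b)]
```

Only the output weights are estimated, and they are obtained from a single linear solve rather than by iterative gradient steps, which is where the speed advantage described above comes from.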

Combined method of ELM and LW regression
The locally weighted (LW) regression is described in [26-29]. In this paper we propose a method that combines the ELM method and LW regression. An NN used together with LW regression is known as a locally weighted NN. The advantage of the locally weighted NN is that closer attention is paid to the regression curve than to the coefficients, i.e. to whether the regression curve accurately replicates the real data. The weakness of the locally weighted NN is the long time needed to calculate the optimal parameters. However, this is compensated for by the accuracy of the method compared with standard NN.
The algorithm of the proposed method can be divided into several stages. First, the sample is divided into training and testing samples: $\Omega_1 = \{(z_t, y_t),\ t = 1, \dots, T_1\}$, $T_1 < T$, is the training sample and $\Omega_2 = \{(z_t, y_t),\ t = T_1 + 1, \dots, T\}$ is the testing sample.
Stage 1. The coefficients of the NN are estimated on the training sample $\Omega_1$ using the ELM method.
Stage 2. The estimates $(\hat{w}_\ell, \hat{b}_\ell)$, $\ell = 1, \dots, N$, of the coefficients $(w_\ell, b_\ell)$, $\ell = 1, \dots, N$, of the neurons of the hidden layer (evaluated at the first stage) are kept fixed and, for each new observation (an observation which does not belong to the training sample), only the weights $\alpha_\ell$, $\ell = 1, \dots, N$, of the neurons of the output layer are recalculated. Thus, during the process, only local information is used.
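More precisely, for given query data $z_\tau$, $T_1 + 1 \le \tau \le T$ ($\tau$ is the time of the query), the new estimates $\hat{\alpha}_\ell$ are found using the LW regression (4), where $\sigma$ is the bandwidth and $L < T_1$ is the size of the sliding window.
Stage 3. The new estimates $\hat{\alpha}_\ell$, $\ell = 1, \dots, N$, obtained in the previous stage by solving (4), are used in (5) for the prediction of $y_\tau$:

$$\hat{y}_\tau = \sum_{\ell=1}^{N} \hat{\alpha}_\ell\, g(\hat{w}_\ell z_\tau' + \hat{b}_\ell). \qquad (5)$$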
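The re-estimation of the output weights for a query point can be sketched as a kernel-weighted least-squares problem over the sliding window. The sketch below is an illustration only and rests on assumptions that the text does not confirm: a Gaussian kernel $\exp(-\|z_\tau - z_t\|^2 / (2\sigma^2))$ in the input space, a single output, and a tiny ridge term for numerical stability. The function names are hypothetical, and the matrix `H` stands for the hidden-layer outputs of the $L$ window observations computed with the fixed hidden-layer coefficients.

```python
import math


def solve(A, c):
    """Solve the square system A x = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gaussian_weight(z_query, z, sigma):
    """Kernel weight of observation z relative to the query (assumed Gaussian form)."""
    d2 = sum((a - b) ** 2 for a, b in zip(z_query, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def lw_output_weights(H, y, Z, z_query, sigma=1.0, ridge=1e-8):
    """Kernel-weighted least squares for the output weights.

    H: L x N hidden-layer outputs of the window observations (hidden layer fixed)
    y: L observed targets, Z: L input vectors, z_query: the query input
    """
    k = [gaussian_weight(z_query, z, sigma) for z in Z]
    n = len(H[0])
    A = [[sum(kt * h[i] * h[j] for kt, h in zip(k, H)) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c = [sum(kt * h[i] * yt for kt, h, yt in zip(k, H, y)) for i in range(n)]
    return solve(A, c)


def lw_predict(h_query, alpha):
    """Prediction for the query: sum_l alpha_l * h_l(z_query)."""
    return sum(a * h for a, h in zip(alpha, h_query))
```

In use, the window would hold the $L$ most recent observations before the query time $\tau$, and the weights would be recomputed for every new query, so that observations close to the query in the input space dominate the fit.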

Process of the NN implementation
In this section, we describe the process of the NN implementation and its main characteristics: the transformation of data, the architecture of the NN, and the measures of prediction accuracy.
According to [12], there is no a priori way to determine which scaling function is the most suitable for the NN. This depends on the characteristics of the data. A reasonable strategy is to estimate the model with different types of scaling functions and then to find out which one gives the best performance. Two linear scaling transformations, $f_v$, $v = 1, 2$, are considered.
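A linear scaling transformation of this kind can be sketched as follows. The Conclusions state that the best-performing transformation maps the data linearly to the interval [0, 1]; the second target interval used below, [-1, 1], is only an illustrative assumption, since the definition of the second transformation is not reproduced here. The function name `linear_scale` and its `ref` parameter are hypothetical.

```python
def linear_scale(values, lo, hi, ref=None):
    """Map values linearly so that min(ref) -> lo and max(ref) -> hi.

    If ref is omitted, the minimum and maximum of `values` are used.
    Passing the training sample as ref applies the training-sample scaling
    to new observations, which may then fall outside [lo, hi].
    """
    ref = values if ref is None else ref
    v_min, v_max = min(ref), max(ref)
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


train = [2.0, 4.0, 6.0]
scaled_unit = linear_scale(train, 0.0, 1.0)        # scaling to [0, 1]
scaled_symmetric = linear_scale(train, -1.0, 1.0)  # assumed alternative interval
```

Using the training sample as the reference is one possible design choice: it avoids using information from the testing period when scaling the inputs.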
All models applied to the prediction of the CI have the same NN architecture with the following features. The SLFN model has one hidden layer. The number of nodes $N$ of the hidden layer was chosen as 5, 10, 15 or 20. The number of inputs was $m = 28$. The sigmoid (logistic) function was employed as the basic activation function $g(\cdot)$, $g(x) = 1/(1 + e^{-x})$.
The following methods for training the SLFNs on sample $\Omega_1$ and prediction for sample $\Omega_2$ were used:
$M_1$: The Levenberg-Marquardt method was applied to training; the evaluated characteristics were used for prediction.
$M_2$: The ELM method was applied to training; the evaluated characteristics were used for prediction.
$M_3$: The combined method: the ELM method was used for generating the NN coefficients; the values of predictions for each query $z_\tau$ were evaluated online using the LW regression (4). The size of the sliding window was $L = 35$ and the bandwidth was $\sigma = 1$.
The experiment can be represented by (6), where $\widehat{CI}$ is the predicted CI and $X$ is the selected data set.
The technique of ensemble averaging [8] was applied when a number of different NN share a common input, and their individual outputs are combined to produce the overall output value.
This technique is used when there are K different models of NN which produce a similar error, and in this case it is difficult to choose the best model.The output value of the ensemble (of the tth data point) is obtained by using a simple average of all individual outputs.The mean square error of the ensemble (of the testing sample) is not greater than the error of the separate model of the NN [8].
In our case, the prediction of the ensemble yields a better solution compared to the predictions of separate NN. We chose the size of the ensemble $K = 50$: the experiment was performed 50 times using the different methods ($M \in \{M_1, M_2, M_3\}$) and the different transformations ($v = 1, 2$).
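Ensemble averaging by a simple mean can be sketched in a few lines. The function names are hypothetical and the numbers are toy values, not results of the experiment; the example only illustrates that the mean square error of the averaged prediction does not exceed the average of the members' mean square errors.

```python
def ensemble_average(predictions):
    """Pointwise simple average of K prediction series of equal length."""
    k = len(predictions)
    return [sum(column) / k for column in zip(*predictions)]


def mse(actual, predicted):
    """Mean square error between two series of equal length."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy example with K = 2 members whose errors have opposite signs.
actual = [2.0, 3.0]
members = [[1.0, 2.0], [3.0, 4.0]]
combined = ensemble_average(members)
member_mse = [mse(actual, m) for m in members]
```

Here each member has a mean square error of 1, while the averaged prediction reproduces the actual values exactly because the individual errors cancel.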
The following measures of the NN prediction accuracy were used:
(i) the root mean square error (RMSE);
(ii) the mean absolute percentage error (MAPE);
(iii) the positive predictive value (PPV), in per cent;
(iv) the estimates of the mean $\hat{E}(\varepsilon)$ and variance $\widehat{Var}(\varepsilon)$ of the errors $\varepsilon$.
The accuracy of training was tested using only the RMSE.
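The accuracy measures can be sketched as follows. RMSE, MAPE and the error moments use their standard textbook definitions, with errors taken as actual minus predicted values and the sample variance assumed. The PPV function is only one possible reading, labeled as an assumption: it treats an increase of the series as the "positive" event and reports the share of predicted increases that were actual increases. All function names are hypothetical.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in per cent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def error_mean_var(actual, predicted):
    """Estimated mean and (sample) variance of the errors actual - predicted."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return statistics.fmean(errors), statistics.variance(errors)


def ppv_direction(actual, predicted):
    """Share (per cent) of predicted increases that were actual increases.

    Assumption: the 'positive' event is an increase from one period to the next.
    """
    act_up = [b > a for a, b in zip(actual, actual[1:])]
    pred_up = [b > a for a, b in zip(predicted, predicted[1:])]
    predicted_positive = sum(pred_up)
    true_positive = sum(a and p for a, p in zip(act_up, pred_up))
    if predicted_positive == 0:
        return float("nan")
    return 100.0 * true_positive / predicted_positive
```

Note that the MAPE is undefined when an actual value equals zero, which matters for an index built from standardized indicators.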
Beyond the analysis of the errors, we investigated the phenomenon of the random walk (RW). In the case of an RW, it is possible to use simpler methods than NN models for the forecasting of the series. There are papers showing that the one-step-ahead forecasts $y(t)$ of economic series (obtained using the NN) behave like an RW process (see, e.g. [30]). This means that $y(t+1) = y(t) + \xi(t)$, where $t$ is time and $\{\xi(t)\}$ are uncorrelated and homoscedastic random increments with zero mean. R. Site and J. Site [30] analyzed the Standard & Poor 500 financial time series using the Time Delay NN [31,32] and the Elman recurrent NN [33]. The authors compared RW prediction errors, NN prediction errors, and others. They found that the NN predicts like an RW process for all parameters they tried.
We applied the standard Student's t-test to test whether the predicted process is an RW. The following increments were considered: (i) the increments estimated by applying the NN, $\widehat{CI}(t + 1) - \widehat{CI}(t)$; (ii) the actual increments, $CI(t + 1) - CI(t)$.
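One way such a test could be set up is sketched below. It assumes that the hypothesis tested is that the increments have zero mean, which is one property of the RW model described above; the exact test design of the experiment is not specified here. Since the Python standard library has no Student's t distribution, the two-sided p-value is approximated with the normal distribution, which is reasonable only for long series. All function names are hypothetical.

```python
import math
import statistics


def increments(series):
    """First differences x(t + 1) - x(t)."""
    return [b - a for a, b in zip(series, series[1:])]


def t_statistic_zero_mean(incs):
    """One-sample t statistic for the hypothesis that the mean increment is zero."""
    n = len(incs)
    return statistics.fmean(incs) / (statistics.stdev(incs) / math.sqrt(n))


def two_sided_p_normal(t):
    """Two-sided p-value under a normal approximation (assumption: large sample)."""
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))


# Toy series with a clear upward drift: the zero-mean hypothesis is rejected.
drifting = [0.0, 1.0, 2.1, 3.0, 4.1, 5.0]
t_drift = t_statistic_zero_mean(increments(drifting))
p_drift = two_sided_p_normal(t_drift)
```

A small p-value indicates that the mean increment differs from zero, so the series is not consistent with a driftless RW; a large p-value means only that this property is not rejected.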

Analysis of the results
The periods of training and prediction were 1998-2007 ($T_1 = 120$) and 2008-2010 ($T - T_1 = 36$), respectively. Figure 1 shows that the dynamics of the training data differs completely from the trends in the data during the financial crisis (the period from the second half of 2008 until the start of 2010). The main attention was given to the measures of prediction accuracy for the period 2008-2010. The crucial question is how efficiently the NN can be applied in crisis periods to predict an unexpected decline in the economy and the signs of its recovery. From the large number of results obtained from the different models (6), we present only the most important ones (Tables 1, 2).
A short explanation of the content of the tables is as follows. $M_1$, $M_2$ and $M_3$ denote the method of training and testing; the numbers in brackets refer to the number of neurons of the hidden layer. The RMSE of the training data is marked "_train"; otherwise the accuracy measures are applied to the predicted data. The tables present descriptive statistics (the average, and the maximum and minimum denoted by "_max" and "_min", respectively) of 50 repetitions of each model in (6). RMSE_2008 is the error of the prediction for the year 2008.
When comparing the transformation $f_1$ versus $f_2$, the former has a smaller RMSE for both the training and testing sets. The opposite conclusion holds only for the PPV of the classical model ($M_1(5)$).
The classical method $M_1$ gives the largest RMSE; its MAPE is greater than 20 per cent, while that of $M_3(20)$ is only 3.66 per cent (Table 1). Concerning the methods of training and testing, the combined method of ELM and LW regression ($M_3$) stands out for its smallest RMSE (both of training and prediction) and MAPE values. The PPV of the combined method is similar to that of the ELM ($M_2$). The RMSE of model $M_3(20)$ is 0.053 in the year 2008 and 0.027 in the next year, when the decline in the economy slowed down (Table 1). This can be explained by the fact that no such decline in the economy can be identified in the earlier periods; therefore, it is impossible to train the NN to detect a sudden shift. The RMSE values of the two models $M_3(15)$ and $M_3(20)$ are shown in Fig. 2 (the number of the experiment is $n^*$). It can be seen that only two values of the model $M_3(20)$ (experiments $n^* = 1, 9$) are larger than 0.04. The mean values of the errors $\varepsilon$ for models $M_1(5)$, $M_2(20)$ and $M_3(20)$ are presented in Fig. 3. Table 1 gives the values of the mean estimates. Models $M_1$ and $M_2$ produce a similar mean value of errors, though their estimated variance $\widehat{Var}(\varepsilon)$ is larger than that of model $M_3$ ($\widehat{Var}(\varepsilon) = 0.001$). In summary, the experiment showed that the proposed NN method gives the smallest values of RMSE and MAPE and the largest value of PPV when the hidden layer has 20 nodes and the transformation $f_1$ is applied.
The analysis of the predictions of the model $M_3(20)$ concerning the RW process produced the following results (p-values of the 2-tailed t-test): the sequence of predictions $\widehat{CI}(t)$, $t = 1, \dots, T$, is not an RW process (p-value = 0.007); the actual CI (1998-2010) is not an RW process (p-value = 0.027), but some parts of the index are RW processes; e.g. the process of the first 120 elements is not RW (p-value = 0.000), the process of the first 140 elements is RW (p-value = 0.104), and the process of the last 120 elements is not RW (p-value = 0.022).
The accuracy of the NN predictions was compared with that of standard AR(p) models [7]. The same set of indicators was chosen with the linear scale transformation $f_1$, and AR(p) models were applied to each of them. The values of the CI were obtained by aggregating the series of the AR(p) predictions, and the same measures of accuracy were evaluated (Table 1). The best value of the RMSE of AR(p) was 0.091 when $p = 1$, and the RMSE then increased with the order $p$. Therefore we restricted the further analysis to the cases $p \le 4$. The PPV did not change much and was approximately equal to 64.7 per cent for $p = 1, 2, 3$ and 61.8 per cent for $p = 4$. Increasing the model order $p$ did not improve the respective value of MAPE, which remained around 7 per cent. In all the considered cases the measures of accuracy for AR(p) were worse than the corresponding measures of the proposed method and similar to those of the models $M_1$ or $M_2$ (Table 1).
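The AR benchmark can be illustrated for the simplest case $p = 1$. The sketch below assumes ordinary least-squares estimation of $x_t = c + \varphi x_{t-1}$, which is one common choice; the estimation method actually used for the AR(p) models is not stated here. The function names and the toy series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    sxy = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    phi = sxy / sxx
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_ar1(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]


# Toy series generated exactly by x_t = 1 + 0.5 * x_{t-1}, starting from 0.
toy = [0.0, 1.0, 1.5, 1.75, 1.875]
c_hat, phi_hat = fit_ar1(toy)
next_value = forecast_ar1(toy, c_hat, phi_hat)
```

In the setting described above, such a model would be fitted to each scaled indicator separately, and the one-step-ahead forecasts would then be aggregated into the CI with the same weights.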

Conclusions
In this paper, the problem of forecasting the artificial index CI of Lithuania's economy proposed in [3] during the crisis and the later period 2008-2010 is investigated. The locally weighted NN, where the extreme learning machine (ELM) method is used for training and the locally weighted (LW) regression is used for prediction, is proposed as a possible solution. This type of NN has an advantage since more attention is paid to the linear parametric part of the NN, thereby increasing the efficiency of the method.
The empirical analysis showed that the error of the CI predictions obtained by the proposed method is statistically smaller than that of the Levenberg-Marquardt and ELM methods as well as of standard AR(p) models. It was found that, for the considered data, the best prediction accuracy is obtained with the linear scaling transformation to the interval [0, 1] and the NN with 20 hidden neurons.
Analysis of the results based on various accuracy measures (RMSE, MAPE and PPV, see Section 4) suggests that the proposed method may be used for data of rather small sample size and during periods when the dynamics of time series may have unexpected changes.
For forecasting economic changes and predicting the gross domestic product in periods of economic instability, additional carefully validated synthetic indexes of the economy are necessary.
The essential questions highlighted in the Introduction are as follows:
(i) To analyze the accuracy of predictions using different data transformation methods at the initial stage and different NN architectures (the number of neurons of the hidden layer).
(ii) To analyze errors using different methods of training and prediction of the NN.
(iii) To verify whether the proposed combined NN method increases the CI forecasting accuracy.

Table 1. Measures of prediction accuracy for the linear scale transformation $f_1$.

Table 2. Measures of prediction accuracy for the linear scale transformation $f_2$.