CONFIDENCE INTERVALS IN OFFICIAL STATISTICS: THE CASE OF LITHUANIA
Andrius Čiginas

Abstract. In this paper, we discuss the application of confidence intervals in surveys of official statistics. We note that there are situations where confidence intervals based on the normal distribution, which seem natural at first sight, are not suitable. We demonstrate this with examples taken from Lithuanian statistical surveys. We also discuss an Edgeworth expansion and a bootstrap method as alternatives to the normal approximation.


Introduction
A very common parameter estimated, for example, in Lithuanian surveys of official statistics is the sum of measurements of a certain variable of interest in a finite population of enterprises or individuals. A typical sample design used to form a sample is stratified simple random sampling, where, e.g., in surveys of enterprises, the strata are naturally formed by economic activity and by size of enterprise. The next thing which is always important in the estimation process, and which must be controlled, is the quality of estimates. There are several very common ways to present it. The first is the estimate of the coefficient of variation (or variance); the second, which is somewhat more informative, is the estimate of the confidence interval. In this paper, we discuss the use of the latter.
The application of the traditional normal confidence interval is based on the assumption that the estimator $\hat{\theta}$ has a normal or approximately normal distribution. Here $\operatorname{Var}(\hat{\theta})$ is the variance of $\hat{\theta}$ and $z_{\alpha}$ denotes the $\alpha$-quantile of the standard normal distribution. In the case of large-scale surveys (where sample sizes are large), the normality assumption is quite natural because, theoretically, many estimators (i.e. not only estimators of sums) are asymptotically normal under quite mild assumptions on the population. In particular, for simple random samples without replacement (the one-stratum case), the central limit theorem for the sample mean was proved in [5]. For the case of classical linear combinations of stratum means (sums) in stratified sampling, we refer to, e.g., [

Numerical examples
Example 1. We illustrate the use of (1) in the following numerical example. Let the population of interest be two strata of medium-size enterprises from the survey of construction. The sizes of the strata are  It is seen from Fig. 3 and Fig. 4 that the population values of investment in tangible fixed assets are much more asymmetric than the values of the first population. Recall (e.g. from [5]) a Lindeberg-type Erdős-Rényi condition, which also means that the survey population should not be very asymmetric.
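As an illustration of how a normal confidence interval for a population sum could be computed under stratified simple random sampling without replacement, the following minimal Python sketch may help. It assumes the usual expansion estimator of the sum and its standard variance estimator with a finite population correction, and it assumes a two-sided interval built from the $1-\alpha/2$ normal quantile. The function name `stratified_total_ci`, the stratum sizes, and the generated data are hypothetical and are not taken from the surveys discussed here.

```python
import math
import random
from statistics import NormalDist, mean, variance


def stratified_total_ci(samples, stratum_sizes, alpha=0.05):
    """Normal-approximation confidence interval for a population sum.

    samples       -- one list of sampled values per stratum
                     (simple random sampling without replacement assumed)
    stratum_sizes -- population size N_h of each stratum
    Returns (estimate, variance estimate, (lower, upper)).
    """
    total_hat = 0.0
    var_hat = 0.0
    for y, N_h in zip(samples, stratum_sizes):
        n_h = len(y)
        # Expansion estimator of the stratum sum.
        total_hat += N_h * mean(y)
        # Variance estimate with finite population correction (1 - n_h / N_h).
        var_hat += N_h ** 2 * (1 - n_h / N_h) * variance(y) / n_h
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    half_width = z * math.sqrt(var_hat)
    return total_hat, var_hat, (total_hat - half_width, total_hat + half_width)


# Hypothetical two-stratum population (illustrative values only).
rng = random.Random(1)
population = [[rng.randint(50, 99) for _ in range(120)],
              [rng.randint(100, 249) for _ in range(60)]]
samples = [rng.sample(population[0], 30), rng.sample(population[1], 20)]

total_hat, var_hat, (lower, upper) = stratified_total_ci(samples, [120, 60])
print(total_hat, (lower, upper))
```

The interval is symmetric around the estimate by construction, which is exactly the property that becomes questionable when the distribution of the estimator is skewed.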

Edgeworth and bootstrap approximations
The two well-known second-order approximations to (3) are Edgeworth expansions and bootstrap approximations.
In particular, the one-term (short) Edgeworth expansion is of the form where $\Phi(x)$ and $\phi(x)$ denote the distribution function and density of the standard normal random variable, and $\lambda$ is, in fact, (an approximation to) the standardized third cumulant of the estimator of interest. The Edgeworth correction term, added to the normal approximation in (4), reflects the skewness of the distribution of the estimator and thus an asymmetry of the population. In the case of the estimator (2), the one-term Edgeworth expansion was studied in [1]; see also [4]. It is important to note that the theory in [1] (and [4]) also holds for other common estimators: ratio and regression estimators in stratified samples. It is well known that typically , while in the case of the short Edgeworth expansion (4), in many situations, holds. Clearly, the parameter $\lambda$ is an unknown characteristic of the population and should be estimated. Then, with a consistent estimate $\hat{\lambda}$ of $\lambda$, (5) holds in probability.
The second universal method is the bootstrap. Several bootstrap variants are considered in the literature for the case of samples drawn without replacement. Some of these methods are reviewed in [4]. In the present paper, we discuss one method, which is proposed in [4]. The practical realization of this method is the following. For each   It is seen that, when a bootstrap approximation is applied, there is only a very small risk of obtaining worse estimates of the quantiles than the normal quantiles. Note that, by the obtained results, the corresponding bootstrap confidence interval

We aim to estimate the population sum of the number of employees (here we use the data for the fourth quarter of 2011). Since the values of this study variable are known for all units of the population (from administrative data sources), we evaluate the distribution of the estimator (2) and compare it with the normal distribution. Specifically, we estimate the distribution by the Monte Carlo method, i.e. by drawing independently $C$ stratified samples from the population and creating the empirical distribution from the standardized observations  It is seen from Fig. 1, and the normality tests show, that in this case the distribution (3) is close to the standard normal. Thus, the use of $z_{\alpha}$ in (1) is well founded.
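The Monte Carlo evaluation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the estimator is taken to be the expansion estimator of the sum, the observations are standardized with the exact sum and the exact standard deviation of the estimator (both available because the whole population is known), and the population, the stratum sizes, the sample sizes, and the value of `C` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def estimate_total(population, sample_sizes, rng):
    """Draw one stratified sample without replacement and estimate the sum."""
    return sum(len(stratum) * mean(rng.sample(stratum, n_h))
               for stratum, n_h in zip(population, sample_sizes))


def true_total_and_sd(population, sample_sizes):
    """Exact sum and exact standard deviation of the estimator."""
    total = sum(sum(stratum) for stratum in population)
    # variance() uses the divisor N_h - 1, as required by the design variance.
    var = sum(len(s) ** 2 * (1 - n_h / len(s)) * variance(s) / n_h
              for s, n_h in zip(population, sample_sizes))
    return total, math.sqrt(var)


def monte_carlo_distribution(population, sample_sizes, C, rng):
    """Sorted standardized values of the estimator over C independent samples."""
    total, sd = true_total_and_sd(population, sample_sizes)
    return sorted((estimate_total(population, sample_sizes, rng) - total) / sd
                  for _ in range(C))


def quantile(sorted_values, p):
    """Empirical p-quantile of an already sorted list."""
    k = min(len(sorted_values) - 1,
            max(0, math.ceil(p * len(sorted_values)) - 1))
    return sorted_values[k]


# Hypothetical skewed two-stratum population (illustrative values only).
rng = random.Random(2011)
population = [[rng.expovariate(1 / 80) for _ in range(150)],
              [rng.expovariate(1 / 200) for _ in range(90)]]
values = monte_carlo_distribution(population, [30, 20], C=2000, rng=rng)

# Compare evaluated quantiles with standard normal quantiles.
for p in (0.025, 0.975):
    print(p, round(quantile(values, p), 3), round(NormalDist().inv_cdf(p), 3))
```

If the evaluated quantiles are close to the normal ones, the normal interval is reasonable; a visible gap in one tail indicates skewness of the estimator.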

Fig. 1. Distribution of the estimator in the construction survey

Example 2. To show a different situation, the data are taken from the survey on investment. The parameter of interest is the population sum of investment in tangible fixed assets. We form an artificial population from the sample data of several strata (data for the fourth quarter of 2011). The size of the whole new population is $N = 665$, and the total sample size is chosen to be $n = 200$. We evaluate the distribution (3) of the estimator (2) in the same way as in the previous example, and now the results are different: the normality tests and Fig. 2 show that the distribution is not close to the standard normal. Next, the evaluated quantiles $q_{\alpha/2}$

Fig. 3. Number of employees in the population of the construction survey

it also captures the skewness of the distribution of the estimator.

Example 2 (continued). We apply the bootstrap approximation. To see how it works, we draw 80 stratified samples from the population and, for each of them, we calculate  In order to see the efficiency of the bootstrap method, we present histograms for both quantiles (see Fig. 5).
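To make the bootstrap step more concrete, here is a minimal Python sketch of bootstrap quantiles for a stratified sample. It is not claimed to be the method of [4]: the construction of the empirical strata is an assumption (each sampled value is replicated as many times as fits into the stratum size, and the remainder is filled by a random subsample drawn without replacement), as are the standardization by the exact standard deviation in the empirical population, the function names, and all numerical values.

```python
import math
import random
from statistics import mean, variance


def empirical_stratum(sample, N_h, rng):
    """Build an empirical stratum of size N_h from the sampled values (assumed construction)."""
    k, r = divmod(N_h, len(sample))
    return list(sample) * k + rng.sample(sample, r)


def bootstrap_distribution(samples, stratum_sizes, R, rng):
    """Sorted standardized estimator values over R resamples from the empirical population."""
    empirical = [empirical_stratum(y, N_h, rng)
                 for y, N_h in zip(samples, stratum_sizes)]
    sizes = [len(y) for y in samples]
    total = sum(sum(u) for u in empirical)
    sd = math.sqrt(sum(len(u) ** 2 * (1 - n_h / len(u)) * variance(u) / n_h
                       for u, n_h in zip(empirical, sizes)))
    values = []
    for _ in range(R):
        # One resample: stratified sampling without replacement
        # from the empirical population, with the original sample sizes.
        est = sum(len(u) * mean(rng.sample(u, n_h))
                  for u, n_h in zip(empirical, sizes))
        values.append((est - total) / sd)
    return sorted(values)


def quantile(sorted_values, p):
    """Empirical p-quantile of an already sorted list."""
    k = min(len(sorted_values) - 1,
            max(0, math.ceil(p * len(sorted_values)) - 1))
    return sorted_values[k]


# Hypothetical skewed population and one stratified sample from it.
rng = random.Random(5)
stratum_sizes = [300, 200]
population = [[rng.expovariate(1 / 50) for _ in range(N_h)] for N_h in stratum_sizes]
samples = [rng.sample(population[0], 60), rng.sample(population[1], 40)]

boot = bootstrap_distribution(samples, stratum_sizes, R=1000, rng=rng)
q_low, q_high = quantile(boot, 0.025), quantile(boot, 0.975)
print(round(q_low, 3), round(q_high, 3))
```

Repeating this for many independent samples from the population and collecting `q_low` and `q_high` would give histograms of bootstrap quantiles of the kind referred to in the example.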

Fig. 5. Normal and the evaluated $F(x)$ quantiles and histograms of bootstrap quantiles

$\Upsilon$. Then the union of the empirical strata is one empirical population. Next, in the same way as in the evaluation of the distribution (3) by the Monte Carlo method, we draw $R$ samples (called resamples) from the empirical population and calculate standardized values of the estimator of interest. Denote them by