Geodesic distances in the maximum likelihood estimator of intrinsic dimensionality

While analyzing multidimensional data, we often have to reduce their dimensionality so that to preserve as much information on the analyzed data set as possible. To this end, it is reasonable to find out the intrinsic dimensionality of the data. In this paper, two techniques for the intrinsic dimensionality are analyzed and compared, i.e., the maximum likelihood estimator (MLE) and ISOMAP method. We also propose the way how to get good estimates of the intrinsic dimensionality by the MLE method.


Introduction
In the exploratory data analysis, we often confront with real-life data that are of a very high-dimensionality.However, these data are usually not truly high-dimensional, i.e., they are only embedded in a high-dimensional space, but can be efficiently summarized in a space of much lower dimensionality, such as a nonlinear manifold.It means that these data points locate on some manifold of lower dimensionality or they are close to that manifold.Recently, there has been a surge of interest in manifold learning methods (locally linear embedding (LLE) [1,2], ISOMAP [3], Laplacian eigenmaps (LE) [4], Hessian LLE (HLLE) [5], local tangent space analysis (LTSA) [6], and others [7]), which focus on finding a nonlinear low-dimensional projection of manifold-type high-dimensional data.The manifold learning methods require at least two parameters to be determined: the intrinsic dimensionality d of the high-dimensional data and the number k of the nearest neighbours.Improper values of these parameters greatly influence the results.The ways to select the value of the parameter k are proposed in [8,9].The dimensionality of the projection is a key parameter for manifold learning methods.On one hand, a large c Vilnius University, 2011 value chosen of the intrinsic dimensionality d amplifies noise effects, while a low value leads to overlaps in mapping results (excessively reduced) [10].It is noted in [11] that if the dimensionality d is too small, important data features are "collapsed" onto the same dimensionality, and if the dimensionality is too large, the projections become noisy and, in some cases, unstable.Therefore, the problem is to disclose the exact dimensionality of that manifold, i.e., the intrinsic dimensionality d of the analyzed data.
At first, the term of manifold needs to be defined.A manifold is an abstract topological mathematical space, in which the area of each point is similar to the Euclidean space, however the global structure of a manifold is more complex.A line and a circle are onedimensional manifolds.A plane and the surface of a ball, a torus are two-dimensional manifolds, etc.The area of each point on the one-dimensional manifold is similar to a line segment.The area of each point on the two-dimensional manifold is similar to a plane segment.
The intrinsic dimensionality of a data set is usually defined as the minimal number of parameters or latent variables necessary to describe the data [7].Latent variables are still often called as degrees of freedom of a data set [3,7].Let the dimensionality of the analyzed data be n.High-dimensional data sets can have meaningful low-dimensional structures hidden in the observation space, i.e., the data are of a low intrinsic dimensionality d n.Principal component analysis (PCA) is the most-known dimensionality reduction method that integrates an estimator of the intrinsic dimensionality.However, the model of PCA is linear, meaning that the estimator works only for manifolds containing linear dependencies (i.e., linear subspaces).For more complex manifols, PCA gives at best an estimate of the global dimensionality of an object [7].
Due to the increased interest in dimensionality reduction and manifold learning, several approaches have been proposed in order to estimate the intrinsic dimensionality of a data set X in the last decade [11][12][13][14][15][16].Techniques for intrinsic dimensionality estimation can be subdivided into two main groups: estimators based on the analysis of local properties of the data and estimators based on the analysis of global properties of the data.
Six techniques for intrinsic dimensionality estimation are overlooked in [17].Local intrinsic dimensionality estimators are based on the observation that the number of data points, covered by a hypersphere around a data point with radius r, grows proportional to r d , where d is the intrinsic dimensionality of the data manifold around that data point.As a result, the intrinsic dimensionality d can be estimated by measuring the number of data points, covered by a hypersphere with a growing radius r.Three local estimators for intrinsic dimensionality -the correlation dimension estimator, the nearest neighbour dimension estimator, and the maximum likelihood estimator -are described in short in [17].Whereas local estimators for intrinsic dimensionality compute the average over local estimates of intrinsic dimensionality, global estimators consider the data as a whole when estimating the intrinsic dimensionality.Van der Maaten (2007) overlooks these global intrinsic dimensionality estimators: the eigenvalue-based estimator, the packing number estimator, and the geodesic minimum spanning tree estimator.
In [18], the maximum likelihood estimator of the intrinsic dimensionality is applied to the real problem, i.e., to the issue of determining the number of pure components in a mixture from Raman spectroscopy data.Authors show how the estimate of the intrinsic dimensionality corresponds to the number of pure components.Having an accurate estimate of the number of pure components, it saves time in component extraction and etc.Other possible application is given in Section 4 to find the number of degrees of freedom of motion of the object in a set of photographs.
In this paper, we also analyze the maximum likelihood estimator (MLE) and explore, which distances -Euclidean or geodesic -must be evaluated between data points in the MLE algorithm in order to get the true estimate of the intrinsic dimensionality.One of the nonlinear manifold learning methods, i.e., ISOMAP, is analysed as well, because of its ability to find out the intrinsic dimensionality of data.Disadvantages of this method in estimating the intrinsic dimensionality are disclosed.

The maximum likelihood estimator of intrinsic dimensionality
The maximum likelihood estimator [11] belongs to the class of the local estimators for intrinsic dimensionality.The detailed algorithm of MLE is provided in [11].In this paper, only the idea is suggested.
Let the analyzed data consist of m n-dimensional points X i = (x i1 , . . ., x in ), i = 1, . . ., m (X i ∈ R n ).MLE finds the intrinsic dimensionality d MLE of the data set X.According to the authors, in practice, it is more convenient to fix the number of neighbours k rather than the radius r of the hypersphere.Therefore, in this paper, we provide an algorithm that is related with the number of the nearest neighbours.
The MLE algorithm [11] has two control parameters: k 1 and k 2 (k 1 < k 2 ) -the numbers of the nearest neighbours for each data point.The values of these parameters are chosen.The algorithm has the following steps: 1.The number k 2 of the nearest neighbours for each data point X i is found.
2. The estimate of the intrinsic dimensionality (d MLE ) is calculated by the maximum likelihood estimator (MLE) according to the formula: where Here d(X i, X ij ) is the Euclidean distance from the point X i to the j-th nearest neighbour X ij , i.e., it represents the radius of the smallest hypersphere with the centre X i that covers j neighbouring datapoints.In [11], it is shown that one could divide by k − 2 rather than k − 1 to make the estimator asymptotically unbiased.
It is clear from equation ( 3) that the estimate depends on the parameter k as well as on the point X i .Levina and Bickel (2004) assume that all the data points come from the same manifold, and therefore they average the estimated dimensions over all observations (m is the number of data points) (2).According to the authors, the choice of k clearly affects the estimate.In general, for MLE to work well, the hypersphere should be small and simultaneously contain enough points.Levina and Bickel choose the value of the parameter k automatically: in some heuristic way they simply average over a range of small to moderate values k = k 1 , . . ., k 2 to get the final estimate (1).According to experimental investigations, Levina and Bickel recommend the values of k 1 = 10 and k 2 = 20.However, these estimates are valid for some fixed data sets only.
Since it is not known how to choose the values of the parameters k 1 and k 2 in general case, in this paper, by analyzing the MLE algorithm, we use only one control parameter k, i.e., the number of the nearest neighbours for each data point, instead of two control parameters k 1 and k 2 .The MLE algorithm is explored by evaluating two types of distances: Euclidean and geodesic.In both cases, the values d k (2) of MLE are calculated with different values k of the nearest neighbours.In such a way, dependences of the estimate of intrinsic dimensionality of the data on the number k of the nearest neighbours are obtained.We choose such a value d k of MLE that is stable in a long interval of k.Levina et al. (2007) suggest to select the value of k equal to 20 on the basis of a dataset with known number of pure components in a mixture from Raman spectroscopy data.
A data set of uncoloured pictures of a rotated duckling [19] (samples of pictures are shown in Fig. 1(f)).The data are comprised of uncoloured pictures of the same object (a duckling), obtained by gradually rotated a duckling at the 360 • angle.Each picture is digitized, i.e., a data point is a vector that consists of colour parameters of pixels, and, therefore, it is of a very large dimensionality.The number of pictures (data points) is m = 72.The images have 128 × 128 greyscale pixels, therefore the dimensionality of points, characterizing each picture in a multidimensional space, is n = 16384.
A data set of coloured pictures of a rotated cup [19] (samples of pictures are shown in Fig. 1(g)).The data are comprised of coloured pictures of the same object (a cup), obtained by gradually rotated a cup at the 180 • angle.Each picture is digitized, i.e., a data point is a vector that consists of colour parameters of pixels, and, therefore, it is of a very large dimensionality.The number of pictures (data points) is m = 35.The images have 128 × 128 colour pixels, therefore the dimensionality of points, characterizing each picture in a multidimensional space, is n = 49152.
A data set of photos of a person's face [3] (example images are shown in Fig. 1(h)).The data consist of many photos of a person's face observed in different poses (left-andright pose, up-and-down pose) and lighting conditions, in no particular order.Each picture is digitized, i.e., a data point is a vector that consists of colour parameters of pixels, and, therefore, it is of a very large dimensionality.The number of photos (data points) is m = 698.The images have 64 × 64 colour pixels, therefore the dimensionality of points that characterize each photo in a multidimensional space is n = 4096.

Experimental exploration of MLE
In this section, the MLE method is explored experimentally, while Euclidean or geodesic distances are evaluated between data points.For brevity, we denote the MLE method as MLEe, if Euclidean distances are used, and MLEg, if geodesic distances are used.The estimates of the intrinsic dimensionality, obtained by MLEe and MLEg, are denoted as d * MLEe and d * MLEg , respectively.The geodesic distance is the length of the shortest path between two points along the surface of a manifold.Here the Euclidean distances are used when calculating the length of the shortest path.In order to compute the geodesic distances between the points X 1 , X 2 , . . ., X m , it is necessary to set some number of the nearest points (neighbours) of each point X i on the manifold.The search of the neighbours of each point X i can be organized in two ways: (1) by the fixed number k geod of the nearest points from X i , (2) by all the points within some fixed radius of a hypershere, the center of which is the point X i .When the neighbours are derived, a weighted graph over the points is constructed: each point X i is connected with its neighbours; the weights of edges are Euclidean distances between the point X i and its neighbours.Using one of the algorithms for the shortest path distance in the graph, the shortest path lengths between the pairs of all points are computed.These lengths are estimates of the geodesic distances between the points.
The first investigation is performed with the points of the 2-dimensional S-shaped and 8-shaped manifolds and with the points of the 1-dimensional manifolds: a circle and a semicircle.The estimates of the intrinsic dimensionality d of the data were calculated by MLE with various values of the control parameter k, k ∈ [3, 100].After applying both variants of MLE, the true results are obtained, i.e., d * MLEe = d * MLEg = 2 for all k in the case of 2-dimensional manifolds (k geod = 5, k geod is the number of the nearest neighbours chosen when geodesic distances are calculated), and d * MLEe = d * MLEg = 1 for all k in the case of 1-dimensional manifolds (k geod = 2).
However, after investigating such manifolds as a helicoid (Fig. 2), a helix (Fig. 3) and a spiral (Fig. 4), it became clear, that MLEe provides wrong results with many values of the parameter k.Meanwhile, in the case of MLEg (k geod = 5 in the case of the helicoid, and k geod = 2 in the case of the helix and the spiral), the true results are obtained with k ∈ [5,200].
An advantage of MLEg over MLEe became also evident while investigating the highdimensional data, obtained after digitizing real pictures (uncoloured pictures of a rotated duckling, coloured pictures of a rotated cup, and photos of a person's face observed in different poses) (Figs.5-7).The intrinsic dimensionality of these data, obtained by MLEg (k geod = 2 in the case of pictures of a rotated duckling, k geod = 3 in the case of pictures of a rotated cup, k geod = 5 in the case of photos of a person's face), is equal to the number of degrees of freedom of a possible motion of the object observed.Since a duckling or a cup were gradually rotated at a certain angle in the same plane, i.e., without turning the object itself, these data have only one degree of freedom, i.e., the intrinsic dimensionality of these data is equal to 1.A person's face analyzed in [3] has two directions of motion (two poses): left-and-right pose and up-and-down pose.Therefore, the high-dimensional data corresponding to these pictures have two degrees of freedom, i.e., the intrinsic dimensionality of these data is equal to 2. However, the intrinsic dimensionality of these data, obtained by MLEe, is not equal to the number of degrees of freedom of a possible motion of the observed object.Thus, MLEe fails in indentifying the true intrinsic dimensionality.

Isometric feature mapping (ISOMAP)
ISOMAP can be assigned to the group of multidimensional scaling methods.This method is designed for dimensionality reduction as well as for visualization of multidimensional data [3].Using ISOMAP, an assumption that the points of the initial space are located on a lower-dimensional manifold is made.Geodesic distances are used as a measure of proximity between the points analyzed in the ISOMAP.In order to compute the geodesic distances between the points X 1 , X 2 , . . ., X m , it is necessary to set some number of the nearest points (neighbours) of each point X i on the manifold.In the further experiments, we use the fixed number k geod of the nearest points from X i .
The ISOMAP algorithm can be generalized as follows: 1.The neighbours of each point are derived in the input multidimensional space.
2. The geodesic distance between the pairs of all the points are computed; a dissimilarity matrix is formed.
3. The projection of multidimensional points to a lower-dimensional space is obtained by multidimensional scaling.
Since ISOMAP is designed to analyse manifold-type high-dimensional data, it was selected to investigate the intrinsic dimensionality of the data as one of the MDS-type methods.
In [3], it is provided the possibility to determine the dimensionality of the manifold, i.e., the intrinsic dimensionality d of the initial data by using ISOMAP.ISOMAP provides error curve that can be "eyeballed" to estimate dimensionality [11].The interval of the projection space d * is chosen for data projection, for example, d * = 1, . . ., 10, and the residual variance [3] is computed for each d * .
This quantitative measure illustrates how well the distance information is preserved.It is defined as DxDy , where ρ DxDy is the standard linear correlation coefficient, taken over all the entries of D x and D y , where D y is the matrix of Euclidean distances between the pairs of points in the low-dimensional space, and D x is the matrix of geodesic distances between the pairs of points (the graph distance matrix) in the high-dimensional space, respectively.The lower the residual variance, the better the high-dimensional data are represented in the embedded space.
However, the ISOMAP method has some drawbacks [7].When the manifold to be embedded is not developable, ISOMAP yields disappointing results.In this case, the guarantee of determining a global optimum does not really matter, since actually the model and its associated error function are not appropriate anymore.Another problem encountered when running ISOMAP is the practical computation of the geodesic distances.The approximations given by the graph distances may be very rough, and their quality depends on both the data (number of points, noise) and the method parameter k geod .Badly chosen value of this parameter may totally jeopardize the quality of the dimensionality reduction (data projection as well as the intrinsic dimensionality).

Experimental exploration of ISOMAP to estimate the intrinsic dimensionality
The first investigation is performed with the points of the 2-dimensional S-shaped manifold.The ISOMAP method has been run for 10 times by selecting a different dimensionality of spaces for data projection, i.e., d * = 1, . . ., 10.Each time, after obtaining data projections in a space of lower dimensionality, the residual variance was calculated.The dependence of the residual variance on the projection space d * has been obtained.The lower the value of the residual variance, the more precise projections of multidimensional data, i.e., geodesic distances between multidimensional data points and their projections, have been preserved.We can see in Fig. 8, that in the case of the S-shaped manifold, the value of the residual variance decreases a great deal, if d * = 2.Although small values of the residual variance (almost zero) are obtained, if d * ≥ 2, but only the first value of the interval d * ∈ [2, 10] is taken as the intrinsic dimensionality in [3].Thus, the most precise data projections are obtained if data are transferred to a 2-dimensional space, i.e., the intrinsic dimensionality of these data is equal to 2, which is the truth.The analogical investigation is performed with the points of the 2-dimensional manifolds: 8-shaped manifold (Fig. 9) and helicoid (Fig. 10).In both cases, the values of the residual variance considerably decrease, if d * = 2.However, the relative value of the residual variance with d * = 2 and d * = 3 decreases up to 65,8% in the case of the helicoid and by 47,5% in the case of the 8-shaped manifold.Thus a question arises, which projection space and the intrinsic dimensionality of the data thereby, is more suitable?It cannot be specified strictly by ISOMAP.After investigating a spiral, it has been noticed, that the ISOMAP method (k geod = 2) does not define the intrinsic dimensionality d of these data (the best projection space) at all, because the value of the residual variance is equal to 0 with all d * = 1, . . ., 10.It means that any integer number from the interval [1,10] can be the value of the parameter d * .Unfortunately, it is not the truth, because the intrinsic dimensionality d = 1 in the case of a spiral.
Afterwards we investigated the closed 1-dimensional manifolds (a helix, a circle) that have neither the beginning nor the end.It became clear, that, in this case, the true intrinsic dimensionality d = 1 of these data is increased by 1 by ISOMAP, i.e., d * = 2 (Fig. 11, Fig. 12).However, if a curve is unclosed, for example, a semicircle, then the intrinsic dimensionality of the data is set true by this method, i.e., d = d * = 1 (Fig. 13).As the values of the residual variance in the graph are very small (equal almost zero), maybe it is risky to draw some conclusions.However, the first lowest value is obtained with d * = 1.
The previous fact is validated by the following investigation with high-dimensional data points, corresponding to real pictures of a rotated duckling.If the object (a duckling) has been gradually rotated by the 360 • angle, the data points are located on a circle.However, the estimate of the intrinsic dimensionality of these data obtained by ISOMAP is equal to 2 (Fig. 14).But if a duckling is rotated at the 180 • angle little by little, then  the data points locate themselves on a semicircle.In this case, the intrinsic dimensionality of the data, obtained by ISOMAP, is equal to 1 (Fig. 15).In both cases, the highdimensional data points, corresponding to real pictures of a rotated duckling, lie on a 1-dimensional manifold (a curve).Therefore, the true intrinsic dimensionality of the data is 1.The last investigation is performed with the high-dimensional data points that correspond to coloured pictures.It has been noticed that the ISOMAP method increases the intrinsic dimensionality by 1, if colours and lighting dominate in the pictures.The intrinsic dimensionality is 1 of the high-dimensional data, obtained by digitizing the coloured pictures of a rotated cup, because these data points are located on a 1-dimensional manifold (a semicircle) as well.Besides, they have one degree of freedom of a motion.However, we can see from Fig. 16 that the dimensionality obtained by ISOMAP is equal to 2. An analogous situation is obtained by analyzing the high-dimensional data that correspond to photos of a person's face observed in different poses (left-and-right, up-and-down).Due to the lighting in the photos, the dimensionality obtained by ISOMAP is 3, but not 2 (Fig. 17).

Conclusions
Real-life data are often hardly understandable because of their high-dimensionality.Therefore, the ability to find the intrinsic dimensionality of a data set is very useful.Several methods for estimating the intrinsic dimensionality are proposed in the literature.
In this paper, we have analysed two methods for the intrinsic dimensionality: the maximum likelihood estimator (MLE) and the ISOMAP method.The obtained results are generalized in Table 1.We have shown that, in order to get true estimates by MLE, it is necessary to evaluate geodesic distances between data points in this algorithm (d * MLEg = d in all the cases).If the Euclidean distances are used in MLE, sometimes we can get false estimates of the intrinsic dimensionality.
ISOMAP is a nonlinear manifold learning method, but it has the ability to define the intrinsic dimensionality of data, too.Disadvantages of this method became evident in the estimation of the intrinsic dimensionality while analyzing closed manifolds and coloured pictures.In these cases, ISOMAP has failed to estimate the intrinsic dimensionality of the data, because it increased the true intrinsic dimensionality of these data by 1, as compared with the maximum likelihood estimator.

Fig. 2 .Fig. 3 .Fig. 4 .
Fig. 2. Dependences of the estimate d * of the intrinsic dimensionality on k, obtained after analyzing the points of the helicoid by MLE.

Fig. 5 .
Fig. 5. Dependences of the estimate d * of the intrinsic dimensionality on k, obtained after analyzing the data points, corresponding to uncoloured pictures of a rotated duckling, by MLE (k ∈ [3, 71]).

Fig. 6 .
Fig. 6.Dependences of the estimate d * of the intrinsic dimensionality on k, obtained after analyzing the data points, corresponding to the coloured pictures of a rotated cup, by MLE (k ∈ [3, 34]).

Fig. 7 .
Fig. 7. Dependences of the estimate d * of the intrinsic dimensionality on k, obtained after analyzing the data points, corresponding to the photos of a person's face, by MLE (k ∈ [3, 600]) (k geod = 5).

Fig. 8 .
Fig. 8. Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points of the S-shaped manifold by ISOMAP (k geod = 5).

Fig. 9 .
Fig. 9. Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points of the 8-shaped manifold by ISOMAP (k geod = 5).

Fig. 10 .
Fig. 10.Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points of the helicoid by ISOMAP (k geod = 5).

Fig. 11 .
Fig. 11.Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points of the helix by ISOMAP (k geod = 2).

Fig. 12 .
Fig. 12. Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points of the circle by ISOMAP (k geod = 2).

Fig. 13 .
Fig. 13.Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points of the semicircle by ISOMAP (k geod = 2).

Fig. 14 .
Fig. 14.Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points, corresponding to the pictures of a rotated duckling at 360 • , by ISOMAP (k geod = 2).

Fig. 15 .
Fig. 15.Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points, corresponding to the pictures of a rotated duckling at 180 • , by ISOMAP (k geod = 2).

Fig. 16 .
Fig. 16.Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points, corresponding to coloured pictures of a rotated cup, by ISOMAP (k geod = 3).

Fig. 17 .
Fig. 17.Dependence of the residual variance on the dimensionality of the projection space d * obtained after analysing the points, corresponding to the photos of a person's face, by ISOMAP (k geod = 5).

Table 1 .
The values of the intrinsic dimensionality, obtained by different method.