Dimensionality Reduction Using Random Projection
Dimensionality Reduction
Typically, problems arise when performing pattern recognition tasks in high-dimensional spaces (the "curse of dimensionality"). Significant improvements can be achieved by first mapping the data into a lower-dimensional subspace. Applying dimensionality reduction to some vector <math>x=[x_{1},x_{2},...,x_{N}]^{T}</math> in an N-dimensional space yields another vector <math>y=[y_{1},y_{2},...,y_{K}]^{T}</math> in a K-dimensional space, where <math> K \ll N </math>. Dimensionality reduction techniques using linear transformations have been very popular in determining the intrinsic dimensionality of the manifold as well as extracting its principal directions (i.e., basis vectors).
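To make the mapping concrete, the following minimal sketch (assuming NumPy; the transformation matrix W and the dimensions are illustrative placeholders, not any particular method) shows that a linear dimensionality reduction is simply a matrix-vector product:

<syntaxhighlight lang="python">
import numpy as np

N, K = 1024, 64              # original and reduced dimensionality, K << N
x = np.random.rand(N)        # an N-dimensional data vector

W = np.random.rand(K, N)     # some K x N linear transformation (placeholder)
y = W @ x                    # the K-dimensional representation y = Wx

print(x.shape, y.shape)      # (1024,) (64,)
</syntaxhighlight>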
The most prominent method in this category is Principal Component Analysis (see the module on Face Recognition Using PCA). PCA determines the basis vectors by finding the directions of maximum variance in the data, and it is optimal in the sense that it minimizes the error between the original image and the one reconstructed from its low-dimensional representation. This is equivalent to retaining as much as possible of the variation present in the original data. Its success has triggered significant research in pattern recognition, and many powerful dimensionality reduction techniques (e.g., Probabilistic PCA, Linear Discriminant Analysis (LDA), Independent Component Analysis (ICA), Local Feature Analysis (LFA), Kernel PCA) have been proposed for finding appropriate low-dimensional pattern representations[1].
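For reference, PCA can be sketched in a few lines via the singular value decomposition of the mean-centered data (a minimal sketch assuming NumPy; the data matrix X and the dimensions below are illustrative):

<syntaxhighlight lang="python">
import numpy as np

# M sample vectors of dimension N, stacked as rows (random data for illustration)
M, N, K = 200, 1024, 64
X = np.random.rand(M, N)

# Center the data and take the top-K right singular vectors as the PCA basis
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
basis = Vt[:K]                       # K x N matrix of principal directions

Y = (X - mean) @ basis.T             # K-dimensional representations
X_rec = Y @ basis + mean             # minimum-error linear reconstruction
</syntaxhighlight>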
Random Projection (RP) is a powerful method for dimensionality reduction [2] [3] [4] [5]. It is a computationally simple and efficient method that preserves the structure of the data without introducing significant distortion, and its most important property is that it is a general data reduction method. In RP, the original high-dimensional data is projected onto a low-dimensional subspace using a random matrix whose columns have unit length. In contrast to other methods, such as PCA, which compute a low-dimensional subspace by optimizing certain criteria (e.g., PCA finds a subspace that maximizes the variance in the data), RP does not use such criteria and is therefore data independent. A number of theoretical results support that RP approximately preserves pairwise distances of points in Euclidean space, volumes and affine distances, and the structure of the data (e.g., clustering).
Random Projection (RP)
We summarize below the main ideas behind RP. Let us suppose that we are given a set of vectors <math>\Gamma _{i}, i=1,2,...,M</math>, in a p-dimensional space. RP can be used to transform <math>\Gamma _{i}</math> to a lower dimension d, with <math>d \ll p</math>, via the following transformation:
<math> \tilde{\Gamma}_{i}=R\Gamma _{i} </math>
where R is orthonormal and its columns are realizations of independent and identically distributed (i.i.d.) zero-mean normal variables, scaled to have unit length. RP is motivated by the Johnson-Lindenstrauss lemma [6], which states that a set of M points in a high-dimensional Euclidean space can be mapped down onto a d-dimensional subspace, with <math>d \geq O(\log M/ \epsilon ^{2})</math>, such that the distances between the points are approximately preserved (i.e., not distorted by more than a factor of <math>1 \pm \epsilon</math>, for any <math>0<\epsilon <1</math>). There is experimental evidence suggesting that this lower bound on d is not very tight and that, in practice, lower values of d give good results.
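The bound can be made concrete using one commonly quoted explicit form of the lemma, <math>d \geq 4 \ln M / (\epsilon^{2}/2 - \epsilon^{3}/3)</math> (this particular constant is taken from standard statements of the lemma, not from the references above); a small NumPy sketch:

<syntaxhighlight lang="python">
import numpy as np

def jl_min_dim(n_points, eps):
    """Minimum target dimension d so that pairwise distances among n_points
    are preserved within a factor of (1 +/- eps), using one common explicit
    form of the Johnson-Lindenstrauss bound."""
    return int(np.ceil(4 * np.log(n_points) / (eps ** 2 / 2 - eps ** 3 / 3)))

for eps in (0.1, 0.3, 0.5):
    print(eps, jl_min_dim(10000, eps))   # a looser eps allows a much smaller d
</syntaxhighlight>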
The entries of the matrix R can be calculated using the following algorithm:
Step 1. Set each entry of the matrix to an i.i.d. N(0, 1) value.
Step 2. Orthogonalize the d rows of the matrix (e.g., using the Gram-Schmidt orthogonalization algorithm).
Step 3. Normalize the rows of the matrix to unit length (this is important for preserving similarities in the low-dimensional space).
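A minimal NumPy sketch of these three steps is given below (the dimensions are illustrative, and a QR factorization is used as a numerically convenient stand-in for Gram-Schmidt orthogonalization):

<syntaxhighlight lang="python">
import numpy as np

def random_projection_matrix(d, p, orthogonalize=True, seed=0):
    """Build a d x p random projection matrix following the three steps above."""
    rng = np.random.default_rng(seed)

    # Step 1: i.i.d. N(0, 1) entries
    R = rng.standard_normal((d, p))

    # Step 2: orthogonalize the d rows (QR of the transpose gives a Q with
    # orthonormal columns, i.e. Q.T has orthonormal rows); skipping this
    # step in high dimensions is discussed in the text below
    if orthogonalize:
        Q, _ = np.linalg.qr(R.T)     # Q is p x d with orthonormal columns
        R = Q.T

    # Step 3: normalize the rows to unit length
    R = R / np.linalg.norm(R, axis=1, keepdims=True)
    return R

# Project p-dimensional vectors Gamma_i (stacked as columns) down to d dimensions
p, d, M = 1024, 64, 10
Gamma = np.random.rand(p, M)
R = random_projection_matrix(d, p)
Gamma_tilde = R @ Gamma              # column i equals R @ Gamma_i, now d-dimensional
</syntaxhighlight>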
Orthogonalizing the random vectors is required to preserve the similarities between the original vectors in the low-dimensional space. In high enough dimensions, however, it is possible to save computation time by skipping the orthogonalization step without much affecting the quality of the projection matrix. This is because, in high-dimensional spaces, there exist far more nearly orthogonal directions than exactly orthogonal ones; therefore, high-dimensional vectors with random directions are very likely to be close to orthogonal.
A simpler algorithm has been proposed by Achlioptas [7] for approximating the random matrix, yielding significant computational savings during the computation of R and of the projections <math>R\Gamma_{i}</math>. The algorithm produces a sparse random matrix with elements belonging to {-1, 0, +1}, thereby simplifying the computations considerably.
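As an illustration, the following minimal NumPy sketch draws entries from one sparse distribution in the spirit of [7], assuming the commonly cited <math>\sqrt{3}</math>-scaled variant over {-1, 0, +1} with probabilities {1/6, 2/3, 1/6}:

<syntaxhighlight lang="python">
import numpy as np

def sparse_random_matrix(d, p, seed=0):
    """Sparse d x p random matrix: each entry is sqrt(3) * (+1 with prob 1/6,
    0 with prob 2/3, -1 with prob 1/6), so roughly two thirds of the entries
    are exactly zero."""
    rng = np.random.default_rng(seed)
    R = rng.choice([1.0, 0.0, -1.0], size=(d, p), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3) * R

R = sparse_random_matrix(64, 1024)
print(float(np.mean(R == 0)))   # close to 2/3
</syntaxhighlight>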
RP has been applied to various types of problems, including information retrieval, machine learning, and optimization (see [5]). Although it is based on a simple idea, RP has demonstrated good performance in a number of applications, yielding results comparable to those of conventional dimensionality reduction techniques, such as PCA, while having much lower computational requirements.
Using RP for Face Recognition
Experimental Results
Project
References and Resources
- ↑ W. Zhao, R. Chellappa, P.J. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.
- ↑ S. Dasgupta, "Experiments with random projection," Uncertainty in Artificial Intelligence, 2000.
- ↑ E. Bingham and H. Mannila, "Random projection in dimensionality reduction: applications to image and text data," International Conference on Knowledge Discovery and Data Mining, pp. 245–250, 2001.
- ↑ D. Fradkin and D. Madigan, "Experiments with random projection for machine learning," ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 517–522, 2003.
- ↑ N. Goel, G. Bebis, A. Nefian, "Face Recognition Experiments with Random Projection," SPIE Defense and Security Symposium (Biometric Technology for Human Identification), Orlando, FL, March 28 - April 1, 2005.
- ↑ W. Johnson and J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space," Conference on Modern Analysis and Probability, pp. 189–206, 1984.
- ↑ D. Achlioptas, "Database-friendly random projections," ACM Symposium on the Principles of Database Systems, pp. 274–281, 2001.