Face Recognition Using Principal Components Analysis (PCA)


<Contributed to MeLi Wiki by Professor George Bebis, Department of Computer Science, University of Nevada, Reno>


Introduction to Face Recognition

Face recognition is a key biometric technology with a wide range of potential applications related to national security and safety, including surveillance, information security, access control, identity fraud, gang tracking, banking, and finding missing children. Faces are highly dynamic and can vary considerably in orientation, lighting, scale, and facial expression; as a result, face recognition is considered a difficult problem to solve. Previous studies on face recognition fall into one of two main categories [1]: feature-based and holistic-based. Feature-based methods identify faces by extracting landmarks, or local features, from an image of the subject's face. Methods using geometrical relationships, such as the relative position, size, and/or shape of the eyes, nose, cheekbones, and jaw, fall into this category [2]. However, facial features are not always easy to extract, and such methods discard textural information, like the "smoothness" of faces or hair style, that might carry strong identity information. This observation has led to holistic-based methods, which use features extracted from the whole image (i.e., global features) [3]. The main drawback of holistic-based approaches is that they are sensitive to occlusion; however, they tolerate noise better than feature-based approaches.

Considerable progress has been made in template-based face recognition research over the last decade, especially with the development of powerful models of face appearance [1]. These models represent faces as points in high-dimensional spaces and employ dimensionality reduction to find a more meaningful representation. The key observation is that although face images can be regarded as points in a high-dimensional space, they often lie on a manifold (i.e., subspace) of much lower dimensionality, embedded in the high-dimensional image space [4]. The main issue is how to properly define and determine a low-dimensional subspace of face appearance in a high-dimensional image space.

Dimensionality reduction techniques using linear transformations have been very popular in determining the intrinsic dimensionality of the manifold as well as extracting its principal directions (i.e., basis vectors). The most prominent method in this category is Principal Component Analysis (PCA) [5]. PCA determines the basis vectors by finding the directions of maximum variance in the data, and it is optimal in the sense that it minimizes the error between the original image and the one reconstructed from its low-dimensional representation. PCA has been very popular in face recognition, especially with the development of the method of eigenfaces [3]. Its success has triggered significant research in the area of face recognition, and many powerful dimensionality reduction techniques (e.g., Probabilistic PCA, Linear Discriminant Analysis (LDA), Independent Component Analysis (ICA), Local Feature Analysis (LFA), Kernel PCA) have been proposed for finding appropriate low-dimensional face representations [1].


Principal Component Analysis (PCA)

Typically, problems arise when performing recognition in high-dimensional spaces (i.e., the “curse of dimensionality”). Significant improvements can be achieved by first mapping the data into a lower-dimensional subspace. Applying dimensionality reduction to some vector <math>\mathbf{x}=[x_1, x_2, ..., x_N]^T</math> in an <math>N</math>-dimensional space yields another vector <math>\mathbf{y}=[y_1, y_2, ..., y_K]^T</math> in a <math>K</math>-dimensional space, where <math>K<N</math>. In principle, dimensionality reduction leads to information loss; the goal of PCA is to reduce the dimensionality of the data while preserving as much information as possible, which is equivalent to retaining as much as possible of the variation present in the original data [5]. In this context, PCA computes a linear transformation <math>\mathbf{T}</math> that maps data from a high-dimensional space to a lower-dimensional subspace as shown below:

<math>\begin{cases} y_1=t_{11}x_1+t_{12}x_2+\cdots+t_{1N}x_N \\ y_2=t_{21}x_1+t_{22}x_2+\cdots+t_{2N}x_N \\ \vdots \\ y_K=t_{K1}x_1+t_{K2}x_2+\cdots+t_{KN}x_N \end{cases}</math>

or <math>\mathbf{y}=\mathbf{Tx}</math> where

<math>\mathbf{T} = \begin{bmatrix}
t_{11} & t_{12} & \cdots & t_{1N}\\ 
t_{21}& t_{22} & \cdots & t_{2N}\\ 
\vdots & \vdots  & \ddots  & \vdots  \\ 
t_{K1}& t_{K2} & \cdots  & t_{KN}
\end{bmatrix}</math>

The optimum transformation <math>\mathbf{T}</math> is the one that minimizes the reconstruction error, i.e., the difference between <math>\mathbf{x}</math> and the vector <math>\mathbf{\hat{x}}</math> reconstructed from <math>\mathbf{y}</math>. According to the theory of PCA (see [5]), the optimum low-dimensional space is defined by the “best” eigenvectors of the covariance matrix of the data (i.e., the eigenvectors corresponding to the largest eigenvalues of the covariance matrix, also referred to as “principal components”).

Suppose <math>\mathbf{I}_1, \mathbf{I}_2, \cdots, \mathbf{I}_M</math> is a set of <math>M</math> vectors, each of size <math>N \times 1</math>; the main steps of PCA are given below:

Step 1: Compute the average vector <math>\mathbf{\bar{I}}=\frac{1}{M}\sum_{i=1}^{M}\mathbf{I}_i</math>

Step 2: Normalize vectors by subtracting the average vector: <math>\mathbf{\Phi}_i=\mathbf{I}_i-\mathbf{\bar{I}}</math>

Step 3: Form the matrix <math>\mathbf{A}=[\mathbf{\Phi}_1,\mathbf{\Phi}_2 ,\cdots,\mathbf{\Phi}_M ] </math> ( <math>N \times M</math> matrix )

Step 4: Compute the covariance matrix: <math>\mathbf{C}=\frac{1}{M}\sum_{n=1}^{M}\mathbf{\Phi}_n\mathbf{\Phi}_n^T=\frac{1}{M}\mathbf{AA}^T</math> (<math>N \times N</math> matrix; characterizes the variance of the data). The constant factor <math>1/M</math> is often dropped, since it rescales the eigenvalues but does not affect the eigenvectors. See [5] for a review on covariance matrices.

Step 5: Compute the eigenvalues <math>\lambda_1, \lambda_2, \cdots , \lambda_N</math> and eigenvectors <math>\mathbf{u}_1, \mathbf{u}_2, \cdots , \mathbf{u}_N</math> of <math>\mathbf{C}</math> (assume that <math>\lambda_1 > \lambda_2 > \cdots > \lambda_N</math>). See [5] for a review on eigenvalues and eigenvectors.

Since <math>\mathbf{C}</math> is symmetric, <math>\mathbf{u}_1, \mathbf{u}_2, \cdots , \mathbf{u}_N</math> form a set of basis vectors; that is, any vector <math>\mathbf{I}</math> in the same space can be written as a linear combination of the eigenvectors. Writing this for the mean-subtracted vectors, we have:

<math>\mathbf{I}-\mathbf{\bar{I}}=y_1\mathbf{u}_1+y_2\mathbf{u}_2+\cdots +y_N\mathbf{u}_N=\sum_{i=1}^{N}y_i\mathbf{u}_i</math>

Step 6 (dimensionality reduction): Represent each vector <math>\mathbf{I}</math> by keeping only the terms corresponding to the largest <math>K</math> eigenvalues:

<math>\mathbf{\hat{I}}-\mathbf{\bar{I}}=y_1\mathbf{u}_1+y_2\mathbf{u}_2+\cdots +y_K\mathbf{u}_K=\sum_{i=1}^{K}y_i\mathbf{u}_i</math>

where <math>K < N</math>; in this case, <math>\mathbf{\hat{I}}</math> approximates <math>\mathbf{I}</math> such that <math>\left \| \mathbf{I}-\mathbf{\hat{I}} \right \|</math> is minimum.

Therefore, the linear transformation <math>\mathbf{T}</math> implied by PCA is defined by the principal components of the covariance matrix: its rows are the “best” <math>K</math> eigenvectors,

<math>\mathbf{T} = \begin{bmatrix}
u_{11} & u_{12} & \cdots & u_{1N}\\ 
u_{21} & u_{22} & \cdots & u_{2N}\\ 
\vdots & \vdots  & \ddots  & \vdots  \\ 
u_{K1} & u_{K2} & \cdots  & u_{KN}
\end{bmatrix}</math>

where <math>u_{ij}</math> denotes the <math>j</math>-th component of eigenvector <math>\mathbf{u}_i</math>.
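
To make the steps above concrete, here is a minimal NumPy sketch (our own illustration, not part of the original text; names such as <code>pca</code>, <code>I</code>, and <code>K</code> are ours) that carries out Steps 1–6 and forms the reconstruction:

<syntaxhighlight lang="python">
import numpy as np

def pca(I, K):
    """PCA of M column vectors of dimension N, stacked as an (N, M) array I."""
    I_bar = I.mean(axis=1, keepdims=True)         # Step 1: average vector
    A = I - I_bar                                 # Steps 2-3: mean-subtracted vectors (N x M)
    C = (A @ A.T) / A.shape[1]                    # Step 4: covariance matrix (N x N)
    eigvals, eigvecs = np.linalg.eigh(C)          # Step 5: eigenvalues/eigenvectors (ascending order)
    order = np.argsort(eigvals)[::-1]             # sort so the largest eigenvalues come first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U = eigvecs[:, :K]                            # Step 6: keep the K "best" eigenvectors
    Y = U.T @ A                                   # low-dimensional representation of each vector
    I_hat = I_bar + U @ Y                         # reconstruction from the K coefficients
    return I_bar, eigvals, U, Y, I_hat
</syntaxhighlight>

Here <code>U.T</code> plays the role of the transformation <math>\mathbf{T}</math> above: its rows are the eigenvectors <math>\mathbf{u}_1, \cdots, \mathbf{u}_K</math>.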


PCA can be interpreted geometrically to better understand how it works (see Figure 1). PCA projects the data along the directions where the data varies the most. These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues. The magnitude of the eigenvalues corresponds to the variance of the data along the eigenvector directions.

Figure 1. Geometric interpretation of PCA

To decide how many principal components to keep (i.e., value of <math>K</math>), the following criterion can be used:

<math>\frac{\sum_{i=1}^{K}\lambda_i}{\sum_{i=1}^{N}\lambda_i}>t</math>

where <math>t</math> is a threshold (e.g., 0.8 or 0.9). The value of <math>t</math> determines the amount of information to be preserved in the data. Once the value of <math>t</math> has been specified, <math>K</math> can be determined. It can be shown that the error due to the dimensionality reduction step is given by:

<math>error = \frac{1}{2}\sum_{i=K+1}^{N}\lambda_i </math>
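
As a small illustration (ours, not from the original text), the criterion above can be applied directly to the sorted eigenvalues to select <math>K</math>:

<syntaxhighlight lang="python">
import numpy as np

def choose_K(eigvals, t=0.9):
    """Smallest K whose top-K eigenvalues preserve at least a fraction t of the total variance."""
    eigvals = np.sort(eigvals)[::-1]              # largest eigenvalues first
    ratio = np.cumsum(eigvals) / np.sum(eigvals)  # cumulative variance ratio for K = 1, 2, ...
    return int(np.argmax(ratio >= t)) + 1         # first K for which the ratio reaches t
</syntaxhighlight>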

It should be mentioned that the principal components are dependent on the units used to measure the original variables as well as on the range of values they assume. Therefore, we should always standardize the data prior to using PCA. A common standardization method is to transform all the data to have zero mean and unit standard deviation:

<math>\frac{x_i-\mu}{\sigma}</math>

where <math>\mu</math> and <math>\sigma</math> are the mean and standard deviation of the data.
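
In the eigenface setting all variables are pixel intensities measured in the same units, so usually only the mean is subtracted; for general data, the standardization above can be sketched as follows (assuming <code>X</code> holds one variable per column):

<syntaxhighlight lang="python">
import numpy as np

def standardize(X):
    """Rescale each column (variable) of X to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
</syntaxhighlight>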


The eigenface approach

The eigenface approach uses PCA to represent faces in a low dimensional subspace spanned by the “best” eigenvectors (i.e., eigenfaces) of the covariance matrix of the face images. Although the methodology is the same, there are some practical issues that need special consideration. Suppose that we are given <math>M</math> training face images <math>\mathbf{I}_1, \mathbf{I}_2, \cdots, \mathbf{I}_M</math>, each of size <math>N \times N</math>. We describe below the main steps of applying PCA on face images.

Step 1: Represent each <math>N \times N</math> face image <math>\mathbf{I}_i</math> as a one-dimensional, <math>N^2 \times 1</math> vector <math>\mathbf{\Gamma_i}</math>. This can be done by simply stacking the rows of the image, one after the other (see Figure 2). Note: the images must have been scaled and aligned with each other first.

Figure 2. Vector representation of face images.

Step 2: Compute the average face <math>\mathbf{\Psi}=\frac{1}{M}\sum_{i=1}^{M}\mathbf{\Gamma}_i</math>

Step 3: Normalize each <math>\mathbf{\Gamma}_i</math> by subtracting the average face <math>\mathbf{\Phi}_i=\mathbf{\Gamma}_i-\mathbf{\Psi}</math>

Step 4: Form the matrix <math>\mathbf{A}=\left [ \mathbf{\Phi}_1, \mathbf{\Phi}_2, \cdots, \mathbf{\Phi}_M \right ]</math> (<math>N^2 \times M</math> matrix)

Step 5: Compute the covariance matrix: <math>\mathbf{C}=\frac{1}{M}\sum_{n=1}^{M}\mathbf{\Phi}_n\mathbf{\Phi}_n^T=\frac{1}{M}\mathbf{AA}^T</math> (<math>N^2 \times N^2</math> matrix; characterizes the variance of the faces; as before, the constant factor <math>1/M</math> can be dropped). According to the PCA approach, we would need to compute the eigenvectors <math>\mathbf{u}_i</math> of <math>\mathbf{AA}^T</math>. The matrix <math>\mathbf{AA}^T</math> is very large (i.e., <math>N^2 \times N^2</math>); therefore, it is not practical to compute its eigenvectors directly. Instead, we will consider the eigenvectors <math>\mathbf{v}_i</math> of the matrix <math>\mathbf{A}^T\mathbf{A}</math>, which is much smaller (i.e., <math>M \times M</math>). Then, we will compute the eigenvectors of <math>\mathbf{AA}^T</math> from the eigenvectors of <math>\mathbf{A}^T\mathbf{A}</math>.

Step 6: Compute the eigenvectors <math>\mathbf{v}_i</math> of <math>\mathbf{A}^T\mathbf{A}</math>.

We can easily show the relation between <math>\mathbf{u}_i</math> and <math>\mathbf{v}_i</math>. Since the <math>\mathbf{v}_i</math> are the eigenvectors of <math>\mathbf{A}^T\mathbf{A}</math>, they satisfy <math>\mathbf{A}^T\mathbf{A}\mathbf{v}_i=\mu_i\mathbf{v}_i</math>, where <math>\mu_i</math> are the corresponding eigenvalues. If we multiply both sides of the above equation by <math>\mathbf{A}</math>, we have <math>\mathbf{AA}^T\mathbf{A}\mathbf{v}_i=\mu_i\mathbf{A}\mathbf{v}_i</math>; that is, <math>\mathbf{u}_i=\mathbf{Av}_i</math> is an eigenvector of <math>\mathbf{AA}^T</math> with the same eigenvalue <math>\mu_i</math>. Therefore, <math>\mathbf{AA}^T</math> and <math>\mathbf{A}^T\mathbf{A}</math> share their nonzero eigenvalues, while their eigenvectors are related through <math>\mathbf{u}_i=\mathbf{Av}_i</math>. It should be noted that <math>\mathbf{AA}^T</math> can have up to <math>N^2</math> eigenvectors while <math>\mathbf{A}^T\mathbf{A}</math> can have up to <math>M</math> eigenvectors. It can be shown that the eigenvectors of <math>\mathbf{A}^T\mathbf{A}</math> correspond to the “best” <math>M</math> eigenvectors of <math>\mathbf{AA}^T</math> (i.e., the eigenvectors corresponding to the largest eigenvalues).

Step 7: Compute the eigenvectors <math>\mathbf{u}_i</math> of <math>\mathbf{AA}^T</math> using <math>\mathbf{u}_i=\mathbf{Av}_i</math>. Note: normalize <math>\mathbf{u}_i</math> such that <math>\left \| \mathbf{u}_i \right \|=1</math>

Step 8 (dimensionality reduction): Represent each face <math>\mathbf{\Gamma}</math> by keeping only the terms corresponding to the largest <math>K</math> eigenvalues:

<math>\mathbf{\hat{\Gamma}}-\mathbf{\Psi}=y_1\mathbf{u}_1+y_2\mathbf{u}_2+\cdots+y_K\mathbf{u}_K=\sum_{i=1}^{K}y_i\mathbf{u}_i</math>
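
The following sketch (ours; array names such as <code>faces</code> are illustrative) follows Steps 1–8 above, working with the smaller <math>M \times M</math> matrix <math>\mathbf{A}^T\mathbf{A}</math>:

<syntaxhighlight lang="python">
import numpy as np

def train_eigenfaces(faces, K):
    """faces: (M, N, N) array of aligned training images.
    Returns the average face, the top-K eigenvalues, the eigenfaces (N^2 x K),
    and the coefficients of each training face (K x M)."""
    M = faces.shape[0]
    Gamma = faces.reshape(M, -1).T.astype(np.float64)  # Step 1: N^2 x M matrix of image vectors
    Psi = Gamma.mean(axis=1, keepdims=True)            # Step 2: average face
    A = Gamma - Psi                                    # Steps 3-4: mean-subtracted faces (N^2 x M)
    L = A.T @ A                                        # M x M matrix instead of N^2 x N^2
    mu, V = np.linalg.eigh(L)                          # Steps 5-6: eigenvectors v_i of A^T A
    order = np.argsort(mu)[::-1]                       # largest eigenvalues first
    mu, V = mu[order], V[:, order]
    U = A @ V[:, :K]                                   # Step 7: u_i = A v_i
    U /= np.linalg.norm(U, axis=0)                     # normalize so that ||u_i|| = 1
    Omega = U.T @ A                                    # Step 8: coefficients y_i of each training face
    return Psi, mu[:K], U, Omega
</syntaxhighlight>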

Figure 3 below provides a visualization of the eigenface approach. The first row shows a set of eigenfaces (i.e., eigenvectors corresponding to large eigenvalues; the term “eigenface” comes from the fact that the eigenvectors look like ghostly faces). The second row shows a new face, expressed as a linear combination of the eigenfaces.

Figure 3. Visualization of the eigenface approach: each face can be represented as a linear combination of the eigenfaces.

Using PCA, each face image <math>\mathbf{\Gamma}</math> can be represented in a lower dimensional space, using the coefficients of the linear expansion:

<math>\mathbf{\Omega}=\begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_K \end{bmatrix}</math>

To perform face recognition, first we represent each training face in a lower dimensional space using PCA:

<math>\mathbf{\Omega}_i=\begin{bmatrix} y_{i1}\\ y_{i2}\\ \vdots \\ y_{iK} \end{bmatrix}, i=1,\cdots,M</math>

Given an unknown <math>N \times N</math> face image <math>\mathbf{I}</math> (aligned in the same way as the training faces), we apply the following steps for recognition purposes:

Step 1: Represent <math>\mathbf{I}</math> as a one-dimensional, <math>N^2 \times 1</math> vector <math>\mathbf{\Gamma}</math>

Step 2: Normalize <math>\mathbf{\Gamma}</math> by subtracting the average face <math>\mathbf{\Phi}=\mathbf{\Gamma}-\mathbf{\Psi}</math>

Step 3: Project <math>\mathbf{\Phi}</math> onto the PCA space (i.e., eigenspace)

<math>\mathbf{\hat{\Phi}}=y_1\mathbf{u}_1+y_2\mathbf{u}_2+\cdots+y_K\mathbf{u}_K=\sum_{i=1}^{K}y_i\mathbf{u}_i</math> where <math>y_i=\mathbf{u}_i^T\mathbf{\Phi}</math>

Step 4: Find the training face closest to the unknown face in the eigenspace:

<math>e_r=\min_l\left \|\mathbf{\Omega}-\mathbf{\Omega}_l \right \|</math>

where <math>e_r</math> is the minimum error.

Step 5: If <math>e_r<T_r</math>, where <math>T_r</math> is a threshold, the face <math>\mathbf{\Gamma}</math> is recognized as the training face <math>\mathbf{\Gamma}_l</math> that yields the minimum distance.

The error <math>e_r</math> is called the “distance within the face space”. Typically, the Euclidean distance is used to compute the error; however, it has been shown that the Mahalanobis distance, shown below, works better:

<math>\left \| \mathbf{\Omega}-\mathbf{\Omega}_l \right \|=\sum_{i=1}^{K}\frac{1}{\lambda_i}\left ( y_{i}-y_{li} \right )^2</math>
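
A minimal recognition sketch (ours, reusing the outputs of the hypothetical <code>train_eigenfaces</code> function sketched earlier): project the unknown face, compute the Mahalanobis distance to each stored face, and apply the threshold:

<syntaxhighlight lang="python">
import numpy as np

def recognize(face, Psi, U, eigvals, Omega_gallery, T_r):
    """face: (N, N) unknown image; Omega_gallery: (K, M) coefficients of the stored faces.
    Returns (index of the closest stored face or None, minimum distance e_r)."""
    Gamma = face.reshape(-1, 1).astype(np.float64)         # Step 1: N^2 x 1 vector
    Phi = Gamma - Psi                                      # Step 2: subtract the average face
    Omega = U.T @ Phi                                      # Step 3: project onto the eigenspace (K x 1)
    diff = Omega_gallery - Omega                           # compare against all stored faces at once
    dists = np.sum(diff ** 2 / eigvals[:, None], axis=0)   # Mahalanobis distance to each face
    l = int(np.argmin(dists))                              # Step 4: closest stored face
    e_r = float(dists[l])
    return (l if e_r < T_r else None), e_r                 # Step 5: accept only if below the threshold
</syntaxhighlight>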


Experimental Results

We have considered two different databases in our experiments: the ORL and the CVL databases [6]. The ORL database contains 400 face images from 40 subjects, with 10 frontal exposures of each subject captured under different facial expressions, lighting, and slight orientation changes. The CVL database contains 798 images from 114 subjects, with 7 exposures of each subject captured under different facial expressions and orientations. We only used the frontal exposures of each subject in our experiments (i.e., 3 exposures with different facial expressions, that is, 342 images). In each experiment, we divided the database under consideration into three subsets: training, gallery, and test. We have performed two types of experiments: (a) the training set contains images of the same subjects as the gallery set, and (b) the training and gallery sets contain different subjects. Note that in the description of the eigenface method, we assumed that the training and gallery sets were the same. However, the second scenario is more realistic when we do not have a representative set of images for training. In this case, we would need to use images of other subjects to create a representative training set and compute the eigenspace. The test set was used to evaluate the performance of the eigenface approach. Specifically, given a test face, the closest-match approach performs recognition by finding the nearest face in the gallery set. The recognition accuracy is computed as the ratio of the faces recognized correctly from the test set over the total number of faces in the test set. For comparison purposes, we have compared the performance of PCA with a competing dimensionality reduction technique called Random Projection (RP) [7]. Since it has been reported that RP performs better when the results are averaged over several RPs, we also report results using majority voting on a relatively small ensemble of five different random projections.
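
For reference, one way to realize the RP baseline with majority voting is sketched below (our own generic construction, using Gaussian random projections and nearest-neighbor matching; the exact settings used in the experiments are not specified in the text):

<syntaxhighlight lang="python">
import numpy as np

def rp_majority_vote(gallery, gallery_labels, test, K, n_proj=5, seed=0):
    """Nearest-neighbor matching with majority voting over n_proj random projections.
    gallery, test: (D, num_images) mean-subtracted image matrices; gallery_labels: integer ids."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_proj):
        R = rng.standard_normal((K, gallery.shape[0])) / np.sqrt(K)  # random K x D projection
        G, T = R @ gallery, R @ test                                 # project both image sets
        d = np.linalg.norm(G[:, :, None] - T[:, None, :], axis=0)    # pairwise Euclidean distances
        votes.append(gallery_labels[np.argmin(d, axis=0)])           # closest gallery match per test image
    votes = np.stack(votes)                                          # (n_proj, num_test)
    return np.array([np.bincount(v).argmax() for v in votes.T])      # majority vote per test image
</syntaxhighlight>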

Figures 4(a)-(c) show the results using the ORL database when the subjects in the training and gallery sets have the same identity, while Figures 4(d)-(f) show the case of subjects having different identities. In the first case, we built the training set using 3 images from each subject (i.e., 120 images), while in the second case, we built the training set using the images of 13 subjects (i.e., 130 images). The images of the remaining 27 subjects (i.e., 270 images) were used to create the gallery and test sets. To test the sensitivity of each method, we also varied the number of images per subject in the gallery set (i.e., Figure 4(a) - 3 images per subject, Figure 4(b) - 2 images per subject, Figure 4(c) - 1 image per subject, Figure 4(d) - 5 images per subject, Figure 4(e) - 4 images per subject, and Figure 4(f) - 3 images per subject). In each graph, the blue line corresponds to PCA while the red line corresponds to RP. The green line corresponds to RP using majority voting.

Our first observation is that PCA in general performs better when the identity of the subjects in the gallery set is the same as the identity of the subjects in the training set. Comparing RP with PCA, our results show that PCA performs better than RP mostly for low dimensions (i.e., 20-30). This result is consistent with previous studies, where it has been reported that RP compares favorably with PCA for a moderate or higher number of dimensions. The difference in performance becomes smaller and smaller as the number of dimensions increases.

The results using different subjects in the gallery set than in the training set are more interesting. In this case, it is obvious that RP and PCA have very similar performance. In fact, RP seems to do slightly better (i.e., 1%-2%) than PCA for higher dimensions. This is mainly because PCA is data dependent while RP is data independent. Overall, both methods seem to be affected by the number of images per subject in the gallery set: their performance degrades as the number of images per subject in the gallery set decreases. The degradation in the performance of the two methods is emphasized by the fact that we increase the size of the test set as we decrease the size of the gallery set (i.e., the images removed from the gallery set are added to the test set).

Figure 4. Experiments using the ORL database, closest match, and majority voting. (a)-(c) Same subjects in the training and gallery sets, (d)-(f) Different subjects in the training and gallery sets. The proportion of subjects in the gallery and test sets varies as shown. The blue line corresponds to PCA using closest match, the red line corresponds to RP using closest match, and the green line corresponds to RP using majority voting.

The results for the CVL database are shown in Figure 5. Figure 5(a) corresponds to using subjects with the same identity in both the training and gallery sets, while Figures 5(b) and (c) correspond to different subjects. In Figure 5(a), we built the training, gallery, and test sets by splitting the CVL database into three equal parts, keeping 1 image per subject for each part. In Figures 5(b) and (c), we built the training set using the images of 34 subjects (i.e., 102 images). Figure 5(b) corresponds to storing 2 images per subject in the gallery set, while Figure 5(c) corresponds to storing 1 image per subject in the gallery set. The results for the CVL database are quite consistent with those for the ORL database. Figure 5(b) presents an interesting case where RP seems to do better than PCA (i.e., 3%-5%) for 50 dimensions and higher.

Figure 5. Experiments using the CVL database, closest match, and majority voting. (a) Same subjects in the training and gallery sets (b)-(c) Different subjects in training and gallery sets. The proportion of subjects in the gallery and test sets varies as shown. The blue line corresponds to PCA using closest match, the red line corresponds to RP using closest match, and the green line corresponds to RP using majority voting.


Project

In this project, you will implement PCA-based face recognition and study the effect of several factors on identification performance. For your experiments, you will use a subset of the AR face database (50 subjects, 10 images per subject) [6]. Each face image is stored as an individual Portable Gray Map (PGM) image file. The file name is defined as follows: the first two digits represent the user index and the next two digits denote the image index for that user. For example, image 0204.pgm is the 4th image of user 2. Each image is a <math>42 \times 42 \left ( = 1764 \right )</math> grayscale image. For different users, the images with the same image index are captured under similar conditions, including lighting and expression.
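
For reference, a small loading sketch (ours; it assumes all PGM files sit in a single directory and uses the Pillow library, neither of which is prescribed by the project description) that parses the file-name convention above:

<syntaxhighlight lang="python">
import glob
import os

import numpy as np
from PIL import Image

def load_ar_subset(directory):
    """Return a dict mapping (user, image_index) -> flattened 42*42 float image vector."""
    images = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.pgm"))):
        name = os.path.splitext(os.path.basename(path))[0]        # e.g., "0204"
        user, idx = int(name[:2]), int(name[2:])                   # user 2, image 4
        images[(user, idx)] = np.asarray(Image.open(path), dtype=np.float64).ravel()
    return images

# Example: the split of experiment (a) keeps images 3-6 for training,
# 7-10 for the gallery, and 1-2 for testing:
# train = [v for (u, i), v in load_ar_subset("AR").items() if 3 <= i <= 6]
</syntaxhighlight>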

Experiments

(a) Implement PCA-based face recognition. For each user, use images No. 3 through 6 to construct the training dataset, images No. 7 through 10 to construct the gallery set, and images No. 1 and 2 for testing. So, there would be 200 images for training, 200 images in the gallery, and 100 images for testing.

(a.I) Compute the PCA space using the training dataset. Show
  • The average face
  • The 10 eigenfaces corresponding to the 10 largest eigenvalues
  • The 10 eigenfaces corresponding to the 10 smallest eigenvalues
(a.II) Choose the top 50 eigenvectors (eigenfaces) as the basis. Project both the gallery and test images onto the basis after subtracting the average face to obtain the PCA representation of each image. Compute the Mahalanobis distance between the coefficient vectors for each pair of gallery and test images as the matching distance; you will obtain a <math>200 \times 100</math> matching distance matrix (200 gallery images, 100 test images). For each test image, there will be 200 matching distances obtained by matching the test image with each image in the gallery dataset. Choose the smallest matching distance; if the associated subject is the same as that of the test image, it is considered a correct match, otherwise an incorrect match. For the test database, count the number of correct matches and divide it by the total number of test images (100) to report the identification accuracy.
(a.III) Select two subjects randomly. Show their gallery images and test images, separately.
(a.IV) Show 5 test images which are correctly matched, along with the corresponding best matched gallery images.
(a.V) Show 5 test images which are incorrectly matched, along with the corresponding mismatched gallery images. (If the number of incorrectly matched test images is less than 5, report all you get.)

(b) Choose different numbers of eigenvectors as the basis, e.g., 5, 10, 20, 30, 40, 50. Conduct the experiment for each number of eigenvectors and compute the identification accuracy using the same procedure as in (a.II). Plot the curve of identification accuracy vs. number of eigenvectors.

(c) For each user, use images No. 1 through 4 to construct the training dataset, images No. 5 to 8 to construct the gallery set, and images No. 9 and 10 for testing.

  • Repeat experiment (a) but only report the identification accuracy as described in (a.II).
  • Select the same two subjects as those in (a.III); show training and query images separately.
  • If there are significant differences in terms of identification accuracy between experiments (c) and (a), explain why. If there is no significant difference, what does it imply?


Wiki Assessment

Pre-test questions

  1. What is the goal of face recognition?
  2. What are some practical applications of face recognition?
  3. Could you devise a simple algorithm for face recognition?
  4. What factors might affect recognition performance?
  5. What might be some problems associated with very high dimensional data?

Post-test questions

  1. What is the purpose of dimensionality reduction?
  2. What criterion does PCA use in finding a low-dimensional space?
  3. Describe the main steps of the PCA approach.
  4. How do we choose the number of principal components?
  5. What practical problems might exist when applying PCA for face recognition? How do we deal with them?
  6. Describe the main steps of face recognition using PCA.


References and Resources

  1. W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.
  2. R. Brunelli and T. Poggio, "Face recognition: Features versus templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1042–1052, 1993.
  3. M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
  4. B. Moghaddam, "Principal manifolds and probabilistic subspaces for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 6, pp. 780–788, 2002.
  5. L. Smith, "A tutorial on Principal Components Analysis," http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf (last accessed 12/29/2009).
  6. Face Recognition Home Page: http://www.face-rec.org/
  7. N. Goel, G. Bebis, and A. Nefian, "Face recognition experiments with random projection," SPIE Defense and Security Symposium (Biometric Technology for Human Identification), Orlando, FL, March 28 – April 1, 2005.