Automatic Facial Expression Recognition: Space-Classifier Combinations.

Author: Humayra Binte Ali

Ali, Humayra Binte, 2017, Automatic Facial Expression Recognition: Space-Classifier Combinations, Flinders University, School of Computer Science, Engineering and Mathematics

Terms of Use: This electronic version is (or will be) made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. You may use this material for uses permitted under the Copyright Act 1968. If you are the owner of any included third party copyright material and/or you believe that any material has been made available without permission of the copyright owner please contact with the details.


Facial expression recognition (FER) is a broad research domain in machine learning. Principal Component Analysis (PCA), Independent Component Analysis (ICA), Non-negative Matrix Factorization (NMF) and Histogram of Oriented Gradients (HOG) are well-established techniques for image analysis. In this thesis, we propose a facial expression recognition system that uses NMF, HOG, PCA and ICA for feature extraction. For classification, we implement Euclidean Distance (ED), Support Vector Machine (SVM), Extreme Learning Machine (ELM) and kernel Extreme Learning Machine (ELM-Kernel) classifiers. Every feature set is passed to each of the classifiers to measure the performance of each feature and classifier combination. Because PCA, ICA and NMF are mainly applied for dimension reduction, and HOG works as a SIFT-like descriptor, we use the term 'space' for the feature extraction processes. Altogether we investigate the performance of sixteen space and classifier combinations to compare FER systems. Our proposed approach has been tested on both the CK and JAFFE datasets.

There is considerable debate over whether whole-image or part-based analysis is better. Our proposed system therefore implements FER using both whole-face and part-based recognition. In our experimental setup, we first detect the three face parts (eyes, nose and mouth) using cascaded object detection, setting the regions by systematic trial and error.

For the extraction of facial features, we apply the commonly used PCA and ICA together with the more plausible NMF and also HOG, a feature similar to the SIFT (Scale-Invariant Feature Transform) descriptor. Because PCA, ICA and NMF work by reducing the total feature space, in this thesis we refer to the features produced by PCA, ICA, NMF and HOG as a 'Space'. The classifiers we implement are Euclidean Distance (ED), Support Vector Machine (SVM), Extreme Learning Machine (ELM) and Extreme Learning Machine Kernel (ELM-Kernel). Because every Space is fed to every classifier, the comparison covers sixteen space+classifier combinations: PCA+ED, PCA+ELM, PCA+ELM-Kernel, PCA+SVM, ICA+ED, ICA+ELM, ICA+ELM-Kernel, ICA+SVM, NMF+ED, NMF+ELM, NMF+ELM-Kernel, NMF+SVM, HOG+ED, HOG+ELM, HOG+ELM-Kernel and HOG+SVM.
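
As a minimal sketch of how the sixteen pairings arise, the following Python snippet forms the Cartesian product of the four spaces and the four classifiers. The names `SPACES`, `CLASSIFIERS` and `space_classifier_combinations` are illustrative assumptions, not identifiers from the thesis, and the snippet only enumerates labels; it does not implement any feature extractor or classifier.

```python
from itertools import product

# Labels taken from the abstract; the variable names are hypothetical.
SPACES = ["PCA", "ICA", "NMF", "HOG"]
CLASSIFIERS = ["ED", "ELM", "ELM-Kernel", "SVM"]


def space_classifier_combinations(spaces=SPACES, classifiers=CLASSIFIERS):
    """Return one 'space+classifier' label per pairing, spaces varying slowest."""
    return [f"{space}+{clf}" for space, clf in product(spaces, classifiers)]


combos = space_classifier_combinations()
print(len(combos))
print(combos)
```

Each label would then index one trained pipeline in an experiment loop, so that every space is evaluated with every classifier under the same data splits.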

Potentially, a subset of the three facial parts (eyes, nose and mouth) is better than the full face in terms of processing time and accuracy for identifying an expression. To test whether the three facial parts express certain emotions better than the full face, or vice versa, we implement a 3x10-fold R-K cross-validation, where 'R' stands for repeated cross-validation. The investigation shows that for some space-classifier combinations the three main facial parts perform better than full-face FER, and for others the reverse holds. We also predict that some subset of the three facial parts can perform better still. To analyze this, we carefully design a 10x10 Nested Cross-Validation (N-CV) approach to tune the space-classifier combinations for each subset of the facial parts and also for the full face. We analyze the results in the Result Analysis chapter.
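
The repeated cross-validation scheme can be sketched as follows, assuming that "3x10-fold" means three independent shuffles of the data, each split into ten folds. The function name `repeated_kfold`, the seed and the sample count are hypothetical, and the thesis may additionally stratify folds by expression, which this sketch does not do.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=3, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV.

    Assumption: plain (non-stratified) folds, reshuffled on every repeat.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh permutation for each repeat
        for fold in range(n_splits):
            # Every n_splits-th shuffled index, starting at `fold`, is held out.
            test = indices[fold::n_splits]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield repeat, fold, train, test


# Hypothetical dataset size, used only to show the shape of the output.
splits = list(repeated_kfold(n_samples=50))
print(len(splits))
```

A nested scheme such as the 10x10 N-CV would, under the same assumptions, call a splitter like this again on each `train` list to tune a space-classifier combination before scoring it on the corresponding `test` list.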

We use a set of three facial regions and ensure that each part is of similar size. For our proposed RK-CV method, we segment the faces into three regions (eyes, nose and mouth) and use all three face parts to classify expressions. For the N-CV approach, we take the features for the whole face, eyes, nose, mouth, nose+mouth, eyes+mouth, eyes+nose, and eyes+nose+mouth. These features are extracted for all seven basic expressions.
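
The eight feature sets used in the N-CV approach are the whole face plus every non-empty subset of the three parts. A small sketch of that enumeration is given here, with `PARTS` and `region_sets` as hypothetical names; the ordering of the subsets is an artifact of the snippet and differs slightly from the order listed above.

```python
from itertools import combinations

# Part names from the abstract; the variable name is an assumption.
PARTS = ("eyes", "nose", "mouth")


def region_sets(parts=PARTS):
    """Return the whole-face label followed by every non-empty subset of parts."""
    sets = ["whole face"]
    for size in range(1, len(parts) + 1):
        for combo in combinations(parts, size):
            sets.append("+".join(combo))
    return sets


feature_sets = region_sets()
print(feature_sets)
```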

The recognition rate is much better when using whole-face decomposition and comparison, but this comes at an increased computational cost. We formulate a table that shows the influence of different facial parts on conveying a specific expression. To validate our results, we tested each expression individually, projecting it onto the feature spaces trained on the whole training dataset, which contains a mixture of all seven expressions.

Keywords: Automatic facial expression recognition, facial expression recognition, Principal Component Analysis (PCA), Independent Component Analysis (ICA), Non-negative Matrix Factorization (NMF), Histogram of Oriented Gradients (HOG), image analysis, facial expression recognition system, feature extraction, Euclidean Distance (ED), Support Vector Machine (SVM), Extreme Learning Machine (ELM), Extreme Learning Machine Kernel (ELM-Kernel), Nested Cross-Validation (N-CV), Cross-Validation, Subspace learning algorithm, machine learning algorithm for facial expression recognition.

Subject: Computer Science thesis, Engineering thesis, Mathematics thesis

Thesis type: Doctor of Philosophy
Completed: 2017
School: School of Computer Science, Engineering and Mathematics
Supervisor: Professor David M W Powers