Read My Poker Face


After spending some time watching YouTube tutorials and reading articles, I have gathered some information about Convolutional Neural Networks and applied it to my situation:

  • Regarding the batch size and learning rate, I tried different values; a batch size of 32 with a learning rate of 0.0001 gave me the best results for the reduced-data model.
  • Adding a batch normalization layer inside each CNN block, before the activation function, keeps the activations from spreading too far apart, which improves both the performance and the speed of the training process.
  • A dropout layer before the fully connected layers. Dropout helps prevent overfitting, which is easy to run into now that I am working with a reduced dataset. Also, Andrew Ng explains in one of his videos that in neural networks working with images a dropout layer is used almost by default; it is common practice. (A rough code sketch of these changes follows this list.)
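
In case it helps, here is roughly what those changes look like in code. This is only a minimal Keras sketch, assuming the FER-style setup of 48x48 grayscale images and 7 emotion classes; the layer sizes are illustrative, not my exact architecture.

```python
# Minimal sketch of the changes above (assumed setup: 48x48 grayscale inputs,
# 7 emotion classes; layer sizes are illustrative only).
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    # Conv block: batch normalization sits between the convolution and the
    # activation, so the values stay in a stable range during training.
    layers.Conv2D(32, (3, 3), padding="same", input_shape=(48, 48, 1)),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D((2, 2)),

    layers.Conv2D(64, (3, 3), padding="same"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D((2, 2)),

    layers.Flatten(),
    # Dropout before the fully connected layers to fight overfitting
    # on the reduced dataset.
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu"),
    layers.Dense(7, activation="softmax"),
])

# Batch size 32 with a learning rate of 0.0001 worked best for the reduced-data model.
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, batch_size=32, validation_data=(X_val, y_val))
```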


Results for the model trained with the reduced dataset:

(ignore the y_pred column; the percentage is computed as right/ones)

To check all these improvements I used the reduced dataset, for timing reasons. The accuracy on the public test set improved by about 4% compared with the very first model I trained. Not bad. Almost all classes are predicted better, with the exception of class 6:Neutral, which got about 10% worse, and class 1:Disgust, which remained unpredicted.
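
To be explicit about what that per-class percentage means, here is a small hypothetical helper that computes it from a confusion matrix: "right" is the number of correct predictions for a class and "ones" is the number of true samples of that class, so right/ones is the per-class recall. The names y_true and y_pred are just illustrative.

```python
# Per-class percentage (right/ones): correct predictions for each class
# divided by the number of true samples of that class, i.e. per-class recall.
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_accuracy(y_true, y_pred, n_classes=7):
    cm = confusion_matrix(y_true, y_pred, labels=list(range(n_classes)))
    right = np.diag(cm)                   # correct predictions per class
    ones = cm.sum(axis=1)                 # true samples per class
    return right / np.maximum(ones, 1)    # avoid division by zero for empty classes
```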

The improvement is not amazing, but it is something. My guess is that two big issues are at play here:

  • The training dataset is small.
  • The model is too simple to learn such a complicated thing as facial emotion.

Before I move to a more complex CNN, I will try these little improvements on the model trained with the entire dataset and see if they deliver a similar improvement.

Results for the model trained with the entire dataset:

(ignore the y_pred column; the percentage is computed as right/ones)


So, good news! The changes I made to the reduced-dataset model improved the model trained with the entire dataset in a similar way (about 4%). Again, almost all classes improved their prediction (yesss! class 1:Disgust is not zero anymore :)) but class 6:Neutral lost about 10%.


Those are my little improvements for now. I am going to get a well-deserved coffee.




