Saturday, March 18, 2017

Classifying Dogs vs Cats on a Regular Laptop with 2GB GPU and 90% Accuracy

The machine learning ecosystem has evolved a lot in recent years.
I am amazed that I could run a very sophisticated experiment, classifying dogs vs cats with 90% accuracy, on my regular laptop.
It has a 2GB NVIDIA GPU and 8GB of RAM.
As recently as 2012, the state-of-the-art accuracy for dogs vs cats classification was 80%.

I based the experiment on an excellent course.
The dogs vs cats competition is organized by Kaggle.

Here's an overview of the approach taken to achieve 90% accuracy.
First, download the publicly available VGG16 model, which researchers trained for the ImageNet image recognition competition. Then remove its last layer and replace it with a yes/no (cat vs dog) output layer. The remaining layers are set as non-trainable (frozen), so training only updates the new layer. Finally, run the training process on this modified model.
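The idea of training only a new output layer on top of frozen pretrained layers can be shown without any deep learning library. The sketch below is a toy, hypothetical illustration, not the course's code: `frozen_features` stands in for VGG16's pretrained layers and is never updated, and a small logistic-regression head (a single sigmoid output, which is equivalent to a two-class softmax) is trained on top of it with plain stochastic gradient descent. The data, feature function, learning rate, and epoch count are all made up for the example.

```python
import math
import random


def frozen_features(x):
    """Hypothetical stand-in for the pretrained VGG16 layers: a fixed
    transformation whose parameters are never updated during training."""
    return [x[0] + x[1], x[0] - x[1], x[0] * x[1]]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, feats):
    """Trainable head: probability that the input belongs to class 1 ('dog')."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feats)) + b)


def train_head(data, epochs=100, lr=0.1):
    """Train only the head (w, b) with stochastic gradient descent on log loss.
    The frozen features are computed once and reused in every epoch."""
    feats = [(frozen_features(x), y) for x, y in data]
    w, b = [0.0] * len(feats[0][0]), 0.0
    for _ in range(epochs):
        for f, y in feats:
            err = predict(w, b, f) - y  # d(log loss)/dz for a sigmoid output
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


# Toy data: label 1 ("dog") when x0 + x1 > 0, else 0 ("cat")
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
data = [(p, 1 if p[0] + p[1] > 0 else 0) for p in points]

w, b = train_head(data)
accuracy = sum(
    (predict(w, b, frozen_features(x)) > 0.5) == (y == 1) for x, y in data
) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Because the frozen part never changes, its outputs can be computed once and reused in every epoch, as `train_head` does; that is one reason this kind of fine-tuning is much cheaper than training every layer.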

The main library used here is Keras with the TensorFlow backend.

The full code is available online. Here is an overview of the most important parts.
Training code:

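import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

# Create a TensorFlow session with an explicit config and make Keras use it
config = tf.ConfigProto()
set_session(tf.Session(config=config))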

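# Import our Vgg16 class and instantiate the pretrained model
import vgg16; reload(vgg16)  # reload picks up edits to vgg16.py in an interactive Python 2 session
from vgg16 import Vgg16
vgg = Vgg16()

batch_size = 16  # assumed value; reduce it if the 2GB GPU runs out of memory
path = "data/dogscats/"
#path = "data/dogscats/sample/"  # smaller sample set for quick experiments
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size)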
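vgg.finetune(batches)                      # swap the last layer for a 2-class output layer
vgg.fit(batches, val_batches, nb_epoch=1)  # train the new layer for one epoch
vgg.model.save_weights('vgg2.h5')          # save the fine-tuned weights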
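`vgg.get_batches` is pointed at `path+'train'` and `path+'valid'`. Assuming it wraps Keras's `ImageDataGenerator.flow_from_directory`, each of those folders must contain one subdirectory per class, and the class labels are inferred from the subdirectory names. Kaggle's training archive is a single flat folder of 25,000 files named like `cat.0.jpg` and `dog.0.jpg`; the 23,000 training samples in the log below suggest that 2,000 were held out for validation. The hypothetical helper `split_dogscats` sketches one way to build that layout with a simple random (not stratified) split; the function name, directory names, and seed are assumptions.

```python
import os
import random
import shutil
import tempfile


def split_dogscats(src_dir, dest_dir, n_valid=2000, seed=0):
    """Copy Kaggle-style files (cat.123.jpg, dog.456.jpg) from a flat src_dir
    into dest_dir/{train,valid}/{cats,dogs}/, holding out n_valid random files."""
    files = sorted(f for f in os.listdir(src_dir) if f.startswith(("cat.", "dog.")))
    random.Random(seed).shuffle(files)  # deterministic shuffle for a repeatable split
    valid = set(files[:n_valid])
    for name in files:
        split = "valid" if name in valid else "train"
        label = "cats" if name.startswith("cat.") else "dogs"
        target = os.path.join(dest_dir, split, label)
        os.makedirs(target, exist_ok=True)
        shutil.copy(os.path.join(src_dir, name), target)


# Demo on empty placeholder files: 5 cats + 5 dogs, 4 held out for validation
src = tempfile.mkdtemp()
for i in range(5):
    for animal in ("cat", "dog"):
        open(os.path.join(src, f"{animal}.{i}.jpg"), "w").close()
dest = tempfile.mkdtemp()
split_dogscats(src, dest, n_valid=4)
```

On the real data the call would look something like `split_dogscats('train', 'data/dogscats', n_valid=2000)`, where `'train'` is wherever the Kaggle archive was unpacked; copying rather than moving leaves the original download untouched.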

The training code uses the vgg.finetune call to replace the last layer of the model. Here's what the relevant part looks like:

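model = self.model
model.pop()  # remove the original 1000-class ImageNet output layer
for layer in model.layers:
    layer.trainable = False  # freeze all pretrained layers
model.add(Dense(num, activation='softmax'))  # new output layer, one unit per class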
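A quick count shows why this fine-tuning step is cheap. Assuming num = 2 (one output per class) and that the layer feeding the output is VGG16's 4096-unit dense layer, the new softmax layer has far fewer trainable parameters (weights plus biases) than the ImageNet layer it replaces:

```latex
\begin{aligned}
\text{new 2-class layer:} \quad & 4096 \times 2 + 2 = 8\,194 \\
\text{removed 1000-class layer:} \quad & 4096 \times 1000 + 1000 = 4\,097\,000
\end{aligned}
```

So only about 8 thousand of the modified network's roughly 134 million parameters are updated. Each step still has to run the full forward pass through the frozen layers, so most of the training time likely goes into computing those features rather than into updating weights.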

Next, the training code trains the model for one epoch and saves the resulting weights to the vgg2.h5 file.

I had to make a few tweaks to the model's TensorFlow device placement so that it would fit in GPU memory: the convolutional blocks stay on the GPU, while the last few layers are placed on the CPU. Here's the code:

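model = self.model = Sequential()
# VGG preprocessing; images are channels-first (3, 224, 224)
model.add(Lambda(vgg_preprocess, input_shape=(3,224,224), output_shape=(3,224,224)))

# Convolutional blocks on the GPU
with tf.device('/gpu:0'):
    self.ConvBlock(2, 64)
    self.ConvBlock(2, 128)
    self.ConvBlock(3, 256)
    self.ConvBlock(3, 512)
    self.ConvBlock(3, 512)

# Fully connected layers, which hold most of the weights, on the CPU
with tf.device('/cpu:0'):
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

# Load the pretrained ImageNet weights (downloaded and cached by get_file)
fname = 'vgg16.h5'
model.load_weights(get_file(fname, self.FILE_PATH+fname, cache_subdir='models'))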
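Why does moving the last layers off the GPU help? In the standard VGG16 architecture, the five convolutional blocks are followed by a Flatten layer and three dense layers (4096, 4096 and 1000 units), and the first dense layer alone connects 7 × 7 × 512 = 25,088 inputs to 4,096 units. The sketch below counts weights and biases layer by layer, assuming 224×224 inputs, 3×3 convolutions, and 32-bit floats. It counts parameters only, not activations, gradients, or optimizer state, so real GPU usage during training is higher.

```python
def conv_params(c_in, c_out, k=3):
    """Weights + biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights + biases of a fully connected layer."""
    return n_in * n_out + n_out


def mib(n_params):
    """Size of n_params float32 values (4 bytes each) in MiB."""
    return n_params * 4 / 2**20


# (input channels, output channels) for each 3x3 conv layer, matching
# ConvBlock(2, 64), ConvBlock(2, 128), ConvBlock(3, 256), ConvBlock(3, 512) x 2
conv_layers = [(3, 64), (64, 64),
               (64, 128), (128, 128),
               (128, 256), (256, 256), (256, 256),
               (256, 512), (512, 512), (512, 512),
               (512, 512), (512, 512), (512, 512)]
# Dense layers: 7*7*512 flattened features -> 4096 -> 4096 -> 1000
dense_layers = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(conv_params(i, o) for i, o in conv_layers)
dense_total = sum(dense_params(i, o) for i, o in dense_layers)
total = conv_total + dense_total

print(f"conv:  {conv_total:,} params ({mib(conv_total):.0f} MiB)")
print(f"dense: {dense_total:,} params ({mib(dense_total):.0f} MiB)")
print(f"dense share of all parameters: {dense_total / total:.1%}")
```

The dense layers account for roughly 89% of VGG16's 138 million parameters (about 470 MiB of float32 weights versus about 56 MiB for the convolutional blocks), which is why they are the natural candidates to place on the CPU of a 2GB GPU machine.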

Here's the output of the training run:

23000/23000 [==============================] - 2103s - loss: 0.5482 - acc: 0.8676 - val_loss: 0.4194 - val_acc: 0.9060

Training took about 35 minutes (2103 s) and reached about 90% accuracy on the validation set (val_acc: 0.9060).

I'm very pleasantly surprised that such powerful machine learning tools are available these days and can run on ordinary computers. Moreover, the approach presented in the course is very interesting: adding new layers on top of existing ones resembles the natural evolution of intelligence.