Saturday, March 18, 2017

Classifying Dogs vs Cats on a Regular Laptop with 2GB GPU and 90% Accuracy


The machine learning ecosystem has evolved a lot in recent years.
I am amazed that I could run a fairly sophisticated experiment, classifying dogs vs cats with 90% accuracy, on my regular laptop.
It has a 2GB NVIDIA GPU and 8GB of RAM.
As recently as 2012, the state-of-the-art result for dogs vs cats classification was around 80%.

I based the experiment on an excellent course provided by fast.ai (http://course.fast.ai/).
The competition is organized by Kaggle:
https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition

Here's an overview of the approach taken to achieve 90% accuracy.
First, retrieve the publicly available VGG16 model, which was built by researchers for the ImageNet image recognition competition. Then remove its last layer and replace it with a binary cat/dog layer. The remaining layers are set as non-trainable (frozen). Finally, run the training process on this modified model.
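
As a minimal sketch of that idea (not the course's actual code, which comes below), here's how it could look with Keras's built-in VGG16 from keras.applications, assuming the Keras 2 API and TensorFlow's channels-last image ordering; here the whole ImageNet top is swapped for a single two-way layer, a slight simplification of the pop-one-layer approach used later:

# A sketch of the transfer-learning idea with Keras's built-in VGG16,
# not the course's vgg16.py wrapper. Assumes Keras 2 and channels-last.
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

base = VGG16(weights='imagenet', include_top=False,
             input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False                       # freeze pretrained layers

x = Flatten()(base.output)
predictions = Dense(2, activation='softmax')(x)   # cats vs dogs

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])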

The main libraries used here are Keras with the TensorFlow backend.

Full code is available on the fast.ai website. Here is an overview of the most important parts.
Training code:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

# Let TensorFlow fall back to the CPU when an op cannot run on the GPU,
# and log where each op is placed (handy for verifying the CPU/GPU split)
config = tf.ConfigProto()
config.allow_soft_placement=True
config.log_device_placement=True
set_session(tf.Session(config=config))

# Import the course's Vgg16 wrapper class, and instantiate
import vgg16; reload(vgg16)
from vgg16 import Vgg16
vgg = Vgg16()

# Small batches so the activations fit in 2GB of GPU memory
batch_size=16
path = "data/dogscats/"
#path = "data/dogscats/sample/"
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size)

# Swap the output layer for a cats/dogs one, train one epoch, save weights
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)
vgg.model.save('vgg2.h5')
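
Under the hood, vgg.get_batches is a thin wrapper around Keras's ImageDataGenerator, which expects one subdirectory per class (train/cats and train/dogs, likewise for valid). Roughly, it boils down to this (a sketch, not the exact course code):

# Roughly what vgg.get_batches does: stream images from a directory
# that has one subfolder per class (cats/, dogs/)
from keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator()
batches = gen.flow_from_directory(path+'train', target_size=(224, 224),
                                  class_mode='categorical',
                                  shuffle=True, batch_size=batch_size)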

The code uses the vgg.finetune call to update the last layer of the model. Here's what it looks like:

model = self.model
model.pop()                                   # drop the 1000-way ImageNet output layer
for layer in model.layers: layer.trainable=False   # freeze all remaining layers
model.add(Dense(num, activation='softmax'))   # new output layer with num classes (here 2)
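
One detail worth noting: after the layers are changed, the model has to be recompiled before training. The course's finetune does this at the end via a compile helper, roughly like so (a sketch; the optimizer settings are my assumption):

# Recompile after modifying the architecture (sketch; the learning
# rate here is an assumption, not necessarily the course's exact value)
from keras.optimizers import Adam
model.compile(optimizer=Adam(lr=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])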

Next, it trains the model using the vgg.fit call and saves the result to the vgg2.h5 file.
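
Once trained, the model can be used for predictions right away. A minimal sketch, pulling one batch from the validation iterator defined above:

# Predict on a single batch of validation images (sketch)
imgs, labels = next(val_batches)    # one batch of images and their labels
preds = vgg.model.predict(imgs)     # per-image [cat, dog] probabilities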

I had to make a few tweaks to the model's device placement in TensorFlow so it could fit in the 2GB of GPU memory. The last few layers were placed on the CPU. Here's the code:

model = self.model = Sequential()
model.add(Lambda(vgg_preprocess, input_shape=(3,224,224), output_shape=(3,224,224)))

# Convolutional blocks: most of the computation, relatively few weights,
# so they run on the GPU
with tf.device('/gpu:0'):
    self.ConvBlock(2, 64)
    self.ConvBlock(2, 128)
    self.ConvBlock(3, 256)
    self.ConvBlock(3, 512)
    self.ConvBlock(3, 512)

# Fully connected blocks: the bulk of the weights, kept on the CPU
# to avoid exhausting GPU memory
with tf.device('/cpu:0'):
    model.add(Flatten())
    self.FCBlock()
    self.FCBlock()
    model.add(Dense(1000, activation='softmax'))

fname = 'vgg16.h5'
model.load_weights(get_file(fname, self.FILE_PATH+fname, cache_subdir='models'))
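
A quick back-of-the-envelope count shows why the fully connected layers are the ones worth moving to the CPU; they hold the vast majority of VGG16's weights:

# Rough parameter counts for VGG16's three fully connected layers
fc1 = 512*7*7 * 4096 + 4096     # flattened conv output (25088) -> 4096: ~102.8M
fc2 = 4096 * 4096 + 4096        # 4096 -> 4096:                           ~16.8M
fc3 = 4096 * 1000 + 1000        # 4096 -> 1000 ImageNet classes:           ~4.1M
print((fc1 + fc2 + fc3) / 1e6)  # ~123.6M of VGG16's ~138M parameters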


Here's the output of the training process:

23000/23000 [==============================] - 2103s - loss: 0.5482 - acc: 0.8676 - val_loss: 0.4194 - val_acc: 0.9060

The training process completed in 35 minutes with 90% accuracy on the validation set.

I'm pleasantly surprised that such powerful machine learning tools are available these days and can run on regular computers. Moreover, the approach presented by fast.ai is very interesting: building on an existing network by adding new layers resembles a natural evolution of intelligence.