"What we observe is not nature itself, but nature exposed to our method of questioning." -- Werner Heisenberg
We have built enough classifier to discriminate 2-dimensional points. Let's try to discriminate higher dimensional points. More specifically, let's try to discriminate 28x28 gray-scale images:
 
These are 28x28 gray-scale images each represents a handwritten digit. It is called The MMNIST Dataset. Here is the Wikipedia entry. The original dataset contains 60,000 train examples, 10,000 test examples. It is fair to say this is one of the most famous Machine Learning dataset of all time.
We can build a simple Deep Neural Network to discriminate these images, however, we are going to learn a new technique, called Convolutional Neural Networks (CNN) in this tutorial. CNNs are being used very commonly in the literature for a variety of problems, and they are initially being used for images with a big success.
Let's do some imports first, to get ready.
import collections
import numpy as np
import matplotlib.pyplot as plt
import PIL
import tensorflow as tf
Historically, in image processing, people have used various filters for various tasks. For example, some of the known filters are:
Here is the 3x3 Sobel operator:
$$ S_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \\ \end{bmatrix}, \quad S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \\ \end{bmatrix} $$and, for the sake of completeness, we can combine both using:
$$ \sqrt{ \text{Conv}(S_x, A) ^2 + \text{Conv}(S_y, A)^2 } $$Let's play with the Sobel operator on the following image:
 
im = PIL.Image.open('static_images/valve_original.png')
pix = im.load()
num_array = np.zeros((im.size[0], im.size[1], 3))
for i in range(im.size[0]):
  for j in range(im.size[1]):
    average = sum(pix[i,j])/3
    num_array[i,j] = (average, average, average)
kernel1 = np.array([[-1, -2, -1],
	    [0,  0, 0],
	    [1, 2, 1]])
kernel2 = np.array([[-1, 0, 1],
	    [-2, 0, 2],
	    [-1, 0, 1]])
for i in range(1, im.size[0]-1):
  for j in range(1, im.size[1]-1):
    value1 = sum(sum(num_array[i-1:i+2, j-1:j+2, 0] * kernel1))
    value2 = sum(sum(num_array[i-1:i+2, j-1:j+2, 0] * kernel2))
    threshold = 60
    value = max(int((value1 ** 2 + value2 ** 2) ** 0.5), threshold)
    if value == threshold:
      value = 0
    pix[i,j] = (value, value, value)
im.save('image.png')
 
  Let's discuss the convolution operation in 2-dimensions. It is basically moving a kernel on an the image and recording values on the corresponding pixels.
Convolutional animations are taken from:
Blue maps are inputs, and cyan maps are outputs.
|  |  |  |  | 
| No padding, no strides | Arbitrary padding, no strides | Half padding, no strides | Full padding, no strides | 
|  |  |  | |
| No padding, strides | Padding, strides | Padding, strides (odd) | 
Some terminology:
Let's go over more examples for (1) Convolution operation and (2) Max pooling over this technical document.
Some simple ideas why CNNs work:
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
print('x_train.shape: %s' % str(x_train.shape))
print('y_train.shape: %s' % str(y_train.shape))
print('x_test.shape: %s' % str(x_test.shape))
print('y_test.shape: %s' % str(y_test.shape))
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
      loss='sparse_categorical_crossentropy',
      metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, verbose=0)
print(model.metrics_names)
print(model.evaluate(x_train, y_train, verbose=2))
print(model.evaluate(x_test, y_test, verbose=2))
x_train.shape: (60000, 28, 28) y_train.shape: (60000,) x_test.shape: (10000, 28, 28) y_test.shape: (10000,) ['loss', 'acc'] [0.09440224265828728, 0.9731666666666666] [0.10339117852300406, 0.9693]
Original Le-net 5 architecture is here (taken from the paper):
 
We are not going to build the exact same network with Le-net5, however it is inspired by Le-net5 heavily.
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
img_rows, img_cols = 28, 28
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
model = tf.keras.models.Sequential([
     # 28x28x1
     tf.keras.layers.Convolution2D(filters=6, kernel_size=(5,5), activation = 'relu', input_shape=input_shape),
     # 24x24x6
     tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
     # 12x12x6
     tf.keras.layers.Convolution2D(filters=16, kernel_size=(5,5), activation = 'relu'),
     # 8x8x16
     tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
     # 4x4x16
     tf.keras.layers.Flatten(),
     tf.keras.layers.Dense(128, activation = 'relu'),
     tf.keras.layers.Dropout(0.2),
     tf.keras.layers.Dense(84, activation = 'relu'),
     tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, verbose=0)
print(model.metrics_names)
print(model.evaluate(x_train, y_train, verbose=2))
print(model.evaluate(x_test, y_test, verbose=2))
['loss', 'acc'] [0.06508412354442601, 0.9805833333333334] [0.0623367296718061, 0.98]
Here I want to link couple interesting papers. Not all of these papers are necessarily exclusively about CNNs.
The first couple papers I want to link is about backpropagation. Apparently, the history of backpropagation goes really back, as early as 1840s. However, I find these couple papers interesting in the context of Neural Networks:
Couple papers on CNNs:
Fairly new review paper on Deep Learning:
Kaggle competition on MNIST digits:
Some last notes:
Here are some famouse network architectures, would be useful to take a look: