TensorFlow

TensorFlow is an open source machine learning library originally started by the Google Brain Team. A tensor is a coordinate-independent system of related vectors; practically it tends to be an array or an array of arrays. The rank (aka order) of a tensor is the number of indices needed to specify a single component. The flow part of the name refers to the control structure of the system, which is not a linear sequence of statements; instead you construct a graph of interconnected nodes and TensorFlow evaluates them in whatever order the graph dependencies dictate (a bit like Make).

Annoyances

There seem to be some overly aggressive warnings compiled into the version I have. They complain that SSE3 exists on the CPU but that the compiled package fails to take advantage of it. Short of recompiling, to turn off that noise, put this in your shell environment.

export TF_CPP_MIN_LOG_LEVEL=2
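
If you would rather not touch the shell environment, setting the same variable from Python also works; here is a minimal sketch (the variable must be set before TensorFlow is imported).

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # silence info/warning chatter; must precede the import
import tensorflow as tf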

General Strategy

First a graph must be constructed. Once it is set up, a session is started, and its run() function evaluates whatever subgraph is needed to produce the nodes you ask for.

Here’s a very simple but illustrative example from the official Getting Started guide.

Example Showing Linear Regression
import tensorflow as tf
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print(sess.run(linear_model, {x:[1,2,3,4]}))

y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x:[1,2,3,4], y:[0,-1,-2,-3]}))

Types

  • tf.uint8

  • tf.float32

Data

  • tf.constant(3.0,tf.float32) - immutable while running.

  • tf.placeholder(tf.float32) - like a function argument; and feed_dict is like a parameter list that fills it.

  • tf.Variable([-.3], dtype=tf.float32) - a trainable parameter. Takes an initial value and a dtype. Requires init= tf.global_variables_initializer(); sess.run(init) before use (see the sketch after this list).

  • tf.assign - updates the value of an existing tf.Variable.

  • tf.global_variables_initializer() - required to get declared variables loaded into the system.

  • tf.truncated_normal() - generates random numbers from a truncated normal distribution (values more than two standard deviations from the mean are dropped and re-drawn).

  • tf.random_normal() - generates random numbers from an ordinary (untruncated) normal distribution.

  • tf.zeros() - creates a tensor of the given shape filled with zeros.
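
Here is a minimal sketch tying these data primitives together (the particular values are arbitrary).

import tensorflow as tf

c = tf.constant(3.0, tf.float32)          # immutable value
p = tf.placeholder(tf.float32)            # filled in at run time via feed_dict
v = tf.Variable([-.3], dtype=tf.float32)  # trainable state
bump = tf.assign(v, [1.0])                # op that overwrites the variable

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)                    # variables must be initialized first
    print(sess.run(c + p, {p: 2.0}))  # 5.0
    print(sess.run(v))                # [-0.3]
    sess.run(bump)
    print(sess.run(v))                # [ 1.]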

Operations

  • tf.add()

  • tf.subtract()

  • tf.multiply()

  • tf.div() or tf.divide() - tf.divide() follows Python 3 (true division) semantics; tf.div() keeps the legacy behavior where integer inputs give integer division.

  • tf.matmul(a,b) - matrix multiplication (not an element-wise product; tf.multiply() does that).

  • tf.log()

  • tf.reduce_sum() aka Σ

  • tf.equal() - element-wise equality test; returns a boolean tensor (like ==).

  • tf.nn.softmax() - Calculates the softmax function which is used to squash a K-dimensional vector Z of arbitrary real values to a K-dimensional vector σ(Z) of real values in the range (0, 1) that add up to 1.

    σ(Z)[n] = exp(Z[n]) / sum( [exp(k) for k in Z] )
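
A quick check of that definition (the logits here are arbitrary):

import tensorflow as tf

logits = tf.constant([2.0, 1.0, 0.1])
probs = tf.nn.softmax(logits)
with tf.Session() as sess:
    p = sess.run(probs)
    print(p)        # roughly [0.659 0.242 0.099]
    print(p.sum())  # 1.0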

Optimization

  • Gradient descent is an "optimizer". There are others.

  • Gradient descent modifies each variable according to the magnitude of the derivative of loss (function value) with respect to that variable. Maybe like some kind of D part of PID controllers? Ok, probably not.

  • Optimizers usually do this stuff automatically:

    optimizer= tf.train.GradientDescentOptimizer(0.01)
    train= optimizer.minimize(loss)
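
Continuing the linear regression example from earlier (reusing W, b, x, y, loss, and the open session), the training loop is just repeated runs of the train node; a sketch:

    for i in range(1000):
        sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})
    print(sess.run([W, b]))  # should approach W ≈ -1, b ≈ 1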

Machine Learning

  • tf.nn.relu() - rectified linear units f(x)= max(0,x). Increases non-linearity with a very computationally inexpensive trick. Think of rectified as in a (half-wave) rectifier in electronics. This is a quick way to add complexity.

  • tf.nn.sigmoid() - y = 1 / (1 + exp(-x)) - approximates a step function.

  • tf.contrib.learn - high-level TensorFlow library that simplifies the mechanics of machine learning.

    • running training loops

    • running evaluation loops

    • managing data sets

    • managing feeding

    • defines many common models

  • tf.nn.conv2d() - computes a 2-D convolution of a filter bank over a batch of images.

  • tf.nn.bias_add() - adds a bias vector along the last dimension (e.g. one bias per output channel).

  • tf.nn.max_pool() - reduces input set (rough image scaling really). Takes a subgrid of values and returns just the maximum value to a new grid that has only the size suggested by the number of sampling subgrids. This introduces non-linearity, reduces computation, and maybe adds some feature location invariance. Average pooling makes more sense, but max pooling has just been found to work better. Go figure.

  • tf.reshape() - reshapes a tensor without changing its contents (e.g. flattening before a fully connected layer).

  • tf.nn.dropout(hidden_layer,probability_of_keeping) - a regularization technique that randomly removes pathways to create a consensus of an ensemble of weakened networks. Ensures that one dominant path doesn’t get over developed.

    • training: a keep probability of 0.5 is a good starting point.

    • testing/evaluation: use a keep probability of 1.0 so the full network contributes (see the sketch after this list).

  • Multinomial Logistic Classification pipeline:

    • X (the input) leads to →

    • the linear model WX + b (weights and bias), which leads to →

    • the logits Y, which lead to →

    • the softmax S(Y), i.e. probabilities, which leads to →

    • the cross entropy D(S,L), where L is a one hot encoded label like [0,0,1,0,0,0]:  D(S,L) = -Σ L[i]·log(S[i])

    • The cross entropy is the comparison between the one hot encoding and the predicted probabilities; averaged over the training set it becomes the loss to minimize.

  • A loss function compares the Y the model generated (say from W*X+b) with the Y you wished it had generated (i.e. the right answer from the training set). It measures how far apart the current model is from the provided data.

    • Softmax - choose from k mutually exclusive options, actionable probabilities. Reduces influence of extreme values but doesn’t lose them.

    • Sigmoid cross-entropy - for labels that are probabilities in [0,1] (targets not necessarily mutually exclusive).

    • Euclidean - for real valued labels (can be any value).
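
A minimal sketch of the dropout train/test pattern mentioned in the list above, using a placeholder for the keep probability (layer sizes are arbitrary):

import tensorflow as tf

keep_prob = tf.placeholder(tf.float32)
x = tf.placeholder(tf.float32, [None, 100])
w = tf.Variable(tf.truncated_normal([100, 50]))
b = tf.Variable(tf.zeros(50))
hidden_layer = tf.nn.relu(tf.add(tf.matmul(x, w), b))
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
# ...build the rest of the network and a train op on top of hidden_layer...
# training step: feed keep_prob 0.5 so roughly half the activations are dropped
#   sess.run(train_op, feed_dict={x: batch_x, keep_prob: 0.5, ...})
# evaluation: feed keep_prob 1.0 so the full network is used
#   sess.run(accuracy, feed_dict={x: test_x, keep_prob: 1.0, ...})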

Helper

  • tf.train.Saver().save(sess, './model.ckpt') - provides a way to save any tf.Variable state to prevent tedious rerunning; restore it later with the matching restore(sess, './model.ckpt'). model.ckpt.meta contains the TensorFlow graph.

  • tf.reset_default_graph() - clears any stale tensors and operations.

  • tf.image.decode_{gif,png,jpeg,image} and tf.image.encode_{png,jpeg} - handle image formats natively.

  • tf.image.resize_images - scales images to new dimensions.

    image = tf.image.decode_jpeg(...)
    resized_image = tf.image.resize_images(image, [299, 299])
  • tf.image - does other helpful image things like these (a couple are sketched after the list). See the docs for details.

    • cropping

    • flipping

    • rotating

    • transposing

    • colorspace conversions

    • bounding boxes

    • denoising
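
A couple of those, continuing from the decoded image above (the crop fraction is an arbitrary choice):

    flipped = tf.image.flip_left_right(image)
    cropped = tf.image.central_crop(image, central_fraction=0.8)
    grayscale = tf.image.rgb_to_grayscale(image)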

Examples

The Simplest TensorFlow Program
import tensorflow as tf
with tf.Session() as sess:
    print( sess.run( tf.constant('Hello World!') ) )
Usage of tf.nn.conv2d()
import tensorflow as tf

# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])

# Weight and bias
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)
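
The pooling and reshape helpers listed earlier would typically follow at this point; a sketch of what that might look like on top of conv_layer (the 2x2 pooling is an arbitrary choice):

# Reduce each 2x2 patch to its maximum value
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')
# Flatten to [batch, features] for a fully connected layer:
# 10x10 image -> 5x5 after the stride-2 convolution -> 3x3 after pooling (SAME rounds up)
flat = tf.reshape(conv_layer, [-1, 3 * 3 * k_output])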
A More Complete Example
#!/usr/bin/python3
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

def batches(batch_size, features, labels):
    assert len(features) == len(labels)
    output_batches= list()
    sample_size= len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i= start_i + batch_size
        batch= [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)
    return output_batches

def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """Print cost and validation accuracy of an epoch"""
    current_cost= sess.run( cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy= sess.run( accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i, current_cost, valid_accuracy))

n_input= 784  # MNIST data input (img shape: 28*28)
n_classes= 10  # MNIST total classes (0-9 digits)
mnist = input_data.read_data_sets('../../datasets/mnist', one_hot=True) # Import MNIST data

# The features are already scaled and the data is shuffled
train_features=  mnist.train.images
valid_features=  mnist.validation.images
test_features=  mnist.test.images

train_labels=  mnist.train.labels.astype(np.float32)
valid_labels=  mnist.validation.labels.astype(np.float32)
test_labels=  mnist.test.labels.astype(np.float32)

# Features and Labels
features= tf.placeholder(tf.float32, [None, n_input])
labels= tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights= tf.Variable(tf.random_normal([n_input, n_classes]))
bias= tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits= tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate= tf.placeholder(tf.float32)
cost= tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer= tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction= tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy= tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init= tf.global_variables_initializer()
batch_size= 128
epochs= 20
learn_rate= 0.01

train_batches= batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)
    for epoch_i in range(epochs): # Training cycle
        for batch_features, batch_labels in train_batches: # Loop over all batches
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)
        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)
    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})
print('Test Accuracy: {}'.format(test_accuracy))

Keras

Keras is a library that helps you bodge together a fancy pants neural network with less fuss than using TensorFlow directly.

Here is the simplest example of using Keras I could find. It predicts diabetes likelihood from a set of measured patient attributes.

#!/home/xedu/miniconda3/envs/carnd-term1/bin/python
# Obtain data here:
# http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data
from keras.models import Sequential
from keras.layers import Dense
import numpy
# == Load Dataset
dataset= numpy.loadtxt("pima-indians-diabetes.csv",delimiter=",")
# == Establish X and Y
X,Y= dataset[:,0:8],dataset[:,8]
# == Create Model
model= Sequential()
model.add(Dense(12,input_dim=8, activation='relu'))
model.add(Dense(8,activation='relu'))
model.add(Dense(1,activation='sigmoid'))
# == Compile Model
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
# == Fit Model
model.fit(X,Y,nb_epoch=130,batch_size=10) #,validation_split=0.2)
# == Evaluate Model
scores= model.evaluate(X,Y)
print("\nModel Success: %s: %.2f%%" % (model.metrics_names[1],scores[1]*100))

This is an example program from CarND which shows many of the important features of using Keras. It is inspired by something like this.

import pickle
import numpy as np
import tensorflow as tf
tf.python.control_flow_ops = tf  # workaround for an old Keras/TensorFlow version incompatibility
# == Imports for Keras
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Convolution2D
from keras.layers.pooling import MaxPooling2D
from sklearn.preprocessing import LabelBinarizer

# == Data Wrangling
with open('small_train_traffic.p', mode='rb') as f:
    data = pickle.load(f)
X_train, y_train = data['features'], data['labels']
with open('small_test_traffic.p', 'rb') as f:
    data_test = pickle.load(f)
X_test,y_test = data_test['features'], data_test['labels']

# == Build the Final Test Neural Network in Keras
model = Sequential()
model.add(Convolution2D(32, 3, 3, input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(.5))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(5))
model.add(Activation('softmax'))

# == Preprocess Data
X_normalized = np.array(X_train / 255.0 - 0.5 )

label_binarizer = LabelBinarizer()
y_one_hot = label_binarizer.fit_transform(y_train)

model.compile('adam', 'categorical_crossentropy', ['accuracy'])
history = model.fit(X_normalized, y_one_hot, nb_epoch=10, validation_split=0.2)

X_normalized_test = np.array(X_test / 255.0 - 0.5 )
y_one_hot_test = label_binarizer.transform(y_test)  # reuse the binarizer fit on the training labels

print("Testing")
metrics = model.evaluate(X_normalized_test,y_one_hot_test)
for metric_i in range(len(model.metrics_names)):
    metric_name = model.metrics_names[metric_i]
    metric_value = metrics[metric_i]
    print('{}: {}'.format(metric_name, metric_value))

Keras Callbacks

Callbacks allow you to set up things that will happen during the course of training (and perhaps at other times too). A good example is checkpoints: if you want the state of your model saved whenever an epoch finishes, or whenever an improved model is found, callbacks can help. This is useful if there is some kind of task that you just need to get through. It doesn’t matter if the validation of epoch 4 says it’s worse than epoch 5; if epoch 4’s weights can solve the challenge problem and epoch 5’s can’t, having a copy of epoch 4’s model means you’re done.

from keras.callbacks import ModelCheckpoint
savefiles= "weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoint= ModelCheckpoint(savefiles,monitor='val_acc',
                            verbose=1,save_best_only=False,mode='max')
callbacks_list= [checkpoint]
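
The callbacks list then gets handed to fit. A sketch assuming a model and data like the diabetes example above (the {val_acc} in the file name needs validation data, hence the validation_split):

model.fit(X, Y, nb_epoch=130, batch_size=10,
          validation_split=0.2, callbacks=callbacks_list)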

Keras Optimizers

from keras import optimizers
  • keras.optimizers.SGD

  • keras.optimizers.RMSprop

  • keras.optimizers.Adagrad

  • keras.optimizers.Adadelta

  • keras.optimizers.Adam

  • keras.optimizers.Adamax

  • keras.optimizers.Nadam

  • keras.optimizers.TFOptimizer - Wrap the tensorflow native optimizers.
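
To use one with anything other than its default settings, instantiate it and hand the object to compile; a minimal sketch (the learning rate and momentum values are arbitrary):

from keras import optimizers
sgd= optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])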

Keras Visualization