TensorFlow
TensorFlow is an open source library, originally started by the Google Brain Team, designed to assist with machine learning. A tensor is a coordinate-independent system of related vectors; in practice it tends to be an array or an array of arrays. The rank (aka order) of a tensor is the number of indices needed to specify a component of the array. The flow part of the name refers to the system’s control structure, which is not necessarily linear: it constructs a graph of interconnected nodes and does things in its own sensible order according to that graph (a bit like Make).
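A minimal sketch of that deferred, graph-based style (TensorFlow 1.x API): defining an operation only adds a node to the graph, and nothing is computed until a session runs it.
import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b                  # just adds a multiply node to the graph; nothing computed yet

with tf.Session() as sess:
    print(sess.run(c))     # the graph actually executes here -> 6.0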
-
A good introduction to TensorFlow from the project.
-
TensorBoard? Some kind of tool to make plots of the tensor graph?
-
Beginner example. Or try a fancier example.
-
TensorBoard is a suite of visualization tools. It can make plots of the tensor graph if it’s simple enough and plot tons of other useful things.
-
TFRecords - Tensorflow’s preferred data wrangling format.
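Here is a minimal sketch of writing and then reading a TFRecord with the TF1-era API; the feature names and values are made up for illustration.
import tensorflow as tf

# Write one record containing an integer label and a raw byte string.
with tf.python_io.TFRecordWriter('example.tfrecords') as writer:
    example = tf.train.Example(features=tf.train.Features(feature={
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
        'raw':   tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'some bytes'])),
    }))
    writer.write(example.SerializeToString())

# Read it back and parse the features.
dataset = tf.data.TFRecordDataset('example.tfrecords')
parsed = dataset.map(lambda rec: tf.parse_single_example(rec, {
    'label': tf.FixedLenFeature([], tf.int64),
    'raw':   tf.FixedLenFeature([], tf.string),
}))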
Installation Masochism
On a stock Debian 10 system I was able to install a tractable TensorFlow like this.
sudo apt install python3-pip
pip3 install tensorflow-gpu
Though not part of TensorFlow per se, these were also required for my project.
sudo apt install python3-numpy python3-scipy python3-pil python3-sklearn
You might have different requirements.
GPU And CUDA
Unfortunately, Debian does not natively handle Nvidia’s CUDA swamp very well. You can use one of Tristan’s methods to overcome this or just to familiarize yourself with the problem. But there is an easier way which is to use Miniconda and have it set up everything one needs.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh # Agree to the init stuff.
cp .bashrc .bashrc-conda # Preserve Conda's preferred init.
vi ~/.bashrc # Fix your real one by deleting Conda stuff at the bottom.
. ~/.bashrc-conda # Go ahead and source the Conda one when you want Conda action.
conda list # Is it working? Cool.
conda update -n base -c defaults conda # Update the conda system? Probably can't hurt.
# Create profile.
conda create -n tfgpu # Here "tfgpu" is your profile name. Up to you.
conda activate tfgpu
# Install GPUized Tensorflow itself.
conda install tensorflow-gpu
# Install other common dependencies for normal Tensorflow projects.
conda install opencv # provides the cv2 Python module
conda install matplotlib
conda install pillow
# Test tf. Should get something like "Default GPU Device: /device:GPU:0"
python3 -c "import tensorflow; print('Default GPU Device: {}'.format(tensorflow.test.gpu_device_name()))" 2>/dev/null
# Try it out for real.
time python3 train.py --model ResNet99 --dataset ./W200
Annoyances
There seem to be some overly aggressive warnings compiled into the version I have. They complain that SSE3 exists on the CPU but that the compiled package fails to take advantage of it. OK, short of recompiling, to turn off that noise, put this in your shell environment.
export TF_CPP_MIN_LOG_LEVEL=2
Or from within a Python program.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
Crossing The Starting Line
First make sure you can even think about using TensorFlow. If this works without error, you have reached the starting line.
python -c 'import tensorflow as tf; print(tf.__version__)'
GPU?
if not tf.test.gpu_device_name():
    print('GPU not detected! This could take a while.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
Here is another set of similar checks that might be handy.
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU'))) print(tf.__version__) tf.test.is_gpu_available( cuda_only=True, min_cuda_compute_capability=None )
General Strategy
First a graph must be constructed. Once it is set up, a session is started which operates on the subgraph specified in the run() function.
When you build the graph you must specify each Variable. These tell TensorFlow what will be a parameter to be "learned". In other words, a "Variable" is a weight that will be updated as the training proceeds.
Here’s a very simple but illustrative example from the official Getting Started Guide.
import tensorflow as tf

W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b

sess = tf.Session()                 # create a session to run the graph
init = tf.global_variables_initializer()
sess.run(init)
print(sess.run(linear_model, {x:[1,2,3,4]}))

y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x:[1,2,3,4], y:[0,-1,-2,-3]}))
Types
-
tf.uint8
-
tf.float32
Data
-
tf.constant(3.0,tf.float32) - immutable while running.
-
tf.placeholder(tf.float32) - like a function argument; and feed_dict is like a parameter list that fills it.
-
tf.Variable([-.3],tf.float32) - a trainable parameter. Takes an initial value and a type. Requires
init= tf.global_variables_initializer(); sess.run(init)
-
tf.assign() - updates the value of a tf.Variable.
-
tf.global_variables_initializer() - required to get declared variables loaded into the system.
-
tf.truncated_normal() - generates random numbers from a truncated normal distribution (values more than two standard deviations from the mean are re-drawn).
-
tf.random_normal() - generates random numbers from an ordinary (untruncated) normal distribution.
-
tf.zeros() - creates a tensor of the given shape filled with zeros.
Operations
-
tf.add()
-
tf.subtract()
-
tf.multiply()
-
tf.div() or tf.divide() - tf.divide() is the recommended Python-3-style (true) division; tf.div() is legacy Python-2-style division, which truncates for integer inputs.
-
tf.matmul(a,b) - matrix multiplication (the matrix generalization of a dot product).
-
tf.log()
-
tf.reduce_sum() aka Σ
-
tf.equal() - element-wise equality test; returns a boolean tensor (like ==).
-
tf.nn.softmax() - Calculates the softmax function which is used to squash a K-dimensional vector Z of arbitrary real values to a K-dimensional vector σ(Z) of real values in the range (0, 1) that add up to 1.
σ(Z)[n] = exp(Z[n]) / sum([exp(k) for k in Z])
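As a quick sanity check, here is that formula computed by hand and compared with tf.nn.softmax (a small sketch with made-up numbers):
import numpy as np
import tensorflow as tf

Z = [1.0, 2.0, 3.0]
manual = np.exp(Z) / np.sum(np.exp(Z))      # the formula above

with tf.Session() as sess:
    print(sess.run(tf.nn.softmax(Z)))       # ~[0.090, 0.245, 0.665]
print(manual)                               # same values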
Optimization
-
Gradient descent is an "optimizer". There are others.
-
Gradient descent modifies each variable according to the magnitude of the derivative of loss (function value) with respect to that variable.
-
Optimizers usually do this stuff automatically:
optimizer= tf.train.GradientDescentOptimizer(0.01)
train= optimizer.minimize(loss)
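Continuing the little linear model from the Getting Started example above (W, b, x, y, loss, and sess come from that snippet, and optimizer/train from the lines just above), the training loop simply runs the train node over and over. A minimal sketch:
# Assumes W, b, x, y, loss, sess from the earlier example and train from above.
sess.run(tf.global_variables_initializer())
for _ in range(1000):
    sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})

print(sess.run([W, b]))    # should approach W ~ -1, b ~ 1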
Machine Learning
-
tf.nn.relu() - rectified linear units, f(x)= max(0,x). Increases non-linearity with a very computationally inexpensive trick. Think of "rectified" as in a (half-wave) rectifier in electronics. This is a quick way to add complexity.
-
tf.nn.sigmoid() - y = 1 / (1 + exp(-x)) - smoothly approximates a step function.
-
tf.contrib.learn - high-level TensorFlow library that simplifies the mechanics of machine learning.
-
running training loops
-
running evaluation loops
-
managing data sets
-
managing feeding
-
defines many common models
-
-
tf.nn.conv2d()
-
tf.nn.bias_add()
-
tf.nn.max_pool() - reduces input set (rough image scaling really). Takes a subgrid of values and returns just the maximum value to a new grid that has only the size suggested by the number of sampling subgrids. This introduces non-linearity, reduces computation, and maybe adds some feature location invariance. Average pooling makes more sense, but max pooling has just been found to work better. Go figure.
-
tf.reshape() - reshapes a tensor without changing its data (note it lives at the top level, not under tf.nn).
-
tf.nn.dropout(hidden_layer,probability_of_keeping) - a regularization technique that randomly removes pathways to create a consensus of an ensemble of weakened networks. Ensures that one dominant path doesn’t get over developed.
-
during training, use a keep probability of 0.5 to start with.
-
during testing, use a keep probability of 1 to maximize potency.
-
-
Multinomial Logistic Classification System
-
X input leads to →
-
Linear model (WX + b) weights and bias leads to →
-
Logit (Y) leads to →
-
Softmax S(Y), probabilities leads to →
-
Cross Entropy D(S,L) where L is [0,0,1,0,0,0]=one hot encoding leads to →
-
Comparison between OHE and Cross Entropy
-
-
The loss function compares the Y the model generated (say from W*X + b) with the Y you wished it had generated (i.e. the right answer). It measures how far apart the current model is from the provided training data. A minimal sketch of the whole pipeline appears at the end of this list.
-
Softmax - choose from k mutually exclusive options, actionable probabilities. Reduces influence of extreme values but doesn’t lose them.
-
Sigmoid cross-entropy - for labels that are independent probabilities in [0,1] (not mutually exclusive classes).
-
Euclidean - for real valued labels (can be any value).
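Here is a minimal sketch of the multinomial logistic classification pipeline described above (X → linear model WX + b → softmax → cross entropy against one-hot labels). The shapes and names are made up for illustration.
import tensorflow as tf

n_features, n_classes = 4, 3

X = tf.placeholder(tf.float32, [None, n_features])      # input
L = tf.placeholder(tf.float32, [None, n_classes])       # one-hot labels, e.g. [0,0,1]

W = tf.Variable(tf.truncated_normal([n_features, n_classes]))
b = tf.Variable(tf.zeros(n_classes))

Y = tf.matmul(X, W) + b          # logits (linear model)
S = tf.nn.softmax(Y)             # probabilities

# Cross entropy D(S,L) = -sum(L * log(S)), averaged over the batch.
# (tf.nn.softmax_cross_entropy_with_logits(logits=Y, labels=L) does the
# same thing more numerically stably.)
D = tf.reduce_mean(-tf.reduce_sum(L * tf.log(S), axis=1))

train = tf.train.GradientDescentOptimizer(0.5).minimize(D)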
Helper
-
tf.train.Saver().save(sess,'./model.ckpt') - provides a way to save any tf.Variable state to prevent tedious rerunning. The companion file model.ckpt.meta contains the TensorFlow graph. See the save/restore sketch at the end of this list.
-
tf.reset_default_graph() - clears any stale tensors and operations.
-
tf.image.{en,de}code_{gif,png,jpeg,image} - handles images natively.
-
tf.image.resize_images - scales images to new dimensions.
image = tf.image.decode_jpeg(...)
resized_image = tf.image.resize_images(image, [299, 299])
-
tf.image - Does other helpful image stuff like this. Details.
-
cropping
-
flipping
-
rotating
-
transposing
-
colorspace conversions
-
bounding boxes
-
denoising
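Here is a minimal sketch of the save/restore cycle with tf.train.Saver mentioned in this list, assuming a graph with at least one tf.Variable has been built.
import tensorflow as tf

v = tf.Variable([1.0, 2.0], name='v')      # any tf.Variable in the graph
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training would go here ...
    saver.save(sess, './model.ckpt')       # writes model.ckpt.* including model.ckpt.meta

# Later (with the same graph defined):
with tf.Session() as sess:
    saver.restore(sess, './model.ckpt')    # no initializer needed after a restore
    print(sess.run(v))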
Examples
import tensorflow as tf

with tf.Session() as sess:
    print( sess.run( tf.constant('Hello World!') ) )
import tensorflow as tf

# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])

# Weight and bias
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)
#!/usr/bin/python3
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

def batches(batch_size, features, labels):
    assert len(features) == len(labels)
    outout_batches= list()
    sample_size= len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i= start_i + batch_size
        batch= [features[start_i:end_i], labels[start_i:end_i]]
        outout_batches.append(batch)
    return outout_batches

def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """Print cost and validation accuracy of an epoch"""
    current_cost= sess.run(
        cost, feed_dict={features: last_features, labels: last_labels})
    valid_accuracy= sess.run(
        accuracy, feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i, current_cost, valid_accuracy))

n_input= 784     # MNIST data input (img shape: 28*28)
n_classes= 10    # MNIST total classes (0-9 digits)

mnist = input_data.read_data_sets('../../datasets/mnist', one_hot=True)  # Import MNIST data

# The features are already scaled and the data is shuffled
train_features= mnist.train.images
valid_features= mnist.validation.images
test_features= mnist.test.images
train_labels= mnist.train.labels.astype(np.float32)
valid_labels= mnist.validation.labels.astype(np.float32)
test_labels= mnist.test.labels.astype(np.float32)

# Features and Labels
features= tf.placeholder(tf.float32, [None, n_input])
labels= tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights= tf.Variable(tf.random_normal([n_input, n_classes]))
bias= tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits= tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate= tf.placeholder(tf.float32)
cost= tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer= tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction= tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy= tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init= tf.global_variables_initializer()

batch_size= 128
epochs= 20
learn_rate= 0.01

train_batches= batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)
    for epoch_i in range(epochs):                              # Training cycle
        for batch_features, batch_labels in train_batches:     # Loop over all batches
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)
        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)
    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy, feed_dict={features: test_features, labels: test_labels})
    print('Test Accuracy: {}'.format(test_accuracy))
Keras
Keras is a library that helps you bodge together a fancy pants neural network with less fuss than using TensorFlow directly.
Here is the simplest example of using Keras I could find. It predicts diabetes likelihood from a set of measured patient attributes.
#!/home/xedu/miniconda3/envs/carnd-term1/bin/python
# Obtain data here:
# http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data
from keras.models import Sequential
from keras.layers import Dense
import numpy

# == Load Dataset
dataset= numpy.loadtxt("pima-indians-diabetes.csv",delimiter=",")

# == Establish X and Y
X,Y= dataset[:,0:8],dataset[:,8]

# == Create Model
model= Sequential()
model.add(Dense(12,input_dim=8, activation='relu'))
model.add(Dense(8,activation='relu'))
model.add(Dense(1,activation='sigmoid'))

# == Compile Model
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])

# == Fit Model
model.fit(X,Y,nb_epoch=130,batch_size=10) #,validation_split=0.2)

# == Evaluate Model
scores= model.evaluate(X,Y)
print("\nModel Success: %s: %.2f%%" % (model.metrics_names[1],scores[1]*100))
This is an example program from CarND which shows many of the important features of using Keras. It is inspired by something like this.
import pickle
import numpy as np
import tensorflow as tf
tf.python.control_flow_ops = tf

# == Imports for Keras
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Convolution2D
from keras.layers.pooling import MaxPooling2D
from sklearn.preprocessing import LabelBinarizer

# == Data Wrangling
with open('small_train_traffic.p', mode='rb') as f:
    data = pickle.load(f)
X_train, y_train = data['features'], data['labels']
with open('small_test_traffic.p', 'rb') as f:
    data_test = pickle.load(f)
X_test,y_test = data_test['features'], data_test['labels']

# == Build the Final Test Neural Network in Keras
model = Sequential()
model.add(Convolution2D(32, 3, 3, input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(.5))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(5))
model.add(Activation('softmax'))

# == Preprocess Data
X_normalized = np.array(X_train / 255.0 - 0.5 )
label_binarizer = LabelBinarizer()
y_one_hot = label_binarizer.fit_transform(y_train)

model.compile('adam', 'categorical_crossentropy', ['accuracy'])
history = model.fit(X_normalized, y_one_hot, nb_epoch=10, validation_split=0.2)

X_normalized_test = np.array(X_test / 255.0 - 0.5 )
y_one_hot_test = label_binarizer.fit_transform(y_test)
print("Testing")

metrics = model.evaluate(X_normalized_test,y_one_hot_test)
for metric_i in range(len(model.metrics_names)):
    metric_name = model.metrics_names[metric_i]
    metric_value = metrics[metric_i]
    print('{}: {}'.format(metric_name, metric_value))
Keras Callbacks
Callbacks allow you to set up things that will happen during the course of the training (perhaps at other times too). A good example is checkpoints. If you want the state of your model to be saved whenever there is a new one or whenever an improved one is discovered, callbacks can help. This can be useful if there is some kind of task that you just need to get through. It doesn’t matter if validation says epoch 4 is worse than epoch 5; if epoch 4’s weights can solve the challenge problem and epoch 5’s can’t, then as long as you have a copy of epoch 4’s model, you’re done.
from keras.callbacks import ModelCheckpoint
savefiles= "weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoint= ModelCheckpoint(savefiles,monitor='val_acc',
verbose=1,save_best_only=False,mode='max')
callbacks_list= [checkpoint]
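To actually use the checkpoint, hand the callbacks list to fit(). A minimal sketch, assuming a compiled model and training arrays X and y already exist (nb_epoch is the old Keras 1.x argument name used elsewhere on this page; newer Keras calls it epochs):
# Hypothetical model, X, y; the callbacks argument is the important part.
model.fit(X, y, batch_size=32, nb_epoch=20,
          validation_split=0.2, callbacks=callbacks_list)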
Keras Optimizers
An excellent overview of gradient descent algorithms.
from keras import optimizers
-
keras.optimizers.SGD
-
keras.optimizers.RMSprop
-
keras.optimizers.Adagrad
-
keras.optimizers.Adadelta
-
keras.optimizers.Adam
-
keras.optimizers.Adamax
-
keras.optimizers.Nadam
-
keras.optimizers.TFOptimizer - wraps the native TensorFlow optimizers.
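Rather than passing a string name like 'adam' to compile(), you can instantiate one of these classes to control its hyperparameters. A minimal sketch, assuming a model has already been built (argument names are from the older Keras API used on this page, e.g. lr rather than learning_rate):
from keras import optimizers

sgd = optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])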