PyTorch vs TensorFlow

Right now PyTorch and TensorFlow are the two most popular AI frameworks, and AI researchers can find it a little confusing to decide which one to use. Rather than picking just one to learn, why not use both, since each will come in handy later on? So I'm going to introduce both of them from the perspective of their vanilla structure and APIs.

PyTorch

A PyTorch Tensor is conceptually similar to a numpy array: it is an n-dimensional grid of numbers, and like numpy, PyTorch provides many functions to operate efficiently on Tensors.
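For example, creating and manipulating a Tensor looks almost the same as the corresponding numpy code (a tiny illustrative snippet; the imports are repeated here so it stands alone):

import numpy as np
import torch

a = np.zeros((2, 3))        # numpy: 2x3 array of zeros
b = torch.zeros((2, 3))     # PyTorch: 2x3 Tensor of zeros
print(a.shape, b.shape)     # (2, 3) torch.Size([2, 3])
print((b + 1).sum())        # elementwise ops and reductions work just like numpy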

All of the packages we import for the PyTorch part of this post:

 1 import torch
 2 import torch.nn as nn
 3 import torch.optim as optim
 4 import torch.nn.functional as F
 5 from torch.utils.data import DataLoader
 6 from torch.utils.data import sampler
 7 
 8 import torchvision.datasets as dset
 9 import torchvision.transforms as T
10 
11 import numpy as np

Image data is typically stored in a Tensor of shape N x C x H x W, where:

  • N is the number of datapoints
  • C is the number of channels
  • H is the height of the feature map in pixels
  • W is the width of the feature map in pixels

Before the fully-connected layer, we need to flatten the C * H * W values into a single vector per image:

1 def flatten(x):
2     N = x.shape[0]  # read in N, C, H, W
3     return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image

Three-Layer Network

Let's implement a vanilla three-layer network; the architecture is as follows:

  1. A convolutional layer (with bias) with channel_1 filters, each with shape KW1 x KH1, and zero-padding of two
  2. ReLU nonlinearity
  3. A convolutional layer (with bias) with channel_2 filters, each with shape KW2 x KH2, and zero-padding of one
  4. ReLU nonlinearity
  5. Fully-connected layer with bias, producing scores for C classes.

Normally the function takes two arguments, the input x and params; what goes into params depends on how many layers and what type of architecture you are using.

Notice that this architecture includes two convolutional layers, so we need the conv2d function from torch.nn.functional (imported as F above).

The core operations are F.conv2d, F.relu, and mm:

 1 def three_layer_convnet(x, params):
 2    """
 3    Performs the forward pass of a three-layer convolutional network with the
 4    architecture defined above.
 5 
 6    Inputs:
 7    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
 8    - params: A list of PyTorch Tensors giving the weights and biases for the
 9      network; should contain the following:
10       - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
11         for the first convolutional layer
12       - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
13         convolutional layer
14       - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
15         weights for the second convolutional layer
16       - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
17         convolutional layer
18       - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
19         figure out what the shape should be?
20       - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
21         figure out what the shape should be?
22     
23     Returns:
24     - scores: PyTorch Tensor of shape (N, C) giving classification scores for x
25     """
26     conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
27     scores = None
28     ################################################################################
29     # TODO: Implement the forward pass for the three-layer ConvNet.                #
30     ################################################################################
31     conv1 = F.conv2d(x, weight=conv_w1, bias=conv_b1, padding=2)
32     relu1 = F.relu(conv1)
33     conv2 = F.conv2d(relu1, weight=conv_w2, bias=conv_b2, padding=1)
34     relu2 = F.relu(conv2)
35     relu2_flat = flatten(relu2)
36     scores = relu2_flat.mm(fc_w) + fc_b
37     #pass
38     ################################################################################
39     #                                 END OF YOUR CODE                             #
40     ################################################################################
41     return scores

PyTorch: Initialization

  • random_weight(shape) initializes a weight tensor with Kaiming normalization (this is what we normally use for weights).
  • zero_weight(shape) initializes a tensor with all zeros; useful for instantiating bias parameters.
     1 def random_weight(shape):
     2    """
     3    Create random Tensors for weights; setting requires_grad=True means that we
     4    want to compute gradients for these Tensors during the backward pass.
     5    We use Kaiming normalization: sqrt(2 / fan_in)
     6    """
     7    if len(shape) == 2:  # FC weight
     8        fan_in = shape[0]
     9    else:
    10         fan_in = np.prod(shape[1:]) # conv weight [out_channel, in_channel, kH, kW]
    11     # randn is standard normal distribution generator. 
    12     w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    13     w.requires_grad = True
    14     return w
    15 
    16 def zero_weight(shape):
    17     return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)
    18 
    19 # create a weight of shape [3 x 5]
    20 # you should see the type `torch.cuda.FloatTensor` if you use GPU. 
    21 # Otherwise it should be `torch.FloatTensor`
    22 random_weight((3, 5))
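These snippets assume module-level dtype and device variables; a minimal setup (the USE_GPU flag is just an illustrative convention) might be:

USE_GPU = True
dtype = torch.float32  # use float throughout

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')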
    

PyTorch: Check Accuracy

When checking accuracy we don’t need to compute any gradients; as a result we don’t need PyTorch to build a computational graph for us when we compute scores. To prevent a graph from being built we scope our computation under a torch.no_grad() context manager.


 1 def check_accuracy_part2(loader, model_fn, params):
 2    """
 3    Check the accuracy of a classification model.
 4    
 5    Inputs:
 6    - loader: A DataLoader for the data split we want to check
 7    - model_fn: A function that performs the forward pass of the model,
 8      with the signature scores = model_fn(x, params)
 9    - params: List of PyTorch Tensors giving parameters of the model
10     
11     Returns: Nothing, but prints the accuracy of the model
12     """
13     split = 'val' if loader.dataset.train else 'test'
14     print('Checking accuracy on the %s set' % split)
15     num_correct, num_samples = 0, 0
16     with torch.no_grad():
17         for x, y in loader:
18             x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
19             y = y.to(device=device, dtype=torch.int64)
20             scores = model_fn(x, params)
21             _, preds = scores.max(1)
22             num_correct += (preds == y).sum()
23             num_samples += preds.size(0)
24         acc = float(num_correct) / num_samples
25         print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))

PyTorch: Training Loop

The final step is to train the model: first move the data to the proper device, then compute the loss, then use SGD to compute the gradients and update the weights, and finally call the check-accuracy function to print the accuracy.

 1 def train_part2(model_fn, params, learning_rate):
 2    """
 3    Train a model on CIFAR-10.
 4    
 5    Inputs:
 6    - model_fn: A Python function that performs the forward pass of the model.
 7      It should have the signature scores = model_fn(x, params) where x is a
 8      PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
 9      model weights, and scores is a PyTorch Tensor of shape (N, C) giving
10       scores for the elements in x.
11     - params: List of PyTorch Tensors giving weights for the model
12     - learning_rate: Python scalar giving the learning rate to use for SGD
13     
14     Returns: Nothing
15     """
16     for t, (x, y) in enumerate(loader_train):
17         # Move the data to the proper device (GPU or CPU)
18         x = x.to(device=device, dtype=dtype)
19         y = y.to(device=device, dtype=torch.long)
20 
21         # Forward pass: compute scores and loss
22         scores = model_fn(x, params)
23         loss = F.cross_entropy(scores, y)
24 
25         # Backward pass: PyTorch figures out which Tensors in the computational
26         # graph has requires_grad=True and uses backpropagation to compute the
27         # gradient of the loss with respect to these Tensors, and stores the
28         # gradients in the .grad attribute of each Tensor.
29         loss.backward()
30 
31         # Update parameters. We don't want to backpropagate through the
32         # parameter updates, so we scope the updates under a torch.no_grad()
33         # context manager to prevent a computational graph from being built.
34         with torch.no_grad():
35             for w in params:
36                 w -= learning_rate * w.grad
37 
38                 # Manually zero the gradients after running the backward pass
39                 w.grad.zero_()
40 
41         if t % print_every == 0:
42             print('Iteration %d, loss = %.4f' % (t, loss.item()))
43             check_accuracy_part2(loader_val, model_fn, params)
44             print()

To sum up, the whole process is:

  1. Initialize the hidden layer sizes, the learning rate, and the weights.
  2. Pass the data and params (inside the train function) to three_layer_convnet.
  3. After computing the scores, calculate the cross-entropy loss, run the backward pass, and update the weights with SGD.
  4. Finally, print out the accuracy.
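As an illustration, here is a minimal driver that wires these pieces together (the shapes and hyperparameters are just an example, and loader_train, loader_val, device, dtype, and print_every are assumed from the setup code):

channel_1, channel_2 = 32, 16
learning_rate = 3e-3

conv_w1 = random_weight((channel_1, 3, 5, 5))    # 5x5 filters over RGB input
conv_b1 = zero_weight((channel_1,))
conv_w2 = random_weight((channel_2, channel_1, 3, 3))
conv_b2 = zero_weight((channel_2,))
fc_w = random_weight((channel_2 * 32 * 32, 10))  # padding keeps CIFAR-10 images at 32x32
fc_b = zero_weight((10,))

params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)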

Module API

Barebone PyTorch requires that we track all the parameter tensors by hand. This is fine for small networks with a few tensors, but it would be extremely inconvenient and error-prone to track tens or hundreds of tensors in larger networks.

To use the Module API, follow the steps below:

  1. Subclass nn.Module. Give your network class an intuitive name like TwoLayerFC.
  2. In the constructor __init__(), define all the layers you need as class attributes. Layer objects like nn.Linear and nn.Conv2d are themselves nn.Module subclasses and contain learnable parameters, so that you don’t have to instantiate the raw tensors yourself. nn.Module will track these internal parameters for you. Refer to the doc to learn more about the dozens of builtin layers. Warning: don’t forget to call the super().__init__() first!
  3. In the forward() method, define the connectivity of your network. You should use the attributes defined in __init__ as function calls that take tensor as input and output the “transformed” tensor. Do not create any new layers with learnable parameters in forward()! All of them must be declared upfront in __init__.
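As a minimal sketch of this three-step recipe (using the TwoLayerFC name from step 1; the hidden size and layer choices here are only illustrative):

class TwoLayerFC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # Step 2: declare every layer with learnable parameters in __init__
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Step 3: define the connectivity; no new learnable layers here
        x = flatten(x)
        scores = self.fc2(F.relu(self.fc1(x)))
        return scores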

Here is a ConvNet example for the following architecture:

  1. Convolutional layer with channel_1 5x5 filters with zero-padding of 2
  2. ReLU
  3. Convolutional layer with channel_2 3x3 filters with zero-padding of 1
  4. ReLU
  5. Fully-connected layer to num_classes classes

All of the layer classes come from nn (e.g. nn.Conv2d, nn.Linear); in the __init__ function we set up the layers, and nn.init provides the kaiming_normal_ and constant_ initialization functions:

 1 class ThreeLayerConvNet(nn.Module):
 2    def __init__(self, in_channel, channel_1, channel_2, num_classes):
 3        super().__init__()
 4        ########################################################################
 5        # TODO: Set up the layers you need for a three-layer ConvNet with the  #
 6        # architecture defined above.                                          #
 7        ########################################################################
 8        self.conv1 = nn.Conv2d(in_channel,channel_1,kernel_size = 5,padding =2,bias=True)
 9        nn.init.kaiming_normal_(self.conv1.weight)
10         nn.init.constant_(self.conv1.bias,0)
11         
12         self.conv2 = nn.Conv2d(channel_1,channel_2,kernel_size = 3,padding = 1,bias = True)
13         nn.init.kaiming_normal_(self.conv2.weight)
14         nn.init.constant_(self.conv2.bias,0)
15         
16         self.fc = nn.Linear(channel_2*32*32,num_classes)
17         nn.init.kaiming_normal_(self.fc.weight)
18         nn.init.constant_(self.fc.bias, 0)
19         
20         #pass
21         ########################################################################
22         #                          END OF YOUR CODE                            # 
23         ########################################################################
24 
25     def forward(self, x):
26         scores = None
27         ########################################################################
28         # TODO: Implement the forward function for a 3-layer ConvNet. you      #
29         # should use the layers you defined in __init__ and specify the        #
30         # connectivity of those layers in forward()                            #
31         ########################################################################
32         relu1 = F.relu(self.conv1(x))
33         relu2 = F.relu(self.conv2(relu1))
34         scores = self.fc(flatten(relu2))
35         #pass
36         ########################################################################
37         #                             END OF YOUR CODE                         #
38         ########################################################################
39         return scores

Module API: Check Accuracy

This version is slightly different from the one in Part II: you don't manually pass in the parameters anymore.

 1 def check_accuracy_part34(loader, model):
 2    if loader.dataset.train:
 3        print('Checking accuracy on validation set')
 4    else:
 5        print('Checking accuracy on test set')   
 6    num_correct = 0
 7    num_samples = 0
 8    model.eval()  # set model to evaluation mode
 9    with torch.no_grad():
10         for x, y in loader:
11             x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
12             y = y.to(device=device, dtype=torch.long)
13             scores = model(x)
14             _, preds = scores.max(1)
15             num_correct += (preds == y).sum()
16             num_samples += preds.size(0)
17         acc = float(num_correct) / num_samples
18         print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

Module API : Training Loop

We also use a slightly different training loop. Rather than updating the values of the weights ourselves, we use an Optimizer object from the torch.optim package, which abstracts the notion of an optimization algorithm and provides implementations of most of the algorithms commonly used to optimize neural networks.

 1 def train_part34(model, optimizer, epochs=1):
 2    """
 3    Train a model on CIFAR-10 using the PyTorch Module API.
 4    
 5    Inputs:
 6    - model: A PyTorch Module giving the model to train.
 7    - optimizer: An Optimizer object we will use to train the model
 8    - epochs: (Optional) A Python integer giving the number of epochs to train for
 9    
10     Returns: Nothing, but prints model accuracies during training.
11     """
12     model = model.to(device=device)  # move the model parameters to CPU/GPU
13     for e in range(epochs):
14         for t, (x, y) in enumerate(loader_train):
15             model.train()  # put model to training mode
16             x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
17             y = y.to(device=device, dtype=torch.long)
18 
19             scores = model(x)
20             loss = F.cross_entropy(scores, y)
21 
22             # Zero out all of the gradients for the variables which the optimizer
23             # will update.
24             optimizer.zero_grad()
25 
26             # This is the backwards pass: compute the gradient of the loss with
27             # respect to each  parameter of the model.
28             loss.backward()
29 
30             # Actually update the parameters of the model using the gradients
31             # computed by the backwards pass.
32             optimizer.step()
33 
34             if t % print_every == 0:
35                 print('Iteration %d, loss = %.4f' % (t, loss.item()))
36                 check_accuracy_part34(loader_val, model)
37                 print()

To sum up the Module API:

  1. Initialize the learning rate and channel sizes (channel_1, channel_2) and pass them to the model class, which declares the architecture and initializes the weights.
  2. Pass the model's parameters to optim.SGD.
  3. Train the model.
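A minimal sketch of this workflow (hyperparameters are illustrative; loader_train, loader_val, device, dtype, and print_every again come from the setup code):

learning_rate = 3e-3
channel_1, channel_2 = 32, 16

model = ThreeLayerConvNet(in_channel=3, channel_1=channel_1,
                          channel_2=channel_2, num_classes=10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
train_part34(model, optimizer)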

PyTorch: Sequential API

Part III introduced the PyTorch Module API, which allows you to define arbitrary learnable layers and their connectivity.

For simple models like a stack of feed forward layers, you still need to go through 3 steps: subclass nn.Module, assign layers to class attributes in __init__, and call each layer one by one in forward(). Is there a more convenient way?

Fortunately, PyTorch provides a container Module called nn.Sequential, which merges the above steps into one. It is not as flexible as nn.Module, because you cannot specify more complex topology than a feed-forward stack, but it’s good enough for many use cases.
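One wrinkle: every step in an nn.Sequential stack must itself be a Module, so the flatten function from earlier has to be wrapped in a small Module before it can sit inside the stack. A minimal sketch of the Flatten class that the code below assumes:

class Flatten(nn.Module):
    def forward(self, x):
        # reuse the flatten() helper defined earlier
        return flatten(x)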

Three Layers: Using Sequential API

  1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
  2. ReLU
  3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
  4. ReLU
  5. Fully-connected layer (with bias) to compute scores for 10 classes
 1 channel_1 = 32
 2 channel_2 = 16
 3 learning_rate = 1e-2
 4 
 5 model = None
 6 optimizer = None
 7 
 8 ################################################################################
 9 # TODO: Rewrite the 2-layer ConvNet with bias from Part III with the           #
10 # Sequential API.                                                              #
11 ################################################################################
12 #pass
13 model = nn.Sequential(
14     nn.Conv2d(3,channel_1,kernel_size=5,padding=2),
15     nn.ReLU(),
16     nn.Conv2d(channel_1,channel_2,kernel_size=3,padding=1),
17     nn.ReLU(),
18     Flatten(),
19     nn.Linear(channel_2*32*32,10)
20 )
21 
22 optimizer = optim.SGD(model.parameters(),lr=learning_rate,
23                      momentum=0.9,nesterov=True)
24 ################################################################################
25 #                                 END OF YOUR CODE 
26 ################################################################################
27 
28 train_part34(model, optimizer)

Using train_part34 and the Sequential API, it is very easy to set up the layers and feed in the data for training. The accuracy results look like this:

 1 Iteration 0, loss = 2.2939
 2 Checking accuracy on validation set
 3 Got 140 / 1000 correct (14.00)
 4 
 5 Iteration 100, loss = 1.4576
 6 Checking accuracy on validation set
 7 Got 471 / 1000 correct (47.10)
 8 
 9 Iteration 200, loss = 1.3825
10 Checking accuracy on validation set
11 Got 466 / 1000 correct (46.60)
12 
13 Iteration 300, loss = 1.5948
14 Checking accuracy on validation set
15 Got 524 / 1000 correct (52.40)
16 
17 Iteration 400, loss = 1.2816
18 Checking accuracy on validation set
19 Got 513 / 1000 correct (51.30)
20 
21 Iteration 500, loss = 1.3663
22 Checking accuracy on validation set
23 Got 530 / 1000 correct (53.00)
24 
25 Iteration 600, loss = 1.1300
26 Checking accuracy on validation set
27 Got 545 / 1000 correct (54.50)
28 
29 Iteration 700, loss = 1.2276
30 Checking accuracy on validation set
31 Got 542 / 1000 correct (54.20)

* * *

TensorFlow

In this TensorFlow introduction, we are going to build the same structures as we did in the PyTorch introduction.


All of the packages we imported:

 1 import os
 2 import tensorflow as tf
 3 import numpy as np
 4 import math
 5 import timeit
 6 import matplotlib.pyplot as plt
 7 
 8 %matplotlib inline

Barebones TensorFlow

We can see this in action by defining a simple flatten function that reshapes image data for use in a fully-connected network (a sketch is shown after the list below).

In TensorFlow, data for convolutional feature maps is typically stored in a Tensor of shape N x H x W x C where:

  • N is the number of datapoints (minibatch size)
  • H is the height of the feature map
  • W is the width of the feature map
  • C is the number of channels in the feature map

Notice that this is a little different from PyTorch, where the channel dimension comes before height and width.
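A minimal flatten for this layout (my sketch, assuming the TF 1.x API used throughout this post) could be:

def flatten(x):
    # reshape (N, H, W, C) image data to (N, H*W*C) for a fully-connected layer
    N = tf.shape(x)[0]
    return tf.reshape(x, (N, -1))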

Three_layer_convnet

 1 def three_layer_convnet(x, params):
 2    """
 3    A three-layer convolutional network with the architecture described above.
 4    
 5    Inputs:
 6    - x: A TensorFlow Tensor of shape (N, H, W, 3) giving a minibatch of images
 7    - params: A list of TensorFlow Tensors giving the weights and biases for the
 8      network; should contain the following:
 9      - conv_w1: TensorFlow Tensor of shape (KH1, KW1, 3, channel_1) giving
10         weights for the first convolutional layer.
11       - conv_b1: TensorFlow Tensor of shape (channel_1,) giving biases for the
12         first convolutional layer.
13       - conv_w2: TensorFlow Tensor of shape (KH2, KW2, channel_1, channel_2)
14         giving weights for the second convolutional layer
15       - conv_b2: TensorFlow Tensor of shape (channel_2,) giving biases for the
16         second convolutional layer.
17       - fc_w: TensorFlow Tensor giving weights for the fully-connected layer.
18         Can you figure out what the shape should be?
19       - fc_b: TensorFlow Tensor giving biases for the fully-connected layer.
20         Can you figure out what the shape should be?
21     """
22     conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
23     scores = None
24     ############################################################################
25     # TODO: Implement the forward pass for the three-layer ConvNet.            #
26     ############################################################################
27     x_padded = tf.pad(x,[[0,0],[2,2],[2,2],[0,0]],'CONSTANT')
28     conv1 = tf.nn.conv2d(x_padded,conv_w1,[1,1,1,1],padding='VALID')+conv_b1
29     relu1 = tf.nn.relu(conv1)
30     x_padded_1 = tf.pad(relu1,[[0,0],[1,1],[1,1],[0,0]],'CONSTANT')
31     conv2 = tf.nn.conv2d(x_padded_1,conv_w2,[1,1,1,1],padding='VALID')+conv_b2
32     relu2 = tf.nn.relu(conv2)
33     fc_x = flatten(relu2)
34     h = tf.matmul(fc_x, fc_w) + fc_b
35     scores = h
36     #pass
37     ############################################################################
38     #                              END OF YOUR CODE                            #
39     ############################################################################
40     return scores

All of the functions are from tf.nn. The code above looks very similar to the PyTorch version, but here we need to declare the padding explicitly with tf.pad before passing the result to tf.nn.conv2d, and the stride argument takes the form [1, 1, 1, 1] (one value per dimension of the N x H x W x C Tensor).

Training step:

  1. Compute the loss
  2. Compute the gradient of the loss with respect to all network weights
  3. Make a weight update step using (stochastic) gradient descent.

Note that the step of updating the weights is itself an operation in the computational graph: the calls to tf.assign_sub in training_step return TensorFlow operations that mutate the weights when they are executed.

There is an important bit of subtlety here. When we call sess.run, TensorFlow does not execute all operations in the computational graph; it only executes the minimal subset of the graph necessary to compute the outputs that we ask TensorFlow to produce. As a result, naively computing the loss would not cause the weight update operations to execute, since the operations needed to compute the loss do not depend on the output of the weight update.

To fix this problem, we insert a control dependency into the graph, adding a duplicate loss node that does depend on the outputs of the weight update operations; this is the object that we actually return from the training_step function. As a result, asking TensorFlow to evaluate the value of the loss returned from training_step will also implicitly update the weights of the network using that minibatch of data.

 1 def training_step(scores, y, params, learning_rate):
 2    """
 3    Set up the part of the computational graph which makes a training step.
 4 
 5    Inputs:
 6    - scores: TensorFlow Tensor of shape (N, C) giving classification scores for
 7      the model.
 8    - y: TensorFlow Tensor of shape (N,) giving ground-truth labels for scores;
 9      y[i] == c means that c is the correct class for scores[i].
10     - params: List of TensorFlow Tensors giving the weights of the model
11     - learning_rate: Python scalar giving the learning rate to use for gradient
12       descent step.
13       
14     Returns:
15     - loss: A TensorFlow Tensor of shape () (scalar) giving the loss for this
16       batch of data; evaluating the loss also performs a gradient descent step
17       on params (see above).
18     """
19     # First compute the loss; the first line gives losses for each example in
20     # the minibatch, and the second averages the losses acros the batch
21     losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=scores)
22     loss = tf.reduce_mean(losses)
23 
24     # Compute the gradient of the loss with respect to each parameter of the the
25     # network. This is a very magical function call: TensorFlow internally
26     # traverses the computational graph starting at loss backward to each element
27     # of params, and uses backpropagation to figure out how to compute gradients;
28     # it then adds new operations to the computational graph which compute the
29     # requested gradients, and returns a list of TensorFlow Tensors that will
30     # contain the requested gradients when evaluated.
31     grad_params = tf.gradients(loss, params)
32     
33     # Make a gradient descent step on all of the model parameters.
34     new_weights = []   
35     for w, grad_w in zip(params, grad_params):
36         new_w = tf.assign_sub(w, learning_rate * grad_w)
37         new_weights.append(new_w)
38 
39     # Insert a control dependency so that evaluting the loss causes a weight
40     # update to happen; see the discussion above.
41     with tf.control_dependencies(new_weights):
42         return tf.identity(loss)

You need to be familiar with the function tf.nn.sparse_softmax_cross_entropy_with_logits.

Barebones TensorFlow: Training Loop

 1 def train_part2(model_fn, init_fn, learning_rate):
 2    """
 3    Train a model on CIFAR-10.
 4    
 5    Inputs:
 6    - model_fn: A Python function that performs the forward pass of the model
 7      using TensorFlow; it should have the following signature:
 8      scores = model_fn(x, params) where x is a TensorFlow Tensor giving a
 9      minibatch of image data, params is a list of TensorFlow Tensors holding
10       the model weights, and scores is a TensorFlow Tensor of shape (N, C)
11       giving scores for all elements of x.
12     - init_fn: A Python function that initializes the parameters of the model.
13       It should have the signature params = init_fn() where params is a list
14       of TensorFlow Tensors holding the (randomly initialized) weights of the
15       model.
16     - learning_rate: Python float giving the learning rate to use for SGD.
17     """
18     # First clear the default graph
19     tf.reset_default_graph()
20     is_training = tf.placeholder(tf.bool, name='is_training')
21     # Set up the computational graph for performing forward and backward passes,
22     # and weight updates.
23     with tf.device(device):
24         # Set up placeholders for the data and labels
25         x = tf.placeholder(tf.float32, [None, 32, 32, 3])
26         y = tf.placeholder(tf.int32, [None])
27         params = init_fn()           # Initialize the model parameters
28         scores = model_fn(x, params) # Forward pass of the model
29         loss = training_step(scores, y, params, learning_rate)
30 
31     # Now we actually run the graph many times using the training data
32     with tf.Session() as sess:
33         # Initialize variables that will live in the graph
34         sess.run(tf.global_variables_initializer())
35         for t, (x_np, y_np) in enumerate(train_dset):
36             # Run the graph on a batch of training data; recall that asking
37             # TensorFlow to evaluate loss will cause an SGD step to happen.
38             feed_dict = {x: x_np, y: y_np}
39             loss_np = sess.run(loss, feed_dict=feed_dict)
40             
41             # Periodically print the loss and check accuracy on the val set
42             if t % print_every == 0:
43                 print('Iteration %d, loss = %.4f' % (t, loss_np))
44                 check_accuracy(sess, val_dset, x, scores, is_training)

Barebones TensorFlow: Check Accuracy

 1 def check_accuracy(sess, dset, x, scores, is_training=None):
 2    """
 3    Check accuracy on a classification model.
 4    
 5    Inputs:
 6    - sess: A TensorFlow Session that will be used to run the graph
 7    - dset: A Dataset object on which to check accuracy
 8    - x: A TensorFlow placeholder Tensor where input images should be fed
 9    - scores: A TensorFlow Tensor representing the scores output from the
10       model; this is the Tensor we will ask TensorFlow to evaluate.
11       
12     Returns: Nothing, but prints the accuracy of the model
13     """
14     num_correct, num_samples = 0, 0
15     for x_batch, y_batch in dset:
16         feed_dict = {x: x_batch, is_training: 0}
17         scores_np = sess.run(scores, feed_dict=feed_dict)
18         y_pred = scores_np.argmax(axis=1)
19         num_samples += x_batch.shape[0]
20         num_correct += (y_pred == y_batch).sum()
21     acc = float(num_correct) / num_samples
22     print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))

I will mostly skip the initialization part because it is similar to the PyTorch version (a rough sketch is included below for completeness). To sum up, the overall process of training and passing values is the same as in PyTorch, but in TensorFlow we need to use placeholder and sess.run to make it work. To be honest, TensorFlow is a little more difficult to get started with than PyTorch.
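A rough sketch of what the init_fn could look like, mirroring random_weight from the PyTorch part (the exact fan-in rule, channel sizes, and tf.Variable wrapping here are my assumptions, not the original post's code):

def kaiming_normal(shape):
    # fan_in is the input dimension for FC weights, or kH*kW*in_channel for conv weights
    if len(shape) == 2:
        fan_in = shape[0]
    else:
        fan_in = np.prod(shape[:-1])
    return tf.random_normal(shape) * np.sqrt(2.0 / fan_in)

def three_layer_convnet_init():
    channel_1, channel_2, num_classes = 32, 16, 10
    conv_w1 = tf.Variable(kaiming_normal((5, 5, 3, channel_1)))
    conv_b1 = tf.Variable(tf.zeros((channel_1,)))
    conv_w2 = tf.Variable(kaiming_normal((3, 3, channel_1, channel_2)))
    conv_b2 = tf.Variable(tf.zeros((channel_2,)))
    fc_w = tf.Variable(kaiming_normal((32 * 32 * channel_2, num_classes)))
    fc_b = tf.Variable(tf.zeros((num_classes,)))
    return [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]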

Keras Model API

Implementing a neural network using the low-level TensorFlow API is a good way to understand how TensorFlow works, but it's a little inconvenient: we had to manually keep track of all Tensors holding learnable parameters, and we had to use a control dependency to implement the gradient descent update step. This was fine for a small network, but could quickly become unwieldy for a large, complex model.

Fortunately TensorFlow provides higher-level packages such as tf.keras and tf.layers which make it easy to build models out of modular, object-oriented layers; tf.train allows you to easily train these models using a variety of different optimization algorithms.

Keras Model API: Three-Layer ConvNet

  1. Convolutional layer with 5 x 5 kernels, with zero-padding of 2
  2. ReLU nonlinearity
  3. Convolutional layer with 3 x 3 kernels, with zero-padding of 1
  4. ReLU nonlinearity
  5. Fully-connected layer to give class scores

 1 class ThreeLayerConvNet(tf.keras.Model):
 2    def __init__(self, channel_1, channel_2, num_classes):
 3        super().__init__()
 4        ########################################################################
 5        # TODO: Implement the __init__ method for a three-layer ConvNet. You   #
 6        # should instantiate layer objects to be used in the forward pass.     #
 7        ########################################################################
 8        initializer = tf.variance_scaling_initializer(scale=2.0)
 9        self.conv1 = tf.layers.Conv2D(channel_1,[5,5],strides=1, 
10                                 padding="valid", activation=tf.nn.relu,
11                                 kernel_initializer = initializer)
12         self.conv2 = tf.layers.Conv2D(channel_2,[3,3],strides=1, 
13                                 padding="valid", activation=tf.nn.relu,
14                                 kernel_initializer = initializer)
15         self.fc1 = tf.layers.Dense(num_classes,kernel_initializer=initializer)
16         #pass
17         ########################################################################
18         #                           END OF YOUR CODE                           #
19         ########################################################################
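The snippet above only defines __init__; the forward pass still needs to be written as a call method inside the same class. A possible sketch (mine, not the original post's code; explicit tf.pad is one way to recover the zero-padding of 2 and 1, since the conv layers were created with padding="valid"):

    def call(self, x, training=None):
        # zero-pad by 2 so the 5x5 'valid' convolution matches padding=2
        x = tf.pad(x, [[0, 0], [2, 2], [2, 2], [0, 0]])
        x = self.conv1(x)      # ReLU is already baked in via activation=tf.nn.relu
        # zero-pad by 1 so the 3x3 'valid' convolution matches padding=1
        x = tf.pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]])
        x = self.conv2(x)
        x = tf.layers.flatten(x)
        scores = self.fc1(x)
        return scores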

Training Loop:

We need to implement a slightly different training loop when using the tf.keras.Model API. Instead of computing gradients and updating the weights of the model manually, we use an Optimizer object from the tf.train package which takes care of these details for us. You can read more about the available optimizers in the tf.train documentation.

 1 def train_part34(model_init_fn, optimizer_init_fn, num_epochs=1):
 2    """
 3    Simple training loop for use with models defined using tf.keras. It trains
 4    a model for one epoch on the CIFAR-10 training set and periodically checks
 5    accuracy on the CIFAR-10 validation set.
 6    
 7    Inputs:
 8    - model_init_fn: A function that takes no parameters; when called it
 9      constructs the model we want to train: model = model_init_fn()
10     - optimizer_init_fn: A function which takes no parameters; when called it
11       constructs the Optimizer object we will use to optimize the model:
12       optimizer = optimizer_init_fn()
13     - num_epochs: The number of epochs to train for
14     
15     Returns: Nothing, but prints progress during training
16     """
17     tf.reset_default_graph()    
18     with tf.device(device):
19         # Construct the computational graph we will use to train the model. We
20         # use the model_init_fn to construct the model, declare placeholders for
21         # the data and labels
22         x = tf.placeholder(tf.float32, [None, 32, 32, 3])
23         y = tf.placeholder(tf.int32, [None])
24         
25         # We need a place holder to explicitly specify if the model is in the training
26         # phase or not. This is because a number of layers behaves differently in
27         # training and in testing, e.g., dropout and batch normalization.
28         # We pass this variable to the computation graph through feed_dict as shown below.
29         is_training = tf.placeholder(tf.bool, name='is_training')
30         
31         # Use the model function to build the forward pass.
32         scores = model_init_fn(x, is_training)
33 
34         # Compute the loss like we did in Part II
35         loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=scores)
36         loss = tf.reduce_mean(loss)
37 
38         # Use the optimizer_fn to construct an Optimizer, then use the optimizer
39         # to set up the training step. Asking TensorFlow to evaluate the
40         # train_op returned by optimizer.minimize(loss) will cause us to make a
41         # single update step using the current minibatch of data.
42         
43         # Note that we use tf.control_dependencies to force the model to run
44         # the tf.GraphKeys.UPDATE_OPS at each training step. tf.GraphKeys.UPDATE_OPS
45         # holds the operators that update the states of the network.
46         # For example, the tf.layers.batch_normalization function adds the running mean
47         # and variance update operators to tf.GraphKeys.UPDATE_OPS.
48         optimizer = optimizer_init_fn()
49         update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
50         with tf.control_dependencies(update_ops):
51             train_op = optimizer.minimize(loss)
52 
53     # Now we can run the computational graph many times to train the model.
54     # When we call sess.run we ask it to evaluate train_op, which causes the
55     # model to update.
56     with tf.Session() as sess:
57         sess.run(tf.global_variables_initializer())
58         t = 0
59         for epoch in range(num_epochs):
60             print('Starting epoch %d' % epoch)
61             for x_np, y_np in train_dset:
62                 feed_dict = {x: x_np, y: y_np, is_training:1}
63                 loss_np, _ = sess.run([loss, train_op], feed_dict=feed_dict)
64                 if t % print_every == 0:
65                     print('Iteration %d, loss = %.4f' % (t, loss_np))
66                     check_accuracy(sess, val_dset, x, scores, is_training=is_training)
67                     print()
68                 t += 1
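To put this training loop to work, we still need the two init functions it expects; a small sketch (hyperparameters are illustrative, and it assumes a call method has been added to ThreeLayerConvNet as sketched above):

learning_rate = 3e-3
channel_1, channel_2, num_classes = 32, 16, 10

def model_init_fn(inputs, is_training):
    model = ThreeLayerConvNet(channel_1, channel_2, num_classes)
    return model(inputs)

def optimizer_init_fn():
    return tf.train.GradientDescentOptimizer(learning_rate)

train_part34(model_init_fn, optimizer_init_fn)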

Finally, the last piece is the Keras Sequential API.

Keras Sequential API

Here you should use tf.keras.Sequential to reimplement the same three-layer ConvNet architecture used in Part II and Part III. As a reminder, your model should have the following architecture:

  1. Convolutional layer with 16 5x5 kernels, using zero padding of 2
  2. ReLU nonlinearity
  3. Convolutional layer with 32 3x3 kernels, using zero padding of 1
  4. ReLU nonlinearity
  5. Fully-connected layer giving class scores

You should initialize the weights of the model using a tf.variance_scaling_initializer as above.

 1 def model_init_fn(inputs, is_training):
 2    model = None
 3    ############################################################################
 4    # TODO: Construct a three-layer ConvNet using tf.keras.Sequential.         #
 5    ############################################################################
 6    input_shape = (32, 32, 3)
 7    channel_1, channel_2, num_classes = 32, 16, 10
 8    initializer = tf.variance_scaling_initializer(scale=2.0)
 9    layers = [
10         # 'Same' padding acts similar to zero padding of 2 for this input
11         tf.layers.Conv2D(channel_1,[5,5],strides=1, 
12                                 padding="same", activation=tf.nn.relu,
13                                 kernel_initializer = initializer,input_shape=(32, 32,3)),
14         tf.layers.Conv2D(channel_2,[3,3],strides=1, 
15                                 padding="same", activation=tf.nn.relu,
16                                 kernel_initializer = initializer),
17         tf.layers.Flatten(input_shape=input_shape),
18         tf.layers.Dense(num_classes, kernel_initializer=initializer),
19     ]
20     model = tf.keras.Sequential(layers)
21     #pass
22     ############################################################################
23     #                            END OF YOUR CODE                              #
24     ############################################################################
25     return model(inputs)
26 
27 learning_rate = 5e-4
28 def optimizer_init_fn():
29     optimizer = None
30     ############################################################################
31     # TODO: Complete the implementation of model_fn.                           #
32     ############################################################################
33     optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9, use_nesterov=True)
34 
35     ############################################################################
36     #                           END OF YOUR CODE                               #
37     ############################################################################
38     return optimizer
39 
40 train_part34(model_init_fn, optimizer_init_fn)

 


To be honest, I personally prefer PyTorch because its syntax is more succinct and simple. In contrast, TensorFlow is verbose: you have to repeatedly write boilerplate such as sess.run and placeholder just to run the whole pipeline. In the TensorFlow workflow used here, layers like dropout and batch normalization are awkward to wire up (they need the is_training placeholder and the update ops), while the equivalent modules in PyTorch are simple and readily available.

Objectively speaking, TensorFlow's advantage is its large community and thorough documentation, backed by Google, which is a great benefit for industrial developers. So although TensorFlow has some shortcomings, I will still keep using it in the future.

(The content and code above are based on the CS231n assignments.)

liangliangzheng

July,31,2018



