
Lecture 1: Neural Networks: A Review Part 1 Code

from ipywidgets import widgets
from IPython.display import YouTubeVideo, display

out1 = widgets.Output()
with out1:
  video = YouTubeVideo(id="47d0M3UAXNc", width=854, height=480, fs=1, rel=0)
  print("Video available at https://youtube.com/watch?v=" + video.id)
  display(video)
display(out1)


# Imports
import time
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# PyTorch libraries
import torch
from torch import nn
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor

The Basics of PyTorch

PyTorch is a Python-based scientific computing package targeted at two audiences:

  • A replacement for NumPy optimized for the power of GPUs

  • A deep learning platform that provides significant flexibility and speed

At its core, PyTorch provides a few key features:

  • A multidimensional Tensor object, similar to a NumPy array but with GPU acceleration.

  • An optimized autograd engine for automatically computing derivatives.

  • A clean, modular API for building and deploying deep learning models.
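As a quick illustration of the three features listed above, here is a minimal sketch (the toy values are our own, not part of the lecture code): it creates a tensor, lets autograd differentiate a simple function of it, and passes it through a layer from `torch.nn`. The layer's weights are randomly initialized, so only its output shape is predictable.

```python
import torch
from torch import nn

# 1. Tensor: a NumPy-like array that can also track gradients
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# 2. Autograd: y = x1^2 + x2^2 + x3^2, so dy/dx = 2 * x
y = (x ** 2).sum()
y.backward()
print(y.item())   # 14.0
print(x.grad)     # tensor([2., 4., 6.])

# 3. Modular API: a fully connected layer mapping 3 inputs to 1 output
layer = nn.Linear(in_features=3, out_features=1)
out = layer(x.detach())
print(out.shape)  # torch.Size([1])
```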

You can find more information about PyTorch in the Appendix.

Creating Tensors

# We can construct a tensor directly from common Python iterables,
# such as lists and tuples. Nested iterables can also be handled, as long
# as the dimensions are compatible.

# tensor from a list
a = torch.tensor([0, 1, 2])

# tensor from a tuple of tuples
b = ((1.0, 1.1), (1.2, 1.3))
b = torch.tensor(b)

# tensor from a numpy array
c = np.ones([2, 3])
c = torch.tensor(c)

print(f"Tensor a: {a}")
print(f"Tensor b: {b}")
print(f"Tensor c: {c}")
Tensor a: tensor([0, 1, 2])
Tensor b: tensor([[1.0000, 1.1000],
        [1.2000, 1.3000]])
Tensor c: tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
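The outputs above also show that `torch.tensor` infers the data type from its input: Python integers become `torch.int64`, Python floats become the default `torch.float32`, and a float64 NumPy array stays `torch.float64`. The sketch below checks this, requests a dtype explicitly, and shows what happens when nested lists are not rectangular (the ragged list is our own example).

```python
import numpy as np
import torch

a = torch.tensor([0, 1, 2])                  # Python ints
b = torch.tensor(((1.0, 1.1), (1.2, 1.3)))   # Python floats
c = torch.tensor(np.ones([2, 3]))            # NumPy float64 array
print(a.dtype, b.dtype, c.dtype)  # torch.int64 torch.float32 torch.float64

# The dtype can also be requested explicitly
d = torch.tensor([0, 1, 2], dtype=torch.float32)
print(d)  # tensor([0., 1., 2.])

# Nested iterables must be rectangular; a ragged list raises an error
try:
    torch.tensor([[1, 2], [3]])
    ragged_ok = True
except ValueError as err:
    ragged_ok = False
    print("ValueError:", err)
```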

Some common tensor constructors:

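# The numerical arguments we pass to these constructors
# determine the shape of the output tensor.
# Note: torch.empty only allocates memory without initializing it,
# so z below holds arbitrary leftover values (which can include nan).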

x = torch.ones(5, 3)
y = torch.zeros(2)
z = torch.empty(1, 1, 5)
print(f"Tensor x: {x}")
print(f"Tensor y: {y}")
print(f"Tensor z: {z}")
Tensor x: tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
Tensor y: tensor([0., 0.])
Tensor z: tensor([[[1.7433e-35, 0.0000e+00, 3.3631e-44, 0.0000e+00,        nan]]])
# There are also constructors for random numbers

# Uniform distribution
a = torch.rand(1, 3)

# Normal distribution
b = torch.randn(3, 4)

# There are also "_like" constructors, such as zeros_like and rand_like,
# which build a tensor like the constructors above, but with the same
# shape as another tensor.

c = torch.zeros_like(a)
d = torch.rand_like(c)

print(f"Tensor a: {a}")
print(f"Tensor b: {b}")
print(f"Tensor c: {c}")
print(f"Tensor d: {d}")
Tensor a: tensor([[0.3057, 0.2544, 0.9737]])
Tensor b: tensor([[ 1.7424, -0.7077, -0.7339, -0.0538],
        [-0.5136, -0.7716,  0.0846,  0.6274],
        [-0.5731,  0.4269, -2.5037,  0.6730]])
Tensor c: tensor([[0., 0., 0.]])
Tensor d: tensor([[0.5056, 0.3271, 0.4315]])
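The `_like` constructors copy more than the shape: by default the new tensor also inherits the dtype (and device) of the input, and the dtype can be overridden with the `dtype` argument. A minimal sketch, using an integer tensor of our own choosing:

```python
import torch

i = torch.tensor([[1, 2, 3]])                         # int64, shape (1, 3)

ones_int = torch.ones_like(i)                         # inherits int64
ones_float = torch.ones_like(i, dtype=torch.float32)  # overrides the dtype
zeros_same = torch.zeros_like(i)                      # same shape as i

print(ones_int)          # tensor([[1, 1, 1]])
print(ones_float)        # tensor([[1., 1., 1.]])
print(zeros_same.shape)  # torch.Size([1, 3])
```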


  • PyTorch Random Number Generator (RNG): You can use torch.manual_seed() to seed the RNG for all devices (both CPU and GPU):

import torch
torch.manual_seed(0)

  • For custom operators, you might need to set the Python seed as well:

import random
random.seed(0)

  • Random number generators in other libraries (e.g., NumPy):

import numpy as np
np.random.seed(0)

Here, we define a function called set_seed that does all of this for you!

def set_seed(seed=None, seed_torch=True):
  """
  Function that controls randomness. NumPy and random modules must be imported.

  Args:
    seed : Integer
      A non-negative integer that defines the random state. Default is `None`.
    seed_torch : Boolean
      If `True` sets the random seed for pytorch tensors, so pytorch module
      must be imported. Default is `True`.

  Returns:
    Nothing.
  """
  if seed is None:
    # int() so that random.seed also accepts it (it rejects NumPy integers)
    seed = int(np.random.choice(2 ** 32))
  random.seed(seed)
  np.random.seed(seed)
  if seed_torch:
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

  print(f'Random seed {seed} has been set.')

Now, let’s use the set_seed function in the previous example. Execute the cell multiple times to verify that the numbers printed are always the same.

def simplefun(seed=True, my_seed=None):
  """
  Helper function to verify the effectiveness of the set_seed function

  Args:
    seed: Boolean
      Specifies if seed value is provided or not
    my_seed: Integer
      Initializes seed to specified value

  Returns:
    Nothing
  """
  if seed:
    set_seed(seed=my_seed)

  # uniform distribution
  a = torch.rand(1, 3)
  # normal distribution
  b = torch.randn(3, 4)

  print("Tensor a: ", a)
  print("Tensor b: ", b)
simplefun(seed=True, my_seed=0)  # Turn `seed` to `False` or change `my_seed`
Random seed 0 has been set.
Tensor a:  tensor([[0.4963, 0.7682, 0.0885]])
Tensor b:  tensor([[ 0.3643,  0.1344,  0.1642,  0.3058],
        [ 0.2100,  0.9056,  0.6035,  0.8110],
        [-0.0451,  0.8797,  1.0482, -0.0445]])
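Instead of re-running the cell by hand, you can also check reproducibility programmatically. The sketch below seeds PyTorch's global RNG with `torch.manual_seed`, draws some numbers, reseeds, and compares. It also shows a local `torch.Generator`, which keeps its own random state separate from the global one. The helper `draw` is a hypothetical function for illustration only.

```python
import torch

def draw(seed):
    """Hypothetical helper: seed the global RNG, then draw two tensors."""
    torch.manual_seed(seed)
    return torch.rand(1, 3), torch.randn(3, 4)

a1, b1 = draw(0)
a2, b2 = draw(0)
a3, _ = draw(1)

same_seed_match = torch.equal(a1, a2) and torch.equal(b1, b2)
diff_seed_match = torch.equal(a1, a3)
print(same_seed_match)  # True: same seed, same numbers
print(diff_seed_match)  # False: different seed, different numbers

# A torch.Generator keeps its own RNG state, separate from the global one
g1 = torch.Generator().manual_seed(42)
g2 = torch.Generator().manual_seed(42)
local_match = torch.equal(torch.rand(3, generator=g1), torch.rand(3, generator=g2))
print(local_match)      # True
```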

NumPy-like number ranges:

The .arange() and .linspace() functions behave as you would expect if you are familiar with NumPy. Note that the number of points is called steps in torch.linspace but num in np.linspace.

a = torch.arange(0, 10, step=1)
b = np.arange(0, 10, step=1)

c = torch.linspace(0, 5, steps=11)
d = np.linspace(0, 5, num=11)

print(f"Tensor a: {a}\n")
print(f"Numpy array b: {b}\n")
print(f"Tensor c: {c}\n")
print(f"Numpy array d: {d}\n")
Tensor a: tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Numpy array b: [0 1 2 3 4 5 6 7 8 9]

Tensor c: tensor([0.0000, 0.5000, 1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000, 4.0000,
        4.5000, 5.0000])

Numpy array d: [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5. ]
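One detail worth keeping in mind: as in NumPy, `torch.arange` excludes the end value while `torch.linspace` includes it, and a fractional step makes `arange` return floats. The values below are our own example.

```python
import torch

a = torch.arange(0, 1, 0.25)       # end value 1 is excluded
b = torch.linspace(0, 1, steps=5)  # both endpoints are included

print(a)  # tensor([0.0000, 0.2500, 0.5000, 0.7500])
print(b)  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```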

Operations in PyTorch

Tensor-Tensor operations

We can perform operations on tensors using methods under torch.

a = torch.ones(5, 3)
b = torch.rand(5, 3)
c = torch.empty(5, 3)
d = torch.empty(5, 3)

# this only works if c and d already exist
torch.add(a, b, out=c)
print(c)

# Pointwise multiplication of a and b
torch.multiply(a, b, out=d)
print(d)

tensor([[1.0362, 1.1852, 1.3734],
        [1.3051, 1.9320, 1.1759],
        [1.2698, 1.1507, 1.0317],
        [1.2081, 1.9298, 1.7231],
        [1.7423, 1.5263, 1.2437]])
tensor([[0.0362, 0.1852, 0.3734],
        [0.3051, 0.9320, 0.1759],
        [0.2698, 0.1507, 0.0317],
        [0.2081, 0.9298, 0.7231],
        [0.7423, 0.5263, 0.2437]])
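The `out=` argument writes into a tensor that already exists. Two other common patterns are to use the returned tensor, or to call an in-place method; in PyTorch, in-place methods end in an underscore, such as `add_()`. A minimal comparison on small tensors of our own:

```python
import torch

a = torch.ones(2, 2)
b = torch.full((2, 2), 0.5)

# 1. Use the returned tensor (a new tensor is allocated)
c = torch.add(a, b)

# 2. Write into a pre-allocated tensor with out=
d = torch.empty(2, 2)
torch.add(a, b, out=d)

# 3. Modify a in place; the trailing underscore marks in-place methods
a.add_(b)

print(c)
print(d)
print(a)  # all three now hold 1.5 everywhere
```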

In PyTorch, most common Python operators are also overloaded for tensors: the standard arithmetic operators (\(+\), \(-\), \(*\), \(/\), and \(**\)) have all been lifted to elementwise operations.

x = torch.tensor([1, 2, 4, 8])
y = torch.tensor([1, 2, 3, 4])
x + y, x - y, x * y, x / y, x**y  # The `**` is the exponentiation operator
(tensor([ 2,  4,  7, 12]),
 tensor([0, 0, 1, 4]),
 tensor([ 1,  4, 12, 32]),
 tensor([1.0000, 1.0000, 1.3333, 2.0000]),
 tensor([   1,    4,   64, 4096]))
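Comparison operators are lifted to elementwise operations in the same way and return boolean tensors. Note also that `/` on the integer tensors above performed true division and returned floats; floor division is available with `//`. Using the same `x` and `y`:

```python
import torch

x = torch.tensor([1, 2, 4, 8])
y = torch.tensor([1, 2, 3, 4])

print(x == y)         # tensor([ True,  True, False, False])
print(x > y)          # tensor([False, False,  True,  True])
print(x // y)         # tensor([1, 1, 1, 2])
print((x / y).dtype)  # torch.float32
```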

Tensor Methods

Tensors also have a number of common arithmetic operations built in.

All of these operations should have similar syntax to their numpy equivalents.

x = torch.rand(3, 3)
print(f"{x}\n")

# sum() - note the axis is the axis you move across when summing
print(f"Sum of every element of x: {x.sum()}")
print(f"Sum of the columns of x: {x.sum(axis=0)}")
print(f"Sum of the rows of x: {x.sum(axis=1)}\n")

print(f"Mean value of all elements of x {x.mean()}")
print(f"Mean values of the columns of x {x.mean(axis=0)}")
print(f"Mean values of the rows of x {x.mean(axis=1)}")
tensor([[0.5846, 0.0332, 0.1387],
        [0.2422, 0.8155, 0.7932],
        [0.2783, 0.4820, 0.8198]])

Sum of every element of x: 4.187318325042725
Sum of the columns of x: tensor([1.1051, 1.3306, 1.7517])
Sum of the rows of x: tensor([0.7565, 1.8509, 1.5800])

Mean value of all elements of x 0.46525758504867554
Mean values of the columns of x tensor([0.3684, 0.4435, 0.5839])
Mean values of the rows of x tensor([0.2522, 0.6170, 0.5267])
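Reductions such as `sum()` and `mean()` remove the dimension they reduce over. Passing `keepdim=True` keeps it as a singleton dimension, which is handy when the result is combined with the original tensor: below, PyTorch broadcasts the (2, 1) column of row sums across each row to normalize it. A sketch with a fixed tensor of our own, so the numbers are easy to check:

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

print(x.sum(dim=0))                      # tensor([5., 7., 9.]), one value per column
print(x.sum(dim=1))                      # tensor([ 6., 15.]), one value per row
print(x.sum(dim=1, keepdim=True).shape)  # torch.Size([2, 1])

# keepdim=True lets the row sums line up with x for elementwise division
row_normalized = x / x.sum(dim=1, keepdim=True)
print(row_normalized.sum(dim=1))         # each row now sums to (approximately) 1
```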

Matrix Operations

The @ symbol is overridden to represent matrix multiplication. You can also use torch.matmul() to multiply tensors. For dot products, you can use torch.dot(), or manipulate the axes of your tensors and do matrix multiplication (we will cover that in the next section).

Transposes of 2D tensors are obtained using torch.t() or Tensor.T. Note the lack of brackets for Tensor.T - it is an attribute, not a method.
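A short sketch of the equivalent spellings mentioned above, using small tensors of our own: `@` and `torch.matmul()` give the same matrix product, `.T` and `torch.t()` give the same transpose, and a dot product can be computed with `torch.dot()`, with an elementwise product followed by a sum, or by reshaping the vectors into a row and a column and using matrix multiplication.

```python
import torch

M = torch.tensor([[1, 2],
                  [3, 4]])
N = torch.tensor([[5, 6],
                  [7, 8]])

print(torch.equal(M @ N, torch.matmul(M, N)))  # True
print(torch.equal(M.T, torch.t(M)))            # True

u = torch.tensor([1, 2, 3])
v = torch.tensor([4, 5, 6])
print(torch.dot(u, v))                # tensor(32)
print((u * v).sum())                  # tensor(32): elementwise product, then sum
print(u.reshape(1, 3) @ v.reshape(3, 1))  # tensor([[32]]): row times column
```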

Below are two expressions involving operations on matrices.

\begin{equation} \textbf{A} = \begin{bmatrix}2 & 4 \\ 5 & 7 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix} + \begin{bmatrix}10 & 10 \\ 12 & 1 \end{bmatrix} \end{equation}

\begin{equation} b = \begin{bmatrix} 3 \\ 5 \\ 7 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 4 \\ 8 \end{bmatrix} \end{equation}
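Working both expressions out by hand gives the values to expect from the code that computes them:

\begin{equation} \textbf{A} = \begin{bmatrix} 2\cdot1 + 4\cdot2 & 2\cdot1 + 4\cdot3 \\ 5\cdot1 + 7\cdot2 & 5\cdot1 + 7\cdot3 \end{bmatrix} + \begin{bmatrix}10 & 10 \\ 12 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 14 \\ 19 & 26 \end{bmatrix} + \begin{bmatrix}10 & 10 \\ 12 & 1 \end{bmatrix} = \begin{bmatrix} 20 & 24 \\ 31 & 27 \end{bmatrix} \end{equation}

\begin{equation} b = 3\cdot2 + 5\cdot4 + 7\cdot8 = 6 + 20 + 56 = 82 \end{equation}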

The code block below computes these expressions using PyTorch.

def simple_operations(a1: torch.Tensor, a2: torch.Tensor, a3: torch.Tensor):
  """
  Helper function to demonstrate simple operations,
  i.e., matrix multiplication of tensor a1 with tensor a2, then addition of tensor a3

  Args:
    a1: Torch tensor
      Tensor of size ([2,2])
    a2: Torch tensor
      Tensor of size ([2,2])
    a3: Torch tensor
      Tensor of size ([2,2])

  Returns:
    answer: Torch tensor
      Tensor of size ([2,2]) resulting from a1 matrix-multiplied with a2, plus a3
  """
  result = a1 @ a2 + a3
  return result

# init our tensors
a1 = torch.tensor([[2, 4], [5, 7]])
a2 = torch.tensor([[1, 1], [2, 3]])
a3 = torch.tensor([[10, 10], [12, 1]])
A = simple_operations(a1, a2, a3)
print(A)
tensor([[20, 24],
        [31, 27]])
def dot_product(b1: torch.Tensor, b2: torch.Tensor):
  """
  Helper function to demonstrate dot product operation
  Dot product is an algebraic operation that takes two equal-length sequences
  (usually coordinate vectors), and returns a single number.
  Geometrically, it is the product of the Euclidean magnitudes of the
  two vectors and the cosine of the angle between them.

  Args:
    b1: Torch tensor
      Tensor of size ([3])
    b2: Torch tensor
      Tensor of size ([3])

  Returns:
    product: Tensor
      0-dimensional tensor (a scalar) holding the dot product of b1 and b2
  """
  # Use torch.dot() to compute the dot product of two 1D tensors
  product = torch.dot(b1, b2)
  return product

b1 = torch.tensor([3, 5, 7])
b2 = torch.tensor([2, 4, 8])
b = dot_product(b1, b2)
print(b)
tensor(82)

Manipulating Tensors in PyTorch


Just as in NumPy, elements in a tensor can be accessed by index. As in any NumPy array, the first element has index 0, and a range start:stop includes start but excludes stop, so it covers indices start to stop-1. We can access elements according to their position relative to the end of the tensor by using negative indices. Selecting a range of elements in this way is called slicing.

For example, [-1] selects the last element; [1:3] selects the second and the third elements, and [:-2] will select all elements excluding the last and second-to-last elements.

x = torch.arange(0, 10)
print(x)
print(x[-1])
print(x[1:3])
print(x[:-2])
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor(9)
tensor([1, 2])
tensor([0, 1, 2, 3, 4, 5, 6, 7])

When we have multidimensional tensors, indexing rules work the same way as NumPy.

# make a 5D tensor
x = torch.rand(1, 2, 3, 4, 5)

print(f" shape of x[0]:{x[0].shape}")
print(f" shape of x[0][0]:{x[0][0].shape}")
print(f" shape of x[0][0][0]:{x[0][0][0].shape}")
 shape of x[0]:torch.Size([2, 3, 4, 5])
 shape of x[0][0]:torch.Size([3, 4, 5])
 shape of x[0][0][0]:torch.Size([4, 5])
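Chained indexing such as `x[0][0][0]` can also be written as a single comma-separated index, and `:` or `...` (Ellipsis) select whole dimensions, as in NumPy. A sketch on a tensor with the same shape as above:

```python
import torch

x = torch.rand(1, 2, 3, 4, 5)

print(x[0, 0, 0].shape)  # torch.Size([4, 5]), same as x[0][0][0]
print(x[:, 1].shape)     # torch.Size([1, 3, 4, 5]): all of dim 0, index 1 of dim 1
print(x[..., 0].shape)   # torch.Size([1, 2, 3, 4]): index 0 of the last dim
print(torch.equal(x[0, 0, 0], x[0][0][0]))  # True
```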

Flatten and reshape

There are various methods for reshaping tensors. It is common to have to express 2D data in 1D format. Similarly, it is also common to have to reshape a 1D tensor into a 2D tensor. We can achieve this with the .flatten() and .reshape() methods.

z = torch.arange(12).reshape(6, 2)
print(f"Original z: \n {z}")

# 2D -> 1D
z = z.flatten()
print(f"Flattened z: \n {z}")

# and back to 2D
z = z.reshape(3, 4)
print(f"Reshaped (3x4) z: \n {z}")
Original z: 
 tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11]])
Flattened z: 
 tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Reshaped (3x4) z: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

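You will also see the .view() method used a lot to reshape tensors. The difference is subtle: .view() never copies data and only works when the new shape is compatible with how the tensor is laid out in memory, while .reshape() returns a view when it can and a copy otherwise. For now we will just use .reshape().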
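To see the difference between the two methods concretely, the sketch below uses a transposed tensor of our own (a transpose rearranges strides without copying data): `.view()` raises an error on it, while `.reshape()` quietly makes a copy. Both methods also accept `-1` for one dimension, which PyTorch infers from the others.

```python
import torch

z = torch.arange(12).reshape(3, 4)

# -1 lets PyTorch infer that dimension: 12 elements / 6 = 2
print(z.view(6, -1).shape)   # torch.Size([6, 2])

zt = z.T                     # transpose: same data, different strides
print(zt.is_contiguous())    # False

view_failed = False
try:
    zt.view(12)              # view cannot flatten this memory layout
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = zt.reshape(12)        # reshape copies when necessary
print(flat)  # tensor([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])
```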

Squeezing tensors

When processing batches of data, you will quite often be left with singleton dimensions, e.g., [1, 10] or [256, 1, 3]. These dimensions can easily mess up your matrix operations if you don't plan for them.

In order to compress tensors along their singleton dimensions we can use the .squeeze() method. We can use the .unsqueeze() method to do the opposite.

x = torch.randn(1, 10)
print(x.shape)

# printing the zeroth element of the tensor will not give us the first number!
print(f"x[0]: {x[0]}")
torch.Size([1, 10])
x[0]: tensor([-0.7391,  0.8027, -0.6817, -0.1335,  0.0658, -0.5919,  0.7670,  0.6899,
         0.3282,  0.5085])

Because of that pesky singleton dimension, x[0] gave us the first row instead!

# Let's get rid of that singleton dimension and see what happens now
x = x.squeeze(0)
print(f"x[0]: {x[0]}")
x[0]: -0.7390837073326111
# Adding singleton dimensions works in a similar way, and is often used when
# tensors being added need the same number of dimensions

y = torch.randn(5, 5)
print(f"Shape of y: {y.shape}")

# let's insert a singleton dimension
y = y.unsqueeze(1)
print(f"Shape of y: {y.shape}")
Shape of y: torch.Size([5, 5])
Shape of y: torch.Size([5, 1, 5])
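Called without an argument, `.squeeze()` removes every singleton dimension at once; called with a dimension, it removes only that one, and leaves the tensor unchanged if that dimension is not of size 1. Negative indices work with `.unsqueeze()` too. A quick shape check on tensors of our own:

```python
import torch

t = torch.zeros(1, 3, 1, 5)
print(t.squeeze().shape)   # torch.Size([3, 5]): all singleton dims removed
print(t.squeeze(2).shape)  # torch.Size([1, 3, 5]): only dim 2 removed
print(t.squeeze(1).shape)  # torch.Size([1, 3, 1, 5]): dim 1 has size 3, no change

v = torch.zeros(5)
print(v.unsqueeze(0).shape)   # torch.Size([1, 5]): new leading dim
print(v.unsqueeze(-1).shape)  # torch.Size([5, 1]): new trailing dim
```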


Sometimes our dimensions will be in the wrong order! For example, we may be dealing with RGB images with dim \([3\times48\times64]\), but our pipeline expects the colour dimension to be the last dimension, i.e., \([48\times64\times3]\). To get around this we can use the .permute() method.

# `x` has dimensions [color, image_height, image_width]
x = torch.rand(3, 48, 64)

# We want to permute our tensor to be [image_height, image_width, color]
x = x.permute(1, 2, 0)
# permute(1, 2, 0) means:
# The 0th dim of my new tensor = the 1st dim of my old tensor
# The 1st dim of my new tensor = the 2nd dim of my old tensor
# The 2nd dim of my new tensor = the 0th dim of my old tensor
print(x.shape)
torch.Size([48, 64, 3])

You may also see .transpose() used. This works in a similar way as permute, but can only swap two dimensions at once.
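For example, the same [3×48×64] image tensor can be rearranged with either method. Because `.transpose()` swaps exactly two dimensions, reaching the [48×64×3] layout takes two calls here, while `.permute()` needs one. A sketch:

```python
import torch

x = torch.rand(3, 48, 64)              # [color, height, width]

p = x.permute(1, 2, 0)                 # one call, any reordering
t = x.transpose(0, 2).transpose(0, 1)  # two swaps: [64, 48, 3], then [48, 64, 3]

print(p.shape, t.shape)   # torch.Size([48, 64, 3]) torch.Size([48, 64, 3])
print(torch.equal(p, t))  # True
```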


In this example, we concatenate two matrices along rows (axis 0, the first element of the shape) vs. columns (axis 1, the second element of the shape). We can see that the first output tensor’s axis-0 length (6) is the sum of the two input tensors’ axis-0 lengths (3+3); while the second output tensor’s axis-1 length (8) is the sum of the two input tensors’ axis-1 lengths (4+4).

# Create two tensors of the same shape
x = torch.arange(12, dtype=torch.float32).reshape((3, 4))
y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

# Concatenate along rows
cat_rows = torch.cat((x, y), dim=0)

# Concatenate along columns
cat_cols = torch.cat((x, y), dim=1)

# Printing outputs
print('Concatenated by rows: shape{} \n {}'.format(list(cat_rows.shape), cat_rows))
print('\n Concatenated by columns: shape{}  \n {}'.format(list(cat_cols.shape), cat_cols))
Concatenated by rows: shape[6, 4] 
 tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 4.,  3.,  2.,  1.]])

 Concatenated by columns: shape[3, 8]  
 tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]])
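`torch.cat` only requires the tensors to agree on every dimension except the one being concatenated. A sketch with shapes of our own: a [2, 4] tensor can be joined under a [3, 4] one along `dim=0`, but not placed beside it along `dim=1`, since their row counts differ.

```python
import torch

x = torch.zeros(3, 4)
y = torch.ones(2, 4)

rows = torch.cat((x, y), dim=0)  # 3 + 2 rows, both have 4 columns
print(rows.shape)                # torch.Size([5, 4])

cols_failed = False
try:
    torch.cat((x, y), dim=1)     # row counts differ (3 vs 2), so this fails
except RuntimeError as err:
    cols_failed = True
    print("cat failed:", err)
```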

Conversion to Other Python Objects

Converting a tensor to a numpy.ndarray, or vice versa, is easy. Be aware that the result may share memory with the original: for a CPU tensor, .numpy() returns an array backed by the same memory, and torch.from_numpy() does the same in the other direction, so an in-place change to one is visible in the other. torch.tensor(), as used below, always copies the data.

When converting to a NumPy array, the information being tracked by the tensor will be lost, i.e., the computational graph. This will be covered in detail when you are introduced to autograd tomorrow!
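A minimal check of how memory behaves during conversion, using small CPU tensors of our own: `.numpy()` and `torch.from_numpy()` share memory, so an in-place change on one side shows up on the other, whereas `torch.tensor()` copies. For a tensor that records gradients, `.numpy()` raises an error, so you first call `.detach()` to take it out of the computational graph.

```python
import numpy as np
import torch

x = torch.zeros(3)
y = x.numpy()            # shares memory with x
x.add_(1)                # in-place change to the tensor...
print(y)                 # [1. 1. 1.]  ...is visible in the array

a = np.zeros(3)
b = torch.from_numpy(a)  # shares memory with a
c = torch.tensor(a)      # independent copy
a[0] = 5.0
print(b[0].item(), c[0].item())  # 5.0 0.0

w = torch.ones(2, requires_grad=True)
w_np = w.detach().numpy()  # w.numpy() would raise a RuntimeError here
print(w_np)                # [1. 1.]
```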

x = torch.randn(5)
print(f"x: {x}  |  x type:  {x.type()}")

y = x.numpy()
print(f"y: {y}  |  y type:  {type(y)}")

z = torch.tensor(y)
print(f"z: {z}  |  z type:  {z.type()}")
x: tensor([ 0.2659, -0.5148, -0.0613,  0.5046,  0.1385])  |  x type:  torch.FloatTensor
y: [ 0.26593232 -0.5148316  -0.06128114  0.5046449   0.13848118]  |  y type:  <class 'numpy.ndarray'>
z: tensor([ 0.2659, -0.5148, -0.0613,  0.5046,  0.1385])  |  z type:  torch.FloatTensor

To convert a size-1 tensor to a Python scalar, we can call the .item() method or use Python's built-in functions, such as float() and int().

a = torch.tensor([3.5])
a, a.item(), float(a), int(a)
(tensor([3.5000]), 3.5, 3.5, 3)
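`.item()` only works on tensors with exactly one element; for larger tensors, `.tolist()` converts to (nested) Python lists instead. A sketch with tensors of our own:

```python
import torch

v = torch.tensor([1.5, 2.5])
print(v.tolist())  # [1.5, 2.5]

item_failed = False
try:
    v.item()       # two elements cannot become a single scalar
except RuntimeError as err:
    item_failed = True
    print("item failed:", err)

m = torch.tensor([[1, 2], [3, 4]])
print(m.tolist())  # [[1, 2], [3, 4]]
```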


By default, when we create a tensor it will not live on the GPU!

x = torch.randn(10)
print(x.device)
cpu

When using Colab notebooks, you will not have access to a GPU by default. In order to start using GPUs we need to request one, which we can do from the Runtime menu at the top of the page.

By following Runtime > Change runtime type and selecting GPU from the Hardware Accelerator dropdown list, we can start playing with sending tensors to GPUs.

Once you have done this, your runtime will restart and you will need to rerun the first setup cell to reimport PyTorch. Then proceed to the next cell.

Now we have a GPU.

The cell below should print True.

print(torch.cuda.is_available())


CUDA is an API developed by Nvidia for interfacing with GPUs. PyTorch provides us with a layer of abstraction, and allows us to launch CUDA kernels using pure Python.

In short, we get the power of parallelizing our tensor computations on GPUs, whilst only writing (relatively) simple Python!

Here, we define the function set_device, which returns the device to use in the notebook, i.e., cpu or cuda. Unless otherwise specified, we call this function at the top of every tutorial and store the result in a DEVICE variable:

DEVICE = set_device()

Let’s define the function using the PyTorch package torch.cuda, which is lazily initialized, so we can always import it, and use is_available() to determine if our system supports CUDA.

def set_device():
  """
  Set the device. CUDA if available, CPU otherwise.

  Returns:
    device: String
      "cuda" if a GPU is available, "cpu" otherwise
  """
  device = "cuda" if torch.cuda.is_available() else "cpu"
  if device != "cuda":
    print("GPU is not enabled in this notebook. \n"
          "If you want to enable it, in the menu under `Runtime` -> \n"
          "`Hardware accelerator.` and select `GPU` from the dropdown menu")
  else:
    print("GPU is enabled in this notebook. \n"
          "If you want to disable it, in the menu under `Runtime` -> \n"
          "`Hardware accelerator.` and select `None` from the dropdown menu")

  return device

Let’s make some CUDA tensors!

# common device agnostic way of writing code that can run on cpu OR gpu
# that we provide for you in each of the tutorials
DEVICE = set_device()

# we can specify a device when we first create our tensor
x = torch.randn(2, 2, device=DEVICE)

# we can also use the .to() method to change the device a tensor lives on
y = torch.randn(2, 2)
print(f"y before calling to() | device: {y.device} | dtype: {y.type()}")

y = y.to(DEVICE)
print(f"y after calling to() | device: {y.device} | dtype: {y.type()}")
GPU is enabled in this notebook. 
If you want to disable it, in the menu under `Runtime` -> 
`Hardware accelerator.` and select `None` from the dropdown menu
y before calling to() | device: cpu | dtype: torch.FloatTensor
y after calling to() | device: cuda:0 | dtype: torch.cuda.FloatTensor

Operations between cpu tensors and cuda tensors

Note that the type of the tensor changed after calling .to(). What happens if we try to perform operations on tensors that live on different devices?

We cannot combine CUDA tensors and CPU tensors in this fashion: PyTorch raises an error. If we want to compute an operation that combines tensors on different devices, we need to move them first! We can use the .to() method as before, or the .cpu() and .cuda() methods. Note that .cuda() will throw an error if CUDA is not available on your machine.

Generally, in this course, all deep learning is done on the GPU, while other computation, such as working with NumPy arrays, is done on the CPU. So sometimes we have to pass tensors back and forth, and you'll see us call .to() often.

x = torch.tensor([0, 1, 2], device=DEVICE)
y = torch.tensor([3, 4, 5], device="cpu")
z = torch.tensor([6, 7, 8], device=DEVICE)

# moving to cpu
x = x.to("cpu")  # alternatively, you can use x = x.cpu()
print(x + y)

# moving to gpu
y = y.to(DEVICE)  # alternatively, you can use y = y.cuda()
print(y + z)
tensor([3, 5, 7])
tensor([ 9, 11, 13], device='cuda:0')
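The example above is written for a GPU runtime. The sketch below, with tensors of our own, is device-agnostic: it moves one operand to the other's device with `.to(a.device)`, demonstrates the mixed-device error only when CUDA is actually available, and moves the result back with `.cpu()` before converting to NumPy.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.tensor([1, 2, 3], device=device)
b = torch.tensor([10, 20, 30])      # created on the CPU

if device == "cuda":
    try:
        a + b                       # mixing CUDA and CPU tensors fails
    except RuntimeError as err:
        print("mixed devices:", err)

total = a + b.to(a.device)          # move b to wherever a lives
print(total.device)

result = total.cpu().numpy()        # NumPy needs a CPU tensor
print(result)                       # [11 22 33]
```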


Code adapted from the Deep Learning Summer School offered by Neuromatch Academy