This chapter describes basic operations in TensorFlow. It assumes familiarity with:

- Basic Python operations (assignments, branch & loop statements, importing libraries)
- NumPy, a commonly used Python library for scientific computing; TensorFlow 2.X is closely integrated with NumPy
- Gradient descent, which searches for local minima of a function
In the beginning, we can simply regard TensorFlow as a library for scientific computing (like NumPy in Python).
First, let us import TensorFlow:
import tensorflow as tf
This handbook is based on the Eager Execution mode of TensorFlow. In TensorFlow 1.X, you MUST run
tf.enable_eager_execution() after importing TensorFlow to enable Eager Execution mode. In TensorFlow 2.X, Eager Execution is the default, so you do not need to run
tf.enable_eager_execution(). (However, if you want to disable it, you should run tf.compat.v1.disable_eager_execution().)
TensorFlow uses tensors as its basic elements of data. Tensors in TensorFlow are conceptually equivalent to multidimensional arrays. We can use them to describe scalars, vectors, matrices and so on. Here are some examples:
# Declare a random float (scalar).
random_float = tf.random.uniform(shape=())
# Declare a zero vector with two elements.
zero_vector = tf.zeros(shape=(2))
# Declare two 2*2 constant matrices A and B.
A = tf.constant([[1., 2.], [3., 4.]])
B = tf.constant([[5., 6.], [7., 8.]])
A tensor has three important attributes: shape, data type and value. You can use the
shape attribute, the
dtype attribute and the
numpy() method to fetch them. For example:
# View the shape, type and value of matrix A.
print(A.shape)    # Output (2, 2): the numbers of rows and columns are both 2.
print(A.dtype)    # Output <dtype: 'float32'>.
print(A.numpy())  # Output [[1. 2.]
                  #         [3. 4.]].
Most of the TensorFlow API functions infer the data type automatically from the input (
tf.float32 in most cases). However, you can pass the parameter
dtype to assign the data type manually. For example,
zero_vector = tf.zeros(shape=(2), dtype=tf.int32) returns a tensor with all elements of type
tf.int32. The numpy() method of a tensor returns a NumPy array whose value is equal to the value of the tensor.
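As a small sketch combining the two points above (the dtype parameter and the numpy() method):

```python
import numpy as np
import tensorflow as tf

# Request 32-bit integers explicitly instead of the inferred float32.
int_vector = tf.zeros(shape=(2,), dtype=tf.int32)
print(int_vector.dtype)  # <dtype: 'int32'>

# numpy() returns an equivalent NumPy array.
arr = int_vector.numpy()
print(type(arr))  # <class 'numpy.ndarray'>
```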
TensorFlow provides a large number of operations, so we can obtain new tensors as the results of operations on given tensors. For example:
C = tf.add(A, B)     # Compute the elementwise sum of A and B.
D = tf.matmul(A, B)  # Compute the matrix product of A and B.
After the operations, the values of C and D are:

tf.Tensor(
[[ 6.  8.]
 [10. 12.]], shape=(2, 2), dtype=float32)
tf.Tensor(
[[19. 22.]
 [43. 50.]], shape=(2, 2), dtype=float32)
So we can see that we have successfully used
tf.add() to compute A + B, and have used
tf.matmul() to compute the matrix product AB.
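As a side note (standard TensorFlow behavior, not stated above): the Python operators + and @ are overloaded to these same two operations, which we can check:

```python
import tensorflow as tf

A = tf.constant([[1., 2.], [3., 4.]])
B = tf.constant([[5., 6.], [7., 8.]])

# + is elementwise addition, equivalent to tf.add();
# @ is matrix multiplication, equivalent to tf.matmul().
same_sum = tf.reduce_all(A + B == tf.add(A, B))
same_product = tf.reduce_all(A @ B == tf.matmul(A, B))
print(same_sum.numpy(), same_product.numpy())  # True True
```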
Automatic differentiation mechanism
In machine learning, we often need to compute the derivatives of functions. TensorFlow provides the powerful automatic differentiation mechanism for computing derivatives. The following code shows how to use
tf.GradientTape() to compute the derivative of the function y(x) = x^2 at x = 3:
import tensorflow as tf

x = tf.Variable(initial_value=3.)
with tf.GradientTape() as tape:  # All calculation steps within the context of tf.GradientTape() are recorded for differentiation.
    y = tf.square(x)
y_grad = tape.gradient(y, x)     # Compute the derivative of y with respect to x.
print([y, y_grad])
[<tf.Tensor: shape=(), dtype=float32, numpy=9.0>, <tf.Tensor: shape=(), dtype=float32, numpy=6.0>]
Here x is a variable initialized to 3, declared by
tf.Variable(). Like an ordinary tensor, a variable also has three attributes: shape, data type and value. An initial value is required before using a variable, which can be specified by the parameter initial_value. Here
x is initialized to
3. 1 One significant difference between variables and ordinary tensors is that, by default, variables can be differentiated by TensorFlow's automatic differentiation mechanism, so they are often used to define the parameters of ML models.
tf.GradientTape() is an automatic differentiation recorder, in which variables and calculation steps are automatically recorded. In the previous example, the variable
x and the step
y = tf.square(x) were recorded automatically, thus the derivative of the tensor
y with respect to the variable
x can be obtained by
y_grad = tape.gradient(y, x).
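By default the tape records operations on variables only. If you want to differentiate with respect to an ordinary (constant) tensor, you can ask the tape to watch it explicitly; a minimal sketch:

```python
import tensorflow as tf

x = tf.constant(3.)  # an ordinary tensor, not a tf.Variable
with tf.GradientTape() as tape:
    tape.watch(x)    # explicitly ask the tape to record x
    y = tf.square(x)
y_grad = tape.gradient(y, x)  # dy/dx = 2x, i.e., 6 at x = 3
print(y_grad.numpy())  # 6.0
```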
The more common cases in machine learning are partial differentiation of multivariable functions and differentiation with respect to vectors and matrices. TensorFlow can handle these as well. The following code shows how to use
tf.GradientTape() to compute the partial derivatives of the function L(w, b) = 1/2 * ||Xw + b - y||^2 with respect to w and b, where X = [[1, 2], [3, 4]] and y = [[1], [2]].
X = tf.constant([[1., 2.], [3., 4.]])
y = tf.constant([[1.], [2.]])
w = tf.Variable(initial_value=[[1.], [2.]])
b = tf.Variable(initial_value=1.)
with tf.GradientTape() as tape:
    L = 0.5 * tf.reduce_sum(tf.square(tf.matmul(X, w) + b - y))
w_grad, b_grad = tape.gradient(L, [w, b])  # Compute the partial derivatives of L(w, b) with respect to w and b.
print([L.numpy(), w_grad.numpy(), b_grad.numpy()])
[62.5, array([[35.], [50.]], dtype=float32), 15.0]
tf.square() here squares each element of the input tensor without altering its shape.
tf.reduce_sum() sums up all the elements of the input tensor, outputting a scalar tensor with an empty shape (the dimensions to sum over can be specified by the parameter
axis; without it, all elements are summed by default). There are a large number of tensor operation APIs in TensorFlow, including mathematical operations, tensor shape operations (e.g.,
tf.reshape()), slicing and concatenation (e.g.,
tf.concat()), etc. Further information can be acquired by viewing the TensorFlow official API documentation 2.
From the output we can see that TensorFlow has helped us obtain L = 62.5, ∂L/∂w = [[35.], [50.]] and ∂L/∂b = 15.
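A few of the APIs just mentioned, sketched on the matrices from above (values re-declared so the snippet is self-contained):

```python
import tensorflow as tf

A = tf.constant([[1., 2.], [3., 4.]])
B = tf.constant([[5., 6.], [7., 8.]])

# reduce_sum without axis sums everything into a scalar;
# with axis=0 it sums each column instead.
total = tf.reduce_sum(A)             # 10.0, shape ()
col_sums = tf.reduce_sum(A, axis=0)  # [4. 6.], shape (2,)

# reshape keeps the elements but changes the shape.
flat = tf.reshape(A, (4,))           # [1. 2. 3. 4.]

# concat joins tensors along an existing axis.
stacked = tf.concat([A, B], axis=0)  # shape (4, 2)
print(total.numpy(), col_sums.numpy(), flat.shape, stacked.shape)
```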
A basic example: Linear regression
UFLDL Tutorial, Linear Regression.
Consider a practical problem. The estate prices of a city between 2013 and 2017 are listed below:

Year    2013    2014    2015    2016    2017
Price   12000   14000   15000   16500   17500
Now we wish to perform a linear regression on this data, that is, use the linear model y = ax + b to fit the data above, where a and
b are parameters yet to be determined.
First we define the data and conduct basic normalization.
import numpy as np

X_raw = np.array([2013, 2014, 2015, 2016, 2017], dtype=np.float32)
y_raw = np.array([12000, 14000, 15000, 16500, 17500], dtype=np.float32)

X = (X_raw - X_raw.min()) / (X_raw.max() - X_raw.min())
y = (y_raw - y_raw.min()) / (y_raw.max() - y_raw.min())
In the following steps we use gradient descent to find the parameters a and
b in the linear model y = ax + b 3.
Recall the basic knowledge of machine learning: to find a local minimum of a multivariable function f(x), the process of gradient descent is as follows:

- Initialize the independent variable to x_0, and set k = 0.
- Iterate the following steps until the convergence criterion is met:
  - Find the gradient ∇f(x_k) of the function with respect to the independent variable.
  - Update the independent variable: x_{k+1} = x_k - γ∇f(x_k), where γ is the learning rate (i.e. the "stride" of one gradient descent step).
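The steps above can be sketched in plain Python for a one-variable function; f(x) = (x - 2)^2 with gradient f'(x) = 2(x - 2) is an illustrative function chosen here, not one from the text:

```python
def gradient_descent(grad_fn, x0, learning_rate=0.1, num_iters=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(num_iters):
        x = x - learning_rate * grad_fn(x)  # x_{k+1} = x_k - gamma * grad f(x_k)
    return x

# f(x) = (x - 2)^2 has its minimum at x = 2; its gradient is 2 * (x - 2).
x_min = gradient_descent(lambda x: 2 * (x - 2), x0=0.0)
print(round(x_min, 6))  # converges to 2.0
```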
Next, we consider how to program the gradient descent method to find the linear regression solution, i.e., the parameters a and b that minimize the loss.
Linear regression under NumPy
Implementations of ML models are not exclusive to TensorFlow. In fact, simple models can be solved with regular scientific computing libraries. Here we use NumPy, a common scientific computing library, to implement gradient descent. NumPy provides support for multidimensional arrays, which can represent vectors, matrices and even higher-dimensional tensors. It also provides many functions that operate on multidimensional arrays (e.g., the following
np.dot() evaluates the dot product and
np.sum() computes the sum). NumPy and MATLAB are similar in this regard. In the following code, we find the partial derivatives of the loss function with respect to the parameters a and
b manually 4, and use gradient descent iteratively to obtain their values.
a, b = 0, 0
num_epoch = 10000
learning_rate = 1e-3
for e in range(num_epoch):
    # Compute the gradients of the loss function with respect to the independent variables (model parameters) manually.
    y_pred = a * X + b
    grad_a, grad_b = (y_pred - y).dot(X), (y_pred - y).sum()
    # Update parameters.
    a, b = a - learning_rate * grad_a, b - learning_rate * grad_b
print(a, b)
However, you may have already noticed two pain points of implementing ML models with conventional scientific computing libraries:

You often have to compute the partial derivatives with respect to the parameters yourself. This may be easy for simple functions, but the process becomes very painful or even infeasible once the functions grow complex.

You have to update the parameters yourself according to the computed derivatives. Here we used gradient descent, the most fundamental approach, so updating the parameters was not hard. However, the process becomes much more complicated if you use more advanced update approaches (e.g., Adam or Adagrad).
The emergence of DL frameworks such as TensorFlow has largely solved these problems and has brought considerable convenience for implementing ML models.
Linear regression under TensorFlow
The Eager Execution mode of TensorFlow 5 works quite similarly to NumPy above, while providing a series of features that are crucial for deep learning, such as faster computation (GPU support), automatic differentiation, optimizers, etc. The following shows how to use TensorFlow to compute linear regression. You may notice that the structure of the program is very similar to the previous NumPy implementation. Here TensorFlow accomplishes two crucial tasks for us:
- Using tape.gradient(ys, xs) to compute gradients automatically
- Using optimizer.apply_gradients(grads_and_vars) to update model parameters automatically
X = tf.constant(X)
y = tf.constant(y)

a = tf.Variable(initial_value=0.)
b = tf.Variable(initial_value=0.)
variables = [a, b]

num_epoch = 10000
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
for e in range(num_epoch):
    # Use tf.GradientTape() to record the gradient information of the loss function.
    with tf.GradientTape() as tape:
        y_pred = a * X + b
        loss = 0.5 * tf.reduce_sum(tf.square(y_pred - y))
    # TensorFlow computes the gradients of the loss function with respect to the independent variables (model parameters) automatically.
    grads = tape.gradient(loss, variables)
    # TensorFlow updates the parameters according to the gradients automatically.
    optimizer.apply_gradients(grads_and_vars=zip(grads, variables))
print(a, b)
Here we used tape.gradient() as before to compute the partial derivatives of the loss function with respect to the parameters. We also declared a gradient descent optimizer with a learning rate of 1e-3 via
tf.keras.optimizers.SGD(learning_rate=1e-3). The optimizer helps us update model parameters based on the computed derivatives, thereby minimizing a given loss function. Specifically, you call its method
apply_gradients() for doing so.
Notice that the method
optimizer.apply_gradients(), which updates the model parameters, needs the parameter
grads_and_vars: the variables to be updated (like
variables in the code above) paired with the partial derivatives of the loss function with respect to them (like
grads in the code above). Specifically, you pass in a Python iterable whose elements are
(partial derivative of the variable, variable) pairs, e.g.,
[(grad_a, a), (grad_b, b)] in this case. By
grads = tape.gradient(loss, variables) we found the partial derivatives of
loss with respect to each variable in
variables = [a, b] recorded by the tape, i.e.,
grads = [grad_a, grad_b]. Then we used the
zip() function in Python to assemble
grads = [grad_a, grad_b] and
variables = [a, b] into the pairs we needed.
zip() is a built-in function of Python. It can be confusing to describe in natural language, but an example makes it much more accessible: if
a = [1, 3, 5] and
b = [2, 4, 6], then
zip(a, b) gives [(1, 2), (3, 4), (5, 6)]. In other words, it "takes iterable objects as arguments, packs their corresponding elements into tuples and returns a list of these tuples". In Python 3, the
zip() function returns an object, which needs to be converted into a list by calling list() if you want to see the elements.
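A quick check of this behavior in plain Python:

```python
a = [1, 3, 5]
b = [2, 4, 6]

zipped = zip(a, b)    # in Python 3 this is a lazy zip object, not a list
pairs = list(zipped)  # convert it to a list to see the tuples
print(pairs)          # [(1, 2), (3, 4), (5, 6)]
```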
In practical applications, the models we write are usually much more complicated than the linear model
y_pred = a * X + b (whose parameters are
variables = [a, b]), which fits in a single line. Therefore, we usually write a model class and instantiate it as
model = Model(), then use
y_pred = model(X) to call it and use
model.variables to acquire the model parameters. Refer to the chapter "TensorFlow Models" for writing model classes.
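As a brief preview of that chapter, such a model class might look like the following sketch, which wraps the same linear model (the class name Linear is our own choice here):

```python
import tensorflow as tf

class Linear(tf.keras.Model):
    """A sketch of a model class wrapping y = a * X + b."""
    def __init__(self):
        super().__init__()
        self.a = tf.Variable(initial_value=0.)
        self.b = tf.Variable(initial_value=0.)

    def call(self, X):
        return self.a * X + self.b

model = Linear()
y_pred = model(tf.constant([1., 2., 3.]))  # the call() method runs the forward pass
print(len(model.variables))  # 2: the model collects a and b automatically
```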
In Python, an integer can be defined as a float by adding a period after it. E.g.,
3. means the float 3.0.
Refer to Tensor Transformations and Math. Notice that tensor operations in TensorFlow are quite similar in form to those of the popular Python scientific computing library NumPy. You can get started quickly if you are already familiar with the latter.
In fact, linear regression has a closed-form analytical solution. We used gradient descent here only to demonstrate how TensorFlow works.
The loss function here is the mean squared error L(a, b) = 1/2 * Σ_i (a x_i + b - y_i)^2, whose partial derivatives with respect to the parameters a and
b are ∂L/∂a = Σ_i (a x_i + b - y_i) x_i and ∂L/∂b = Σ_i (a x_i + b - y_i).
The opposite of the Eager Execution mode is the Graph Execution mode, which was the primary mode of TensorFlow before version 1.8, released in March 2018. In this handbook we focus on the Eager Execution mode for rapid iterative development, but we cover the Graph Execution mode in the appendix for readers who need it.