**Gradient Descent Algorithm (****for linear regression with multiple variables i.e. for multiple linear regression)**

Terminology (notation used):

$n$ - this is the number of features (the number of input variables)

$X_1, X_2, \dots, X_n$ - these are the features

$X_j$ - the $j$-th feature ($j = 1,2,\dots,n$)

$m$ - the size of the training set (the number of training examples)

$X_j^{(i)}$ - the $i$-th training value from the $j$-th feature ($i = 1,2,\dots,m$)

Let's assume that we have the following features and training examples.

# |
$X_1$ |
$X_2$ |
... |
$X_n$ |

1 |
2100 |
5 |
... |
80 |

2 |
1200 |
3 |
... |
45 |

3 |
800 |
2 |
... |
50 |

... |
... |
... |
... |
... |

m |
1500 |
2 |
... |
72 |

Here we have that
$X_1^{(3)} = 800$, $X_2^{(2)} = 3$, $X_n^{(3)} = 50$, $X_n^{(m)} = 72$

$\overrightarrow{X}^{(i)}$ - this is a row vector of all $i$-th training values (basically the $i$-th row from the table above), so this is the $i$-th training example

$\overrightarrow{X}^{(i)} = (X_1^{(i)}, X_2^{(i)}, \dots, X_n^{(i)})$, for $i=1,2,\dots,m$

**Note: **The training examples are the rows from the table above.

The linear regression model for a single feature was defined as follows.

$f_{w,b} = w \cdot x + b \tag{1}$

The linear regression model for multiple features is defined as follows.

$f_{w,b} = \sum_{j=1}^n w_j \cdot x_j + b \tag{2}$

We can rewrite this model in vector notation. Let us denote

$\overrightarrow {w} = (w_1, w_2, \dots, w_n)$ - this is a row vector of $n$ scalars (these are the weights assigned to the features)

$b$ - this is a scalar (a number)

$\overrightarrow {x} = (x_1, x_2, \dots, x_n)$ - this is a row vector of $n$ scalars (the input variables to our model)

Then for the multiple linear regression model we obtain

$f_{\overrightarrow{w},b} = \overrightarrow{w} \cdot \overrightarrow{x} + b \tag{3}$

Here $\overrightarrow{w}, \overrightarrow{b}$ are the parameters of the multiple linear regression model.

In this notation $\overrightarrow{w}\cdot \overrightarrow{x}$ is the dot product of the vectors $\overrightarrow{w}$ and $\overrightarrow{x}$ i.e.

$\overrightarrow{w} \cdot \overrightarrow{x} = \sum_{j=1}^n w_j \cdot x_j \tag{4}$

**Note: **This is called **multiple linear regression**, and not multivariate linear regression. The term multivariate linear regression in machine learning is used for something else.