Inverse of a matrix

All matrices we talk about here are $n \times n$ square matrices over a numeric field.  

In linear algebra, the definition of the concept of an invertible matrix is usually given this way.

Def.: A matrix $A$ is said to be invertible if there exists a matrix $B$ such that $AB=BA=I$, where $I$ is the identity matrix. In this case $B$ is called the inverse matrix of $A$.

But I somehow don't like this definition; it seems too strong to me, since it requires $B$ to be both a left inverse (meaning $BA=I$) and a right inverse (meaning $AB=I$) of the matrix $A$.

So let us try to introduce this concept in a somewhat different way.

Def.1: A matrix $A$ is said to be left invertible if there exists a matrix $B$ such that $BA=I$, where $I$ is the identity matrix. In this case $B$ is called a left inverse matrix of $A$.

Def.2: A matrix $A$ is said to be right invertible if there exists a matrix $B$ such that $AB=I$, where $I$ is the identity matrix. In this case $B$ is called a right inverse matrix of $A$.

We still don't know if left/right inverses of a matrix A exist and under what conditions. We also don't know if they are unique (in case they exist). 

Now we will prove a few statements to clarify all this. 

Th1: If a matrix $A$ is left (or right) invertible, then $\det (A) \ne 0$.

Proof: We know that $\det (XY) = \det(X) \cdot \det(Y)$.

If $A$ is left invertible, then there exists a matrix $B$ such that $BA = I$. But then $1 = \det(I) = \det(BA) = \det(B) \cdot \det(A)$, and now it follows that $\det(A) \ne 0$.

If $A$ is right invertible, then there exists a matrix $B$ such that $AB = I$. But then $1 = \det(I) = \det(AB) = \det(A) \cdot \det(B)$, and now again it follows that $\det(A) \ne 0$.

Th2: If $\det(A) \ne 0$ then A is left and right invertible. 

Proof: The proof here is done by construction. If $A=(a_{ij})$, we construct the matrix $S=(A_{ji})$, which is the transposed matrix of cofactors of $A$ (here $A_{ij}$ denotes the cofactor of the element $a_{ij}$). Then one easily shows (using previous theory from linear algebra) that the matrix $T = \frac{1}{\det(A)} \cdot S$ satisfies both $TA=I$ and $AT=I$.

So far we have proved that a matrix $A$ is left/right invertible if and only if $\det(A) \ne 0$. In the case when $\det(A) \ne 0$, we also showed how one left inverse and one right inverse can be constructed, i.e. we showed the existence of the left/right inverses (the matrix $T$ is both a left and a right inverse of $A$).

This construction of $T$ (from $A$) is important, so we will keep denoting this matrix by $T$ for the rest of this post.
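To make the construction of $T$ concrete, here is a minimal numerical sketch of it (assuming numpy is available; the function name cofactor_inverse, the sample matrix and the 0-based indexing are mine).

```python
import numpy as np

def cofactor_inverse(A):
    """Build T = (1/det(A)) * S, where S is the transposed matrix of cofactors of A (a sketch)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    det_A = np.linalg.det(A)
    if np.isclose(det_A, 0.0):
        raise ValueError("det(A) = 0, so A has no left or right inverse")
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            # the cofactor of a_ij goes to position (j, i), so S is already the transposed cofactor matrix
            S[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
    return S / det_A

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])          # det(A) = 1
T = cofactor_inverse(A)
print(np.allclose(T @ A, np.eye(2)), np.allclose(A @ T, np.eye(2)))   # True True
```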

Th3: For each matrix $A$ with $\det(A) \ne 0$ there is a unique left inverse and a unique right inverse and they are both equal to the above constructed matrix $T$.  

Proof: Let's assume that $BA=CA=I$ for some matrices $B,C$ - left inverses of $A$. 

Then $T=IT=(BA)T=B(AT)=BI = B$  

And also $T=IT=(CA)T=C(AT)=CI = C$  

OK, so it follows that $B=C$ (and both $B$ and $C$ are equal to that special matrix $T$). This proves the uniqueness of the left inverse.

Note that to prove the uniqueness of the left inverse we used the existence of the right inverse T of A. 

The uniqueness of the right inverse is proved in the same way.  

Let's assume that $AB'=AC'=I$ for some matrices $B',C'$ - right inverses of $A$. 

Then $T=TI=T(AB')=(TA)B'=IB' = B'$  

Also  $T=TI=T(AC')=(TA)C'=IC' = C'$  

So it follows that $B'=C'$ (and both $B'$ and $C'$ are equal to that special matrix $T$). This proves the uniqueness of the right inverse.

Note that to prove the uniqueness of the right inverse we used the existence of the left inverse T of A. 

We are done. Now we have everything introduced in a clear way. We proved that $A$ is left/right invertible if and only if its determinant is non-zero. And we proved that in that case (when the determinant is non-zero) the left/right inverses exist (the matrix $T$), and also that they are unique and coincide (both are equal to $T$).


Elementary row/column operations on matrices

The elementary row/column operations on matrices are:

1) Multiplying the $i$-th row (column) by a number $\lambda \ne 0$.

$R_i := \lambda \cdot R_i$

2) Adding the $j$-th row/column multiplied by a number $\lambda$ to the $i$-th row/column.

$R_i := R_i + \lambda \cdot R_j$

3) Exchanging the rows/columns $i$ and $j$.

$R = R_i$

$R_i = R_j$

$R_j = R$

The interesting thing is that for square matrices each of these operations can be accomplished by matrix multiplication.

Let us define: 

  • $E$ - the identity matrix of order $n \times n$.
  • $E_{ij}$ - the square matrix of order $n \times n$ which has an element $1$ at position $(i, j)$ and zeroes at all other positions.
  • $A$ - any square matrix of order $n \times n$.

Then one can show that:

(1) Multiplying the $i$-th row/column of $A$ by the number $\lambda \ne 0$ is accomplished by left/right multiplying $A$ with the matrix $A_i(\lambda) = E + (\lambda-1)E_{ii}$

(2A) Adding the $j$-th row multiplied by a number $\lambda$ to the $i$-th row is accomplished by left multiplying A with the matrix $B_{ij}(\lambda) = E + \lambda E_{ij}$

(2B) Adding the $j$-th column multiplied by a number $\lambda$ to the $i$-th column is accomplished by right multiplying A with the matrix $B_{ji}(\lambda) = E + \lambda E_{ji}$

(3) Exchanging the rows/columns $i$ and $j$ is accomplished by left/right multiplying A with the matrix $C_{ij} = E - E_{ii} - E_{jj} + E_{ij} + E_{ji}$

The matrices $A_i(\lambda), B_{ij}(\lambda), C_{ij}$ are usually called matrices of the elementary transformations.
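Here is a small numerical sketch of these facts (assuming numpy; the helper name E_unit, the 0-based indices and the sample values of $\lambda$, $i$, $j$ are mine).

```python
import numpy as np

def E_unit(n, i, j):
    """E_ij: the n x n matrix with 1 at position (i, j) and zeros elsewhere (0-based indices)."""
    M = np.zeros((n, n))
    M[i, j] = 1.0
    return M

n = 3
E = np.eye(n)
A = np.arange(1.0, 10.0).reshape(n, n)        # an arbitrary 3 x 3 matrix

A1 = E + (5 - 1) * E_unit(n, 1, 1)            # A_i(lambda) with i = 1, lambda = 5
B01 = E + 2 * E_unit(n, 0, 1)                 # B_ij(lambda) with i = 0, j = 1, lambda = 2
C02 = E - E_unit(n, 0, 0) - E_unit(n, 2, 2) + E_unit(n, 0, 2) + E_unit(n, 2, 0)   # C_ij with i = 0, j = 2

print(A1 @ A)    # row 1 of A multiplied by 5        (left multiplication acts on rows)
print(A @ A1)    # column 1 of A multiplied by 5     (right multiplication acts on columns)
print(B01 @ A)   # row 0 replaced by row 0 + 2 * row 1
print(C02 @ A)   # rows 0 and 2 exchanged
print(A @ C02)   # columns 0 and 2 exchanged
```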


Rank of a system of vectors

Let $V\ $ be a vector space and let

$$b_1, b_2, \dots, b_n \tag{1}$$ be a system (multiset) of vectors from $V$

(multiset because the vectors $b_i$ are not required to be distinct).

Definition: 

a) We will say that the system $(1)$ has rank $r \in \mathbb{N}$ if there exist $r$ linearly independent vectors from $(1)$, and every other vector from $(1)$ can be represented as a linear combination of these $r$ vectors. 

b) We will say that the system $(1)$ has rank $0$ if $b_i = 0$ (the zero vector) for every $i=1,2,\dots, n$.

The rank of the system of vectors $(1)$ is denoted by $r(b_1, b_2, \dots, b_n)$.

Proposition 1: The rank of the system $(1)$ is equal to the maximal number of linearly independent vectors in the system $(1)$

Proposition 2: $r(b_1, b_2, \dots, b_n) = \dim\ \textbf{span}(b_1, b_2, \dots, b_n)$

Proposition 3: If $r(b_1, b_2, \dots, b_n)=r$ and $b$ is a vector, then $\ r(b_1, b_2, \dots, b_n, b) = r\ $ or $\ r(b_1, b_2, \dots, b_n, b) = r + 1\ $

More specifically: 

A) $r(b_1, b_2, \dots, b_n, b) = r(b_1, b_2, \dots, b_n)$ 

if and only if $b$ is a linear combination of the vectors $b_i$

B) $r(b_1, b_2, \dots, b_n, b) = r(b_1, b_2, \dots, b_n) + 1$ 

if and only if $b$ is not a linear combination of the vectors $b_i$
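A quick numerical illustration of Propositions 2 and 3 (a sketch assuming numpy; the sample vectors are mine):

```python
import numpy as np

# A system of four vectors from R^3; b3 = b1 + b2 and b4 = 2*b1, so the rank should be 2.
b1 = np.array([1.0, 0.0, 1.0])
b2 = np.array([0.0, 1.0, 1.0])
b3 = b1 + b2
b4 = 2 * b1

M = np.vstack([b1, b2, b3, b4])                   # rows are the vectors of the system
print(np.linalg.matrix_rank(M))                   # 2, i.e. dim span(b1, b2, b3, b4)  (Proposition 2)

# Proposition 3: appending one more vector changes the rank by 0 or 1.
b = np.array([0.0, 0.0, 1.0])                     # not a linear combination of b1, ..., b4
print(np.linalg.matrix_rank(np.vstack([M, b])))   # 3
print(np.linalg.matrix_rank(np.vstack([M, b3])))  # 2 (b3 is a linear combination, rank unchanged)
```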


Algebra Notes 3

3.1) Permutations

Every ordering of the first $n$ natural numbers $1,2,\dots,n$ is called a permutation.

There are $n!$ permutations of the first $n$ natural numbers.

3.2) Inversions

If we have the permutation $i_1, i_2, \dots, i_n$ (of the first $n$ natural numbers), then the numbers $i_k$ and $i_s$ are said to form an inversion if $k \lt s$ but $i_k > i_s$.

In other words, an inversion in a permutation is a pair of numbers such that the larger number appears to the left of the smaller one in the permutation. The inversion number of a permutation is the total number of inversions. The inversion number of the permutation $i_1, i_2, \dots i_n$ is denoted by $[i_1, i_2, \dots ,i_n]$.

3.3) Parity of a permutation

Odd permutations - the inversion number is odd

Even permutations - the inversion number is even

3.4) Sign of a permutation

The sign of a permutation $i_1, i_2, \dots i_n$ is defined as $(-1)^{[i_1, i_2, \dots ,i_n]}$, where $[i_1, i_2, \dots ,i_n]$ is the inversion number of the permutation. It is obvious that the sign is $+1$ for even permutations and $-1$ for odd permutations.
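A short sketch in code of counting inversions and computing the sign (plain Python; the function names are mine):

```python
def inversion_number(perm):
    """Count the pairs (k, s) with k < s but perm[k] > perm[s]."""
    return sum(1 for k in range(len(perm))
                 for s in range(k + 1, len(perm))
                 if perm[k] > perm[s])

def sign(perm):
    """Sign of the permutation: (-1) raised to its inversion number."""
    return (-1) ** inversion_number(perm)

print(inversion_number([3, 1, 2]))   # 2  -> an even permutation
print(sign([3, 1, 2]))               # +1
print(sign([2, 1, 3]))               # -1 (one transposition applied to 1, 2, 3 changes the parity)
```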

3.5) Transposition

We say that we have applied a transposition if, in a permutation $P$, we swap two elements and leave all other elements in place. When we do this, we obtain (from $P$) a new permutation $R$.

Lemma 1: The permutations $P$ and $R$ are of different parity.

Lemma 2: When $n \ge 2$, the number of the even permutations is equal to the number of the odd permutations.

3.6) Determinant

The concepts introduced here are important building blocks for defining the determinant of a square matrix. The determinant is usually defined using the Leibniz formula for determinants.
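For an $n \times n$ matrix $A = (a_{ij})$ the Leibniz formula reads $\det(A) = \sum_{\sigma} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i\sigma(i)}$, where the sum runs over all $n!$ permutations $\sigma$. Here is a direct (and of course very inefficient) sketch of it in code; the function names are mine, and the inversion count from above is redefined so the snippet is self-contained.

```python
from itertools import permutations

def inversion_number(perm):
    return sum(1 for k in range(len(perm))
                 for s in range(k + 1, len(perm))
                 if perm[k] > perm[s])

def det_leibniz(A):
    """Leibniz formula: det(A) = sum over all permutations s of sign(s) * prod_i A[i][s[i]]."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = (-1) ** inversion_number(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

print(det_leibniz([[1, 2], [3, 4]]))                      # -2
print(det_leibniz([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))     # 24
```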


Algebra Notes 2

2.1) Systems of linear equations

$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1$

$a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2$

$\dots$

$a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m$

This is a system of m linear equations with n unknowns. 

a) The matrix of this system is defined as follows

$$A = \begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{m1}&a_{m2}&\dots&a_{mn}\end{bmatrix}$$

In Bulgarian: матрица на системата

b) The augmented matrix of this system is defined as follows

$$B = \begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}&b_1\\a_{21}&a_{22}&\dots&a_{2n}&b_2\\\vdots&\vdots&\ddots&\vdots&\vdots\\a_{m1}&a_{m2}&\dots&a_{mn}&b_m\end{bmatrix}$$

In Bulgarian: разширена матрица на системата


2.2) Types of systems of linear equations

a) Independent system: has exactly one solution

b) Inconsistent system: has no solutions

c) Consistent system: has at least one solution

d) Dependent system: has infinitely many solutions


The same terminology in Bulgarian:

a) Определена система (independent system): has exactly one solution

b) Несъвместима система (inconsistent system): has no solutions

c) Съвместима система (consistent system): has at least one solution

d) Неопределена система (dependent system): has infinitely many solutions


2.3) Elementary transformations applied to a system of linear equations

a) swapping two rows

b) multiplying a row of the system by a non-zero number

$R := \lambda R$, where $\lambda \neq 0$ 

c) adding to a row another row (multiplied by a number)

$R_2 := R_2 + \lambda R_1$

Note: If we apply (to a given system of linear equations) a finite number of elementary transformations, the resulting system is equivalent to the original system.


2.4) Gaussian elimination - a general method for solving systems of linear equations

Gaussian elimination
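A rough sketch of Gaussian elimination with partial pivoting for a square system with a unique solution (assuming numpy; it does not handle the inconsistent/dependent cases, and the function name and sample system are mine):

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve A x = b for a square system with a unique solution (forward elimination + back substitution)."""
    A = np.asarray(A, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    n = len(b)
    for k in range(n):
        # partial pivoting: bring the row with the largest |A[i, k]| (i >= k) to position k
        p = k + int(np.argmax(np.abs(A[k:, k])))
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
        if np.isclose(A[k, k], 0.0):
            raise ValueError("the matrix of the system is singular; no unique solution")
        # eliminate the k-th unknown from the rows below
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # back substitution
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        x[k] = (b[k] - A[k, k + 1:] @ x[k + 1:]) / A[k, k]
    return x

A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
print(gaussian_elimination(A, b))   # approximately [ 2.  3. -1.]
```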


Sufficient conditions for integrability

This post is about single variable functions $f : [a, b] \to \mathbb{R}$ defined in a finite closed interval, and about some sufficient conditions for their integrability in the interval $[a, b]$. Here $a$ and $b$ are real numbers of course.

Integrability refers to the existence of the definite integral 

$$\int\limits_a^b f(x)\,dx$$

One should note that there are different definitions of integrability (i.e. different definitions of the definite integral shown above).

The one I am referring to here is Darboux integrability. This one is equivalent to Riemann integrability. 

Then there is also Lebesgue integrability which allows a wider family of functions to be classified as integrable. So all functions which are Darboux/Riemann integrable are also Lebesgue integrable but the converse is not true. This is a very interesting topic and relates to measure theory.

OK... so here are several sufficient conditions for Darboux/Riemann integrability. 

Theorem 1: If the function $f : [a, b] \to \mathbb{R}$ is defined and continuous in the finite closed interval $[a,b]$, it is integrable in $[a,b]$. 

Theorem 2: If the function $f : [a, b] \to \mathbb{R}$ is defined and bounded in the finite closed interval $[a,b]$, and if it is discontinuous only at a finite number of points, then it is integrable in $[a,b]$. 

Theorem 3: If the function $f : [a, b] \to \mathbb{R}$ is defined and monotonic in the finite closed interval $[a,b]$, then it is integrable in $[a,b]$.

Note that the last theorem does not disallow a situation where $f$ has infinitely many points of discontinuity. That situation is allowed and the function is still integrable (provided that it is monotonic, as the theorem says). For example, the non-decreasing function $f : [0,1] \to \mathbb{R}$ defined by $f(0)=0$ and $f(x) = \frac{1}{k}$ for $x \in (\frac{1}{k+1}, \frac{1}{k}]$, $k=1,2,\dots$, has infinitely many points of discontinuity but is integrable by Theorem 3.

Again, it should be pointed out that these are just sufficient conditions for integrability, they are not necessary. There are functions which do not satisfy the conditions of any of these three theorems but are still integrable. 


Algebra Notes 1

1.1) Fundamental theorem of algebra (D'Alembert)

Every non-zero, single-variable, degree $n$ polynomial with complex coefficients has, counted with multiplicity, exactly $n$ complex roots. 

For more details see here

1.2) Number field definition (A):

Let $F$ be a subset of $\mathbb{C}$ which contains at least two elements. We say that $F$ is a number field if the following conditions are met: 

a) if $z_1$ and $z_2$ are arbitrary numbers from $F$, then the numbers $z_1 + z_2$, $z_1 - z_2$, $z_1 \cdot z_2$ are also in $F$, 

b) if $z_1$ and $z_2$ are arbitrary numbers from $F$ and $z_2 \ne 0$, then $z_1 / z_2$ is also in $F$. 

1.3) Number field definition (B):

Let $F$ be a subset of $\mathbb{C}$ which contains at least two elements. We say that $F$ is a number field if the following conditions are met: 

a) if $z_1$ and $z_2$ are arbitrary numbers from $F$, then the numbers $z_1 + z_2$, $z_1 - z_2$, $z_1 \cdot z_2$ are also from $F$; 

b) if $z \in F$ and $z \ne 0$, then $z^{-1} \in F$ 

1.4) Note: Not all fields in mathematics are considered number fields (e.g. finite Galois fields are not considered number fields). The number sets $\mathbb{Q}, \mathbb{R}, \mathbb{C}$ (equipped with the usual algebraic operations) are number fields. The number sets $\mathbb{N}, \mathbb{Z}$ are not number fields.

1.5) Theorem: Every number field is a superset of the field of the rational numbers $\mathbb{Q}$.

1.6) Notations: 

a) $F_{m \times n}$ - the set of all $m \times n$ matrices with elements from the number field $F$

b) $M_{n}(F)$ - the set of all $n \times n$ square matrices with elements from the number field $F$; these are also called square matrices of order $n$ 

c) If $A = (a_{ij})_{m\times n}$ is a matrix, then by $A^t$ we denote the transposed matrix of the matrix $A$; it is defined as follows: $A^t = (a_{ji})_{n\times m}$ 

d) $E_n$ or just $E$ is the identity matrix ($E_n$ just emphasizes the fact that the order of the matrix is $n$)

e) $E_{ij}$ is the matrix from $F_{m \times n}$ which has element $1$ at position $(i,j)$ and zero elements at all other positions $(k,l) \ne (i,j)$

1.7) Operations on matrices:

a) sum of matrices: $A+B$

b) multiplication of a matrix with a number: $\lambda A$

c) transposed matrix: $A^t$

1.8) Property:

Obviously if $A = (a_{ij})_{m\times n}$ is an arbitrary matrix from $F_{m \times n}$ then $$A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} \cdot E_{ij}$$
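A tiny numerical check of this identity (a sketch assuming numpy; the sample matrix is mine):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # an arbitrary 2 x 3 matrix
m, n = A.shape

# rebuild A as the sum of a_ij * E_ij over all positions (i, j)
B = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        E_ij = np.zeros((m, n))
        E_ij[i, j] = 1.0
        B += A[i, j] * E_ij

print(np.array_equal(A, B))          # True
```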

1.9) Definition:  

a) The matrices which have a single row (or a single column) are called row matrices (or column matrices), or n-tuples, or also n-dimensional vectors.

b) The set of all n-tuples with elements from the number field $F$ is denoted by $F^n$ 

c) The n-tuples  

$e_1 = (1, 0, 0, \dots, 0, 0),\ e_2 = (0, 1, 0, \dots, 0, 0),\ e_3 = (0, 0, 1, \dots, 0, 0),\ \dots\ ,\ $ 

$e_{n-1} = (0, 0, 0, \dots, 1, 0),\ e_{n} = (0, 0, 0, \dots, 0, 1)$ 

are called unit vectors or unit n-dimensional vectors.

1.10) Note: If $a = (a_1, a_2, \dots, a_n) \in F^n$, then obviously $a = a_1 e_1 + a_2 e_2 + \dots + a_n e_n$. This expression for $a$ is called a linear combination. In this particular case, $a$ is a linear combination of the unit vectors $e_1, e_2, \dots, e_n$.

1.11) Properties of the transposed matrix:

a) $(A^t)^t = A$ 

b) $(A+B)^t = A^t + B^t$ 

c) $(\lambda A)^t = \lambda A^t$ (for any $\lambda \in F$)


A nice symmetric system of linear equations

Suppose we want to solve the following system of linear equations in which $a$ and $b_i$ are real parameters. 

$$ax_1 + ax_2 + \dots + ax_{n-1} + (a+1)x_n = b_1$$
$$ax_1 + ax_2 + \dots + (a+1)x_{n-1} + ax_n = b_2$$
$$\dots$$
$$(a+1)x_1 + ax_2 + \dots + ax_{n-1} + ax_n = b_n$$

We can proceed as follows.

Denote 

$$T = \sum_{i=1}^n b_i$$
$$S = \sum_{i=1}^n x_i$$

Then the given system of equations can be rewritten as 

$$aS + x_n = b_1$$
$$aS + x_{n-1} = b_2$$
$$\dots$$
$$aS + x_2 = b_{n-1}$$
$$aS + x_1 = b_n$$

Then it follows that 

$$x_1 - b_n = x_n - b_1$$
$$x_2 - b_{n-1} = x_n - b_1$$
$$\dots$$
$$x_{n-1} - b_2 = x_n - b_1$$
$$x_n - b_1 = x_n - b_1$$

Let's call these equations the difference equations.

When we sum up these equations we get that  

$S-T = nx_n - nb_1$

So it follows that 

$aS-aT = anx_n - anb_1$
$(b_1 - x_n)-aT = anx_n - anb_1$
$b_1(1 + an)-aT = (1 + an)x_n$

Note that in this equation only $x_n$ is unknown. Also, we can derive analogous equations for all unknowns $x_i$ ($i=1,2,\dots,n$). They look like this

$$(1 + an)b_{n+1-i}-aT = (1 + an)x_i \tag{*}$$

We note here that the equations $(*)$ (for $i=1,2,\dots,n$) are just consequences of the original system. They are not necessarily equivalent to the original system.

Now based on $(*)$ we can look at the possible cases for the parameters $a$ and $b_i$ (where $i=1,2,\dots,n$). 

Case (1) 

$1+an \ne 0$ i.e. $a \ne -1/n$ 

In this case from $(*)$ we find that 

$$x_1 = b_{n} - \frac{aT}{1+an}$$
$$x_2 = b_{n-1} - \frac{aT}{1+an}$$
$$\dots$$
$$x_n = b_{1} - \frac{aT}{1+an}$$

A direct check (substituting the $x_i$ unknowns in the original system with these values) shows that these values are indeed a solution. So in this case the original system has a unique solution given by the formulas above.
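Here is a small numerical sketch of this check (assuming numpy; the values of $n$, $a$ and $b_i$ are arbitrary ones chosen by me):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a = 0.7                                           # any a != -1/n  (Case 1)
b = rng.normal(size=n)
T = b.sum()

# coefficient matrix of the system: a everywhere, with (a+1) on the anti-diagonal
M = np.full((n, n), a) + np.fliplr(np.eye(n))

x_formula = b[::-1] - a * T / (1 + a * n)         # x_i = b_{n+1-i} - aT/(1+an)
x_solve = np.linalg.solve(M, b)                   # direct solution of the system

print(np.allclose(x_formula, x_solve))            # True
```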

Case (2) 
$1+an = 0$ i.e. $a = -1/n$ and $aT \ne 0$
Note that since $a=-1/n$ the condition $aT \ne 0$ is in fact equivalent to $T \ne 0$. 
So this case is the case when $a=-1/n$ and $T \ne 0$.

Obviously the equations $(*)$ cannot be satisfied in this case.  
So in this case the system has no solutions. 

Case (3) 
$1+an = 0$ i.e. $a = -1/n$ and $aT = 0$
Note that since $a=-1/n$ the condition $aT = 0$ is in fact equivalent to $T=0$.  
So this case is the case when $a=-1/n$ and $T = 0$.

Now $(*)$ is always satisfied. Going back to the difference equations, we see that if we let $x_n=p$ we get

$$x_1 = p - b_1 + b_n$$
$$x_2 = p - b_1 + b_{n-1}$$
$$\dots$$
$$x_{n-1} = p - b_1 + b_2$$
$$x_{n} = p$$

Again, a direct check (substituting the $x_i$ unknowns in the original system with these values) shows that these values are indeed a solution. So in this case the system has infinitely many solutions, which depend on a single parameter $x_n = p$.

No other cases are possible logically. So this completes the solution of the given system of linear equations.

An application of the rational root theorem

This problem is a nice application of the rational root theorem so I present it here together with a solution which I was able to construct.

Problem: Let $f(x) = a_0x^n + a_1x^{n-1} + a_2x^{n-2} + ... + a_{n-1}x + a_n$ be a polynomial with integer coefficients ($a_i \in \mathbb{Z}$). Here $n \ge 0, n \in \mathbb{Z}$. The number $\alpha = \frac{p}{q}$ is a rational root of $f(x)$, where $p \in \mathbb{Z}$, $q \in \mathbb{Z}$, $q \ne 0$, $(p,q) = 1$. Prove that $(mq-p)\ |\ f(m)$ for every number $m \in \mathbb{Z}$ such that $(mq-p) \ne 0$.

Solution: We will apply induction on the degree $n$ of $f(x)$.

1) Let $n=0$ i.e. $f(x) = a_0$ is a constant polynomial. Since the number $\alpha$ is a root, it follows that $a_0=0$ i.e. $f(x)$ is the constant $0$. But then $f(m)=0$ for every $m \in \mathbb{Z}$. So for each integer $m$ such that $(mq-p) \ne 0$ we trivially obtain that $(mq-p)\ |\ f(m)$. Thus we proved the statement in the case when $n=0$.

2) Let's also prove the statement when $n=1$. In this case $f(x) = a_0x + a_1$ and $a_0 \ne 0$. From the rational root theorem we get that $q\ |\ a_0$. Then $a_0 = q c_0$ where $c_0 \in \mathbb{Z}$. So $f(x) = c_0 q x + a_1$. Since $\frac{p}{q}$ is a root of $f(x)$ we get $c_0 q (p/q) + a_1 = 0$ which gives us that $c_0 p + a_1 = 0$. Now for $f(x)$ we get the following $f(x) = c_0qx + a_1 = c_0 (qx-p) + (c_0p + a_1)$. But as we saw $(c_0p + a_1) = 0$. So finally $f(x) = c_0 (qx-p) = c_0 (xq - p)$. From this expression we see that $f(m) = c_0 (mq-p)$ for every integer $m$. Then if $m$ is an integer such that $(mq-p) \ne 0$ we obviously get that $(mq-p)\ |\ f(m)$ which is what we wanted to prove.

3) Induction hypothesis. We now assume the statement is true for all polynomials $g(x)$ which satisfy the conditions of the problem statement, and have degrees $\le (n-1)$. Again, the statement is as follows: $(mq-p)\ |\ g(m)$ for every number $m \in \mathbb{Z}$ such that $(mq-p) \ne 0$

4) Now let $f(x) = a_0x^n + a_1x^{n-1} + a_2x^{n-2} + ... + a_{n-1}x + a_n$ be a polynomial of degree $n$ (where $n \ge 2$) which satisfies all conditions of the problem statement. Again from the rational root theorem we obtain that $q\ |\ a_0$ so we can write $a_0 = q c_0$, where $c_0 \in \mathbb{Z}$. Then 

$f(x) = a_0x^n + a_1x^{n-1} + a_2x^{n-2} + ... + a_{n-1}x + a_n = $ 

$ = q c_0 x^n + a_1x^{n-1} + a_2x^{n-2} + ... + a_{n-1}x + a_n = $

$ = c_0(xq - p) x^{n-1} + (a_1 + c_0p)x^{n-1} + a_2x^{n-2} + ... + a_{n-1}x + a_n$ 

So if we denote $h(x) = (a_1 + c_0p)x^{n-1} + a_2x^{n-2} + ... + a_{n-1}x + a_n$ we obtain that 

$$f(x) = c_0(xq - p) x^{n-1} + h(x) \tag{*}$$

Now we know $p/q$ is a root of $f(x)$. But obviously it's also a root of $c_0(xq - p) x^{n-1}$. So from equality $(*)$ it follows that $p/q$ is also a root of $h(x)$. This means that now $h(x)$ satisfies all the conditions of the problem statement (it has integer coefficients, $p/q$ is its root etc.). Also $h(x)$ has a degree which does not exceed $n-1$. So we can apply to it the induction hypothesis and when we do so, we get that $(mq-p)\ |\ h(m)$ for every integer $m$ such that $(mq - p) \ne 0$. Obviously it's also true that $(mq-p)\ |\ c_0(mq - p) m^{n-1}$ for every integer $m$ such that $(mq - p) \ne 0$. From the last two statements we obtain that $(mq-p)\ |\ (c_0(mq - p) m^{n-1} + h(m))$ for every integer $m$ such that $(mq - p) \ne 0$. But from $(*)$ this expression $(c_0(mq - p) m^{n-1} + h(m))$ is equal to $f(m)$. Thus we obtain that $(mq-p)\ |\ f(m)$ for every integer $m$ such that $(mq - p) \ne 0$. This completes the induction and therefore the statement is now completely proved.

Note 1: In the special case when $m = 1$ we get that $(q-p)\ |\ f(1)$ (unless $p = q$, of course)

Note 2: In the special case when $m = -1$ we get that $(p+q)\ |\ f(-1)$ (unless $q = -p$, of course)
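To see the statement in action, here is a small sketch in code checking it for one concrete polynomial of my choice: $f(x) = 6x^3 - 17x^2 + 11x - 2 = (3x-1)(2x^2-5x+2)$, which has the rational root $p/q = 1/3$. The range of values of $m$ is also my choice.

```python
from fractions import Fraction
from math import gcd

coeffs = [6, -17, 11, -2]            # a_0, a_1, ..., a_n of f(x) = 6x^3 - 17x^2 + 11x - 2
p, q = 1, 3                          # the rational root p/q = 1/3
assert gcd(p, q) == 1

def f(x):
    """Evaluate f at x using Horner's scheme."""
    val = 0
    for c in coeffs:
        val = val * x + c
    return val

assert f(Fraction(p, q)) == 0        # p/q really is a root of f

for m in range(-10, 11):
    if m * q - p != 0:
        assert f(m) % (m * q - p) == 0   # (mq - p) divides f(m)
print("checked: (mq - p) | f(m) for all m in [-10, 10] with mq - p != 0")
```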


A problem about scalar matrices $A = \lambda \cdot E$

I encountered this problem on MathSE 

Even though this question was heavily downvoted, I think it's quite a nice problem. Here it is. 

We are given that $A$ is a square $n \times n$ matrix which commutes with all invertible matrices of the same size $n$.  

Prove that $A$ is a scalar matrix, i.e. $A = \lambda E$, where $E$ is the identity matrix and $\lambda$ is a scalar (a number).

Let's solve this problem for the case $n=3$. Solving it for arbitrary $n$ is fully analogous to the case $n=3$.

Let's assume 

$$A = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix}$$

and let's assume it commutes with all invertible matrices of size 3.

Let's pick the matrix 

$$B = \begin{bmatrix}1&0&0\\0&2&0\\0&0&3\end{bmatrix}$$

This one is obviously invertible as its determinant is equal to $6 = 3!$

Now we use the fact that $AB=BA$.

$$AB = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix} \cdot \begin{bmatrix}1&0&0\\0&2&0\\0&0&3\end{bmatrix} = \begin{bmatrix}a_{11}&2a_{12}&3a_{13}\\a_{21}&2a_{22}&3a_{23}\\a_{31}&2a_{32}&3a_{33}\end{bmatrix}$$


$$BA = \begin{bmatrix}1&0&0\\0&2&0\\0&0&3\end{bmatrix} \cdot \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix} = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\2a_{21}&2a_{22}&2a_{23}\\3a_{31}&3a_{32}&3a_{33}\end{bmatrix}$$

But these two resulting matrices must be equal. Comparing their respective elements, it's easy to see that we get the following. $$a_{ij} = 0 \quad \text{for all} \quad i \ne j \tag{1}$$

OK... So now our matrix $A$ gets the form

$$A = \begin{bmatrix}a&0&0\\0&b&0\\0&0&c\end{bmatrix} \tag{2}$$

Now let's pick another matrix $C$ and use the fact that $A$ commutes with $C$. We pick $C$ to be e.g. the Vandermonde matrix for the numbers $1, 2, 3$ (the important part is that the numbers generating the Vandermonde matrix are all distinct).

$$C = \begin{bmatrix}1^0&1^1&1^2\\2^0&2^1&2^2\\3^0&3^1&3^2\end{bmatrix}$$

We know that the determinant of $C$ equals $(2-1)(3-1)(3-2) = 2 \ne 0$ so $C$ is invertible.

Now in a similar way we use that $A$ commutes with $C$ i.e. $AC = CA$.

$$AC = \begin{bmatrix}a&0&0\\0&b&0\\0&0&c\end{bmatrix} \cdot \begin{bmatrix}1^0&1^1&1^2\\2^0&2^1&2^2\\3^0&3^1&3^2\end{bmatrix} = \begin{bmatrix}a&a&a\\b&2b&4b\\c&3c&9c\end{bmatrix}$$ 

$$CA =  \begin{bmatrix}1^0&1^1&1^2\\2^0&2^1&2^2\\3^0&3^1&3^2\end{bmatrix} \cdot \begin{bmatrix}a&0&0\\0&b&0\\0&0&c\end{bmatrix} = \begin{bmatrix}a&b&c\\a&2b&4c\\a&3b&9c\end{bmatrix}$$ 

But these two resulting matrices must be equal. Comparing their first columns we finally get that $a=b=c$.

Thus our matrix $A$ finally gets the form

$$A = \begin{bmatrix}a&0&0\\0&a&0\\0&0&a\end{bmatrix} \tag{3}$$

which is exactly what we wanted to prove.
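For what it's worth, the same computation can be carried out symbolically; here is a sketch assuming sympy is available (the variable names are mine). It imposes $AB=BA$ and $AC=CA$ with the two matrices picked above and solves the resulting linear equations for the entries of $A$.

```python
import sympy as sp

# unknown 3 x 3 matrix A with symbolic entries a11, ..., a33
A = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f"a{i + 1}{j + 1}"))
B = sp.diag(1, 2, 3)                                # the diagonal matrix from above
C = sp.Matrix([[1, 1, 1], [1, 2, 4], [1, 3, 9]])    # the Vandermonde matrix for 1, 2, 3

# impose AB = BA and AC = CA, then solve for the entries of A
equations = list(A * B - B * A) + list(A * C - C * A)
solution = sp.solve(equations, list(A))

print(A.subs(solution))   # a diagonal matrix with equal diagonal entries, i.e. a scalar matrix
```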