UNIVERSITY OF LJUBLJANA
FACULTY OF COMPUTER AND INFORMATION SCIENCE
LABORATORY FOR MATHEMATICAL METHODS IN COMPUTER SCIENCE AND INFORMATICS

Damir Franetič, Peter Marijan Kink

PROBLEMS IN MATHEMATICS 1 (WITH SOLUTIONS)
Study material for the course Mathematics 1 (63506A)

Ljubljana, 2024

Problems in Mathematics 1 (with solutions)
Study material for the course Mathematics 1 (63506A)
Authors: Damir Franetič, Peter Marijan Kink
Reviewers: Polona Oblak, Žiga Virk
Published by: University of Ljubljana Press (Založba Univerze v Ljubljani)
For the publisher: Gregor Majdič, The Rector of the University of Ljubljana
Issued by: Faculty of Computer and Information Science, University of Ljubljana
For the issuer: Mojca Ciglarič, The Dean of the Faculty of Computer and Information Science, University of Ljubljana
Ljubljana, 2024

A digital copy of the book is available at: https://ebooks.uni-lj.si/
First e-edition. The publication is free of charge.
DOI: 10.51939/9789612973186

This work is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/

The cataloguing record of the publication (CIP) was prepared by the National and University Library in Ljubljana.
COBISS.SI-ID 191876355
ISBN 978-961-297-318-6 (PDF)

Preface

The following pages contain a representative selection of tasks and problems which appeared at problem sessions, midterms and final (computational) exams in the course Mathematics 1 for first-year students of the master's study programme at the Faculty of Computer and Information Science, University of Ljubljana.

In addition to the two authors, Polona Oblak also contributed many problems and ideas for problems. Uroš Kozole helped with suggestions for improvements and a thorough inspection of the first typed solutions. The entire text was reviewed by Polona Oblak and Žiga Virk. Of course, that does not mean that the solutions in front of you are error-free. If you find errors, please bring them to the authors' attention.

Individual tasks in each section are in no specific order. The idea is that you, the reader, attempt to solve an exercise regardless of its perceived difficulty. Nonetheless, the solution to each problem is linked at the right edge, e.g. (solution 3.17), while each solution contains a link back to the problem, e.g. ". . . to problem 3.17". While immediate reading of the solutions may seem tempting, arriving at the solution by yourself is much more rewarding.

Contents

1 Linear algebra . . . 5
 1.1 A recollection of basic concepts . . . 5
 1.2 Schur, Frobenius, Eckart–Young . . . 6
2 Vector spaces and linear maps . . . 11
 2.1 Vector spaces . . . 11
 2.2 Linear maps . . . 13
3 Functions of several variables . . . 16
 3.1 Multiple integrals . . . 16
 3.2 Local extrema of real multivariate functions . . . 18
 3.3 Constrained extrema . . .
19 4 Solutions 22 4 Chapter 1 Linear algebra 1.1 A recollection of basic concepts Problem 1.1 (solution 1.1, page 22) Determine the eigenvalues and corresponding eigenvectors of the matrix  0 2 2    A =      3 1 −3     −1 −1 3  Problem 1.2 (solution 1.2, page 23) Determine the eigenvalues and orthonormal bases of corresponding eigenspaces for the matrix 1 1 1 0      1 0 0 1  H =         1 0 0 −1     0 1 −1 1  Problem 1.3 (solution 1.3, page 25) You are given an n × n matrix  0 1 · · · 1       1 0 · · · 0    A =      . . .  ,  .   . . . .   . . . .       1 0 · · · 0  i.e., the adjacency matrix of a (undirected) star graph. (a) Determine bases of N ( A) and C( A), i.e., bases of the nullspace and the column space of A. 5 CHAPTER 1. LINEAR ALGEBRA 6 (b) Determine the eigenvalues and eigenvectors of A. Hint: Why is N ( A) an eigenspace of A? Why are eigenspaces of A other than N ( A) contained in N ( A)⊥? Problem 1.4 (solution 1.4, page 28) The following is known about a symmetric matrix A ∈ 4×4 R : A has 3 as a double eigenvalue and it interchanges the vectors 0 1         1 0 v     1 =       and v   ,   2 =   1 0         0 1 i.e. A v1 = v2 and A v2 = v1. Find such a matrix A or prove that it does not exist. Problem 1.5 (solution 1.5, page 29) Let A be an n × n matrix. One way to define the exponential of the matrix A is ∞ X 1 eA := Ak k! k=0 (we substituted x ∈ R with A in the Taylor series for the function ex). (a) Prove the identity det( eA) = e tr( A) (b) Assume A is an antisymmetric matrix, meaning A T = − A. Show that eA is an orthogonal matrix with determinant equal to 1. 1.2 Schur decomposition, Frobenius norm, Eckart– Young theorem Problem 1.6 (solution 1.6, page 31) Determine one Schur decomposition for each of the matrices 6 −1 1  2 −1 0     A =        0 1 0 4 3 1 and B =   .    √ √      2 2 3 − 2 − 2 2 Problem 1.7 (solution 1.7, page 34) Let A be an arbitrary matrix and let U and V be orthogonal matrices, so that one can form the product U AV . Prove that the following equalities hold: CHAPTER 1. LINEAR ALGEBRA 7 1. ∥ U A∥F = ∥ A∥F, 2. ∥ AV ∥F = ∥ A∥F, 3. ∥ U AV ∥F = ∥ A∥F. Problem 1.8 (solution 1.8, page 35) Denote by ⟨ A, B⟩F := tr( A T B) the Frobenius inner product of matrices A,B ∈ √ m× n R , and denote by ∥ A∥ ⟨ F := A, A⟩F the corresponding Frobenius norm. Prove: 1. the Cauchy–Schwarz inequality, |⟨ A, B⟩ | ≤ ∥ ∥ F A∥F B∥F, 2. the triangle inequality, ∥ A + B∥ ≤ ∥ F A∥F + ∥ B∥F, 3. submultiplicativity, ∥ AB∥ ≤ ∥ ∥ F A∥F B∥F, 4. multiplicativity for the Kronecker product, ∥ A ⊗ B∥ ∥ F = ∥ A∥F B∥F. Problem 1.9 (solution 1.9, page 37) Let I be the 2 × 2 identity matrix. Find an orthonormal basis of orthogonal complement of I (the vector subspace I⊥ ⊆ 2×2 R ) with respect to the (Frobe- nius) inner product ⟨ A, B⟩F := tr( A T B). Problem 1.10 (solution 1.10, page 37) Find rank 1 matrices closest to the matrices 2 0 0 "1 3# "2 0#   (b) , (c) (a)     0 −3 0, 3 1 0 2     0 0 1 with respect to the Frobenius norm. Are those rank 1 matrices unique? Problem 1.11 (solution 1.11, page 40) Prove the following: (a) If the set {u n j : j = 1 , . . . , q} ⊆ R is linearly independent, then, whenever q X x ⊗ mn j u j = 0 ⊗ 0 = 0 ∈ R j=1 holds for vectors x ∈ m 1 , . . . , x q R , the vectors x j are necessarily zero, i.e., x m 1 = · · · = x q = 0 ∈ R . (b) Given linearly independent sets of vectors {v m i : i = 1 , . . . 
, p} ⊆ R and {u n j : j = 1 , . . . , q} ⊆ R , the set {v ⊗ mn i u j : i = 1 ,...,p ; j = 1 ,...,q} ⊆ R is also linearly independent. CHAPTER 1. LINEAR ALGEBRA 8 Problem 1.12 (solution 1.12, page 41) Let A ∈ m× m n× n R and B ∈ R . Show that the Kronecker sum A ⊕ B := A ⊗ I ⊗ n + Im B has the property: Eigenvalues of A ⊕ B are all possible sums of the form λi + µj, where λ 1 ,...,λm are eigenvalues of A, and µ 1 ,...,µn eigenvalues of B. Use this to find eigenvalues and eigenvectors of A ⊕ B, where "−1 2# "1 0# A = and B = . 0 3 2 2 Problem 1.13 (solution 1.13, page 41) Let "2 2 # A = . 2 −1 (a) Determine a diagonal matrix D and an orthogonal matrix U such that A = U DU T. (b) Explain why for an orthogonal matrix U the matrix U ⊗ U is also orthogonal. (c) Find matrices of rank 1 and 2 which are best approximations to the Kronecker product A ⊗ A with respect to the Frobenius norm. Problem 1.14 (solution 1.14, page 43) You are given matrices "−2 2# "−3 2# A = and B = . 2 1 2 0 (a) Diagonalize matrices A and B. Write down both corresponding diagonal and transition matrices. (b) Let u be an eigenvector of A and v and eigenvector of B. Prove that u ⊗ v is an eigenvector of A⊗ B+ A⊗ I + I ⊗ B. Additionally, find all eigenvalues of A ⊗ B + A ⊗ I + I ⊗ B. (c) Find a rank 1 matrix M, which is closest to the matrix A⊗ B+ A⊗ I + I ⊗ B with respect to the Frobenius norm. Problem 1.15 (solution 1.15, page 44) Find the eigenvalues and corresponding eigenvectors of the matrix A⊗ A+ A 2⊗ I, where A is the matrix "−1 3 # A = . 3 −1 CHAPTER 1. LINEAR ALGEBRA 9 Problem 1.16 (solution 1.16, page 45) The objective of this exercise is to express the Sylvester matrix equation AX + XB = C in the ‘usual’ form ( ˆ A x = b) using the vec operator and then solve this equation. (a) Verify that the matrix equation AX + XB = C in the unknown matrix X is equivalent to the linear system ( B T ⊕ A)vec( X) = vec( C) in the unknown column vec( X). (b) Let A and B be 2 × 2 matrices "−1 2# "1 0# A = and B = . 0 3 2 2 Does AX + XB = 0 posses a non-trivial solution? (You need to answer quickly! Do not attempt to solve the corresponding linear system. . . ) (c) Find a matrix X which solves "−2 1# AX + XB = . 2 5 Problem 1.17 (solution 1.17, page 46) Let A ∈ n× n R be a matrix with only nonnegative eigenvalues. (a) Prove that A is invertible if and only if all of its eigenvalues are (strictly) positive. (b) Assume that A is invertible. Prove that all of its eigenvalues are positive if and only if all of the eigenvalues of A−1 are positive. (c) Assume A T = A. Prove that there exists a matrix S, with only nonnegative √ eigenvalues and S 2 = A holds. We denote such matrix S as S = A. Problem 1.18 (solution 1.18, page 46) You are given the matrix 2 3 1   A =     3 6 3 .     1 3 2 (a) Check that A is positive semidefinite. (b) Find all eigenvalues and corresponding eigenvectors of A. √ (c) Find A. CHAPTER 1. LINEAR ALGEBRA 10 Problem 1.19 (solution 1.19, page 47) Find the Cholesky decomposition ( A = LL T, where L is lower triangular) of the matrix  1 2 −1   A =      2 8 2      −1 2 6  using (recursive) algorithm below: Write a symmetric matrix A ∈ n× n R in the block form " a # A 11 bT 1 := A = b B and define √ " a # 11 0T L 1 := . √1 b I a n−1 11 Then " a # "1 0T # A 11 bT 1 = = L L T b B 1 0 B − 1 bbT 1. a 11 Repeat this on the symmetric matrix A ( n−1)×( n−1) 2 := B − 1 bbT ∈ R . a 11 Let L 2 ,L 3 ,...,Ln be the matrices obtained in repeated steps. 
The matrix L is then "1 0T# " I # L = L · · · · n−1 0 1 . (1.2) 0 L 2 0T Ln Problem 1.20 (solution 1.20, page 48) Is any of the matrices  1 −1 5  −4 6 −2      A =     −     1 4 −5 and B =  6 −10 5           5 −5 2  −2 5 −14 negative definite? For each negative definite matrix X ∈ { A, B} find the Cholesky decomposition of − X. Chapter 2 Vector spaces and linear maps 2.1 Vector spaces Problem 2.1 (solution 2.1, page 49) Which subsets of the vector space n× n R are vector subspaces? Determine the dimension of those that are. (a) All matrices, which have 0 as the (1 , 2)-entry. (b) All matrices, which have 1 as the (1 , 2)-entry. (c) All matrices with integer entries, i.e., for A = [ a ∈ ij ] we have aij Z (for all indices i, j). (d) All upper-triangular matrices. (e) All symmetric matrices; A = A T. (f) All antisymmetric matrices; A = − A T. (g) All invertible matrices; the subset GL( n, n× n R) ⊆ R . (h) All matrices with determinant 0, i.e., n× n R ∖ GL( n, R). (i) All nilpotent matrices, i.e., matrices N ∈ n× n R such that N n = 0. (j) All upper-triangular nilpotent matrices. (Hint: Which elements appear on the diagonal of an upper-triangular nilpotent matrix?) (k) All matrices with trace 0. Problem 2.2 (solution 2.2, page 52) Equip the open interval + R = (0 , ∞) with the operation x ⊕ y := xy, and define α ⊙ x = xα for scalars α ∈ R. (a) Show that ( + R , ⊕ , ⊙) is a vector space over R. (b) Find a basis for ( + R , ⊕ , ⊙) and determine its dimension. Problem 2.3 (solution 2.3, page 53) 11 CHAPTER 2. VECTOR SPACES AND LINEAR MAPS 12 Let F be the set of all Fibonacci sequences, i.e., sequences ( an)∞ n=0 = ( a 0 , a 1 , a 2 , . . . ), where a 0 and a 1 are arbitrary real numbers, and an = an−1 + an−2 holds for all n ≥ 2. (a) Show that F is a vector space under operations ( an) + ( bn) := ( an + bn) and α( an) := ( αan), where α ∈ R. (b) Find a basis for F and express the usual Fibbonacci sequence (the one with a 0 = a 1 = 1) in this basis. Problem 2.4 (solution 2.4, page 54) Let N be the matrix "0 0# N = . 1 0 Show that the set of all real 2 × 2 matrices which commute with N , i.e., U = { A ∈ 2×2 R : AN = N A}, is a vector subspace in 2×2 R . Find a basis for U and determine its dimension! Problem 2.5 (solution 2.5, page 54) (a) Is the set U 1 = { p( x) = ax + b : a , 0 ,a,b ∈ R} a vector subspace in the vector space of polynomials R1[ x]? (b) Is the set U 2 = { p( x) : p(0) = 0} a vector subspace in the vector space of polynomials R2[ x]? (c) Is the set U 3 = { p( x) : p(0) = 1} a vector subspace in the vector space of polynomials R n[ x]? (d) Is the set U 4 = { p( x) : p′′(3) = 0 } a vector subspace in the vector space of polynomials R n[ x]? Problem 2.6 (solution 2.6, page 55) Let A be the matrix "1 −1# A = . 0 −1 Define subsets V := { X ∈ 2×2 2×2 R : XA + AX = 0} and W := { X ∈ R : XAX = X}. of the vector space 2×2 2×2 R . Which of these subsets are vector subspaces of R ? Why/why not? For each subset that is a vector space find its basis and determine its dimension. CHAPTER 2. VECTOR SPACES AND LINEAR MAPS 13 Problem 2.7 For a polynomial p( x) = ax 3 + bx 2 + cx + d and a square matrix A denote p( A) = aA 3 + bA 2 + cA + dI. Let A ∈ 2×2 R be the matrix "1 2# A = . 2 1 Let U ⊆ R3[ x] be the subset of those polynomials (of degree at most 3), for which p( A) = 0 (the zero matrix). (a) Show that U is a vector subspace of R3[ x]. (b) Find a basis for U and determine dim U . (Hint: If ∆ A( x) is the characteristic polynomial of A, then ∆ A( A) = 0.) 
(c) Let q( x) = x( x 2 − 2 x − 3). Is the set of all 2 × 2 matrices X, for which q( X) = 0 holds, a vector subspace of 2×2 R ? Justify your answer! Problem 2.8 (solution 2.8, page 55) Let R[ x] be the set of all polynomials in the indeterminate x. (Hence, R[ x] contains polynomials of arbitrary degrees!) Show that R[ x] is a vector space for the usual addition of polynomials and multiplication of a scalar and a polynomial. Can you describe a basis for R[ x]? Can you find a basis for the subspace W = { p ∈ R[ x] : p(1) = p(−1) = 0}? Determine dim R[ x] and dim W . Problem 2.9 (solution 2.9, page 56) Let V ⊆ C∞(0 , 2 π) be the set of all solutions to the differential equation y′′ + y = 0. Show that V is a vector subspace of C∞(0 , 2 π). Find its basis. 2.2 Linear maps Problem 2.10 (solution 2.10, page 56) A map τ : 2×2 2×2 R → R is given by "1 1# "1 1# τ( X) = X + X . 1 0 1 0 (a) Show that τ is a linear map. (b) Find the matrix corresponding to τ with respect to the standard basis { E } 2×2 11 , E 12 , E 21 , E 22 of the vector space R . CHAPTER 2. VECTOR SPACES AND LINEAR MAPS 14 Problem 2.11 (solution 2.11, page 57) For a polynomial p( x) = ax 3 + bx 2 + cx + d, and a square matrix A denote p( A) = aA 3 + bA 2 + cA + dI. Let A ∈ 2×2 R be the matrix "1 2# A = . 2 1 (a) Show that the map given by φ : 2×2 R3[ x] → R , φ( p) = p( A) is linear and determine the matrix corresponding to φ in the standard bases of spaces 2×2 R3[ x] and R . (b) Find a basis for ker φ and determine dim(ker φ). (Hint: If ∆ A( λ) is the characteristic polynomial of A, then ∆ A( A) = 0.) (c) Let q( x) = x( x 2 − 2 x − 3). Is the set of all 2 × 2 matrices X, for which q( X) = 0 holds, a vector subspace of 2×2 R ? Problem 2.12 (solution 2.12, page 58) We are given vectors a = [1 , 1 , 0]T, b = [1 , 0 , 1]T, and c = [0 , 1 , 1]T in 3 R , and a linear map τ : 3 3 R → R for which τ(a) = a , τ(b) = a + b, and τ(c) = a + c holds. (a) Show that {a , b , c} is a basis of 3 R . (b) Find the matrix for τ in the basis B = {a , b , c}. (c) Find the matrix for τ in the standard basis S = {i , j , k}. (d) Where does the vector [1 , 1 , 1]T get mapped by τ? Problem 2.13 (solution 2.13, page 59) Let R3[ x] be the vector space of polynomials p of degree at most 3. (a) Check that the map φ : 3 R3[ x] → R , φ( p) := [ p(−1) , p(0) , p(1)]T is linear. (b) Find a basis Bker φ for the kernel ker φ of the map φ. (c) Find the matrix corresponding to φ in the basis {1 , x, x 2 , x 3} of R3[ x] and the standard basis of 3 R . Problem 2.14 (solution 2.14, page 60) A map ψ : R2[ x] → R2[ x] is given by ( ψ( p))( x) = ( xp( x + 1))′ − 2 p( x) . Show that ψ is linear. Find its matrix in the basis {1 , x, x 2}. Find its kernel and image. CHAPTER 2. VECTOR SPACES AND LINEAR MAPS 15 Problem 2.15 (solution 2.15, page 61) Let a = [1 , 1]T. A map φ : 2 2×2 R → R is given by φ(x) = x aT = x [1 , 1]. (a) Show that φ is linear. (b) Find the matrix corresponding to φ in the standard bases of 2 R and 2×2 R . (c) Determine dim(ker φ) and dim(im φ). (d) Find a basis of im φ. Problem 2.16 (solution 2.16, page 62) Let B = {1 , 1 − x 2 , 1 + x 4} and V = L(B) ⊂ R4[ x]. Define a map " p(−1) p′(−1)# φ : V → 2×2 R by φ( p) = . p(1) p′(1) (a) Show that φ is a linear map. (b) Find the matrix Aφ corresponding to φ with respect to the basis B of V and the standard basis of 2×2 R . (c) Determine vector space bases of the kernel ker( φ) and the image im( φ). Problem 2.17 (solution 2.17, page 63) Assume that U and V are vector subspaces of a vector space W . 
Define sets: U × V := {( u, v) : u ∈ U , v ∈ V }, U + V := { u + v : u ∈ U and v ∈ V }, and U ∩ V := { w ∈ W : w ∈ U and w ∈ V }. (a) Verify that U + V and U ∩ V are vector subspaces of W . (b) ‘Guess’ the appropriate vector space structure on U × V . Prove that U × V is actually a vector space in the guessed case! Determine dim( U × V ) from dim U and dim V . (c) Let a map φ : U × V → W be given by φ( u, v) = u − v. Confirm that φ is linear. (If it turns out, that it is not, return to part (b).) Determine ker φ and im φ. (d) Show that the map ψ : U ∩ V → ker φ, ψ( w) = ( w, w) is a linear bijection, therefore dim( U ∩ V ) = dim(ker φ). (e) Conclude that dim U + dim V = dim( U + V ) + dim( U ∩ V ). Chapter 3 Functions of several variables 3.1 Multiple integrals Problem 3.1 (solution 3.1, page 64) Let a vector–valued function F : 2 2 R → R be given by F(r) = F( r, ϕ) = [ x, y]T = x, where x = r cos ϕ, y = r sin ϕ, the so-called polar coordinates. Find the Jacobian matrix J F = ∂ F and the Jaco- ∂ r bian determinant det( J F) of F. Problem 3.2 (solution 3.2, page 64) Let a vector–valued function F : 3 3 R → R be given by F(r) = F( r, ϕ, z) = [ x, y, z]T = x, where x = r cos ϕ, y = r sin ϕ, z = z, the cylindrical coordinates. Find the Jacobian matrix J F = ∂ F and the Jacobian ∂ r determinant det( J F) of F. Problem 3.3 (solution 3.3, page 64) Let F : 3 3 R → R , r 7→ x be a vector–valued function given by F( r, ϕ, θ) = [ x, y, z]T, where: x = r cos θ cos ϕ, y = r cos θ sin ϕ, z = r sin θ, 16 CHAPTER 3. FUNCTIONS OF SEVERAL VARIABLES 17 the spherical coordinates. Find the Jacobian matrix J F = ∂ F and the Jacobian ∂ r determinant det( J F) of F. Problem 3.4 Let R ≥ 0, and let F : 3 3 R → R be a vector–valued function given by ( R + r cos θ) cos ϕ   F( r, ϕ, θ) = F [ r, ϕ, θ]T =     ( R + r cos θ) sin ϕ  ,      r sin θ  the toroidal coordinates. (a) Find the Jacobian matrix F; J F = ∂ F of F. ∂[ r,ϕ,θ]T (b) Find the determinant of that Jacobian matrix; det( J F). Problem 3.5 (solution 3.5, page 65) Evaluate double integrals below. " (a) (5 − x − y) dxdy, where D = [0 , 1] × [0 , 1], D " y (b) dxdy, where D is given by x ≥ 0, y ≥ x in x 2 + y 2 ≤ 2, D x + 1 " sin x (c) dxdy, where D is the triangle given by 0 ≤ y ≤ x, and x ≤ π, D x " Z ∞ (d) e− x 2− y 2 dxdy, and use this to evaluate e− x 2 dx. 2 R −∞ Problem 3.6 Sketch the domain of integration and evaluate the integrals. Z 1  Z x  1.      xeydy dx,   0  − x  √ √ Z 1  Z y    y Z 2 Z 2− y 2 y 2.          dx dy +  dx dy.     0  0 x + 1  1  0 x + 1  Problem 3.7 (solution 3.7, page 66) What is the volume of the solid bounded by the paraboloid z = 8 − x 2 − y 2 and the plane z = −1? Problem 3.8 (solution 3.8, page 66) Find the coordinates of the center of mass of the quarter of a disk given by inequalities x 2 + y 2 ≤ R 2, x ≥ 0, y ≥ 0 if the density at each point is equal to the distance from the origin, i.e., ρ( x, y) = p x 2 + y 2. CHAPTER 3. FUNCTIONS OF SEVERAL VARIABLES 18 ! Hint: the mass of a figure D ⊆ 2 R is given by m = ρ( x, y) dxdy, coordinates ! D ! of the center of mass are x∗ = 1 xρ( x, y) dxdy and y∗ = 1 yρ( x, y) dxdy. m D m D Use polar coordinates. Problem 3.9 (solution 3.9, page 67) Determine the mass and the coordinates of the center of mass of a homogeneous solid (i.e., ρ( x, y, z) = 1) bounded by surfaces z 2 = x 2 + y 2 and x 2 + y 2 + z 2 = 4, which lies in the half-space z ≥ 0. Hint: Use spherical coordinates. 
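The multiple integrals in this section lend themselves to a quick numerical cross-check of a hand computation. A minimal sketch, assuming Python with scipy is available (any quadrature tool would do), applied to problem 3.5 (a), where the exact value is 4:

```python
# Numerical sanity check for problem 3.5 (a): the integral of 5 - x - y
# over the unit square [0, 1] x [0, 1] should evaluate to 4.
from scipy.integrate import dblquad

# dblquad integrates f(y, x) for x in [a, b] and y in [gfun(x), hfun(x)].
value, abserr = dblquad(lambda y, x: 5 - x - y, 0, 1, lambda x: 0, lambda x: 1)
print(value)  # approximately 4.0
```

scipy.integrate.tplquad plays the same role for the triple integrals that appear further on, once the region of integration has been described by explicit bounds.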
Problem 3.10 (solution 3.10, page 68) Determine the mass and the coordinates of the center of mass of a ball given by inequality x 2 + y 2 + z 2 ≤ 2 z if the density at every point is equal to the distance from the origin. Hint: Use spherical coordinates. Problem 3.11 (solution 3.11, page 69) A solid D ⊆ 3 R is bounded by parabolic cylinders z = 2− x 2 and z = y 2−2. Determine the volume and mass of this solid if the density is given by ρ( x, y, z) = y 2. Hint: Find the (orthogonal) projection of this solid onto the xy-plane, use cylindrical coordinates. 3.2 Local extrema of real multivariate functions Problem 3.12 (solution 3.12, page 69) Find and classify the stationary points of functions below. (a) f ( x, y) = x 3 − 4 x 2 + 2 xy − y 2 (b) g( x, y) = xex + 2 yey + 1 (c) h( x, y) = (1 + ey) cos x − yey (d) k( x, y, z) = x 3 + y 3 + 3 z 2 − 3 xyz (e) r( x, y, z) = x 2 + y 2 + z 2 − 2 xyz (f) u( x, y) = 4 + x 3 + y 3 − 3 xy (g) v( x, y) = 3 x 2 y + y 3 − 3 x 2 − 3 y 2 Problem 3.13 (solution 3.13, page 73) Given a , b ∈ n R let f (x) = (xTa)(xTb). (a) Evaluate ∂f and ∂ 2 f . ∂ x ∂ x2 (b) Additionally assume that a and b are nonzero and orthogonal. What is the type of any stationary point of f in this case? CHAPTER 3. FUNCTIONS OF SEVERAL VARIABLES 19 Problem 3.14 (solution 3.14, page 74) Find the vector x ∈ n R for which the sum of squared distances from given vectors a ∈ n 1 , . . . , a k R is the smallest possible. 3.3 Constrained extrema Problem 3.15 (solution 3.15, page 74) Find the points in the domain described by the inequality 4( x − 1)2 + y 2 ≤ 16, at which the largest and the smallest values of the function f ( x, y) = 2 x 2 + y 2 are attained. Problem 3.16 (solution 3.16, page 76) Let T be the triangle which is the intersection of the first octant and the plane given by x + y + z = 5. At which point on this triangle is the largest value of the function g( x, y, z) = xy 2 z 2 attained? Problem 3.17 (solution 3.17, page 77) Find all points on the ellipse given by x 2 − xy + y 2 = 3, which are farthest from the origin. Problem 3.18 (solution 3.18, page 78) Which points on the curve given implicitly by ( x 2 + y 2)2 = x 3 + y 3 are farthest from the origin? Problem 3.19 (solution 3.19, page 78) Find the largest and the smallest value of the function f ( x, y) = xy − y + x − 1 (a) on the disk given by x 2 + y 2 ≤ 2, (b) on the half-disk given by x 2 + y 2 ≤ 2 and x ≥ 0. Problem 3.20 (solution 3.20, page 80) CHAPTER 3. FUNCTIONS OF SEVERAL VARIABLES 20 An ellipsoid is given by the equation x 2 y 2 z 2 + + = 1. a 2 b 2 c 2 A box with edges parallel to x, y, and z axes is inscribed inside this ellipsoid. (a) What is the largest possible volume of the inscribed box? (b) What is the largest possible surface area of the inscribed box? If you get stuck with the general case, assume c = b, i.e., an ellipsoid of revolution. Problem 3.21 (solution 3.21, page 84) We are given an ℓ meters long thin rod. We cut it into 12 shorter rods, from which a frame of a box can be assembled. (a) How long should those shorter rods be if the box is to have the largest possible volume? (b) Same question as above with an additional restriction – the base rectan-gle should have area equal to A. Problem 3.22 (solution 3.22, page 86) We wish to assemble a frame of a triangular prism (equilateral base triangle) from an ℓ metres long thin rod. (a) What should be the length of the pieces we cut our rod into, for the prism to have the largest possible volume? 
(b) What should be the length of the pieces we cut our rod into, for the prism to have the largest possible surface area? Problem 3.23 (solution 3.23, page 87) Let a ∈ n R and let d ≥ 0 be a real number. (a) Find the largest and least value of the expression aTx for a vector x ∈ n R with prescribed length ∥x∥ = d. (b) Explain the solution. Problem 3.24 (solution 3.24, page 88) Assume A ∈ n× n R , and let d be a positive real number. (a) Find the largest and least value of f (x) = xT A x with constraint ∥x∥ = d. (b) Find the largest and least value of f (x) = ∥x∥2 with constraint xT A x = d 2 if A is symmetric positive definite. Problem 3.25 (solution 3.25, page 89) Let A ∈ n× n n R , b , p ∈ R , and d > 0 a real number. (a) Minimize f (x) = ∥x∥2 with respect to ∥x − p∥ ≤ d. (b) Minimize f (x) = ∥x∥2 with respect to A x = b. CHAPTER 3. FUNCTIONS OF SEVERAL VARIABLES 21 (c) Minimize f (x) = ∥x∥2 with respect to ∥x − p∥ ≤ d and A x = b. Chapter 4 Solutions Solution to problem 1.1, page 5: First we evaluate the characteristic polynomial of A. − λ 2 2 − λ 2 2 − λ p A( λ) = det( A − λI ) = 3 1 − λ −3 = 3 1 − λ 0 −1 −1 3 − λ −1 −1 2 − λ 1 − λ 3 0 1 − λ 3 = 3 1 − λ 0 = (2 − λ) 3 1 − λ −1 −1 2 − λ = (2 − λ)((1 − λ)2 − 9) = (2 − λ)(4 − λ)(−2 − λ). The eigenvalues are the roots of the characteristic polynomial (indexed in decreasing order) λ 1 = 4, λ 2 = 2, λ 3 = −2. To find the corresponding eigenvectors we determine the null space N ( A − λI) = {v ∈ 3 R : ( A − λI)v = 0} for each eigenvalue λ, which can be done using Gaussian elimination. • For N ( A − λ 1 I) we compute −4 2 2  1 1 1  1 1 1  1 0 0                   ∼   ∼   ∼    3 −3 −3 1 −1 −1 0 −2 −2 0 1 1 .                 −1 −1 −1 2 −1 −1 0 3 3  0 0 0 The homogenous system of equations for components of the vector [ x, y, z]T is thus simplified to x = 0, y + z = 0. We can take z to be the free variable and choosing z = 1 gives an eigenvector for λ 1 = 4:  0    v −  1 =    1 .      1  22 CHAPTER 4. SOLUTIONS 23 • For N ( A − λ 2 I) we have −2 2 2   1 −1 −1 1 −1 −1 1 0 −1                   ∼   ∼   ∼    3 −1 −3  3 −1 −3 0 2 0  0 1 0  .                 −1 −1 1  −1 −1 1  0 −2 0  0 0 0  The equations are x − z = 0 and y = 0. Choosing z = 1 gives 1   v   2 =   0 .     1 • For N ( A − λ 3 I) we have  2 2 2   1 1 1  1 1 1  1 1 0                   ∼   ∼   ∼    3 3 −3  1 1 −1 0 0 −2 0 0 1 .                 −1 −1 5  −1 −1 5  0 0 6  0 0 0 The equations are x + y = 0 , z = 0. This time y is the free variable and the choice y = 1 gives −1   v   3 =    1  .      0  Solution to problem 1.2, page 5: We know that it will be possible to find an orthogonal basis of 4 R consisting of eigenvectors of H because H is a symmetric matrix ( H T = H). Computing the characteristic polynomial for H we get 1 − λ 1 1 0 1 − λ 1 1 0 1 − λ 0 1 2 − λ 0 1 det( H − λI) = = 1 0 − λ −1 0 0 − λ −1 0 1 −1 1 − λ 1 − λ 1 −1 1 − λ 1 − λ 1 1 0 2 − λ 0 1 1 − λ 1 − λ −1 = · = 0 0 − λ −1 2 − λ −2 1 − λ 0 0 −2 1 − λ = (−(1 − λ) λ − 2)2 = (2 − λ)2(−1 − λ)2. In addition to the usual rules for computing determinants (addition of multiples of rows/columns to other rows/columns) we used the following rule for determinants of block upper–triangular matrices: " A C#! det = det( A) · det( B), 0 B where A, B, C, and 0 are matrices of appropriate dimensions. 
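Hand computations of this kind are easy to cross-check numerically. A minimal sketch, assuming numpy is available (np.linalg.eigh would additionally return an orthonormal set of eigenvectors):

```python
# Cross-check: the symmetric matrix H from problem 1.2 should have
# eigenvalues -1, -1, 2, 2.
import numpy as np

H = np.array([[1, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, -1],
              [0, 1, -1, 1]])
print(np.linalg.eigvalsh(H))  # [-1. -1.  2.  2.]
```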
We therefore obtain two distinct eigenvalues λ 1 , 2 = 2 and λ 3 , 4 = −1 which are both double roots of the characteristic polynomial. Since H is symmetric, we know that it has a two-dimensional eigenspace for each eigenvalue (for general matrices this is not necessarily the case). Let’s proceed with eigenspaces and their bases: CHAPTER 4. SOLUTIONS 24 • Eigenspace for λ 1 , 2 = 2 is the nullspace N ( H − 2 I). We have −1 1 1 0  1 0 −2 −1          1 −2 0 1  0 1 −1 −1 H − 2 I =       ∼ · · · ∼       .      1 0 −2 −1 0 0 0 0           0 1 −1 −1 0 0 0 0  Denoting a general eigenvector as v = [ x 1 ,x 2 ,x 3 ,x 4]T, the equations for the components of v are x − − 1 2 x 3 x 4 = 0 ∴ x 1 =2 x 3 + x 4, x − − 2 x 3 x 4 = 0 ∴ x 2 = x 3 + x 4. Hence an eigenvector corresponding to λ 1 , 2 = 2 looks like 2 x      3 + x 4 2 1              x  1 1 v =  3 + x 4              = x   + x   .   3   4    x  1 0  3             x      4 0 1 Setting v } 1 = [2 , 1 , 1 , 0]T and v2 = [1 , 1 , 0 , 1]T, the set BN ( H−2 I) = {v1 , v2 is a basis of N ( H − 2 I). Note, however, that v1 and v2 are not orthogonal, so we don’t have an orthonormal basis yet. We can obtain an orthonormal basis via Gram–Schmidt orthogonalization: 1   u 1   1 u 2   2 := v2 and q2 = = √     , ∥u ∥   2 3 0     1  1   1  uTv         1  0  u 1  0  u − 2   1   1 := v1 u2 =   √     and q1 = =   . uTu      1  ∥u ∥  1  2 2   1 3       −1 −1 (Starting with index 2 produces ‘smaller’ square roots, we could just as well start with index 1.) So, we have an orthonormal basis B′ = N ( H−2 I) {q } 1 , q2 for N ( H − 2 I ). • Eigenspace for λ 3 , 4 = −1 is the nullspace N ( H + I). We have 2 1 1 0  1 0 1 −1         1 1 0 1  0 1 −1 2  H + I =       ∼ · · · ∼       .     1 0 1 −1 0 0 0 0          0 1 −1 2  0 0 0 0  Using the same notation as above, we obtain  x −  −    4 x 3 1 1              x − 2 x   1  −2 v =  3 4             = x   + x   .   3   4    x   1   0   3             x      4 0 1 CHAPTER 4. SOLUTIONS 25 Again, v3 = [−1 , 1 , 1 , 0]T and v4 = [1 , −2 , 0 , 1]T constitue a basis of N ( H + I), but they are not orthogonal. Let’s repeat the Gram–Schmidt orthogonalization: −1   u 1    1  u 3   3 := v3 and q3 = = √     , ∥u ∥   3 3  1       0   0   0  uTv         4 −1 u 1 −1 u − 3   4   4 := v4 u3 =   √     and q4 = =   . uTu      1  ∥u ∥  1  3 3   4 3        1   1  Now, B′ = {q } is an orthonormal basis of N ( H + I). N ( H+ I) 3 , q4 Solution to problem 1.3, page 5: (a) By performing Gaussian elimination  0 1 · · · 1   1 0 · · · 0               1 0 · · · 0   0 1 · · · 1              A =  1 0 · · · 0   0 0 · · · 0    ∼            . . . .   . . . .   . . . .   . . . .   .   .   . . .   . . .           1 0 · · · 0   0 0 · · · 0  or by simply looking at the matrix A it is clear that the column space C( A) is spanned by the first two column vectors. We denote these two vectors by 0 1         1 0     u =          .  and v =  .       .   .   .   .          1 0 so we can write C( A) = L({u , v}) and conclude that the dimension of C( A) equals 2. 
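For a concrete sanity check of this rank computation, one can build the adjacency matrix of a small star graph and ask numpy for its rank (a sketch; the choice n = 6 is arbitrary):

```python
# The adjacency matrix of a star graph on n vertices has rank 2,
# hence nullity n - 2.
import numpy as np

n = 6
A = np.zeros((n, n))
A[0, 1:] = 1   # first row:    0 1 ... 1
A[1:, 0] = 1   # first column: the same entries, since A is symmetric
print(np.linalg.matrix_rank(A))  # 2
```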
This implies that dim( N ( A)) = n − 2, which can also be seen from the row-reduced form of A above. To determine a basis for the null space N ( A), i.e., the subspace of the solutions to the equation A w = 0, we need to find n − 2 linearly independent solutions to the system of equations x 1 = 0, x 2 + ... + xn = 0, where the variables are the components of the vector w = [ x 1 ,...,xn]T. This system of equations can be obtained directly from the result of Gaussian elimination above or by observing that A is a symmetric matrix: We know that C( A)⊥ = N ( A T) = N ( A), so we can conclude that w ∈ N ( A) is equivalent to the condition that w is orthogonal to both u and v. Expressing these conditions via equations, w · u = 0 and w · v = 0, CHAPTER 4. SOLUTIONS 26 and writing these equations component-wise produces the same equa- tions as above. The ‘free’ variables of this system are x 3 ,...,xn which means we can choose n − 2 linearly independent solutions by setting xk = 1, xi = 0 for i , k for each choice of k = 3 , . . . , n. The solutions can then be written as  0   0   0   0                  −1 −1 −1 −1                                  1   0   0   0          w         3 =   , w   , . . . , w   , w    0  4 =  1  n−1 =  .  n =  .       .   .               .   .   .   .               .   .       .   .   1   0                   0   0   0   1  and hence one choice of a basis for N ( A) is the set of vectors {w } 3 , . . . , w n . (b) The subspace N ( A) is by definition the eigenspace of A for eigenvalue λ = 0 (if N ( A) is non-trivial). We can therefore immediately conclude that we have an ( n − 2)-dimensional eigenspace for eigenvalue λ = 0 spanned by the vectors {w } 3 , . . . , w n defined above. Since A is a symmetric matrix, we know that the algebraic multiplicity of each eigenvalue is equal to the dimension of corresponding eigenspace. This means we have (algebraically) two non-zero eigenvalues yet to be determined. It is possible to obtain the remaining two eigenvalues as usual by computing the characteristic polynomial directly and then solving the corresponding systems of equations to obtain the eigenvectors. As an alternative we will use the fact that A is a symmetric matrix. Since we know that the eigenspaces of a symmetric matrix for distinct eigenvalues are mutually orthogonal, and that the orthogonal complement of the eigenspace for λ = 0 is N ( A)⊥ = C( A T) = C( A) = L({u , v}), we can deduce that it must be possible to express the remaining two eigenvectors as linear combinations of the column vectors u and v. It therefore makes sense to see how the matrix A acts on the vectors u and v. Direct computation produces the equations A u = ( n − 1)v, (1.1) A v = u. There are now at least two ways to proceed from here. 1. Since we know that any eigenvector w of A for any non-zero eigenvalue λ can be written as a linear combination w = x u + y v, we can write the eigenvector equation A w = λ w using (1.1) as follows A w = A( x u + y v) = xA u + yA v = x( n − 1)v + y u = λ( x u + y v). The last equality can be rewritten as ( λx − y)u − ( x( n − 1) − λy)v = 0. Since u and v are linearly independent this implies that both coef- ficients above must equal 0, which yields a system of equations λx − y = 0, x( n − 1) − λy = 0. CHAPTER 4. 
SOLUTIONS 27 By expressing y = λx from the first equation and plugging into the √ second equation we get λ 2 = n − 1 or λ 1 , 2 = ± n − 1. (Of course this system also has an obvious solution for x = y = 0, but this is not a valid solution in this case because eigenvectors must be non-zero.) To obtain the eigenvectors for λ 1 , 2 we have freedom to choose any non-zero value for (say) x (because know that any nonzero multiple of an eigenvector is also an eigenvector for the same eigenvalue). We choose x = 1 and hence y = λ for each eigenvalue λ 1 , 2 to get √  n − 1   √      1    w   1 = x u + y v = u + n − 1v =    .   .     .       1  and √ − n − 1   √      1    w   2 = u − n − 1v =    .   .     .       1  as the remaining eigenvectors. 2. Another way is to use a little theory of linear maps (which are the subject of subsequent chapters). According to (1.1), we can view multiplication by A (restricted to the 2-dimensional subspace C( A)) as a linear map from C( A) to C( A). Using (1.1) we can represent this map in the basis {u , v} with the 2 × 2 matrix, which we denote by B; " 0 1# B = . n − 1 0 Theory tells us that the eigenvalues of a linear map do not depend on the choice of basis of the underlying vector space, which means that the eigenvectors of B must be same as the restriction of A to C( A). The characteristic polynomial for B is det( B − λI) = λ 2 − ( n − 1) = 0 √ which yields λ 1 , 2 = ± n − 1. Then the bases of the eigenspaces N ( B − λ 1 , 2 I) (expressed with respect to the basis {u , v}) can be obtained as usual via Gaussian elimination: " 1 # " 1 # w √ √ 1 = , w . n − 1 2 = − n − 1 Since the components of these vectors represent coefficients with respect to the basis {u , v} of C( A), we can express them as √ √ w1 = u + n − 1v , w2 = u − n − 1v. CHAPTER 4. SOLUTIONS 28 Solution to problem 1.4, page 6: The idea: Determine all eigenvalues and eigenvectors of A. Since A is symmetric, it can be diagonalized in an orthonormal basis, i.e., , written as A = QDQ T, where D is a diagonal matrix (with eigenvalues on the diagonal) and Q is orthogonal (with properly chosen eigenvectors as columns). So, once we have all eigenvalues and eigenvectors, we’ll just multiply QDQ T to get A. The simplest way to determine the remaining eigenvalues of A (apart from the double eigenvalue 3) is to notice what happens when we add and subtract the equations A v1 = v2 and A v2 = v1. By adding them we get A(v1 + v2) = v1 + v2 and by subtracting them we get A(v − − 1 v2) = −(v1 v2). From these two equations we can directly read two eigenvectors 1     1 u   1 = v1 + v2 =     for eigenvalue λ   1 = 1 1     1 and −1      1  u −   2 = v1 v2 =     for eigenvalue λ   2 = −1.  1      −1 An alternative way to determine these two eigenvectors would be similar to that in Exercise 1.3 (b): we could write a 2 × 2 matrix that represents multiplication by A on the subspace spanned by {v } 1 , v2 and compute λ 1 , λ 2 , u1 , u2 using the usual procedure. We also notice that the computed eigenvectors u1 and u2 are orthogonal, which is a necessary condition for A to be a symmetric matrix. If the computed u1 and u2 were not orthogonal we could conclude that a symmetric matrix A with the required properties does not exist. Now that we have all four eigenvalues λ 1 = 1 ,λ 2 = −1 (together with their eigenvectors) and λ 3 , 4 = 3, we only need to determine the corresponding eigenspace for λ 3 , 4 = 3. 
The requirement that A must be a symmetric matrix implies that the eigenspace N ( A−3 I) must be 2-dimensional and it must be orthogonal to the eigenspaces for λ 1 and λ 2. This means it is enough to find a basis for the orthogonal complement to L({u } 1 , u2 ) which is also 2-dimensional. The condition x ∈ L({u } · · 1 , u2 )⊥ can be described by equations u1 x = 0 and u2 x = 0, i.e., , a system of equations x 1 + x 2 + x 3 + x 4 = 0, − x − 1 + x 2 + x 3 x 4 = 0 for the components x = [ x 1 ,x 2 ,x 3 ,x 4]T. CHAPTER 4. SOLUTIONS 29 Another way to obtain this system of equations is to define the matrix 1 −1     1 1  U = [u   1 , u2] =       1 1      1 −1 and note that L({u } 1 , u2 )⊥ = C( U )⊥ = N ( U T) In any case, we can perform Gaussian elimination on the matrix U T to obtain "1 0 0 1# U T ∼ 0 1 1 0 or on the system above to directly to obtain the conditions x 1 + x 4 = 0, x 2 + x 3 = 0. By choosing the appropriate values for the ‘free’ variables x 3 and x 4 we can choose the following vectors for a basis of the eigenspace N ( A − 3 I) = N ( U T):  0  −1         −1  0  u     3 =       and u   .   4 =    1   0           0   1  By finding all the eigenvalues and eigenvectors we effectively have a diagonalisation of the matrix A. It remains to define the matrices D = diag(1 , −1 , 3 , 3) and P = [u1 , u2 , u3 , u4] and compute A = P DP −1. If we are computing this product by hand, it may be preferable to choose an orthonormal basis of eigenvectors (in order to avoid computing the inverse P −1) instead of {u } 1 , u2 , u3 , u4 . Luckily, all these vectors are already orthogonal, so we only need to normalize them. We can therefore define an orthonormal basis consisting of eigenvectors by 1 1 1 1 q1 = u u √ u √ u 2 1 , q2 = 2 2 , q3 = 3 , q4 = 4. 2 2 Now define the matrix Q = [q1 , q2 , q3 , q4] and evaluate  3 1 1 −3   1    1 3 −3 1  A = QDQ T =       . 2    1 −3 3 1      −3 1 1 3  Solution to problem 1.5, page 6: CHAPTER 4. SOLUTIONS 30 (a) We will discuss the general case below. Let us first assume that A is diagonalizable, so that we have A = P DP −1 where D = diag( λ 1 ,...,λn) is a diagonal matrix that contains the eigenvalues of A along the diagonal. The powers Ak of the matrix A can then be expressed as Ak = P DP −1 P DP −1 · · · P DP −1 = P DkP −1 where Dk = diag( λk 1 ,...,λkn) contains the powers of the eigenvalues along the diagonal. We use this expression in the power series for eA. ∞ ∞ X 1 X 1 eA = Ak = P DkP −1 k! k! k=0 k=0  ∞  ∞ ∞ X 1  X 1 X 1 = P    Dk P −1 = P · diag( λk λk   n) · P −1   1 , . . . ,  k!  k! k! k=0 k=0 k=0 = P · diag( eλ 1 , . . . , eλn ) · P −1 In other words, the matrix eA can be diagonalized in the same basis as A, and its eigenvalues are simply eλ 1 , . . . , eλn . Using elementary properties of determinants, the determinant of eA can then be expressed as det( eA) = det( P · diag( eλ 1 , . . . , eλn) · P −1) = det( P ) · det( diag( eλ 1 , . . . , eλn )) · det( P −1) = det( diag( eλ 1 , . . . , eλn )) = eλ 1 · · · eλ n = eλ 1+···+ λn We know that tr( A) equals the sum of the eigenvalues of A which completes the proof for diagonalizable A. For the general case we can use the Schur decomposition A = U T U ∗ where T is an upper-triangular matrix and U is a unitary matrix ( U ∗ U = U U ∗ = I). We know that the upper–triangular matrix in the Schur decomposition of A also contains the eigenvalues of A along its diagonal. 
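(As an aside, the identity det(e^A) = e^tr(A) itself is easy to test numerically, also on a non-diagonalizable matrix; a sketch assuming scipy is available:)

```python
# Numerical check of det(e^A) = e^tr(A) on a non-diagonalizable matrix.
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])   # a Jordan block
print(np.linalg.det(expm(A)), np.exp(np.trace(A)))  # both approx. 54.598
```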
The proof then follows the same pattern as in the diagonalizable case, with the difference being that we work with upper-triangular matrices instead of diagonal ones. The upper-triangular matrices that appear contain the same diagonal elements as the diagonal matrices above, and since the entries above the diagonal affect neither the determinant nor the trace, the result is the same in the end.

(b) Using the power series for e^A we can directly see that e^{A^T} = (e^A)^T. For matrices A and B that commute (meaning AB = BA) it is also easy to see that e^A · e^B = e^{A+B}. (This identity is not valid for matrices that do not commute!) To show that e^A is orthogonal, we need to prove that (e^A)^T e^A = I. For an antisymmetric matrix A we have
\[ (e^A)^T e^A = e^{A^T} \cdot e^A = e^{-A} \cdot e^A = e^{-A+A} = e^0 = I, \]
since A and −A commute (the zero above denotes the zero matrix), hence e^A is an orthogonal matrix. If A^T = −A, then all elements on the diagonal of A must be equal to zero, implying that tr(A) = 0. Using the identity from (a) we then have det(e^A) = e^{tr(A)} = e^0 = 1.

Solution to problem 1.6, page 6:

In principle, a Schur decomposition of a matrix A ∈ R^{n×n} can be computed by the following algorithm. We first find an eigenvalue λ_1 of A and a corresponding normalized eigenvector q_1, so we have A q_1 = λ_1 q_1 with q_1^T q_1 = 1. Then we find an orthonormal basis {q_2, ..., q_n} for the orthogonal complement of q_1. In other words, we form an orthogonal matrix
\[ Q_1 = \begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_n \end{bmatrix}, \]
meaning Q_1^T Q_1 = I, or, equivalently, q_i^T q_i = 1 and q_i^T q_j = 0 for i ≠ j, i, j = 1, ..., n. Then we compute the matrix
\[ T_1 := Q_1^T A Q_1 = \begin{bmatrix} \mathbf{q}_1^T \\ \mathbf{q}_2^T \\ \vdots \\ \mathbf{q}_n^T \end{bmatrix} A \begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_n \end{bmatrix} = \begin{bmatrix} \mathbf{q}_1^T \\ \mathbf{q}_2^T \\ \vdots \\ \mathbf{q}_n^T \end{bmatrix} \begin{bmatrix} \lambda_1\mathbf{q}_1 & A\mathbf{q}_2 & \cdots & A\mathbf{q}_n \end{bmatrix} = \begin{bmatrix} \lambda_1\mathbf{q}_1^T\mathbf{q}_1 & \mathbf{q}_1^T A\mathbf{q}_2 & \cdots & \mathbf{q}_1^T A\mathbf{q}_n \\ \lambda_1\mathbf{q}_2^T\mathbf{q}_1 & \mathbf{q}_2^T A\mathbf{q}_2 & \cdots & \mathbf{q}_2^T A\mathbf{q}_n \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_1\mathbf{q}_n^T\mathbf{q}_1 & \mathbf{q}_n^T A\mathbf{q}_2 & \cdots & \mathbf{q}_n^T A\mathbf{q}_n \end{bmatrix} = \begin{bmatrix} \lambda_1 & \mathbf{b}^T \\ \mathbf{0} & A_2 \end{bmatrix}, \]
where the vector b and the matrix A_2 are simply the results of the computation. This essentially yields the first column of the upper-triangular matrix T in the Schur decomposition A = Q T Q^T. We can then repeat the procedure on the block A_2 of the matrix T_1 to obtain (n−1) × (n−1) matrices Q_2 and T_2, and so on. The end result is the upper-triangular matrix from the Schur decomposition together with a sequence Q_1, Q_2, ..., Q_{n−1} of orthogonal matrices of decreasing size. The orthogonal matrix Q from the Schur decomposition can then be computed as
\[ Q = Q_1 \begin{bmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & Q_2 \end{bmatrix} \cdots \begin{bmatrix} I_{n-2} & \mathbf{0} \\ \mathbf{0}^T & Q_{n-1} \end{bmatrix}. \]

It should be noted that we have considerable freedom during the computation of the Schur decomposition using the described algorithm, from the choice of the 'first' eigenvalue to the choice of an orthonormal basis for the orthogonal complement of the chosen eigenvector at each step of the algorithm. When computing by hand it is therefore worthwhile to use this freedom at each step with an eye towards keeping the subsequent computations as simple as possible.

We will find the Schur decomposition of the matrix B first, since it is the easier example. The characteristic polynomial is
\[ \det(B - \lambda I) = \begin{vmatrix} 2-\lambda & -1 & 0 \\ 0 & 1-\lambda & 0 \\ -\sqrt{2} & -\sqrt{2} & 2-\lambda \end{vmatrix} = (2-\lambda)\begin{vmatrix} 2-\lambda & -1 \\ 0 & 1-\lambda \end{vmatrix} = (2-\lambda)^2(1-\lambda). \]
We choose the 'first' eigenvalue to be λ = 2, since we easily notice that the corresponding eigenvector is simply q_1 = [0, 0, 1]^T.
Using Gaussian elimination we could also verify that q1 is also the only eigenvector for the double eigenvalue λ = 2 which means that B is not diagonalizable, but a Schur decomposition still exists. Clearly, we have many choices for the orthogonal matrix Q 1 which should contain q1 in the first column. A good choice is 0 1 0   Q   1 =   0 0 1 .     1 0 0 We now compute 0 0 1  2 −1 0 0 1 0       T       1 = Q T    0 1 0   1 BQ 1 = 1 0 0   0 0 1    √ √          0 1 0 − 2 − 2 2 1 0 0 √ √ 0 0 1 0 2 −1  2 − 2 − 2       =         0 0 1    1 0 0   = 0 2 −1  .    √ √          0 1 0 2 − 2 − 2 0 0 1  The matrix multiplication in this case is easy since Q 1 happens to be a permutation matrix: multiplication by Q 1 from the right just permutes column vectors and multiplication from the left permutes row vectors. We notice that the result T 1 already happens to be an upper-triangular matrix so no further steps are needed. The Schur decomposition of B is simply B = Q 1 T 1 Q T. 1 Of course, a less fortunate choice of Q 1 would require more computation. For instance, a sensible choice for Q 1 also seems to be 0 0 1   Q   1 =   0 1 0 .     1 0 0 This choice then results in √ √ 2 − 2 − 2 " #   λ T   1 bT 1 =   0 1 0  = ,     0 B 0 −1 2  2 CHAPTER 4. SOLUTIONS 33 where √ √ " 1 0# bT = [− 2 , − 2] and B 2 = . −1 2 As before, we can notice the eigenvalue λ = 2 of B 2 (we also know this must be an eigenvalue since the matrix Q T1 BQ 1 has the same eigenvalues as B) with eigenvector q2 = [0 , 1]T. The (almost) only choice for Q 2 is then "0 1# Q 2 = 1 0 and we get "2 −1# T 2 = Q T2 B 2 Q 2 = . 0 1 ( Q 2 is a permutation matrix that just swaps the first and second columns/rows of matrices.) The final result is √ √   " 2 − 2 − 2 2 bT Q #   T = 2 =   0 2 −1  0 T     2   0 0 1  for the upper–triangular matrix of the decomposition and   " 0 1 0 1 0T #   Q = Q   1 = 0 0 1 0 Q     2   1 0 0 for the orthogonal matrix, which is the same result as before. A somewhat more involved example is matrix A. The characteristic polynomial is 6 − λ −1 1 6 − λ −1 1 det( A − λI) = 4 3 − λ 1 = λ − 2 4 − λ 0 2 2 3 − λ 2 2 3 − λ λ − 2 4 − λ 6 − λ −1 = + (3 − λ) 2 2 λ − 2 4 − λ = (2( λ − 2) − 2(4 − λ)) + (3 − λ) ((6 − λ)(4 − λ) + ( λ − 2)) = −4(3 − λ) + (3 − λ)( λ 2 − 9 λ + 22) = (3 − λ)( λ 2 − 9 λ + 18) = (3 − λ)2(6 − λ). Let us find an eigenvector for λ = 6. Using Gaussian elimination we get 0 −1 1  2 2 −3 2 2 −3 2 0 −1         A − 6 I ∼           ∼   ∼   ∼   4 −3 1  0 −1 1  0 −1 1  0 −1 1                  2 2 −3 4 −3 1  0 −7 7  0 0 0  The equations for the components of an eigenvector v1 = [ x 1 ,x 2 ,x 3]T for λ = 6 therefore reduce to 2 x − 1 x 3 = 0, − x 2 + x 3 = 0. CHAPTER 4. SOLUTIONS 34 If we choose x 3 = 2 for the value of the ‘free’ variable we get the eigenvector v1 = [1 , 2 , 2]T. Similarly, we can find that the only eigenvectors for the eigenvalue λ = 3 are nonzero multiples of v2 = [−1 , 1 , 4]T. However, the eigenvector v ∥ 1 seems ‘nicer’ than v2. For one, ∥v1 = 3 is an integer which means we don’t need to deal with square roots when normaliz-ing. Also, it is possible to find two mutually orthogonal vectors to v1 simply by cleverly permuting and changing signs of its coordinates, which also ensures they all have the same length. 
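Whichever choices are made along the way, a hand-computed Schur decomposition is easy to validate numerically: Q should be orthogonal, T upper-triangular with the eigenvalues on its diagonal, and Q T Q^T should reproduce the original matrix. A sketch assuming scipy is available (note that scipy.linalg.schur may return a different, equally valid pair Q, T):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[6.0, -1.0, 1.0],
              [4.0,  3.0, 1.0],
              [2.0,  2.0, 3.0]])
T, Q = schur(A)  # real Schur form: A = Q @ T @ Q.T
print(np.allclose(Q @ T @ Q.T, A))   # True
print(np.round(np.diag(T), 6))       # eigenvalues 6, 3, 3 in some order
```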
After some guesswork, a sensible choice for Q 1 seems to be 1 2 2  1   Q   1 =   2 1 −2 3     2 −2 1  The result for T 1 is 2 3 5 6 3 3     T     1 = Q T     1 AQ 1 = Q T 1 4 3 1 = 0 3 3         4 0 1 0 0 3 Since T 1 happens to be an upper-triangular matrix we can terminate the algorithm and write the Schur decomposition as A = Q 1 T 1 Q T. 1 With a little less luck with our choice for Q 1, for instance if we had chosen 1 2 2  1   Q   1 =   2 −2 1  3     2 1 −2 the result for T 1 would be 6 3 3   T   1 =   0 0 3     0 3 3 The algorithm would then require one more step. But even in this case we can notice that T 1 can be transformed into an upper-triangular matrix by the same permutation matrix that transposes the second and third columns and rows as in the example for the B matrix above. Solution to problem 1.7, page 6: p 1. By the definition ∥ A∥ F = tr( A T A) and assumption U T U = I we have ∥ U A∥2F = tr(( UA)T UA) = tr( A T U T UA) = tr( A T A) = ∥ A∥2F. 2. Here, we also need a basic property of the trace operation tr( AB) = tr( BA). ∥ AV ∥2F = tr(( AV )T AV ) = tr( V T( A T AV )) = tr(( A T AV ) V T) = tr( A T A) = ∥ A∥2F. 3. This can also be proved directly by definition, or simply by combining the previous two equalities ∥ U AV ∥F = ∥ U( AV )∥F = ∥ AV ∥F = ∥ A∥F. CHAPTER 4. SOLUTIONS 35 Solution to problem 1.8, page 7: 1. Define the function f ( x) := ∥ xA + B∥2 for F x ∈ R. Clearly, f ( x) ≥ 0 for all x ∈ R. Expanding according to the definitions and the properties of the inner product we have f ( x) = ∥ Ax + B∥2F = ⟨ xA + B, xA + B⟩F = ⟨ xA, xA⟩F + ⟨ xA,B⟩F + ⟨ B,xA⟩F + ⟨ B,B⟩F = ⟨ A, A⟩ · · F x 2 + 2⟨ A, B⟩F x + ⟨ B, B⟩F. This shows f ( x) is a quadratic function of x. A quadratic function f ( x) is non-negative for all x ∈ R if and only if its discriminant D = b 2 − 4 ac is non-positive, D ≤ 0. In our case the discriminant equals D = 4(⟨ A, B⟩ ⟨ ∥ F)2 − 4⟨ A, A⟩F B, B⟩F = 4(⟨ A, B⟩F)2 − 4∥ A∥2 F B∥2 F Since D ≤ 0 we have (⟨ A, B⟩ ∥ | ≤ ∥ ∥ F)2 ≤ ∥ A∥2 , hence |⟨ F B∥2 F A, B⟩F A∥F B∥F. We also mention that the Cauchy–Schwarz inequality holds not only for the Frobenius inner product but for general inner products on vector spaces. The only properties of the inner product needed to prove the Cauchy–Schwarz inequality were ⟨ A, A⟩ ≥ 0, ⟨ A, B⟩ = ⟨ B, A⟩ and ⟨ xA, B⟩ = x⟨ A, B⟩ for scalar values x. 2. In the proof we need the Cauchy–Schwarz inequality (notice that |⟨ A, B⟩ | ≤ F ∥ A∥ ∥ ≤ ∥ ∥ F B∥F implies ⟨ A, B⟩F A∥F B∥F) but is otherwise quite direct. ∥ A + B∥2F = ⟨ A + B,A + B⟩F = ⟨ A, A⟩F + 2⟨ A,B⟩F + ⟨ B,B⟩F ≤ ∥ A∥2 ∥ F + 2∥ A∥F B∥F + ∥ B∥2 F = (∥ A∥F + ∥ B∥F)2. Since ∥ A+ B∥F and ∥ A∥F+∥ B∥F are both non-negative numbers this implies the triangle inequality. Obviously, as is the case with the Cauchy–Schwarz inequality, the triangle inequality also holds in general vector spaces with inner products. 3. First, we use the Cauchy–Schwarz inequality to obtain the following inequality ∥ AB∥2F = ⟨ AB,AB⟩F = tr(( AB)T AB) = tr( B T A T AB) = tr( A T ABB T) = tr( A T A( B T B)T) = ⟨ A T A, B T B⟩F ≤ ∥ A T A∥ ∥ F B T B∥F In order to obtain ∥ AB∥ ≤ ∥ ∥ F A∥F B∥F it therefore suffices to prove the in- equality ∥ A T A∥ ≤ ∥ F A∥2F CHAPTER 4. SOLUTIONS 36 To see this, we need to use a few properties of the matrix B = A T A ∈ m× m R . Clearly, B is a symmetric matrix since B T = ( A T A)T = A T A = B. 
Also, for all x ∈ m R we have ⟨ B x , x⟩ = ⟨ A T A x , x⟩ = ⟨ A x , A x⟩ ≥ 0 where ⟨x , y⟩ := xTy denotes the usual Euclidean inner product. In other words, B is a positive semidefinite matrix, so all of its eigenvalues are non-negative. Let λ ≥ 1 , . . . , λm 0 denote the eigenvalues of B. On the one hand we have  m 2 m 2 X  X X ∥ A∥4   F = tr( A T A) = tr( B)2 =  λ  = λ 2 + λ  i  i λj   i   i=1 i=1 i, j since we know that the trace of a matrix equals the sum of its eigenvalues. On the other hand we have m X ∥ A T A∥2F = ∥ B∥2F = tr( B T B) = tr( B 2) = λ 2 i i=1 since B is a symmetric matrix and we know that if λ is an eigenvalue of B then λ 2 is an eigenvalue of B 2. Combining the last two identities and noting that P ≥ i 0, because , j λi λj all the eigenvalues of B are non-negative, we can write m X ∥ A T A∥2F = λ 2 i i=1 m X X ≤ λ 2 + i λiλj i=1 i, j = ∥ A∥4 F from which the desired inequality follows. 4. Using the properties of the Kronecker product we can verify the identity directly. ∥ A ⊗ B∥2F = tr(( A ⊗ B)T( A ⊗ B)) = tr(( A T ⊗ B T)( A ⊗ B)) = tr( A T A ⊗ B T B) = tr( A T A)tr( B T B) = ∥ A∥2∥ F B∥2 F. CHAPTER 4. SOLUTIONS 37 Solution to problem 1.9, page 7: We simply write the orthogonality condition ⟨ I, A⟩F = 0 for A ∈ I⊥ in terms of the coordinates of " x # A = 1 x 2 x 3 x 4 The equation is ⟨ I, A⟩F = tr( I T A) = tr( A) = x 1 + x 4 = 0 This gives us three ’free’ variables x 2, x 3 and x 4, which agrees with the fact that 2×2 R is a four-dimensional space which implies that the orthogonal complement of any non-zero element should be three-dimensional. By choosing the appropriate values for the free variables we define three linearly independent solutions "0 1# "0 0# 1 "−1 0# A 1 = , A , A √ 0 0 2 = 1 0 3 = 2 0 1 It is straight-forward to verify that all the matrices I, A 1, A 2, A 3 are mutually orthogonal with respect to the Frobenius product. Obviously we also have ∥ A ∥ ∥ ∥ 1 F = ∥ A 2 F = ∥ A 3 F = 1, so these three matrices form an ONB for I ⊥. Solution to problem 1.10, page 7: The Eckart–Young theorem states that the problem of the best rank k approximation of a rank n matrix (with regard to the Frobenius norm) can be found using the SVD matrix decomposition. The SVD of a matrix A ∈ n× m R is a matrix factorization A = U Σ V T where  σ  1 0 . . . 0     Σ =  0 σ  n× m  2 . . . 0 ∈   R    . .   .   .. . . ..  is a diagonal matrix containing the singular values σ ≥ · · · ≥ ≥ 1 σ min( n,m) 0 along the diagonal and h i h i U = u n× n m× m 1 , . . . , u ∈ ∈ n R and V = v1 ,..., v m R are orthogonal matrices. Another useful way of writing the SVD is the sum min( n,m) X A = σi u i vT. i i=1 Each term in this sum is a rank 1 matrix (every column in a matrix of the form σ uvT is clearly a multiple of u), these are the ‘singular’ matrices that give the ‘singular value decomposition’ its name. Note also that the Frobenius norm of a ‘singular’ matrix in the decomposition equals ∥ σ uvT∥2F = tr(( σ uvT)T σ uvT) = σ 2tr(vuTuvT) = σ 2tr(vvT) = σ 2tr(vTv) = σ 2 since the vectors u are v are normed, uTu = vTv = 1. Hence, the Frobenius norm of any matrix A can be expressed in terms of its singular values as ∥ A∥ F = CHAPTER 4. SOLUTIONS 38 q σ 2 + + · · · + . The Eckart–Young theorem states that in order to 1 σ 2 2 σ 2 min( n,m) find the best rank k approximation to a matrix A we simply pick out k of those rank 1 ‘singular’ matrices corresponding to k largest singular values. 
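Numerically, this recipe is one truncated SVD away. A minimal sketch, assuming numpy is available (shown on the matrix from part (b) below, whose best rank 1 approximation works out to the matrix with all entries equal to 2):

```python
import numpy as np

def best_rank_k(A, k):
    # Keep only the k largest singular values and rebuild the matrix.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is non-increasing
    s[k:] = 0.0
    return (U * s) @ Vt   # equals U @ np.diag(s) @ Vt

B = np.array([[1.0, 3.0],
              [3.0, 1.0]])
print(best_rank_k(B, 1))  # [[2. 2.] [2. 2.]]
```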
In other words, we form the diagonal matrix Σ_k which contains only the first k singular values (assuming σ_1 ≥ σ_2 ≥ ···) along the diagonal and compute M = U Σ_k V^T. Among all rank k matrices, M will then be the matrix that minimises the value ∥A − M∥_F.

A difficulty in applying the Eckart–Young theorem without numerical computation software is that computing the SVD of a general matrix by hand can be quite tedious. For the examples below we do not use any general algorithm for computing the SVD, because for these special matrices the decomposition can often be found by simpler means.

(a) For a diagonal matrix D we almost already have the SVD D = U Σ V^T with Σ = D and U = V = I, but not quite. We can write
\[ A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 1 \end{bmatrix} = 2\,\mathbf{e}_1\mathbf{e}_1^T - 3\,\mathbf{e}_2\mathbf{e}_2^T + 1\,\mathbf{e}_3\mathbf{e}_3^T = 2\,\mathbf{e}_1\mathbf{e}_1^T + 3\,\mathbf{e}_2(-\mathbf{e}_2^T) + 1\,\mathbf{e}_3\mathbf{e}_3^T, \]
i.e., we need to change the signs of the negative diagonal elements and change the sign of the corresponding column of either U = I or V = I. The best rank 1 approximation to A contains the 'largest' term (by absolute value) in that sum,
\[ M = 3\,\mathbf{e}_2(-\mathbf{e}_2^T) = -3\,\mathbf{e}_2\mathbf{e}_2^T = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]
So clearly, for a diagonal matrix D, we only need to pick the diagonal entry which is largest by absolute value. Similarly, the best rank 2 approximation contains the two largest diagonal entries by absolute value,
\[ M = 2\,\mathbf{e}_1\mathbf{e}_1^T - 3\,\mathbf{e}_2\mathbf{e}_2^T = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]
Of course, the decomposition D = IDI is not actually an SVD with positive and ordered singular values along the diagonal of D. But we could obtain one with permutation matrices (with an additional change of sign because of the −3) for U and V. For instance, we could also write
\[ \begin{bmatrix} 2 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} =: U \Sigma V^T \]
to obtain the standard SVD and then compute
\[ M = U \Sigma_1 V^T = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \]
which is the same result.

(b) Noticing that B is symmetric, we remember that a symmetric matrix can be diagonalised in an orthonormal basis, meaning that we have an eigenvalue decomposition B = Q D Q^T with Q an orthogonal matrix. Comparing such an eigenvalue decomposition with the SVD, we notice that we can almost write B = U Σ V^T with Σ = D and U = V = Q. Depending on the order of the eigenvalues in D we may need to permute the columns of U and V, and change the signs of those columns in either U or V which correspond to a negative eigenvalue, in order to obtain the standard SVD. But as seen in the previous example, this is not essential for the application of the Eckart–Young theorem. To find the eigenvalue decomposition of B we first compute the characteristic polynomial
\[ \det(B - \lambda I) = \begin{vmatrix} 1-\lambda & 3 \\ 3 & 1-\lambda \end{vmatrix} = (1-\lambda)^2 - 9 = (-2-\lambda)(4-\lambda). \]
In order to obtain the best rank 1 approximation we actually only need the largest eigenvalue (by absolute value), λ = 4, along with an appropriately normalised eigenvector. Gaussian elimination
\[ B - 4I = \begin{bmatrix} -3 & 3 \\ 3 & -3 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix} \]
reduces the equations for the coordinates of the eigenvector v = [x_1, x_2]^T to x_1 − x_2 = 0.
A normalised solution is 1 "1# q = √2 1 and the best rank 1 approximation can be expressed by 1 "1# 1 " # h i 2 2 M = 4 · qqT = 4 · √ √ 1 1 = . 2 1 2 2 2 (c) Since C is a diagonal matrix we can immediately write at least two different best rank 1 approximations to C. "2 0# "0 0# M = or M = 0 0 0 2 because we have two equal ‘largest’ singular values σ 1 = σ 2 = 2. In fact, the are many more solutions, because in the case of two or more equal singular values there are also infinitely many valid singular value decompositions. Indeed, for any orthogonal matrix Q = U = V we can write a SVD for C as C = 2 I = Q(2 I) Q T For instance, if we choose Q to be a rotation matrix "cos( t) −sin( t)# Q = sin( t) cos( t) CHAPTER 4. SOLUTIONS 40 for some t ∈ R, we can even explicitly compute "2 0# "cos( t) −sin( t)#"2 0#" cos( t) sin( t)# M = Q Q T = 0 0 sin( t) cos( t) 0 0 − sin( t) cos( t) " cos2( t) cos( t) sin( t)# = 2 cos( t)sin( t) sin2( t) to get an infinite set of different best rank 1 approximations to C (which also includes the two previously mentioned solutions by choosing t = 0 and t = π ). One can also explicitly compute that we have ∥ C − M∥ 2 F = 2 for any choice of t in the expression for M above. Solution to problem 1.11, page 7: (a) Let’s recall, linear independence of the set {u } 1 , . . . , u q is the validity of the statement (for β ∈ j R): q X Whenever βj u j = 0, we have that β 1 = β 2 = ··· = βq = 0. j=1 As the reader will readily check, the implication we need to prove is now really just a direct consequence of the definition of the Kronecker product and linear independence of {u } 1 , . . . , u q . (b) Consider a trivial linear combination of vectors v ⊗ i u j, i.e., p q X X α ⊗ ij v i u j = 0. i=1 j=1 Notice that p q p  q  q  p  ! ! X X X  X  X  X  α ⊗    ⊗   ⊗  ij v i u j = v α  =  α u  .  i ij u j   ij v i j        i=1 j=1 i=1  j=1  j=1 i=1 (We really just notice that middle expression, we will not use it.) Now it follows from linear independence of u j’s and part (a) that for every j = 1 , . . . , q we have that p X αij v i = 0. i=1 As the set {v } 1 , . . . , v p is also linearly independent, we must have that αij = 0 for every i = 1 ,...,p and every j = 1 ,...,q, i.e., the set {v ⊗ i u j : i = 1 ,...,p ; j = 1 ,...,q} is linearly independent. CHAPTER 4. SOLUTIONS 41 Solution to problem 1.12, page 8: Assume we have eigenvalue/eigenvector pairs for A and B, A v = λ v and B u = µ u. By the definition of the Kronecker sum and properties of the Kronecker product we have ( A ⊕ B)(v ⊗ u) = ( A ⊗ I ⊗ n + Im B)(v ⊗ u) = ( A ⊗ I ⊗ n)(v ⊗ u) + ( Im B)(v ⊗ u) = A v ⊗ In u + Im v ⊗ B u = λ(v ⊗ u) + µ(v ⊗ u) = ( λ + µ)(v ⊗ u). This shows that the sum of eigenvalues λ + µ is an eigenvalue for A ⊕ B with eigenvector v ⊗ u which proves the claim. This means it is possible to compute the eigensystem of A ⊕ B without explicitly computing the Kronecker sum simply by computing the eigensystems of A and B separately and then computing all possible sums λ + µ of eigenvalues together with corresponding eigenvectors v ⊗ u. (As seen in problem 1.11, linearly independent sets of eigenvectors of A and B will give us corresponding linearly independent sets of eigenvectors of A ⊕ B.) For A the characteristic polynomial is −1 − λ 2 det( A − λI) = = (−1 − λ)(3 − λ). 0 3 − λ An eigenvector for λ 1 = −1 is v1 = [1 , 0]T and an eigenvector for λ 2 = 3 is v2 = [1 , 2]T. 
For B we have 1 − µ 0 det( B − µI) = = (1 − µ)(2 − µ) 2 2 − µ so the eigenvalues are µ 1 = 1 and µ 2 = 2 with eigenvectors u1 = [−1 , 2]T and u2 = [0 , 1]T. We can organize the pairs λ ⊗ i + µj , v i u j into a table. " # " # / 1 1 µ − j , u j λi , v i 1 , 3 , 0 2 −1 −1 "     −1#      2   2  1 , 0 ,       4 ,   2       −   0   2          0   4  0 0 "     0#     1 1 2 , 1 ,       5 ,   1         0 0         0 2 Solution to problem 1.13, page 8: (a) Since A is a symmetric matrix, we know it can be diagonalised with an orthogonal matrix U . The characteristic polynomial is 2 − λ 2 det( A− λI) = = (2− λ)(−1− λ)−4 = λ 2 − λ−6 = ( λ−3)( λ+2). 2 −1 − λ CHAPTER 4. SOLUTIONS 42 For λ 1 = 3 a normalized eigenvector is u1 = 1 √ [2 , 1]T and for λ 5 2 = −2 we have for example u2 = 1 √ [−1 , 2]T. The matrices D and U are therefore 5 "3 0 # 1 "2 −1# D = and U = √ . 0 −2 5 1 2 Note that since A is a symmetric matrix this diagonalization readily gives us the SVD which we can write as A = 3 · u − 1uT 1 2 · u2uT2 = 3 · u1uT1 + 2 · u2(−uT2). (b) We can directly verify that if we have U T U = I then we also have ( U ⊗ U )T( U ⊗ U ) = ( U T ⊗ U T)( U ⊗ U ) = U T U ⊗ U T U = I ⊗ I which shows this is an orthogonal matrix. Similarly, we can see that if we have a diagonalisation A = U DU T then ( U ⊗ U )( D ⊗ D)( U ⊗ U )T is a diagonalisation for A ⊗ A since ( U ⊗ U )( D ⊗ D)( U ⊗ U )T = ( U ⊗ U )( D ⊗ D)( U T ⊗ U T) = U DU T ⊗ U DU T = A ⊗ A. (c) For the best rank 1 approximation of A ⊗ A we need the largest singular value and corresponding singular vectors. Since A is symmetric, so is A ⊗ A, and its SVD can be obtained from the eigenvalue decomposition which was given in (b). We only need the largest (by absolute value) eigenvalue of A ⊗ A. We know that for matrices A and B the eigenvalues of A ⊗ B are the products λiµj (where λi and µj are the eigenvalues of A and B, respectively) and the eigenvectors are v ⊗ i u j (where v i and u j are eigenvectors of A and B, respectively). The largest eigenvalue of A ⊗ A is therefore λ · 1 λ 1 = 9 and the corresponding eigenvector is 4   1   2 q ⊗   1 = u1 u1 =     . 5   2     1 The best rank 1 approximation is therefore 16 8 8 4   9    8 4 4 2 M   1 = 9 · q1qT   1 =   . 25    8 4 4 2      4 2 2 1 To obtain the best rank 2 approximation to A ⊗ A we can add the second largest singular matrix in the SVD to M 1. The second largest eigenvalue (by absolute value) of A ⊗ A is λ 1 λ 2 = λ 2 λ 1 = −6. The eigenspace for the eigenvalue −6 is two-dimensional, so for the eigenvector we can take −2 −2     1      4  1 −1 q ⊗   ⊗   2 = u1 u2 =       or q u   5 −  3 = u2 1 =    1 5  4           2   2  CHAPTER 4. SOLUTIONS 43 or any normalised linear combination of q2 and q3, e.g. sin( t)q2+cos( t)q3 for any choice of t ∈ R. If we choose q2 we get  4 −8 2 −4 24 24 12 12     6     −8 16 −4 8  1 24 −12 12 −6 M − −     2 = M 1 6 · q2qT     2 = M 1   =   25      2 −4 1 −2 5 12 12 6 6          −4 8 −2 4  12 −6 6 −3 but as in the case of Exercise 1.10 (c) we could actually construct infinitely many rank 2 matrices M − 2 with minimal value of ∥ M 2 A ⊗ A∥F. Solution to problem 1.14, page 8: (a) Let’s start with the matrix A. 
Its characteristic polynomial is −2 − λ 2 det( A − λI) = = λ 2 + λ − 6 = ( λ + 3)( λ − 2), 2 1 − λ which gives λ 1 = −3 and λ 2 = 2 as eigenvalues of A. The corresponding eigenvectors are u′ = [−2 = [1 1 , 1]T and u′2 , 2]T, which both have norm √ 5. Since we’re aiming for an orthogonal transition matrix, we’ll pick 1 1 "−2 1# Q = [u1 , u2] = √ [u′ √ 1 , u′2] = . 5 5 1 2 The corresponding diagonal matrix is "−3 0# D = 0 2 and we have A = QDQ T. Note that B = A − I, hence B = Q( D − I) Q T and B has eigenvalues µ − − 1 = λ 1 1 = −4 and µ 2 = λ 2 1 = 1 (and same eigenvectors as A). (b) Write A u = λ u and B v = µ v. Then ( A ⊗ B + A ⊗ I + I ⊗ B)(u ⊗ v) = ( A ⊗ B)(u ⊗ v) + ( A ⊗ I)(u ⊗ v) + ( I ⊗ B)(u ⊗ v) = ( A u) ⊗ ( B v) + ( A u) ⊗ v + u ⊗ ( B u) = ( λ u) ⊗ ( µ v) + ( λ u) ⊗ v + u ⊗ ( µ u) = ( λµ + λ + µ)(u ⊗ v), hence u ⊗ v is an eigenvector of A ⊗ B + A ⊗ I + I ⊗ B corresponding to the eigenvalue λµ + λ + µ. For actual eigenvalues of matrices A and B we obtain λ 1 µ 1 + λ 1 + µ 1 = 5, λ 1 µ 2 + λ 1 + µ 2 = −5, λ 2 µ 1 + λ 2 + µ 1 = −10, λ 2 µ 2 + λ 2 + µ 2 = 5. CHAPTER 4. SOLUTIONS 44 (c) Note that the matrices A and B are symmetric, hence the matrices A ⊗ B, A ⊗ I and I ⊗ B are also symmetric, and so is their sum, i.e., the matrix A ⊗ B + A ⊗ I + I ⊗ B. Singular value decomposition of this matrix can therefore be completely determined from its eigendecomposition. In particular, the singular values of A ⊗ B + A ⊗ I + I ⊗ B are (in decreasing order) 10 , 5 , 5 , 5. The singular value 10 corresponds to the eigenvalue λ 2 µ 1 + λ 2 + µ 1 = −10 with corresponding eigenvector −2   1 "1# "−2# 1    1  u ⊗ ⊗   2 v1 = =     . 5 2 1 5 −   4      2  By the Eckart–Young theorem the rank 1 matrix M which is closest to A ⊗ B + A ⊗ I + I ⊗ B with respect to the Frobenius norm is −2   1 1    1  M = −10(u ⊗ ⊗ ·   2 v1)(u2 v1)T = −10 ·     [−2 , 1 , −4 , 2] 5 5 −   4      2   4 −2 8 −4   2   −2 1 −4 2  = −       . 5    8 −4 16 −8     −4 2 −8 4  Solution to problem 1.15, page 8: Let us check what happens if we multiply the matrix B = A⊗ A+ A 2 ⊗ I and the vector v⊗u where v and u are eigenvectors of A, A v = λ v and A u = µ u: B(v ⊗ u) = ( A ⊗ A + A 2 ⊗ I)(v ⊗ u) = A v ⊗ A u + A 2v ⊗ I u = λµ v ⊗ u + λ 2v ⊗ u = ( λµ + λ 2)v ⊗ u. This shows that we can obtain the eigensystem of B by computing the eigenvalues λi and eigenvectors v i of A and then compute λiλj + λ 2 to obtain all four i eigenvalues of B together with their eigenvectors v ⊗ i v j, i,j = 1 , 2. The characteristic polynomial of A is −1 − λ 3 det( A − λI) = = (1 + λ)2 − 9 = ( λ + 4)( λ − 2). 3 −1 − λ Eigenvectors for λ 1 = −4 and λ 2 = 2 are v1 = [−1 , 1]T and v2 = [1 , 1]T. Then the eigenvalues for B are µ 1 = λ 1( λ 1 + λ 1) = 32, µ 2 = λ 1( λ 2 + λ 1) = 8, µ 3 = λ 2( λ 1 + λ 2) = −4 and µ 4 = λ 2( λ 2 + λ 2) = 8 with eigenvectors  1  −1 −1 1                 −1 −1  1  1 u ⊗   ⊗   ⊗   ⊗   1 = v1 v1 =           , u v   , u v   , u v   . −  2 = v1 2 =   3 = v2 1 = −  4 = v2 2 =    1  1   1 1                  1   1   1  1 CHAPTER 4. SOLUTIONS 45 Solution to problem 1.16, page 9: (a) To prove this, we need the identity vec( ABC) = ( C T ⊗ A)vec( B) which holds for any matrices A, B and C for which the product ABC is defined. 
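This identity is easy to sanity-check numerically before it is applied. The following sketch is not part of the printed solution; it uses NumPy with random matrices of compatible sizes and relies on the column-stacking convention for vec (Fortran-order reshaping in NumPy terms).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 2))

def vec(M):
    # vec stacks the columns of M on top of each other (column-major order).
    return M.reshape(-1, order="F")

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
print(np.allclose(lhs, rhs))   # True: vec(ABC) = (C^T (x) A) vec(B)
```

Running the same check with C = I or A = I reproduces the two special cases vec(AXI) = (I ⊗ A)vec(X) and vec(IXB) = (B^T ⊗ I)vec(X) used in the next step.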
Applying the (linear) operator vec to the expression AX + XB we can write vec( AX + XB) = vec( AXI) + vec( IXB) = ( I ⊗ A)vec( X) + ( B T ⊗ I)vec( X) = ( B T ⊕ A)vec( X) which holds by the definition of the Kronecker sum. (b) We can answer this question quickly by considering the eigenvalues of the matrix B T ⊕ A. Since both B T and A are upper triangular matrices, we can read their eigenvalues off their diagonals. The eigenvalues of the Kronecker sum B T ⊕ A are all the possible sums of the eigenvalues of B T and A (see Exercise 1.12), i.e., 0 , 1 , 4 , 5. Because 0 is an eigenvalue, the matrix B T ⊕ A is singular and the homogeneous system ( B T ⊕ A)vec( X) = 0 has non-trivial solutions. By (a) AX + XB = 0 also has a nontrivial solution. (c) Here we actually compute the Kronecker sum B T ⊕ A to obtain the system matrix of our equation. 1 0 2 0 −1 2 0 0 0 2 2 0             0 1 0 2  0 3 0 0 0 4 0 2 B T ⊕ A = B T ⊗ I + I ⊗ A =               +   =   .       0 0 2 0  0 0 −1 2 0 0 1 2             0 0 0 2  0 0 0 3 0 0 0 5 By the way, we can confirm this matrix does indeed have the eigenvalues as claimed in (b). Gaussian elimination for the system ( B T ⊕ A)vec( X) = vec( C) then yields the reduced form of the system  0 2 2 0 −2   0 1 1 0 −1           0 4 0 2 2   0 2 0 1 1        ∼            0 0 1 2 1   0 0 1 2 1           0 0 0 5 5   0 0 0 1 1   0 1 1 0 −1   0 1 0 0 0           0 1 0 0 0   0 0 1 0 −1  ∼       ∼       .      0 0 1 0 −1   0 0 0 1 1           0 0 0 1 1   0 0 0 0 0  We therefore have the equations x 2 = 0, x 3 = −1 and x 4 = 1 for the entries of the unknown matrix " x # X = 1 x 3 x 2 x 4 with x 1 being the ‘free’ variable of the system. The general solution for our equation can then be written as " t −1# X = 0 1 where t ∈ R. CHAPTER 4. SOLUTIONS 46 Solution to problem 1.17, page 9: (a) A square matrix A is invertible if and only if N ( A) = {0}. This is true if and only if 0 is not an eigenvalue. Since by assumption all eigenvalues of A are not negative, this holds if and only if they are all strictly positive. (b) Let λ be an eigenvalue of A and v a corresponding eigenvector, A v = λ v. Since we are assuming the inverse A−1 exists, we can multiply this equation from the left by A−1 to get 1 A v = λ v ⇒ v = λA−1v ⇒ A−1v = v λ since we know λ , 0 from (a). This shows λ is an (non-zero) eigenvalue of A if and only if λ−1 is an eigenvalue of A−1. Clearly then all the eigenvalues λ 1 ,...,λn of A are positive if and only all the eigenvalues λ−1 1 , . . . , λ−1 n of A−1 are positive since inverting a number does not change its sign. (c) A PSD matrix A (i.e., a symmetric matrix with only nonnegative eigenvalues) can be diagonalised, A = QDQ T, with  λ  1     D =  .   .     .       λ  n where λ 1 ,...,λn are the eigenvalues and Q is an orthogonal matrix. Let us define the square-root of D by √  λ  √  1      D :=  .   .   .     √     λ  n √ and define the matrix S by S = Q DQ T. Then we can verify √ √ √ √ S 2 = Q DQ T Q DQ T = Q D DQ T = QDQ T = A. Since by assumption all the eigenvalues λ √ √ 1 , . . . , λn of A are nonnegative, all the eigenvalues λ 1 ,..., λn of S are also nonnegative, so S is also a positive semiedefinite matrix. Solution to problem 1.18, page 9: (a) This can be done by computing the eigenvalues and verifying that they are nonnegative, as we will do in (b). 
(b) We start by evaluating the characteristic polynomial 2 − λ 3 1 2 − λ 3 λ − 1 det( A − λI) = 3 6 − λ 3 = 3 6 − λ 0 1 3 2 − λ 1 3 1 − λ 2 − λ 3 λ − 1 3 6 − λ = 3 6 − λ 0 = ( λ − 1) 3 − λ 6 3 − λ 6 0 = (1 − λ)(18 − (3 − λ)(6 − λ)) = λ( λ − 1)(9 − λ). CHAPTER 4. SOLUTIONS 47 To obtain the eigenspace for λ 1 = 0 we compute 2 3 1 1 2 1 1 2 1  1 0 −1         A ∼           ∼   ∼   ∼   3 6 3 2 3 1 0 −1 −1 0 1 1  .                 1 3 2 1 3 2 0 1 1  0 0 0  We choose a normalised solution to the system, q1 = 1 √ [1 , −1 , 1]T, so that 3 we will have an orthonormal basis of eigenvectors. For λ 2 = 1 we have 1 3 1 1 3 1 1 0 1       A − I ∼         ∼   ∼   3 5 3 0 −4 0 0 1 0 .             1 3 1 0 0 0 0 0 0 We choose q2 = 1 √ [−1 , 0 , 1]T. For λ 2 3 = 9 we have −7 3 1   1 −1 1  1 −1 1  1 0 −1         A − 9 I ∼           ∼ −  ∼   ∼    3 −3 3   7 3 1  0 −4 8  0 1 −2 .                  1 3 −7  1 3 −7 0 4 −8 0 0 0  We choose q3 = 1 √ [1 , 2 , 1]T. 6 √ (c) As explained in Exercise 1.17 we define D = diag(0 , 1 , 3), Q = [q1 , q2 , q3] and compute       √ √ 1 0 −1 1 2 1 1 1 0 1   1     A = Q DQ T = q       2qT       2 +3q3qT 3 =  0 0 0 + 2 4 2 = 1 2 1 . 2         2     −1 0 1  1 2 1 0 1 1 Solution to problem 1.19, page 10: Following the algorithm, we denote " 2 # "8 2# a 11 = 1 , b = , B = . −1 2 6 Then we compute  1 0 0 " # " # " #   8 2 4 −2 4 4 L   − 1 =    2 1 0 and A = .   2 = −   2 6 2 1 4 5 −1 0 1 This completes the first step. In the next step we repeat the process for A 2. We denote h i h i a 11 = 4 , b = 4 , B = 5 and compute "2 0# L 2 = , A 2 1 3 = 5 − 4 = 1. √ The last step is to compute the square-root A 3 = 1 and to compute L 3 = 1. To get the final result it is actually not necessary to explicitly compute the product CHAPTER 4. SOLUTIONS 48 in (1.2). Namely, it can be verified that the same result can be obtained simply by nesting the first columns of the L 1, L 2 and L 3 matrices into one matrix  1 0 0   L =      2 2 0 .     −1 2 1 Solution to problem 1.20, page 10: Note that a matrix X is negative definite if and only if the matrix − X is positive definite. As the (1 , 1)-entry of − A is −1, the matrix − A is not positive definite by Sylvester’s criterion, so A is not negative definite. On the other hand, for the matrix  4 −6 2    − B =   −   6 10 −5      2 −5 14 we have 4 −6 4 > 0 , = 4 > 0 and det(− B) = 36 > 0. − 6 10 Hence, − B is positive definite by Sylvester’s criterion, which means that B is negative definite. Let’s find the Cholesky decomposition of Y = − B, which we’ll write in block form as  4 −6 2  " #   y Y =   −  11 zT  6 10 −5 = .     z Z  2 −5 14 For the first step we have √   " y # 2 0 0 11 0T   L −  1 = =  3 1 0 . √1 c I     y 2   11  1 0 1 Then 1 " 10 −5# 1 " 36 −12# " 1 −2# Y − 2 = Z − zzT = = , y − − − 11 5 14 4 12 4 2 13 so, for the second step, we get √  1 0 " 1 0# L   2 =    −2  = .  √  −  1 2 1 1 q √ The third and final step gives L 3 = 13 − 1 (−2)(−2) = 9 = 3. ‘Assembling’ L 1 1, L 2 and L 3 we obtain  2 0 0   L =    −   3 1 0 .      1 −2 3 We have the Cholesky decomposition LL T of Y = − B. CHAPTER 4. 
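The factor obtained by hand can be cross-checked against a library routine. The sketch below is not part of the printed solution (NumPy is an arbitrary but convenient choice); it recomputes the Cholesky factor of Y = −B from problem 1.20 and verifies LL^T = Y. Since the Cholesky factor with positive diagonal is unique, the output should match the matrix L assembled above.

```python
import numpy as np

# Y = -B from problem 1.20; Sylvester's criterion showed Y is positive definite.
Y = np.array([[ 4.0, -6.0,  2.0],
              [-6.0, 10.0, -5.0],
              [ 2.0, -5.0, 14.0]])

L = np.linalg.cholesky(Y)          # lower-triangular factor with Y = L @ L.T
print(L)                           # [[2, 0, 0], [-3, 1, 0], [1, -2, 3]]
print(np.allclose(L @ L.T, Y))     # True
```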
SOLUTIONS 49 Solution to problem 2.1, page 11: First, recall that any n × n matrix A can be uniquely expressed as the linear combination n X A = aijEij i,j=1 where aij is the ( i,j)-entry of A and Eij is the matrix which contains only zeros except for the ( i, j)-entry which equals 1. As the set { Eij ; i,j = 1 ,...,n} is also clearly linearly independent, it forms a basis of the vector space ( n× n R , + , ·) which we will call the standard basis. Since we have n 2 basis vectors, this demonstrates that ( n× n R , + , ·) is n 2 dimensional. (a) Denote this subset by U 1 and assume A,B ∈ U 1. Since n n n X X X αA + βB = α aijEij + β bijEij = ( αaij + βbij) Eij i,j=1 i,j=1 i,j=1 the (1 , 2)-entry of αA + βB is α · 0 + β · 0 = 0, i.e., αA + βB ∈ U 1. Hence, U 1 is a vector subspace of n× n R . For the basis of U n× n 1 we simply omit E 12 from the standard basis of R . Therefore, a basis of U 1 has one element less than the standard basis of n× n R , i.e., dim( U 1) = n 2 − 1. (b) For a subset of a vector space to be a vector subspace it must contain all scalar multiples of all its elements, including 0. The simplest argument why this subset is not a subspace is therefore because it does not contain the 0 matrix. Another argument is that this subset is not closed for addition since adding two matrices with 1 for their (1 , 2) entries produces a matrix with 2 for its (1 , 2) entry. (c) This subset contains the 0 matrix and is also closed for addition since adding matrices with integer entries results in matrices with integer entries. However, this subset is not closed for scalar multiplication. For instance multiplying any non-zero integer matrix with for instance π will produce a matrix that does not have all integer entries. (d) If we add two upper-triangular matrices, we get an upper-triangular matrix. Similarly, if we multiply an upper-triangular matrix with any scalar, the result will be an upper-triangular matrix. This subset, let us denote it with U 4, is therefore closed for addition and scalar multiplication and is clearly a subspace. To find a basis for U 4 and determine its dimension, we notice that any matrix A ∈ U 4 can be written as X A = aijEij i≤ j where the sum goes over all the pairs i, j = 1 , . . . , n with i ≤ j. So a basis for U 4 is the set { Eij : i,j = 1 ,...,n, i ≤ j}. This set contains 1 n( n + 1) 2 elements which is the dimension of U 4. (e) Adding symmetric matrices produces a symmetric matrix, multiplying a symmetric matrix with any number also results in a symmetric matrix, so the set of symmetric matrices is clearly a vector subspace. CHAPTER 4. SOLUTIONS 50 More formally, denote the subset of symmetric matrices by U 5 and assume A, B ∈ U 5, meaning A T = A and B T = B. We can then easily verify that any linear combination of A and B is also a symmetric matrix since ( αA + βB)T = αA T + βB T = αA + βB. To get an explicit basis for U 5 we can notice that any matrix A ∈ U 5 can be written as n X X A = aiiEii + aij( Eij + Eji). i=1 i 0 , = 108 > 0 , det( H − 2) = −648 < 0, 6 12 i.e., H 2 is indefinite and T 2 is a saddle point. CHAPTER 4. SOLUTIONS 72 (e) Note that the function r is not sensitive to a permutation of variables, so all three partial derivatives can be obtained from ∂r with a suitable ∂x permutation of variables. Let’s start with the stationary points: ∂r = 2 x − 2 yz = 0, ∂x ∂r = 2 y − 2 xz = 0, ∂y ∂r = 2 z − 2 xy = 0. 
∂x From the third equation, we have z = xy, therefore the first two equations become x − y · xy = x(1 − y 2) = 0, y − x · xy = y(1 − x 2) = 0. From the first equation we have that either x = 0 or 1 − y 2 = 0. In case x = 0, we have z = 0 and also y = 0 from the second original equation. The first stationary point is T 1(0 , 0 , 0). In case y = ±1 we get ±1·(1− x 2) = 0, so x = ±1. (N.b.: We have the two possibilities x = ±1 for each of the possibilities y = ±1.) Since z = xy, we get four additional stationary points T 2(−1 , −1 , 1) , T 3(−1 , 1 , −1) , T 4(1 , −1 , −1) , and T 5(1 , 1 , 1). The Hesse matrix is  2 −2 z −2 y   H −  r ( x, y, z) =    2 z 2 −2 x .     −2 y −2 x 2  Evaluating this at stationary points T 1 ,...,T 5 and denoting these matrices by H 1 ,...,H 5 we obtain 2 0 0  2 −2 2  2 2 −2       H   −    1 =       0 2 0 , H  2 2 2 , H  2 2 2  ,   2 =   3 =         0 0 2  2 2 2 −2 2 2  2 2 2   2 −2 −2     H   −  4 =     2 2 −2 , H  2 2 −2 .   5 =       2 −2 2  −2 −2 2  Now, H 1 is clearly positive definite, so T 1 is a local minimum. Note that H 2 ,..., H 5 are indefinite by the Sylvester’s criterion, i.e., they all have the (1 , 1)-entry positive, while det( H 2) = ··· = det( H 5) = −32. (In fact, the matrices H 2 ,...,H 5 all have eigenvalues −2 , 4 , 4.) Hence, T 2 ,...T 5 are saddle points. (f) Stationary points: ∂u = 3 x 2 − 3 y = 0, ∂x ∂u = 3 y 2 − 3 x = 0. ∂y CHAPTER 4. SOLUTIONS 73 So, from the first equation, y = x 2, hence x 4 − x = x( x 3 − 1) = 0 from the second equation. This gives us x 1 = 0 and x 2 = 1, hence y 1 = 0 and y 2 = 1. We have two stationary points, T 1(0 , 0) and T 2(1 , 1). The Hesse matrix is "6 x −3# Hu( x,y) = , −3 6 y and, evaluating at stationary points we get " 0 −3# " 6 −3# H 1 := Hu(0 , 0) = and H . −3 0 2 := Hu (1 , 1) = −3 6 A quick application of Sylvester’s criterion reveals that H 1 is indefinite, while H 2 is positive definite. Therefore, T 1 is a saddle point, and T 2 is a local minimum. (g) Stationary points: ∂v = 6 xy − 6 x = 0, ∂x ∂v = 3 x 2 + 3 y 2 − 6 y = 0. ∂y First equation is equivalent to 6 x( y − 1) = 0 so we have either x = 0 or y = 1. • Plugging x = 0 into the second equation we get 3 y 2 − 6 y = 0, hence y = 0 or y = 2. • Plugging y = 1 into the second equation we get 3 x 2 − 3 = 0, hence x = −1 or x = 1. All in all, we have four stationary points: T 1(0 , 0), T 2(0 , 2), T 3(−1 , 1), and T 4(1 , 1). The Hesse matrix of v is "6 y − 6 6 x # Hv( x,y) = , 6 x 6 y − 6 which, evaluated at T 1 ,...,T 4 and denoted by H 1 ,...,H 4 become "−6 0 # "6 0# " 0 −6# "0 6# H 1 = , H , H , and H . 0 −6 2 = 0 6 3 = −6 0 4 = 6 0 Now, H 1 is clearly negative definite, so T 1 is a local maximum. Also clear is positive definiteness of H 2, so T 2 is a local minimum. The matrices H 3 and H 4 are not definite, so T 3 and T 4 are saddle points. Solution to problem 3.13, page 18: Given a , b ∈ n R let f (x) = (xTa)(xTb). (a) The formula for f (x) can be rewritten as f (x) = (xTa)(bTx) = xT(abT) x. Therefore, from the formula ∂(xT A x) = xT( A + A T), we get ∂ x ∂f = xT(abT + baT), ∂ x CHAPTER 4. SOLUTIONS 74 and, from the formula ∂A x = A, we also get ∂ x ∂ 2 f ∂ ∂f !T = = (abT + baT). ∂ x2 ∂ x ∂ x (b) One stationary point of f is 0, i.e., one of the solutions of ∂f (x) = 0. (All ∂ x stationary points of f are determined by the nullspace N (abT + baT).) Since a and b are orthogonal, i.e., aTb = bTa = 0, we have (abT + baT)a = ∥a∥2b, (abT + baT)b = ∥b∥2a. 
Let U be the vector subspace of n R spanned by a and b. From the two equalities above, we see that, restricted to U with the vector space basis {a , b}, the Hesse matrix of f is represented by " 0 ∥a∥2# , ∥b∥2 0 which has eigenvalues λ 1 , 2 = ±∥a∥∥b∥. Hence, ∂ 2 f is indefinite, since it ∂ x2 has a positive and a negative eigenvalue, so 0 is a saddle point. (Note that, since the Hesse matrix ∂ 2 f is constant, the whole nullspace N (abT +baT) ∂ x2 consists of saddle points.) Solution to problem 3.14, page 19: We are trying to minimize the function f (x) = ∥x − a ∥2 ∥2 ∥2 1 + ∥x − a2 + ··· + ∥x − a k . Note that ∥x − a ∥2 i = (x − a i)T(x − a i), so k ∂f X T = 2(x − a k x − a . ∂ x 1)T + 2(x − a2)T + · · · + 2(x − a k )T = 2 i i=1 The stationary point of f (the solution to ∂f (x) = 0) is therefore ∂ x 1 x = (a k 1 + a2 + · · · + a k ) . Note that the Hesse matrix of f is ∂ 2 f = 2 kI (with n-fold eigenvalue 2 k), which ∂ x2 is positive definite, so our stationary point is in fact a (local) minimum. Solution to problem 3.15, page 19: The problem is to find the extreme values of the function f ( x, y) = 2 x 2 + y 2 constrained to the (closed and bounded) domain given by 4( x − 1)2 + y 2 ≤ 16. We split this into two subtasks. • Find the candidates for extreme values of f in the interior of the domain, i.e., subject to strict inequality 4( x − 1)2 + y 2 < 16. Extreme values of f over the entire domain that are attained at interior points (and not at the boundary points) must be local extrema of f . CHAPTER 4. SOLUTIONS 75 • Find the candidates for extreme values of f on the boundary of the domain, i.e., subject to equality 4( x−1)2 + y 2 = 16. These will be determined using the method of Lagrange multipliers. Let’s start with the interior. Stationary points of f are solutions of the system ∂f = 4 x = 0, ∂x ∂f = 2 y = 0. ∂y Clearly, x 1 = 0 ,y 1 = 0 is the only solution of this system, i.e., T 1(0 , 0) is the only stationary point of f . This point is contained in the interior of our domain (since 4(0 − 1)2 + 02 = 4 < 16 holds). Now the boundary. Let’s rewrite the equation 4( x − 1)2 + y 2 = 16 as 4( x − 1)2 + y 2 − 16 = 0, | {z } g( x,y) i.e., we have rewritten the constraint as g( x, y) = 0. The corresponding Lagrange function is L( x, y, λ) = f ( x, y) − λg( x, y) = 2 x 2 + y 2 − λ(4( x − 1)2 + y 2 − 16). Candidates for extrema on the boundary are the stationary points of this Lagrange function. (Strictly speaking, we only need the x and y coordinates of these stationary points.) Stationary points of L are solutions of the system ∂L = 4 x − 8 λ( x − 1) = 0, ∂x ∂L = 2 y − 2 λy = 0, ∂y ∂L = −(4( x − 1)2 + y 2 − 16) = 0. ∂λ (The last equation is, of course, equivalent to our constraint.) From the second equation we get 2 y(1 − λ) = 0, which implies y = 0 or λ = 1. • The case y 2 , 3 = 0 can be plugged directly into the third equation, and we get ( x − 1)2 = 4 or x 2 = −1 and x 3 = 3. • In case λ = 1 we get −4 x = −8 from the first equation, or x 4 , √ 5 = 2. Plugging this into the third equation we have y 2 = 12 or y 4 , 5 = ±2 3. Summarizing, we have the following five points, which are candidates for global extrema of f on our domain: √ √ T ( x, y) T 1(0 , 0) T 2(−1 , 0) T 3(3 , 0) T 4(2 , −2 3) T 5(2 , 2 3) . f ( x, y) 0 2 18 20 20 The bottom row contains the values of f evaluated at corresponding stationary points. Clearly, the smallest value, 0, is attained at T √ 1(0 , 0), while the largest √ value, 20, is attained at two points, T 4(2 , −2 3) and T 5(2 , 2 3). CHAPTER 4. 
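Candidates found with Lagrange multipliers are easy to double-check by sampling. The sketch below is not part of the printed solution (NumPy chosen for convenience); it parameterizes the boundary ellipse 4(x − 1)^2 + y^2 = 16 by x = 1 + 2 cos t, y = 4 sin t, evaluates f there, and compares with the interior stationary point T1(0, 0).

```python
import numpy as np

f = lambda x, y: 2 * x**2 + y**2

# Boundary 4(x - 1)^2 + y^2 = 16, parameterized by the angle t.
t = np.linspace(0.0, 2.0 * np.pi, 100001)
x = 1.0 + 2.0 * np.cos(t)
y = 4.0 * np.sin(t)
vals = f(x, y)

i = np.argmax(vals)
print(vals[i], (x[i], y[i]))   # ~20, attained near (2, +/-2*sqrt(3)) ~ (2, +/-3.464)
print(vals.min())              # ~2, the boundary minimum near (-1, 0)
print(f(0.0, 0.0))             # 0, the value at the interior stationary point T1
```

The sampled values agree with the table above: the smallest value 0 at T1 and the largest value 20 at T4 and T5.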
SOLUTIONS 76 Solution to problem 3.16, page 19: The first octant is defined by inequalities x ≥ 0, y ≥ 0, and z ≥ 0. So, along with x + y + z = 5, we have four constraints. We’ll split the task into following subtasks: • Find candidates for extrema in the interior of the triangle T . That means extrema of g with respect to one constraint, namely x+ y+ z = 5. Additionally, only candidates with x > 0, y > 0, and z > 0 should be considered. • Find candidates for extrema on the edges of the triangle T . That means extrema of g with respect to two constraints, x + y + z = 5 and one of the planes x = 0, y = 0, or z = 0. Let’s start with the interior of T . The Lagrange function is L( x, y, z, λ) = xy 2 z 2 − λ( x + y + z − 5). Its stationary points are solutions of ∂L = y 2 z 2 − λ = 0, ∂x ∂L = 2 xyz 2 − λ = 0, ∂y ∂L = 2 xy 2 z − λ = 0, ∂z ∂L = −( x + y + z − 5) = 0. ∂λ From the first three equations we have y 2 z 2 = 2 xyz 2 = 2 xy 2 z. Since we only need to consider candidates with x > 0, y > 0, and z > 0, we can safely ignore the solutions with any of the x, y, or z equal to 0. Hence: y 2 z 2 = 2 xyz 2 ∴ y = 2 x and 2 xyz 2 = 2 xy 2 z ∴ z = y ∴ z = 2 x. Plugging this into the equation of the constraint, we get 5 x − 5 = 0 or x = 1. So, T 1(1 , 2 , 2) is our first candidate. We could define Lagrange functions with two Lagrange multipliers for each of the edges of the triagle, but, due to the simplicity of the additional constraints, we don’t have to. Simply plugging x = 0, y = 0, or z = 0 into g will simplify our task. A lot! • If x = 0, then g(0 , y, z) = 0. • If y = 0, then g( x, 0 , z) = 0. • If z = 0, then g( x, y, 0) = 0. That means that g is constant (and 0) along all edges of the triangle T . At T 1(1 , 2 , 2), however, the value of g is g(1 , 2 , 2) = 16. So, the largest value of g on T is 16. CHAPTER 4. SOLUTIONS 77 Solution to problem 3.17, page 19: The equation x 2 − xy + y 2 = 3 represents the constraint in this task. The expression we’re trying to maximize is ∥x∥ = p x 2 + y 2. Dealing with square roots can be cumbersome, and we’ll see that it’s preferable to maximize f ( x, y) = x 2 + y 2 instead. (N.b.: If v is a multivariate function, then every stationary point of v is also a stationary point of v 2. Why?) So we rewrite the constraint x 2 − xy + y 2 − 3 = 0 and form the corresponding Lagrange function L( x, y, λ) = x 2 + y 2 − λ( x 2 − xy + y 2 − 3). Its stationary points are solutions of ∂L = 2 x − λ(2 x − y) = 0, ∂x ∂L = 2 y − λ(2 y − x) = 0, ∂y ∂L = −( x 2 − xy + y 2 − 3) = 0. ∂λ This system is a bit trickier. We can assume λ , 0 (since λ = 0 would imply x = 0 and y = 0 from first two equations) and rewrite the first two equations as 1 2 x − y 1 2 y − x = and = . λ 2 x λ 2 y Therefore 2 x − y 2 y − x = ∴ 2 xy − 2 y 2 = 2 xy − 2 x 2 ∴ x 2 = y 2 or y = ± x. 2 x 2 y Now we plug both cases of y = ± x into the constraint (or the third equation). • In case y = − x we have 3 x 2 − 3 = 0 or x 1 , 2 = ±1. √ • In case y = x we have x 2 − 3 = 0 or x 3 , 4 = ± 3. Let’s summarize with a table √ √ √ √ T ( x, x) T 1(−1 , 1) T 2(1 , −1) T 3(− 3 , − 3) T 4( 3 , 3) . f ( x, y) 2 2 6 6 The largest value in the bottom row is 6, so the points T √ 3 and T 4 are farthest away from the origin (at a distance 6). CHAPTER 4. SOLUTIONS 78 Solution to problem 3.18, page 19: As usual we prefer to maximize the square of the distance from the origin rather than the distance itself. The Lagrange function is L( x, y, λ) = x 2 + y 2 − λ ( x 2 + y 2)2 − x 3 − y 3 . 
The system for the stationary points of L is: ∂L = 2 x − λ 4 x( x 2 + y 2) − 3 x 2 = 0, ∂x ∂L = 2 y − λ 4 y( x 2 + y 2) − 3 y 2 = 0, ∂y ∂L = − ( x 2 + y 2)2 − x 3 − y 3 = 0. ∂λ One obvious solution is ( x, y) = (0 , 0). We can discard this solution (since this is the origin itself) by crossing out one of the x and y factors in the first two equations and then express the λ variable in two ways 2 = 4( x 2 + y 2)−3 x, λ 2 = 4( x 2 + y 2)−3 y. λ By identifying these two equations we get x = y (all this assuming neither x nor y equals 0), and then the constraint gives us the equation 4 x 4 = 2 x 3 with x = 1 as the only non-zero solution. 2 However, it is possible that only one of the variables equals zero and we still get a valid solution. Assume x = 0 and y , 0. The constraint then reduces to the equation y 4 = y 3 which yields y = 1. We can then directly verify that ( x, y) = (0 , 1) (with λ = 2) solves our system and is therefore a stationary point. Similarly, assuming x , 0 and y = 0 leads to stationary point ( x, y) = (1 , 0). Out of the three non-trivial stationary points ( 1 , 1 ), (0 , 1) and (1 , 0) we see 2 2 that the latter two attain the maximal distance from the origin. Solution to problem 3.19, page 19: (a) The extremal values of f can be situated either in the (strict) interior of the disk or on its boundary and we consider both possibilities separately. First, any extremal point in the interior of the disc must be a stationary point for f . The system for the stationary points is ∂f = y +1 = 0, ∂x ∂f = x −1 = 0, ∂y CHAPTER 4. SOLUTIONS 79 with solution ( x, y) = (1 , −1). We can verify that this point lies in the interior of our disk but if we compute the Hesse matrix "0 1# Hf ( x,y) = 1 0 we notice this is a saddle point not an extremal point (the eigenvalues λ 1 , 2 = ±1 are of opposite signs). So the extremal points must be on the boundary of the disk, on the circle x 2 + y 2 = 2. The relevant Lagrange function for the problem is L( x, y, λ) = xy − y + x − 1 − λ( x 2 + y 2 − 2) and the system for the stationary points is ∂L = y + 1− 2 λx = 0, ∂x ∂L = x − 1 − 2 λy = 0, ∂y ∂L = −( x 2 + y 2 − 2) = 0. ∂λ From the first two equations we express 2 λ y + 1 x − 1 2 λ = and 2 λ = . x y This leads to y + 1 x − 1 = ∴ y 2 + y = x 2 − x. x y By adding x 2 to both sides we get the identity x 2 + y 2 + y = 2 x 2 − x and considering the constraint x 2 + y 2 = 2 this gives the expression y = 2 x 2 − x − 2. By plugging this expression for y back into the constraint equation we get an equation for x. x 2 + (2 x 2 − x − 2)2 = 2 ∴ 2 x 4 − 2 x 3 − 3 x 2 + 2 x + 1 = 0. This is a quartic equation which in general has quite complicated solutions. Fortunately, we can guess two integer roots x 1 = 1 and x 2 = −1. This means the polynomial is divisible by the factor x 2 − 1 and computing the polynomial quotient gives 2 x 4 − 2 x 3 − 3 x 2 + 2 x + 1 = 2 x 2 −2 x −1, x 2 − 1 which is the factor from which we can compute the remaining two roots √ 1 ± 3 x 3 , 4 = . 2 We summarize all four stationary points along with the function values in a table CHAPTER 4. SOLUTIONS 80 √ √ √ √ T ( x, y) T 3 3 3 3 1(−1 , 1) T 2(1 , −1) T 3( 1+ , −1+ ) T , −1− ) 2 2 4( 1−2 2 . f ( x, y) −4 0 1 1 2 2 The minimum value of f is therefore attained at T 1, while the maximal value is attained at both, T 3 and T 4. (b) Comparing to Exercise 3.19 (a), it is clear that any extremal points computed in (a) that happen to lie in the half-plane x ≥ 0 are also extremal points for f on the half disk. 
We notice that T 3 does indeed lie in the half-plane x ≥ 0 which means it is where the maximum of f on our half-disk. The point T 1 however does not lie in the half-disk. Since the minimum is not attained in the interior of the half disk nor the edge x 2 + y 2 = 2, it must be on the edge with x = 0. The Lagrangian function for f with the constraint x = 0 is L( x, y, λ) = xy − y + x − 1 − λx The system for the stationary points is ∂L = y + 1− λ = 0, ∂x ∂L = x − 1 = 0, ∂y ∂L = − x = 0 ∂λ This is a contradictory system which means f has no stationary points on the (whole) line x = 0 (this is also clear directly since f (0 , y) = − y). The last possibility is that the minimum is attained on the edge of both √ the circle x 2 + y 2 = 2 and line x = 0, i.e., the ‘corner’ points T √ √ 5(0 , − 2) √ and T 6(0 , 2). The values of f at these points are 2 − 1 and − 2 − 1 respectively, which means the minimum value of f on our half-disk is achieved at the T 6 ‘corner’ point of the half-disk. Solution to problem 3.20, page 19: In this task, the equation of the ellipsoid, rewritten as x 2 y 2 z 2 + + − 1 = 0, a 2 b 2 c 2 represents the constraint. An inscribed box with vertices on this ellipsoid has edges of length 2 x, 2 y, and 2 z. (a) The inscribed box has volume V ( x, y, z) = 2 x · 2 y · 2 z = 8 xyz and this is the function we’d like to maximize with respect to the constraint above. The corresponding Lagrange function is x 2 y 2 z 2 ! L( x, y, z, λ) = 8 xyz − λ + + − 1 . a 2 b 2 c 2 CHAPTER 4. SOLUTIONS 81 The system that determines the stationary points of L is: ∂L 2 λx = 8 yz − = 0, ∂x a 2 ∂L 2 λy = 8 xz − = 0, ∂y b 2 ∂L 2 λz = 8 xy − = 0, ∂z c 2 ∂L x 2 y 2 z 2 ! = − + + − 1 = 0. ∂λ a 2 b 2 c 2 Multiplying the first three equations with yz, xz, and xy respectively, and then rearranging we obtain 4 a 2 y 2 z 2 = λxyz, 4 b 2 x 2 z 2 = λxyz, 4 c 2 x 2 y 2 = λxyz, and, ignoring cases with x = 0, y = 0, or z = 0 (Why?), we deduce x 2 y 2 z 2 = = . a 2 b 2 c 2 Pluging this into the constraint we get x 2 y 2 z 2 3 · = 1 , 3 · = 1 , and 3 · = 1, a 2 b 2 c 2 i.e., x = ± a √ , y = ± b √ , and z = ± c √ . Ignoring the signs (and solutions 3 3 3 with any of the edge lengths equal to 0), we deduce that a b c 8 abc V √ , √ , √ = √ 3 3 3 3 3 is the largest possible volume of the inscribed rectangular box. (b) We just need to replace the function to maximize, in this subtask it is S( x, y, z) = 2 x · 2 y + 2 x · 2 z + 2 y · 2 z = 4( xy + xz + yz). As we will see, we are now presented with a slightly more formidable problem. Anyway, let’s see how long we can consider this in full generality. The Lagrange function becomes x 2 y 2 z 2 ! L( x, y, z, λ) = 4( xy + xz + yz) − λ + + − 1 . a 2 b 2 c 2 CHAPTER 4. SOLUTIONS 82 Its stationary points are solutions of ∂L 2 λx = 4( y + z) − = 0, ∂x a 2 ∂L 2 λy = 4( x + z) − = 0, ∂y b 2 ∂L 2 λz = 4( x + y) − = 0, ∂z c 2 ∂L x 2 y 2 z 2 ! = − + + − 1 = 0. ∂λ a 2 b 2 c 2 Now, we multiply the first three equations with a 2 , b 2 , and c 2 , respec-2 2 2 tively, to obtain 2 a 2( y + z) = λx, 2 b 2( x + z) = λy, 2 c 2( x + y) = λz. This is now an eigenvalue problem. Namely, setting  0 2 a 2 2 a 2  x     A =         2 b 2 0 2 b 2 and x =  y ,         2 c 2 2 c 2 0   z the above system becomes A x = λ x. The characteristic polynomial of A is det( A − λI) = − λ 3 + 4( a 2 b 2 + a 2 c 2 + b 2 c 2) λ + 16 a 2 b 2 c 2. 
This is a so-called depressed cubic (it has no λ 2 term) and while its zeros can be directly expressed via Cardano’s formulae, we rather consider an ellipsoid of revolution, i.e., simplified case c = b. The above characteristic polynomial then becomes det( A − λI) = − λ 3 + 4(2 a 2 b 2 + b 4) λ + 16 a 2 b 4. It is now (almost) obvious that λ 1 = −2 b 2 is one of its zeros. To obtain remaining two zeros we first divide det( A − λI) by λ + 2 b 2: − λ 3 + 4(2 a 2 b 2 + b 4) λ + 16 a 2 b 4 = − λ 2 + 2 b 2 λ + 8 a 2 b 2. λ + 2 b 2 The zeros of this quadratic polynomial are √ λ 2 , 3 = b 2 ± 8 a 2 b 2 + b 4. Now that the possible λ’s are known, let’s determine x = [ x, y, z]T. Those would be the eigenvectors of A subject to the ‘normalization constraint’ x 2 y 2 z 2 + + = 1. a 2 b 2 b 2 CHAPTER 4. SOLUTIONS 83 • Start with λ 1 = −2 b 2. We have (assuming a , b) 2 b 2 2 a 2 2 a 2 1 0 0     A − λ   ∼   1 I =     2 b 2 2 b 2 2 b 2 0 1 1 ,         2 b 2 2 b 2 2 b 2 0 0 0 hence x = z[0 , 1 , −1]T for arbitrary (free variable) z ∈ R. That won’t produce a solution of interest, since x = 0. √ • Next one is λ 2 = b 2 + 8 a 2 b 2 + b 4. We get √  b 2 − 8 a 2 b 2 + b 4 2 a 2 2 a 2   √    A − λ   2 I =    2 b 2 − b 2 − 8 a 2 b 2 + b 4 2 b 2   √       2 b 2 2 b 2 − b 2 − 8 a 2 b 2 + b 4  q  1 0 1 1 − 8 a 2 + 1   2   b 2  ∼     .   0 1 −1      0 0 0  Therefore, a general eigenvector corresponding to λ 2 is  q 1   8 a 2 + 1 − 1   2   b 2  x = z        1       1  for arbitrary z ∈ R. A little more work shows that z which respects the constraint x 2 + y 2 + z 2 = 1 is a 2 b 2 b 2 √ ab 8 a 2 b 2 + b 4 − b 2 z = , where γ = . p γ 2 b 2 + 2 a 2 2 b 2 We will not continue to express x in full, but it is already clear that x = [ γz, z, z]T. Hence (8 γ + 4) a 2 b 2 S( γz, z, z) = 4 γz 2 + 4 γz 2 + 4 z 2 = z 2(8 γ + 4) = γ 2 b 2 +2 a 2 √ = b 2 + 8 a 2 b 2 + b 4. √ • The third and final case is λ 3 = b 2 − 8 a 2 b 2 + b 4. Now we obtain √  b 2 + 8 a 2 b 2 + b 4 2 a 2 2 a 2   √    A − λ   3 I =    2 b 2 − b 2 + 8 a 2 b 2 + b 4 2 b 2   √       2 b 2 2 b 2 − b 2 + 8 a 2 b 2 + b 4  q  1 0 1 1 + 8 a 2 + 1   2   b 2  ∼     .   0 1 −1      0 0 0  Repeating the rest of analogous steps as for λ 2 one would obtain √ S( γz, z, z) = b 2 − 8 a 2 b 2 + b 4 √ which is clearly a negative number. (This time γ = − 8 a 2 b 2+ b 4+ b 2 .) 2 b 2 CHAPTER 4. SOLUTIONS 84 In conclusion, the largest attainable surface area for a rectangular box inscribed inside an ellipsoid of revolution is √ S = b 2 + 8 a 2 b 2 + b 4. (We will not explicitly express the side lengths of this rectangular box, but an enthusiastic reader may well do so.) A comment: Relying on intuition can sometimes fail. A sphere is a special case of an ellipsoid (the one with c = b = a) and intuition would say that the inscribed box with the largest volume or surface area is a cube with side length 2 a √ . (Which is correct!) Stretching the sphere 3 produces an ellipsoid and stretching the cube produces a rectangular box. Now, intuition might say that we simply need to stretch the cube which maximizes volume and surface area. The resulting volume and surface area for an ellipsoid of revolution ( c = b) would then be 8 ab 2 8 ab + 4 b 2 V = √ and S = . 3 3 3 While the formula for the maximal attainable volume is correct, the one for maximal attainable area is false! 
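Since the conclusion here is counter-intuitive, a crude numerical check is reassuring. The sketch below is not part of the printed solution; it picks arbitrary illustrative semi-axes a = 2, b = c = 1, grid-searches the objective 4(xy + xz + yz) used in this solution over first-octant vertices (x, y, z) of boxes inscribed in the ellipsoid of revolution, and compares the result with the closed form b^2 + √(8a^2 b^2 + b^4) and with the 'stretched cube' guess (8ab + 4b^2)/3.

```python
import numpy as np

a, b = 2.0, 1.0   # illustrative semi-axes; c = b (ellipsoid of revolution)

# First-octant vertex (x, y, z) on x^2/a^2 + y^2/b^2 + z^2/b^2 = 1,
# written in spherical angles.
theta, phi = np.meshgrid(np.linspace(0.0, np.pi / 2, 1001),
                         np.linspace(0.0, np.pi / 2, 1001))
x = a * np.sin(theta) * np.cos(phi)
y = b * np.sin(theta) * np.sin(phi)
z = b * np.cos(theta)

S = 4.0 * (x * y + x * z + y * z)   # the quantity maximized in the solution

print(S.max())                                   # numerical maximum, ~6.744
print(b**2 + np.sqrt(8 * a**2 * b**2 + b**4))    # closed form, 1 + sqrt(33) ~ 6.745
print((8 * a * b + 4 * b**2) / 3)                # 'stretched cube' guess, ~6.667
```

The grid maximum essentially matches the closed form and exceeds the 'stretched cube' value, in line with the remark above.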
Solution to problem 3.21, page 20: Denote the lengths of the edges of this box by a, b, and c. Since we assembled the box frame from a rod of length ℓ, we must have 4 a + 4 b + 4 c = ℓ. This is our constraint. (a) The volume of the box is V ( a, b, c) = abc, and this is precisely the function we must maximize, subject to the constraint above. Let’s rewrite the constraint as 4 a + 4 b + 4 c − ℓ = 0 and form the Lagrange function L( a, b, c, λ) = abc − λ(4 a + 4 b + 4 c − ℓ). The stationary point of L are solutions of the system ∂L = bc − 4 λ = 0, ∂a ∂L = ac − 4 λ = 0, ∂b ∂L = ab − 4 λ = 0, ∂c ∂L = −(4 a + 4 b + 4 c − ℓ) = 0. ∂λ We quickly gather from the first three equations that bc = ac = ab holds. While we could safely ignore the solutions with any of the a, b, or c equal to 0, let’s be strict (once) and find all solutions to this system. If a = 0, then we must have λ = 0. (Consider either the second or the third equation.) Now, from the first equation (and since λ = 0) we must have either b = 0 or c = 0. If b = 0, we have c = ℓ , if c = 0, we have b = ℓ . 4 4 CHAPTER 4. SOLUTIONS 85 There’s nothing special about starting the reasoning above with ‘if a = 0’. The conclusion has the same form: Any solution with one of the a, b, or c equal to 0, will force λ = 0, two of the a, b, and c equal to 0, and the remaining one equal to ℓ . These ( a, b, c, λ)-solutions are 4 ( ℓ , 0 , 0 , 0) , (0 , ℓ , 0 , 0) , and (0 , 0 , ℓ , 0). 4 4 4 They are also not the ones that interest us—the volume of the resulting (degenerate) box is 0, i.e., these solutions represent the minima. So let’s assume that none of the a, b, c are equal to 0. Then, from bc = ac = ab, we deduce that b = a and c = b, i.e., a = b = c. Plugging this into the constraint, we get a = ℓ and 12 ℓ ℓ ℓ ℓ 3 V , , = 12 12 12 12 is the largest possible volume of such a box frame. So the box frame with largest possible volume we can assemble from a rod of length ℓ is in fact a frame of a cube. (b) The additional restriction is in fact an additional constraint, ab = A must hold. We’ll rewrite this as ab − A = 0. The Lagrange function will now depend on two Lagrange multipliers, we’ll denote them by λ and µ: L( a, b, c, λ, µ) = abc − λ(4 a + 4 b + 4 c − ℓ) − µ( ab − A). We now solve the system: ∂L = bc − 4 λ − µb = 0, ∂a ∂L = ac − 4 λ − µa = 0, ∂b ∂L = ab − 4 λ = 0, ∂c ∂L = −(4 a + 4 b + 4 c − ℓ) = 0, ∂λ ∂L = −( ab − A) = 0. ∂µ From the third equation we have 4 λ = ab, and replacing 4 λ with ab in the first two equations we get bc − ab − µb = 0 ∴ b( c − a − µ) = 0, ac − ab − µa = 0 ∴ a( c − b − µ) = 0. We’re assuming that A > 0, and, since ab = A (from the fifth equation), a , 0 and b , 0. Hence, c − a − µ = 0, c − b − µ = 0, which implies a = b. From the second constraint we now get a 2 = A or √ a = A. (We ignore the negative solution, since the length cannot be CHAPTER 4. SOLUTIONS 86 negative.) To finish, we use the first constraint (the fourth equation in √ √ our ‘Lagrange system’) to get 8 A+4 c = ℓ or c = ℓ −2 A. The maximum 4 possible volume in this case is therefore √ √ √ √ ℓ ℓ V A, A, − 2 A = A · − 2 A . 4 4 The solution is, in a sense, expected. We obtained a box with base rect- √ angle a square with side length A, the remainder of the rod length is then used for four vertical edges. Solution to problem 3.22, page 20: This is similar to the previous exercise. Denote by a the side length of the base equilateral triangle and by h the prism’s height. 
The volume of such a prism is √ a 2 h 3 V ( a, h) = , 4 and its surface area is √ a 2 3 A( a, h) = + 3 ah. 2 Having ℓ meters of the rod available, means that 6 a + 3 h = ℓ. That will be our constraint. (a) We need to maximize V with respect to the constraint. The Lagrange function is √ a 2 h 3 L( a, h, λ) = − λ(6 a + 3 h − ℓ). 4 It stationary points are solutions of √ ∂L ah 3 = − 6 λ = 0, ∂a 2√ ∂L a 2 3 = − 3 λ = 0, ∂h 4 ∂L = −(6 a + 3 h − ℓ) = 0. ∂λ Let’s multiply the first equation with 2 and the second one with 4 to get √ ah 3 − 12 λ = 0, √ a 2 3 − 12 λ = 0. From this we get ah = a 2 ∴ ah − a 2 = 0 ∴ a( h − a) = 0. Ignoring the solutions with a = 0, we get h = a. (The reader is invited to consider the solutions we just ignored. What do they represent?) Hence, from the constraint, we get 9 a − ℓ = 0 or a = h = ℓ . That means that we 9 need to cut up the rod into 9 pieces of equal lenght, and the resulting maximal volume is √ ℓ ℓ ℓ 3 3 V , = . 9 9 4 · 93 CHAPTER 4. SOLUTIONS 87 (b) We now need to maximize A with respect to our constraint. The Lagrange function is √ a 2 3 L( a, h, λ) = + 3 ah − λ(6 a + 3 h − ℓ). 2 It stationary points are now solutions of ∂L √ = a 3 + 3 h − 6 λ = 0, ∂a ∂L = 3 a − 3 λ = 0, ∂b ∂L = −(6 a + 3 h − ℓ) = 0. ∂λ Note that λ = a from the second equation, and, plugging this into the first equation, we get √ √ a 3 + 3 h − 6 a = 0 ∴ 3 h = (6 − 3) a. Substituting that instead of 3 h into the third equation, we have √ √ ℓ 12 + 3 6 a + (6 − 3) a − ℓ = 0 ∴ a = √ = ℓ 0 . 09739 ℓ, 12 − 3 141 and √ √ √ (6 − 3) ℓ (6 − 3) ℓ 23 − 2 3 3 h = √ ∴ h = √ = ℓ 0 . 13855 ℓ. 12 − 3 3(12 − 3) 141 The reader is invited to evaluate the resulting maximal attainable area. Solution to problem 3.23, page 20: (a) We’d like to find extreme values of the function f (x) = aTx subject to the constraint ∥x∥ = d. We first rewrite the constraint as ∥x∥2 = d 2 ∴ ∥x∥2 − d 2 = 0 ∴ xTx − d 2 = 0 | {z } g(x) and set the Lagrange function as L(x , λ) = f (x) − λg(x) = aTx − λ(xTx − d 2). The stationary points of L are again the solutions of the system ∂L = aT − 2 λ xT = 0, ∂ x ∂L = −(xTx − d 2) = 0. ∂λ It follows from the first equation that a = 2 λ x, i.e., x and a are parallel. Let’s write this as x = α a and plug it into the second equation: d 2 d α 2aTa − d 2 = 0 ∴ α 2 = ∴ α = ± . ∥a∥2 ∥a∥ CHAPTER 4. SOLUTIONS 88 Hence, vectors x, at which extreme values of f are attained, are d x = α a = ± a, ∥a∥ and these extreme values are d d f ± a = aT ± a = ± d∥a∥. ∥a∥ ∥a∥ (b) Extreme value of the expression aTx (the dot product of a and x) on the sphere with equation ∥x∥ = d will be attained precisely when x is parallel to a. Solution to problem 3.24, page 20: (a) Let’s rewrite the constraint as ∥x∥2 = d 2 or xTx − d 2 = 0. The Lagrange function corresponding to our problem is L(x , λ) = xT A x − λ(xTx − d 2). Its stationary points are, as usual, the solutions to the system ∂L = xT( A + A T) − 2 λ xT = 0, ∂ x ∂L = −(xTx − d 2) = 0. ∂λ Note that the first (system of) equation(s) can be rewritten as A + A T x = λ x, 2 i.e., x is an eigenvector of A+ A T (and λ is the corresponding eigenvalue). 2 (Since A+ A T is symmetric, its eigenvalues and its eigenvectors are real, 2 i.e., λ ∈ n R and x ∈ R .) Another thing to notice is A + A T 1 1 xT x = xT A x + xT A Tx = · 2xT A x = xT A x = f (x). 2 2 2 So, for an eigenvector x of A+ A T with ∥x∥ = d, we have 2 A + A T f (x) = xT x = xT λ x = λd 2. 
2 Finally, the extreme values of f subject to ∥x∥ = d can be identified as λ max d 2 and λ min d 2, where λ max and λ min are the largest and the smallest eigenvalues of A+ A T , respectively. 2 (b) Now the constraint is xT A x = d 2, which we rewrite as xT A x − d 2 = 0. For the Lagrange function we have L(x , µ) = xTx − µ(xT A x − d 2). CHAPTER 4. SOLUTIONS 89 (The decision to denote the Lagrange multiplier by µ will become clear later.) Its stationary points are solutions of ∂L = 2xT − 2 µ xT A = 0, ∂ x ∂L = −(xT A x − d 2) = 0. ∂µ (We used the fact that A is symmetric when evaluating the first derivative.) Rewriting the first equation as 1 A x = x, µ we see that 1 must be an eigenvalue of A, with x the corresponding µ eigenvector. (Note that 1 makes sense as an eigenvalue of A, since A is µ a definite matrix.) From the second equation (the constraint), we now obtain 1 ! xT A x = d 2 ∴ xT x = d 2 ∴ ∥x∥2 = µd 2 or f (x) = µd 2. µ Hence, denoting the smallest and the largest eigenvalues of A by λ min and λ max, respectively, we have that d 2 is the largest value attained by f , and λ min d 2 is the smallest value attained by f . λ max (We used that λ = 1 for an eigenvalue λ of A, and also the fact that the µ eigenvalues of A are positive.) Solution to problem 3.25, page 20: (a) The inequality ∥x − p∥ ≤ d determines the closed n R ball of radius d centered at p. We split the solution into two subtasks: extrema in the interior (deter- ∥ mined by ∥x − p∥ < d) and extrema on the boundary x − p∥ ≤ d (the sphere determined by ∥x − p∥ = d). • The interior: This is easy, the only stationary point of f is x = 0 and this is a candidate if and only if ∥p∥ < d, i.e., 0 is actually contained in the interior. • The boundary: Write the constraint as ∥x − p∥ = d ∴ ∥x − p∥2 − d 2 = 0 ∴ (x − p)T(x − p) − d 2 = 0, and let’s set up the corresponding Lagrange function L(x , λ) = xTx − λ (x − p)T(x − p) − d 2 . CHAPTER 4. SOLUTIONS 90 Now, ∂L = 2xT − λ(2xT − 2pT) = 0T, ∂ x ∂L = − ∥x − p∥2 − d 2 = 0. ∂λ The first equation implies that (2 − 2 λ)x = −2 λ p, i.e., x and p must be parallel. Let’s write this as x = α p for some α ∈ R and plug this into the second equation d d ∥ α p−p∥ = d ∴ ∥( α−1)p∥ = d ∴ | α−1| = ∴ α = 1± . ∥p∥ ∥p∥ Hence, d ! x = 1 ± p, ∥p∥ which should be the expected solution. Finally, if ∥p∥ < d, then the minimal value of f is attained in the interior at 0, and is f (0) = 0. In case ∥p∥ ≥ d, the minimal value of f is attained on the boaundary at 1 − d p, and is equal to ∥p∥2 + d 2 − 2 d∥p∥. (As a ∥p∥ sanity check, notice that in the boundary case ∥p∥ = d that last expression is 0, as it should be.) (b) The equation A x = b determines an affine subspace n R of n R . The appropriate Lagrange function now is L(x , λ) = xTx − λ T( A x − b). (Note that λ is now a column vector, with each A x = b component corresponding to one of the boundary conditions!) Its stationary points are solutions of ∂L = 2xT − λ T A = 0T, ∂ x ∂L = −( A x− b) = 0. ∂λ From the first equation we have x = 1 A T λ. We plug this into the second 2 equation to get 1 AA T λ = b. 2 If we assume that A is of full rank (so that AA T is invertible), we can express λ = 2 AA T−1 b, and therefore 1 x = A T λ = A T AA T−1 b. 2 (In general, i.e., for non-full rank matrices A, the solution is x = A+b, where A+ is the Moore–Penrose inverse of A.) CHAPTER 4. SOLUTIONS 91 Finally, the minimal value of f on that affine subspace is f A T AA T−1 b = bT AA T−1 AA T AA T−1 b = bT AA T−1 b for the case when A is of full rank. 
(That −1 superscript needs to be replaced by a + superscript for A which is not of full rank.) (c) Let’s consider the boundary case first, i.e., we min-n R imize f (x) = ∥x∥2 with respect to ∥x − p∥ = d and A x = b. The Lagrange function is L(x , µ, λ) = xTx − µ(∥x − p∥2 − d 2) − λ T( A x − b). ∥x − p∥ ≤ d A x = b Its stationary points are solutions of the system ∂L = 2xT − 2 µ(x − p)T − λ T A = 0T, ∂ x ∂L = −(∥x − p∥2 − d 2) = 0, ∂µ ∂L = −( A x − b) = 0. ∂λ Let’s start: From the first equation we get 1 (2 − 2 µ)x = A T λ − 2 µ p ∴ x = A T λ − 2 µ p . 2 − 2 µ A side note: This last expression is not well defined if µ = 1. The case µ = 1 implies that the first equation of the ‘Lagrange system’ does not depend on x. (The first equation simplifies to 2p− A T λ = 0.) That means that x is only determined by the second and the third equation which describe a sphere in the affine subspace given by A x = b. If that whole sphere consists of stationary points of f (x) = ∥x∥2, i.e., f is constant on that sphere, then the origin of that sphere is the orthogonal projection of 0 onto the affine subspace A x = b. (Namely, the origin of the sphere is given by p = A+b = A T AA T−1 b.) It is clear that in this case the minimal value of f is attained at that origin, i.e., in the interior of the ball bounded by that sphere and not on the boundary sphere itself. So, assuming µ , 1 we plug x from above into the third equation to get A A T λ − 2 µ p = (2 − 2 µ)b ∴ AA T λ = (2 − 2 µ)b + 2 µA p, hence λ = AA T−1 ((2 − 2 µ)b + 2 µA p) . CHAPTER 4. SOLUTIONS 92 Now we plug this into the expression for x and obtain 1 x = A T AA T−1 ((2 − 2 µ)b + 2 µA p) − 2 µ p 2 − 2 µ 1 = (2 − 2 µ) A T AA T−1 b + 2 µ( A T A p − p) 2 − 2 µ µ = A T AA T−1 b + ( A T A − I)p 1 − µ = A T AA T−1 b + α( A T A − I)p, where we introduced α = µ in the last line. Finally, we use the con-1− µ straint ∥x − p∥2 = d 2. Plugging the above expression for x into it and rearranging we obtain a quadratic equation for α, namely α 2 pT( A T A − I)2p + 2 α pT A Tb − pT A T( AA T)−1b − pT( A T A − I)p + + bT( AA T)−1b − 2pT A T( AA T)−1b + pTp − d 2 = 0 Solving this equation for α and plugging the solution into our expression for x, we finally obtain the points, at which extreme values of f (x) = xTx are attained. Conveniently, we leave that task to the reader. A final case to consider: f attains its minimal value at a point in the interior of the ball ∥x−p∥ ≤ d within the affine subspace A x = b. This can only happen if the orthogonal projection of 0 onto A x = b is contained in that ball, i.e., if ∥ A+b − p∥ < d. In that case, the minimal value is the same as in part (b) of this task. Document Outline Linear algebra A recollection of basic concepts Schur, Frobenius, Eckart–Young Vector spaces and linear maps Vector spaces Linear maps Functions of several variables Multiple integrals Local extrema of real multivariate functions Constrained extrema Solutions