Monday, December 15, 2008

INTRODUCTION TO INEQUALITIES


Abstract. This is a somewhat modified version of the notes I had prepared for a lecture on inequalities

that formed part of a training camp organized by the Association of Mathematics Teachers of India for

preparation for the Indian National Mathematical Olympiad (INMO) for students from Tamil Nadu.

1. Basic idea of inequalities

1.1. What we need to prove. An “inequation” is an expression of the form:

F _ 0

where F is an expression in terms of certain variables. An “inequality”’ is an inequation that is

satisfied for all values of the variables (within a certain range).

For instance:

x2 x + 1 _ 0

and

x2 x 1 _ 0

are both inequations. Among these, the first inequation is true for all real x, while the second

inequation is true for all values of x within a certain range.

Thus, when we talk of an inequality, we have the following in mind:

The underlying inequation

The range of values over which the inequality is true

A strict inequation is an inequation of the form:

F > 0

where F is an expression in terms of the variables.

Given any inequation F _ 0 we can consider the corresponding strict inequation F > 0.

Thus, when studying an inequality, we are interested in:

The underlying inequation

The range of values over which the inequality is true

The values for which exact equality holds

Some other points to note:

Any inequation of the form F _ G where F and G are both expressions can be written in the

standard form as F G _ 0. The original inequation is true for precisely those values for which

the standard form is true. The equality conditions are also the same.

An inequation of the form F _ G can be expressed as GF _ 0. Again, the original inequation

is true for precisely those values for which the standard form is true. The equality conditions

are also the same.

c Vipul Naik, B.Sc. (Hons) Math and C.S., Chennai Mathematical Institute.

1

1.2. No square is negative. This basic inequality states:

x2 _ 0

The range is all x 2 R and equality holds iff x = 0.

This can be generalized to something of the form:

(f(x1, x2, . . . , xn))2 + (g(x1, x2, . . . , xn))2 _ 0

The range is all x 2 R and equality holds iff f(x1, x2, . . . , xn) = g(x1, x2, . . . , xn) = 0.

Problem 1. Prove that x4 x2y2 + y4 _ 0 for all real x and y, equality holding iff x = y = 0.

Proof. We use:

x4 x2y2 + y4 = (x2 y2)2 + (xy)2

Thus, (x2 y2) plays th role of f above and xy plays the role of g

Clearly then, the left-hand-side is nonnegative, and is 0 if and only if x2 = y2 and xy = 0, thus forcing

x = y = 0. _

We can extend the idea to sums of more than two squares:

Problem 2. Prove that a2 + b2 + c2 + ab + bc + ca _ 0 with equality holding only if a = b = c = 0.

Proof. The left-hand-side can be expressed as 1/2(a2 + b2 + c2 + (a + b + c)2). So it is nonnegative and

can be zero only if a = b = c = 0.

Alternatively, the left hand side can also be written as 1/2((a+b)2 +(b+c)2 +(c+a)2) and is hence

nonnegative, taking the value 0 if and only if a = b = c = 0 _

Another problem (for which I’m not writing the solution here):

Problem 3. Prove that a2 + b2 + c2 (ab + bc + ca) _ 0 with equality holding only if a = b = c.

It turns out that one of the solution techniques for the previous problem can be applied to this one.

1.3. Manipulating about the inequality symbol. The following results are typically used for manipulating

inequalities:

We can add two inequalities. The greater side gets added to the greater side, the smaller side

to the smaller side. If either inequality is strict, the resultant inequality is again strict. More

generally, the set of values for which the resultant inequality becomes equality is the intersection

of the corresponding sets for each inequality.

We can multiply both sides of an inequality by a positive number. In general, however, we cannot

multiply two inequalities.

2. Mean inequalities

2.1. Definition of means. A mean is a good notion of average for a collection of numbers. A mean of

n numbers is thus typically a function from n-tuples of reals to reals, such that:

If all the members of the tuple are equal, the mean should be equal to all of them. That is, if

a = a1 = a2 = . . . an then the mean of a1, a2, . . . , an is a.

The mean is a symmetric function of all the elements of the tuple, that is, if the elements are

permuted, the value of the mean remains unchanged. That is, the mean of a1, a2, . . . , an is the

same as the mean of a_(1), a_(2), . . . , a_(n).

2

The mean of a collection of positive numbers should be between the smallest number and the

largest number. That is, if a1 _ a2 _ . . . _ an, the mean lies between a1 and an.

The mean is an increasing function in each of the arguments. That is, if ai _ a0i, then the mean of

a1, a2, . . . , ai1, ai, ai+1, . . . , an is less than or equal to the mean of a1, a2, . . . , ai1, a0i, ai+1, . . . , an.

We now define some typical notions of mean:

Definition. (1) The arithmetic mean(defined) of n real numbers a1, a2, a3, . . . , an is defined as:

a1 + a2 + . . . an

n

The arithmetic mean is a well-defined notion for any collection of real numbers (positive, negative

or zero).

(2) The geometric mean(defined) of n positive real numbers a1, a2, a3, . . . , an is defined as

(a1a2 . . . an)1/n

The geometric mean is defined only for positive numbers.

(3) The quadratic mean(defined) or the root-mean-square of n real numbers a1, a2, a3, . . . , an is

defined as:

r

a21

+ a22

+ . . . + a2

n

n

(4) The harmonic mean(defined) of n nonzero real numbers a1, a2, a3, . . . , an is defined as:

a1

1 + a1

2 + . . . + a1

n

n

1

For two positive reals a and b, these boil down to the formulas:

Name of the mean Value

Arithmetic mean (a+b)

2

Geometric mean pab

Quadratic mean

q

a2+b2

2

Harmonic mean 2ab

a+b

2.2. Inequalities for two variables.

Claim. For positive reals a and b, Q.M. _ A.M. _ G.M. _ H.M.

Proof. We prove Q.M. _ A.M. The remaining proofs follow along similar lines:

What we would like to show is that, for all reals a and b:

r

a2 + b2

2 _

a + b

2

Since the left side is nonnegative, it suffices to show that the square of the left side is greater than or

equal to the square of the right side. That is, we need to show that:

a2 + b2

2 _

(a + b)2

4

But the latter rearranges to (a b)2 _ 0. This tells us that the inequality is valid for all real a and b

with equality holding iff a = b. _

3

Let’s look at the pattern. The Q.M. is essentially obtained by taking the arithmetic mean of squares

and then taking squareroot. The A.M. is obtained by taking the arithmetic mean of first powers and

then taking the first root. The H.M. is obtained by taking the arithmetic mean of inverses and then

taking the inverse. This suggests a general definition:

Mr(a, b) =

_

ar + br

2

_1/r

Then the quadratic mean is M2, the arithmetic mean is M1, and the harmonic mean is M1.

By this definition, M0 does not make sense. But it turns out that, through a suitable limit argument,

we can take M0 as the geometric mean. In that case, we have:

M2 _ M1 _ M0 _ M1

We also know that:

2 _ 1 _ 0 _ −1

Does this suggest something?

2.3. The mean inequalities: an explanation. Let a and b be positive reals. What can we say about

the behaviour of Mr(a, b) as r varies from −1 to 1. It turns out that as r ! −1, Mr approaches

min{a, b}, and as r ! 1, Mr ! max{a, b}. Thus, as r steadily increases, Mr(a, b) steadily goes from

the minimum to the maximum.

The explanation for this can be sought by viewing the r as a kind of weighting of a and b. The greater

the value of r, the greater the dominance of the bigger term, and hence, the greater the mean is to the

bigger term.

2.4. The mean inequalities for many variables. The same phenomena which we observe for two

variables also generalize to more than two variables. We define:

Mr(a1, a2, . . . , an) =

_

ar

1 + ar

2 + . . . ar

n

n

_1/r

Again, as r ! −1, Mr approaches the minimum of the ans, and as r ! 1, Mr approaches the

maximum of the ans.

3. Cauchy-Schwarz inequality

3.1. Statement. Let (a1, a2, . . . , an) and (b1, b2, . . . , bn) be two n-tuples of real numbers. Then:

(

X

i

a2i

)(

X

i

b2i

) _ (

X

i

(aibi))2

With equality holding if and only if one of the tuples is zero or if bi = _ai for some fixed _ independent

of i (that is, the tuple of bis is a scalar multiple of the tuple of ais).

3.2. Vector interpretation. The vector interpretation of Cauchy Schwarz inequality looks at both

a = (a1,ma2, . . . , an) and b = (b1, b2, . . . , bn) as vectors in Rn. Then, the left-hand-side is:

|a|2 |b|2

where |a| denotes the magnitude or length of the vector a

The right-hand-side is the square of the dot product of the vectors, which is the same as:

(a.b)2 = |a|2 |b|2 cos2 _

where _ is the angle between the vectors. Since cos2 _ _ 1 and quality holds if and only if a and b are

collinear, we get a geometric proof of Cauchy-Schwarz inequality.

4

3.3. A trigonometric problem. Consider the following problem:

Problem 4. Maximize

a cos _ + b sin _

as a function of _ where a and b are fixed reals (and not both zero).

The idea is to view this as a dot product of vectors (a, b) and (cos _, sin _). We have:

(a2 + b2)(cos2 _ + sin2 _) _ (a cos _ + b sin _)2

Since cos2 _ + sin2 _ = 1, we obtain:

(a cos _ + b sin _) _

p

a2 + b2

A necessary and sufficient condition for the magnitude of the left-hand side to be pa2 + b2 is that

a/ cos _ = b/ sin _, giving tan _ = b/a. Among the two possible values for the pair (cos _, sin _) we must

pick the one making a cos _ + b sin _ positive.

3.4. A geometric problem. Consider the following problem:

Problem 5. Let A and B be two points in a plane at distance 1. Find the maximum length of a path

from A to B, comprising at most n line segments, with the property that at every stage, the distance

from B is reducing.

The answer is pn.

Proof. The idea of the proof is to use induction on n. Let f(n) denote the maximum value for a given n.

We observe that any such optimal path is memoryless in the following sense:

Suppose is a path from A to B comprising at most n line segments, and suppose that the first line

segment of ends at a point P. Now, the part from P to B must be composed of (n 1) line segments

with the property that at every stage, the distance from B is reducing.

Now, whatever path we choose, we could replace it by a path of maximum length from P to B

comprising (n 1) line segments and with the property that distance from B is reducing. Since the

original thing was longest, we conclude that the part from P to B must also be the longest one.

Now what is the longest possible path of (n 1) line segments from P to B? Since lengths scale, it

is the length PB times the value f(n 1). We thus get:

length of = AP + PBf(n 1)

Thus the maximum of the possible lengths of is the maximum over all P of the above expression.

Now, from the fact that along the path AP, the distance from P is steadily reducing, we obtain that

the angle \APB is either obtuse or right. Thus, in particular, for any given length AP, we have:

PB _

p

1 AP2

If equality does not hold, we could replace P by another point Q such that AQ = AP and such that

\AQB = _/2. Then, QB would be greater than PB, and hence, the length of the longest path would

increase. Hence, we conclude that equality does indeed hold for the longest path, viz \APB = _/2.

Let _ be \BAP. Then AP = cos _ and PB = sin _. We thus get:

length of = max

_

cos _ + f(n 1) sin _

Thus, applying the result of the previous problem:

f(n) =

p

1 + (f(n 1))2

Since f(1) = 1 (clearly) we get f(n) = pn. _

5

4. Rearrangement and Chebyshev inequality

4.1. Rearrangement inequality: statement. Let (a1, a2, . . . , an) and (b1, b2, . . . , bn) be two n-tuples

of real numbers such that a1 _ a2 _ . . . _ an and b1 _ b2 _ . . . bn. Let _ be a permutation of the

numbers 1, 2, . . . , n. Then:

X

i

aibi _

X

i

aib_(i)

In other words, the sum of pairwise products is maximum if we pair the largest with the largest, the

second largest with the second largest, and so on.

Equality holds if and only if, for each i, ai = a_(i) or bi = b_(i).

Further:

X

i

aib_(i) _

X

i

aibn+1i

In other words, the sum of pairwise products is minimum if we pair the largest with the smallest, the

second largest with the second smallest, and so on.

4.2. Idea behind the inequality. Think of it as a resource allocation problem. For instance, suppose

a thief has 3 bags and 3 kinds of coins (gold, silver, copper) to pack in them, and she must pack a

different kind of coin in each bag. Assume further that the coins are available in unlimited quantities.

Then, in order to maximize her loot, she will put the gold coins in the biggest bag, the silver coins in

the second biggest bag, and the copper coins in the third biggest bag.

The idea is: send the most to the best. Such an allocation principle is often called a greedy allocation

principle.

The Rearrangement inequality is best proved for two elements, and then extended by induction. Let

a1 _ a2 and b1 _ b2. Then we have:

(a1 a2)(b1 b2) _ 0

Manipulating this gives us:

a1b1 + a2b2 _ a1b2 + a2b1

The rearrangement inequality thus illustrates the general statement the principles of optimization and

equality are often at crossroads.

To use this to prove the result globally, we start with the expression

P

i aib_(i) and locate indices i, j

for which i <>but _(i) > _(j). We then change the permutation to one sending i to _(j) and j to _(i)

(and having the same effect as _ on the others). This local change increases the value of the expression

and hence it is clearly not the optimum value.

Note here that equality holds only if ai = aj or b_(i) = b_(j).

4.3. An application of rearrangement. Consider the following problem I had mentioned earlier:

Prove that a2 + b2 + c2 (ab + bc + ca) _ 0 with equality holding only if a = b = c.

This problem can also be solved using the rearrangement inequality. First observe that since the

expression is symmetric in a, b and c, we can assume without loss of generality that a _ b _ c.

Consider the triple (a, b, c). This is an ordered triple with the property that the elements are in nonincreasing

order. Then (b, c, a) is a permutation of this expression. Thus, by rearrangement inequality:

aa + bb + cc _ ab + bc + ca

Which gives us what we want.

Also note that in this case, equality holds if and only if a = b = c.

4.4. Chebyshev inequality. Chebyshev inequality says that sending the most to the best is better

than giving the average to the average. More formally, if (a1, a2, . . . , an) and (b1, b2, . . . , bn) are two

n-tuples of decreasing reals:

X

i

aibi _

P

i ai

P

i bi

n

Where equality holds iff either all the ais are equal or all the bis are equal.

6

4.5. Fundamental difference between Chebyshev and Cauchy-Schwarz. Both the Chebyshev

and the Cauchy-Schwarz inequalities are similar in the following sense:

They are both true for all reals

They both provide bounds of

P

i aibi

But they are different in the following ways:

In Chebyshev, it is important to order the ais and bis in descending order, whereas Cauchy-

Schwarz is applicable for any ordering

Chebyshev gives a bound in terms of

P

i ai and

P

i bi while Cauchy-Schwarz gives a bound in

terms of the sums of their squares.

Chebyshev provides a lower bound on

P

i aibi while Cauchy-Schwarz provides an upper bound

The equality case is different in both. In Chebyshev, equality holds if all the elements in one of

the tuples are equal. In Cauchy-Schwarz, equality holds if the two tuples are scalar multiples of

one another.

A word of caution, though, when deciding whether to apply Chebyshev or Cauchy-Schwarz. Just

because the inequality seems to require a lower bound on

P

i FiGi, does not mean that Chebyshev is the

one to be used. In fact, we could still use Cauchy-Schwarz by taking ai = FiGi and bi to be 1/Fi.

5. Nesbitt’s inequality

5.1. Statement of the inequality.

Problem 6 (Nesbitt’s inequality). For positive a, b and c, prove that:

a

b + c

+ b

c + a

+ c

a + b _

3

2

with equality holding if and only if a = b = c.

5.2. Applying Cauchy-Schwarz (direct application fails). To apply Cauchy-Schwarz we need to

put the terms a

b+c and its analogues on the left side, which means we should view each of them as a

square. Their squareroots are

q

a

b+c and its analogues. Thus, one tuple is:

r

a

b + c

,

r

b

c + a

,

r

c

a + b

!

We would like the other tuple to be ��pb + c,pc + a,pa + b something that cancels the denominator. A natural choice is

_

. Unfortunately, this fails to yield the answer, because the expression that we

get is upper-bounded, rather than lower-bounded, in the case of equality.

5.3. Applying Chebyshev. Consider the tuples (a, b, c) and ((b + c)1, (c + a)1, (a + b)1). We first

need to determine whether they are arranged in the same order. Assume without loss of generality that

a _ b _ c. Then b + c _ c + a _ a + b, and taking inverses, we obtain that the second tuple also has its

coordinates in descending order.

We are thus in a position to apply Chebyshev’s and obtain that the give expression is at least:

(a + b + c)((b + c)1 + (c + a)1 + (a + b)1)

3

Now using A.M.-H.M. inequality for the quantities (b + c), (c + a) and (a + b), we get the required

result.

5.4. A short proof. Another way of proving the result is to add and subtract 3, thus writing it as:

(a + b + c)

_

1

b + c

+

1

c + a

+

1

a + b

_

And now apply the A.M.-H.M. inequality.

7

6. A past IMO problem

6.1. The problem statement.

Problem 7 (IMO 1995). Prove that if a, b and c are positive reals such that abc = 1, then:

1

a3(b + c)

+

1

b3(c + a)

+

1

c3(a + b) _

3

2

The first trick is to put x = 1/a, y = 1/b and z = 1/c. The left-hand side becomes:

x2

y + z

+ y2

z + x

+ z2

x + y

6.2. Cauchy-Schwarz. After this point, the first possibility to consider is Cauchy-Schwarz. Since we

want to lower-bound the sum here, we must view x2

y+z and its analogues as squares of a tuple. The other

tuple is obtained by cancelling denominators from third tuple. We thus have tuples:

_

x

py + z

,

y

pz + x

,

z

px + y

_

and

��py + z,pz + x,px + y

_

We apply Cauchy-Schwarz to these tuples, and then use A.M.-G.M. inequality and the fact that

xyz = 1.

If we keep track of the inequality constraints at each step, we obtain that equality holds if and only

if x = y = z = 1, and hence a = b = c = 1.

8

Index

arithmetic mean, 2

Cauchy-Schwarz inequality, 3

Chebyshev inequality, 5

geometric mean, 2

harmonic mean, 2

mean

arithmetic, 2

geometric, 2

harmonic, 2

quadratic, 2

quadratic mean, 2

rearrangement inequality, 4

9