When we last saw our intrepid hero, he was differentiating functions. Here's a review of Chapter 2: A function is a relationship between pairs of numbers, you can draw a graph, and when you look at the graph there's only one Y value for any given X value. A continuous function doesn't have any gaps. A continuously differentiable function doesn't have any sharp points. The derivative of $X^n$ is $nX^{n-1}$. That's everything important from Chapter 2.
Here are some examples of differentiating functions:
$Y = X^5$ | $dY/dX = 5X^4$ |
$Y = 7X^{11}$ | $dY/dX = 77X^{10}$ |
$Y = 3X^{100}$ | $dY/dX = 300X^{99}$ |
$Y = 11X^7$ | $dY/dX = 77X^6$ |
$Y = 6$ | $dY/dX = 0$ |
In this chapter, we're going to build on this result.
First, derivatives are linear. This means d/dx (A + B) = dA/dx + dB/dx. This is sometimes called the distributive law, but we just call it linear. This rule means if a function has several parts added together, you can differentiate them one part at a time, and add the results. Example:
$$Y = X^5 + 7X^{11} + 3X^{100} + 11X^7 + 6$$
$$\frac{dY}{dX} = 5X^4 + 77X^{10} + 300X^{99} + 77X^6 + 0$$
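The term-by-term rule is easy to try out in code. Here's a minimal sketch in Python, assuming we store a polynomial as a dictionary that maps each exponent to its coefficient; the helper name `differentiate` is made up for this illustration.

```python
def differentiate(poly):
    """Differentiate a polynomial stored as {exponent: coefficient}."""
    result = {}
    for exponent, coefficient in poly.items():
        if exponent == 0:
            continue  # the derivative of a constant is 0
        # power rule, one term at a time: d/dX (c * X^n) = c * n * X^(n-1)
        result[exponent - 1] = coefficient * exponent
    return result


# Y = X^5 + 7X^11 + 3X^100 + 11X^7 + 6
y = {5: 1, 11: 7, 100: 3, 7: 11, 0: 6}
dy = differentiate(y)
print(dy)  # {4: 5, 10: 77, 99: 300, 6: 77}
```

Each term is handled on its own and the results are simply collected, which is all that "linear" is asking for here.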
Here's a very clever little function, it's very proud of itself:
$$Y = 1 + X + \frac{X^2}{2} + \frac{X^3}{6} + \frac{X^4}{24} + \frac{X^5}{120}$$
Let's differentiate it. Remember, we can do this term-by-term, it's very easy. The derivative of 1 is zero; the derivative of X is 1, and so on. Here's the result:
$$\frac{dY}{dX} = 1 + X + \frac{X^2}{2} + \frac{X^3}{6} + \frac{X^4}{24}$$
This is interesting: the function looks just like it did before we started, except the last term is missing. This is not a coincidence: I carefully picked this function so that it would have this property. Here's how I picked it: every time you differentiate $X^n$, you multiply by n. So, I made the denominator under $X^2$ be 2*1; under $X^3$ I put 3*2*1; under $X^4$ I put 4*3*2*1; and under $X^5$ I put 5*4*3*2*1.
This idea, where you multiply together all the positive integers less than or equal to N, comes up a lot. There's a name for this function, and a special symbol for it. The name is factorial, and the symbol for N factorial is N!. In mathematics, an exclamation point does not mean that the number is excited, nor does it mean that the author is excited (in fact, mathematicians like to avoid strong emotions). It means N * (N-1) * (N-2) * . . . * 2 * 1. The symbol ". . ." is called an "ellipsis", and is read out loud as "dot dot dot" or as "and so on." ". . ." means just keep going.
0! is 1, by definition. Negative numbers don't have factorials. Neither do fractional numbers. Only zero and the positive integers have factorials. Now, just as you might guess, mathematicians worried about the fractional numbers and negative numbers feeling left out. So, they made up a different function that takes care of them - it's called the gamma function. We're not going to worry about the gamma function in this book.
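If you'd like to see the factorial as a recipe rather than a definition, here's a small sketch in Python. The function `factorial` below is written out by hand for illustration; Python's standard library already provides `math.factorial`, which we use only to compare.

```python
import math


def factorial(n):
    """Return n! for a non-negative integer n (hand-written for illustration)."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("only zero and the positive integers have factorials")
    result = 1  # multiplying nothing at all leaves 1, which matches 0! = 1
    for k in range(2, n + 1):
        result *= k  # N * (N-1) * ... * 2 * 1, built from the small end
    return result


for n in range(6):
    print(n, factorial(n), math.factorial(n))
```

The loop never runs for n = 0 or n = 1, so both come out as 1.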
If we differentiate our cute little function again and again, we get:
$$Y = 1 + X + \frac{X^2}{2} + \frac{X^3}{6} + \frac{X^4}{24} + \frac{X^5}{120}$$
$$\frac{dY}{dX} = 1 + X + \frac{X^2}{2} + \frac{X^3}{6} + \frac{X^4}{24}$$
$$\frac{d}{dX}\frac{dY}{dX} = 1 + X + \frac{X^2}{2} + \frac{X^3}{6}$$
$$\frac{d}{dX}\frac{d}{dX}\frac{dY}{dX} = 1 + X + \frac{X^2}{2}$$
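You can watch the last term fall off with a few lines of Python. This is only a sketch: it assumes we keep the coefficients in a list, where position n holds the coefficient of $X^n$, and it uses exact fractions so nothing gets rounded. The helper name `differentiate` is made up for this example.

```python
from fractions import Fraction
from math import factorial


def differentiate(coeffs):
    """coeffs[n] is the coefficient of X^n; return the derivative's coefficients."""
    # power rule on every term, then drop the slot that used to hold the constant
    return [n * c for n, c in enumerate(coeffs)][1:]


# Y = 1 + X + X^2/2! + X^3/3! + X^4/4! + X^5/5!
y = [Fraction(1, factorial(n)) for n in range(6)]

current = y
for step in range(4):
    print(step, [str(c) for c in current])
    current = differentiate(current)
```

Each printed list is the previous one with its final entry removed, which is exactly the pattern in the equations above.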
Now, this suggests something to us: suppose our original function just kept going on and on, it never stopped.
$$Y = 1 + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \frac{X^4}{4!} + \frac{X^5}{5!} + \cdots$$
This function is what is known as an infinite series: it's an infinite number of terms, all added together. So, one of the terms in this series is
$$\cdots + \frac{X^{9427}}{9427!} + \cdots$$
And, we wouldn't want anyone to feel left out, so there's a term for every non-negative integer. When we differentiate this function, we get back the same function. That's because when we differentiate it, the last term on the list disappears, but on this list there is no last term. This magic function is its own derivative. This is so neat, so special, that we have a special name for this function: we call it the exponential function, and we write it $e^x$, which we read out loud as "e to the x." So,
$$Y = e^X = 1 + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \frac{X^4}{4!} + \frac{X^5}{5!} + \cdots$$
$$\frac{dY}{dX} = e^X$$
This is the definition of $e^x$: the infinite series is $e^x$. $e^x$ is just shorthand for the infinite series.
Using the infinite series, we can evaluate this function $e^x$ for any value of x we choose. Since anything to the first power is the thing itself, it's interesting to try to evaluate $e^1$. The answer is an irrational number, that is, a number which cannot be expressed as a fraction and whose digits go on forever without settling into a repeating pattern. Here's a little piece of this number e, which pretty much everyone has memorized: 2.718281828. The "1828" repeats exactly once, so you get 10 digits for the price of 6, it's kind of like a discount. After the second "1828", the digits change and keep changing forever. In fact, mathematicians suspect that any string of digits you write down, no matter what the digits are, no matter how long, appears in e somewhere just as you wrote it. Nobody has managed to prove this, and I have no idea how you would prove such a thing.
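Here's a rough sketch of evaluating the series on a computer, in Python. It assumes that adding up the first 20 terms is enough for ordinary-sized x (the cutoff `terms=20` is an arbitrary choice for this example, and `exp_series` is a made-up name), and it compares the result with the standard library's `math.e` and `math.exp`.

```python
import math


def exp_series(x, terms=20):
    """Add up the first `terms` terms of 1 + x + x^2/2! + x^3/3! + ..."""
    total = 0.0
    term = 1.0  # the first term, x^0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turn x^n / n! into x^(n+1) / (n+1)!
    return total


print(exp_series(1.0))  # our estimate of e
print(math.e)           # the library's value, for comparison
print(exp_series(2.5, terms=40), math.exp(2.5))
```

Notice that the code never computes a factorial directly: each term is built from the one before it, which keeps the numbers from getting huge.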
We're having fun with this, so we're going to keep doing it a bit more. Let's figure out what $e^{-x}$ is. This is pretty easy: every time we raise (-X) to an even power, we have an even number of -1's multiplied together. An even number of -1's multiplied together is just 1. For example, $(-X)^2 = (-X)(-X) = X^2$. $(-X)^4 = (-X)(-X)(-X)(-X) = X^4$. Every time we raise (-X) to an odd power, we have an odd number of -1's multiplied together, and this is -1. So, $(-X)^3 = (-X)^2 \cdot (-X) = X^2 \cdot (-X) = -X^3$. So, we see that $e^{-x}$ is just like $e^x$ but with all the odd powers of X subtracted instead of added.
$$Y = e^{-X} = 1 - X + \frac{X^2}{2!} - \frac{X^3}{3!} + \frac{X^4}{4!} - \frac{X^5}{5!} + \cdots$$
And, the derivative is:
$$\frac{dY}{dX} = -e^{-X} = -1 + X - \frac{X^2}{2!} + \frac{X^3}{3!} - \frac{X^4}{4!} + \frac{X^5}{5!} - \cdots$$
So, when we took the derivative, the sign changed. How did this happen? This brings us into a very important rule of calculus: the chain rule. This rule seems a bit artificial at first, as in how often could this possibly come up? It turns out it comes up all the time. Neural network training algorithms are based on the chain rule. Many of the proofs and derivations in General Relativity are based on the chain rule. The chain rule is used in real life to take something you don't know much about, and break it up into smaller pieces that you can handle.
Suppose you have some function, Y = A(X), and some other function, Y = B(X). Now, since A(X) is just a number, and B takes in a number for its argument, you can imagine cascading them to form some new function, Y = B( A(X) ). That was a little abstract, so here's an example: A(X) = -X and B(X) = $e^X$ can be cascaded to form Y = $e^{-X}$.
Here's the rule for differentiating this sort of thing: dY/dX = dY/dA * dA/dX. We know how to differentiate $e^X$, the derivative is just $e^X$. And, here's the best part, it doesn't matter what "X" is, it can be any kind of complicated thing you can imagine. We also know how to differentiate Y = -X, the derivative is -1. So, the derivative of $e^{-x}$ is $e^{-x} \cdot \frac{d}{dx}(-x) = e^{-x} \cdot (-1) = -e^{-x}$.
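If the chain rule still feels like a magic trick, you can check it with numbers. The sketch below, in Python, estimates each derivative by taking a tiny step to either side (a "central difference", which is a numerical stand-in and not something this chapter has defined). The names `a`, `b`, `y`, and `numerical_derivative`, the step size, and the test point 0.7 are all arbitrary choices for this example.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Estimate f'(x) from the slope between x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)


def a(x):  # inner function, A(X) = -X
    return -x


def b(u):  # outer function, B(A) = e^A
    return math.exp(u)


def y(x):  # the cascade, Y = B(A(X)) = e^(-X)
    return b(a(x))


x = 0.7
# dY/dA is measured at the value A(X), and dA/dX is measured at X
chain_rule = numerical_derivative(b, a(x)) * numerical_derivative(a, x)
direct = numerical_derivative(y, x)
exact = -math.exp(-x)
print(chain_rule, direct, exact)
```

All three numbers should agree to several decimal places, up to the small error that comes from using a finite step.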
Sometimes, the (anything) is quite complicated, so we have an alternative way to write $e^x$: exp( x ). $e^{(\text{anything})}$ = exp(anything). exp is short for exponential.
Ok, I went through that a bit fast, so we're going to do it again. The derivative of $e^{(\text{anything})}$ is $e^{(\text{anything})}$ * d/dx (anything). For example, $\frac{d}{dx}\exp(x^2) = \exp(x^2) \cdot \frac{d}{dx}(x^2) = \exp(x^2) \cdot (2x) = 2x\exp(x^2)$. A close cousin, $\exp(-x^2)$, is a very important function. It comes up all the time in probability, in statistics, in quantum mechanics, and a lot of other fields. It's called a Gaussian curve, or a Bell-shaped curve, or a Gaussian distribution, or a Normal distribution. Well, whatever it's called, we can differentiate it now: $\frac{d}{dx}\exp(-x^2) = -2x\exp(-x^2)$.
How can we calculate $e^{\text{anything}}$? Simple, we just use the definition of $e^x$:
$$e^{\text{anything}} = 1 + \text{anything} + \frac{\text{anything}^2}{2!} + \frac{\text{anything}^3}{3!} + \frac{\text{anything}^4}{4!} + \frac{\text{anything}^5}{5!} + \cdots$$
As long as we can calculate anything*anything, we can calculate $e^{\text{anything}}$. Get used to this idea now, because it's going to come up over and over. One of our favorite things is going to be, as soon as we learn about some new thing, we're going to exponentiate it. Just to give you a taste of what's to come, we can define the operator:
$$\exp\left(\frac{d}{dX}\right) = 1 + \frac{d}{dX} + \frac{(d/dX)^2}{2!} + \frac{(d/dX)^3}{3!} + \frac{(d/dX)^4}{4!} + \frac{(d/dX)^5}{5!} + \cdots$$
$(d/dX)^3$ means differentiate three times. The operator above, exp( d/dX ), computes something called a Taylor series. Don't worry, we'll get to that.
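Just for a peek, here's a sketch in Python of this operator acting on a polynomial. It assumes the same list-of-coefficients storage as before (position n holds the coefficient of $X^n$), and every name in it is made up for this example. For a polynomial the series stops on its own, because after enough derivatives there's nothing left. In the example below the result turns out to be the original polynomial evaluated at X + 1, which is a small preview of what the Taylor series is for.

```python
from fractions import Fraction
from math import factorial


def differentiate(coeffs):
    """coeffs[n] is the coefficient of X^n; return the derivative's coefficients."""
    return [n * c for n, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Plug a number x into the polynomial."""
    return sum(c * x**n for n, c in enumerate(coeffs))


def exp_of_derivative(coeffs):
    """Apply 1 + d/dX + (d/dX)^2/2! + (d/dX)^3/3! + ... to a polynomial."""
    result = [Fraction(0)] * len(coeffs)
    current = [Fraction(c) for c in coeffs]
    k = 0
    while current:  # a polynomial eventually runs out of derivatives
        for n, c in enumerate(current):
            result[n] += c / factorial(k)  # add (d/dX)^k f / k!
        current = differentiate(current)
        k += 1
    return result


f = [6, 0, 0, 1]  # f(X) = X^3 + 6
g = exp_of_derivative(f)
print([str(c) for c in g])          # ['7', '3', '3', '1']
print(evaluate(g, 2), evaluate(f, 3))  # both are 33
```

Here g works out to $X^3 + 3X^2 + 3X + 7$, which is $(X+1)^3 + 6$.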
$e^x$ can be represented in many forms. Above we saw how to represent it as an infinite series. We can also represent it as a limit:
$$e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$$
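A computer can't take n all the way to infinity, but it can take it pretty far. Here's a small sketch in Python that tries a few values of n for x = 1; the function name `exp_limit` and the particular values of n are arbitrary choices for this example.

```python
import math


def exp_limit(x, n):
    """Approximate e^x by (1 + x/n)^n for a finite n."""
    return (1 + x / n) ** n


for n in (1, 10, 100, 10_000, 1_000_000):
    print(n, exp_limit(1.0, n))
print("math.e", math.e)
```

The estimates creep up toward e as n grows, though much more slowly than the infinite series does.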
Later we'll use these two definitions a lot. The second definition says if we know how to take a very little step, then we can figure out how to make a big journey by raising e to the little step. The first definition tells us how to raise e to a little step. In the next chapter we'll see that we can approximate any function with a straight line for a little while - the straight line we'll use is the derivative. We can approximate a function for a long while by raising e to the derivative, which is the Taylor series. We'll see this in more detail in the coming chapters.
The chain rule is a general rule. For example, $\frac{d}{dx}(\text{anything})^2 = 2(\text{anything}) \cdot \frac{d}{dx}(\text{anything})$. $\frac{d}{dx}(\text{anything})^{44} = 44(\text{anything})^{43} \cdot \frac{d}{dx}(\text{anything})$.
Up above, we defined the functions $e^x$ and $e^{-x}$ as infinite series. What happens if we add them together? We also divide by two, to keep things simple:
$$e^X = 1 + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \frac{X^4}{4!} + \frac{X^5}{5!} + \cdots$$
$$e^{-X} = 1 - X + \frac{X^2}{2!} - \frac{X^3}{3!} + \frac{X^4}{4!} - \frac{X^5}{5!} + \cdots$$
$$\frac{e^x + e^{-x}}{2} = 1 + \frac{X^2}{2!} + \frac{X^4}{4!} + \frac{X^6}{6!} + \cdots$$
It's easy to see how to add them together if we do it term by term: (1 + 1) / 2 = 1. (X - X) / 2 = 0. $(X^2/2! + X^2/2!) / 2 = X^2/2!$, and so on.
It's a bit cumbersome to write $(e^x + e^{-x}) / 2$ all the time, so we'll make up a name for this series, too. We call this series cosh(x), which is pronounced "hyperbolic cosine of x" or, if you're in a hurry, "cosh of x." This name may remind you of the dreaded sine and cosine functions from trigonometry - there's a reason for that, but we're not going to worry about the reason just yet. It will turn out that trigonometry is intimately related to this series $e^x$, and it will also turn out that once we see this relationship, trigonometry will become a lot easier. But, that connection is for several chapters from now.
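To convince yourself that the even-powers series and the average of the two exponentials really are the same thing, here's a quick sketch in Python. The function name `cosh_series`, the cutoff of 15 terms, and the test value 1.3 are arbitrary choices for this example; `math.cosh` is the standard library's version.

```python
import math


def cosh_series(x, terms=15):
    """Add up 1 + x^2/2! + x^4/4! + ... using only the even powers."""
    total = 0.0
    term = 1.0  # the first term, x^0 / 0!
    for k in range(terms):
        total += term
        # step from x^(2k) / (2k)! to x^(2k+2) / (2k+2)!
        term *= x * x / ((2 * k + 1) * (2 * k + 2))
    return total


x = 1.3
print(cosh_series(x))                     # from the series
print((math.exp(x) + math.exp(-x)) / 2)   # from the two exponentials
print(math.cosh(x))                       # from the library
```

The three printed values should match to many decimal places.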
What we know how to do is take derivatives, so let's take the derivative of cosh(x). Remember, we're allowed to do this term by term.
$$\cosh(x) = 1 + \frac{X^2}{2!} + \frac{X^4}{4!} + \frac{X^6}{6!} + \cdots$$
$$\frac{d}{dX}\cosh(x) = 0 + X + \frac{X^3}{3!} + \frac{X^5}{5!} + \frac{X^7}{7!} + \cdots$$
What is this series? It looks a lot like all our other exponential series, so we guess there must be some other way to describe it. There is. We could have differentiated cosh(x) from its exponential definition, instead of its series definition; let's do that.
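$$\cosh(x) = \frac{1}{2}\left(e^x + e^{-x}\right)$$
$$\frac{d}{dX}\cosh(x) = \frac{d}{dx}\,\frac{1}{2}\left(e^x + e^{-x}\right) = \frac{1}{2}\left(\frac{d}{dx}e^x + \frac{d}{dx}e^{-x}\right) = \frac{1}{2}\left(e^x - e^{-x}\right)$$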
$$e^X = 1 + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \frac{X^4}{4!} + \frac{X^5}{5!} + \cdots$$
$$e^{-X} = 1 - X + \frac{X^2}{2!} - \frac{X^3}{3!} + \frac{X^4}{4!} - \frac{X^5}{5!} + \cdots$$
$$\frac{e^x - e^{-x}}{2} = \sinh(x) = X + \frac{X^3}{3!} + \frac{X^5}{5!} + \frac{X^7}{7!} + \cdots$$
We have a new series, we need a new name. We'll call the series $\frac{1}{2}(e^x - e^{-x})$ = sinh(x), which is pronounced "hyperbolic sine of x" or simply "sinh of x." "Sinh" is pronounced as if it's spelled "sintsch." Cosh and sinh are related through both their definitions, and through their derivatives: d cosh / dx = sinh; d sinh / dx = cosh.
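Here's a sketch in Python that checks both derivative relationships directly on the series, using exact fractions. It assumes we cut every series off at $X^{10}$ (the cutoff `N = 10` is arbitrary) and stores each series as a list where position n holds the coefficient of $X^n$; all the names are made up for this example.

```python
from fractions import Fraction
from math import factorial


def differentiate(coeffs):
    """coeffs[n] is the coefficient of X^n; return the derivative's coefficients."""
    return [n * c for n, c in enumerate(coeffs)][1:]


N = 10  # highest power we keep in this sketch
exp_pos = [Fraction(1, factorial(n)) for n in range(N + 1)]          # e^x
exp_neg = [Fraction((-1) ** n, factorial(n)) for n in range(N + 1)]  # e^-x

cosh = [(p + m) / 2 for p, m in zip(exp_pos, exp_neg)]  # only even powers survive
sinh = [(p - m) / 2 for p, m in zip(exp_pos, exp_neg)]  # only odd powers survive

# Differentiating loses the highest term we kept, so compare up to power N - 1.
print(differentiate(cosh) == sinh[:N])  # True
print(differentiate(sinh) == cosh[:N])  # True
```

Because the arithmetic is exact, the comparison is a real equality of coefficients and not just "close enough."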
3.1: Write down the first 6 terms of the series of each of the following
functions, and find the derivatives term by term:
a) $e^x$
b) $e^{-x}$
c) cosh x
d) sinh x
3.2: Find the derivative of exp( cosh( x ) ). Hint: use the chain rule.
answer: sinh(x) * exp( cosh(x) )
3.3: Find the derivative of $\exp(\cosh(x^2))$.
answer: $2x \sinh(x^2) \exp(\cosh(x^2))$
3.4: Find the derivative of $\cosh^2(x^3)$. $\cosh^2(x) = \cosh(x) \cdot \cosh(x)$.
answer: $3x^2 \cdot \sinh(x^3) \cdot 2\cosh(x^3)$