Introduction to Numerical Analysis
We study the various methods of solving equations, approximating functions and study the errors corresponding to the aforesaid methods. For example, consider the approximation of e using its definition and the Taylor’s Series.
e=limn→∞(1+1/n)nf(x)=f(a)+f′(a)(x−a)+…+f(k)(a)k!(x−a)k+f(k+1)(c)(k+1)!(x−a)k+1Where c is a real number between x and a. We can also compute the number of terms that are required in the Taylor’s series to estimate its value with a given error using the final term.
Finite Digit ArithmeticPermalink
A 64 bit representation is generally used for a real number. The smallest magnitude possible by this representation is when (s,c,f)=(0,1,0). Note that the tuples (0,0,0) and (1,0,0) represent 0 and not 2−2013⋅1.
Bits | Symbol | Name | Use |
---|---|---|---|
1 | s | Sign Indicator | Indicating whether the number is +/- |
11 | c | Characteristic | Exponent of the number |
52 | f | Mantissa | Fractional part of the number |
Floating Point RepresentationPermalink
We shall represent numbers in the form:
±0.d1d2…dk×10nwhere 1≤d1≤9,0≤di≤9If a number has greater than k digits, we can bring it down by two methods:
- Chopping: All digits from dk+1 are dropped
- Rounding: We add 5×10n−(k+1) to the number and then drop the additional digits
There are two ways of finding the error of such approximations. Let p denote the actual value and p∗ denote the approximated value.
- Absolute - |p−p∗|
- Relative - |p−p∗|/p
Relative error is usually the better measure as it takes the actual value of p into account.
Significant DigitsPermalink
We say that the number p∗ approximates p to t significant digits when t is the largest non-negative integer such that
|p−p∗|p<5×10−tThat is, the number of significant digits NEED NOT BE EQUAL to the number of decimal places that a number has!
Errors Caused by Finite Digit ArithmeticPermalink
There are two main kinds of errors that are caused by the limitations of finite digit arithmetic, cancellation of nearly equal numbers and division by small numbers.
Consider the function f(x)=1−Cos(x) at x=1.2×10−5. The value of Cos(x) would be 0.999 999 999 9 with 10 digits, however, the function value equates to 0.000 000 000 1. We had started with 10 significant digits but ended up with only 1 by the end. The relative error that we get in this case would be very large.
|1−Cos(1.2×10−5)−0.000 000 000 1|1−Cos(1.2×10−5)=0.3889Errors also get magnified when divided with small numbers. For example, let a∗=a+ϵ be the approximation with error ϵ. When divided by b=10−n we can see that the error has increased drastically.
fl(fl(a)fl(b))=(a+ϵ)×10nIn the quadratic roots formula, we can rationalize it to avoid errors incurred when b2 is very large compared to 4ac. We use the first formula if b>0 and use the second formula if b<0.
−2cb+√b2−4ac−2cb−√b2−4acInstead of getting 0 outright, we now get −2c/b which is an improvement. This is highly case-specific however, and more ways of avoiding errors will be discussed subsequently.