Introduction to Numerical Analysis

We study the various methods of solving equations, approximating functions and study the errors corresponding to the aforesaid methods. For example, consider the approximation of e using its definition and the Taylor’s Series.

e=limn(1+1/n)nf(x)=f(a)+f(a)(xa)++f(k)(a)k!(xa)k+f(k+1)(c)(k+1)!(xa)k+1

Where c is a real number between x and a. We can also compute the number of terms that are required in the Taylor’s series to estimate its value with a given error using the final term.

Finite Digit ArithmeticPermalink

A 64 bit representation is generally used for a real number. The smallest magnitude possible by this representation is when (s,c,f)=(0,1,0). Note that the tuples (0,0,0) and (1,0,0) represent 0 and not 220131.

Bits Symbol Name Use
1 s Sign Indicator Indicating whether the number is +/-
11 c Characteristic Exponent of the number
52 f Mantissa Fractional part of the number
Number=(1)s2c1023(1+f)

image-20220126175545918

image-20220126175555508

Floating Point RepresentationPermalink

We shall represent numbers in the form:

±0.d1d2dk×10nwhere 1d19,0di9

If a number has greater than k digits, we can bring it down by two methods:

  1. Chopping: All digits from dk+1 are dropped
  2. Rounding: We add 5×10n(k+1) to the number and then drop the additional digits

There are two ways of finding the error of such approximations. Let p denote the actual value and p denote the approximated value.

  1. Absolute - |pp|
  2. Relative - |pp|/p

Relative error is usually the better measure as it takes the actual value of p into account.

Significant DigitsPermalink

We say that the number p approximates p to t significant digits when t is the largest non-negative integer such that

|pp|p<5×10t

That is, the number of significant digits NEED NOT BE EQUAL to the number of decimal places that a number has!

Errors Caused by Finite Digit ArithmeticPermalink

There are two main kinds of errors that are caused by the limitations of finite digit arithmetic, cancellation of nearly equal numbers and division by small numbers.

Consider the function f(x)=1Cos(x) at x=1.2×105. The value of Cos(x) would be 0.999 999 999 9 with 10 digits, however, the function value equates to 0.000 000 000 1. We had started with 10 significant digits but ended up with only 1 by the end. The relative error that we get in this case would be very large.

|1Cos(1.2×105)0.000 000 000 1|1Cos(1.2×105)=0.3889

Errors also get magnified when divided with small numbers. For example, let a=a+ϵ be the approximation with error ϵ. When divided by b=10n we can see that the error has increased drastically.

fl(fl(a)fl(b))=(a+ϵ)×10n

In the quadratic roots formula, we can rationalize it to avoid errors incurred when b2 is very large compared to 4ac. We use the first formula if b>0 and use the second formula if b<0.

2cb+b24ac2cbb24ac

Instead of getting 0 outright, we now get 2c/b which is an improvement. This is highly case-specific however, and more ways of avoiding errors will be discussed subsequently.