# Finite Digit Arithmetic

Computers do not have the memory to store infinitely many digits, which can lead to errors if not handled properly. For example, $\sqrt{3}$ is irrational and thus cannot be stored in a computer exactly. In most cases, we choose a rational number whose square is not exactly equal to 3 but is close enough to 3 to be acceptable. The resulting discrepancies are called **Round Off Errors**.
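The $\sqrt{3}$ example can be checked directly. The following is a minimal sketch, assuming Python's `float` is a 64-bit double (true on common platforms); the names `r` and `error` are illustrative only.

```python
import math
from fractions import Fraction

# The double nearest to sqrt(3), converted to the exact rational number it stores.
r = Fraction(math.sqrt(3))

# Squaring the stored rational exactly shows it is close to, but not equal to, 3.
error = r * r - 3
print(r * r == 3)     # False, since no rational number squares to 3
print(float(error))   # the tiny round-off residue
```

Using `Fraction` keeps the squaring exact, so the residue printed is due only to storing $\sqrt{3}$, not to the multiplication.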

## Representing Real Numbers

A real number is stored in 64 bits (the IEEE 754 double-precision format), divided as follows:

- The first bit is a **sign indicator**, denoted by $s$
- The next 11 bits form the exponent, called the **characteristic** $c$
- The remaining 52 bits form the fraction $f$, called the **mantissa**

The characteristic is shifted by 1023, so the actual exponent is $c-1023$, which allows negative exponents as well. A normalized number is therefore

\(\text{Number} = (-1)^s\,2^{c-1023}(1+f)\)

The mantissa $f$ has 52 bits, which corresponds to a precision of roughly 16 decimal digits.
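To make the bit layout concrete, here is a small sketch that splits a Python `float` into $(s, c, f)$ and evaluates the formula above. It assumes `float` is a 64-bit double; `decode_double` and `encode_value` are hypothetical helper names, not part of any library.

```python
import struct

def decode_double(x: float):
    """Split a 64-bit double into (s, c, f)."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                          # 1 sign bit
    c = (bits >> 52) & 0x7FF                # 11-bit characteristic
    f = (bits & ((1 << 52) - 1)) / 2**52    # 52-bit mantissa as a fraction in [0, 1)
    return s, c, f

def encode_value(s, c, f):
    """Evaluate (-1)^s * 2^(c-1023) * (1+f) for a normalized number (1 <= c <= 2046)."""
    return (-1) ** s * 2.0 ** (c - 1023) * (1 + f)

s, c, f = decode_double(27.56640625)
print(s, c, f)                 # 0 1027 0.722900390625
print(encode_value(s, c, f))   # 27.56640625
```

Here $27.56640625 = 2^{4}\,(1 + 0.722900390625)$, so $c = 4 + 1023 = 1027$.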

The smallest positive normalized number in this representation is given by $(s,c,f) = (0,1,0)$ and equals $2^{-1022} \approx 0.22251\times 10^{-307}$. Numbers of smaller magnitude result in **underflow** and are generally set to zero. Note that both $(0,0,0)$ and $(1,0,0)$ correspond to $0$.

Similarly, the largest number is $2^{1023}\cdot(2-2^{-52}) \approx 0.17977\times 10^{309}$, given by $(s,c,f) = (0,2046,1-2^{-52})$. Numbers of larger magnitude result in **overflow**.
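These two limits can be compared against what Python reports, again assuming `float` is a 64-bit double. The variable names are illustrative.

```python
import sys

smallest_normal = 2.0 ** -1022              # (s, c, f) = (0, 1, 0)
largest = 2.0 ** 1023 * (2 - 2.0 ** -52)    # (s, c, f) = (0, 2046, 1 - 2^-52)

print(smallest_normal == sys.float_info.min)   # True
print(largest == sys.float_info.max)           # True

# Multiplying past the largest value overflows to infinity.
print(largest * 2.0)
# A product far below the smallest normalized value underflows to zero.
# (Values only slightly below 2^-1022 are kept as subnormal numbers instead,
# so a large scaling factor is used here.)
print(smallest_normal * 2.0 ** -60)
```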

## Floating Point Representation

We will use normalized decimal numbers of the form

\(\pm 0.d_1d_2\ldots d_k \times 10^n, \qquad 1\leq d_1\leq 9,\quad 0\leq d_i\leq 9 \text{ for } i = 2,\ldots,k\)

A number $y$ that has more than $k$ decimal digits can be converted to this form in two ways:

- Chopping, wherein the additional digits are simply dropped
- Rounding, where we add $5\times10^{n-k-1}$ and then drop the additional digits.
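Both conversions can be sketched with Python's `decimal` module, which works in base 10 and so avoids binary effects. The helper `to_k_digits` is a hypothetical name; it assumes the input is passed as a decimal string and has well under 28 digits (the module's default precision).

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def to_k_digits(y, k, mode="round"):
    """Return the k-digit decimal floating-point form of y (chopped or rounded)."""
    y = Decimal(y)
    if y == 0:
        return y
    # Writing y = ±0.d1d2... × 10^n gives n = (exponent of leading digit) + 1.
    n = y.adjusted() + 1
    quantum = Decimal(1).scaleb(n - k)   # 10^(n-k), the place value of digit d_k
    # ROUND_DOWN drops the extra digits; ROUND_HALF_UP matches adding
    # 5 × 10^(n-k-1) to the magnitude and then dropping them.
    rounding = ROUND_DOWN if mode == "chop" else ROUND_HALF_UP
    return y.quantize(quantum, rounding=rounding)

print(to_k_digits("3.14159265", 5, "chop"))    # 3.1415
print(to_k_digits("3.14159265", 5, "round"))   # 3.1416
```

With $k = 5$, the sixth digit of $\pi$ is 9, so rounding bumps the fifth digit from 5 to 6 while chopping leaves it unchanged.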

Let $\rho$ be the real number and $\rho^*$ be its approximation.

Absolute Error | Relative Error |
---|---|
$\vert \rho - \rho^* \vert$ | $\frac{\vert \rho-\rho^* \vert}{\vert \rho \vert}$, provided $\rho \neq 0$ |
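A short sketch of the two error measures follows. The function names are illustrative, and `Fraction` is used so that the arithmetic in the example is exact.

```python
from fractions import Fraction

def absolute_error(p, p_star):
    """|p - p*|"""
    return abs(p - p_star)

def relative_error(p, p_star):
    """|p - p*| / |p|, defined only for p != 0."""
    if p == 0:
        raise ValueError("relative error is undefined for p = 0")
    return abs(p - p_star) / abs(p)

# Two approximations with very different absolute errors
# but the same relative error.
small = (Fraction("0.003"), Fraction("0.0031"))
large = (Fraction(3000), Fraction(3100))
for p, p_star in (small, large):
    print(absolute_error(p, p_star), relative_error(p, p_star))
```

The relative error is $1/30$ in both cases, which is why it is usually the more meaningful measure: it accounts for the size of the quantity being approximated.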

## Significant Digits

We say $\rho^*$ approximates $\rho$ to $t$ significant digits if $t$ is the largest non-negative integer for which

\(\frac{\vert \rho-\rho^* \vert}{\vert \rho \vert} < 5\times 10^{-t}\)
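The definition translates into a simple search for the largest $t$. This is a sketch under stated assumptions: `significant_digits` is a hypothetical helper, inputs are given as decimal strings so that `Fraction` keeps the strict inequality exact, and the cap `t_max` exists only to stop the loop when the approximation is exact.

```python
from fractions import Fraction

def significant_digits(p, p_star, t_max=50):
    """Largest t >= 0 with |p - p*| / |p| < 5 * 10^(-t), capped at t_max."""
    p, p_star = Fraction(p), Fraction(p_star)
    if p == 0:
        raise ValueError("relative error is undefined for p = 0")
    rel = abs(p - p_star) / abs(p)
    if rel >= 5:
        return None   # not even t = 0 satisfies the inequality
    t = 0
    # Increase t while the bound for t + 1 still holds.
    while t < t_max and rel < Fraction(5, 10 ** (t + 1)):
        t += 1
    return t

print(significant_digits("1000", "1000.4"))   # 4
print(significant_digits("1000", "1004"))     # 3
print(significant_digits("1000", "1005"))     # 2
```

The last case shows the effect of the strict inequality: a relative error of exactly $5\times 10^{-3}$ does not count as 3 significant digits.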

### Finite Digit Arithmetic

Here $fl(y)$ denotes the $k$-digit floating-point form of $y$, obtained by chopping or rounding.

Operation | Meaning |
---|---|
$x\oplus y$ | $fl(fl(x)+fl(y))$ |
$x \ominus y$ | $fl(fl(x)-fl(y))$ |
$x \otimes y$ | $fl(fl(x)\times fl(y))$ |
$x \oslash y$ | $fl(fl(x)\div fl(y))$ |
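The four operations can be simulated by applying a $k$-digit conversion to each operand and again to the result. The sketch below uses five-digit chopping with $x = 5/7$ and $y = 1/3$; `fl` and `machine_op` are hypothetical helper names, and Python's `decimal` module stands in for the idealized decimal machine.

```python
import operator
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(y, k=5, chop=True):
    """k-digit decimal floating-point form of y, by chopping or rounding."""
    y = Decimal(y)
    if y == 0:
        return y
    n = y.adjusted() + 1                 # y = ±0.d1d2... × 10^n
    quantum = Decimal(1).scaleb(n - k)   # place value of the k-th digit
    return y.quantize(quantum, rounding=ROUND_DOWN if chop else ROUND_HALF_UP)

def machine_op(op, x, y, k=5, chop=True):
    """fl(fl(x) op fl(y)): convert both operands, operate, convert the result."""
    return fl(op(fl(x, k, chop), fl(y, k, chop)), k, chop)

x = Decimal(5) / Decimal(7)   # fl(x) = 0.71428
y = Decimal(1) / Decimal(3)   # fl(y) = 0.33333

print(machine_op(operator.add, x, y))       # 1.0476
print(machine_op(operator.sub, x, y))       # 0.38095
print(machine_op(operator.mul, x, y))       # 0.23809
print(machine_op(operator.truediv, x, y))   # 2.1428
```

For comparison, the exact sum is $22/21 \approx 1.047619$, so the five-digit result $1.0476$ already differs from it in the fifth decimal place.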