# Hoeffding’s Inequality

Let $X$ be a random variable taking values in $[0,1]$ with $E[X] = \mu$. Suppose we draw $u$ independent samples $x_1, \dots, x_u$ from this distribution, and let $\bar{x}$ denote their empirical mean.

Then, for any fixed $\epsilon > 0$, Hoeffding's inequality states

$$P(\bar{x} \ge \mu + \epsilon) \le e^{-2u\epsilon^2}, \qquad P(\bar{x} \le \mu - \epsilon) \le e^{-2u\epsilon^2}$$

Intuitively, as $u$ increases, the probability that the empirical mean deviates from $\mu$ by more than $\epsilon$ should decrease, and indeed the bound decays exponentially in $u$. Here $\epsilon$ defines a **confidence interval** around the true mean (which is unknown).
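As a quick sanity check, here is a small simulation (the function name and parameters are mine) comparing the empirical deviation frequency for Bernoulli($\mu$) samples against the two-sided Hoeffding bound $2e^{-2u\epsilon^2}$:

```python
import math
import random

def hoeffding_check(mu=0.3, u=100, eps=0.1, trials=20000, seed=0):
    """Estimate P(|x_bar - mu| >= eps) for u Bernoulli(mu) samples
    and compare it with the two-sided Hoeffding bound 2*exp(-2*u*eps^2)."""
    rng = random.Random(seed)
    deviations = 0
    for _ in range(trials):
        # empirical mean of u Bernoulli(mu) draws
        x_bar = sum(rng.random() < mu for _ in range(u)) / u
        if abs(x_bar - mu) >= eps:
            deviations += 1
    empirical = deviations / trials
    bound = 2 * math.exp(-2 * u * eps ** 2)
    return empirical, bound
```

With these defaults the bound is about $2e^{-2} \approx 0.27$, while the observed deviation frequency is far smaller; the bound is valid but loose, which is part of the motivation for the KL version below.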

*Doubt in Slide 4, point 2; It doesn’t seem right*

Hoeffding’s Inequality can be extended to a random variable bounded in $[a,b]$ by defining a new random variable $Y = \frac{X-a}{b-a} \in [0,1]$ and applying the inequality to $Y$. Since $|\bar{y} - E[Y]| = \frac{|\bar{x} - \mu|}{b-a}$, we get

$$P(|\bar{x} - \mu| \ge \epsilon) \le 2e^{-2u\epsilon^2/(b-a)^2}$$
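A one-line helper for the rescaled bound (the function name is my own):

```python
import math

def hoeffding_bound_ab(u, eps, a, b):
    """Two-sided Hoeffding tail bound for u i.i.d. samples in [a, b]:
    P(|x_bar - mu| >= eps) <= 2*exp(-2*u*eps^2 / (b-a)^2)."""
    return 2 * math.exp(-2 * u * eps ** 2 / (b - a) ** 2)
```

With $a=0,\ b=1$ this reduces to the earlier bound; widening the range $[a,b]$ weakens (increases) the bound for the same $\epsilon$ and $u$.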

## KL Inequality

For samples in $[0,1]$, the KL version of the bound is

$$P(\bar{x} \ge \mu + \epsilon) \le e^{-u\,\mathrm{KL}(\mu+\epsilon,\,\mu)}, \qquad P(\bar{x} \le \mu - \epsilon) \le e^{-u\,\mathrm{KL}(\mu-\epsilon,\,\mu)}$$

where $\mathrm{KL}(p, q) = p\ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}$ is the KL divergence between Bernoulli distributions. This gives a tighter bound on the empirical mean than Hoeffding's inequality. **Note that** $\mu\pm\epsilon$ **must lie in** $[0,1]$ for the KL terms in the RHS of both the inequalities to be well defined.

Note that both the inequalities are examples of *Chernoff bounds*.
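To see that the KL bound is tighter, here is a sketch (helper names are mine) computing both one-sided tail bounds; by Pinsker's inequality, $\mathrm{KL}(p,q) \ge 2(p-q)^2$, so the KL exponent is always at least as large as Hoeffding's:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q),
    with the convention 0*log(0) = 0; assumes 0 < q < 1."""
    res = 0.0
    if p > 0:
        res += p * math.log(p / q)
    if p < 1:
        res += (1 - p) * math.log((1 - p) / (1 - q))
    return res

def hoeffding_tail(u, eps):
    """One-sided Hoeffding bound on P(x_bar >= mu + eps)."""
    return math.exp(-2 * u * eps ** 2)

def kl_tail(u, mu, eps):
    """One-sided KL (Chernoff) bound on P(x_bar >= mu + eps)."""
    return math.exp(-u * kl_bernoulli(mu + eps, mu))
```

For example, with $\mu = 0.1$, $\epsilon = 0.05$, $u = 100$, the KL bound is roughly half the Hoeffding bound; the gap widens as $\mu$ moves away from $1/2$.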

# Analysis of UCB

We shall use the above inequalities to prove that the regret of the UCB algorithm grows logarithmically with the horizon $T$.

### Notation

- $\Delta_a = p_M - p_a$ - the gap between the optimal arm's mean $p_M$ and arm $a$'s mean
- $Z^t_a$ - the *event* that arm $a$ is pulled at time $t$
- $z^t_a$ - indicator for the above event (1 if the event occurs, 0 otherwise)
- $u^t_a$ - number of times arm $a$ has been pulled up to time $t$
- $\bar{u}^T_a$ - a constant used in the proof: a sufficient number of pulls of arm $a$ for horizon $T$
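Before the proof, here is a minimal UCB1 sketch on Bernoulli arms (my own illustration, not the lecture's pseudocode); the exploration bonus $\sqrt{2\ln t / u^t_a}$ is the standard UCB1 choice:

```python
import math
import random

def ucb1(means, horizon=10000, seed=0):
    """Minimal UCB1 on Bernoulli arms with the given true means p_a.
    Returns (total regret, pull counts per arm)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # u^t_a: pulls of each arm so far
    sums = [0.0] * k      # total reward collected per arm
    best = max(means)     # p_M, the optimal arm's mean
    regret = 0.0
    for t in range(horizon):
        if t < k:
            a = t         # pull each arm once to initialise
        else:
            # pick the arm with the highest upper confidence bound
            a = max(range(k),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t + 1) / counts[i]))
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += r
        regret += best - means[a]   # expected regret of this pull
    return regret, counts
```

Running this on, say, `[0.9, 0.5]` shows the suboptimal arm's pull count growing far slower than the horizon, which is exactly the logarithmic behaviour the proof establishes.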

*This proof is truly a mind = blown moment :/*