# Hoeffding’s Inequality

Let $X$ be a random variable taking values in $[0,1]$ with $E[X] = \mu$. Suppose we draw $u$ independent samples $x_1, \dots, x_u$ from this distribution, and let $\bar{x}$ denote their empirical mean.

Then, for any fixed $\epsilon > 0$, Hoeffding's inequality states

$$P(\bar{x} \ge \mu + \epsilon) \le e^{-2u\epsilon^2}, \qquad P(\bar{x} \le \mu - \epsilon) \le e^{-2u\epsilon^2}$$

Intuitively, as $u$ increases, the probability that the empirical mean deviates from $\mu$ by more than $\epsilon$ should decrease, and indeed the bound decays exponentially in $u$. Here $\epsilon$ defines a **confidence interval** around the true mean (which is unknown).
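As a quick sanity check, here is a small simulation (the function name and parameters are mine) comparing the empirical deviation frequency for Bernoulli($\mu$) samples against the two-sided Hoeffding bound $2e^{-2u\epsilon^2}$:

```python
import math
import random

def hoeffding_check(mu=0.3, u=100, eps=0.1, trials=20000, seed=0):
    """Estimate P(|x_bar - mu| >= eps) for u Bernoulli(mu) samples
    and compare it with the two-sided Hoeffding bound 2*exp(-2*u*eps^2)."""
    rng = random.Random(seed)
    deviations = 0
    for _ in range(trials):
        # empirical mean of u Bernoulli(mu) draws
        x_bar = sum(rng.random() < mu for _ in range(u)) / u
        if abs(x_bar - mu) >= eps:
            deviations += 1
    empirical = deviations / trials
    bound = 2 * math.exp(-2 * u * eps ** 2)
    return empirical, bound
```

With these defaults the bound is about $2e^{-2} \approx 0.27$, while the observed deviation frequency is far smaller; the bound is valid but loose, which is part of the motivation for the KL version below.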

*Doubt in Slide 4, point 2; It doesn’t seem right*

Hoeffding’s Inequality can be extended to a random variable bounded in $[a,b]$ by defining a new random variable $Y = \frac{X-a}{b-a} \in [0,1]$ and applying the inequality to $Y$. Since $|\bar{y} - E[Y]| = \frac{|\bar{x} - \mu|}{b-a}$, we get

$$P(|\bar{x} - \mu| \ge \epsilon) \le 2e^{-2u\epsilon^2/(b-a)^2}$$
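A one-line helper for the rescaled bound (the function name is my own):

```python
import math

def hoeffding_bound_ab(u, eps, a, b):
    """Two-sided Hoeffding tail bound for u i.i.d. samples in [a, b]:
    P(|x_bar - mu| >= eps) <= 2*exp(-2*u*eps^2 / (b-a)^2)."""
    return 2 * math.exp(-2 * u * eps ** 2 / (b - a) ** 2)
```

With $a=0,\ b=1$ this reduces to the earlier bound; widening the range $[a,b]$ weakens (increases) the bound for the same $\epsilon$ and $u$.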

## KL Inequality

For samples in $[0,1]$, the KL version of the bound is

$$P(\bar{x} \ge \mu + \epsilon) \le e^{-u\,\mathrm{KL}(\mu+\epsilon,\,\mu)}, \qquad P(\bar{x} \le \mu - \epsilon) \le e^{-u\,\mathrm{KL}(\mu-\epsilon,\,\mu)}$$

where $\mathrm{KL}(p, q) = p\ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}$ is the KL divergence between Bernoulli distributions. This gives a tighter bound on the empirical mean than Hoeffding's inequality. **Note that** $\mu\pm\epsilon$ **must lie in** $[0,1]$ for the KL terms in the RHS of both the inequalities to be well defined.

Note that both the inequalities are examples of *Chernoff bounds*.
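To see that the KL bound is tighter, here is a sketch (helper names are mine) computing both one-sided tail bounds; by Pinsker's inequality, $\mathrm{KL}(p,q) \ge 2(p-q)^2$, so the KL exponent is always at least as large as Hoeffding's:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q),
    with the convention 0*log(0) = 0; assumes 0 < q < 1."""
    res = 0.0
    if p > 0:
        res += p * math.log(p / q)
    if p < 1:
        res += (1 - p) * math.log((1 - p) / (1 - q))
    return res

def hoeffding_tail(u, eps):
    """One-sided Hoeffding bound on P(x_bar >= mu + eps)."""
    return math.exp(-2 * u * eps ** 2)

def kl_tail(u, mu, eps):
    """One-sided KL (Chernoff) bound on P(x_bar >= mu + eps)."""
    return math.exp(-u * kl_bernoulli(mu + eps, mu))
```

For example, with $\mu = 0.1$, $\epsilon = 0.05$, $u = 100$, the KL bound is roughly half the Hoeffding bound; the gap widens as $\mu$ moves away from $1/2$.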

# Analysis of UCB

We shall use the above inequalities to prove that the regret of the UCB algorithm grows logarithmically with the horizon $T$.

### Notation

- $\Delta_a = p_M - p_a$ - the gap between the optimal arm's mean $p_M$ and arm $a$'s mean
- $Z^t_a$ - the *event* that arm $a$ is pulled at time $t$
- $z^t_a$ - indicator for the above event (1 if the event occurs, 0 otherwise)
- $u^t_a$ - number of times arm $a$ has been pulled up to time $t$
- $\bar{u}^T_a$ - a constant used in the proof: a sufficient number of pulls of arm $a$ for horizon $T$
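Before the proof, here is a minimal UCB1 sketch on Bernoulli arms (my own illustration, not the lecture's pseudocode); the exploration bonus $\sqrt{2\ln t / u^t_a}$ is the standard UCB1 choice:

```python
import math
import random

def ucb1(means, horizon=10000, seed=0):
    """Minimal UCB1 on Bernoulli arms with the given true means p_a.
    Returns (total regret, pull counts per arm)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # u^t_a: pulls of each arm so far
    sums = [0.0] * k      # total reward collected per arm
    best = max(means)     # p_M, the optimal arm's mean
    regret = 0.0
    for t in range(horizon):
        if t < k:
            a = t         # pull each arm once to initialise
        else:
            # pick the arm with the highest upper confidence bound
            a = max(range(k),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t + 1) / counts[i]))
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += r
        regret += best - means[a]   # expected regret of this pull
    return regret, counts
```

Running this on, say, `[0.9, 0.5]` shows the suboptimal arm's pull count growing far slower than the horizon, which is exactly the logarithmic behaviour the proof establishes.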

*This proof is truly a mind = blown moment :/*