# Hoeffding’s Inequality

Let $X$ be a random variable taking values in $[0,1]$ with $E[X] = \mu$. Suppose we draw $u$ independent samples $x_1, \dots, x_u$ from this distribution, and let $\bar{x}$ denote their empirical mean.

Then, for any fixed $\epsilon > 0$, Hoeffding's inequality states

$$P(\bar{x} \ge \mu + \epsilon) \le e^{-2u\epsilon^2}, \qquad P(\bar{x} \le \mu - \epsilon) \le e^{-2u\epsilon^2}$$

Intuitively, as $u$ increases, the probability that the empirical mean deviates from $\mu$ by more than $\epsilon$ should decrease, and indeed the bound decays exponentially in $u$. Here $\epsilon$ defines a **confidence interval** around the true mean (which is unknown).
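As a quick sanity check, here is a small simulation (the function name and parameters are mine) comparing the empirical deviation frequency for Bernoulli($\mu$) samples against the two-sided Hoeffding bound $2e^{-2u\epsilon^2}$:

```python
import math
import random

def hoeffding_check(mu=0.3, u=100, eps=0.1, trials=20000, seed=0):
    """Estimate P(|x_bar - mu| >= eps) for u Bernoulli(mu) samples
    and compare it with the two-sided Hoeffding bound 2*exp(-2*u*eps^2)."""
    rng = random.Random(seed)
    deviations = 0
    for _ in range(trials):
        # empirical mean of u Bernoulli(mu) draws
        x_bar = sum(rng.random() < mu for _ in range(u)) / u
        if abs(x_bar - mu) >= eps:
            deviations += 1
    empirical = deviations / trials
    bound = 2 * math.exp(-2 * u * eps ** 2)
    return empirical, bound
```

With these defaults the bound is about $2e^{-2} \approx 0.27$, while the observed deviation frequency is far smaller; the bound is valid but loose, which is part of the motivation for the KL version below.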

*Doubt in Slide 4, point 2; It doesn’t seem right*

Hoeffding’s Inequality can be extended to a random variable bounded in $[a,b]$ by defining a new random variable $Y = \frac{X-a}{b-a} \in [0,1]$ and applying the inequality to $Y$. Since $|\bar{y} - E[Y]| = \frac{|\bar{x} - \mu|}{b-a}$, we get

$$P(|\bar{x} - \mu| \ge \epsilon) \le 2e^{-2u\epsilon^2/(b-a)^2}$$
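A one-line helper for the rescaled bound (the function name is my own):

```python
import math

def hoeffding_bound_ab(u, eps, a, b):
    """Two-sided Hoeffding tail bound for u i.i.d. samples in [a, b]:
    P(|x_bar - mu| >= eps) <= 2*exp(-2*u*eps^2 / (b-a)^2)."""
    return 2 * math.exp(-2 * u * eps ** 2 / (b - a) ** 2)
```

With $a=0,\ b=1$ this reduces to the earlier bound; widening the range $[a,b]$ weakens (increases) the bound for the same $\epsilon$ and $u$.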

## KL Inequality

For samples in $[0,1]$, the KL version of the bound is

$$P(\bar{x} \ge \mu + \epsilon) \le e^{-u\,\mathrm{KL}(\mu+\epsilon,\,\mu)}, \qquad P(\bar{x} \le \mu - \epsilon) \le e^{-u\,\mathrm{KL}(\mu-\epsilon,\,\mu)}$$

where $\mathrm{KL}(p, q) = p\ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}$ is the KL divergence between Bernoulli distributions. This gives a tighter bound on the empirical mean than Hoeffding's inequality. **Note that** $\mu\pm\epsilon$ **must lie in** $[0,1]$ for the KL terms in the RHS of both the inequalities to be well defined.

Note that both the inequalities are examples of *Chernoff bounds*.
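To see that the KL bound is tighter, here is a sketch (helper names are mine) computing both one-sided tail bounds; by Pinsker's inequality, $\mathrm{KL}(p,q) \ge 2(p-q)^2$, so the KL exponent is always at least as large as Hoeffding's:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q),
    with the convention 0*log(0) = 0; assumes 0 < q < 1."""
    res = 0.0
    if p > 0:
        res += p * math.log(p / q)
    if p < 1:
        res += (1 - p) * math.log((1 - p) / (1 - q))
    return res

def hoeffding_tail(u, eps):
    """One-sided Hoeffding bound on P(x_bar >= mu + eps)."""
    return math.exp(-2 * u * eps ** 2)

def kl_tail(u, mu, eps):
    """One-sided KL (Chernoff) bound on P(x_bar >= mu + eps)."""
    return math.exp(-u * kl_bernoulli(mu + eps, mu))
```

For example, with $\mu = 0.1$, $\epsilon = 0.05$, $u = 100$, the KL bound is roughly half the Hoeffding bound; the gap widens as $\mu$ moves away from $1/2$.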

# Analysis of UCB

We shall use the above inequalities to prove that the regret of the UCB algorithm grows logarithmically with the horizon $T$.

### Notation

- $\Delta_a = p_M - p_a$ - the gap between the optimal arm's mean $p_M$ and arm $a$'s mean
- $Z^t_a$ - the *event* that arm $a$ is pulled at time $t$
- $z^t_a$ - indicator for the above event (1 if the event occurs, 0 otherwise)
- $u^t_a$ - number of times arm $a$ has been pulled up to time $t$
- $\bar{u}^T_a$ - a constant used in the proof: a sufficient number of pulls of arm $a$ for horizon $T$
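Before the proof, here is a minimal UCB1 sketch on Bernoulli arms (my own illustration, not the lecture's pseudocode); the exploration bonus $\sqrt{2\ln t / u^t_a}$ is the standard UCB1 choice:

```python
import math
import random

def ucb1(means, horizon=10000, seed=0):
    """Minimal UCB1 on Bernoulli arms with the given true means p_a.
    Returns (total regret, pull counts per arm)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # u^t_a: pulls of each arm so far
    sums = [0.0] * k      # total reward collected per arm
    best = max(means)     # p_M, the optimal arm's mean
    regret = 0.0
    for t in range(horizon):
        if t < k:
            a = t         # pull each arm once to initialise
        else:
            # pick the arm with the highest upper confidence bound
            a = max(range(k),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t + 1) / counts[i]))
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += r
        regret += best - means[a]   # expected regret of this pull
    return regret, counts
```

Running this on, say, `[0.9, 0.5]` shows the suboptimal arm's pull count growing far slower than the horizon, which is exactly the logarithmic behaviour the proof establishes.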

*This proof is truly a mind = blown moment :/*