Hoeffding’s Inequality

Let $X$ be a random variable in $[0,1]$ with $E[X] = \mu$. Now, let’s say that we draw $u$ samples from this distribution, each $x_i$, and the empirical mean is $\bar{x}$.

Then, for any fixed $\epsilon>0$ we have

$$\begin{eqnarray} \mathcal{P}\{ \bar{X} \geq \mu+\epsilon} \leq \exp{-2u\epsilon^2} \\ \mathcal{P}\{ \bar{X} \leq \mu-\epsilon} \leq \exp{-2u\epsilon^2} \\ \end{eqnarray}$$

Intuitively, as $u$ increases, the probablity that empirical mean deviates should decrease. Here, $\epsilon$ acts as Confidence Interval around the true mean (which is unknown).

Doubt in Slide 4, point 2; It doesn’t seem right

Hoeffding’s Inequality can be extended to a random variable bounded in $[a,b]$ by defining a new random variable $y = \frac{X-a}{b-a}$, and applying to this. We get;

$$\begin{eqnarray} \mathcal{P}\{ \bar{X} \geq \mu+\epsilon} \leq \exp{\frac{-2u\epsilon^2}{(b-a)^2}} \\ \mathcal{P}\{ \bar{X} \leq \mu-\epsilon} \leq \exp{\frac{-2u\epsilon^2}{(b-a)^2}} \\ \end{eqnarray}$$

KL Inequality

This gives a tighter bound for empirical mean. Note that $\mu\pm\epsilon$ must be positive in RHS of both the inequalities.

$$\begin{eqnarray} \mathcal{P}\{ \bar{X} \geq \mu+\epsilon} \leq \exp{-uKL(\mu+\epsilon, \mu)} \\ \mathcal{P}\{ \bar{X} \leq \mu-\epsilon} \leq \exp{-uKL(\mu+\epsilon, \mu)} \\ \\ KL(p,q) = pln(\frac{p}{q})+(1-p)ln(\frac{1-p}{1-q}) \\ \end{eqnarray}$$

Note that both the inequalities are examples of Chernoff bounds.

 

Analysis of UCB

We shall use the above inequalities to prove that the UCB algorithm is logarithmic in nature.

Notation

  • $\Delta_a = p_M - p_a$ - The difference between this arm and optimal
  • $Z^t_a$ - An event when arm $a$ is pulled at time $t$
  • $z^t_a$ - Indicator for the above event (1 if event occurs, 0 otherwise)
$$\mathcal{E}\[z^t_a\] = \mathcal{P}\[Z^t_a\]$$
  • $u^t_a$ - Number of times arm $a$ has been pulled, till time $t$
$$u^t_a = \sum_{i=0}^{t-1}z^i_a$$
  • $\bar{u}^T_a$ - A constant used in the proof, sufficient number of pulls of arm $a$ for horizon $T$
$$\bar{u}^T_a = \ceil{ \frac{8}{(\Delta_a)^2}ln(T) }$$

This proof truly be a mind = blown moment :/

 

Thompson Sampling