Hoeffding’s Inequality

Let $X$ be a random variable in $[0,1]$ with $E[X]=\mu$. Suppose we draw $u$ i.i.d. samples $x_1, \dots, x_u$ from this distribution, and let $\bar{X}$ denote their empirical mean.

Then, for any fixed $\epsilon > 0$ we have

\begin{eqnarray}
\mathcal{P}\{ \bar{X} \geq \mu+\epsilon \} \leq \exp(-2u\epsilon^2) \\
\mathcal{P}\{ \bar{X} \leq \mu-\epsilon \} \leq \exp(-2u\epsilon^2)
\end{eqnarray}

Intuitively, as $u$ increases, the probability that the empirical mean deviates from $\mu$ by more than $\epsilon$ should decrease. Here, $\epsilon$ defines a confidence interval around the true mean (which is unknown).
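As a quick sanity check, here is a minimal simulation sketch (my own, not from the lecture): it estimates $\mathcal{P}\{\bar{X} \geq \mu+\epsilon\}$ for a Bernoulli(0.5) variable, which lies in $[0,1]$, and compares it to the bound.

```python
# Minimal sketch: empirically check Hoeffding's bound for Bernoulli(0.5).
import numpy as np

rng = np.random.default_rng(0)
mu, eps = 0.5, 0.1
trials = 100_000

for u in (10, 50, 200):
    # Empirical means of u samples, repeated over many trials.
    means = rng.binomial(1, mu, size=(trials, u)).mean(axis=1)
    empirical = np.mean(means >= mu + eps)   # estimate of P{X̄ ≥ μ + ε}
    bound = np.exp(-2 * u * eps ** 2)        # Hoeffding's bound
    print(f"u={u:4d}  empirical={empirical:.4f}  bound={bound:.4f}")
```

Both quantities should shrink as $u$ grows, with the empirical probability staying below the bound (which is quite loose for small $u$).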

Doubt in Slide 4, point 2; it doesn't seem right.

Hoeffding’s Inequality can be extended to a random variable bounded in $[a,b]$ by defining a new random variable $Y=\frac{X-a}{b-a}$, which lies in $[0,1]$, and applying the inequality to $Y$. We get:

\begin{eqnarray}
\mathcal{P}\{ \bar{X} \geq \mu+\epsilon \} \leq \exp\left(\frac{-2u\epsilon^2}{(b-a)^2}\right) \\
\mathcal{P}\{ \bar{X} \leq \mu-\epsilon \} \leq \exp\left(\frac{-2u\epsilon^2}{(b-a)^2}\right)
\end{eqnarray}
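To see where the $(b-a)^2$ comes from: $\bar{Y} = \frac{\bar{X}-a}{b-a}$ has mean $\mu_Y = \frac{\mu-a}{b-a}$, so a deviation of $\epsilon$ in $\bar{X}$ corresponds to a deviation of $\frac{\epsilon}{b-a}$ in $\bar{Y}$:

\begin{eqnarray}
\mathcal{P}\{ \bar{X} \geq \mu+\epsilon \} = \mathcal{P}\left\{ \bar{Y} \geq \mu_Y + \frac{\epsilon}{b-a} \right\} \leq \exp\left(-2u\left(\frac{\epsilon}{b-a}\right)^2\right) = \exp\left(\frac{-2u\epsilon^2}{(b-a)^2}\right)
\end{eqnarray}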

KL Inequality

This gives a tighter bound on the empirical mean than Hoeffding's Inequality. Note that $\mu+\epsilon$ and $\mu-\epsilon$ must be valid probabilities in $(0,1)$ for the KL terms on the RHS of the inequalities to be defined.

\begin{eqnarray}
\mathcal{P}\{ \bar{X} \geq \mu+\epsilon \} \leq \exp(-u \, KL(\mu+\epsilon, \mu)) \\
\mathcal{P}\{ \bar{X} \leq \mu-\epsilon \} \leq \exp(-u \, KL(\mu-\epsilon, \mu)) \\
\\
KL(p,q) = p\ln\left(\frac{p}{q}\right) + (1-p)\ln\left(\frac{1-p}{1-q}\right)
\end{eqnarray}

Note that both inequalities are examples of Chernoff bounds.
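A small sketch (names and values are my own) comparing the two bounds numerically; by Pinsker's inequality, $KL(p,q) \geq 2(p-q)^2$, which is why the KL bound is never looser than Hoeffding's:

```python
# Compare the Hoeffding and KL (Chernoff) tail bounds for a Bernoulli mean.
import math

def kl(p, q):
    """Bernoulli KL divergence KL(p, q); assumes 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

mu, u = 0.5, 100
for eps in (0.05, 0.1, 0.2):
    hoeffding = math.exp(-2 * u * eps ** 2)
    kl_bound = math.exp(-u * kl(mu + eps, mu))
    print(f"eps={eps:.2f}  Hoeffding={hoeffding:.5f}  KL={kl_bound:.5f}")
```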


Analysis of UCB

We shall use the above inequalities to prove that the regret of the UCB algorithm grows only logarithmically with the horizon $T$.
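For reference, here is a minimal sketch of the algorithm under analysis. The $\sqrt{2\ln(t)/u_a^t}$ exploration bonus is the standard UCB1 index; treating it as the one intended here is an assumption, since the notes do not spell it out.

```python
# Minimal UCB1 sketch (the sqrt(2 ln t / u_a) bonus is the common form;
# an assumption here, since the notes do not state the exact index).
import math
import random

def ucb1(arm_means, horizon):
    """Play Bernoulli arms with the given means for `horizon` steps."""
    n_arms = len(arm_means)
    pulls = [0] * n_arms      # u_a^t: pulls of each arm so far
    totals = [0.0] * n_arms   # sum of observed rewards per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1  # pull each arm once to initialise
        else:
            # Pick the arm with the highest upper confidence bound.
            a = max(range(n_arms),
                    key=lambda i: totals[i] / pulls[i]
                                  + math.sqrt(2 * math.log(t) / pulls[i]))
        reward = 1.0 if random.random() < arm_means[a] else 0.0
        pulls[a] += 1
        totals[a] += reward
    return pulls

print(ucb1([0.3, 0.5, 0.7], horizon=10_000))  # optimal arm dominates
```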

Notation

  • $\Delta_a = p_M - p_a$ - The gap between the mean of the optimal arm $M$ and that of arm $a$
  • $Z_a^t$ - The event that arm $a$ is pulled at time $t$
  • $z_a^t$ - Indicator for the above event (1 if the event occurs, 0 otherwise)
$E[z_a^t] = \mathcal{P}[Z_a^t]$
  • $u_a^t$ - Number of times arm $a$ has been pulled up to time $t$
$u_a^t = \sum_{i=0}^{t-1} z_a^i$
  • $\bar{u}_a^T$ - A constant used in the proof, a sufficient number of pulls of arm $a$ for horizon $T$ (illustrative values in the sketch after this list)
$\bar{u}_a^T = \left\lceil \frac{8\ln(T)}{(\Delta_a)^2} \right\rceil$
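To get a feel for how $\bar{u}_a^T$ scales, a tiny sketch with made-up gap values:

```python
# Sufficient pull counts ceil(8 ln(T) / Delta^2) for a few illustrative gaps.
import math

T = 10_000
for delta in (0.5, 0.2, 0.1):
    u_bar = math.ceil(8 * math.log(T) / delta ** 2)
    print(f"Delta_a={delta}: u_bar = {u_bar}")
```

The smaller the gap $\Delta_a$, the more pulls are needed to distinguish arm $a$ from the optimal one, but the dependence on $T$ is only logarithmic.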

This proof truly is a mind = blown moment :/


Thompson Sampling