# Reinforcement Learning

Viewed from the agent's point of view, the MDP's transition probabilities and the rewards associated with each action are not guaranteed to be known.

In the planning problem, the entire MDP (S, A, T, R, $\gamma$) is provided to the agent. In a learning setting, the agent knows only (S, A, $\gamma$), and sometimes R; it must infer T (and R, when unknown) from experience.
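When the full MDP is given, planning methods such as value iteration can compute an optimal value function directly from T and R. A minimal sketch on a made-up two-state, two-action MDP (the numbers are illustrative, not from the text):

```python
import numpy as np

# A toy MDP with full (S, A, T, R, gamma) known to the agent.
# T[s, a, s'] is the transition probability, R[s, a] the expected reward.
# (Illustrative numbers; not from the text.)
n_states, n_actions, gamma = 2, 2, 0.9
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Value iteration: repeatedly apply the Bellman optimality backup.
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * T @ V          # Q[s, a] = R[s, a] + gamma * sum_s' T[s, a, s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
```

The point of contrast with the learning setting: every quantity used in the backup (`T`, `R`) is handed to the planner up front.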

Let a t-length history be defined as follows:

\[h^t = (s^0, a^0, r^0, \ldots, s^t)\]

## Learning Algorithm
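Concretely, a history is a flat sequence alternating states, actions, and rewards, ending with the current state. A minimal sketch, where `step` is a made-up stand-in for the true (unknown) environment dynamics:

```python
import random

random.seed(0)

def step(s, a):
    # Hypothetical dynamics: next state and reward drawn at random.
    return random.choice([0, 1]), random.random()

# Build a t-length history h^t = (s^0, a^0, r^0, ..., s^t) for t = 3.
history, s = [], 0
for _ in range(3):
    a = random.choice([0, 1])      # behaviour policy: uniform random
    s_next, r = step(s, a)
    history.extend([s, a, r])
    s = s_next
history.append(s)                  # the history ends with the final state s^t
```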

A *learning algorithm* L is a mapping from the set of all histories to the set of all probability distributions over actions. We would like to construct L so that the expected cumulative discounted reward is maximized.

The above problem is also known as the **Control Problem**.