Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are used in situations where the order of the data is as important as (if not more important than) the data itself. These networks maintain a hidden “state” which stores information about everything seen so far.
An RNN is a looped network whose output at each step is the “hidden state”. Every word in a sequence of $n$ words is fed in one at a time, and at each iteration the current hidden state is combined with the next word in the sequence to form the input for that step.
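To make the recurrence concrete, here is a minimal sketch of a vanilla RNN forward pass in NumPy. The dimensions, random “word embeddings”, and the specific update $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b)$ are illustrative assumptions, not taken from the article; they are one common formulation of the idea described above.

```python
# A minimal sketch of a vanilla RNN forward pass (NumPy, toy sizes).
import numpy as np

rng = np.random.default_rng(0)

embed_dim, hidden_dim, seq_len = 8, 16, 5                     # assumed toy dimensions
W_xh = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))    # input -> hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))   # hidden -> hidden weights
b_h = np.zeros(hidden_dim)

words = rng.normal(size=(seq_len, embed_dim))  # stand-in word embeddings
h = np.zeros(hidden_dim)                       # initial hidden state

for x_t in words:  # one word per iteration
    # The previous hidden state and the current word jointly form the input
    # to the update; the new hidden state summarises everything seen so far.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h.shape)  # (16,) -- the final hidden state after the whole sequence
```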
In other words, the hidden state gives the network memory. This is particularly useful in NLP, as the meaning of a word often depends heavily on the words that came before it.
Drawbacks of RNNs
Consider the sentence “I live in Germany. I speak __.” It is a very reasonable guess that the missing word is “German”, and RNNs have indeed been shown to predict it with a high degree of accuracy. Now consider the sentence “I live in Germany. I am quite fond of watching movies and playing basketball. I am fluent in __.”
In this case, the word “Germany” is far away from the blank, so it is likely that the RNN has already discarded this information by the time it reaches the blank. Moreover, because the gradient has to flow back through many iterations, the vanishing-gradient problem arises, making it infeasible for the network to learn that “Germany” is the important clue. This inability to capture long-range dependencies is a major drawback of the technique.
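A rough way to see the vanishing-gradient effect numerically: the gradient flowing from the last hidden state back to the first is a product of one Jacobian per step, and for a tanh RNN with moderately sized weights that product shrinks towards zero as the sequence grows. The sketch below is a hand-rolled NumPy illustration with made-up sizes and no inputs, not a formal argument.

```python
# A rough numerical illustration of the vanishing-gradient problem (NumPy, toy sizes).
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, seq_len = 16, 50

W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
h = rng.normal(size=hidden_dim)

# For a tanh RNN, the Jacobian of h_T with respect to h_1 is a product of
# per-step Jacobians:  dh_t/dh_{t-1} = diag(1 - h_t^2) @ W_hh
jac = np.eye(hidden_dim)
for _ in range(seq_len):
    h = np.tanh(W_hh @ h)
    jac = np.diag(1.0 - h**2) @ W_hh @ jac

# The norm of the accumulated Jacobian is typically vanishingly small,
# so the signal from 50 steps back barely influences the update.
print(np.linalg.norm(jac))
```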
Another drawback lies in the very nature of how the input is processed. The words are fed in sequentially, so an RNN has no access to the words that come AFTER the current word in the sentence. This can cause issues, as the meaning of a word depends on the words on both of its sides.
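A small self-contained check of this property (again with made-up embeddings and weights, as a sketch rather than a full model): in a forward-only RNN, changing a later word has no effect on the hidden states computed at earlier positions.

```python
# Demonstrate that a forward-only RNN's hidden state at step t
# cannot depend on words after position t (NumPy, toy sizes).
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 8, 16
W_xh = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

def hidden_states(words):
    """Return the hidden state after each word of the sequence."""
    h = np.zeros(hidden_dim)
    states = []
    for x_t in words:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        states.append(h)
    return states

sentence_a = rng.normal(size=(4, embed_dim))  # four stand-in word vectors
sentence_b = sentence_a.copy()
sentence_b[-1] = rng.normal(size=embed_dim)   # change only the LAST word

states_a = hidden_states(sentence_a)
states_b = hidden_states(sentence_b)

# The states for the first three words are identical: the last word had no effect.
print(all(np.allclose(a, b) for a, b in zip(states_a[:-1], states_b[:-1])))  # True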
Finally, training RNNs is slow precisely because the words are processed sequentially. GPUs excel at parallel processing, but this cannot be exploited when each word must wait for the hidden state produced by the previous one.