Recurrent Neural Networks

Hui Lin @Google

Ming Li @Amazon

Types of Neural Networks

Why sequence models?

Speech Recognition \(\longrightarrow\) Get your facts first, then you can distort them as you please.
Music generation \(\emptyset\) \(\longrightarrow\)
Sentiment classification Great movie? Are you kidding me! Not worth the money. \(\longrightarrow\)
DNA sequence analysis ACGGGGCCTACTGTCAACTG \(\longrightarrow\) AC GGGGCCTACTG TCAACTG
Machine translation 网红脸 \(\longrightarrow\) Internet celebrity face
Video activity recognition \(\longrightarrow\) Running
Named entity recognition Use Netlify and Hugo. \(\longrightarrow\) Use [Netlify] and [Hugo].

RNN types

Notation

Representing words

\(\left[\begin{array}{c} a[1]\\ aaron[2]\\ \vdots\\ and[360]\\ \vdots\\ Hugo[4075]\\ \vdots\\ Netlify[5210]\\ \vdots\\ use[8320]\\ \vdots\\ Zulu[10000] \end{array}\right]\Longrightarrow use=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 1\\ \vdots\\ 0 \end{array}\right], Netlify=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ \vdots\\ 0 \end{array}\right], and=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0 \end{array}\right], Hugo=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0 \end{array}\right]\)
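The one-hot scheme above can be sketched in a few lines. This is a minimal illustration, not the course's code: the six-word vocabulary is a hypothetical stand-in for the 10,000-word dictionary, words are lowercased before lookup (an assumption), and indices are 0-based in Python while the slide counts from 1.

```python
import numpy as np

# Hypothetical toy vocabulary standing in for the slide's 10,000-word dictionary.
vocab = ["a", "aaron", "and", "hugo", "netlify", "use"]
word_to_index = {w: i for i, w in enumerate(vocab)}  # 0-based, unlike the slide's [1]..[10000]

def one_hot(word, word_to_index):
    """Return a column vector with a single 1 at the word's dictionary position."""
    vec = np.zeros((len(word_to_index), 1))
    vec[word_to_index[word.lower()]] = 1.0  # lowercasing is an assumption of this sketch
    return vec

sentence = "Use Netlify and Hugo".split()
# Stack x^<1>, ..., x^<T_x> side by side: one column per word.
X = np.hstack([one_hot(w, word_to_index) for w in sentence])
print(X.shape)  # (6, 4)
```

Each column contains exactly one 1, so the input to the network at step \(t\) is simply the \(t\)-th column of `X`.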

What is an RNN?

Forward Propagation

\(a^{<0>} = \mathbf{0}\); \(a^{<1>} = g(W_{aa}a^{<0>} + W_{ax}x^{<1>} + b_a)\)

\(\hat{y}^{<1>} = g'(W_{ya}a^{<1>} + b_y)\)

\(a^{<t>} = g(W_{aa}a^{<t-1>} + W_{ax}x^{<t>} + b_a)\)

\(\hat{y}^{<t>} = g'(W_{ya}a^{<t>} + b_y)\)
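The recurrence above can be unrolled in a loop. This is a minimal NumPy sketch under stated assumptions: the slides leave \(g\) and \(g'\) open, so here \(g = \tanh\) and \(g' = \) sigmoid (a binary output, matching the cross-entropy loss that follows); the layer sizes and random weights are arbitrary placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, W_aa, W_ax, W_ya, b_a, b_y):
    """Unroll a single-layer RNN over xs = [x^<1>, ..., x^<T>].

    Assumes g = tanh for the hidden state and g' = sigmoid for the output.
    """
    n_a = W_aa.shape[0]
    a = np.zeros((n_a, 1))  # a^<0> = 0
    activations, predictions = [], []
    for x in xs:
        a = np.tanh(W_aa @ a + W_ax @ x + b_a)  # a^<t>
        y_hat = sigmoid(W_ya @ a + b_y)         # y_hat^<t>
        activations.append(a)
        predictions.append(y_hat)
    return activations, predictions

# Placeholder sizes and weights, chosen only so the sketch runs.
rng = np.random.default_rng(0)
n_x, n_a, n_y, T = 6, 5, 1, 4
W_aa = rng.standard_normal((n_a, n_a)) * 0.1
W_ax = rng.standard_normal((n_a, n_x)) * 0.1
W_ya = rng.standard_normal((n_y, n_a)) * 0.1
b_a, b_y = np.zeros((n_a, 1)), np.zeros((n_y, 1))
xs = [np.eye(n_x)[:, [t]] for t in range(T)]  # toy one-hot inputs
a_list, y_list = rnn_forward(xs, W_aa, W_ax, W_ya, b_a, b_y)
print(len(y_list), y_list[0].shape)  # 4 (1, 1)
```

The same \(W_{aa}\), \(W_{ax}\), \(W_{ya}\), \(b_a\), \(b_y\) are reused at every time step; that parameter sharing is what lets one network handle sequences of any length.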

\(L^{<t>}(\hat{y}^{<t>}, y^{<t>}) = -y^{<t>}\log(\hat{y}^{<t>}) - (1-y^{<t>})\log(1-\hat{y}^{<t>})\)

\(L(\hat{y}, y) = \sum_{t=1}^{T_y} L^{<t>}(\hat{y}^{<t>}, y^{<t>})\)
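The per-step loss and its sum over the sequence can be computed directly. In this sketch the labels and predictions are hypothetical values for "Use Netlify and Hugo" (1 marks a named entity), and the clipping constant `eps` is an added safeguard against \(\log(0)\), not part of the formula.

```python
import numpy as np

def step_loss(y_hat, y, eps=1e-12):
    """Binary cross-entropy L^<t>(y_hat^<t>, y^<t>) for one time step."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # keep log() finite
    return float(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))

def sequence_loss(y_hats, ys):
    """Total loss L = sum over t = 1..T_y of L^<t>."""
    return sum(step_loss(yh, y) for yh, y in zip(y_hats, ys))

# Hypothetical labels and predictions for "Use Netlify and Hugo".
ys = [0, 1, 0, 1]
y_hats = [0.1, 0.8, 0.2, 0.9]
print(round(sequence_loss(y_hats, ys), 4))  # 0.657
```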

Backpropagation through time

Deep RNN

Vanishing gradients with RNNs

LSTM

Word representation

\[\begin{array}{cccccc} Man & Woman & King & Queen & Apple & Pumpkin\\ (5391) & (9853) & (4914) & (7157) & (456) & (6332)\\ \left[\begin{array}{c} 0\\ 0\\ 0\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ 0 \end{array}\right] & \left[\begin{array}{c} 0\\ 0\\ 0\\ 0\\ 0\\ \vdots\\ 1\\ \vdots\\ 0 \end{array}\right] & \left[\begin{array}{c} 0\\ 0\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ 0\\ 0 \end{array}\right] & \left[\begin{array}{c} 0\\ 0\\ 0\\ 0\\ 0\\ \vdots\\ 1\\ \vdots\\ 0 \end{array}\right] & \left[\begin{array}{c} 0\\ \vdots\\ 1\\ \vdots\\ 0\\ 0\\ 0\\ 0\\ 0 \end{array}\right] & \left[\begin{array}{c} 0\\ 0\\ 0\\ 0\\ 0\\ \vdots\\ 1\\ \vdots\\ 0 \end{array}\right] \end{array}\]
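A limitation of one-hot vectors follows from the picture above: any two distinct one-hot vectors have inner product 0 and Euclidean distance \(\sqrt{2}\), so "Man" is no closer to "Woman" than to "Apple". The sketch below checks this using the dictionary positions from the slide, assuming they are 1-based like the earlier dictionary.

```python
import numpy as np

VOCAB_SIZE = 10_000
# Dictionary positions from the slide (assumed 1-based, as in the earlier dictionary).
positions = {"Man": 5391, "Woman": 9853, "King": 4914,
             "Queen": 7157, "Apple": 456, "Pumpkin": 6332}

def one_hot(pos, size=VOCAB_SIZE):
    v = np.zeros(size)
    v[pos - 1] = 1.0  # slide index i -> array index i - 1
    return v

o = {w: one_hot(p) for w, p in positions.items()}
print(o["King"] @ o["Queen"], o["Apple"] @ o["Pumpkin"])  # 0.0 0.0
# Both distances equal sqrt(2): one-hot vectors carry no notion of similarity.
print(np.linalg.norm(o["Man"] - o["Woman"]), np.linalg.norm(o["Man"] - o["Apple"]))
```

Because every pair of words looks equally unrelated, a model cannot transfer what it learns about "Man/Woman" to "King/Queen"; this is the motivation for a featurized representation.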

Featurized representation: word embedding

Analogies [1]

\(e_{man} - e_{woman} \approx e_{king} - e_{?}\)

\(\rightarrow \underset{w}{\arg\max} \{sim(e_{w}, e_{king} - e_{man} + e_{woman})\}\)

Cosine similarity

\(sim(e_w, e_{king}-e_{man}+e_{woman})\) = ?

Cosine similarity: \(sim(a,b) = \frac{a^{T}b}{ ||a||_{2} ||b||_{2}}\)
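Cosine similarity and the analogy search \(\arg\max_w sim(e_w, e_{king} - e_{man} + e_{woman})\) can be combined in a short sketch. The 4-dimensional embeddings are hypothetical hand-picked values (feature order: gender, royal, age, food), not trained vectors, and excluding the three query words from the candidates is an assumption commonly made in practice.

```python
import numpy as np

def cosine_similarity(a, b):
    """sim(a, b) = a^T b / (||a||_2 ||b||_2)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-d embeddings; feature order: gender, royal, age, food.
E = {
    "man":     np.array([-1.00, 0.01, 0.03, 0.09]),
    "woman":   np.array([ 1.00, 0.02, 0.02, 0.01]),
    "king":    np.array([-0.95, 0.93, 0.70, 0.02]),
    "queen":   np.array([ 0.97, 0.95, 0.69, 0.01]),
    "apple":   np.array([ 0.00, -0.01, 0.03, 0.95]),
    "pumpkin": np.array([ 0.01, 0.00, -0.02, 0.97]),
}

def complete_analogy(a, b, c, E):
    """Solve a : b :: c : ? via argmax_w sim(e_w, e_c - e_a + e_b)."""
    target = E[c] - E[a] + E[b]
    candidates = (w for w in E if w not in {a, b, c})  # skip the query words
    return max(candidates, key=lambda w: cosine_similarity(E[w], target))

print(complete_analogy("man", "woman", "king", E))  # queen
```

Cosine similarity ignores vector length and compares only direction, which is why it is preferred here over raw inner products.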

Embedding matrix
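An embedding matrix \(E\) stores one column per vocabulary word, so multiplying \(E\) by a one-hot vector \(o_j\) returns column \(j\), the embedding \(e_j\). The sketch below assumes 300 features and fills \(E\) with random numbers as a stand-in for learned values; the word index reuses "Man" (5391 on the slide, 5390 when 0-based).

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, vocab_size = 300, 10_000                # assumed sizes
E = rng.standard_normal((n_features, vocab_size))   # stand-in for a learned embedding matrix

j = 5390                                            # "Man" (slide index 5391, 0-based here)
o_j = np.zeros((vocab_size, 1))
o_j[j] = 1.0

e_j = E @ o_j                                       # the product selects column j of E
assert np.allclose(e_j[:, 0], E[:, j])              # ...identical to a direct column lookup
print(e_j.shape)  # (300, 1)
```

Since almost every term of the product is multiplied by zero, implementations typically index the column directly instead of performing the full matrix multiplication.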

Data Preprocessing

Some Papers


  1. Mikolov et al., 2013, Linguistic regularities in continuous space word representations