Hui Lin @Google Ming Li @Amazon
| Task | Input x | Output y |
|---|---|---|
| Speech recognition | (audio waveform) | Get your facts first, then you can distort them as you please. |
| Music generation | \(\emptyset\) | (generated music) |
| Sentiment classification | Great movie? Are you kidding me! Not worth the money. | (star rating) |
| DNA sequence analysis | ACGGGGCCTACTGTCAACTG | AC GGGGCCTACTG TCAACTG |
| Machine translation | 网红脸 | Internet celebrity face |
| Video activity recognition | (video frames) | Running |
| Named entity recognition | Use Netlify and Hugo. | Use **Netlify** and **Hugo**. |
x: Use(\(x^{<1>}\)) Netlify(\(x^{<2>}\)) and(\(x^{<3>}\)) Hugo(\(x^{<4>}\)) .(\(x^{<5>}\))
y: 0 (\(y^{<1>}\)) 1(\(y^{<2>}\)) 0(\(y^{<3>}\)) 1(\(y^{<4>}\)) 0(\(y^{<5>}\))
\(x^{(i)<t>}\): the \(t^{th}\) element of the \(i^{th}\) training sample; \(T_x^{(i)}\): the input sequence length of the \(i^{th}\) sample
\(y^{(i)<t>}\): the \(t^{th}\) element of the \(i^{th}\) output sequence; \(T_y^{(i)}\): the output sequence length of the \(i^{th}\) sample
\(\left[\begin{array}{c} a[1]\\ aaron[2]\\ \vdots\\ and[360]\\ \vdots\\ Hugo[4075]\\ \vdots\\ Netlify[5210]\\ \vdots\\ use[8320]\\ \vdots\\ Zulu[10000] \end{array}\right]\Longrightarrow use=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 1\\ \vdots\\ 0 \end{array}\right], Netlify=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ \vdots\\ 0 \end{array}\right], and=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0 \end{array}\right], Hugo=\left[\begin{array}{c} 0\\ 0\\ \vdots\\ 0\\ \vdots\\ 1\\ \vdots\\ 0\\ \vdots\\ 0\\ \vdots\\ 0 \end{array}\right]\)
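The one-hot encoding above can be sketched in a few lines of NumPy; the vocabulary size and the 1-based word positions (e.g. "use" at 8320) are taken from the vector shown, while the `word_index` dictionary itself is a made-up stand-in for a full 10,000-word vocabulary.

```python
import numpy as np

vocab_size = 10000
# Hypothetical index map covering only the words in the example sentence.
word_index = {"and": 360, "hugo": 4075, "netlify": 5210, "use": 8320}

def one_hot(word):
    """Return a one-hot vector for `word`: all zeros except a 1 at its position."""
    v = np.zeros(vocab_size)
    v[word_index[word] - 1] = 1  # convert the 1-based position to a 0-based index
    return v

x1 = one_hot("use")
print(x1.sum(), x1[8319])  # exactly one nonzero entry, at position 8320
```

Each word in the input sentence then becomes one such sparse 10,000-dimensional vector, which is what the RNN consumes as \(x^{<t>}\).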
\(a^{<0>}= \mathbf{0}\); \(a^{<1>} = g(W_{aa}a^{<0>} + W_{ax}x^{<1>} + b_a)\)
\(\hat{y}^{<1>} = g'(W_{ya}a^{<1>} + b_y)\)
\(a^{<t>} = g(W_{aa}a^{<t-1>} + W_{ax}x^{<t>} + b_a)\)
\(\hat{y}^{<t>} = g'(W_{ya}a^{<t>} + b_y)\)
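One forward step of these equations can be sketched as follows, assuming \(g = \tanh\) for the hidden state and \(g' = \text{sigmoid}\) for the binary output (as in the named-entity example); the layer sizes are illustrative, not prescribed by the text.

```python
import numpy as np

n_a, n_x = 16, 10000           # hidden size (assumed), one-hot input size
rng = np.random.default_rng(0)
W_aa = rng.normal(scale=0.01, size=(n_a, n_a))
W_ax = rng.normal(scale=0.01, size=(n_a, n_x))
W_ya = rng.normal(scale=0.01, size=(1, n_a))
b_a, b_y = np.zeros(n_a), np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def rnn_step(a_prev, x_t):
    """a<t> = g(W_aa a<t-1> + W_ax x<t> + b_a);  y_hat<t> = g'(W_ya a<t> + b_y)."""
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)
    y_hat_t = sigmoid(W_ya @ a_t + b_y)
    return a_t, y_hat_t

a = np.zeros(n_a)              # a<0> = 0 (the zero vector)
x_t = np.zeros(n_x)
x_t[8319] = 1                  # one-hot "use" from the example above
a, y_hat = rnn_step(a, x_t)
```

Note that the same weight matrices \(W_{aa}\), \(W_{ax}\), \(W_{ya}\) are reused at every time step; only \(a^{<t-1>}\) and \(x^{<t>}\) change as the sequence is processed.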
\(L^{<t>}(\hat{y}^{<t>}, y^{<t>}) = -y^{<t>}\log(\hat{y}^{<t>}) - (1-y^{<t>})\log(1-\hat{y}^{<t>})\)
\(L(\hat{y}, y) = \sum_{t=1}^{T_y}L^{<t>} (\hat{y}^{<t>}, y^{<t>})\)
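The sequence loss is just the per-step cross-entropy summed over \(t = 1, \ldots, T_y\). A small numerical check, using made-up predictions for the "Use Netlify and Hugo ." labels:

```python
import numpy as np

def step_loss(y_hat_t, y_t):
    """Per-time-step cross-entropy: -y log(y_hat) - (1-y) log(1-y_hat)."""
    return -(y_t * np.log(y_hat_t) + (1 - y_t) * np.log(1 - y_hat_t))

y_hat = np.array([0.1, 0.8, 0.2, 0.7, 0.1])  # illustrative predictions
y     = np.array([0,   1,   0,   1,   0  ])  # labels from the NER example
total_loss = sum(step_loss(yh, yt) for yh, yt in zip(y_hat, y))
print(total_loss)
```

Confident, correct predictions (e.g. \(\hat{y}^{<t>} = 0.1\) when \(y^{<t>} = 0\)) contribute little to the sum, while a confident wrong prediction would dominate it.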
\(e_{man} - e_{woman} \approx e_{king} - e_{?}\)
\(\rightarrow \underset{w}{\operatorname{argmax}} \{ \operatorname{sim}(e_{w}, e_{king} - e_{man} + e_{woman})\}\)
\(sim(e_w, e_{king}-e_{man}+e_{woman})\) = ?
Cosine similarity: \(sim(a,b) = \frac{a^{T}b}{ ||a||_{2} ||b||_{2}}\)
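The analogy search with cosine similarity can be sketched as below; the 4-dimensional embeddings are made up purely for illustration (real word vectors have hundreds of dimensions and are learned from data).

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity: a.b / (||a||_2 ||b||_2)."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy embeddings, chosen so the gender direction lives in the first coordinate.
emb = {
    "man":   np.array([ 1.0, 0.0,  0.2, 0.1]),
    "woman": np.array([-1.0, 0.0,  0.2, 0.1]),
    "king":  np.array([ 1.0, 0.9,  0.1, 0.0]),
    "queen": np.array([-1.0, 0.9,  0.1, 0.0]),
    "apple": np.array([ 0.0, 0.0, -0.9, 0.8]),
}

# argmax over w of sim(e_w, e_king - e_man + e_woman),
# excluding the three query words themselves.
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cos_sim(emb[w], target))
print(best)  # → queen
```

Because cosine similarity normalizes by vector length, it compares direction rather than magnitude, which is why it is the usual choice for word-vector analogies.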
Mikolov et al., 2013, Linguistic regularities in continuous space word representations