Tuesday, September 1, 2020

Gated Recurrent Unit

  • Proposed by Cho et al. in 2014 as a simpler alternative to the LSTM
  •  No cell states, on each time step t, we have input $x^{(t)}$ and hidden states $h^{(t)}$
Update Gate: controls what parts of hidden state are updated vs preserved

$\boldsymbol{u}^{(t)}=\sigma\left(\boldsymbol{W}_{u} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{u} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{u}\right)$

Reset Gate: controls what parts of previous hidden state are used to compute new content

$\boldsymbol{r}^{(t)}=\sigma\left(\boldsymbol{W}_{r} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{r} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{r}\right)$

New hidden states content: reset gate selects useful parts of previous hidden state. Use this and current input to compute new hidden content

$\tilde{\boldsymbol{h}}^{(t)}=\tanh \left(\boldsymbol{W}_{h}\left(\boldsymbol{r}^{(t)} \circ \boldsymbol{h}^{(t-1)}\right)+\boldsymbol{U}_{h} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{h}\right)$

Hidden state: update gate simultaneously controls what is kept from previous hidden state, and what is updated to new hidden state content

$\boldsymbol{h}^{(t)}=\left(1-\boldsymbol{u}^{(t)}\right) \circ \boldsymbol{h}^{(t-1)}+\boldsymbol{u}^{(t)} \circ \tilde{\boldsymbol{h}}^{(t)}$