- Proposed by Cho et al. in 2014 as a simpler alternative to the LSTM
- No cell state: at each time step $t$ there is only an input $\boldsymbol{x}^{(t)}$ and a hidden state $\boldsymbol{h}^{(t)}$

**Update Gate:** controls which parts of the hidden state are updated and which are preserved

$\boldsymbol{u}^{(t)}=\sigma\left(\boldsymbol{W}_{u} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{u} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{u}\right)$

**Reset Gate:** controls which parts of the previous hidden state are used to compute new content

$\boldsymbol{r}^{(t)}=\sigma\left(\boldsymbol{W}_{r} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{r} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{r}\right)$
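As a rough illustration of the two gate equations above, here is a minimal NumPy sketch. The helper names `sigmoid` and `gate`, the dimensions, and the random weights are assumptions made for this example only. It shows that both gates share the same functional form and differ only in their parameters.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function; maps every entry into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def gate(W, U, b, h_prev, x):
    """Compute sigma(W h^(t-1) + U x^(t) + b), the form shared by both gates."""
    return sigmoid(W @ h_prev + U @ x + b)

# Hypothetical sizes: hidden dimension 4, input dimension 3.
rng = np.random.default_rng(0)
hidden_dim, input_dim = 4, 3
h_prev = rng.standard_normal(hidden_dim)   # h^(t-1)
x = rng.standard_normal(input_dim)         # x^(t)

# Each gate has its own parameters (randomly chosen here, not trained).
W_u = rng.standard_normal((hidden_dim, hidden_dim))
U_u = rng.standard_normal((hidden_dim, input_dim))
b_u = np.zeros(hidden_dim)
W_r = rng.standard_normal((hidden_dim, hidden_dim))
U_r = rng.standard_normal((hidden_dim, input_dim))
b_r = np.zeros(hidden_dim)

u = gate(W_u, U_u, b_u, h_prev, x)  # update gate u^(t)
r = gate(W_r, U_r, b_r, h_prev, x)  # reset gate r^(t)
print("update gate:", u)
print("reset gate: ", r)
```

Because of the sigmoid, every entry of `u` and `r` lies strictly between 0 and 1, so each gate acts as a soft, per-dimension switch.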

**New hidden state content**:

The **reset gate** selects the useful parts of the previous hidden state. These, together with the current input, are used to compute the new hidden content ($\circ$ denotes the element-wise product)

$\tilde{\boldsymbol{h}}^{(t)}=\tanh \left(\boldsymbol{W}_{h}\left(\boldsymbol{r}^{(t)} \circ \boldsymbol{h}^{(t-1)}\right)+\boldsymbol{U}_{h} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{h}\right)$
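The effect of the reset gate in the equation above can be sketched numerically. In the hedged example below, the function name `candidate_state`, the sizes, and the random parameters are all illustrative assumptions. It checks that when the reset gate is fully closed ($\boldsymbol{r}^{(t)}=\boldsymbol{0}$), the candidate depends only on the current input.

```python
import numpy as np

def candidate_state(W_h, U_h, b_h, r, h_prev, x):
    """Compute tanh(W_h (r * h_prev) + U_h x + b_h); `*` is element-wise."""
    return np.tanh(W_h @ (r * h_prev) + U_h @ x + b_h)

# Hypothetical sizes and untrained random parameters.
rng = np.random.default_rng(1)
hidden_dim, input_dim = 4, 3
W_h = rng.standard_normal((hidden_dim, hidden_dim))
U_h = rng.standard_normal((hidden_dim, input_dim))
b_h = np.zeros(hidden_dim)
x = rng.standard_normal(input_dim)

# Two different previous hidden states.
h_a = rng.standard_normal(hidden_dim)
h_b = rng.standard_normal(hidden_dim)

# Reset gate fully closed: the previous hidden state is masked out entirely.
r_closed = np.zeros(hidden_dim)
cand_a = candidate_state(W_h, U_h, b_h, r_closed, h_a, x)
cand_b = candidate_state(W_h, U_h, b_h, r_closed, h_b, x)

# Both candidates reduce to tanh(U_h x + b_h), regardless of h_a or h_b.
input_only = np.tanh(U_h @ x + b_h)
print(np.allclose(cand_a, cand_b), np.allclose(cand_a, input_only))
```

In a trained network the reset gate is rarely exactly 0 or 1; intermediate values scale each dimension of the previous hidden state before it enters the candidate.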

**Hidden state**:

The **update gate** simultaneously controls what is kept from the previous hidden state and what is overwritten with the new hidden state content

$\boldsymbol{h}^{(t)}=\left(1-\boldsymbol{u}^{(t)}\right) \circ \boldsymbol{h}^{(t-1)}+\boldsymbol{u}^{(t)} \circ \tilde{\boldsymbol{h}}^{(t)}$
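Putting the four equations together, one GRU time step might be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not a reference implementation: the names `gru_step` and `init_params`, the parameter dictionary layout, the initialization scale, and the sizes are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(params, h_prev, x):
    """One GRU time step, following the four equations in this section."""
    u = sigmoid(params["W_u"] @ h_prev + params["U_u"] @ x + params["b_u"])  # update gate
    r = sigmoid(params["W_r"] @ h_prev + params["U_r"] @ x + params["b_r"])  # reset gate
    # Candidate content: the reset gate masks the previous hidden state.
    h_tilde = np.tanh(params["W_h"] @ (r * h_prev) + params["U_h"] @ x + params["b_h"])
    # Interpolate between the old state and the candidate.
    return (1.0 - u) * h_prev + u * h_tilde

def init_params(hidden_dim, input_dim, rng):
    """Hypothetical initializer: small random weights, zero biases."""
    params = {}
    for name in ("u", "r", "h"):
        params[f"W_{name}"] = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
        params[f"U_{name}"] = 0.1 * rng.standard_normal((hidden_dim, input_dim))
        params[f"b_{name}"] = np.zeros(hidden_dim)
    return params

rng = np.random.default_rng(0)
hidden_dim, input_dim, seq_len = 4, 3, 5
params = init_params(hidden_dim, input_dim, rng)

# Run the cell over a short random sequence, starting from h^(0) = 0.
h = np.zeros(hidden_dim)
for x in rng.standard_normal((seq_len, input_dim)):
    h = gru_step(params, h, x)
print(h.shape)  # (4,)
```

Since $\boldsymbol{h}^{(t)}$ is an element-wise convex combination of $\boldsymbol{h}^{(t-1)}$ and $\tilde{\boldsymbol{h}}^{(t)}$, a dimension whose update gate is close to 0 simply copies its previous value forward, which is how the cell can carry information across many time steps.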