Friday, December 27, 2019

LSTM (2)

Long Short-Term Memory (LSTM)

On step t, there is a hidden state $h^{(t)}$ and a cell state $c^{(t)}$:
  • Both are vectors of length n 
  • The cell stores long-term information 
  • The LSTM can erase, write or read information from the cell 
Formulas(Note, all $\sigma $ represent sigmoid function)
Forget gate: controls what is kept vs forgotten, from previous cell state

$\boldsymbol{f}^{(t)}=\sigma \left(\boldsymbol{W}_{f} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{f} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{f}\right)$

Input gate: controls what parts of the new cell content are written to cell

$\boldsymbol{i}^{(t)}=\sigma \left(\boldsymbol{W}_{i} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{i} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{i}\right)$

Output gate: controls what parts of cell are output to hidden state

$\boldsymbol{o}^{(t)}=\sigma \left(\boldsymbol{W}_{o} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{o} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{o}\right)$

New cell content: this is the new content to be written to the cell

$\tilde{\boldsymbol{c}}^{(t)}=\tanh \left(\boldsymbol{W}_{c} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{c} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{c}\right)$

Cell state: erase (“forget”) some content from last cell state, and write (“input”) some new cell content

$\boldsymbol{c}^{(t)}=\boldsymbol{f}^{(t)} \circ \boldsymbol{c}^{(t-1)}+\boldsymbol{i}^{(t)} \circ \tilde{\boldsymbol{c}}^{(t)}$

Hidden state: read (“output”) some content from the cell

$\boldsymbol{h}^{(t)}=\boldsymbol{o}^{(t)} \circ \tanh \boldsymbol{c}^{(t)}$

LSTM Diagram. Source: 苏剑林