Long Short-Term Memory (LSTM)
At each time step $t$, the LSTM maintains a hidden state $\boldsymbol{h}^{(t)}$ and a cell state $\boldsymbol{c}^{(t)}$:
- Both are vectors of length $n$
- The cell stores long-term information
- The LSTM can erase information from the cell, write new information to it, and read information from it
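Formulas (note: every $\sigma$ denotes the logistic sigmoid function and $\circ$ denotes the element-wise (Hadamard) product; $\sigma$ and $\tanh$ are applied element-wise)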
Forget gate: controls which parts of the previous cell state are kept and which are forgotten
$\boldsymbol{f}^{(t)}=\sigma \left(\boldsymbol{W}_{f} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{f} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{f}\right)$
Input gate: controls which parts of the new cell content are written to the cell
$\boldsymbol{i}^{(t)}=\sigma \left(\boldsymbol{W}_{i} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{i} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{i}\right)$
Output gate: controls which parts of the cell are output to the hidden state
$\boldsymbol{o}^{(t)}=\sigma \left(\boldsymbol{W}_{o} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{o} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{o}\right)$
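The three gates share one form: an affine map of $\boldsymbol{h}^{(t-1)}$ and $\boldsymbol{x}^{(t)}$, squashed element-wise by $\sigma$ into $(0, 1)$. Below is a minimal pure-Python sketch, assuming $\boldsymbol{W}$ is $n \times n$, $\boldsymbol{U}$ is $n \times d$ for a hypothetical input size $d$, and vectors are plain lists. The helper names `sigmoid`, `matvec` and `gate` are illustrative, not from any library.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^-z), mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def gate(W, U, b, h_prev, x):
    """Hypothetical helper: sigma(W h_prev + U x + b), element-wise."""
    Wh = matvec(W, h_prev)   # n x n times length-n hidden state
    Ux = matvec(U, x)        # n x d times length-d input
    return [sigmoid(a + c + bk) for a, c, bk in zip(Wh, Ux, b)]

# Usage: with all-zero parameters every gate entry is sigma(0) = 0.5
zero_gate = gate([[0, 0], [0, 0]], [[0, 0, 0], [0, 0, 0]], [0, 0], [1, 2], [3, 4, 5])
# → [0.5, 0.5]
```

Passing the forget-, input- or output-gate parameters yields $\boldsymbol{f}^{(t)}$, $\boldsymbol{i}^{(t)}$ or $\boldsymbol{o}^{(t)}$ respectively; only the parameters differ.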
New cell content: the candidate content that may be written to the cell
$\tilde{\boldsymbol{c}}^{(t)}=\tanh \left(\boldsymbol{W}_{c} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{c} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{c}\right)$
Cell state: erase (“forget”) some content from the previous cell state, and write (“input”) some new cell content
$\boldsymbol{c}^{(t)}=\boldsymbol{f}^{(t)} \circ \boldsymbol{c}^{(t-1)}+\boldsymbol{i}^{(t)} \circ \tilde{\boldsymbol{c}}^{(t)}$
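A small worked example of the cell update, using hand-picked gate values (illustrative, not the output of a trained network) and a hypothetical helper `update_cell`. Each of the three components shows one behavior: keep the old content ($f=1, i=0$), overwrite it with the new content ($f=0, i=1$), or blend the two ($f=i=0.5$). Values are chosen as exact binary fractions so the arithmetic has no rounding error.

```python
def update_cell(f, i, c_prev, c_tilde):
    """c_t = f ∘ c_prev + i ∘ c_tilde, computed component by component."""
    return [f_k * cp + i_k * ct for f_k, cp, i_k, ct in zip(f, c_prev, i, c_tilde)]

f = [1.0, 0.0, 0.5]          # forget gate: keep, erase, half-keep
i = [0.0, 1.0, 0.5]          # input gate: ignore, write, half-write
c_prev = [0.75, 0.75, 0.75]  # previous cell state
c_tilde = [-0.5, -0.5, -0.5] # new cell content
c = update_cell(f, i, c_prev, c_tilde)
# → [0.75, -0.5, 0.125]
```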
Hidden state: read (“output”) some content from the cell
$\boldsymbol{h}^{(t)}=\boldsymbol{o}^{(t)} \circ \tanh \boldsymbol{c}^{(t)}$
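Putting all six equations together, one LSTM time step can be sketched in pure Python as below. Assumptions: parameters live in a hypothetical dict keyed by `'f'`, `'i'`, `'o'`, `'c'`, each holding a `(W, U, b)` triple with $\boldsymbol{W}$ acting on $\boldsymbol{h}^{(t-1)}$ and $\boldsymbol{U}$ on $\boldsymbol{x}^{(t)}$ as in the formulas; the random initialization in the hypothetical `random_params` is arbitrary and only there to make the example run.

```python
import math
import random

def sigmoid(z):
    """Logistic sigmoid, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    """Matrix (list of rows) times vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def affine(W, U, b, h_prev, x):
    """Compute W h_prev + U x + b as a list."""
    return [a + u + bk for a, u, bk in zip(matvec(W, h_prev), matvec(U, x), b)]

def lstm_step(params, h_prev, c_prev, x):
    """One LSTM time step; returns (h_t, c_t)."""
    f = [sigmoid(z) for z in affine(*params["f"], h_prev, x)]          # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], h_prev, x)]          # input gate
    o = [sigmoid(z) for z in affine(*params["o"], h_prev, x)]          # output gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], h_prev, x)]  # new cell content
    # Cell state: forget part of the old content, write part of the new content
    c = [fk * cp + ik * ct for fk, cp, ik, ct in zip(f, c_prev, i, c_tilde)]
    # Hidden state: read part of the tanh-squashed cell state
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def random_params(n, d, seed=0):
    """Hypothetical helper: arbitrary small random weights, zero biases."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(n, n), mat(n, d), [0.0] * n) for g in "fioc"}

# Usage: run an LSTM with hidden size n=4 over three one-hot inputs of size d=3
n, d = 4, 3
params = random_params(n, d)
h, c = [0.0] * n, [0.0] * n
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h, c = lstm_step(params, h, c, x)
```

Because $\boldsymbol{o}^{(t)} \in (0, 1)^n$ and $\tanh$ maps into $(-1, 1)$, every component of $\boldsymbol{h}^{(t)}$ stays strictly between $-1$ and $1$, while $\boldsymbol{c}^{(t)}$ has no such bound and can accumulate over many steps. Deep-learning libraries typically fuse the four affine maps into one larger matrix multiply for speed; this sketch keeps them separate to mirror the formulas.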
LSTM diagram. Source: 苏剑林