## Long Short-Term Memory (LSTM)

At step $t$, the LSTM maintains a hidden state $h^{(t)}$ and a cell state $c^{(t)}$:

- Both are vectors of length $n$
- The cell stores **long-term information**
- The LSTM can erase, write, or read information from the cell

Formulas (note: every $\sigma$ denotes the sigmoid function, and $\circ$ denotes the element-wise product):

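**Forget gate**: controls what is kept vs. forgotten from the previous cell state

$\boldsymbol{f}^{(t)}=\sigma \left(\boldsymbol{W}_{f} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{f} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{f}\right)$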

**Input gate**: controls what parts of the new cell content are written to the cell

$\boldsymbol{i}^{(t)}=\sigma \left(\boldsymbol{W}_{i} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{i} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{i}\right)$

**Output gate**: controls what parts of the cell are output to the hidden state

$\boldsymbol{o}^{(t)}=\sigma \left(\boldsymbol{W}_{o} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{o} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{o}\right)$

**New cell content**: this is the new content to be written to the cell

$\tilde{\boldsymbol{c}}^{(t)}=\tanh \left(\boldsymbol{W}_{c} \boldsymbol{h}^{(t-1)}+\boldsymbol{U}_{c} \boldsymbol{x}^{(t)}+\boldsymbol{b}_{c}\right)$

**Cell state**: erase (“forget”) some content from the previous cell state, and write (“input”) some new cell content

$\boldsymbol{c}^{(t)}=\boldsymbol{f}^{(t)} \circ \boldsymbol{c}^{(t-1)}+\boldsymbol{i}^{(t)} \circ \tilde{\boldsymbol{c}}^{(t)}$
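
To make the element-wise update concrete, here is a small numeric sketch. The gate values are hand-picked for illustration (they are assumptions, not the output of any trained model): a forget value of 1 keeps a component, 0 erases it, and the input gate plays the same role for the new content.

```python
import numpy as np

# Hand-picked illustrative values (assumptions, not taken from a real model).
f = np.array([1.0, 0.0, 0.5])        # forget gate: keep, erase, keep half
c_prev = np.array([2.0, -1.0, 4.0])  # previous cell state c^(t-1)
i = np.array([0.0, 1.0, 0.5])        # input gate: block, write, write half
c_tilde = np.array([0.9, 0.5, -1.0]) # new cell content (tanh output, in (-1, 1))

# c^(t) = f^(t) ∘ c^(t-1) + i^(t) ∘ c~^(t), with ∘ as the element-wise product
c = f * c_prev + i * c_tilde

# Component 0 is kept unchanged, component 1 is overwritten with new content,
# and component 2 mixes half of the old value with half of the new one.
# c == [2.0, 0.5, 1.5]
```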

**Hidden state**: read (“output”) some content from the cell

$\boldsymbol{h}^{(t)}=\boldsymbol{o}^{(t)} \circ \tanh \boldsymbol{c}^{(t)}$
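
The six formulas above can be combined into a single forward step. The following is a minimal NumPy sketch, not a reference implementation: the function names `lstm_step` and `init_params`, the dictionary layout of the parameters, the input dimension `d`, and the random initialization are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. x has shape (d,), h_prev and c_prev have shape (n,)."""
    W, U, b = params["W"], params["U"], params["b"]
    f = sigmoid(W["f"] @ h_prev + U["f"] @ x + b["f"])        # forget gate
    i = sigmoid(W["i"] @ h_prev + U["i"] @ x + b["i"])        # input gate
    o = sigmoid(W["o"] @ h_prev + U["o"] @ x + b["o"])        # output gate
    c_tilde = np.tanh(W["c"] @ h_prev + U["c"] @ x + b["c"])  # new cell content
    c = f * c_prev + i * c_tilde                              # cell state
    h = o * np.tanh(c)                                        # hidden state
    return h, c

def init_params(n, d, seed=0):
    """Hypothetical helper: small random weights, zero biases."""
    rng = np.random.default_rng(seed)
    gates = ("f", "i", "o", "c")
    return {
        "W": {g: rng.normal(scale=0.1, size=(n, n)) for g in gates},
        "U": {g: rng.normal(scale=0.1, size=(n, d)) for g in gates},
        "b": {g: np.zeros(n) for g in gates},
    }

# Run the step over a short random input sequence of length 5.
n, d = 4, 3
params = init_params(n, d)
h, c = np.zeros(n), np.zeros(n)
xs = np.random.default_rng(1).normal(size=(5, d))
for x in xs:
    h, c = lstm_step(x, h, c, params)
```

Because $\boldsymbol{o}^{(t)}$ lies in $(0, 1)$ and $\tanh$ lies in $(-1, 1)$, every component of `h` stays strictly inside $(-1, 1)$, while `c` is not bounded in this way and can accumulate across steps.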

LSTM diagram. Source: 苏剑林

Ref: