Long short-term memory network


A long short-term memory (LSTM) network is a type of neural network used to process time series of similarly structured inputs, such as the images making up a film or the audio samples making up a recording. The LSTM network belongs to the wider class of recurrent neural networks, in which the output produced during one operation step of the network is fed back to form part of the input to the next operation step. It uses a more complex topology to avoid the problems, most notably vanishing gradients, that occur when training more naive recurrent variants of the multilayer perceptron.

An LSTM network is made up of a chain of units, one per time step, along which two separate sets of information flow. One set (the hidden state) corresponds to the data that would normally be expected to flow through a neural network like a multilayer perceptron, while the second set (the cell state) can be broadly understood as corresponding to a small random-access memory (RAM) array. The two parts of the network interact within each unit using a combination of gates (typically an input gate, an output gate and a forget gate) and elementwise vector operations in which values are added or multiplied together. While the main data is passed from unit to unit using the activation functions familiar from other types of neural network, the memory data remains largely unchanged between units unless it is changed by an access from the main part of the network.
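The interaction between the two information flows can be made concrete with a minimal sketch of a single unit. It assumes the common textbook formulation with sigmoid gates and a tanh candidate; the function names (`lstm_step`, `zero_params`, `affine`) and the parameter dictionary layout are illustrative choices, not part of any particular library.

```python
import math

def sigmoid(x):
    # Equivalent to 1 / (1 + exp(-x)), written so it cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def affine(W, x, b):
    # Matrix-vector product W @ x plus bias b, with W as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    # One LSTM unit: x is the input, h_prev the main data (hidden state),
    # c_prev the memory (cell state). Returns the new (h, c).
    z = x + h_prev  # list concatenation of input and previous hidden state
    f = [sigmoid(v) for v in affine(params["Wf"], z, params["bf"])]    # forget gate
    i = [sigmoid(v) for v in affine(params["Wi"], z, params["bi"])]    # input gate
    o = [sigmoid(v) for v in affine(params["Wo"], z, params["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(params["Wg"], z, params["bg"])]  # candidate values
    # Memory update: keep a fraction f of the old memory, add a fraction i of g.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Main data: a gated, squashed read of the memory.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def zero_params(n_in, n_hidden):
    # Hypothetical helper: all-zero weights and biases, for demonstration only.
    n = n_in + n_hidden
    params = {}
    for name in "fiog":
        params["W" + name] = [[0.0] * n for _ in range(n_hidden)]
        params["b" + name] = [0.0] * n_hidden
    return params

# With zero weights every gate is exactly 0.5 and the candidate is 0,
# so the memory is simply halved at each step.
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [2.0, -4.0], zero_params(3, 2))
print(c)  # [1.0, -2.0]
```

Note that the memory update contains only elementwise multiplication and addition, which is why the memory can pass through many units largely unchanged when the forget gate is close to 1 and the input gate close to 0.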

Unlike a general random-access memory that is accessed explicitly by programs, the memory within an LSTM network is accessed using the weights familiar from other types of neural network: every read or write operation addresses all memory locations at once. By learning a weight close to 1 for a single value and weights close to 0 for all other values, however, the network can execute an operation that effectively involves only a single location.
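The idea of addressing a single location through weights can be illustrated with a small sketch. It uses idealised gate values of exactly 0 and 1, whereas a trained network would only approach these values; `update_memory` and the example numbers are hypothetical.

```python
def update_memory(memory, forget_gate, input_gate, candidate):
    # Every location is touched by the same elementwise operation;
    # the gate values decide how much each location actually changes.
    return [f * m + i * g
            for f, m, i, g in zip(forget_gate, memory, input_gate, candidate)]

memory      = [0.3, -0.7, 0.5]
candidate   = [0.9,  0.9, 0.9]   # value offered to every location
forget_gate = [1.0,  0.0, 1.0]   # keep locations 0 and 2, clear location 1
input_gate  = [0.0,  1.0, 0.0]   # write only to location 1

new_memory = update_memory(memory, forget_gate, input_gate, candidate)
print(new_memory)  # [0.3, 0.9, 0.5]
```

Although the operation is applied to all three locations, only the middle one changes, which mimics a write to a single address.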

Gated recurrent unit (GRU) networks and peephole networks are LSTM subtypes that are distinguished by specific ways of arranging the gates and vector operations within each unit. A hidden LSTM or H-LSTM network, on the other hand, is characterised by gates that are themselves made up of complete neural networks, which contrasts with a classic LSTM where each gate consists of only a single layer of neurons.
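As an illustration of a different gate arrangement, the following is a minimal sketch of one GRU unit. It assumes one common formulation with an update gate and a reset gate and a single state vector in place of separate main data and memory; published descriptions differ on whether the update gate or its complement weights the old state, and the names `gru_step` and `zero_gru_params` are illustrative.

```python
import math

def sigmoid(x):
    # Equivalent to 1 / (1 + exp(-x)), written so it cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def affine(W, x, b):
    # Matrix-vector product W @ x plus bias b, with W as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def gru_step(x, h_prev, params):
    # One GRU unit: a single state h plays the role of both flows.
    xh = x + h_prev  # list concatenation
    z = [sigmoid(v) for v in affine(params["Wz"], xh, params["bz"])]  # update gate
    r = [sigmoid(v) for v in affine(params["Wr"], xh, params["br"])]  # reset gate
    reset_h = [rk * hk for rk, hk in zip(r, h_prev)]
    cand = [math.tanh(v)
            for v in affine(params["Wh"], x + reset_h, params["bh"])]  # candidate state
    # Blend old state and candidate (convention assumed here: z weights the candidate).
    return [(1.0 - zk) * hk + zk * ck for zk, hk, ck in zip(z, h_prev, cand)]

def zero_gru_params(n_in, n_hidden):
    # Hypothetical helper: all-zero weights and biases, for demonstration only.
    n = n_in + n_hidden
    params = {}
    for name in "zrh":
        params["W" + name] = [[0.0] * n for _ in range(n_hidden)]
        params["b" + name] = [0.0] * n_hidden
    return params

# With zero weights the update gate is 0.5 and the candidate is 0,
# so the state is halved.
h = gru_step([1.0, 2.0, 3.0], [1.0, -2.0], zero_gru_params(3, 2))
print(h)  # [0.5, -1.0]
```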

An excellent general introduction to LSTMs complete with diagrams is available here.

Gated Recurrent Unit GRU Peephole LSTM Hidden LSTM H-LSTM
has functional building block
FBB_Classification FBB_Value prediction
has input data type
IDT_Binary vector IDT_Vector of quantitative variables IDT_Vector of categorical variables
has internal model
INM_Neural network
has output data type
ODT_Classification ODT_Vector of quantitative variables ODT_Vector of categorical variables
has learning style
has parametricity
PRM_Nonparametric with hyperparameter(s)
has relevance
sometimes supports
mathematically similar to