July 02, 2018
Sagnik Bhattacharya published in AI Informed

The Emergence of Memory Augmented Networks

Memory is a critical aspect of advanced machine learning. For Natural Language Processing (NLP) and other sophisticated pattern recognition tasks, a model needs memory to access historical patterns alongside current ones and to approach a human-like understanding of business use cases.

The rise of Long Short Term Memory (LSTM) networks, a type of recurrent neural network, has greatly increased AI’s memory capabilities. LSTMs are popular for modeling sequential data such as time series, text, and video, and are sometimes applied to static data like images. However, their memory is one dimensional: a single internal state is carried forward along the temporal axis, which limits what the network can write and recall.

Memory Augmented Networks overcome these limitations. They write to a persistent memory store while the network is trained on data, and they capture context-based relationships as well. This enables several advanced memory operations that LSTMs can’t perform, allowing the more thorough pattern recognition that is critical to NLP, speech recognition, and other advanced AI applications.


Memory Augmented Networks are neural networks that focus on memory almost as much as they do on learning. They work by attempting to decipher the algorithms that produce the patterns found in data. In addition to learning about the data they’re trained on, these networks also learn what to remember, and how to look it up, across different aspects of those datasets. Thus, they can write, look up, and overwrite memory as needed in a content-addressable memory bank, which LSTMs don’t have.

For a reading comprehension use case with NLP, for example, Memory Augmented Networks can track context that appears in different places in a passage. These networks can identify context pertaining to a city in the first paragraph, recognize that the next paragraphs are about something else, then identify (and remember) context about the initial city in the fourth paragraph. LSTMs struggle with this kind of long-range recall. Despite the simplicity of this example, it hints at the complexity of the functionality Memory Augmented Networks enable.
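
To make the idea of a content-addressable memory bank concrete, here is a minimal sketch of a content-based lookup. It is not taken from any particular Memory Augmented Network; the function names (`cosine`, `address`, `read`), the `sharpness` parameter, and the toy three-slot memory are all assumptions made for illustration. The lookup compares a query vector against every memory slot and returns a weighted blend of the slots that match best, so recall depends on what a slot contains rather than where it sits.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # small constant avoids division by zero

def address(memory, key, sharpness=10.0):
    """Softmax over similarities: one attention weight per memory slot."""
    scores = [sharpness * cosine(row, key) for row in memory]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, key, sharpness=10.0):
    """Return the weighted sum of memory slots for the given query key."""
    weights = address(memory, key, sharpness)
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]

# Hypothetical toy memory: each row is one stored piece of context.
memory = [
    [1.0, 0.0, 0.0],  # slot 0: context about the city
    [0.0, 1.0, 0.0],  # slot 1: unrelated context
    [0.0, 0.0, 1.0],  # slot 2: unrelated context
]
query = [0.9, 0.1, 0.0]  # a later mention that resembles the city context

weights = address(memory, query)
recalled = read(memory, query)
```

In this toy setup the query concentrates nearly all of its weight on slot 0, which mirrors the reading comprehension example: a later mention of the city retrieves the context stored when the city first appeared. In a trained network the query and the stored vectors would be learned rather than written by hand.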


Memory Augmented Networks are a relatively new development. As such, they have three primary shortcomings.

  • Architecturally Sensitive: Memory Augmented Networks are extremely sensitive to changes in architecture. Because each architecture specializes in a highly specific task, even a slight modification can cause performance to diverge substantially and significantly alter the network’s output.
  • Algorithm Sensitive: These networks are highly sensitive to algorithms as well. Since they operate by deciphering algorithms, they might learn something unrelated to their intended purpose if they cannot learn a particular algorithm exactly.
  • Calibration Time: Because of the specificity of the tasks they perform, Memory Augmented Networks require a considerable amount of calibration time. They must solve several memory challenges at once: learning when to look up memory for adequate context, when to append certain context to memory, when to overwrite it, and how to overwrite without destroying pertinent information. All of those functions make building these networks time consuming.
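
The overwrite problem in the last point can be illustrated with a short sketch. It borrows the erase-then-add write used in Neural Turing Machines, which is one known approach and not necessarily what every Memory Augmented Network does. The function name `write` and the toy values are assumptions for illustration. Each slot is erased and updated only in proportion to its write weight, so slots with a weight of zero keep their contents.

```python
def write(memory, weights, erase, add):
    """Erase-then-add write: slot i becomes m * (1 - w_i * e) + w_i * a.

    memory  : list of slots, each a list of floats
    weights : one write weight per slot, between 0 and 1
    erase   : per-dimension erase strengths, between 0 and 1
    add     : per-dimension values to add
    """
    updated = []
    for w, row in zip(weights, memory):
        updated.append([
            m * (1.0 - w * e) + w * a
            for m, e, a in zip(row, erase, add)
        ])
    return updated

# Hypothetical toy values: overwrite slot 1 completely, leave slot 0 alone.
memory = [
    [1.0, 0.0, 0.0],  # slot 0: pertinent context that must survive
    [0.0, 1.0, 0.0],  # slot 1: stale context to replace
]
weights = [0.0, 1.0]     # all write attention on slot 1
erase = [1.0, 1.0, 1.0]  # clear every dimension of the targeted slot
add = [0.0, 0.0, 1.0]    # new content to store

new_memory = write(memory, weights, erase, add)
```

Here slot 0 is untouched and slot 1 now holds the new content. In practice the write weights and the erase and add vectors are produced by the network itself, and learning to produce good ones is part of what makes calibration slow.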

Use Cases

There are several use cases in which Memory Augmented Networks deliver tangible business value. In finance, they can provide insight into stock market trends by contextualizing current market fluctuations with long-term, historical trends (accessible via memory). They can also enhance comprehension for NLP jobs. At a high level, NLP works by parsing a passage, encoding words as numeric vectors, then generating transformations that link those vectors to context as required. For example, if there’s a passage about a person in a bus, each of the two entities is encoded as its own vector. The fact that the person is inside the bus then needs to be captured by linking the two to form a new vector. Memory Augmented Networks can help with these processes and others, yielding more accurate NLP and readily accessible memory for cognitive computing.
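
The person-in-a-bus example can be sketched in a few lines. This is only an illustration under stated assumptions: the two-dimensional entity vectors, the `link` function, and the hand-picked `relation_weights` are hypothetical, and a real system would learn both the embeddings and the transformation. The sketch concatenates the two entity vectors and applies a linear transformation followed by `tanh` to produce a new vector that stands for the relationship.

```python
import math

def link(first, second, weights, bias):
    """Combine two entity vectors into one relation vector.

    Concatenates the inputs, applies a linear transformation,
    then squashes each output with tanh.
    """
    joined = first + second  # list concatenation, order matters
    return [
        math.tanh(sum(w * x for w, x in zip(row, joined)) + b)
        for row, b in zip(weights, bias)
    ]

# Hypothetical toy embeddings for the two entities.
person = [1.0, 0.0]
bus = [0.0, 1.0]

# Hand-picked weights for illustration; a trained model would learn these.
relation_weights = [
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.0],
]
relation_bias = [0.0, 0.0]

person_in_bus = link(person, bus, relation_weights, relation_bias)
bus_in_person = link(bus, person, relation_weights, relation_bias)
```

Because the inputs are concatenated in order, `person_in_bus` and `bus_in_person` come out as different vectors, which is the property the linked representation needs: it encodes who is inside what, not just that the two entities co-occur.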

Upcoming Developments

Memory Augmented Networks may very well play a pivotal role in the future of AI. Their capabilities suit episodic memory, which is important in conversational and speech applications, and they work well alongside attention mechanisms. Once their shortcomings are overcome, they should become even more useful for these applications and others.