By Rohan Bopardikar, Technical Director of AI, Razorthink, Inc.
The concept of attention is one of the more fascinating developments in the neural network space, both in terms of its utility and its demonstration of Artificial Intelligence’s learning capabilities.
Attention mechanisms are able to focus on varying parts of a deep learning model's input while simultaneously maintaining an awareness of that input as a whole. Much as a person can pay attention to someone's eyes during a conversation while still remaining aware of the rest of the body, attention in deep learning can home in on specific parts of input data while still considering the rest.
This capability is instrumental to solving a number of business problems related to document classification, Natural Language Processing, and image recognition. The fact that neural networks can simply learn which parts of data are most relevant for specific tasks is indicative of this technology’s cognitive capabilities and capacity to mimic human intelligence.
The notion of attention in deep learning emerged in the last couple of years partly out of necessity. When working with very large inputs, it was difficult to get models to focus on the relevant sections of the data. In image recognition systems, for example, modelers would use what is known as hard attention: cropping an image down to the specific region that mattered. Unfortunately, this approach discards the rest of the image. The problem was equally troublesome in language-based text applications. When translating a German sentence to French, for instance, models would typically focus on a single word, store its significance in internal memory, then move to the next. This approach worked passably for short sentences but broke down on long sentences or entire paragraphs, in part because of the poor recall of certain neural networks.
Attention effectively solves these problems with a non-linear approach that can shift back and forth across the input, from the beginning of a sentence to its end as needed, resulting in more accurate translations. Furthermore, these mechanisms do so while retaining the entirety of the input sentence or passage, which was previously not possible. Thus, models can read one word while accessing others, or focus on one part of an image while remaining cognizant of the rest. Best of all, neural network models learn which parts to focus on at particular points in time for their tasks. When translating sentences, for example, networks learn to place greater emphasis on the beginning of a sentence at the start of the translation and on its end at the conclusion. These different levels of emphasis, or the varying weights the network puts on different parts of the input, are learned entirely by the network itself.
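The weighting idea described above can be sketched in a few lines of NumPy. This is a toy illustration of soft (dot-product) attention, not any particular production architecture; the `attention` function and the `keys`, `values`, and `query` names are illustrative stand-ins for what a trained network would compute.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax: turns raw scores into weights that sum to 1."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(query, keys, values):
    """Soft attention: score every input position against the query,
    normalize the scores into a weight distribution, and return the
    weighted sum of the values. Every position keeps nonzero weight,
    so the whole input stays 'in view' while one part dominates."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = softmax(scores)
    return weights @ values, weights

# Toy example: four input positions ("words"), each a 3-dim vector.
keys = values = np.array([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0],
                          [0.0, 0.0, 1.0],
                          [1.0, 1.0, 0.0]])
query = np.array([0.0, 0.0, 2.0])  # most similar to the third position

context, weights = attention(query, keys, values)
# weights is a distribution over all four positions; the third position
# receives the most emphasis, but none of the others is discarded.
```

Unlike hard attention's cropping, nothing here is cut away: the softmax merely redistributes emphasis, which is exactly the behavior a network learns end to end during training.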
There are numerous practical applications of attention in deep learning. One of the more prominent may be in customer service, in which human agents answer calls for any variety of purposes. Attention networks can readily identify which part of a specific text (such as a customer service procedure manual for agents) relates to a question. Thus, agents can quickly locate the information needed to fulfill a specific customer request. Without this capability, organizations risk delays, decreased customer satisfaction, and increased employee turnover, particularly in large financial services institutions in which agents are accountable for large volumes of information. Attention is also useful for document classification and analysis for purposes such as regulatory compliance. Attention gained headlines a few years ago when it was used to caption images, explaining what was happening in them by focusing on the parts of the image pertaining to each part of the description.
Attention variants are an increasingly important part of deep learning and of neural networks in general. They enable data modelers to extract more value from input data, producing outputs that drive better business outcomes.