July 16, 2018
Rudresh V published in AI Informed

The Promise of Dynamic Co-Attention (Neural) Networks

By Rudresh V, AI Engineer, Razorthink Inc.

Question answering has long posed particular challenges for text analytics and for certain areas of Natural Language Processing. Several types of neural networks, such as Long Short-Term Memory (LSTM) networks, were previously used for this purpose but delivered mediocre results, partly because they weren’t expressly designed for this use case.

Dynamic co-attention networks were created to solve this specific problem. The co-attention in these networks focuses on both the question and the documents containing the potential answers. The networks use the words in the question to find answers contained in the documents, and are instrumental in rapidly answering questions for customer service, search engines, regulatory and legal compliance, and a host of additional use cases.


The basic architecture of dynamic co-attention networks consists of an encoder and a decoder. The encoder takes in both the question and the document, and is essential for analyzing the question. For each word in the question, the network gathers related information from the document. For instance, if the question contains the word ‘who’, the network associates that word with people and scans the document for mentions of people.

The decoder is responsible for analyzing the documents to find answers to the questions. The documents and the question (both taken into the encoder) are the two primary inputs for the network’s attention. These are encapsulated in a matrix that the decoder uses to answer the question.

Therefore, when attempting to answer the question “who is the physicist in a certain institution”, the dynamic co-attention network uses this process to ascribe importance to the words in the question, finding words in the document that might be names. The network will first focus on the word ‘who’ to get names, then ‘physicist’ to get potential names of physicists, and so forth.
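To make the idea of attending in both directions more concrete, here is a minimal sketch in plain Python. It is not the network itself: the two-dimensional word vectors are made up by hand for illustration, and the function names (`softmax`, `dot`, `co_attention`) are hypothetical. It only shows how a matrix of question-word and document-word similarities can be normalized once per question word and once per document word.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product, used here as a simple similarity measure."""
    return sum(a * b for a, b in zip(u, v))

def co_attention(question_vecs, document_vecs):
    # affinity[i][j]: similarity of question word i and document word j
    affinity = [[dot(q, d) for d in document_vecs] for q in question_vecs]
    # For each question word, a weighting over the document words.
    question_to_doc = [softmax(row) for row in affinity]
    # For each document word, a weighting over the question words.
    doc_to_question = [softmax(list(col)) for col in zip(*affinity)]
    return affinity, question_to_doc, doc_to_question

# Hand-made toy vectors (assumption): axis 0 ~ "person", axis 1 ~ "science".
question = {"who": [1.0, 0.0], "physicist": [0.5, 1.0]}
document = {"Alice": [1.0, 0.8], "teaches": [0.0, 0.2], "physics": [0.0, 1.0]}

q_words, d_words = list(question), list(document)
_, q2d, d2q = co_attention([question[w] for w in q_words],
                           [document[w] for w in d_words])

for word, weights in zip(q_words, q2d):
    best = d_words[weights.index(max(weights))]
    print(word, "->", best)
```

With these toy vectors, both ‘who’ and ‘physicist’ place their largest weight on the name in the document, which mirrors the behavior described above. A real network learns its word representations rather than using hand-picked numbers.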


The two main limitations of this approach to answering questions via text analytics are network confusion and scale, and measures are being taken to address both. Initially, dynamic co-attention neural networks only worked well on a single document of 400 to 1,000 words; more than 1,000 words caused inordinately slow responses. Improvements have since been made, and these networks can currently go through approximately 10 documents to find the answer to a single question.

Network confusion occurs in two ways. The first is when there’s more than one potential answer to a question. In the preceding example of searching through documents to find the name of an institution’s physicist, for instance, the network might get confused if it finds a handful of names from which to choose. In this case, the decoder uses an iterative process to issue confidence scores for each answer, then selects the highest-scoring one as the correct answer. The second is when the network looks through documents that don’t contain the correct answer, yet still selects an answer anyway.
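The selection step can be sketched roughly as follows. This is an illustrative simplification, not the decoder itself: it assumes the network has already produced a start score and an end score for every word in the document (the numbers below are invented), and it simply picks the span with the highest combined score. The `min_score` cutoff for declining to answer is an assumption of this sketch, not something the networks described here are stated to do.

```python
def best_span(start_scores, end_scores, max_len=5):
    """Return (start, end, score) of the highest-scoring span of up to max_len words."""
    best = (None, None, float("-inf"))
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best

def answer(tokens, start_scores, end_scores, min_score=None):
    """Pick the best span; optionally return no answer if its score is too low."""
    s, e, score = best_span(start_scores, end_scores)
    if s is None:
        return None, score
    # Hypothetical safeguard: refuse to answer below a confidence cutoff.
    if min_score is not None and score < min_score:
        return None, score
    return " ".join(tokens[s:e + 1]), score

# Invented scores for a document that mentions two candidate names.
tokens       = ["The", "physicist", "is", "Dr", "Rao", "not", "Dr", "Lee"]
start_scores = [0.0, 0.1, 0.0, 2.0, 0.5, 0.0, 1.2, 0.3]
end_scores   = [0.0, 0.0, 0.0, 0.2, 2.5, 0.0, 0.1, 1.5]

print(answer(tokens, start_scores, end_scores))
print(answer(tokens, start_scores, end_scores, min_score=5.0))
```

Here the first call chooses between the two names by score, while the second call shows how a cutoff could turn a low-confidence guess into no answer at all.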


The applications for dynamic co-attention networks involve almost any task in which question answering is required. In customer service, for example, agents can use this deep learning technique to quickly go through manuals to answer customers’ questions. Another compelling use case involves open domain question answering for search engines. The scale limitations of dynamic co-attention networks require a means of extracting information from the top 10 or 20 search results prior to analyzing them for detailed findings. Still other applications abound for regulatory or legal compliance. Legal professionals have innumerable documents to search through to find detailed information about a particular charge, ordinance, or case. Using dynamic co-attention networks, they can simply ask questions or conduct searches for specific words or clauses, getting the results in seconds. These networks provide those same benefits for regulatory compliance and other use cases.
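One simple way to picture that narrowing step is a retrieve-then-read pipeline: rank the documents cheaply, keep the top few, and only then hand them to the network. The sketch below is an assumption for illustration only; the word-overlap scoring, the small stopword list, and the names `tokenize`, `overlap_score`, and `top_k` are all hypothetical, and a production search engine would use a far stronger ranking method.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption, not exhaustive).
STOPWORDS = {"who", "what", "is", "the", "a", "an", "at", "of", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(question, document):
    """Count how often the question's content words occur in the document."""
    terms = set(tokenize(question)) - STOPWORDS
    counts = Counter(tokenize(document))
    return sum(counts[t] for t in terms)

def top_k(question, documents, k=10):
    """Keep the k documents that share the most content words with the question."""
    ranked = sorted(documents,
                    key=lambda d: overlap_score(question, d),
                    reverse=True)
    return ranked[:k]

question = "Who is the physicist at the institute?"
docs = [
    "The institute hired a physicist last year.",
    "The cafeteria menu changes weekly.",
    "A physicist at the institute won an award.",
]

# Only the shortlisted documents would be passed on to the question answering network.
shortlist = top_k(question, docs, k=2)
for doc in shortlist:
    print(doc)
```

The point of the design is that the expensive step runs on a handful of documents rather than on the whole collection.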

Horizontal Utility

Ultimately, dynamic co-attention networks are valuable because they fulfill basic search needs for any organization, regardless of industry. There’s always a need to quickly extract information from text. Dynamic co-attention networks enhance this functionality by quickly answering questions from text, thereby increasing the overall worth of text analytics across a wide array of use cases.