How to make a chatbot more intelligent and able to think contextually.

Copyright by

Back in the days of ELIZA, A.L.I.C.E. and Jabberwacky, early chatbots dating from the 1960s to the 1990s, capability was still rudimentary. When confronted with the complexities of human communication, they got confused very easily. Ultimately they were flowcharts, and their responses came from relatively rigid if/then scripts: if asked "What is your name?", then answer "Alice".
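Here is a minimal Python sketch of such a script. The rules, the `reply` function and the fallback message are hypothetical illustrations, not code from any of the bots named above. It shows how a paraphrase with the same intent falls straight through to the fallback.

```python
# Hypothetical rule table in the spirit of early if/then chatbot scripts.
RULES = {
    "what is your name": "Alice",
    "how are you": "I am fine, thank you.",
}

FALLBACK = "I do not understand."


def reply(message: str) -> str:
    """Return a scripted answer only when the input exactly matches a known pattern."""
    # Normalise case and strip surrounding punctuation, but nothing more.
    key = message.lower().strip(" ?!.")
    return RULES.get(key, FALLBACK)


print(reply("What is your name?"))       # Alice
print(reply("What should I call you?"))  # I do not understand.
```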

Fast forward thirty years, and research in machine learning and natural language processing has given rise to a variety of conversational interfaces. Over the last few years in particular, the number of models that detect patterns in human language and determine intent has grown rapidly. This includes models that cope when what is said doesn’t quite match what is meant, which is what we have termed ‘Contextual AI’.

The rise of contextual AI

One model that is changing the way bots communicate is Word2Vec, an algorithm developed at Google. It uses a shallow neural network to learn word associations from large bodies of text. The word similarities it captures can then support tasks such as sentiment analysis and named entity recognition.

It does this by exploiting a linguistic idea known as the distributional hypothesis. Put simply, words that occur in similar contexts tend to have similar meanings.
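One simple way to see the distributional hypothesis at work is to count each word's neighbours and compare those counts with cosine similarity. This sketch assumes a tiny made-up corpus and a window of two words. It is plain co-occurrence counting, not Word2Vec itself, and the names `CORPUS`, `WINDOW`, `context_counts` and `cosine` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Toy corpus invented for illustration; real models train on far larger text.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "she reads a book",
    "he reads a paper",
]

WINDOW = 2  # neighbouring words counted on each side (an assumed value)


def context_counts(corpus, window):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    counts = {}
    for sentence in corpus:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            ctx = counts.setdefault(word, Counter())
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    ctx[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


counts = context_counts(CORPUS, WINDOW)
print(round(cosine(counts["cat"], counts["dog"]), 3))   # 0.917: similar contexts
print(round(cosine(counts["cat"], counts["book"]), 3))  # 0.0: no shared contexts
```

"cat" and "dog" never appear in the same sentence, yet their neighbourhoods are nearly identical. "cat" and "book" share no context words at all.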

As the name implies, Word2Vec represents each distinct word as a list of numbers called a vector. The model places the whole vocabulary in a vector space, typically of a few hundred dimensions (the widely used pretrained Google News vectors have 300). In this space, the similarity in direction of two vectors, usually measured by cosine similarity, indicates how similar the two words are in meaning.
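Comparing directions can be sketched with cosine similarity, cos θ = (u · v) / (‖u‖ ‖v‖). The 4-dimensional vectors below are made-up values for illustration only. Real Word2Vec vectors are learned from data and are much longer. `VECTORS`, `cosine_similarity` and `most_similar` are hypothetical names.

```python
import math

# Hypothetical hand-written vectors; learned vectors would come from training.
VECTORS = {
    "king":  [0.8, 0.6, 0.1, 0.0],
    "queen": [0.7, 0.7, 0.1, 0.1],
    "apple": [0.0, 0.1, 0.9, 0.4],
}


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| |v|): depends on direction, not length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word, vectors):
    """Return the other word whose vector points in the most similar direction."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


print(most_similar("king", VECTORS))  # queen

# Scaling a vector changes its length but not its direction, so similarity is unchanged.
doubled = [2 * x for x in VECTORS["king"]]
print(math.isclose(cosine_similarity(doubled, VECTORS["queen"]),
                   cosine_similarity(VECTORS["king"], VECTORS["queen"])))  # True
```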

Training works by looping through the sample text. For each word, the model is fitted on its neighboring words within a predefined window on either side. A neural network is used for this, and the weights of its first layer are saved as the result of training. Words do not need to appear next to each other to end up with similar vectors. Given enough training, if they are generally surrounded by similar words, it can be assumed linguistically that they have similar meanings.
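The sliding window can be sketched as a generator of (target, context) pairs. The window size and example sentence are assumptions. `context_pairs` is a hypothetical helper, not part of any library.

```python
def context_pairs(tokens, window):
    """Yield (target, context) for every word within `window` positions of each target."""
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                yield target, tokens[j]


# Hypothetical example sentence with a window of one word on each side.
sentence = "the quick brown fox jumps".split()
for pair in context_pairs(sentence, window=1):
    print(pair)
```

Each pair becomes one training example. With a larger window, each word is linked to more distant neighbours.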

The model creates links between target words and their surrounding words using two architectures: CBOW (continuous bag of words) and skip-gram. Both train shallow neural networks to distil semantic information about the language in a text from each word's relationship to its neighbors. CBOW tries to predict a target word from the words around it. Skip-gram works the other way round, trying to predict the surrounding words from a target word. The learned semantic information is stored in the first weight layer of these networks, called the embedding layer. Multiplying this weight matrix by a one-hot representation of a word (a vector that is 1 at the word's index and 0 everywhere else) extracts the vector for any word in the training vocabulary. These are the vectors that are mapped into the space described above.
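Below is a minimal skip-gram sketch in plain Python. It rests on several assumptions: a toy corpus, a full softmax output layer, and hand-picked hyperparameters. Real Word2Vec implementations add optimisations such as negative sampling or hierarchical softmax, which are left out here. The first weight matrix, `W_in`, plays the role of the embedding layer. `embed` shows that multiplying a one-hot vector by it simply selects that word's row. All names are hypothetical.

```python
import math
import random

# Toy setup (assumptions): tiny corpus, full softmax, hand-picked hyperparameters.
CORPUS = "the cat drinks milk the dog drinks water the cat chases the dog".split()
WINDOW, DIM, LR, EPOCHS = 2, 5, 0.05, 200

vocab = sorted(set(CORPUS))
index = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

random.seed(0)
# W_in is the embedding layer: one DIM-sized row per vocabulary word.
W_in = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(V)]
# W_out turns a hidden vector into one score per vocabulary word.
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(DIM)]


def one_hot(word):
    return [1.0 if i == index[word] else 0.0 for i in range(V)]


def embed(word):
    """Compute one_hot(word) @ W_in, which just selects the word's row of W_in."""
    x = one_hot(word)
    return [sum(x[i] * W_in[i][d] for i in range(V)) for d in range(DIM)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_pair(target, context):
    """One SGD step on the skip-gram objective: predict `context` from `target`."""
    t = index[target]
    h = W_in[t][:]  # hidden layer = a copy of the target's embedding row
    probs = softmax([sum(h[d] * W_out[d][j] for d in range(DIM)) for j in range(V)])
    loss = -math.log(probs[index[context]])
    err = probs[:]  # gradient of cross-entropy w.r.t. the output scores
    err[index[context]] -= 1.0
    # Gradient w.r.t. h, computed before W_out is modified.
    grad_h = [sum(err[j] * W_out[d][j] for j in range(V)) for d in range(DIM)]
    for d in range(DIM):
        for j in range(V):
            W_out[d][j] -= LR * err[j] * h[d]
    for d in range(DIM):
        W_in[t][d] -= LR * grad_h[d]
    return loss


def run_epoch():
    """Slide the window over the corpus and train on every (target, context) pair."""
    total = 0.0
    for i, target in enumerate(CORPUS):
        for j in range(max(0, i - WINDOW), min(len(CORPUS), i + WINDOW + 1)):
            if j != i:
                total += train_pair(target, CORPUS[j])
    return total


losses = [run_epoch() for _ in range(EPOCHS)]
print(f"total loss: epoch 1 = {losses[0]:.2f}, epoch {EPOCHS} = {losses[-1]:.2f}")
print(embed("cat") == W_in[index["cat"]])  # True: the lookup is just a row selection
```

A CBOW variant would work the other way round. It would average the embedding rows of the context words to form the hidden vector, then predict the target word from it.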

Word2Vec’s dual-architecture approach is very effective. CBOW is faster to train and gives slightly better representations for frequent words in a text body. Skip-gram works well with smaller datasets and produces strong representations even for rarer words.
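One common intuition for this trade-off is how many training examples each architecture draws from the same text. The sketch below counts them, using hypothetical helper names and an arbitrary sentence. CBOW turns each position into a single example. Skip-gram creates a separate example for every (target, neighbour) pair, so each occurrence of a rare word contributes several updates.

```python
def cbow_examples(tokens, window):
    """CBOW: one (context words -> target) example per position."""
    examples = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, target))
    return examples


def skipgram_examples(tokens, window):
    """Skip-gram: one (target -> single context word) example per neighbour."""
    return [(target, ctx)
            for context, target in cbow_examples(tokens, window)
            for ctx in context]


tokens = "the cat sat on the mat".split()
print(len(cbow_examples(tokens, 2)))      # 6: one example per position
print(len(skipgram_examples(tokens, 2)))  # 18: one per (target, neighbour) pair
```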

The dimensionality of the word vectors can also be chosen to suit the comparison at hand. To visualize them, principal component analysis (PCA) projects the high-dimensional vector space onto a small number of axes, typically two or three, in a graphing system of the coder’s choice. Each axis is a principal component: a direction, formed as a combination of the original dimensions, along which the vectors vary the most. The plot therefore keeps as much of the variance as possible, while the remaining directions are dropped for clarity. This allows for interesting semantic maps of a dataset. It can give a nuanced picture of proximity, and by extension similarity, of vocabulary that more rudimentary NLP methods such as entity/intent-based recognition may miss. […]
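A dependency-free sketch of the projection step is below. It assumes four made-up 4-dimensional vectors. The code centres the data, finds the top two principal components by power iteration with deflation, and projects each word onto them. In practice a library routine such as scikit-learn's `PCA` would normally be used. The helper names here are hypothetical.

```python
import random

# Hypothetical hand-written vectors standing in for learned word vectors.
VECTORS = {
    "king":  [0.8, 0.6, 0.1, 0.0],
    "queen": [0.7, 0.7, 0.1, 0.1],
    "apple": [0.0, 0.1, 0.9, 0.4],
    "pear":  [0.1, 0.0, 0.8, 0.5],
}


def covariance(rows):
    """Centre the data and return (centred rows, sample covariance matrix)."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[d] for r in rows) / n for d in range(dim)]
    centred = [[r[d] - mean[d] for d in range(dim)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    return centred, cov


def top_component(cov, iters=500):
    """Power iteration: approximate the eigenvector with the largest eigenvalue."""
    random.seed(0)
    v = [random.random() for _ in cov]
    for _ in range(iters):
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the matching eigenvalue (variance along v).
    eigval = sum(v[a] * cov[a][b] * v[b] for a in range(len(v)) for b in range(len(v)))
    return v, eigval


def pca_2d(rows):
    """Project rows onto the two directions of largest variance."""
    centred, cov = covariance(rows)
    components = []
    for _ in range(2):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: subtract the found direction so the next pass finds the runner-up.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(len(v))]
               for a in range(len(v))]
    return [[sum(r[d] * c[d] for d in range(len(r))) for c in components]
            for r in centred]


for word, (x, y) in zip(VECTORS, pca_2d(list(VECTORS.values()))):
    print(f"{word:>6}: ({x:+.2f}, {y:+.2f})")
```

With these values, the first axis should separate the two royalty words from the two fruit words. The sign of each axis is arbitrary.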
