In the previous blog post, we looked at how chatbots work and the different types of chatbots. Now, we’ll learn how to build one ourselves.
What’s readily available?
Chatbot frameworks are much like any other software framework: they provide tools and utilities, and are usually tied to a particular programming language. Some bot frameworks also offer hosted, interactive development environments that make building bots even easier. AI services, on the other hand, are standalone, cloud-hosted platforms: they typically expose a GUI for building chatbot logic interactively, offer Machine Learning powered NLP capabilities, and communicate via RESTful APIs.
We looked at two major frameworks: Botkit for Node.js and Rasa NLU for Python. Of these, Botkit is really a toy that doesn’t lend itself to real use cases, while Rasa NLU is a more serious contender. AI services, as mentioned above, are cloud-hosted solutions for NLP that help build smart bots able to predict the flow of complex conversations. They provide a UI for constructing and training Machine Learning models that understand language intents and entities. We looked at Wit.ai, api.ai, LUIS.ai, and IBM Watson, and tabulated our results below.
SNIPS NLU, a new entrant, decided to challenge this status quo: they surveyed the scene, ran their own benchmarks, and published the results. What did we do? We took SNIPS and the others on in turn, and, voila, we even beat them! Below we compare our performance against the numbers SNIPS had tabulated. See for yourself!
How did we do this? It is relatively straightforward, and our entire program can be summarized in the steps below.
Step 1: We downloaded pre-trained word vectors learned from different sources, crawl-300d-2M.vec, which is freely available on the Internet. The file contains 2 million words, each represented as a 300-dimensional vector.
Step 2: The chatbot training data is embedded using this file; that is, the 2 million pre-trained word vectors supply the representation for our training utterances.
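Embedding an utterance then reduces to looking up each token in the vector table. A sketch under our assumptions (whitespace tokenization, lowercasing, and a zero-vector fallback for unknown words are our choices, not necessarily the post's):

```python
def embed_utterance(utterance, vectors, dim):
    """Map each token to its pre-trained vector.

    Tokens missing from the vocabulary fall back to an all-zeros
    vector so the sequence keeps its original length.
    """
    zero = [0.0] * dim
    return [vectors.get(token.lower(), zero) for token in utterance.split()]
```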
Step 3: We used an LSTM, since what we need here is the ability to remember for a long time once something is learned, while learning new things slowly. This neural network intent model is trained on the embedded training intents from the data above. With that, one part of the chatbot, intent recognition, is done.
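We won't reproduce the trained network here, but the LSTM's long-memory behaviour comes from its gated cell state. A toy, scalar version of a single step makes the mechanism concrete (the weight names and values are illustrative, not from our model):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar input and state, showing the gates.

    The cell state c carries long-term memory; the gates decide what
    to forget, what to write, and what to expose.
    """
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])   # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])   # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])   # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"]) # candidate
    c = f * c_prev + i * g    # old memory kept, plus whatever is written
    h = o * math.tanh(c)      # hidden state exposed to the next layer
    return h, c
```

When the forget gate saturates open and the input gate stays shut, the cell state passes through the step almost unchanged, which is exactly the "remember long once learned" property we wanted.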
Step 4: The next part of the chatbot, entity recognition, is built per intent as a bidirectional recurrent LSTM neural network model. That is, for each intent we build entity models over both directions of the sequence, so that an intent paired with any of its entities can flow as a seamless conversation.
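The "bidirectional" part simply means running a recurrent update over the tokens left-to-right and right-to-left, then pairing the two states at each position so every token carries context from both sides. A schematic sketch with a stand-in step function (the real model uses LSTM steps and vector states):

```python
def bidirectional(sequence, step, init=0.0):
    """Run `step(x, state) -> state` over the sequence in both
    directions and pair the two states for each position."""
    forward, state = [], init
    for x in sequence:
        state = step(x, state)
        forward.append(state)
    backward, state = [], init
    for x in reversed(sequence):
        state = step(x, state)
        backward.append(state)
    backward.reverse()  # align right-to-left states with positions
    return list(zip(forward, backward))
```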
Step 5: Test data is obtained from the same file as above, after splitting it accordingly.
Step 6: Intents and entities are predicted from the intent model and the corresponding entity model.
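Put together, prediction is a two-stage dispatch: the intent model scores the utterance, and the winning intent selects which entity model to run. A hypothetical sketch (the model interfaces and names here are ours, not from the post):

```python
def predict(utterance, intent_model, entity_models):
    """Two-stage prediction: classify the intent, then run the
    entity model that was trained for that intent."""
    scores = intent_model(utterance)             # {intent: score}
    intent = max(scores, key=scores.get)         # highest-scoring intent
    entities = entity_models[intent](utterance)  # [(token, label), ...]
    return intent, entities
```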
Voila, we have a chatbot!
Why is it that one developer tinkering with the code was able to beat industry standards?
The large projects we benchmarked ourselves against are open to users the world over, and their performance has to be top-notch at that scale. For a large cloud-based SaaS serving many customers simultaneously, the hardware requirements of a mathematically complex model become unfathomably expensive.
So the large Data Science teams behind those projects spend months building a simple model that generalizes well. In our case, we have a mathematically complex model that is harder to train, but our deployment is a simple test case, so it performs better. And it is the easier path as a programming task, too!
So building a chatbot comes down to a trade-off: do you want a model that is mathematically simple and generalizes well but is difficult to build, or one that is mathematically complex and thus generalizes poorly but is easy to build?
Your answer will determine the chart above!