Search That Means Business

Bringing “Watson Question/Answering” to Your Business (Part 2)

You might recall from yesterday’s blog that we defined a simple five-step process by which IBM Watson answered the questions on Jeopardy! and explored how that process worked. Today, we’ll talk more about how natural language processing software, such as EasyAsk, applies these same principles to business solutions.

Answering Questions from Structured Data

Fortunately, the process of answering questions from structured data is very similar to the process we described earlier.

  1. Understand the question
  2. Find relevant databases
  3. Extract the answer from each relevant database
  4. Pick the best answer and display it

Understanding the question is easier in the non-Jeopardy case. We can assume there are no “trick” questions and no puns. Still, as shown in the second example above, the questions are not always trivial to express.

Finding relevant databases is a significant challenge in its own right. Users asking the questions can’t be expected to know that specific databases exist, much less which one their question best pertains to. But just as with documents, the full set must be very quickly restricted to just the relevant ones that are worthy of further analysis to extract an answer. Unlike the case with documents, Watson cannot “index” databases on a word-by-word basis in the same way that it indexes documents. Data within databases is constantly changing, although its structure remains the same.

Extracting the answer from each relevant database requires generating a SQL query unique to each relevant database. Executing SQL is a well-understood problem. The challenge is dynamically generating the SQL based on the meaning of the question and the unique structure of each relevant database.

Picking the best answer is not so easy either. Unlike the answer to Jeopardy questions, the answers to structured data questions are not a single word or phrase. They are often lists or small tables. This makes it hard, if not impossible, to allow databases to “vote” on the right answer like Watson allows documents to do.

Watson will require extension in all of these areas before it will be able to answer questions from structured data. Clearly, IBM is one of the world leaders in structured databases, and Watson’s language processing is already up to the task of understanding the question, but it will still take time to address these issues, during which Watson can be deployed in the first two phases we have described.

Question/Answer of Structured Data Today

Fortunately, there is already a product in the market that performs question/answer against structured data and it has been doing it for years.

The product is EasyAsk.

EasyAsk works by using a dictionary customized for each database that it will be accessing. Each dictionary contains two parts: the user’s view of the database (Conceptual View) and the DBMS’s view of the data (Logical View). EasyAsk uses the conceptual view information to understand the questions and it uses the logical view information to generate the SQL to answer the question. With these components in mind, let’s see how EasyAsk performs the steps we mentioned earlier:

  1. Understand the question
  2. Find relevant databases
  3. Extract the answer from each relevant database
  4. Pick the best answer and display it
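
Before walking through the steps, it may help to picture the two-part dictionary described above. The following is only a minimal sketch of the idea, not EasyAsk’s actual data model: the class names, the `known_terms` helper, and the sample sales schema are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualView:
    """The user's vocabulary: business terms mapped to logical names (assumed shape)."""
    terms: dict = field(default_factory=dict)


@dataclass
class LogicalView:
    """The DBMS's view: table names mapped to their columns (assumed shape)."""
    tables: dict = field(default_factory=dict)


@dataclass
class Dictionary:
    """One dictionary per database, holding both views."""
    database: str
    conceptual: ConceptualView
    logical: LogicalView

    def known_terms(self, question):
        """Return the conceptual terms that appear in the question, in order."""
        words = question.lower().replace("?", "").split()
        return [w for w in words if w in self.conceptual.terms]


# Hypothetical dictionary for a sales database.
sales = Dictionary(
    database="sales",
    conceptual=ConceptualView({"customers": "customer", "revenue": "order_total"}),
    logical=LogicalView({
        "customer": ["id", "name", "region"],
        "orders": ["id", "customer_id", "order_total"],
    }),
)

print(sales.known_terms("Which customers had the most revenue?"))
```

The conceptual side is what gets consulted when reading the question; the logical side is what gets consulted when writing SQL.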

Understand the Question

Rather than try to understand the question in the abstract, EasyAsk tries to understand the question in the context of each specific database. It does this by broadcasting the original English question to each dictionary that it is using. Each dictionary responds (within a tenth of a second) with whether it can understand the question in light of the database it specializes in and whether it can successfully generate a SQL query to answer the question from that database.

Find Relevant Databases

Of course, the set of relevant databases is the set of databases that responded positively to the broadcast query. Note that this is not a commitment to answer the question, only an assertion that it makes sense in the context of this specific database. For example, both a dictionary for a Sales database and a dictionary for a Marketing database may respond positively to the same question, but a dictionary for a human resources database is unlikely to respond positively to that question.
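
The broadcast-and-filter idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not how EasyAsk is implemented: the vocabularies, the stop-word list, and the `can_answer` check (a crude stand-in for real linguistic analysis) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dictionaries: database name -> terms that dictionary understands.
DICTIONARIES = {
    "sales": {"customers", "orders", "revenue", "region"},
    "marketing": {"customers", "campaigns", "leads", "region"},
    "hr": {"employees", "salary", "department"},
}

# Words ignored by this toy analysis.
STOP_WORDS = {"which", "what", "the", "had", "most", "by", "in", "of", "show", "me"}


def can_answer(vocabulary, question):
    """Stand-in for linguistic analysis: every content word must be known."""
    words = {w.strip("?.,").lower() for w in question.split()}
    content = words - STOP_WORDS
    return bool(content) and content <= vocabulary


def broadcast(question):
    """Send the question to every dictionary in parallel; keep the positive replies."""
    names = list(DICTIONARIES)
    with ThreadPoolExecutor() as pool:
        replies = list(pool.map(lambda n: can_answer(DICTIONARIES[n], question), names))
    return [name for name, ok in zip(names, replies) if ok]


print(broadcast("Show me customers by region"))
```

With these sample vocabularies, the Sales and Marketing dictionaries both respond positively and the HR dictionary does not, which mirrors the example in the text.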

Extract the answer from each relevant database

The first step is to generate the SQL that was promised by the initial linguistic analysis. The second step is to execute that SQL. Note that this does not guarantee an answer, since the result set may be empty. Also, the time it takes to execute the SQL depends on the size of the database and how well the database itself is indexed. In general, these times would be longer than the time allowed in Jeopardy to answer the question. However, in a business situation, answering a question in under a minute is often quite sufficient.
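
The two steps, and the empty-result case, can be illustrated with Python’s built-in `sqlite3` module. This is a sketch only: the `LOGICAL_VIEW` mapping, the `generate_sql` and `answer` functions, and the sample `customer` table are assumptions made for the example, and real SQL generation would need to handle joins, aggregates, and much more.

```python
import sqlite3

# Hypothetical logical view for one database: concept -> (table, column).
LOGICAL_VIEW = {
    "customers": ("customer", "name"),
    "region": ("customer", "region"),
}


def generate_sql(select_concept, filter_concept):
    """Step 1: build a parameterized SELECT from two concepts in the same table."""
    table, column = LOGICAL_VIEW[select_concept]
    filter_table, filter_column = LOGICAL_VIEW[filter_concept]
    if table != filter_table:
        raise ValueError("this sketch does not handle joins")
    return f"SELECT {column} FROM {table} WHERE {filter_column} = ? ORDER BY {column}"


def answer(conn, select_concept, filter_concept, value):
    """Step 2: run the SQL; an empty list means it ran but found nothing."""
    sql = generate_sql(select_concept, filter_concept)
    return [row[0] for row in conn.execute(sql, (value,))]


# Sample in-memory database standing in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Acme", "East"), ("Globex", "West"), ("Initech", "East")],
)

print(answer(conn, "customers", "region", "East"))   # rows found
print(answer(conn, "customers", "region", "North"))  # valid SQL, empty result set
```

The second call shows why a positive response to the broadcast is not a guarantee of an answer: the SQL is perfectly valid, but no rows match.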

Pick the best answer and display it

The good news is that there is no requirement to come up with a short, pithy answer that must be phrased in the form of a question. Most business users would prefer their answer to be in a spreadsheet rather than voiced to them anyway. This allows each relevant database that can answer the question to put its answer on a different tab within the spreadsheet. The user can then decide which answer is the best answer for their purposes.
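
As a rough illustration of the one-tab-per-database idea, the sketch below turns each non-empty answer into its own tab. Python’s standard library has no writer for multi-tab spreadsheet files, so each tab here is simply CSV text keyed by database name; a real implementation would presumably use a spreadsheet library. The `build_tabs` function and the sample answers are hypothetical.

```python
import csv
import io


def build_tabs(answers):
    """Map each database with a non-empty answer to one tab of CSV text."""
    tabs = {}
    for database, (header, rows) in answers.items():
        if not rows:
            continue  # this database understood the question but returned no data
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        tabs[database] = buffer.getvalue()
    return tabs


# Hypothetical result sets: database -> (column headers, rows).
answers = {
    "sales": (["customer", "region"], [("Acme", "East"), ("Initech", "East")]),
    "marketing": (["lead", "region"], [("Umbrella", "East")]),
    "support": (["customer", "region"], []),
}

tabs = build_tabs(answers)
for name, text in tabs.items():
    print(f"--- tab: {name} ---")
    print(text, end="")
```

No ranking happens here. Every database with data gets a tab, and the choice of the best answer is left to the user, as described above.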

Combining Watson and EasyAsk

Since the two question/answer engines, Watson and EasyAsk, fit together in such a complementary fashion, it makes sense to think about combining them within an environment that is seamless to the user. Since the EasyAsk architecture is already distributed across so many dictionaries and databases, it is easy to add Watson to the list of engines to receive the original natural language query. Its results could appear in a tab of its own that represents the best answer from the unstructured side of the company’s data.
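
One way to picture this combination is a shared list of engines that all receive the same question. The sketch below is purely illustrative: neither product exposes the functions shown, `document_engine` is only a placeholder for an unstructured-data engine such as Watson, and the returned strings stand in for real result sets.

```python
def sales_engine(question):
    """Hypothetical structured-data engine for a sales database."""
    return "3 rows from sales" if "customers" in question.lower() else None


def hr_engine(question):
    """Hypothetical structured-data engine for a human resources database."""
    return "5 rows from hr" if "employees" in question.lower() else None


def document_engine(question):
    """Placeholder for an unstructured-data engine; always offers a best passage."""
    return "best passage from company documents"


def ask(question, engines):
    """Send the question to every engine; keep one tab per engine that answers."""
    tabs = {}
    for name, engine in engines.items():
        result = engine(question)
        if result is not None:  # None means the engine declined the question
            tabs[name] = result
    return tabs


engines = {"sales": sales_engine, "hr": hr_engine}
engines["documents"] = document_engine  # adding one more engine to the list

print(ask("Which customers are in the East?", engines))
```

Because every engine has the same shape (a question in, an answer or nothing out), adding the unstructured side is a one-line change in this sketch.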

If you were as impressed as I was watching Watson’s performance on Jeopardy and feel that question/answer technology would be a benefit to your company, you may be a lot closer to that goal than you think. While you’re keeping an eye out for IBM’s announcement on when it will be shipping a version of Watson to its corporate customers, you can begin building EasyAsk dictionaries for the structured databases that you already have. This incremental approach will dovetail nicely with Watson once it becomes available.
