Basic Overview

What Discovery is

Discovery is software that performs natural language processing, a branch of artificial intelligence concerned with the interactions between humans and computers using human languages such as English. The field's main challenge is to enable computer programs to understand human text or speech as it is written or spoken, at least as far as is possible for a machine.

Put another way, natural language processing (or NLP) attempts to mimic a human's understanding of human language.

What Discovery does

At its most basic, a user types an English sentence, either a statement or a question, and Discovery

determines the part of speech of each word as it is used in that sentence, which matters because many words can, on their own, serve as more than one part of speech;

disambiguates each word, i.e. determines, at a minimum, which of its senses or dictionary definitions apply in the context of the sentence; and

diagrams the sentence: it identifies different series of words in the sentence as distinct grammatical constructions (different kinds of phrases and clauses) and assembles these into a complete, hierarchical, context-dependent diagram of the sentence that establishes each word’s relationship to every other. This is not much different from the sentence diagramming we learned in grade school.
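To make the idea of a hierarchical diagram concrete, here is a minimal Python sketch of one common way to represent it: a tree of labeled constituents. The class `Node`, the helpers `bracketed`, `path_to` and `relation`, and the label set (S, NP, VP, DET, ADJ, N, V) are hypothetical illustrations, not Discovery's internal format. The `relation` function reports the smallest phrase or clause that contains two given words, which is one simple way to read a word's relationship to another off the diagram.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One constituent of a toy sentence diagram (a phrase, a clause, or a single word)."""
    label: str                                   # e.g. "S", "NP", "VP", "N" (hypothetical label set)
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on leaf (word-level) nodes


def leaf(label: str, word: str) -> Node:
    """Build a word-level node tagged with its part of speech."""
    return Node(label, word=word)


def bracketed(node: Node) -> str:
    """Render the diagram in labeled-bracket notation."""
    if node.word is not None:
        return f"({node.label} {node.word})"
    inner = " ".join(bracketed(child) for child in node.children)
    return f"({node.label} {inner})"


def path_to(node: Node, word: str) -> Optional[List[Node]]:
    """Constituents from the root down to the first leaf holding `word`, or None."""
    if node.word == word:
        return [node]
    for child in node.children:
        below = path_to(child, word)
        if below is not None:
            return [node] + below
    return None


def relation(root: Node, w1: str, w2: str) -> str:
    """Label of the smallest constituent that contains both words."""
    p1, p2 = path_to(root, w1), path_to(root, w2)
    if p1 is None or p2 is None:
        raise ValueError("both words must occur in the sentence")
    shared = root
    for a, b in zip(p1, p2):
        if a is not b:          # the two paths diverge here
            break
        shared = a
    return shared.label


# "The old dog chased a cat."
sentence = Node("S", [
    Node("NP", [leaf("DET", "The"), leaf("ADJ", "old"), leaf("N", "dog")]),
    Node("VP", [leaf("V", "chased"),
                Node("NP", [leaf("DET", "a"), leaf("N", "cat")])]),
])

print(bracketed(sentence))
print(relation(sentence, "old", "dog"))     # NP: both inside the subject noun phrase
print(relation(sentence, "chased", "cat"))  # VP: the object belongs to the verb phrase
print(relation(sentence, "dog", "cat"))     # S: they meet only at the clause level
```

A fuller diagram would also record grammatical functions such as subject and object, and would identify repeated words by position rather than by looking up the first occurrence, as `path_to` does here for brevity.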

However formidable and intricate this process has become, Discovery’s programming has been streamlined and modularized to the point that it typically provides a complete analysis of a sentence—fulfilling all three of these objectives—in less than one second.

Why try to get computers to "understand" language...

...whatever "understanding" would mean to a computer? Why not? Language is the means by which we humans encode and exchange, among ourselves, knowledge of any kind of subject matter—our thoughts, ideas, plans, experiences, beliefs and logic. Without it, the most rudimentary forms of thought would be impossible. Yet with it, humanity has established and carried forth an ever-advancing global civilization.

Given that the mass of knowledge today dwarfs that of any previous era in human history, a system able to store and organize knowledge expressed in language would let us do what we already do through language, only better. NLP has made inroads into applications that support human productivity in service and e-commerce, but largely by narrowing the scope of those applications: minimizing the vocabulary and limiting the sentence structures they handle. Discovery’s English proficiency, on the other hand, has no such limitations.

Problems to overcome in natural language processing

The main challenge in natural language processing is to enable computer programs to mimic, as closely as possible, a human's understanding of language. But how would a machine be able to "understand" human language?

To answer that question, we should first ask how we humans do so.

When we wish to convey knowledge of an idea, opinion or event, we instantly and subconsciously select words representing the things, actions and qualities involved that most accurately convey that knowledge. Then we just as instantly combine those words in a particular order to convey our meaning.

One big problem in NLP, though, is that words are highly ambiguous. A word could have more than one part of speech. And as a particular part of speech, a word could have more than one meaning. So how would a computer program examine words in a sentence and figure out which parts of speech and definitions apply?

The same way we humans do: we see or hear how words are arranged by grammar and syntax, and infer from sentence structure which parts of speech and definitions are the ones intended.
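The reasoning above, letting sentence structure settle the part of speech and then letting the part of speech narrow the candidate definitions, can be illustrated with a small Python sketch. It is not Discovery's method: the lexicon, the sense identifiers and glosses, and the two positional rules in `choose_pos` are invented for illustration. The sense choice uses the simplified Lesk algorithm (pick the definition whose gloss shares the most words with the sentence), a well-known baseline swapped in here as a stand-in.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> part of speech -> [(sense id, gloss), ...].
# Sense ids and glosses are invented for this sketch, not taken from WordNet.
LEXICON: Dict[str, Dict[str, List[Tuple[str, str]]]] = {
    "book": {
        "NOUN": [
            ("book/n/work", "a written work that people read, printed on pages and bound"),
            ("book/n/ledger", "a record of financial accounts kept by a business"),
        ],
        "VERB": [
            ("book/v/reserve", "reserve a seat, room or ticket in advance, for a flight, hotel or show"),
            ("book/v/charge", "record a charge against a suspect at a police station"),
        ],
    },
}

DETERMINERS = {"a", "an", "the", "this", "that", "my", "your"}
VERB_CUES = {"to", "please", "will", "can"}   # toy cues that the next word fills a verb slot
STOPWORDS = {"a", "an", "the", "to", "for", "of", "or", "and", "at", "on", "in", "by", "that", "about"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def choose_pos(tokens: List[str], i: int, options: List[str]) -> str:
    """Pick a part of speech for tokens[i] from the word just before it (two toy rules)."""
    prev = tokens[i - 1] if i > 0 else None
    if prev in DETERMINERS and "NOUN" in options:
        return "NOUN"   # "the book": a determiner opens a noun phrase
    if (prev is None or prev in VERB_CUES) and "VERB" in options:
        return "VERB"   # "Please book ...", "will book ...": a verb slot
    return options[0]   # otherwise fall back to the first listed part of speech


def choose_sense(tokens: List[str], i: int, senses: List[Tuple[str, str]]) -> str:
    """Simplified Lesk: the sense whose gloss shares the most content words with the sentence."""
    context = set(tokens) - STOPWORDS - {tokens[i]}

    def overlap(sense: Tuple[str, str]) -> int:
        return len(context & (set(tokenize(sense[1])) - STOPWORDS))

    return max(senses, key=overlap)[0]   # ties go to the first-listed sense


def analyze(sentence: str, word: str) -> Tuple[str, str]:
    """Resolve the part of speech first, then choose among that part of speech's senses."""
    tokens = tokenize(sentence)
    i = tokens.index(word)
    entry = LEXICON[word]
    pos = choose_pos(tokens, i, list(entry))
    return pos, choose_sense(tokens, i, entry[pos])


for s in ["Please book a flight to Boston.",
          "She read the book about Spanish history.",
          "The police will book the suspect tonight."]:
    print(s, "->", analyze(s, "book"))
```

Run as written, the three sentences resolve to the reserve sense of the verb, the written-work sense of the noun, and the police-charge sense of the verb. Two rules keyed to the preceding word are far too weak for real English; the sketch only shows the order of the steps, in which structure constrains the part of speech before any definitions are compared.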

We share a common, unspoken agreement about what the words in a language mean and how they should be arranged to convey meaning. This is the only means by which we’re able to communicate through language at all. But with it, we’re able to form a practically infinite number of sentences to express anything we wish.

How Discovery can be used

Knowledge management

Knowledge management is defined as a method of codifying what employees, suppliers, business partners and customers know, and then sharing that knowledge with employees and other companies to devise best practices. In a broader sense, it's a way for any group to improve the creation, retention, sharing and reuse of its knowledge, insights and intellectual assets. With conventional methods, such as in-person discussions, email exchanges and forums, much of that content is often lost.

Ordinary information management systems, by contrast, manage only a specific range of data, depending on their purpose. New data can be added to such systems, but unless their developers consistently release updates, the kinds of information the systems were designed to store remain fixed and unalterable.

Discovery breaks through this information "straitjacket." Given its unrestricted ability to analyze English sentences, we would like to make Discovery available as a feature of knowledge management systems already on the market. Discovery presents an even better, more natural approach to knowledge management: a system that can collect and manage the content of any body of knowledge expressed in English, about virtually any subject matter, and allow users to retrieve that knowledge simply by asking the system questions in English. In effect, one could conduct knowledge management through a chat- or messenger-like conversation with the system, much as one would with a human being.
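As a rough illustration of answering English questions from stored English statements, the following Python sketch assumes the sentence analyzer has already reduced each statement to a subject, a verb in its base form and an object, and each question to the same shape with its wh-word left as `None`. `Fact`, `KnowledgeStore`, `tell` and `ask` are hypothetical names; Discovery's actual knowledge representation is not described here and would need to be far richer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """One analyzed statement reduced to subject, verb (base form) and object."""
    subject: str
    verb: str
    obj: str


class KnowledgeStore:
    """Hypothetical store of facts that answers questions with one unknown slot."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def tell(self, subject: str, verb: str, obj: str) -> None:
        """Record a statement the sentence analyzer has already decomposed."""
        self.facts.append(Fact(subject, verb, obj))

    def ask(self, subject: Optional[str], verb: str, obj: Optional[str]) -> List[str]:
        """Return fillers for the slot passed as None (the question's wh-word)."""
        def same(a: str, b: str) -> bool:
            return a.lower() == b.lower()   # case-insensitive phrase comparison

        answers = []
        for f in self.facts:
            if not same(f.verb, verb):
                continue
            if subject is not None and not same(f.subject, subject):
                continue
            if obj is not None and not same(f.obj, obj):
                continue
            answers.append(f.subject if subject is None else f.obj)
        return answers


store = KnowledgeStore()
# "Alice manages the Denver warehouse."  "Bob manages the payroll system."
store.tell("Alice", "manage", "the Denver warehouse")
store.tell("Bob", "manage", "the payroll system")

# "Who manages the payroll system?"  (subject unknown)
print(store.ask(None, "manage", "the payroll system"))
# "What does Alice manage?"  (object unknown)
print(store.ask("alice", "manage", None))
```

The two queries return `['Bob']` and `['the Denver warehouse']`. Matching whole phrases as strings keeps the sketch short; a practical system would compare analyzed structures and word senses instead.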

Language translation

At best, when translating text from one language to another, software like Google Translate merely makes its best guess at translatable words and sentence structures.

The extent of Discovery's ability to analyze sentences, on the other hand, provides a unique means to build translation software that needs only occasional, and more importantly minimal, assistance from the user. The result would be translations as accurate as the words, grammar and syntax of the other language allow.

Discovery now uses the original English WordNet, and versions of WordNet exist in a number of other languages, Spanish among them. The associations between directly translatable word entries in these two versions, together with associations between the grammatical structures of the two languages, will provide the means to construct a Spanish-to-English/English-to-Spanish prototype. As a proof of concept, that prototype could serve as the basis of translation software for other language pairs.
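The following Python sketch shows how shared entries between two wordnets could support translation once a word's sense has been disambiguated. It assumes that the English and Spanish versions link corresponding entries through a common synset identifier; the identifiers and the tiny dictionaries are invented, and `senses_of` and `translate` are hypothetical helpers. It covers word entries only, not the association of grammatical structures that the prototype would also need.

```python
from typing import Dict, List

# Toy stand-ins for an English and a Spanish wordnet that share synset identifiers.
# The identifiers are invented for this sketch.
ENGLISH: Dict[str, List[str]] = {
    "syn:financial-institution": ["bank"],
    "syn:river-edge": ["bank", "riverbank"],
    "syn:long-seat": ["bench"],
}
SPANISH: Dict[str, List[str]] = {
    "syn:financial-institution": ["banco"],
    "syn:river-edge": ["orilla", "ribera"],
    "syn:long-seat": ["banco"],
}


def senses_of(word: str, wordnet: Dict[str, List[str]]) -> List[str]:
    """All synset ids in which `word` appears as a lemma."""
    return [sid for sid, lemmas in wordnet.items() if word in lemmas]


def translate(word: str, sense: str,
              source: Dict[str, List[str]], target: Dict[str, List[str]]) -> List[str]:
    """Translate a word once its sense is known, by following the shared synset id."""
    if sense not in senses_of(word, source):
        raise ValueError(f"{word!r} has no sense {sense!r}")
    return target.get(sense, [])


print(senses_of("bank", ENGLISH))                             # two candidate senses
print(translate("bank", "syn:river-edge", ENGLISH, SPANISH))   # ['orilla', 'ribera']
print(translate("banco", "syn:long-seat", SPANISH, ENGLISH))   # ['bench']
```

The lookup depends on disambiguation in both directions: English "bank" maps to different Spanish words depending on its sense, and Spanish "banco" maps back to either "bank" or "bench", so the sense has to be settled before the shared identifier can be followed.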

Voice recognition

The same sentence analysis capability may also serve to increase the accuracy of speech recognition software, such as that used in Siri.