Table of Contents
1 Introduction
2 High Hopes
2.1 See Emily Play
2.2 Keep Talking
2.3 Signs of Life
3 Marooned
3.1 Lost for Words
3.2 Learning to Fly
3.3 A New Machine
3.4 Poles Apart
4 Conclusion
5 Bibliography
Abstract
One key element of improved future human-computer interaction is the development of a natural language parser and generator. To be useful, such a system must employ an intelligent agent to aid in the understanding of language. The goal of this project is to develop an unprecedentedly robust interface for an intelligent agent. We used unorthodox development platforms, techniques, and paradigms to attempt to accomplish this goal. We discovered a fundamental flaw in natural language processing, and propose the next step toward a solution.
“What this field needs is a few more well-documented failures.” --Anonymous
In the early stages of this project, we expected to do a very "traditional" MQP: one with a clearly stated, well-defined problem, an implemented (and implementable) solution, and a report on the results and conclusions of the project. What we ended up with, as we are sure any reader will agree, is very unconventional in that light.
Our original problem statement described the need for an interface for an artificially intelligent entity "with a robustness that surpasses all other attempts to date." To accomplish this, we decided, we needed to use more unorthodox, experimental methods of creating a natural language interface for an intelligent agent. We chose, as our weapons, two unusual tools for the implementation of our natural language parser and generator. The first is a development environment that is seldom, if ever, used for serious programming endeavors. The second is an uncommon "native" language for the intelligent agent. Both of these are described in detail in the following paper.
Having reached what we deemed a satisfactory conclusion to our first goal, we moved past the interface of the agent and started to work on the intelligent agent itself. What we intended to create was, overall, a "completely empathic" intelligent agent. What we found, however, was what we consider to be a flaw in the use of natural language. Going back over the research we had done for the first part of the project, we decided we needed to do more research. In the end, we concluded that we could not get past this stumbling block. We were left only with the option of making conjectures as to how one might get past it at a later date. We documented our thoughts on the subject, and hope to be able to implement them in the near future.
Notwithstanding any of the above, we submit for approval "Speak to Me: An Experiment in Artificial Intelligence."
This paper, much like the entire project, is divided into two major sections. In the first section, we will discuss the experimental portion of the project. This section deals with our choice of development platform, the natural language we chose to use during development, and the details of the implemented parser.
The second section of the paper deals with the more theoretical portion of the project, detailing what we see as the point of impassability in dealing with natural language. In this section we also present a proposal for a potential solution.
In this section of the paper we will discuss our two major design decisions: the use of MOO as a development environment, and our use of the natural language Esperanto as the basis for the parser and prose generator of our intelligent agent. Following descriptions of these choices and our reasons for them, we will discuss in detail the mechanics of the parser and prose generator.
Emily, the working name for our intelligent agent, was not developed in any of the more traditional programming languages used in the field of artificial intelligence. Instead, we chose to develop on a platform traditionally used for gaming, known as MOO. This section will provide the reader with a general history of the MOO and an explanation of why such an unorthodox design decision was made.
MOO is an acronym for "MUD, Object-Oriented." MUD, in turn, is an acronym for "Multi-User Dungeon." The basic concept is that a MOO is a text game, of sorts, supporting multiple players over the Internet. Stephen White of the University of Waterloo, the developer of the original MOO, coded a minimal server and an interpreter for the language that drives the internal objects. Shortly after, in 1990, the project was passed on to Pavel Curtis of the Xerox Palo Alto Research Center. LambdaMOO, as the project was called, was maintained and developed by Curtis and others over six years into the server that is used in this project. By this time, the MOO server had evolved a robust programming language, an object-oriented database, and complete Internet connectivity. This gaming environment may not seem to be an ideal development platform for such a sophisticated experiment in artificial intelligence. A more technical analysis of the internals of a MOO server, however, shows that this impression is mistaken.
Looking past the gaming environment the MOO server was originally used for, and viewing it under a new paradigm, reveals a complex software development environment. All the data contained in the LambdaMOO server is represented in a fully programmable object-oriented database. The language internal to the database has no formal name, and can best be thought of as a cross between QuickBASIC and C++. Although the language is geared towards ease of use and learning for the beginning programmer, it is not limited in power. It supports multi-threading, lists, sets, objects, inheritance, recursion, aggregate data types, and TCP/IP network connections, all in a real-time, dynamic environment. This gives the developer all of the flexibility and power of the more traditional AI languages, such as LISP and Prolog, in a simple, robust, C-like language. It also brings all the benefits of a built-in object-oriented database. These features are a superset of the properties a language needs for undertaking this type of project. The current LambdaMOO server and original site are still available for public consumption at telnet://lambda.moo.mud.org:8888/.
Having knowledge of the power of the MOO server, we chose to use it as our development platform. The main reason for doing so was the seeming robustness of the system, when compared to LISP and Scheme. LISP and Scheme both lack internal object-oriented databases, which would have forced us to create our own systems for handling frames and other data objects. Although we acknowledge that creating an object-oriented database is not impossible within LISP or Scheme, we felt that our development time would be better spent using a proven database.
Secondly, the MOO server is inherently multi-threaded and network-aware. Since we would be able to develop in a multi-threaded, multi-user environment, we would gain the potential for experimenting with demons, and conversations with multiple users as opposed to strict dialogues. Furthermore, we would be able to experiment with listening to conversations between multiple third parties, as opposed to the agent always being involved in the conversation. In short, we would be able to recreate many of the same social situations that we encounter on a daily basis. To take it one step further, we would be able to have multiple agents conversing with one another, all using the same basic medium for communication as a human “agent.” Some of these experiments may be possible in LISP, Scheme, or Prolog, but not at such a minimal cost of development time.
To help the reader understand some of the details of the project, we need to explain some of the fundamental components of MOO programming. Most importantly, the composition of objects must be clearly described to avoid confusion. Objects within the MOO share a set of common characteristics.

First, each has a unique object number. This allows the MOO to address any object, no matter what name it has been given. Second, all objects have parents and can themselves be parents. As this feature is only exploited at the lowest levels of the code, we will not discuss it further here. Next, each object has a set of properties, which are loosely typed entities within the object used to store data. Each property may be a string, integer, object number, or list. Lists, of course, are heterogeneous collections that may contain any of the four major data types, including other lists. Finally, the methods or functions declared on a MOO object are called verbs.

For the purposes of this paper, and to avoid confusion with natural language verbs, any reference to a MOO-type verb will be written as :verb(). This is the same format in which a MOO :verb() would be written within a piece of code.
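To make the object model concrete, here is a minimal Python sketch of it. It is not MOO code and not the LambdaMOO server's implementation. The class `MooObject` and its `get_prop()` and `call()` helpers are hypothetical names chosen for illustration. The sketch assumes that both property lookups and :verb() lookups fall back to an object's parent when the object itself does not define them, which is how inheritance behaves in LambdaMOO.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Union

# The four MOO data types as modeled here: string, integer, object, or a
# heterogeneous (possibly nested) list.
Value = Union[str, int, "MooObject", list]


@dataclass
class MooObject:
    number: int                                     # unique object number (#6916 -> 6916)
    name: str
    parent: Optional["MooObject"] = None            # objects inherit from their parent
    properties: Dict[str, Value] = field(default_factory=dict)
    verbs: Dict[str, Callable[..., Value]] = field(default_factory=dict)

    def _lookup(self, attr: str, key: str):
        # Search this object first, then each ancestor in turn.
        obj: Optional["MooObject"] = self
        while obj is not None:
            table = getattr(obj, attr)
            if key in table:
                return table[key]
            obj = obj.parent
        raise AttributeError(f"#{self.number} has no {attr} entry {key!r}")

    def get_prop(self, prop: str) -> Value:
        return self._lookup("properties", prop)

    def call(self, verb: str, *args: Value) -> Value:
        # The :verb() receives the object it was invoked on (MOO's `this`) first.
        return self._lookup("verbs", verb)(self, *args)


# Usage: a word object that inherits default flags and a verb from a generic parent.
generic_word = MooObject(1, "Generic Parsed Word", properties={"noun": 0, "plural": 0})
generic_word.verbs["describe"] = lambda this: f"{this.name} (#{this.number})"
lavis = MooObject(6916, "lavis", parent=generic_word,
                  properties={"root": "lav", "suffixes": ["is"]})
print(lavis.call("describe"), lavis.get_prop("suffixes"), lavis.get_prop("noun"))
```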
We now explain why we chose Esperanto rather than English, the more traditional language for intelligent agent interfaces. Esperanto is a planned language, meaning that the constructs that form it were deliberately engineered. They were designed to minimize ambiguity, so that the intentions of the speaker would not be lost in the expression of the language. Although it is a well-planned language, it is not perfect. We used Esperanto as the basis for the language of our interface, but modified its rules slightly in areas where it was vague. Most importantly, studying the rules of the language convinced us that Esperanto was better suited to our purposes than any of the unplanned languages we had to choose from.
Esperanto was created in 1887 by Ludwig Lejzer Zamenhof, an ophthalmologist born in the Russian Empire. Growing up in Eastern Europe, he was constantly surrounded by people who spoke many different languages. Hoping to unite everyone under a common language, Zamenhof created Esperanto. He built it from the ground up, with an emphasis on being easy to learn and easy to speak. It sounds similar to Spanish, with a touch of French.
Esperanto is what is known as a "grammar-encoded" language: you can tell exactly what type of word something is, and what role it plays in the sentence, merely by examining the word. Every word in Esperanto is built on what is known as a "root word," and the root words make up the base of the language. The next major group of elements in the language is the set of affixes that can be applied to those root words. Esperanto has a number of standard prefixes and suffixes that may be applied to any root word to change its meaning. Affixes are used for simple things like pluralization, but they also have more complex uses, and can even change the meaning of a word entirely. Affixes also operate directly at the level of grammar, identifying the nouns, adjectives, verbs, and direct objects in any given sentence. Here is a list of some of the most common affixes, and what they are used for.
Prefixes:
MAL Opposite
BO Relationship by marriage, -in-law
Suffixes:
O Identifies word as a noun
A Identifies word as an adjective
E Identifies word as an adverb
J Identifies that word is plural, or modifies something plural (adjectives must agree with the nouns they modify, both in number and in case)
N Identifies that word is a direct object (the accusative case), or modifies something that is a direct object
I Identifies word as an infinitive verb
IS Identifies word as a verb in the past tense
AS Identifies word as a verb in the present tense
OS Identifies word as a verb in the future tense
IN Female / Feminine
IST One who does something habitually or professionally
As you can see from this very small set of affixes, adding just a few different suffixes can change the meaning of a word. For example, take the root vend, meaning "sell."
Esperanto English
Vendi To sell
Vendis Sold
Vendas Sell(s), is selling
Vendos Will sell
Vendo (A) sale
Vendoj Sales
Vendon A sale (as the direct object of a sentence)
Vendisto (A) seller, one who sells
Vendistino (A) seller (who is female)
Vendistoj Many sellers
Vendistinoj Many sellers (who are female)
Sometimes the entire word can shift meaning with the addition of a single affix. For example:
Esperanto English
Filo Son
Filino Daughter
Knabo Boy
Knabino Girl
Patro Father
Patrino Mother
Bopatro Father-in-law
Bopatrino Mother-in-law
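Because each affix has a fixed meaning, generating these words is plain string concatenation. The Python sketch below assumes, for illustration only, a hypothetical `compose()` helper. It joins prefixes, a root, and suffixes in order, with the innermost suffix first. It performs no check on whether the resulting combination is meaningful.

```python
def compose(root: str, prefixes: tuple = (), suffixes: tuple = ("o",)) -> str:
    """Build a word from prefixes, a root, and suffixes (innermost suffix first)."""
    return "".join(prefixes) + root + "".join(suffixes)


# (root, prefixes, suffixes, English gloss)
FAMILY = [
    ("fil", (), ("o",), "Son"),
    ("fil", (), ("in", "o"), "Daughter"),
    ("patr", ("bo",), ("o",), "Father-in-law"),
    ("patr", ("bo",), ("in", "o"), "Mother-in-law"),
    ("patr", (), ("in", "o", "j"), "Mothers"),
    ("patr", (), ("in", "o", "n"), "Mother (as direct object)"),
]

for root, prefixes, suffixes, english in FAMILY:
    print(f"{compose(root, prefixes, suffixes):<12} {english}")
```

The same three ingredients (root, prefix list, suffix list) are what the parser later recovers when it breaks a word apart.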
The list goes on and on. For a complete list of Esperanto affixes, visit http://www.aitec.edu.au/~bwechner/Documents/Esperanto/FEC.html/Affixes.html
The decision to use Esperanto as our primary language was not an easy one. Our original plan was to use English, simply because it is our native language. Stepping back and looking at it from a scientific perspective, however, English was not the best choice. It is a difficult language to learn, largely because nearly every one of its countless rules has one or more exceptions. The spelling rule "I before E, except after C," with exceptions such as neighbor, weigh, and height, is just one example. Word order is also extremely important in determining the meaning of any given sentence. For instance, "He was OK" is very different from "Was he OK?" Complex sentences can often be difficult to understand, especially when the reader or listener is uncertain which object is the direct object and which is the indirect object. In English, these must be determined by understanding the sentence, and even then people often misinterpret them.

An additional challenge arises in a strictly text-oriented medium of communication. Without the ability to detect, much less interpret, inflections in the speaker's voice, one must rely on word order to distinguish statements from questions. Punctuation, although the first seemingly viable alternative, is unreliable: the system must be robust enough to cope with those who speak, or in this case type, their sentences incorrectly.
As was stated before, every word in Esperanto is grammar-encoded: a small set of affixes determines what type of word something is. Determining whether a given word is a noun, adjective, verb, or adverb is therefore much simpler than in English. Word order in Esperanto is also only partly significant. Adjectives and adverbs are usually grouped near whatever they modify, and adjectives are additionally encoded to agree with the nouns they modify. Beyond that, word order is largely unimportant. The accusative suffix -n, not the noun's position in the sentence, determines whether a noun is the subject or the direct object. For example, "Mian fraton lavis mia patrino." is equivalent to "Mia patrino lavis mian fraton." In English, the first sentence could be translated as "My brother was washed by my mother," and the second as "My mother washed my brother." Although the translations differ slightly in English, in Esperanto the two sentences mean the same thing.
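The following Python sketch illustrates the point under deliberately simplified, assumed rules. Words ending in -o (nouns) or -a (adjectives), optionally followed by plural -j and accusative -n, are treated as nominals. Words ending in -as, -is, or -os are treated as finite verbs. The function name `case_roles()` is hypothetical. Because each role is read from the -n ending rather than from the word's position, both word orders of the example sentence yield the same grouping.

```python
import re

# Toy patterns (assumptions, not the project's dictionary): a stem, then the
# noun/adjective ending -o/-a, optional plural -j, optional accusative -n.
NOMINAL = re.compile(r"^(\w+?)([oa])(j?)(n?)$")
# Finite verbs: present -as, past -is, future -os.
VERB = re.compile(r"^\w+?(as|is|os)$")


def case_roles(sentence: str) -> dict:
    """Group words by grammatical role using only their endings."""
    roles = {"verb": [], "nominative": [], "accusative": []}
    for word in sentence.lower().strip(".!?").split():
        if VERB.match(word):
            roles["verb"].append(word)
        elif (m := NOMINAL.match(word)) is not None:
            # The -n ending, not the position, decides subject vs. direct object.
            roles["accusative" if m.group(4) else "nominative"].append(word)
    return roles


a = case_roles("Mia patrino lavis mian fraton.")
b = case_roles("Mian fraton lavis mia patrino.")
print(a)
print(a == b)
```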
Taking all of this into account, we determined that a slightly modified version of Esperanto would be ideal for our project. Not only does the language itself provide a certain degree of robustness, but it is also easy enough to learn to be a practical solution to the natural language parsing problem. Although Esperanto, unlike English, is both planned and foreign to us, it is still a natural language. As the parser described in the following sections demonstrates, Esperanto is a more than adequate solution.
Emily’s parser is implemented as a series of :verb() compositions. Each “level” of the composition relies on the :verb()’s of the lower level to provide arguments to it. By using this composition methodology, coupled with an extremely flexible, real-time update policy, we have designed an efficient and robust parser. Not only can we, as developers, tweak the code on the fly, but an intelligent agent would also be able to change the code without the intervention of a programmer and with minimal effort in the future.
There are five functional levels to the Esperanto parser we have created, all of which are described in the following sections. The first is the interface to the rest of the MOO, the Generic AI Bot Player Class. The second level is the Esperanto Dictionary, the container for all the words in the language and the algorithm to parse them. The third, fourth, and fifth levels deal with extracting the sentences, phrases, and words from a line of text.
Level 1 - Generic AI Bot Player Class
This object is, of course, the abstraction of the AI we refer to as Emily throughout the project. At the parser level, it is largely responsible for three things: taking in a line of text from its environment, invoking the parser, and shipping the parser's output off to be processed for context. Emily "reads" something when some external :verb() invokes her :tell() :verb(), provided that she is awake (that is, connected to the MOO). In order to determine what she is reading, the text must be interpreted, and meaning eventually extracted from it. The parser is responsible for taking in the text and converting it to a standard format for use with the rest of the system. The parser itself only deals with one "line" at a time, being one complete sentence or one sentence fragment. The ability to deal with multiple sentences is a responsibility of the context system, and is outside the scope of the parser. Emily's :tell() :verb(), then, takes in the line of text and sends it off to her default dictionary, the Esperanto Dictionary, for parsing. The :tell() then waits on a response from the dictionary object, which will be in the form of a Sentence object, the standard internal format for prose. By the nature of the code and the MOO server, Emily can parse several different sentences at the same time without any conflict.
Level 2 - The Esperanto Dictionary
The dictionary class in this system is very loosely defined, allowing for the robustness required by some languages. In fact, the only requirement imposed is that there is one generic interface, :parse(), for the entire system. :parse() must take in a string, where the string may represent any piece of text that the parser is to interpret. :parse() must return a Generic Parsed Sentence Object as well. Past that, the dictionary class is a black box, completely definable by the programmer. The dictionary that we have defined, the Esperanto Dictionary, serves multiple purposes simultaneously. It provides us with definitions of the parts of speech of Esperanto, it holds all the words of the language, and implements :verb()’s to convert prose to our standard internal format. The code for this level, as well as the code for Levels 3 through 5 of the parser, which will be discussed later, are largely implemented in this object. Certain utility :verb()’s are programmed on other objects, such as the Generic Parsed Word and Generic Parsed Phrase objects, but these :verb()’s are largely destructors, pretty-printing routines, and interface overlays. The main :verb()’s within the dictionary will be discussed at the appropriate levels, including the :parse() :verb() at this level.
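The dictionary contract amounts to a single abstract method. The Python sketch below models it with an abstract base class. `Dictionary`, `ParsedSentence`, `EmptyDictionary`, and `tell()` are hypothetical stand-ins for the MOO objects. `EmptyDictionary` is a trivial example implementation that knows no words; it is included only to show that any class honoring `parse()` can be plugged in.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ParsedSentence:
    """Stand-in for a Generic Parsed Sentence Object."""
    phrases: list = field(default_factory=list)
    unresolved: list = field(default_factory=list)


class Dictionary(ABC):
    """The only contract: parse() takes a string and returns a ParsedSentence."""

    @abstractmethod
    def parse(self, text: str) -> ParsedSentence:
        ...


class EmptyDictionary(Dictionary):
    """Hypothetical dictionary with no vocabulary: every word is unresolved."""

    def parse(self, text: str) -> ParsedSentence:
        return ParsedSentence(unresolved=text.lower().split())


def tell(dictionary: Dictionary, line: str) -> ParsedSentence:
    # The agent depends only on the interface, so dictionaries can be swapped freely.
    return dictionary.parse(line)


print(tell(EmptyDictionary(), "Saluton mondo"))
```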
On the whole, the :parse() :verb() serves as the interface to the parser. It is responsible for taking in text from the :tell() :verb() and parsing it appropriately. This includes any pre-parsing that must occur, such as down-casing each of the characters in the line of text, and potentially stripping out or interpreting punctuation. After the pre-parsing is completed, the :parse() :verb() invokes a composition of :verb()'s to parse the text, and waits on the return value, a Sentence object. In the current implementation, the Sentence object is displayed and then recycled rather than returned to :tell(). In the complete system, :tell() would accept it and finish processing the text by passing it to the context system. To accomplish the parsing, :parse() calls the following :verb() composition on its input:
:get_sentence(:get_phrases(:get_roots(text)))
Each of these levels, and the objects that they return, will be discussed in the levels to follow.
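The pre-parsing step and the three-level composition can be sketched in Python as follows. The level functions here are placeholders that only pass data through in the shapes described in this section: a word list, then a pair of phrases and leftover words, then a sentence. They are hypothetical stand-ins, not the Esperanto Dictionary's :verb()'s. Stripping all punctuation is one of the two pre-parsing options mentioned above, chosen here for simplicity.

```python
import string


def preparse(text: str) -> str:
    # Down-case and strip punctuation. This also discards a trailing "?", so any
    # question detection would have to happen before this step or rely on word order.
    return text.lower().translate(str.maketrans("", "", string.punctuation))


# Placeholder levels (hypothetical stand-ins for the real :verb()s):
def get_roots(text: str) -> list:        # Level 5: text -> list of words
    return text.split()


def get_phrases(words: list) -> list:    # Level 4: words -> [phrases, leftover words]
    return [[], words]


def get_sentence(pair: list) -> dict:    # Level 3: [phrases, leftovers] -> sentence
    phrases, unresolved = pair
    return {"phrases": phrases, "unresolved": unresolved}


def parse(text: str) -> dict:
    # Same shape as :get_sentence(:get_phrases(:get_roots(text))).
    return get_sentence(get_phrases(get_roots(preparse(text))))


print(parse("Mia patrino lavis mian fraton."))
```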
Level 3 - Sentences
The :verb() :get_sentence() takes as input a list of two lists: the first is a list of Generic Parsed Phrases, and the second is a list of words that have not been resolved into phrases. Upon receiving this input, :get_sentence() creates and initializes a Generic Parsed Sentence Object, adding these two lists to the appropriate places in the object. The unresolved word list plays a vital role later on. Any word that does not belong to the language is listed there, whether because it is not in the vocabulary or because it is misplaced syntactically. The context system can then use this list to determine how the presence of these words modifies the sentence. Once the Sentence object has been created, it is returned to :get_sentence()'s calling :verb(), typically the :parse() :verb().
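A minimal Python sketch of the Level 3 packaging step follows. `Sentence` and the object-number counter are hypothetical stand-ins for the Generic Parsed Sentence Object and for the MOO's own allocation of object numbers. The starting number 2717 is borrowed from the sample trace later in this section purely for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical stand-in for the MOO handing out fresh object numbers.
_object_numbers = count(2717)


@dataclass
class Sentence:
    number: int
    phrases: list = field(default_factory=list)
    unresolved: list = field(default_factory=list)   # words the parser could not place

    @property
    def name(self) -> str:
        return f"Sentence #{self.number}"


def get_sentence(phrases_and_leftovers: list) -> Sentence:
    """Package Level 4's [phrases, unresolved words] pair into a Sentence."""
    phrases, unresolved = phrases_and_leftovers
    return Sentence(next(_object_numbers), list(phrases), list(unresolved))


s = get_sentence([["<noun phrase>", "<verb phrase>"], ["xyzzy"]])
print(s.name, s.phrases, s.unresolved)
```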
Level 4 - Phrases
The :get_phrases() :verb() takes in a list of Generic Parsed Word objects, normally supplied by the Level 5 implementation of :get_roots(). :get_phrases() is really just a procedural :verb() that calls upon three sub-:verb()'s to extract each of the phrases from the list of root words. These are :find_verb_phrase(), :find_prepositional_phrase(), and :find_noun_phrase(), called in that order. Each of these three :verb()'s is called repeatedly until no more of its respective phrase type can be found. All three are fairly similar: each scans the list for its respective "phrase root," a verb, preposition, or noun. :find_verb_phrase() and :find_noun_phrase() then look to the left of the root word for adverbs and adjectives, respectively. :find_prepositional_phrase(), on the other hand, scans the words after the root for a single noun phrase. Each of these :verb()'s bundles the root of the phrase and its modifiers into Generic Parsed Phrase objects, tags the Phrase objects with their appropriate type, and returns them to :get_phrases(). Meanwhile, :get_phrases() reduces the working list of words to those that have not yet been encapsulated in phrases. The list of phrases and any leftover words are returned to the calling :verb(), typically :get_sentence().
As one may notice, this level of the parser contradicts the notion that word order in Esperanto is irrelevant. Any student of the language will agree that this notion is not completely accurate to begin with. In any case, we decided that a slightly modified version of Esperanto would better suit our needs. We modified only two parts of the language. First, adjectives and adverbs must precede the word they modify. Second, the noun phrase of a prepositional phrase must follow the preposition. Although we cannot conclusively say that these modifications were necessary, the fact that we felt a need to make them is telling: although Esperanto may be a convenient language to work with, it is not ideal.
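The phrase-finding loop, including the two modified word-order rules, can be sketched in Python as below. This is an illustrative reimplementation, not the project's MOO code. `Word`, `Phrase`, and `find_phrase()` are hypothetical, and each word is assumed to already carry a part-of-speech tag from Level 5. A prepositional phrase stores its noun phrase in its modifiers list.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    pos: str   # assumed tag from Level 5: "noun", "adj", "verb", "adv", or "prep"


@dataclass
class Phrase:
    type: str                                  # "verb", "prepositional", or "noun"
    root: Word
    modifiers: list = field(default_factory=list)


MODIFIER = {"verb": "adv", "noun": "adj"}


def find_phrase(words, root_pos):
    """Return (phrase, remaining_words) for the first `root_pos` word, else (None, words)."""
    for i, w in enumerate(words):
        if w.pos != root_pos:
            continue
        if root_pos == "prep":
            # Modified rule: a noun phrase (adjectives, then a noun) must follow.
            k = i + 1
            while k < len(words) and words[k].pos == "adj":
                k += 1
            if k == len(words) or words[k].pos != "noun":
                continue                       # malformed; leave these words unresolved
            obj = Phrase("noun", words[k], words[i + 1:k])
            return Phrase("prepositional", w, [obj]), words[:i] + words[k + 1:]
        # Modified rule: adverbs/adjectives immediately precede their root.
        start = i
        while start > 0 and words[start - 1].pos == MODIFIER[root_pos]:
            start -= 1
        return Phrase(root_pos, w, words[start:i]), words[:start] + words[i + 1:]
    return None, words


def get_phrases(words):
    """Extract verb, then prepositional, then noun phrases; return [phrases, leftovers]."""
    phrases = []
    for pos in ("verb", "prep", "noun"):
        phrase, words = find_phrase(words, pos)
        while phrase is not None:
            phrases.append(phrase)
            phrase, words = find_phrase(words, pos)
    return [phrases, words]


sentence = [Word("mia", "adj"), Word("patrino", "noun"), Word("lavis", "verb"),
            Word("mian", "adj"), Word("fraton", "noun")]
found, leftover = get_phrases(sentence)
for p in found:
    print(p.type, p.root.text, [m.text for m in p.modifiers])
```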
Level 5 - Words
Lastly (or first, depending on your frame of reference), the line of text is taken in by the :verb() :get_roots(). The purpose of :get_roots() is to convert each of the words in the line of text to the Generic Parsed Word internal format. :get_roots() is essentially responsible for three things: breaking the line of text down into a list of words, sending each to be processed individually by :get_root(), and using the return values to set the properties of each word appropriately. For example, based on the return values of :get_root(), :get_roots() will set the .noun, .plural, or other appropriate flags on the Word object to 1. After this has been done, a list of the root words is returned to the calling :verb(), typically :get_phrases().

The sub-:verb() :get_root() is responsible for taking in a single text word and extracting its root, while keeping track of the prefixes and suffixes it had to remove to find the root. To accommodate the structure of the language, we first look the word up in a list of known words. We then strip away valid suffixes for as long as we do not have a valid word in the language. If we still have not found the root word, we remove prefixes in a similar manner until we have either found a root word or eliminated the possibility of the word being valid within the language. The net result is a Word object, which consists of the root word, a list of suffixes, and a list of prefixes. This Word object, or an indication of failure to parse, is returned to the calling :verb(), typically :get_roots().
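The suffix-then-prefix stripping strategy of :get_root() can be sketched in Python as follows. The vocabulary and affix tables are a tiny hypothetical sample, not the Esperanto Dictionary's contents, and two-letter suffixes are tried before one-letter ones. Because suffixes are removed from the outside in, the recorded suffix lists come out in the same order as in the sample trace later in this section (for example, o then in for patrino).

```python
# Hypothetical miniature vocabulary and affix tables (the real dictionary is far larger).
ROOTS = {"mi", "patr", "lav", "frat", "fil", "knab", "vend"}
SUFFIXES = ("as", "is", "os", "in", "o", "a", "e", "i", "j", "n")   # longer ones first
PREFIXES = ("mal", "bo")


def strip_suffixes(word):
    """Strip suffixes until a known root remains; return (root or None, removed)."""
    removed = []
    while word not in ROOTS:
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s):
                word = word[: -len(s)]
                removed.append(s)
                break
        else:
            return None, removed      # no suffix left to strip
    return word, removed


def get_root(word):
    """Return (prefixes, root, suffixes), or None if the word cannot be parsed."""
    word = word.lower()
    prefixes = []
    while True:
        root, suffixes = strip_suffixes(word)
        if root is not None:
            return prefixes, root, suffixes
        for p in PREFIXES:
            if word.startswith(p) and len(word) > len(p):
                prefixes.append(p)
                word = word[len(p):]
                break
        else:
            return None               # no root and no prefix left to remove


for w in ("Mia", "patrino", "lavis", "mian", "fraton", "bopatrino", "xyzzy"):
    print(w, get_root(w))
```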
Since that may seem somewhat confusing, what follows is an example of a trip through the parser. First, someone in the room "says" something, for instance "Mia patrino lavis mian fraton." This text is broadcast to everyone in the room, invoking the :tell() :verb() on all objects within the room that possess a :tell() :verb(). For normal players, the :tell() :verb() eventually displays the spoken text on the screen. For Emily, however, the process is much more complicated. Emily's :tell() :verb() takes the sentence and parses it using her default dictionary, the Esperanto Dictionary. The dictionary is responsible for parsing the sentence into a structure that Emily can understand. It calls its own :verb() :parse(), which handles parsing the sentence; this is Level 2, described above. Again, :parse() calls :get_sentence(:get_phrases(:get_roots(text))), or, in this case, :get_sentence(:get_phrases(:get_roots("Mia patrino lavis mian fraton."))).
Rather than following the order in which the :verb()'s are called, this example follows the order in which they complete. Since :get_roots() must complete before :get_phrases() can finish executing, we will examine it first, and so on. For the sake of clarity, this reverses the order in which Levels 3 through 5 were presented above.
So, :get_roots() is called with the input “Mia patrino lavis mian fraton.” :get_roots() breaks each word down into its root word and affixes. “Mia” becomes “Mi” + “a”, “patrino” becomes “patr” + “in” + “o”, and so on. :get_roots() returns a list of “word” objects, a special object that we created to store the root word and its affixes. The order of words in this list is the same as the order that the words appeared in the sentence.
:get_phrases() receives as its input the list of words from :get_roots(). :get_phrases() parses this list of words into verb phrases, prepositional phrases, and noun phrases, in that order. Each phrase type has its own helper function to parse the list of words.

So, :get_phrases() first calls :find_verb_phrase() with the list of words as input. :find_verb_phrase() does just that: it searches the list of words and finds the verb phrase. It then creates a "phrase" object for the verb phrase it found; this is another special object that we created. A phrase object consists of three properties:
- a 'type' property;
- a 'root' property, which holds the word object of the word that is the root of the phrase;
- a 'modifiers' property, which is a list of word objects of the words that modify the root word in the phrase.

Finally, :find_verb_phrase() returns a list of the newly created phrase objects. Since there is generally only one verb phrase per sentence, this list usually has only one object in it, but the logic exists to parse multiple verb phrases from the list of words. In the above example, the list would consist of one phrase object, with its type set to 'verb,' its root set to the word object for 'lavis,' and an empty modifiers list.

Back in :get_phrases(), the words in the new phrase objects are removed from the original list of words, and :find_prepositional_phrase() is called with the reduced word list. :find_prepositional_phrase() is very similar to :find_verb_phrase(), except that each phrase object it returns has its type set to 'prepositional.' In the above example, :find_prepositional_phrase() would return an empty list, since there are no prepositional phrases in this sentence. Back in :get_phrases(), the words in any new phrase objects are again removed from the list of words.
Lastly, :find_noun_phrase() is called with what is left of the list of words. Like its sister :verb()'s, :find_noun_phrase() parses the list of words, this time searching for noun phrases. It returns a list of phrase objects, with each object's type set to 'noun.' In the above sentence, there are two noun phrases. The first phrase object would be of type 'noun' and contain as its root the word object for "patrino." The modifier for this root is the word object for "mia." The second phrase object created by :find_noun_phrase() would also be of type 'noun,' but would contain the word object for "fraton" as its root. Its modifiers list would again contain only one word object, this time the object for "mian." :find_noun_phrase() then returns this list of phrases to :get_phrases(). Again, :get_phrases() removes all of the words from the original list that were used in the noun phrases. :get_phrases() then returns a list of all the phrases, and any words of the original word list that have not been resolved, to :get_sentence().
:get_sentence() packages the return value from :get_phrases() into our sentence object. The sentence object has a name, which is always "Sentence " followed by the object number of the sentence object. The list of phrases returned from :get_phrases() is placed into the 'phrases' property of the sentence object, and any unresolved words are placed in a list in the 'unresolved' property of the sentence. This sentence object is returned to the :tell() :verb() via the :parse() :verb().
The original sentence was “Mia patrino lavis mian fraton.” The following is a sample transaction between a user and Emily’s parser. Normally the parser would not display this type of detail on how the sentence was broken down, but it aids in the understanding of the internals of the system.
"Mia patrino lavis mian fraton
You say, "Mia patrino lavis mian fraton"
Sentence #2717 (#2717)
Consists of:
Phrases:
  Verb Phrase #2796 (#2796)
    Phrase root:
      lavis (#6916)
        Word root:
          lav
        Word prefixes:
        Word suffixes:
          is
      End of word.
    Phrase modifiers:
  End of phrase.
  Noun Phrase #3036 (#3036)
    Phrase root:
      patrino (#6875)
        Word root:
          patr
        Word prefixes:
        Word suffixes:
          o
          in
      End of word.
    Phrase modifiers:
      Mia (#6851)
        Word root:
          Mi
        Word prefixes:
        Word suffixes:
          a
      End of word.
  End of phrase.
  Noun Phrase #2835 (#2835)
    Phrase root:
      fraton (#2675)
        Word root:
          frat
        Word prefixes:
        Word suffixes:
          n
          o
      End of word.
    Phrase modifiers:
      mian (#6937)
        Word root:
          mi
        Word prefixes:
        Word suffixes:
          n
          a
      End of word.
  End of phrase.
Unresolved:
End of sentence.
Emily->MQP says, "Mia patrino lavis mian fraton"
One of the many features of our parser, partially due to our defined aggregate data types and the natural beauty of the Esperanto language, is the simplicity with which we can generate a sentence in Esperanto. By doing nothing more than collapsing the tree that composes a Generic Parsed Sentence Object, we are able to create a syntactically correct sentence in Esperanto. We can, with careful consideration of word order, give Emily a "dialect" of sorts.
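Collapsing the tree can be sketched in Python as follows. `Word`, `Phrase`, `surface()`, `collapse()`, and `generate()` are hypothetical names. The sketch assumes suffixes are stored in the order the parser stripped them (outermost first), so they are reapplied in reverse, and it handles only noun and verb phrases. Reordering the phrase list is one simple way to give Emily a "dialect," since the case endings keep the meaning intact.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    root: str
    prefixes: list = field(default_factory=list)
    suffixes: list = field(default_factory=list)   # outermost first, as stripped by the parser

    def surface(self) -> str:
        # Reapply suffixes innermost first, i.e. in reverse of the stripping order.
        return "".join(self.prefixes) + self.root + "".join(reversed(self.suffixes))


@dataclass
class Phrase:
    type: str
    root: Word
    modifiers: list = field(default_factory=list)

    def collapse(self) -> list:
        # Under the modified rules, modifiers come before the root.
        return [m.surface() for m in self.modifiers] + [self.root.surface()]


def generate(phrases) -> str:
    """Collapse a list of noun/verb phrases into one sentence, in the order given."""
    text = " ".join(w for p in phrases for w in p.collapse())
    return text[:1].upper() + text[1:] + "."


subject = Phrase("noun", Word("patr", suffixes=["o", "in"]), [Word("mi", suffixes=["a"])])
verb = Phrase("verb", Word("lav", suffixes=["is"]))
obj = Phrase("noun", Word("frat", suffixes=["n", "o"]), [Word("mi", suffixes=["n", "a"])])

print(generate([subject, verb, obj]))   # Mia patrino lavis mian fraton.
print(generate([obj, verb, subject]))   # Mian fraton lavis mia patrino.
```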
At any rate, Emily's ability to say something intelligent is still outside the scope of this portion of the project. Her ability to do so is hampered by her inability to comprehend. At this point she can parse Esperanto as if she had been a native speaker of the language all her life, but she cannot attach any meaning to the symbols she parses. This brings us, then, to the second part of our project.
After the successful completion of the interface to an intelligent agent, namely the parser, we then took up the task of creating an intelligent agent to use the natural language system of our design. Moreover, not only was this agent intended to be intelligent, but the overall goal was to create a completely empathic agent. In essence, the agent would understand the human experience, at least from its rather unique perspective. This, as we stated before, was where we ran into a rather large stumbling block. The problem we encountered is the ambiguity of natural language.
We had implemented, and were prepared to use, some of the more traditional methods of AI programming as a basis for our agent. For our knowledge base, we had elected to use the MOO’s object-oriented database as a support structure for frames. Each frame would inherit properties from its parent in the hierarchy, and each property would have a series of demons associated with it to facilitate learning. We had already established, and begun to implement, agencies for rule evaluation, modeled after Minsky’s mind-society.
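As a rough illustration of the frame scheme just described, here is a minimal Python sketch rather than our MOO code. The class and method names are hypothetical; slot lookup walks up the parent chain, much as MOO property inheritance does, and the demons are modeled as “if-added” callbacks that fire whenever a slot is set, including callbacks attached to an ancestor frame.

```python
from typing import Any, Callable, Optional

# A demon receives (frame that changed, slot name, new value).
Demon = Callable[["Frame", str, Any], None]


class Frame:
    """Hypothetical frame: slots inherited from a parent, plus per-slot demons."""

    def __init__(self, name: str, parent: Optional["Frame"] = None):
        self.name = name
        self.parent = parent
        self.slots: dict[str, Any] = {}
        self.demons: dict[str, list[Demon]] = {}

    def _lineage(self):
        # Yield this frame, then its parent, grandparent, and so on.
        frame = self
        while frame is not None:
            yield frame
            frame = frame.parent

    def get(self, slot: str) -> Any:
        # Inheritance: use the nearest frame in the hierarchy that defines the slot.
        for frame in self._lineage():
            if slot in frame.slots:
                return frame.slots[slot]
        raise KeyError(slot)

    def add_demon(self, slot: str, demon: Demon) -> None:
        self.demons.setdefault(slot, []).append(demon)

    def set(self, slot: str, value: Any) -> None:
        self.slots[slot] = value
        # "If-added" demons attached here or on any ancestor react to the new value.
        for frame in self._lineage():
            for demon in frame.demons.get(slot, []):
                demon(self, slot, value)


learned = []
animal = Frame("animal")
animal.slots["alive"] = True
cat = Frame("cat", parent=animal)
cat.add_demon("color", lambda f, s, v: learned.append(f"{f.name}.{s} = {v}"))
tom = Frame("tom", parent=cat)
tom.set("color", "gray")
print(tom.get("alive"), learned)  # True ['tom.color = gray']
```

Note that everything stored here is, once again, a symbol: the frame machinery works, which is exactly why the problem described next lay elsewhere.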
Something was amiss, however. It became apparent that no matter how much Esperanto we “taught” our agent, Emily could understand a string of Esperanto no better than we could. In fact, she was at an even greater disadvantage: whereas we could translate the string of foreign symbols into a language more suited to our tastes, Emily had no frame of reference from which to obtain any meaning at all. Clearly, we decided, we had made a bad assumption along the way, and we went back to track it down.
The problem did not seem to lie in the way we translated the words; our frames and rule agencies seemed to be behaving quite well. With this in mind, we took a closer look at our task. We were trying to ground symbols that have meaning only in the context of the real world, with nothing but other ungrounded, abstract symbols at our disposal. It then became clear that we needed a better solution.
Unfortunately, the solution we judged most likely to further our efforts would require more time, resources, and expertise than we had available in the few days left for the project. At that point, we decided it would be best to document our findings. These findings, we hope, will serve at least as an alternative view of the problem at hand. Although the conjectures in the rest of this paper are mostly hypothetical, and borderline philosophical, we will attempt to implement our ideas at a later date, and we hope they serve as the basis for an MQP yet to come.
English appears to be the predominant language for the development of intelligent agents. We reach this conclusion solely because we have found no references to agents developed to speak other natural languages. Since we have found no references that explain this design decision, we can only surmise that English was chosen simply because it is the canonical example of a natural language, perhaps because so large a percentage of the world speaks it. In any case, the development of this Esperanto natural language parser has shown that English is far from the ideal language. We implemented a parser for a large, dynamic natural language in less than five weeks. Not only that, but we engineered it so that we can go directly from a data structure to a line of prose simply by collapsing a tree. The development of this parser gives us reason to believe that other natural languages may be more efficient models than English. These alternative natural languages, like Esperanto, should be explored to find the optimal set of languages for intelligent agent development.
Perhaps the underlying reason for the ease with which Esperanto could be modified to suit our needs is that our version of the language is very nearly a Context Free Grammar. Our modified Esperanto grammar can easily be formalized as one of Chomsky’s formal languages, and every Context Free Grammar is recognized by some pushdown automaton, and hence by a Turing Machine, the basic model of computation. We could therefore state with confidence from the beginning that parsing our Esperanto was computationally possible.
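To illustrate why a context-free formalization makes the language mechanically parseable, here is a small Python sketch. The grammar and the ending-based lexer are hypothetical toy versions, not our actual grammar: nouns and adjectives are classified by their endings (-o and -a, optionally followed by -j and -n), verbs by -as, -is, -os, -us, -u, or -i, and a memoized top-down recognizer decides membership. It covers only a sliver of the language; the article la is lumped in with adjectives, and a bare pronoun such as mi would be misclassified.

```python
from functools import lru_cache

# Hypothetical toy grammar: nonterminal -> alternative right-hand sides.
# NOUN, ADJ and VERB are terminals produced by the lexer below.
# No rule is left-recursive, so the top-down recursion always terminates.
GRAMMAR = {
    "S": [["P", "S"], ["P"]],         # a sentence is one or more phrases, in any order
    "P": [["NP"], ["VERB"]],          # a phrase is a noun phrase or a verb
    "NP": [["ADJ", "NP"], ["NOUN"]],  # any number of modifiers before a noun
}

# Word class from the ending alone (toy rules).
ENDINGS = [
    ("NOUN", ("o", "oj", "on", "ojn")),
    ("ADJ", ("a", "aj", "an", "ajn")),
    ("VERB", ("as", "is", "os", "us", "u", "i")),
]


def category(word: str) -> str:
    w = word.lower()
    for cat, ends in ENDINGS:
        if w.endswith(ends):
            return cat
    raise ValueError(f"unrecognized word class: {word!r}")


def recognizes(sentence: str, start: str = "S") -> bool:
    cats = tuple(category(w) for w in sentence.split())

    @lru_cache(maxsize=None)
    def ends(symbol: str, i: int) -> frozenset:
        """Every position j such that `symbol` can derive cats[i:j]."""
        if symbol not in GRAMMAR:  # terminal: must match the next word's class
            return frozenset({i + 1}) if i < len(cats) and cats[i] == symbol else frozenset()
        reached = set()
        for rhs in GRAMMAR[symbol]:
            positions = {i}
            for sym in rhs:
                positions = {j for p in positions for j in ends(sym, p)}
            reached |= positions
        return frozenset(reached)

    return len(cats) in ends(start, 0)


print(recognizes("Mia patrino lavis mian fraton"))  # True
print(recognizes("mian fraton lavis mian"))         # False: dangling modifier
```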
Context Free Grammars can be written to represent large segments of natural language, but Chomsky’s work reveals that for most natural languages, such as English, there can be no complete formalization.[6] Our modified version of Esperanto defies this, however, through the rigid yet relaxed style of its grammar, as discussed in previous sections. The general relaxation of word order, at least at the phrase level, lets us easily recognize similar sentences such as “John opened the door” and “The door was opened by John,” and represent them in the same way. This integration of semantics into syntax, much like what has been done in Esperanto, is one of the attributes often credited with the success of systems such as LIFER and LADDER, as described by their creators, Hendrix and Sacerdoti.[6] To further aid our efforts to parse the language correctly, the vocabulary of Esperanto is such that there is a one-to-one relationship between symbols and their meanings, so little is lost to the ambiguities of natural language. Read (present tense), read (past tense), and reed, for example, are all represented by different symbols in Esperanto. Although this does not remove non-determinism from the language completely, it helps tremendously.
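A minimal sketch of why relaxed word order costs the parser so little: in Esperanto the accusative ending -n, not position, marks the direct object, so grammatical roles can be read off word endings alone. The Python below is a hypothetical toy (it ignores prepositions, pronouns, and the passive voice), and it uses a reordered active sentence, “Johano malfermis la pordon” versus “La pordon malfermis Johano,” rather than a true passive.

```python
VERB_ENDINGS = ("as", "is", "os", "us", "u", "i")
NOUN_ENDINGS = ("o", "oj", "on", "ojn")


def roles(sentence: str) -> dict:
    """Assign grammatical roles from word endings alone, ignoring word order."""
    result, pending = {}, []
    for word in sentence.lower().rstrip(".").split():
        if word.endswith(VERB_ENDINGS):
            result["verb"] = word
        elif word.endswith(NOUN_ENDINGS):
            phrase = " ".join(pending + [word])
            pending = []
            # The accusative -n, not position, marks the direct object.
            result["object" if word.endswith("n") else "subject"] = phrase
        else:
            pending.append(word)  # article or adjective waiting for its noun
    return result


print(roles("Johano malfermis la pordon"))
# {'subject': 'johano', 'verb': 'malfermis', 'object': 'la pordon'}
print(roles("Johano malfermis la pordon") == roles("La pordon malfermis Johano."))  # True
```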
Of course, some will argue that Esperanto is not a natural language. What makes a language “natural,” anyway? First, the argument goes, it must be someone’s native language, and therefore Esperanto cannot be a natural language. But the assumption that Esperanto is no one’s native language is false. According to the sci.lang FAQ[2], speakers of Esperanto have passed it on to their children, making Esperanto their children’s native language. Dismissing Esperanto solely because it is not the language of any one nation is ludicrous. Esperanto is also a planned language. Ah, so Esperanto must not be a natural language? Does being planned really make a language any less natural? No. By that criterion, English is very much an unnatural language as well: it is not well planned, but planned nonetheless. The only truly natural language, it would seem, is one of nearly meaningless noises or hand gestures. Language is merely a protocol for the encoding and decoding of ideas. If nothing else, our use of Esperanto changed only the protocol by which communication happens, not the communication itself. Using Esperanto instead of English, then, is no less natural than using Latin.
Under the presumption, then, that language is nothing more than a protocol for encoding and decoding ideas, we begin to understand why many attempts at natural language understanding have come to less than successful conclusions. Language itself is only a medium for carrying representations of ideas. With only a language, a set of denotations, one cannot successfully interpret what is said. One may recognize the words spoken, and have some sense of how to decode them, but this does not imply understanding their meaning. To better distinguish between understanding a language and understanding meaning, we will borrow the term grok to represent understanding meaning. From the Jargon File:
grok /grok/, var. /grohk/ /vt./
[from the novel "Stranger in a Strange Land", by Robert A. Heinlein, where it is a Martian word meaning literally `to drink' and metaphorically `to be one with'] The emphatic form is `grok in fullness'. 1. To understand, usually in a global sense. Connotes intimate and exhaustive knowledge. Contrast zen, which is similar supernal understanding experienced as a single brief flash. See also glark. 2. Used of programs, may connote merely sufficient understanding. “Almost all C compilers grok the void type these days.”[9]
Many intelligent agents in existence today only attempt to understand language, not to grok it. Some of these agents apply sophisticated hashing functions to the line of text to determine what to say next based on the words, or more appropriately symbols, it contains. Others use Markov chains to determine which words commonly appear with others, and generate their responses from the chains they have encountered. Still others search only for keywords and return a result based on just a few words of the input line. To successfully derive the meaning of a line of text, one must have not just the denotations of the words involved but a set of connotations as well. Without any connotations for the sentence, or the context in which it is expressed, there can be no grokking.
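For contrast, the Markov-chain approach mentioned above can be sketched in a few lines of Python. This is a generic word-bigram chain with a made-up two-line corpus, not any particular agent's code; nothing in it ever consults what a word means, it only records which symbol has followed which.

```python
import random
from collections import defaultdict


def train(corpus):
    """Map each word to the list of words observed immediately after it."""
    chain = defaultdict(list)
    for line in corpus:
        words = line.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def babble(chain, start, max_words, rng):
    """Random walk over the chain; stops early at a word with no known successor."""
    words = [start]
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


corpus = ["the cat sat on the mat", "the dog sat on the cat"]
chain = train(corpus)
print(babble(chain, "the", 8, random.Random(0)))
```

Every output is locally plausible because every adjacent pair was seen in training, and that is the whole of its “understanding.”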
The first major obstacle in any attempt to create an intelligent agent that understands, or more appropriately groks, natural language is establishing a set of denotations for the words in the agent’s vocabulary. Whether the implementation is a dictionary, translating one word, or symbol, into another set of symbols, or an abstract scheme relating commonly linked words and phrases, denotations ultimately fail. There are no sufficiently comprehensive definitions. Every denotation must be accompanied by an interpretation of connotations and context. This interpretation, we will see, is the downfall of the system.
Our first complement to the incomplete denotation is the connotation. This level of interpretation gives us what might, for lack of a better term, be called a secondary denotation, usually derived from previous experiences with the symbol involved. For example, if we know the symbol “red” has previously been associated with the symbols “hate,” “fire,” “love,” and “stop,” we can attempt to pick from this diverse set the element that best fits the context. Confused yet? Your AI sure is. Not only is this list a seemingly random group of words, but some of the connotations directly oppose each other! David Abram notes that the linguist Ferdinand de Saussure “described the structure of any language as a thoroughly interdependent matrix, a web-work wherein each term has meaning only by virtue of its relation to other terms within the system.”[1] Supposing that we have connotations for each of these words, we can reduce the list to yet another list of words. Where does this recursion stop? Without a base case, some point of reference, one can derive only an exhaustive, and explosive, list of completely meaningless symbols. Clearly, an understanding of context is more than a little important.
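The recursion described above can be made concrete with a small Python sketch over a hypothetical association table (every entry beyond the four connotations of “red” is invented for illustration). A breadth-first expansion terminates only because it refuses to revisit symbols; what it returns is still nothing but more symbols, with no base case connecting any of them to something outside the table.

```python
# Hypothetical association table: each symbol is "defined" only by other symbols.
CONNOTATIONS = {
    "red": ["hate", "fire", "love", "stop"],
    "hate": ["anger", "red"],
    "fire": ["heat", "red", "danger"],
    "love": ["heart", "red"],
    "stop": ["danger", "sign"],
    "anger": ["hate", "heat"],
    "heat": ["fire"],
    "danger": ["stop", "fire"],
    "heart": ["love"],
    "sign": ["stop"],
}


def expand(symbol):
    """Breadth-first expansion of connotations; returns all symbols reached and each new level."""
    seen, frontier, levels = {symbol}, [symbol], []
    while frontier:
        next_level = []
        for s in frontier:
            for t in CONNOTATIONS.get(s, []):
                if t not in seen:  # the only thing that stops the recursion
                    seen.add(t)
                    next_level.append(t)
        if next_level:
            levels.append(next_level)
        frontier = next_level
    return seen, levels


seen, levels = expand("red")
print(len(seen), levels)
# 10 [['hate', 'fire', 'love', 'stop'], ['anger', 'heat', 'danger', 'heart', 'sign']]
print(sorted(seen - set(CONNOTATIONS)))  # [] : nothing outside the web of symbols
```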
Having failed to grok by way of connotations, we must rely on context to derive the meaning of the group of symbols. Context depends on understanding the situation, what has led up to it, and its expected goals and outcomes. Not only must the agent determine each of these for itself, but it must also at least attempt to determine them for every other participant in the scenario. Doing so with any degree of precision requires analyzing a list of abstract concepts, a list that quickly approaches infinite size. Simply capturing certain words from previous sentences and calling that a “context” is bogus. At best, it provides only a system for “filling in the blanks.”
Having failed to complement denotations with both connotations and context, we must reexamine our very denotations, and their nature. Students of psychology may well insist that our problem is that we never achieve what they call gestalt. According to the 1984 Funk and Wagnalls Standard Desk Dictionary, gestalt is “a synthesis of separate elements of emotion, experience, etc., that constitutes more than the mechanical sum of the parts.” Loosely speaking, we never understand the “big picture,” and are doomed to a limited understanding brought about by our use of micro-worlds, which are discussed at length later in the paper.
Our failure to grok the sentence, then, is due to our very elaborate implementation of a scheme for avoiding the real issue: grounding our symbols. It is extremely difficult, if not impossible, to take a set of symbols and forge relationships between them without at least an initial point of reference. We need, then, a less precise method for capturing the essence of what words mean. It becomes apparent that although we are trying to simulate human intelligence in a machine, we severely handicap our efforts by attempting to impress upon the machine our own interpretations of the world around us, instead of letting the machine interpret the world for itself.
By impressing upon the machine our own interpretations of the world, we seriously impede our own efforts. How, then, do we form our interpretations of the world in the first place? This is something that cannot be taught; it must be perceived through our senses. A classic thought experiment in sensory experience comes from Plato: the Parable of the Cave, from his Republic. Book VII begins with a dialog between Socrates, the first speaker, and Glaucon (taken from William Poundstone’s Labyrinths of Reason[13]):
And now, I said, let me show in a figure how far our nature is enlightened or unenlightened: -- Behold! Human beings living in an underground den, which has a mouth open towards the light and reaching all along the den; here they have been from their childhood, and have their legs and necks chained so that they cannot move, and can only see before them, being prevented by the chains from turning round their heads. Above and behind them a fire is blazing at a distance, and between the fire and the prisoners there is a raised way; and you will see, if you look, a low wall built along the way, like the screen which marionette players have in front of them, over which they show the puppets.
I see.
And do you see, I said, men passing along the wall carrying all sorts of vessels, and statues and figures of animals made of wood and stone and various materials, which appear over the wall? Some of them are talking, others silent.
You have shown me a strange image, and they are strange prisoners.
Like ourselves, I replied; and they see only their own shadows, or the shadows of one another, which the fire throws on the opposite wall of the cave?
True, he said; how could they see anything but the shadows if they were never allowed to move their heads?
And of the objects which are being carried in like manner they would only see the shadows?
Yes, he said.
And if they were able to converse with one another, would they not suppose they were naming what was actually before them?
Very true.
And suppose further that the prison had an echo which came from the other side, would they not be sure to fancy when one of the passersby spoke that the voice which they heard came from the passing shadow?
No question, he replied.
To them, I said, the truth would be literally nothing but the shadows of the images.
Now, let us modernize this story. Picture a prisoner, locked away in a cell and chained to a wall for his entire life. Picture a cell with no windows, no doors, and no physical connection to the outside world. Add to this cell a television screen on the opposite wall. The television is mounted so that it rotates as the prisoner turns his head, keeping it always at the center of his vision. Now, connect this television to a camera somewhere. The prisoner’s entire reality would be what he sees on the television. He would have no idea anything was wrong if we placed a mirror in front of the camera, reversing all images from left to right. If there were a time delay between what actually happened and when it appeared on the screen, the prisoner would be none the wiser. In fact, we could even flip the image upside down, and the prisoner would never realize that his reality was inverted. To him, his reality is correct. His entire life would be what he saw on this television.
Now, let us take this one step further. Suppose that this prisoner has only a text monitor through which to view the outside world. All it displays is words. The prisoner’s entire reality would be based on symbols that held no meaning whatsoever. Soon, if the prisoner were of exceptional intelligence, he would start to recognize patterns of symbols. He might notice that the symbol made of two vertical bars joined by a horizontal bar, or H, often followed a single vertical bar topped by a horizontal line, or T. Given supreme intelligence and a lot of time, he might realize that the set of symbols “THE SUN IS RISING” appeared at regular intervals. Still, there is absolutely no way for the prisoner to extract meaning from that sentence. He has no idea what a sun is, or what it means to rise; he is merely matching patterns. Someone could describe the sun to him for all eternity, and still the prisoner would never truly understand what a sun is without experiencing it for himself.
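The prisoner’s situation can be sketched in Python with a hypothetical feed of text lines. Simple counting recovers exactly the regularities described (H follows T; one line recurs) without any access to what the symbols refer to.

```python
from collections import Counter, defaultdict


def follower_counts(lines):
    """For each symbol, count which symbol comes next inside a word."""
    followers = defaultdict(Counter)
    for line in lines:
        for word in line.split():
            for current, following in zip(word, word[1:]):
                followers[current][following] += 1
    return followers


feed = ["THE SUN IS RISING", "THE DOOR IS OPEN", "THE SUN IS RISING"]
followers = follower_counts(feed)
print(followers["T"].most_common(1))  # [('H', 3)]
print(Counter(feed).most_common(1))   # [('THE SUN IS RISING', 2)]
```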
If a human being could never understand anything without experiencing it first hand, how could we possibly expect a computer to? The traditional intelligent agent is nothing more than the prisoner with a text monitor. It is foolish to think a computer could possibly understand anything the way we do without everything that makes us human. The difference is that we have the ability to directly interact with our environment.
To quote Hubert Dreyfus, “Intelligence requires understanding, and understanding requires giving a computer the background of common sense that adult beings have by virtue of having bodies, interacting skillfully with the material world, and being trained into a culture.”[4]
According to Dreyfus, then, any solution to the general problem of creating an artificial intelligence that understands the world in the same manner we do will, at least at some point, include two important aspects. First, we must create a human-like body, or avatar, through which the computer can interact with the outside world, both by manipulating the environment and by getting sensory feedback from it. Second, the intelligence must be trained into a culture. What follows are our brief, largely unproven, and completely unimplemented proposals toward solutions to these problems. They are not meant as complete solutions to the problem at hand, but rather as the next step in the evolution of our intelligent agent.
The problem of creating an avatar rests largely with those in robotics, who must build a collection of devices. One of the major stumbling blocks will be the creation of the sensory devices. Speaking as outsiders to the field, it seems fairly trivial to build a device to take in, for example, the ambient temperature. This data should be processed, at some level outside the control of the intelligent agent, into a representative signal denoting a qualitative term such as “hot,” “warm,” “cool,” or “cold.” That qualitative value should be relative. For example, after a prolonged period in a 30 degree Fahrenheit environment, 65 degrees Fahrenheit would seem warm. Similarly, going from a 90 degree environment to a 65 degree one would seem quite cool. Typically, however, most people consider a 65 degree Fahrenheit environment comfortable, largely because of our internal temperature. In other words, the avatar’s qualitative comfort level in the ambient temperature can be defined as, or scaled to, the change in the avatar’s internal temperature.
It is important to remember in this example, however, that there are two sources of temperature fluctuation to consider: the ambient temperature and the internal temperature of the avatar. Just as the environment can affect the avatar’s internal temperature, the avatar would have a real or assumed effect on its own internal temperature. The internal temperature of our body is subject to changes induced by our surroundings, as well as to the heat the body itself produces. Our scale, appropriately, reads zero only when the body produces exactly the amount of heat that results in optimal heat transfer between it and the environment.
Perhaps what is needed at this point in the discussion is an illustration of how the variables influence one another. Consider that the optimal ambient temperature is the temperature at which our body creates heat at the same rate as it dissipates heat. If the ambient temperature were too high, the rate at which we create heat would exceed our rate of heat dissipation, increasing our internal temperature. This increase would register as a positive value on our scale. Likewise, if the ambient temperature were too low, the rate at which we dissipate heat would exceed the rate at which we create heat. This, appropriately, would decrease our internal temperature and register as a negative value on our scale.
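The relationship just described can be written down under a simple assumed model (Newton’s law of cooling, which the text does not specify): let P be the avatar’s constant rate of heat production, k a heat-transfer coefficient, C its heat capacity, T_b its internal temperature, and T_a the ambient temperature. Defining the scale s as heat produced minus heat dissipated gives:

```latex
\begin{aligned}
s &= P - k\,(T_b - T_a), \qquad C\,\frac{dT_b}{dt} = s, \\
s &= 0 \iff T_a = T^{*} \equiv T_b - \frac{P}{k}, \\
s &= k\,(T_a - T^{*}), \quad \text{so } s > 0 \text{ when } T_a > T^{*} \text{ and } s < 0 \text{ when } T_a < T^{*}.
\end{aligned}
```

Read backwards, the last line also gives a temperature guess from the scale, T_a \approx T^{*} + s/k: the larger the reading on the scale, the higher the guessed temperature.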
Intuitively, if the intelligence is to understand our experiences of the world in the same way we do, it must be subject to the way we take in information. We are more likely to notice the heat of the outdoors on a summer day and guess the approximate temperature than to know the temperature and guess that it must be hot. Using the previous example, the greater the value on our scale, the higher the temperature we will guess. For instance, if the ambient temperature were extremely high, so that we create heat much faster than we dissipate it, we would guess that the temperature is extremely high. Correspondingly, if we lose heat much faster than we create it, we would guess that the temperature is extremely low.
This leads, then, to the agent’s interpretation of the qualitative value. Typically, we react to the conditions of the environment in positive, negative, or neutral ways. Although the interpretation, or reaction, will be entirely context dependent, we may be able to derive yet another potentially useful figure from the relation. In our model, the rate of change of the previous scale would determine the intensity with which the qualitative value is experienced. Since the scale itself measures the change in internal temperature, this intensity may be represented mathematically as the second derivative of the internal temperature with respect to time. The more quickly the temperature changes, in our previous example, the more intense the avatar’s sensation. Over time, then, the intensity of the sensation would diminish, but the sensation itself would not.
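A minimal Python sketch of the scale and its intensity, under loudly hypothetical assumptions: the scale is linearized as s = k(T_a - T*), with the comfort point T* fixed at the 65 degrees Fahrenheit of the example above and k = 1; the label thresholds are made up; and the step-to-step change in the scale stands in for its time derivative. It shows the claimed behavior: after a sudden jump the intensity spikes and then falls to zero, while the sensation itself (“hot”) persists.

```python
def label(s):
    """Map the scale to a qualitative word (thresholds are hypothetical)."""
    if s >= 20:
        return "hot"
    if s >= 5:
        return "warm"
    if s > -5:
        return "comfortable"
    if s > -20:
        return "cool"
    return "cold"


def sense(ambient_readings, t_opt=65.0, k=1.0):
    """Return (label, scale, intensity) for each ambient reading in deg F.

    scale     = k * (ambient - t_opt), the linearized heat balance
    intensity = |change in scale since the previous reading|, a finite-difference
                stand-in for the scale's rate of change
    """
    results, previous = [], None
    for ambient in ambient_readings:
        scale = k * (ambient - t_opt)
        intensity = 0.0 if previous is None else abs(scale - previous)
        previous = scale
        results.append((label(scale), scale, intensity))
    return results


for step in sense([65, 65, 95, 95, 95]):
    print(step)
# ('comfortable', 0.0, 0.0)
# ('comfortable', 0.0, 0.0)
# ('hot', 30.0, 30.0)
# ('hot', 30.0, 0.0)
# ('hot', 30.0, 0.0)
```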
In the life-world[1], the sensuous, experiential world that we dwell in, there is no such thing as a “degree.” A degree of temperature is a fiction created by man to represent temperature more exactly. A temperature of “one-hundred and ten degrees Fahrenheit,” we are sure the reader will be relieved to find out, is merely an illusory device. In order, then, to grok in fullness “one-hundred and ten degrees Fahrenheit,” one must first grok the more elementary concept of “hot.” The converse, however, is not true. It is for this reason that we consider it crucial to subject the intelligent agent to our mode of taking in information. To do so effectively, we must concern ourselves with more than just the temperature.
Our proposal, modeled after Marvin Minsky’s mind-society[11], would take advantage of as many sensory inputs as can be made available. Collecting temperature data, as we said earlier, seems a simple task for the enterprising engineer. To simulate the sensory organ of “skin,” however, we obviously need more than an input for the ambient temperature; we also need devices to detect degrees of roughness, sharpness, viscosity, and the like. Like the agent described for interpreting the ambient temperature, agents would be created to interpret these stimuli, modeled as closely as possible on our own senses. These agents could then be combined into agencies forming the virtual skin, tongue, nose, ears, and eyes of our avatar. These senses, in turn, would all be agents of the intelligence’s body agent.
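Structurally, the proposal amounts to a tree of agents nested inside agencies. The Python sketch below illustrates only that shape: the Agent and Agency classes, the stimulus names, and the thresholds are all hypothetical, and each agent simply turns one raw reading into a qualitative word before its agency gathers the results.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """One sense agent: turns a single raw reading into a qualitative word."""
    stimulus: str
    interpret: Callable[[float], str]


@dataclass
class Agency:
    """A named group of agents and/or sub-agencies (skin, eyes, body, ...)."""
    name: str
    members: list

    def perceive(self, readings: dict) -> dict:
        report = {}
        for member in self.members:
            if isinstance(member, Agency):
                report[member.name] = member.perceive(readings)
            elif member.stimulus in readings:
                report[member.stimulus] = member.interpret(readings[member.stimulus])
        return report


skin = Agency("skin", [
    Agent("temperature", lambda f: "warm" if f > 70 else "cool"),
    Agent("roughness", lambda r: "rough" if r > 0.5 else "smooth"),
])
eyes = Agency("eyes", [Agent("brightness", lambda b: "bright" if b > 0.5 else "dark")])
body = Agency("body", [skin, eyes])
print(body.perceive({"temperature": 80, "roughness": 0.2, "brightness": 0.1}))
# {'skin': {'temperature': 'warm', 'roughness': 'smooth'}, 'eyes': {'brightness': 'dark'}}
```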
Using, and undoubtedly expanding, these relatively simple ideas for interpreting raw data and converting it to a more qualitative format, we hope to build a set of devices that will mimic the five human senses. Though this is far from a solution to such enormous tasks as machine vision and speech recognition, we hope to obtain more general information about our environment: we are searching only for “bright” and “dark,” as opposed to finding edges and the like. In our wildest speculations, we may hope that a sufficiently intelligent system could advance beyond the rudimentary concepts of light and dark and begin to interpret images, but that is clearly outside the scope of our proposal.
Since we are already writing of our wildest speculations, however, it may be worthwhile at this point to document how we hope to see our intelligence evolve past very simplistic senses, such as “light” and “dark,” into a more intelligent agent. Although this path is extremely uncertain, we hypothesize that if an adequate method for storing experiences can be found or developed, the agent’s memories of past experiences, coupled with a sloppy pattern-matching system, could be exploited to achieve a simulation of gestalt.
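One hypothetical reading of “a sloppy pattern-matching system” over stored experiences, sketched in Python: each memory is a set of qualitative features plus what followed, and a new situation recalls the memory sharing the most features, provided the overlap clears an arbitrary threshold. The memory format, the example memories, and the threshold are our assumptions, not a worked-out design.

```python
def recall(memories, situation, min_overlap=2):
    """Return the stored experience most like `situation`, or None if nothing is close.

    Matching is deliberately sloppy: memories are scored by how many qualitative
    features agree with the situation, not by exact equality.
    """
    def overlap(memory):
        return sum(1 for key, value in situation.items() if memory["features"].get(key) == value)

    best = max(memories, key=overlap, default=None)
    if best is None or overlap(best) < min_overlap:
        return None
    return best


memories = [
    {"features": {"skin": "hot", "eyes": "bright", "ears": "crackling"}, "outcome": "move away"},
    {"features": {"skin": "cool", "eyes": "dark", "ears": "quiet"}, "outcome": "rest"},
]
print(recall(memories, {"skin": "hot", "eyes": "bright", "ears": "quiet"})["outcome"])  # move away
print(recall(memories, {"skin": "warm", "eyes": "dim"}))  # None
```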
We are less concerned, at least initially, with the ability to manipulate the environment, although we recognize that a machine capable of autonomously exploring natural phenomena will need to interact with its surroundings. Until it becomes feasible to create an avatar that is at least partially mobile, placing objects within the field of the sensors should be sufficient.
Essentially, the next few steps in the evolution of our intelligent agent will include:
1.) An array of sensory devices taking in information about the environment;
2.) A mathematical system for scaling quantitative data into a more qualitative format; and
3.) The eventual addition of autonomy for the intelligent agent’s avatar.
This ability to relate to the physical world will, as we stated, be the grounding that gives our agent the ability to relate to us, at least on a linguistic level. It seems plausible, then, that if we are able to successfully capture the gestalt of the physical world, it can be imported into other intelligences, without the need for each agent to go through this exploratory phase. The only foreseeable need to repeat the exploratory phase for other agents, then, would be to tweak the parameters of the system, so the qualitative values are slightly different for a different agent.
Largely, however, it does not matter what the qualitative values are. It is certainly arguable that when one individual sees the color green, he experiences what another would experience as red. That difference, however, can be completely ignored, since they both call the color “green,” assuming that each consistently labels it in accordance with the previously defined symbol table of colors. The importance of taking in stimuli from the physical world, then, lies in the ability to label the stimuli, as opposed to taking in a symbol and guessing what the stimuli must be.
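The point about private qualitative values versus shared labels can be sketched in Python. Two hypothetical agents encode the same light differently (one on a reversed internal scale), each learns its own symbol table from the same labeled examples, and both still call a new stimulus “green.” The wavelengths are rough values used only for illustration.

```python
class Agent:
    """Labels stimuli via a private internal encoding and a learned symbol table."""

    def __init__(self, encode):
        self.encode = encode  # private, agent-specific internal representation
        self.table = {}       # internal state -> shared symbol

    def learn(self, wavelength_nm, word):
        self.table[self.encode(wavelength_nm)] = word

    def label(self, wavelength_nm):
        # Use the word attached to the nearest remembered internal state.
        state = self.encode(wavelength_nm)
        nearest = min(self.table, key=lambda known: abs(known - state))
        return self.table[nearest]


examples = {680: "red", 530: "green", 470: "blue"}  # approximate wavelengths in nm
a = Agent(lambda nm: nm)         # one internal scale
b = Agent(lambda nm: 1000 - nm)  # a reversed internal scale: a different "experience"
for nm, word in examples.items():
    a.learn(nm, word)
    b.learn(nm, word)

grass = 540
print(a.encode(grass), b.encode(grass))  # 540 460
print(a.label(grass), b.label(grass))    # green green
```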
The resolution of our problem, then, seems to depend not only on those advancing the state of artificial intelligence, but also on those in robotics. Although artificial intelligence theorists and engineers will undoubtedly advance the state of the art without an android in which to run their programs, such intelligences could, it seems, only pretend to reach the same understanding of the world that we have. It is arguable, admittedly, that subjecting an intelligent agent to our own methods of developing interpretations of the world would only hamper its progress. It is not difficult to imagine that an advanced intelligence would have no need to relate to our human experience. It is also not difficult to imagine that we would mistake this lack of relationship for unintelligent behavior on the part of the agent, and vice versa. It seems, then, that at least an initial common ground, as it were, is essential.
The sheer difficulty of creating an intelligent agent often encourages the development of less general, more practical solutions to smaller problems within the broad field of artificial intelligence. Many of these practical solutions may provide answers to the more general problem of creating an intelligent agent, and they almost always provide useful alternative ways of seeing the problem. However, we have found that some of these concepts, such as the micro-world, lead to less than successful resolutions.
“Usefulness times generality equals a constant.” –Terry Winograd
The concept of a micro-world was first introduced to us by Dreyfus in his work What Computers Can’t Do[4]. A micro-world can be thought of as a view of a problem in which some or all of the less important context is discarded. A micro-world approach to natural language processing does not consider the full context of the situation in which a sentence is spoken. By throwing away context before we attempt to extract meaning from a sentence, we dramatically decrease the likelihood of successfully resolving the text. Again, to quote Dreyfus, “Intelligence requires understanding, and understanding requires giving a computer the background of common sense that adult beings have by virtue of…being trained into a culture.”[4] Using micro-worlds to interpret text, then, dismisses the majority of the context, so any understanding reached about the world may be valid only within the scope of that micro-world. According to Dreyfus, this is supported by Minsky, whom he quotes as writing:
Each model - or “micro-world” as we shall call it - is very schematic; it talks about a fairyland in which things are so simplified that almost every statement about them would be literally false if asserted about the real world.[4]
He continues to quote Minsky, saying:
Nevertheless, we feel that they are so important that we are assigning a large portion of our effort toward developing a collection of these micro-worlds and finding how to use the suggestive and predictive powers of the models without being overcome with their incompatibility with literal truth.[4]
Bearing all this in mind, we will look more closely at how micro-worlds may hinder the understanding of natural language, yet may also be essential to our understanding and modeling of other aspects of artificial intelligence.
By considering the connection between text and context to be nearly arbitrary, we may again be impeding our efforts to create a system that truly groks natural language. Natural language, whatever the language may be, is strongly tied to the situation in which it appears. Some of these ties are seemingly simple, such as the intonation of a person’s voice or their body language. Other elements of context, however, are so apparent that they are often given little thought, such as the social positions of the speaker and the listener, the medium of communication, and the very culture of those involved in the discussion. Moreover, we have to give the agent some view of itself, defining some of these important social properties.
One somewhat famous AI project is Julia[10]. Julia is known as a “chatter bot,” created in an attempt to pass the Turing Test. The Turing Test, devised by Alan Turing, is a simple test of the “realness” of an artificial agent. A number of judges sit at computer terminals, conversing with whoever is on the other side of the connection, which may be either a human or an artificial agent. If the majority of the judges believe an artificial agent is actually a human, then that agent “passes” the Turing Test. One problem with the test, at least as commonly administered, is that it is far too simple: it is generally limited so that the judge may speak on only one topic. So although Julia did well, she was merely a master of one topic, speaking coherently on it for only a few minutes. This is a micro-world. “Usefulness times generality equals a constant.” Julia is extremely useful if one wants to know anything about cats (the topic she was programmed with), yet extraordinarily specific, which, in the search for artificial intelligence, is not remarkably useful.
Like most micro-world applications, natural language programs modeled on micro-worlds work extremely well in the domain for which they were intended, but they lack a certain essential robustness. Understanding the daily trials and tribulations of the user was never a key issue in the blocks world of SHRDLU; an intelligent agent that fully groks language, however, needs a sound understanding of the context of the real world.
It would seem, then, that micro-worlds cannot readily be applied to the problem of natural language understanding. Unlike the agents of Minsky’s Society of Mind [11], micro-worlds are not simply pieces that can be assembled into a whole: each micro-world encompasses its entire domain with total disregard for anything outside it. Successful systems such as Terry Winograd’s SHRDLU, LADDER (a natural language interface to a naval database), Julia, and other programs that focus on a specific domain all manipulate micro-worlds.
Micro-worlds, however, currently seem to be an essential tool in the field of artificial intelligence. For very domain-specific applications, micro-worlds are not only desirable, but they appear to be necessary. It would, indeed, be wasteful for the developers of a machine vision system intended to scan the Alaskan pipeline for cracks to give their device a working knowledge of politics. Micro-worlds currently give us the ability to look at otherwise completely ambiguous situations through a particular paradigm, thereby enabling us to make more practical strides toward a more intelligent agent.
For a natural language interpreter, however, a working knowledge of politics may be a necessity, depending on the application. Some natural language systems, such as LADDER, STUDENT, and the like, have little need to know anything about politics; they perform a specified task within a problem domain. (If their domain happens to be politics, of course, the example falls apart.) Emily, and other intelligent agents like her, have a loftier goal. Roger Schank classifies the type of understanding that we want to achieve as “complete empathy” [6]. Emily, then, does not merely need to understand a few subjects really well; she needs to be able to grok the human experience itself.
Because of the way our project evolved, it ended up with two separate and distinct parts: the first practical, the second theoretical. What we hope to have shown in both parts of this paper is that although the current paradigm of artificial intelligence is by no means failing, there is still room to improve it.
The first half of this project dealt with the implementation of the natural language parser and generator. We have shown that it is at least worth looking into alternative development platforms and languages in the pursuit of the ever-elusive intelligent agent. Admittedly, neither the MOO nor Esperanto should be thought of as an optimal tool for solving these problems; we hope, however, that they serve as useful alternatives.
The second half of our project, then, should stimulate a return to exploring the underlying theoretical paradigm. We argue that there can be little progress toward the completely empathic intelligent agent without more advanced methods for gathering information about the outside world. We attribute this impeded progress to our inability to ground the symbols of any natural language in actual objects. Although the issue invites much philosophical speculation and debate, we have tried to steer clear of it. Our proposal for the next part of the project, then, involves the creation of an array of sensory devices through which the intelligent agent will gain a presence in the physical world. We have also seen, however, that to advance the state of the art further, we may need to reevaluate the very foundations of artificial intelligence and natural language processing.
By and large, the main thrust of this project was to attempt previously unexplored avenues in the development of an intelligent agent and to evaluate their usefulness. In this respect, the experiment in artificial intelligence was a success: we found these unorthodox ways of viewing the subject matter, implemented them, and have since evaluated them. Although, again, they may not be optimal, they are alternatives, and finding alternatives was the point of the exercise.
On the other hand, some may view the experiment as a failure, since we were unable to put an intelligent agent behind the interface. To those who hold this view, we would again state that the thrust of this project was to look for alternative paradigms. Even though we have little to show for our research efforts aside from a large number of conjectures, we have presented yet another proposed course for solving the problem at hand. Although it is largely hypothesis, it can be implemented, tested, and revised as necessary. Given the needed resources, we hope to do this ourselves in the near future.
The moral of this project, then, is to shake off the traditional paradigm every once in a while and explore the unknown. Although we make no claim to have found an optimal solution to any of the problems encountered in natural language processing, or in artificial intelligence in general, we do submit that we have at least produced a unique “failure.”
[1] Abram, David, The Spell of the Sensuous, Vintage Books: 1997
[2] Artificial Intelligence Natural Language Processing FAQ: 1995
[3] Brooks, Rodney, “Intelligence without Representation,” Artificial Intelligence, Volume 47, Pages 139-159: 1991
[4] Dreyfus, Hubert L., What Computers Can’t Do: The Limits of Artificial Intelligence, Harper Colophon Books: 1979
[6] Firebaugh, Morris, Artificial Intelligence: A Knowledge-Based Approach, PWS-Kent Publishing: 1989
[7] Funk and Wagnalls Standard Desk Dictionary, Funk and Wagnalls, Inc.: 1984
[8] Hovy, Edward, “Approaches to the Planning of Coherent Text,” in Natural Language Generation in Artificial Intelligence and Computational Linguistics, Kluwer Academic Publishers: 1991
[10] Julia Papers (moved to unknown location)
[11] Minsky, Marvin, The Society of Mind, Simon & Schuster, Inc.: 1988
[12] Moore, Johanna, and Swartout, William, “A Reactive Approach to Explanation,” in Natural Language Generation in Artificial Intelligence and Computational Linguistics, Kluwer Academic Publishers: 1991
[13] Poundstone, William, Labyrinths of Reason, Anchor Press: 1988
[14] Reed, Ivy, Esperanto: A Complete Grammar, The Scarecrow Press, Inc.: 1968
[15] Winston, Patrick Henry, Artificial Intelligence, Addison-Wesley: 1993
Generic AI Bot Player Class:
;;#138.("trace") = 0
;;#138.("connection") = #-1
;;#138.("host") = "127.0.0.1"
;;#138.("port") = 1234
;;#138.("controls") = {#2183, #3687}
;;#138.("monitors") = {}
;;#138.("hypothesis_list") = {}
;;#138.("rootrule") = #-1
;;#138.("mode") = 0
;;#138.("slow_mode") = 1
;;#138.("dictionary") = #148
@chmod #138."passwd" c
;;#138.("passwd") = 0
;;#138.("parse") = 0
@args #138:"_wake" this none this
@chmod #138:_wake xd
@program #138:_wake
if (!(player in this.controls))
  return E_PERM;
endif
if (this.connection != #-1)
  player:tell(this.name, " is already awake.");
  return E_INVARG;
endif
this.connection = $network:open(this.host, this.port);
"tostr() keeps the login line valid when passwd is stored as a number, as with the default of 0.";
notify(this.connection, (("con " + this.name) + " ") + tostr(this.passwd));
.
@args #138:"monitor" this none none
@program #138:monitor
if (!(player in this.controls))
  return E_PERM;
endif
if (player in this.monitors)
  this.monitors = setremove(this.monitors, player);
  player:tell("Removed from monitors list for ", this.name, ".");
else
  this.monitors = setadd(this.monitors, player);
  player:tell("Added to monitors list for ", this.name, ".");
endif
.
@args #138:"notify" this none this
@program #138:notify
if (this in $builtin:connected_players())
  for each in (this.monitors)
    each:tell(this.name, "->", @args);
  endfor
endif
pass(@args);
.
@args #138:"@command" this to any
@program #138:@command
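if (!(player in this.controls))
  return E_PERM;
endif
"iobjstr holds the full text after 'to'; args[3] would be only a single word of the command line.";
for each in (this.monitors)
  each:tell(this.name, "-> \"", iobjstr, "\" sent by ", player:title(), ".");
endfor
notify(this.connection, iobjstr);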
.
@args #138:"_sleep" this none this
@program #138:_sleep
if (!(player in this.controls))
  return E_PERM;
endif
boot_player(this);
this.connection = #-1;
.
@args #138:"log" this none this
@program #138:log
player:tell(@args);
.
@args #138:"@trace" this none none
@program #138:@trace
this.trace = !this.trace;
this:log("Tracing toggled ", this.trace ? "on" | "off", " by ", player:title(), ".");
.
@args #138:"substitute" this none this
@program #138:substitute
return args[1];
.
@args #138:"tell" this none this
@program #138:tell
if (this in $builtin:connected_players())
  line = argstr;
  if (this.parse)
    null = this.dictionary:parse(line);
  endif
endif
return pass(@args);
.
@args #138:"log_lines" this none this
@program #138:log_lines
for each in (args[1])
  this:log(each);
endfor
.
@args #138:"wake" this none none
@program #138:wake
if (player in this.controls)
  this:_wake();
else
  player:tell(E_PERM);
endif
.
@args #138:"sleep" this none none
@program #138:sleep
if (player in this.controls)
  this:_sleep();
else
  player:tell(E_PERM);
endif
.
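The bot's control verbs share one pattern. Each verb first checks that the caller is in `controls`. `@command` then echoes the outgoing text to every player in `monitors` before sending it. The Python sketch below models that pattern under simplifying assumptions: the class and method names are hypothetical, and a plain list stands in for the network connection opened by `$network:open`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class BotController:
    """Hypothetical Python model of the bot's control verbs (not the MOO code itself)."""
    name: str
    controls: Set[str] = field(default_factory=set)  # players allowed to drive the bot
    monitors: Set[str] = field(default_factory=set)  # players who see relayed commands
    connection: Optional[List[str]] = None           # stand-in for the network connection
    inbox: Dict[str, List[str]] = field(default_factory=dict)  # messages each monitor received

    def _require_control(self, player: str) -> None:
        # Every control verb starts with the same permission check (E_PERM in MOO).
        if player not in self.controls:
            raise PermissionError(f"{player} does not control {self.name}")

    def wake(self, player: str) -> None:
        self._require_control(player)
        if self.connection is not None:
            raise RuntimeError(f"{self.name} is already awake.")
        self.connection = []  # a real bot would open a network connection here

    def toggle_monitor(self, player: str) -> bool:
        """Add or remove player from the monitors; return True if now monitoring."""
        self._require_control(player)
        if player in self.monitors:
            self.monitors.remove(player)
            return False
        self.monitors.add(player)
        return True

    def command(self, player: str, text: str) -> None:
        self._require_control(player)
        if self.connection is None:
            raise RuntimeError(f"{self.name} is asleep.")
        for monitor in self.monitors:  # echo to every monitor before sending
            self.inbox.setdefault(monitor, []).append(
                f'{self.name}-> "{text}" sent by {player}.')
        self.connection.append(text)  # send the whole text, not just one word

    def sleep(self, player: str) -> None:
        self._require_control(player)
        self.connection = None
```

Keeping the permission check in one helper mirrors the repeated `if (!(player in this.controls)) return E_PERM;` guard at the top of each MOO verb.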
Esperanto Dictionary:
;;#148.("word_list") = { <LOTS of words go here... about 9 pages worth...> }
;;#148.("suffixes") = {"os", "as", "is", "in", "o", "a", "j", "n", "i", "e"}
;;#148.("prefixes") = {"mal", "bo", "vir"}
;;#148.("generic_parsed_word") = #6776
;;#148.("generic_parsed_phrase") = #6777
;;#148.("generic_parsed_sentence") = #6778
;;#148.("suffix_tuples") = {{"o", "noun"}, {"j", "plural"}, {"as", "present"}, {"is", "past"}, {"os", "future"}, {"e", "adverb"}, {"a", "adjective"}, {"i", "infinitive"}, {"n", "indirect"}, {"in", "feminine"}}
;;#148.("prefix_tuples") = {{"bo", "in-law"}}
;;#148.("suffixes_verbs") = {"os", "as", "is", "i"}
;;#148.("modifiers") = {"ne", "pli", "plej", "la"}
;;#148.("phrase_types") = {"verb", "prepositional", "noun"}
;;#148.("prepositions") = {"en"}
;;#148.("destroy") = 1
;;#148.("display") = 1
;;#148.("aliases") = {"Esperanto Dictionary"}
;;#148.("object_size") = {0, 0}
@args #148:"build_word_list" this none this
@program #148:build_word_list
words = {};
for each in (this.contents)
  $command_utils:suspend_if_needed(0);
  if ($object_utils:isa(each, #146))
    words = setadd(words, each.name);
  endif
endfor
this.word_list = words;
.
@args #148:"get_root" this none this
@program #148:get_root
target = args[1];
word = $recycler:_create(this.generic_parsed_word);
word.name = target;
word.aliases = {target};
word.prefixes = {};
word.suffixes = {};
if (target in this.word_list)
  word.root = target;
  return word;
endif
found_one = 1;
while (found_one)
  found_one = 0;
  for suf in (this.suffixes)
    $command_utils:suspend_if_needed(0);
    if ((length(suf) < length(target)) && (!found_one))
      if ((target[(length(target) - length(suf)) + 1..length(target)] == suf) && (!(target in this.word_list)))
        target = target[1..length(target) - length(suf)];
        word.suffixes = setadd(word.suffixes, suf);
        found_one = 1;
      endif
    endif
  endfor
endwhile
if (target in this.word_list)
  word.root = target;
  word.aliases = setadd(word.aliases, target);
  return word;
endif
found_one = 1;
while (found_one)
  found_one = 0;
  for pre in (this.prefixes)
    $command_utils:suspend_if_needed(0);
    if (length(pre) < length(target))
      if ((target[1..length(pre)] == pre) && (!(target in this.word_list)))
        target = target[length(pre) + 1..length(target)];
        word.prefixes = setadd(word.prefixes, pre);
        found_one = 1;
      endif
    endif
  endfor
endwhile
if (target in this.word_list)
  word.root = target;
  word.aliases = setadd(word.aliases, target);
  return word;
endif
word.protected_from_recycle = 0;
$recycler:_recycle(word);
return #-1;
.
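`get_root` strips known suffixes from the end of a word, one per pass, always rescanning `suffixes` from the start, until what remains is in `word_list`. Only then does it try prefixes. Below is the same procedure as a Python sketch. The affix lists are copied from the properties above, and an ordinary set stands in for `word_list`. The helper name `strip_affixes` is hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Copied from the dictionary's suffixes and prefixes properties; order matters.
SUFFIXES = ["os", "as", "is", "in", "o", "a", "j", "n", "i", "e"]
PREFIXES = ["mal", "bo", "vir"]


def strip_affixes(target: str, affixes: List[str], word_list: Set[str],
                  at_end: bool) -> Tuple[str, List[str]]:
    """Remove one matching affix per pass until target is a known root or nothing matches."""
    found: List[str] = []
    stripped = True
    while stripped and target not in word_list:
        stripped = False
        for affix in affixes:
            matches = target.endswith(affix) if at_end else target.startswith(affix)
            if len(affix) < len(target) and matches:
                target = target[:-len(affix)] if at_end else target[len(affix):]
                if affix not in found:  # setadd in MOO ignores duplicates
                    found.append(affix)
                stripped = True
                break  # rescan the affix list from the start
    return target, found


def get_root(word: str, word_list: Set[str]) -> Optional[Tuple[str, List[str], List[str]]]:
    """Return (root, suffixes, prefixes), or None if no known root is reached."""
    target, suffixes = strip_affixes(word, SUFFIXES, word_list, at_end=True)
    target, prefixes = strip_affixes(target, PREFIXES, word_list, at_end=False)
    if target in word_list:
        return target, suffixes, prefixes
    return None
```

Consider a word list holding only the roots patr and bon:

- patrinoj yields the root patr, with the suffixes j, o, and in.
- bopatro yields patr, with the prefix bo.
- malbona finds no root at all. Because suffixes are exhausted before any prefix is tried, it is over-stripped (malbon, malbo, malb). The MOO verb above behaves the same way.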
@args #148:"get_phrases" this none this
@program #148:get_phrases
words = args[1];
words = setremove(words, #-1);
phrase_list = {};
"It should be noted that this.phrase_types is order-dependent!";
for type in (this.phrase_types)
  while (valid(phrase = this:(("find_" + type) + "_phrase")(words)))
    phrase_list = setadd(phrase_list, phrase);
    words = setremove(words, phrase.root);
    for each in (phrase.modifiers)
      if ($object_utils:isa(each, this.generic_parsed_phrase))
        words = setremove(words, each.root);
        for wrd in (each.modifiers)
          words = setremove(words, wrd);
        endfor
      else
        words = setremove(words, each);
      endif
    endfor
  endwhile
endfor
return {phrase_list, words};
.
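`get_phrases` is a greedy loop. For each phrase type, in the fixed order of `phrase_types`, it keeps asking the matching `find_*_phrase` verb for another phrase. Each time, it removes every word that phrase consumed. The Python sketch below reproduces that loop, assuming the finders are passed in as plain functions. The `Phrase` class and `Finder` alias are hypothetical stand-ins for the MOO objects.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class Phrase:
    kind: str
    root: Any                                      # the head word
    modifiers: list = field(default_factory=list)  # words or nested Phrase objects


Finder = Callable[[List[Any]], Optional[Phrase]]


def get_phrases(words: Sequence[Any], finders: Sequence[Finder]) -> Tuple[List[Phrase], List[Any]]:
    """Greedy, order-dependent phrase extraction.

    Each finder must build its phrase from the words it is given, or return None.
    Words are compared by identity, as MOO compares object numbers.
    """
    remaining = [w for w in words if w is not None]  # None marks an unrecognized word
    phrases: List[Phrase] = []
    for find in finders:  # earlier finders claim words first
        while (phrase := find(remaining)) is not None:
            phrases.append(phrase)
            used = [phrase.root]
            for mod in phrase.modifiers:
                if isinstance(mod, Phrase):  # e.g. the noun phrase inside a prepositional phrase
                    used.append(mod.root)
                    used.extend(mod.modifiers)
                else:
                    used.append(mod)
            remaining = [w for w in remaining if not any(w is u for u in used)]
    return phrases, remaining
```

Because the verb finder runs first, any word it claims is no longer available to the prepositional and noun finders. The words left over at the end correspond to the sentence's `unresolved` list.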
@args #148:"parse" this none this
@program #148:parse
"Parse the text passed in args[1] rather than the argstr inherited from the calling command.";
text = args[1];
sentence = this:get_sentence(this:get_phrases(this:get_roots(text)));
if (this.display)
  sentence:print_self();
endif
if (this.destroy)
  sentence.protected_from_recycle = 0;
  $recycler:_recycle(sentence);
endif
.
@args #148:"get_roots" this none this
@program #148:get_roots
roots = {};
for each in ($string_utils:explode(args[1]))
  roots = setadd(roots, root = this:get_root(each));
  if (valid(root))
    "Set a feature flag (e.g. root.plural) for every known affix on the word.";
    for kind in ({"suffix", "prefix"})
      for tuple in (this.(kind + "_tuples"))
        if (tuple[1] in root.(kind + "es"))
          root.(tuple[2]) = 1;
        endif
      endfor
    endfor
  endif
endfor
return roots;
.
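After `get_root` strips a word's affixes, `get_roots` looks each affix up in `suffix_tuples` and `prefix_tuples` and sets the named property on the parsed word to 1. The Python sketch below performs the same lookup. The function name `feature_flags`, and the use of a dictionary in place of object properties, are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Copied from the dictionary's suffix_tuples and prefix_tuples properties.
# (The -n ending marks the accusative in Esperanto; the MOO code names this flag "indirect".)
SUFFIX_FEATURES = {"o": "noun", "j": "plural", "as": "present", "is": "past",
                   "os": "future", "e": "adverb", "a": "adjective",
                   "i": "infinitive", "n": "indirect", "in": "feminine"}
PREFIX_FEATURES = {"bo": "in-law"}


def feature_flags(suffixes: Iterable[str], prefixes: Iterable[str]) -> Dict[str, bool]:
    """Map the affixes stripped from a word to boolean grammatical flags."""
    # Every flag starts false, like the defaults on the Generic Parsed Word.
    flags = {name: False for name in [*SUFFIX_FEATURES.values(), *PREFIX_FEATURES.values()]}
    for table, affixes in ((SUFFIX_FEATURES, suffixes), (PREFIX_FEATURES, prefixes)):
        for affix in affixes:
            if affix in table:  # affixes without a tuple (e.g. "mal") set no flag
                flags[table[affix]] = True
    return flags
```

The affixes stripped from patrinoj (j, o, in), for example, set the noun, plural, and feminine flags and leave all the others false.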
@args #148:"get_sentence" this none this
@program #148:get_sentence
sentence = $recycler:_create(this.generic_parsed_sentence);
sentence.name = "Sentence " + tostr(sentence);
sentence.phrases = args[1][1];
sentence.unresolved = args[1][2];
return sentence;
.
@args #148:"find_prepositional_phrase" this none this
@program #148:find_prepositional_phrase
preps = {};
words = args[1];
phrase = $recycler:_create(this.generic_parsed_phrase);
phrase.type = "prepositional";
phrase.name = "Prepositional Phrase " + tostr(phrase);
for each in (words)
  if (valid(each))
    if (each.root in this.prepositions)
      preps = setadd(preps, each);
    endif
  endif
endfor
if (length(preps))
  prepointerest = preps[1];
  phrase.root = prepointerest;
  if (valid(np = this:find_noun_phrase(words[(prepointerest in words) + 1..length(words)])))
    phrase.modifiers = setadd(phrase.modifiers, np);
  endif
  return phrase;
endif
phrase.protected_from_recycle = 0;
$recycler:_recycle(phrase);
return #-1;
.
@args #148:"find_noun_phrase" this none this
@program #148:find_noun_phrase
nouns = {};
words = args[1];
phrase = $recycler:_create(this.generic_parsed_phrase);
phrase.type = "noun";
phrase.name = "Noun Phrase " + tostr(phrase);
for each in (words)
  $command_utils:suspend_if_needed(0);
  if (valid(each))
    if (each:get_noun())
      nouns = setadd(nouns, each);
    endif
  endif
endfor
if (length(nouns))
  nounointerest = nouns[1];
  phrase.root = nounointerest;
  i = (nounointerest in words) - 1;
  while ((i >= 1) && (words[i]:get_adjective() || (words[i].root in this.modifiers)))
    $command_utils:suspend_if_needed(0);
    phrase.modifiers = setadd(phrase.modifiers, words[i]);
    i = i - 1;
  endwhile
  return phrase;
endif
phrase.protected_from_recycle = 0;
$recycler:_recycle(phrase);
return #-1;
.
@args #148:"find_verb_phrase" this none this
@program #148:find_verb_phrase
verbs = {};
words = args[1];
phrase = $recycler:_create(this.generic_parsed_phrase);
phrase.type = "verb";
phrase.name = "Verb Phrase " + tostr(phrase);
for each in (words)
  if (valid(each))
    if (each:get_verb())
      verbs = setadd(verbs, each);
    endif
  endif
endfor
if (length(verbs))
  verbointerest = verbs[1];
  phrase.root = verbointerest;
  i = (verbointerest in words) - 1;
  while ((i >= 1) && (words[i]:get_adverb() || (words[i].root in this.modifiers)))
    phrase.modifiers = setadd(phrase.modifiers, words[i]);
    i = i - 1;
  endwhile
  return phrase;
endif
phrase.protected_from_recycle = 0;
$recycler:_recycle(phrase);
return #-1;
.
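The noun and verb finders share one strategy: pick the first word that can head the phrase, then walk leftward collecting adjacent words that can modify it. Those modifiers are adjectives for nouns and adverbs for verbs, plus the fixed `modifiers` such as `la` and `ne`. A prepositional phrase works differently: it instead takes the next noun phrase to its right. The Python sketch below implements that strategy. The `Word` and `Phrase` classes and the `find_head_phrase` helper are hypothetical stand-ins for the parsed-word and parsed-phrase objects.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

MODIFIERS = {"ne", "pli", "plej", "la"}  # the dictionary's modifiers property
PREPOSITIONS = {"en"}                    # the dictionary's prepositions property
TENSES = {"past", "present", "future", "infinitive"}


@dataclass
class Word:
    text: str
    root: str
    flags: Set[str] = field(default_factory=set)  # e.g. {"noun", "plural"}


@dataclass
class Phrase:
    kind: str
    root: Word
    modifiers: list = field(default_factory=list)  # nearest modifier first, as in MOO


def find_head_phrase(words: List[Word], kind: str,
                     is_head: Callable[[Word], bool],
                     is_modifier: Callable[[Word], bool]) -> Optional[Phrase]:
    """Take the first head word, then walk left collecting adjacent modifiers."""
    for idx, word in enumerate(words):
        if is_head(word):
            phrase = Phrase(kind, word)
            i = idx - 1
            while i >= 0 and (is_modifier(words[i]) or words[i].root in MODIFIERS):
                phrase.modifiers.append(words[i])
                i -= 1
            return phrase
    return None


def find_noun_phrase(words: List[Word]) -> Optional[Phrase]:
    return find_head_phrase(words, "noun", lambda w: "noun" in w.flags,
                            lambda w: "adjective" in w.flags)


def find_verb_phrase(words: List[Word]) -> Optional[Phrase]:
    return find_head_phrase(words, "verb", lambda w: bool(w.flags & TENSES),
                            lambda w: "adverb" in w.flags)


def find_prepositional_phrase(words: List[Word]) -> Optional[Phrase]:
    """A preposition heads the phrase; its object is the next noun phrase to the right."""
    for idx, word in enumerate(words):
        if word.root in PREPOSITIONS:
            phrase = Phrase("prepositional", word)
            noun_phrase = find_noun_phrase(words[idx + 1:])
            if noun_phrase is not None:
                phrase.modifiers.append(noun_phrase)
            return phrase
    return None
```

Take la granda hundo ne kuras en la domo (“the big dog does not run in the house”), with the words already tagged. The sketch finds three phrases:

- the verb phrase kuras, modified by ne;
- the noun phrase hundo, modified by granda and la;
- the prepositional phrase en, governing la domo.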
Generic Agent:
@args #136:"call" this none this
@program #136:call
"A generic interface for all agents.";
"I hope.";
"The only argument should be which bot's making the call. I might find a";
"way later to do it through more finessed means, but let's just use what we";
"have now, shall we?";
"The return value, however, should always be a set of possibilities, which";
"will in turn be evaluated later by the calling agency.";
return {};
.
Generic Rule:
;;#140.("conditions") = {}
;;#140.("result") = #-1
;;#140.("unless") = {}
;;#140.("providing") = {}
@args #140:"print_self" this none this
@program #140:print_self
player:tell("Rule object: ", this.name, " (", this, ")");
for each in ({"conditions", "result", "unless", "providing"})
  player:tell(" ", each);
  value = this.(each);
  "Object numbers are always false in MOO, so OBJ values are tested with valid() instead.";
  if (typeof(value) == OBJ)
    if (valid(value))
      player:tell("--Object--");
      value:print_self();
    else
      player:tell(" No directives in this section.");
    endif
  elseif (!value)
    player:tell(" No directives in this section.");
  elseif (typeof(value) == LIST)
    for directive in (value)
      player:tell(" ", directive[1], " ", directive[2], " ", directive[3]);
    endfor
  else
    player:tell(" ", value);
  endif
endfor
.
@args #140:"look_self" this none this
@program #140:look_self
return this:print_self();
.
Generic Frame:
@args #133:"print_self" this none this
@program #133:print_self
player:tell("Memory object: ", this.name, " (", this, ")");
if (properties(this))
  for each in (properties(this))
    player:tell(" .", each, "==> ", this.(each));
  endfor
else
  player:tell(" This memory object currently has no assertions.");
endif
if (parent(this) != #1)
  parent(this):print_self();
endif
.
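The frame's `print_self` lists only the properties defined on the frame itself, because `properties()` does not report inherited ones. It then asks its parent to print itself, so a frame's full set of assertions is spread along its parent chain. The Python sketch below models that structure. The `Frame` class, its `lookup` method, and the nearest-ancestor rule are assumptions loosely modeled on MOO property inheritance; they are not code from the project.

```python
from typing import Dict, List, Optional


class Frame:
    """Hypothetical memory frame: local assertions plus an optional parent frame."""

    def __init__(self, name: str, parent: Optional["Frame"] = None, **slots: object):
        self.name = name
        self.parent = parent
        self.slots: Dict[str, object] = dict(slots)  # assertions defined on this frame only

    def describe(self) -> List[str]:
        """Lines printed by print_self: this frame's assertions, then each ancestor's."""
        lines = [f"Memory object: {self.name}"]
        if self.slots:
            lines += [f" .{key}==> {value}" for key, value in self.slots.items()]
        else:
            lines.append(" This memory object currently has no assertions.")
        if self.parent is not None:  # the MOO verb stops at the root object #1
            lines += self.parent.describe()
        return lines

    def lookup(self, key: str) -> object:
        """Nearest frame wins, in the spirit of MOO property inheritance."""
        frame: Optional[Frame] = self
        while frame is not None:
            if key in frame.slots:
                return frame.slots[key]
            frame = frame.parent
        raise KeyError(key)
```

A cat frame whose parent is an animal frame can, for example, answer a question about legs from the animal frame while keeping its own assertions local.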
Generic Word:
;;#146.("links") = {}
Generic Parsed Sentence:
;;#6778.("phrases") = {}
;;#6778.("unresolved") = {}
@args #6778:"recycle" this none this
@program #6778:recycle
for each in (this.phrases)
  $command_utils:suspend_if_needed(0);
  each.protected_from_recycle = 0;
  $recycler:_recycle(each);
endfor
for each in (this.unresolved)
  $command_utils:suspend_if_needed(0);
  if (valid(each))
    each.protected_from_recycle = 0;
    $recycler:_recycle(each);
  endif
endfor
.
@args #6778:"print_self" this none this
@program #6778:print_self
indent = 0;
if (length(args))
  indent = tonum(args[1]);
endif
sp = $string_utils:space(indent, " ");
player:tell(sp, $string_utils:nn(this));
player:tell(sp, "Consists of:");
player:tell(sp + " ", "Phrases:");
for each in (this.phrases)
  each:print_self(indent + 4);
endfor
player:tell(sp + " ", "Unresolved:");
for each in (this.unresolved)
  if (valid(each))
    each:print_self(indent + 4);
  endif
endfor
player:tell(sp, "End of sentence.");
.
Generic Parsed Phrase:
;;#6777.("type") = ""
;;#6777.("root") = #-1
;;#6777.("modifiers") = {}
@args #6777:"print_self" this none this
@program #6777:print_self
indent = 0;
if (length(args))
  indent = tonum(args[1]);
endif
sp = $string_utils:space(indent, " ");
player:tell(sp, $string_utils:nn(this));
player:tell(sp + " ", "Phrase root:");
this.root:print_self(indent + 4);
player:tell(sp + " ", "Phrase modifiers:");
for each in (this.modifiers)
  each:print_self(indent + 4);
endfor
player:tell(sp, "End of phrase.");
.
@args #6777:"recycle" this none this
@program #6777:recycle
if (valid(this.root))
  this.root.protected_from_recycle = 0;
  $recycler:_recycle(this.root);
endif
for each in (this.modifiers)
  $command_utils:suspend_if_needed(0);
  each.protected_from_recycle = 0;
  $recycler:_recycle(each);
endfor
.
Generic Parsed Word:
;;#6776.("root") = ""
;;#6776.("prefixes") = {}
;;#6776.("suffixes") = {}
;;#6776.("noun") = 0
;;#6776.("plural") = 0
;;#6776.("present") = 0
;;#6776.("past") = 0
;;#6776.("future") = 0
;;#6776.("adverb") = 0
;;#6776.("adjective") = 0
;;#6776.("infinitive") = 0
;;#6776.("in-law") = 0
;;#6776.("indirect") = 0
;;#6776.("feminine") = 0
@args #6776:"print_self" this none this
@program #6776:print_self
indent = 0;
if (length(args))
  indent = tonum(args[1]);
endif
sp = $string_utils:space(indent, " ");
player:tell(sp, $string_utils:nn(this));
player:tell(sp + " ", "Word root:");
player:tell(sp + " ", this.root);
for bah in ({"prefixes", "suffixes"})
  player:tell(sp + " ", ("Word " + bah) + ":");
  for each in (this.(bah))
    player:tell(sp + " ", each);
  endfor
endfor
player:tell(sp, "End of word.");
.
@args #6776:"get_*" this none this
@program #6776:get_
target = verb[index(verb, "_") + 1..length(verb)];
if ($object_utils:has_property(this, target))
  return this.(target);
elseif (target == "verb")
  "A word counts as a verb if it carries a tense or infinitive ending.";
  return ((this.past || this.present) || this.future) || this.infinitive;
endif
return 0;
.
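`get_*` is a wildcard verb. Any call such as `word:get_noun()` or `word:get_plural()` lands on it, and it recovers the feature name from the verb name. Only `verb` is computed rather than stored. Python's `__getattr__` gives a comparable catch-all. The sketch below assumes the flags live in a dictionary rather than as object properties; the class name `ParsedWord` is hypothetical.

```python
from typing import Callable, Dict

TENSES = ("past", "present", "future", "infinitive")


class ParsedWord:
    """Sketch of a parsed word whose get_<feature>() methods are generated on demand."""

    def __init__(self, root: str, **flags: bool):
        self.root = root
        self.flags: Dict[str, bool] = dict(flags)

    def __getattr__(self, name: str) -> Callable[[], bool]:
        # Called only when normal attribute lookup fails, much like a wildcard verb match.
        if not name.startswith("get_"):
            raise AttributeError(name)
        feature = name[len("get_"):]

        def getter() -> bool:
            if feature in self.flags:
                return self.flags[feature]
            if feature == "verb":  # derived: any tense or infinitive ending
                return any(self.flags.get(t, False) for t in TENSES)
            return False  # unknown features read as false, like MOO's default return of 0
        return getter
```

A word parsed from kuras (present tense) therefore answers true to `get_verb()` without ever storing a `verb` flag.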