Artificial Intelligence Techniques as Enablers for the Web

Institut des Sciences de la Matière et du Rayonnement
GREYC
6, boulevard du Maréchal Juin
F-14050 CAEN, FRANCE

Teleconferencing

Real-time teleconferencing, whether audio or text, was one of the first Internet features to receive broad coverage in the mass media. Several systems exist; the most popular is Internet Relay Chat (IRC).

IRC consists mainly of forums that you can connect to and take part in to discuss ideas, make friends, find information, etc. The number of forums is now huge, which makes it difficult for a novice to find one that might interest her/him. I still have bad memories of my first attempt. After connecting and experimenting with a few commands, I got stuck in one of the discussion groups. Some participants noticed my presence and sent me messages asking me to identify myself. Unable to find the right command to leave, I had to close the window to escape. I swore then that I would not touch IRC again, not even with a ten-foot pole.

Later, when I heard about the WWW5 Artificial Intelligence workshop, I tried to connect to the discussion forum set up for it. The forum was organized as a set of conference rooms: you choose a room according to the discussion topics posted in the main hall. I tried command after command, in vain, to obtain the list of rooms. Unable to go any further, I remained in the lobby.

Virtual Reality

Virtual reality (VR) is rapidly becoming one of the core components of the information highway. This emergence is largely due to the wide acceptance of the VRML standard. Virtual reality enables users to visualize computer data and interact with it in real time.

Virtual reality systems rely heavily on the performance of their hardware and software components. However, most of these components are not yet affordable for many users, and the best choice of base components is still a matter of debate.

Some people think that many devices are needed to create a feeling of reality: exoskeletons, gyroscopic goggles, haptic sensors, etc. I believe that the core of virtual reality does not lie in these devices. As noted by Philippe Quéau [1], virtual reality simply brings the cognitive power of images to ordinary users. It makes it possible to turn gigabytes of floating-point numbers, or any complex data set, into a tractable sketch.

Teleconferencing and Virtual Reality

Virtual reality was immediately identified as a means of making Internet forums easier to understand. This led to the re-creation of meeting rooms, or of more complex scenes, as virtual environments. There, participants are embodied by more or less realistic 3-D icons. They can move about or go from one virtual room to another. As a result, users can immediately grasp the complexity of a situation.

Provided you have enough graphics processing power on your desktop to render 3-D scenes, the idea is appealing. The drawback is that the interface becomes much more difficult to interact with; for a novice, interaction is almost impossible. The first time I tried to move in a virtual world, it did not take long before I got seasick, my "body" upside down, crying for mercy.

Navigation

While investigating teleconferencing and virtual reality for a Computer Supported Cooperative Work (CSCW) project, we found that navigation was a major difficulty [2]. We tried to alleviate it by adding an agent that understands spoken natural-language commands and acts accordingly within the virtual environment on the user's behalf. We built a virtual world, Ithaque, using the DIVE environment, and we collected a corpus of spoken interactions. We then implemented a conversational agent with navigation capabilities and incorporated it into the user's embodiment to help her/him navigate within the world.

The Ithaque World

The architecture of Ithaque's agent is similar to that of many other interactive dialogue systems. It features speech recognition and speech synthesis devices, a syntactic parser, and semantic and dialogue modules. In addition, the agent has a reference resolver that works in coordination with the user's gestures, enabling her/him to name objects and point at them; a geometric reasoner to understand the world; and an action manager that moves the user, in relatively continuous motion, to where she/he wants to go.
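To make the division of labor among these modules more concrete, here is a deliberately tiny Python sketch of how an already-recognized command such as "go to the door" might flow through parsing, reference resolution (with an optional pointing gesture), geometric reasoning, and the action manager. This is not Ithaque's implementation, which ran within DIVE with real speech input: every function name, the one-pattern grammar, the one-unit stop distance, and the straight-line interpolation are hypothetical simplifications, and speech recognition and synthesis are left out entirely.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical sketch, not Ithaque's actual code. Speech recognition is assumed
# to have already turned the user's utterance into text.

Point = Tuple[float, float]
DEICTICS = {"this", "that", "there", "this one", "that one"}


@dataclass
class WorldObject:
    name: str
    position: Point


def parse(command: str) -> Optional[str]:
    """Toy 'parser': accepts only 'go to <noun phrase>' and returns the phrase."""
    words = command.lower().strip(" .!?").split()
    if len(words) >= 3 and words[:2] == ["go", "to"]:
        return " ".join(words[2:])
    return None


def resolve(phrase: str, world: List[WorldObject],
            pointed: Optional[WorldObject] = None) -> Optional[WorldObject]:
    """Toy reference resolver: a deictic phrase takes the object the user points
    at; otherwise the last word of the phrase is matched against object names."""
    if phrase in DEICTICS:
        return pointed
    head = phrase.split()[-1]
    return next((obj for obj in world if obj.name == head), None)


def goal_near(user: Point, target: Point, stop_distance: float = 1.0) -> Point:
    """Toy geometric reasoner: stop `stop_distance` units short of the target."""
    dx, dy = target[0] - user[0], target[1] - user[1]
    length = math.hypot(dx, dy)
    if length <= stop_distance:
        return user  # already close enough: stay in place
    scale = (length - stop_distance) / length
    return (user[0] + dx * scale, user[1] + dy * scale)


def trajectory(start: Point, goal: Point, steps: int = 4) -> List[Point]:
    """Toy action manager: split the move into evenly spaced intermediate
    positions so that the avatar glides instead of jumping."""
    return [(start[0] + (goal[0] - start[0]) * i / steps,
             start[1] + (goal[1] - start[1]) * i / steps)
            for i in range(1, steps + 1)]


def handle(command: str, user: Point, world: List[WorldObject],
           pointed: Optional[WorldObject] = None) -> List[Point]:
    """Chain the modules; an empty list means the command was not understood."""
    phrase = parse(command)
    if phrase is None:
        return []
    target = resolve(phrase, world, pointed)
    if target is None:
        return []
    return trajectory(user, goal_near(user, target.position))


WORLD = [WorldObject("door", (5.0, 0.0)), WorldObject("table", (0.0, 3.0))]
print(handle("Go to the door", (0.0, 0.0), WORLD))
# prints [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
```

A pointing gesture is passed through `pointed`: for instance, `handle("go to that one", (0.0, 0.0), WORLD, pointed=WORLD[1])` moves the user toward the table. Straight-line interpolation is the simplest possible action manager; a realistic geometric reasoner would also have to plan around walls and obstacles, which this sketch ignores.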

Conversational Agents

Some people predict that social groups made up of people and agents will populate the Web in the form of virtual shops, campuses, etc. However, a Web populated with agents unable to converse could be a nightmare. Icons, like any other representations, do not always make clear at first sight the information or faculties they embody. They must be augmented with new interaction capabilities and paradigms combining gesture and voice. In the Ithaque project, we are investigating such paradigms to make users feel more comfortable with their avatars (their embodiments).

Conversational agents are an active research topic, and other projects address issues similar to ours, although most of them are not yet specifically aimed at the Web. Their applications range from controlling a disk changer to piloting a fighter aircraft:

Persona at Microsoft Research;
TRAINS at the University of Rochester;
Nautilus at the Naval Research Laboratory;
Diverse at the Swedish Institute of Computer Science;
AnimNL at the University of Pennsylvania.

Basically, these agents should be able to respond to natural-language commands, and therefore have syntactic capabilities, and beyond that, implement various sorts of faculties. These faculties differ from one agent to another: they might enable, for instance, navigation in virtual worlds or the search for a specific page. Many of them, however, rely on language understanding and on reasoning: syntax, planning, temporal and geometric reasoning, learning, intention, etc. Such techniques are currently grouped under the name of Artificial Intelligence.

In conclusion, I believe that the following, possibly ill-formed, compounds could be the enablers of a Web for the masses: artificial intelligence, as criticized by John Searle [3]; virtual reality, a paradoxical oxymoron; and linguistics, which has sometimes been little more than a matter of dogma and doctrine (see, for instance, the Introduction of [4]).

References

[1] Ph. Quéau (1996) Entretien [Interview], Le Monde Informatique, 664:24.
[2] C. Godéreaux, P.O. El Guedj, F. Revolta, & P. Nugues (1996) Un agent conversationnel pour naviguer dans les mondes virtuels [A conversational agent for navigating in virtual worlds], Humankybernetik, 1:39-51.
[3] J. Searle (1980) Minds, Brains, and Programs, Behavioral and Brain Sciences, 3:417-424.
[4] I. Mel'čuk (1988) Dependency Syntax: Theory and Practice, State University of New York Press.