Multimodal interaction


Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for input and output of data. For example, a multimodal question answering system employs multiple modalities (such as text and photo) at both the question (input) and answer (output) levels.

Multimodal human-computer interaction refers to the "interaction with the virtual and physical environment through natural modes of communication", i.e. the modes involving the five human senses. This implies that multimodal interaction enables freer and more natural communication, interfacing users with automated systems in both input and output. Specifically, multimodal systems can offer a flexible, efficient and usable environment in which users interact through input modalities such as speech, handwriting, hand gesture and gaze, and receive information through output modalities such as speech synthesis and smart graphics, suitably combined.

A multimodal system therefore has to recognize the inputs from the different modalities and combine them according to temporal and contextual constraints in order to interpret them. This process, known as multimodal fusion, has been the object of research from the 1990s to the present. The fused inputs are then interpreted by the system. Naturalness and flexibility can produce more than one interpretation for each modality (channel) and for their simultaneous use, and can consequently produce multimodal ambiguity, generally due to imprecision, noise or other similar factors. Several methods have been proposed for resolving such ambiguities. Finally, the system returns output to the user through the various modal channels (disaggregated), arranged into consistent feedback (fission).

The pervasive use of mobile devices, sensors and web technologies can offer adequate computational resources to manage the complexity implied by multimodal interaction. "Using cloud for involving shared computational resources in managing the complexity of multimodal interaction represents an opportunity. In fact, cloud computing allows delivering shared scalable, configurable computing resources that can be dynamically and automatically provisioned and released".
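To make the fusion step concrete, the following Python fragment is a minimal sketch of late fusion over two input channels, pairing recognized speech with pointing gestures under a temporal constraint. The ModalEvent type, the fuse function and the 1.5-second fusion window are illustrative assumptions for this sketch, not part of any standard multimodal toolkit.

    # Minimal sketch of late multimodal fusion: events from two channels
    # (speech and pointing gesture) are paired when their timestamps fall
    # within a fusion window, yielding a joint interpretation.
    # All names and the window size here are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class ModalEvent:
        modality: str     # e.g. "speech" or "gesture"
        payload: str      # recognized content, e.g. a phrase or a target id
        timestamp: float  # seconds since interaction start

    FUSION_WINDOW = 1.5   # hypothetical temporal constraint, in seconds

    def fuse(speech_events, gesture_events):
        """Pair each speech event with the nearest gesture event that
        occurred within the fusion window (the temporal constraint)."""
        fused = []
        for s in speech_events:
            candidates = [g for g in gesture_events
                          if abs(g.timestamp - s.timestamp) <= FUSION_WINDOW]
            if candidates:
                g = min(candidates, key=lambda g: abs(g.timestamp - s.timestamp))
                fused.append((s.payload, g.payload))  # joint interpretation
            else:
                fused.append((s.payload, None))       # unimodal fallback
        return fused

    # "Put that there": deictic words resolve to the pointed-at objects.
    speech = [ModalEvent("speech", "put that", 0.2),
              ModalEvent("speech", "there", 1.1)]
    gesture = [ModalEvent("gesture", "object:vase", 0.4),
               ModalEvent("gesture", "location:shelf", 1.3)]
    print(fuse(speech, gesture))
    # [('put that', 'object:vase'), ('there', 'location:shelf')]

Real fusion engines weigh contextual constraints and recognizer confidence scores as well as time; this sketch keeps only the temporal pairing to show the basic mechanism.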

Two major groups of multimodal interfaces have emerged, one concerned with alternate input methods and the other with combined input/output. The first group of interfaces combines various user input modes beyond the traditional keyboard and mouse, such as speech, pen, touch, manual gestures, gaze, and head and body movements. The most common such interface combines a visual modality (e.g. a display, keyboard and mouse) with a voice modality (speech recognition for input, speech synthesis and recorded audio for output). However, other modalities, such as pen-based input or haptic input/output, may be used. Multimodal user interfaces are a research area in human-computer interaction (HCI). A sketch of how alternate input modes can feed one application follows.
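One common design for the first group is an input-abstraction layer: events from distinct modalities are normalized into a single application-level command stream, so downstream logic is modality-agnostic. The bindings and recognizer output strings below are hypothetical examples, assumed only for this sketch.

    # Minimal sketch of an input-abstraction layer: inputs from several
    # modalities (mouse, keyboard, speech) map to the same canonical
    # commands. The bindings and input strings are illustrative.

    from typing import Callable

    def make_dispatcher() -> Callable[[str, str], None]:
        # Map (modality, modality-specific input) -> canonical command.
        bindings = {
            ("mouse", "click:save_button"): "save",
            ("key", "ctrl+s"): "save",
            ("speech", "save the document"): "save",
            ("speech", "open file"): "open",
        }

        def dispatch(modality: str, raw: str) -> None:
            command = bindings.get((modality, raw))
            if command:
                print(f"executing '{command}' (from {modality})")
            else:
                print(f"unrecognized {modality} input: {raw!r}")

        return dispatch

    dispatch = make_dispatcher()
    dispatch("speech", "save the document")  # executing 'save' (from speech)
    dispatch("key", "ctrl+s")                # executing 'save' (from key)

The point of the abstraction is that adding a new modality (e.g. a pen gesture recognizer) only extends the binding table; the application's command handling is unchanged.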

