Mitsubishi Electric Corporation (TOKYO:6503) announced today that the company has developed what it believes to be the world’s first technology capable of highly natural and intuitive interaction with humans based on a scene-aware capability to translate multimodal sensing information into natural language. The novel technology, Scene-Aware Interaction, incorporates Mitsubishi Electric’s proprietary Maisart® compact AI technology to analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language.
The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR. To prioritize these different categories of information, Mitsubishi Electric developed the Attentional Multimodal Fusion technology, which is capable of automatically weighting salient unimodal information to support appropriate word selections for describing scenes with accuracy. In benchmark testing using a common test set, the Attentional Multimodal Fusion technology used audio and visual information to achieve a Consensus-Based Image Description Evaluation (CIDEr) score that was 29 percentage points higher than in the case of using visual information only. Mitsubishi Electric’s combination of Attentional Multimodal Fusion with scene understanding technology and context-based natural language generation realizes a powerful end-to-end Scene-Aware Interaction system for highly intuitive interaction with users in diverse situations.
Scene-Aware Interaction for car navigation, one target application, will provide drivers with intuitive route guidance.
For the full text, please visit: www.MitsubishiElectric.com/news/