Technology that Identifies at a Glance the Place Being Explained within Presentation Materials

It is difficult to identify the place being explained within presentation materials from a remote location

Speakers often explain materials displayed on a screen during company meetings and other presentations. Increasingly, participants in remote locations exchange opinions based on materials shared over a network. In these settings, speakers need to communicate so that listeners can understand the materials quickly, clearly, and easily.

Pointing methods, such as a mouse cursor, can indicate “the area in the presentation material being discussed” or “the part to be emphasized by the speaker.” However, a mouse cursor is small and difficult to see, and remote listeners have trouble following where a presenter points by hand. A conventional alternative is to “extract words with a high frequency of use from text being read aloud.” However, it is difficult to identify at a glance the place being explained from only a few spoken words. In addition, current speech-recognition technologies have a misrecognition rate of up to 10%, which cannot be avoided.

Real-time speech-recognition technology

To improve the efficiency of work-related communications, Fujitsu Laboratories has been developing a system that supports communication involving text materials. The system uses speech-recognition technology to recognize in real time what the speaker is saying and to provide the appropriate information. Fujitsu has developed technology that compares what a speaker is saying with text materials shared in videoconferences and accurately detects, in real time, the place in the materials being explained.

A challenging aspect of speech recognition is that many short words are similar in pronunciation and tone, which increases the likelihood of recognition errors. Fujitsu solved this problem by combining short words with the words immediately adjacent to them and storing each combination in the speech-recognition dictionary as a single word. Doing so has reduced recognition errors by roughly 60% compared to previous technologies*.

The technology also exploits a characteristic of presentations: once a place in the materials is more than a certain distance from the point currently being explained, the frequency with which the speaker moves to that place next drops precipitously. Using this characteristic, the technology filters the candidates for the next part of the presentation and can accurately infer which part corresponds to the spoken presentation, even when only a few spoken words are recognized.

*: Compared to previous technologies of Fujitsu Laboratories
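
The idea of registering short words together with their neighbors can be illustrated with a minimal Python sketch. It is not Fujitsu's implementation: the function name `build_dictionary_entries`, the character-length threshold `max_short_len`, and the choice to merge each short word with the word that follows it are all assumptions made for illustration, since the article does not state how short words or their neighbors are selected.

```python
def build_dictionary_entries(tokens, max_short_len=3):
    """Merge each short token with the token after it into one entry.

    Hypothetical rule: a token is "short" if it has at most
    max_short_len characters. The article does not give the real rule.
    """
    entries = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        has_next = i + 1 < len(tokens)
        if len(token) <= max_short_len and has_next:
            # Store the short word and its neighbor as a single entry.
            entries.append(token + " " + tokens[i + 1])
            i += 2
        else:
            entries.append(token)
            i += 1
    return entries


words = ["to", "improve", "the", "efficiency", "of", "work"]
print(build_dictionary_entries(words))
# prints ['to improve', 'the efficiency', 'of work']
```

Longer entries such as "to improve" are less likely to be confused with one another than the isolated words "to" or "of", which is the intuition behind the approach described above.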

How characteristics of presentation sequence and word frequency are used to infer spot in presentation
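
The filtering of candidates by distance can be sketched in Python as follows. This is an illustrative simplification under stated assumptions, not the method Fujitsu describes in detail: the function `infer_section`, the parameter `max_jump`, and the word-overlap score are hypothetical, and the hard cutoff stands in for the sharp drop in transition frequency beyond a certain distance.

```python
from collections import Counter


def infer_section(recognized_words, sections, current_index, max_jump=2):
    """Return the index of the nearby section that best matches the speech.

    Only sections within max_jump of current_index are candidates
    (an assumed hard cutoff). The score is the number of recognized
    words that also appear in the section text.
    """
    spoken = Counter(w.lower() for w in recognized_words)
    best_index, best_score = current_index, 0
    lo = max(0, current_index - max_jump)
    hi = min(len(sections), current_index + max_jump + 1)
    for idx in range(lo, hi):
        section_words = Counter(w.lower() for w in sections[idx].split())
        score = sum(min(n, section_words[w]) for w, n in spoken.items())
        if score > best_score:
            best_index, best_score = idx, score
    return best_index


sections = [
    "quarterly sales overview",
    "regional sales breakdown",
    "customer survey results",
    "hiring plan",
    "budget forecast and sales targets",
]
print(infer_section(["survey", "results"], sections, current_index=1))  # prints 2
print(infer_section(["budget", "sales"], sections, current_index=0))    # prints 0
```

In the second call, the last section matches both words but lies outside the candidate window, so the estimate stays at the current section. A real system would more plausibly weight candidates by transition frequency than exclude them outright.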

Useful communication-support system for remote conferences

Applying this technology, Fujitsu prototyped and evaluated an "automatic pointing system" that highlights the section of the materials corresponding to the spoken explanation, for use with shared slide materials in a videoconference. Using this technology has boosted the detection accuracy to 97%, up from the previous 70%**. When evaluated in comparison to existing pointing methods, such as using a mouse cursor, this technology was found to increase ease of understanding by 30% and cut irritating display issues in half, demonstrating its usefulness as a communication-support system for remote conferences.

Fujitsu Laboratories aims to put this technology to practical use in a remote communications-support system and is conducting verification tests with universities and other learning institutions for use in education. In addition, when combined with the sightline-detection technology and translation technology developed by Fujitsu, this technology has a broad range of potential applications to help businesses run more efficiently, such as supporting operators in call centers by providing information related to frequently asked questions, or providing information-desk support or educational support.

**: Measured with the system set to display the information to be emphasized within roughly two seconds of the start of an explanation.

The automatic pointing system being used in a remote conference