Using AI to Summarize News Stories to Speed Up Internet News Delivery

Accurately Summarizing Articles Requires a Human Eye

Newspapers are part of our daily lives. In recent years, newspaper companies have come to distribute information in forms beyond traditional paper media, such as Internet news articles and social media posts. Reporters cover a wide variety of topics, including politics, economics, international issues, education, culture, and sports. Because readers now expect news to arrive quickly, reporters are continually working to deliver information faster.

The process of distributing news to non-paper media involves five main steps: deciding which articles to deliver, sending them to a media editing system, summarizing the articles, creating headlines, and proofreading (*). The most time-consuming step is summarizing the articles. Because each medium has its own limit on the number of words or characters that can be used, staff must read each article and extract its important sentences by hand, which takes time and effort.

The conventional automated summarization method is called the "lead method." This method extracts sentences starting from the top of the target article until it reaches the word or character count limit, so important sentences that appear in the middle or at the end of the article can be left out. Summarizing articles quickly and automatically while still capturing the important parts is a task well suited to AI.
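To make the limitation of the lead method concrete, here is a minimal Python sketch of it. The function name `lead_summary` and the choice to count characters (rather than words) and to ignore separators between sentences are assumptions made for illustration; the article does not describe any particular implementation.

```python
def lead_summary(sentences, max_chars):
    """Lead method sketch: take sentences from the top of the article
    until adding the next one would exceed the character limit.

    `sentences` is the article already split into sentences, in order.
    Separators between sentences are not counted (an assumption).
    """
    summary = []
    used = 0
    for sentence in sentences:
        if used + len(sentence) > max_chars:
            break  # stop at the first sentence that does not fit
        summary.append(sentence)
        used += len(sentence)
    return summary
```

Because the loop stops at the first sentence that does not fit, anything important in the middle or at the end of the article never reaches the summary, which is the weakness described above.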

  • *: An example of the process of the Shinano Mainichi Shimbun's news distribution service for cable TV

Achieving Accuracy Equivalent to Manual Work by Combining Fujitsu's Unique Technology with AI

Recently, the Shinano Mainichi Shimbun and Fujitsu Laboratories conducted a field trial of an AI-based automated summarization technology using articles distributed by the Shinano Mainichi Shimbun.

From among the various types of article distribution services, this field trial placed particular focus on summarizing text-based news for distribution to cable TV. Fujitsu created a model by applying natural language processing technology and machine learning technology developed by Fujitsu Laboratories to approximately 2,500 sets of past articles from the Shinano Mainichi Shimbun and their manually compiled summaries for use in the distribution service.

First, indicators of importance were defined: "sentences including important words," "sentences close to the article's beginning or end," and "similar sentences that appear repeatedly" were regarded as important. Extra points were given to sentences beginning with phrases such as "That is to say" or "In other words," while points were subtracted for sentences beginning with phrases such as "For example." By applying machine learning to these indicators, Fujitsu created an important sentence extraction model that evaluates the importance of content at the level of individual sentences. This makes it possible to extract important elements from the middle or end of an article as well, and to generate a highly accurate summary equivalent to one prepared manually.
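The indicators above can be sketched as a simple sentence scorer in Python. This is an illustration only, not Fujitsu's model: the weights here are fixed by hand, whereas the article says they were obtained through machine learning on past articles and summaries. The function names, the Jaccard word overlap used for "similar sentences," the position formula, and the greedy selection step are all assumptions.

```python
import re

BONUS_CUES = ("That is to say", "In other words")   # phrases from the article
PENALTY_CUES = ("For example",)

def tokenize(sentence):
    """Lowercased set of alphanumeric words (a simplifying assumption)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def jaccard(a, b):
    """Word-set overlap between two sentences, 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score_sentences(sentences, important_words, weights=None):
    """Score each sentence on the four indicators; weights are hand-set here."""
    w = weights or {"keyword": 1.0, "position": 1.0, "repeat": 1.0, "cue": 1.0}
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    scores = []
    for i, sentence in enumerate(sentences):
        keyword = len(tokens[i] & important_words)
        # 1.0 for the first or last sentence, falling to 0.0 in the middle
        position = 1.0 if n == 1 else abs(2 * i / (n - 1) - 1)
        # highest overlap with any other sentence in the article
        repeat = max((jaccard(tokens[i], tokens[j])
                      for j in range(n) if j != i), default=0.0)
        if sentence.startswith(BONUS_CUES):
            cue = 1.0
        elif sentence.startswith(PENALTY_CUES):
            cue = -1.0
        else:
            cue = 0.0
        scores.append(w["keyword"] * keyword + w["position"] * position
                      + w["repeat"] * repeat + w["cue"] * cue)
    return scores

def extract_summary(sentences, important_words, max_chars):
    """Greedily keep the highest-scoring sentences that fit the limit,
    then return them in their original article order."""
    scores = score_sentences(sentences, important_words)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if used + len(sentences[i]) <= max_chars:
            chosen.append(i)
            used += len(sentences[i])
    return [sentences[i] for i in sorted(chosen)]
```

Unlike a top-down cutoff, this selection can pick a high-scoring sentence from the end of the article even when the character limit leaves room for only one sentence. `important_words` is expected to be a set of lowercase words.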

Example of a summary generated automatically by extracting important sentences from other parts of the article in addition to the top

Instantly Completing a Task that Used to Take 3 to 5 Minutes

During the trial, article summarization that used to take 3 to 5 minutes was completed instantaneously. This summarization system is expected to reduce the time required for the entire process by about half.

This automatic summarization system enables automatic summary models to be applied to various media, including text-based news distribution for cable TV, news tickers on electronic billboards, and social media. Going forward, Fujitsu aims to make this technology available as a general-purpose AI function through an API offered on the Fujitsu Cloud Service K5 Zinrai Platform Service for diverse industries, including municipalities, manufacturing, transportation, medicine, and media.