Using Supercomputers to Provide Faster Access to Big Data Analyses

Massive Increase in Scope of Big Data Analyses

Big Data analysis in industry is steadily moving from the verification phase to implementation and commercial use, accompanied by a rapid expansion in the breadth and variety of data sets analyzed. More and more businesses are turning to terabyte-scale data analysis. But large-scale analyses on the order of tens or hundreds of millions of customers can be very slow, even on the most powerful servers.

Most businesses make do with a reduced data set extracted for analysis, but applications such as one-to-one marketing and plant error detection often need the entire data set and therefore require a faster analysis platform.

Huge-volume Data Tabulation Time Reduced from Seven Days to Five Hours

Fujitsu has developed a parallel processing system with more than 1,000 CPU cores, designed for tabulation-intensive analysis tasks such as the statistical processing and machine learning associated with the Big Data analysis and curation service.

The system employs HPC (high-performance computing) technology developed for the K computer, among others, and is delivered through dedicated Big Data HPC clusters at Fujitsu data centers.

With multiple computers linked by a super-fast network and tabulation tasks allocated efficiently in accordance with server availability, the system is able to execute terabyte-level data-heavy analysis tasks up to 30 times faster than current technology. Computations involving enormous data sets that currently take around seven days can now be completed in as little as five hours.

Processing time for data-heavy analysis tasks
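
The idea of allocating tabulation tasks according to server availability can be illustrated with a minimal sketch. The article does not describe Fujitsu's actual scheduling method, so the code below is only an assumption: a simple greedy strategy that hands each task to whichever server becomes free first. The function name allocate_tasks and its parameters are hypothetical and chosen for illustration.

```python
import heapq


def allocate_tasks(task_costs, num_servers):
    """Greedy sketch: give each task to the server that is free earliest.

    task_costs  -- estimated run time of each task (hypothetical input)
    num_servers -- number of servers available (hypothetical input)
    Returns (assignments, makespan), where assignments[i] lists the task
    indices given to server i and makespan is the overall finish time.
    """
    # Min-heap of (time at which the server is next free, server index).
    servers = [(0.0, i) for i in range(num_servers)]
    heapq.heapify(servers)
    assignments = [[] for _ in range(num_servers)]

    # Placing the longest tasks first tends to balance the load better.
    ordered = sorted(enumerate(task_costs), key=lambda t: -t[1])
    for task_id, cost in ordered:
        free_at, idx = heapq.heappop(servers)  # earliest-available server
        assignments[idx].append(task_id)
        heapq.heappush(servers, (free_at + cost, idx))

    makespan = max(free_at for free_at, _ in servers)
    return assignments, makespan


if __name__ == "__main__":
    # Eight equal-sized tabulation tasks spread over four servers.
    plan, finish = allocate_tasks([1.0] * 8, 4)
    print(plan, finish)
```

In this toy model, eight equal tasks on four servers finish in the time of two tasks rather than eight. A real cluster would also need to account for data movement over the network, which this sketch ignores.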

The system can be used to perform Big Data analyses and produce valuable predictions and insights across a range of industries. Examples include: forecasting health risks for the entire Japanese population of 127 million people, generating new marketing indices, predicting membership cancellations, developing sales and inventory forecasts, identifying loyal customers and predicting trends in inbound call center traffic.

Fujitsu will use the new data curation service to boost the prediction accuracy of supplied formulae by running more analysis repetitions on larger data sets.