New Value Generated by Big Data (Part 1)

Today, as a result of the widespread use of the Internet and the evolution of technologies such as sensor networks, several hundred terabytes of data (1 terabyte is approximately 1 trillion bytes) are being generated worldwide. By 2020, the amount of data is expected to grow to 40 zettabytes (1 zettabyte is 1 billion terabytes) - a figure that is extremely difficult for the human mind to even comprehend (Source: the Digital Universe study conducted by IDC, a U.S.-based research company, and commissioned by EMC).

Commonly known as Big Data, such data is attracting global attention as a potential resource that may generate new value when it is properly analyzed. So how do we analyze massive amounts of data to achieve groundbreaking innovation? We asked Mika Kawai and Isamu Watanabe, both leading figures in the Big Data field, to discuss the current state of Big Data and its prospects for the future. Ms. Kawai is Director of the Fujitsu Big Data Initiative Center, and Mr. Watanabe is a member of the Big Data Initiative Center and General Manager of the Fujitsu Laboratories Second Solution Research Division.

Could you explain the concept of Big Data?

All over the world today, there are high expectations that the utilization of data will bring about innovation in business. This is partly due to the emergence of companies like Google and Amazon, which deal with massive amounts of data. In fact, in the era of the Internet of Things (IoT) where everything is connected through the Internet to enable monitoring and control, people and companies will use data in a wider social context than ever before.

Along with the data that society generates daily, companies have also accumulated large amounts of data over many years. These stores of data may or may not be big enough to be called "big" data, and there are established techniques such as business analytics (BA) and business intelligence (BI) for handling them. Even today, however, companies leave most of this data unused; in many cases it has no chance of ever being used for business management. There is a public demand to review this wealth of data held by society and by companies, and to put it to more effective use. That is what we mean when we talk about the "utilization of Big Data."

During business negotiations, we still often meet customers who tell us they have no such "big" data. They put too much emphasis on the "big" part, and conclude that they have no data on a scale comparable to Google or Amazon.

At any rate, regardless of how "big" their data is, the fact is that company-owned data is often left completely unused. In other words, companies are sitting on a wealth of unused data, and the business owners aren't even aware of its value.

When we visited one of our customers, they presented us with a single data set as the only data available at the company. At first glance, the data itself didn't seem to be of much use. However, by focusing on the information from which the data had originally been derived, and by analyzing the data in combination with that information, we were able to turn it into a valuable source of innovation. The value of data varies greatly depending on who analyzes it and from what perspective. In fact, the customer was the one most surprised by the results of our analysis. The most important thing is what we intend to use the data for.

What is "Big Data?" It depends on the person, really. Reviewing any existing data that is left unused in your own company is probably the best way to start utilizing data (Watanabe).

How does Big Data analysis lead to the creation of new markets?

In the not-so-distant future, we will see the advent of a Big Data society in the true sense of the term, because the Internet of Things (IoT) will soon spread throughout our entire society at an unprecedented pace. As of 2013, some 10 billion devices were connected to the Internet, and that number is predicted to exceed 50 billion by 2020.

Even today, when we shop online, we see books and other products recommended automatically before we even start searching. This is possible because information about who bought what, when, and at what price is stored as data. Gathering such data from hundreds of thousands or even millions of people changes the meaning of the data entirely. Analyzing the trends for a product by generation, gender, and occupation makes it possible to predict what products individual customers will buy next. In other words, analyzing massive amounts of group data enables us to reach individual customers.
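To make this concrete, here is a minimal sketch in Python of the kind of segment-level analysis described above. The purchase log, its field names, and the recommend() helper are all hypothetical and purely illustrative; they are not an actual system.

    import pandas as pd

    # Hypothetical purchase log: who bought what (all field names are illustrative).
    purchases = pd.DataFrame([
        {"age_group": "30s", "gender": "F", "occupation": "office", "product": "coffee beans"},
        {"age_group": "30s", "gender": "F", "occupation": "office", "product": "coffee beans"},
        {"age_group": "30s", "gender": "F", "occupation": "office", "product": "coffee maker"},
        {"age_group": "50s", "gender": "M", "occupation": "teacher", "product": "tea set"},
    ])

    # Group-level trend: the most frequently bought product per demographic segment.
    segment_trends = (
        purchases.groupby(["age_group", "gender", "occupation"])["product"]
        .agg(lambda s: s.value_counts().idxmax())
    )

    # "Reaching individual customers": look up the trend for a customer's segment.
    def recommend(age_group, gender, occupation):
        try:
            return segment_trends.loc[(age_group, gender, occupation)]
        except KeyError:
            return None  # no group data for this segment yet

    print(recommend("30s", "F", "office"))  # -> coffee beans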

However, the most important thing about predicting demand is whether the predictions actually increase sales and enhance business performance. There have always been attempts to predict demand from data, but predictions about the future don't always give clear results, and we can't deny that they sometimes fail. The best we can do is to raise the probability of getting them right.

Demand prediction is closely tied to decision-making about order placement and production. That's why, up to now, demand predictions have usually been made by on-site experts with deep knowledge of their field of specialty, drawing on the experience and intuition they have developed over many years. What matters, in other words, is skilled experts and their experience.

The question is, how can ordinary people without much experience perform this work at a level close to that of the experts? This is why companies in a variety of areas are trying to substitute data for expert knowledge and skills built up over the long term.

There's no way that this kind of experience and intuition could ever be replaced by numbers. But if we can use data to visualize it in a way that anyone can understand, we'll be able to use Big Data as a backup for that experience and intuition - for example, in predicting demand or in forecasting the probability of breakdowns in industrial products. And that will increase the accuracy of business predictions.

Allow me to give you an example. Strength tests are carried out on tunnels and other railroad facilities all over Japan, and ultimately these tests depend on engineers working by hand. They strike the rails with tools such as hammers and listen to the sound to gauge how strong and durable the rails are. An initiative is now underway to gather and analyze audio and visual data from this work, and to predict the strength and durability of tunnels from that data. If we can increase the quantity and quality of the data gathered, we may be able to improve both the efficiency and the accuracy of the engineers' work in the future.

In the past, data was gathered through sample surveys. Unfortunately, the statistics obtained from the collected data were limited to averages and other representative values. Big Data, in contrast, is gathered from all workers, so it becomes possible to calculate not just averages but also abnormal values, trends in parts failures, and even figures for individual engineers' performance. One of the reasons Big Data is now being used in business is the rapid advancement of ICT and other technologies for collecting and storing data (Kawai).
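As an illustration of the difference Ms. Kawai describes - sample averages versus data gathered from everyone - here is a minimal Python sketch. The readings, the engineers, and the two-standard-deviation threshold are all made up for the example; the point is only that once every measurement is recorded, abnormal values and per-engineer figures can be computed alongside the average.

    import statistics

    # Hypothetical hammer-test readings (a strength score per strike),
    # recorded for every engineer rather than for a small sample.
    readings = {
        "engineer_A": [9.8, 10.1, 9.9, 10.0, 10.2],
        "engineer_B": [9.7, 10.0, 3.1, 9.9, 10.1],  # 3.1 would vanish inside a plain average
    }

    all_values = [v for values in readings.values() for v in values]
    mean = statistics.mean(all_values)
    stdev = statistics.stdev(all_values)

    # With the full data we can report more than the average:
    print(f"overall average: {mean:.2f}")

    # 1) abnormal values (here: more than 2 standard deviations from the mean)
    abnormal = [v for v in all_values if abs(v - mean) > 2 * stdev]
    print("abnormal readings:", abnormal)

    # 2) per-engineer figures
    for name, values in readings.items():
        print(name, "average:", round(statistics.mean(values), 2))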

Why is Big Data attracting so much global attention?

To understand the nature of Big Data, you need to understand the three-V concept - Variety, Volume, and Velocity.

"Variety" means, literally, combining a variety of multiple pieces of data. Let's say you're predicting the sales of a specific product being sold at a supermarket. The accuracy of your prediction varies greatly depending on how the prediction is being made. Is it just based on past sales information? Or are you combining and analyzing a large variety of data available at the current moment. This might include what kind of fliers the competing stores are giving out in the neighborhood, as well as the demographics of the target area.

The word "volume" basically means physical capacity or amount. However, here it refers to the amount of data or information. The capacity of hardware (memory media) that stores data has increased dramatically over the past few years, while the hardware itself became available at lower prices; as a result, the amount of information that can be stored in hardware has increased enormously.

"Velocity" refers to the speed and frequency at which data is gathered. For example, weather forecasts are based on observation data like daily temperatures, hours of sunshine, and rainfall. This data used to be gathered every ten minutes at average intervals of 20 kilometers. However, the most advanced high-resolution radars have made it possible to observe data every minute at average intervals of 250 meters. The amount of information gathered for each 250-meter square is about 10,000 times the amount of information gathered for each 20-kilometer square. As well as that, the frequency has also increased tenfold, from 10 minutes to one minute. So, the two sets of data differ by about 100,000 times. These increases in the speed and frequency you can gather data means it's possible to forecast extraordinary weather events of short duration in local areas, such as torrential downpours in urban areas.

This three-V concept makes it clear that Big Data doesn't simply refer to the size of the data. Some people complain that Big Data technologies are making traditional expert skills obsolete. In reality, though, it's the on-site experts who best understand the value of visualizing the kinds of experience and intuition that could never be visualized before. The amount of information available in society just keeps on increasing, and the real opportunity of Big Data lies in analyzing those massive amounts of data to optimize things more accurately than ever before (Watanabe).

So far we've covered the basic concept of Big Data. In the second part, we'll ask our two interviewees about the issues that need to be resolved in order to use Big Data for social innovation, as well as Fujitsu's own initiatives for Big Data utilization.

New Value Generated by Big Data (Part 2)