The Personal Information Protection Act Amended for the First Time in 12 Years to Promote the Use of Big Data through De-Identification of Personal Information

Improving the Quality of New Products and Services Using Personal Information as Big Data

Personal information is part of everyday life. The term means information about a living individual that can identify a specific person by name, date of birth, or other description. The Act on the Protection of Personal Information (Personal Information Protection Act) was promulgated in 2003 and fully implemented in 2005, when it attracted wide attention.

The Personal Information Protection Act has now been revised for the first time in 12 years, and the Amended Act on the Protection of Personal Information is expected to go into effect on May 30, 2017. The amendment centers on two points: "clarifying the definition of personal information" and "promoting the use of big data."

Of particular note is that it will become permissible to provide third parties with personal information that has been de-identified, even without the individual's consent. In marketing, for example, research institutions and advertising or event management companies could use passenger travel data collected by public transportation systems to run events safely and efficiently while taking accidents and disasters into account. In this way, under the Amended Act on the Protection of Personal Information, different organizations can work together to improve the quality of new products and services more flexibly.

Providers of personal data must address the risks associated with de-identification processing, for example by checking whether the data meets the guidelines for their industry and whether an individual could still be identified from the de-identified data. It is not easy, however, for data providers to evaluate the risk that an individual could be identified from de-identified data. Evaluation and confirmation have therefore been left to manual work by experts, and the time this takes has become an issue. There are reports, for example, of cases where it took more than six months to prepare de-identified data held by healthcare institutions outside Japan for use in medical research.

Risk evaluation is difficult because, even if names are deleted, an individual could still be identified by combining other attributes, such as height and age (Figure 1). In addition, the number of attribute combinations grows rapidly with the number of attributes, so calculating all of them takes a huge amount of time and it has been difficult to search for identifying combinations quickly.

(Figure 1) Identifying individuals through combinations of multiple attributes
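
As a rough illustration of the point made in Figure 1, the following Python sketch counts how many records become unique under each combination of attributes. The records, the attribute names, and the function names are hypothetical and chosen only for illustration; this is not Fujitsu Laboratories' implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records with names already removed.
records = [
    {"age": 34, "height_cm": 170, "occupation": "nurse"},
    {"age": 34, "height_cm": 182, "occupation": "teacher"},
    {"age": 51, "height_cm": 170, "occupation": "teacher"},
    {"age": 51, "height_cm": 182, "occupation": "nurse"},
]


def unique_record_count(records, attrs):
    """Count records whose values for `attrs` occur exactly once in the data."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return sum(1 for c in counts.values() if c == 1)


def uniqueness_by_combination(records):
    """Map every non-empty attribute combination to its unique-record count."""
    attributes = sorted(records[0])
    result = {}
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            result[attrs] = unique_record_count(records, attrs)
    return result


uniqueness = uniqueness_by_combination(records)
for attrs, n_unique in uniqueness.items():
    print(attrs, n_unique)
```

In this toy data no single attribute singles anyone out, yet every pair of attributes identifies all four records. The sketch also shows the cost problem: with n attributes there are 2^n - 1 combinations to check, so this brute-force approach does not scale.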

For these reasons, Fujitsu Laboratories concluded that, in order to evaluate the risk quickly and take countermeasures, it is important to analyze which attributes (such as gender, telephone number, and address) make it easiest to identify an individual, and then apply appropriate de-identification methods.

Industry's First Technology that Reduces the Time for Identifying Individuals by Eliminating a Huge Amount of Attribute Combination Calculation

Fujitsu Laboratories has developed the industry's first technologies that, based on the distribution of the data, automatically search for the combinations of attributes that make it easiest to identify individuals and quantify that ease of identification, all within a realistic timeframe (Figure 2).

The first technology analyzes privacy risks efficiently by extracting and prioritizing, from all the possible combinations of attributes, only those that need to be assessed. For example, a record that can be identified from just "age and occupation" can naturally also be identified from "age, occupation, and permanent address," so analysis of the latter combination can be omitted. This eliminates the need to calculate huge numbers of attribute combinations. The second technology searches the data for the combinations of attributes that make it easiest to identify individuals and quantifies that ease of identification, so that it can be compared from one individual to another. This makes it possible to quickly see which attributes should be prioritized for de-identification.

(Figure 2) Newly Developed Technology
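
The pruning idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual Fujitsu Laboratories algorithm: the function names, the toy records, and the use of "smallest number of attributes needed" as a measure of ease of identification are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def minimal_identifying_sets(records, attributes):
    """Find, per record, the smallest attribute sets that single it out.

    A combination is not evaluated for a record once one of its subsets
    already identifies that record, and it is skipped entirely when that
    holds for every record.
    """
    found = {i: [] for i in range(len(records))}
    skipped = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Records not yet identified by any subset of this combination.
            pending = [
                i for i in range(len(records))
                if not any(set(m) <= set(attrs) for m in found[i])
            ]
            if not pending:
                skipped.append(attrs)  # no counting needed at all
                continue
            values = [tuple(r[a] for a in attrs) for r in records]
            counts = Counter(values)
            for i in pending:
                if counts[values[i]] == 1:
                    found[i].append(attrs)
    return found, skipped


def attributes_needed(found):
    """Assumed ease measure: fewer attributes needed means easier to identify."""
    return {
        i: (min(len(m) for m in sets) if sets else None)
        for i, sets in found.items()
    }


# Hypothetical toy data.
records = [
    {"age": 34, "occupation": "nurse", "address": "Kawasaki"},
    {"age": 34, "occupation": "teacher", "address": "Kawasaki"},
    {"age": 51, "occupation": "teacher", "address": "Kawasaki"},
    {"age": 51, "occupation": "teacher", "address": "Yokohama"},
]
attributes = ["age", "occupation", "address"]

found, skipped = minimal_identifying_sets(records, attributes)
needed = attributes_needed(found)
print(found)
print(skipped)
print(needed)
```

With this toy data, every record is already identified by one or two attributes, so the three-attribute combination is never counted. Records 0 and 3 need only a single attribute (occupation and address respectively), which marks those attributes as the first candidates for de-identification in this example.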

Based on these two technologies, Fujitsu Laboratories also developed technology to calculate the potential damages if data is leaked*, and to determine compliance with the various de-identification guidelines. With these technologies, users can evaluate a broad range of risks related to personal information and easily carry out appropriate de-identification processing based on those risks. This makes it possible to use data in new ways, such as providing de-identified personal data to third parties for advanced analysis, or integrating and analyzing data for purposes other than the original intent. The technology can contribute to faster and safer provision of de-identified personal information to third parties, not only in healthcare but also in finance, local government, and other fields.

*: Calculated based on an information value quantification model from the Japan Network Security Association (JNSA)

Utilizing Personal Information for Co-creation between Different Industries, such as Healthcare, Finance and Local Government

This technology makes data sharing safe and shortens data provision, which previously took a long time. In the future, this can be expected to improve the quality of services and products through co-creation between different industries. For example, by integrating de-identified public transportation passenger data with de-identified customer sales data from local shopping streets and analyzing them with digital marketing analytics, it becomes possible to offer the most suitable products and hold events tailored to the time of day and the area. Such co-creation activities promote the revitalization of local communities.

Fujitsu Laboratories plans to verify the technology's effectiveness in a real environment and put it into practical use around fiscal 2017.