Marcus Hartmann is Partner at PwC Germany and Chief Data Officer for PwC Germany and Europe.
The right use of data opens up enormous economic opportunities and is the key to digital transformation. In the future, cross-industry data spaces such as Gaia-X, the major program for European data sovereignty, will make it possible to securely share data and work with it in a closed environment.
The aim is a data economy in which the constantly growing volumes of data are used more efficiently to support data-driven decisions, optimize processes and enable new business models. The estimated value creation potential is in the hundreds of billions.
Organizations can tap this potential as they make greater use of their data. Modern data protection technologies are a prerequisite, however, because intensive use of customer and business data is only possible if data protection and data sovereignty are secured.
The original data is not changed, but reproduced
The protection of personal data is essential and legally guaranteed in Germany and Europe. There are also signs of stricter data protection laws internationally, which are increasingly based on the European standard. Personal information can no longer be used or shared without restrictions, even if it can be extremely valuable for companies and society.
One way to resolve this dilemma is to anonymize the data by means of synthetic data. In the past, attempts were often made to resolve data protection concerns through legal means. Synthetic data, by contrast, allows compliance with all legal requirements while still delivering a high level of benefit from the resulting data sets.
Synthetic data consists of machine-generated data sets. To create them, an artificial intelligence (AI) first learns the structure of the original data and then, in a second step, generates completely new data from what it has learned.
Depending on the purpose, this can be tables, text, images or videos. Accordingly, the original data is not changed, but new data is generated that was modeled on it.
In practice, this newly generated data behaves almost like real data. It has comparable statistical characteristics and allows complex questions to be answered without permitting any conclusions about the individuals represented in the original data.
Its use has proven itself in a test study: in 92 percent of business processes, the same decisions were made on the basis of synthetic data as with the original data.
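The two steps described above, learning the structure of the original data and then generating new records from it, can be illustrated with a deliberately simple sketch. It is not how commercial tools work: they use far more capable generative models. Here a two-column Gaussian model stands in for the AI, and the records, column meanings and function names are all made up for illustration.

```python
import math
import random
import statistics


def learn_structure(rows):
    """Step 1: summarize two numeric columns by mean, spread and correlation."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Pearson correlation, computed by hand from the sample covariance
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def generate(model, n, seed=0):
    """Step 2: draw entirely new rows from the learned distribution."""
    rng = random.Random(seed)
    rho = model["rho"]
    residual = math.sqrt(max(0.0, 1.0 - rho ** 2))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mx"] + model["sx"] * z1
        # Mixing z1 into y reproduces the learned correlation between columns
        y = model["my"] + model["sy"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Made-up "original" records: (age, annual income)
original = [(23, 31000), (35, 48000), (41, 52000), (29, 39000),
            (52, 61000), (47, 58000), (38, 45000), (60, 66000)]

model = learn_structure(original)
synthetic = generate(model, 5000)
synthetic_model = learn_structure(synthetic)
```

Comparing `model` with `synthetic_model` shows the point made in the text: the synthetic rows have similar means and a similar correlation, yet none of them is a copy of an original record.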
Synthetic data is also suitable for privacy-intensive industries
Commercial providers of solutions for generating synthetic data guarantee data protection by extensively testing the newly created data for possible re-identification of personal information.
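The article does not say which tests providers run. One simple check of this kind, shown here only as an assumed example, measures how close each synthetic record lies to its nearest original record and flags any that are suspiciously near. The function names, sample values and threshold below are hypothetical.

```python
import math


def distance_to_closest_record(synthetic_row, original_rows):
    """Smallest Euclidean distance from one synthetic row to any original row."""
    return min(math.dist(synthetic_row, o) for o in original_rows)


def flag_risky_rows(synthetic_rows, original_rows, threshold):
    """Return synthetic rows that sit closer than `threshold` to a real record."""
    return [s for s in synthetic_rows
            if distance_to_closest_record(s, original_rows) < threshold]


# Hypothetical records, already scaled to a common 0..1 range
original = [(0.10, 0.20), (0.45, 0.50), (0.80, 0.90)]
synthetic = [(0.12, 0.60), (0.45, 0.50), (0.70, 0.30)]

# The second synthetic row is an exact copy of an original and gets flagged
risky = flag_risky_rows(synthetic, original, threshold=0.05)
```

A flagged row would be dropped or regenerated before the data set is released. Real re-identification tests are considerably more thorough than this distance check.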
Synthetic data is therefore ideal for examining business transactions in a sensitive environment, such as when health data is involved. Because it consistently protects patients' privacy, it enables medical and pharmaceutical research to carry out in-depth analyses for the design of new therapies and drugs.
Synthetic transaction data is also of great help in the strictly secured IT infrastructure of banks and insurance companies. It provides valuable insights into behavioral patterns and helps, for example, to prevent fraud in the financial sector.
The fields of application for synthesizing technologies are diverse. Value potential lies, among other things, in optimizing marketing processes, in cooperating with external consultants or agencies, and in developing and improving a company's own products.
This technology will also make an important contribution to the development of shared data spaces and the use of artificial intelligence. By reliably protecting sensitive data and business secrets, synthetic data is an important key to resolving information asymmetries and establishing mutual trust in the market.
We're still in the early stages, but according to Gartner, synthetic data is expected to completely overshadow real data in AI models by 2030.