Self-aware Data – Securing Data across its Life Cycle

Increasingly costly data breaches in recent years have shown the importance of data protection and privacy in the age of the data economy. While organizations have moved faster to adapt to growing security demands and increasing data sharing, much still needs to be done. IBM's Cost of a Data Breach Report 2019 found that the global average cost of a data breach to an organization was US$3.92 million, a 12% increase over five years. The recent attack on the European Medicines Agency (EMA), in which hackers breached its systems and stole sensitive information about a COVID-19 vaccine, is just one of many examples of ever-increasing cyberthreats.

Where are the gaps?

The rising number of data breaches during the COVID-19 pandemic has highlighted the key ways in which organizations still fail to secure data, even after many advances in cybersecurity:

Organizations secure the transport layer in which data is transferred rather than securing data itself
The controls and policies lie within an organization’s IT estate rather than with the data owner
There is a lack of centralized visibility into data movement and assets across the organization
It takes too much time and effort to implement policy changes across the organization
Employee awareness of, and preparedness for, security is generally the weakest link in cyber defense; a majority of breaches can be traced back to human negligence

Moving toward self-aware data

This is precisely where self-aware data can help. Self-aware data is data that carries its own protection and can defend itself against intrusion. Each piece of self-aware data stays protected wherever it resides, continuously, throughout its lifespan, instead of relying on a secured communication tunnel, which is the common security method today. The approach democratizes data security: the data owner, rather than the IT estate, sets the policies that govern access to their data. It thus treats the root cause of data loss rather than its symptoms.
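To make the idea concrete, here is a minimal Python sketch, built on our own assumptions, of data that carries its owner's access policy with it. The `wrap` and `unwrap` functions and the `OWNER_KEY` value are hypothetical. An HMAC-SHA256 tag binds the policy to the payload so that tampering with either is detected, and the payload is released only to users the owner listed. The check is assumed to run in a trusted enforcement component that holds the key. The sketch does not encrypt the payload; a real design would also encrypt it (for example with AES-GCM) and release the decryption key only after the policy check passes.

```python
import hashlib
import hmac
import json

# Hypothetical owner key, for illustration only; a real system would use managed keys.
OWNER_KEY = b"demo-owner-key"


def _tag(payload: bytes, policy: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical policy JSON followed by the payload."""
    msg = json.dumps(policy, sort_keys=True).encode() + b"\x00" + payload
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def wrap(payload: bytes, owner: str, allowed_users: list, key: bytes) -> dict:
    """Bundle data with the owner's access policy so the policy travels with it."""
    policy = {"owner": owner, "allowed_users": sorted(allowed_users)}
    return {"payload": payload, "policy": policy, "tag": _tag(payload, policy, key)}


def unwrap(envelope: dict, user: str, key: bytes) -> bytes:
    """Release the payload only if the envelope is intact and the user satisfies the policy."""
    expected = _tag(envelope["payload"], envelope["policy"], key)
    if not hmac.compare_digest(expected, envelope["tag"]):
        raise ValueError("policy or payload has been tampered with")
    if user not in envelope["policy"]["allowed_users"]:
        raise PermissionError(f"{user} is not permitted by the owner's policy")
    return envelope["payload"]
```

For example, `unwrap(wrap(b"Q3 sales", "alice", ["alice", "bob"], OWNER_KEY), "bob", OWNER_KEY)` returns `b"Q3 sales"`, while a request from an unlisted user raises `PermissionError`, and quietly adding a user to the embedded policy invalidates the tag.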

Let’s take a closer look at how organizations can implement self-protecting, self-aware data:

Focus on data rather than the communication channel – The core focus should be on securing the data itself. Wrapping the data in a layer of security protocols lets users send it freely across any medium without worrying about data loss. The data owner sets these protocols, and only users who satisfy them can access the data.
The owner controls the data asset throughout its life cycle – Once the owner creates the data and establishes its access policies, that owner should have complete control of the data until it is deleted. Even if copies are made on other devices or stored in other locations, the owner should be able to control every copy with the same policies.
Seamless data movement and interoperability across platforms – Self-aware data needs to be operable across platforms, devices, applications, operating systems, cloud services, and data centers. It must be universally deployable and interoperable to provide real-world protection across today’s diverse environments.
Built-in log analysis – Organizations need to implement built-in log analysis across the data life cycle, from creation through storage to destruction. Self-aware data should be able to provide proof of possession, custody, and control, and report this information back to its owner for every copy or instance, wherever it resides.
Ability to upgrade policies on the fly – To keep pace with changing cybersecurity regulations, owners should be able to apply any new policy across all of their files at any time.
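Three of these properties (one policy governing every copy, built-in logging, and on-the-fly policy updates) can be sketched together in Python. In this illustrative design, which the article does not prescribe, each copy references its policy by ID in a hypothetical owner-controlled `PolicyRegistry` rather than embedding a fixed policy, so an owner's update takes effect for all copies at once. Every access attempt is appended to a hash-chained audit log, a common way to make records tamper-evident. The class names and the `sales-2019` policy ID are hypothetical.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class PolicyRegistry:
    """Hypothetical owner-controlled registry that every copy consults at access time."""
    policies: dict = field(default_factory=dict)   # policy_id -> set of allowed users
    audit_log: list = field(default_factory=list)  # hash-chained access records

    def set_policy(self, policy_id: str, allowed_users: set) -> None:
        # Replacing the entry changes enforcement for every copy that references it.
        self.policies[policy_id] = set(allowed_users)

    def _append_log(self, record: dict) -> None:
        # Each entry stores the previous entry's hash, chaining the log together.
        prev_hash = self.audit_log[-1]["hash"] if self.audit_log else "0" * 64
        body = {**record, "prev": prev_hash}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.audit_log.append({**body, "hash": digest})

    def check_access(self, copy_id: str, policy_id: str, user: str) -> bool:
        allowed = user in self.policies.get(policy_id, set())
        self._append_log({"copy": copy_id, "user": user, "allowed": allowed, "ts": time.time()})
        return allowed

    def verify_log(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = "0" * 64
        for entry in self.audit_log:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev_hash:
                return False
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


@dataclass
class DataCopy:
    """A copy of the data that carries only a reference to its owner's policy."""
    copy_id: str
    policy_id: str
    payload: bytes

    def read(self, registry: PolicyRegistry, user: str) -> bytes:
        if not registry.check_access(self.copy_id, self.policy_id, user):
            raise PermissionError(f"{user} denied on {self.copy_id}")
        return self.payload
```

With this design, revoking a user is a single `set_policy` call, and both a laptop copy and a cloud copy refuse that user on their next read. The trade-off is that each copy must reach the registry at access time; copies that work offline would need cached policies that expire.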

Future-proofing data

In a rapidly changing digital world, there is also a growing need to future-proof intelligent data. We therefore recommend the following actions to safeguard self-aware data against next-generation threats such as AI/ML-powered cyberattacks:

Implement geo-fencing and geo-location capabilities – Such policies can ensure that data stays within the regions where the organization operates, which is especially helpful given the rise of hacker groups operating from specific geographies.
Detect and safeguard related data pieces – Organizations should also ensure that the protection rules or protocols replicate themselves wherever the data, or any part of it, flows. For example, if the protocols allow only certain users to access an Excel sheet containing a sales table, the same protocols should be applied automatically whenever any row of that table is reused in another document or Excel file, ensuring end-to-end data safety.
Make data foolproof against augmented-intelligence techniques – Data masking and synthetic-data generation with Generative Adversarial Networks (GANs) have been a boon for training AI/ML models. Self-aware data that is masked, or used to generate new synthetic data, should be able to recognize its parent file and apply the same set of protocols to the newly created files.
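These future-proofing ideas can be illustrated with one small, hypothetical Python model. Each row of a table carries its own `Policy` (an allowed-user set plus a geo-fence of allowed regions), a document assembled from rows inherits the intersection of their policies, and masking a row keeps its policy and source file attached. The region codes, the class and function names, and the assumption that a trustworthy location signal is available at read time are ours. Note that this sketch propagates policies at the moment data is derived; recognizing a parent file from already-synthesized data after the fact, as envisioned above, is a harder problem it does not attempt.

```python
from dataclasses import dataclass, field, replace
from functools import reduce


@dataclass(frozen=True)
class Policy:
    allowed_users: frozenset
    allowed_regions: frozenset  # geo-fence, e.g. country codes (assumed)

    def permits(self, user: str, region: str) -> bool:
        return user in self.allowed_users and region in self.allowed_regions

    def combine(self, other: "Policy") -> "Policy":
        # Data derived from several sources must satisfy all of their policies.
        return Policy(self.allowed_users & other.allowed_users,
                      self.allowed_regions & other.allowed_regions)


@dataclass(frozen=True)
class Row:
    values: tuple
    policy: Policy
    source: str  # id of the parent file this row came from


def mask_row(row: Row) -> Row:
    """Mask every value but keep the parent's policy and source attached."""
    return replace(row, values=tuple("***" for _ in row.values))


@dataclass
class Document:
    name: str
    rows: list = field(default_factory=list)

    def policy(self) -> Policy:
        if not self.rows:
            raise ValueError("an empty document has no inherited policy")
        return reduce(Policy.combine, (r.policy for r in self.rows))

    def read(self, user: str, region: str) -> list:
        # The region is assumed to come from a trusted location signal (hypothetical).
        if not self.policy().permits(user, region):
            raise PermissionError(f"{user}@{region} may not read {self.name}")
        return [r.values for r in self.rows]
```

Because a document's policy is recomputed from its rows, pasting one protected sales row into a new report is enough to carry the sales table's user list and geo-fence along with it.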

When combined with a zero-trust architecture, self-aware data can act as robust armor for an organization's valuable data assets. Some startups have already begun building tools and solutions that enable self-aware data, in the hope of making data breaches irrelevant.
