Protecting the Active Data – Strategic Insight!

By Rajesh Dangi, April 16, 2018

Growing concerns about data governance and the recent breaches call for insight into the most valuable asset of any organization: data.

Although data governance spans availability, usability, integrity and security of data as a whole, here we will touch upon the “active data” of the organization and dwell upon the key challenges of managing it, along with strategies for the risk management thereof. ILM (Information Lifecycle Management) is a well-known framework that explains active data and its role in overall digital asset management.

What is Active Data?

While I was talking to an old colleague and good friend in the BFSI industry, he mentioned that he is always concerned about the availability of customers’ active data, to ensure key services are always available; the show must go on. Needless to say, for BFSI the current active data for a customer is his balance and the last few transactions (say, a 90-day window). That is more than enough for the customer to operate his/her account, and it must be available all the time with accuracy (read: integrity) and kept secure (read: confidential). S/he might not be worried if the historical statements and records go offline for a while, as long as s/he has the ability to transact using the BFSI platform. The same applies to any other organization, say MFG (read: manufacturing): as long as you maintain a summary of balances and visibility into the last few transactions (the supply chain, current inventory, accounts, net current payables, etc.), one can continue to operate, and historical data can be made available via a service request, which will not be a showstopper for running the day’s business. So for BFSI the active data context will be aligned to its business needs and specific to that vertical, compared to MFG…

Thus we can outline “active data” as the dataset that is current, required to run the core transactions, frequently accessed as part of a business process or transaction workflow, and that has a well-defined tenure and scope. Once transactions cross this identified (yet limited) window, the dataset becomes less active and non-relevant for current transactions, though it may still be needed for retrieval or reconstruction of data. A key example of active data is your available balance when you connect to your bank via an ATM; the availability of that key data drives what you can do at the ATM, right?
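As a toy illustration of the 90-day window idea above (a sketch only; the field names and record structure are assumptions for the example, not any real banking schema):

```python
from datetime import datetime, timedelta

# Illustrative active-data window, per the BFSI example above.
ACTIVE_WINDOW = timedelta(days=90)

def split_active(transactions, now=None):
    """Partition transactions into active (within the window) and historical."""
    now = now or datetime.utcnow()
    active, historical = [], []
    for txn in transactions:
        if now - txn["date"] <= ACTIVE_WINDOW:
            active.append(txn)
        else:
            historical.append(txn)
    return active, historical

# Hypothetical sample records
txns = [
    {"id": 1, "date": datetime(2018, 4, 1), "amount": 500},   # recent
    {"id": 2, "date": datetime(2017, 6, 1), "amount": 120},   # historical
]
active, historical = split_active(txns, now=datetime(2018, 4, 16))
```

The active set stays on the fast, always-available tier; the historical set can be served on request from cheaper storage.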

What is applicable data governance for active data?

Before we go there, let us first understand the principles against which information/data security is measured, and those are..

  • Confidentiality
  • Integrity
  • Availability

When data is the business, it is essential to be able to access the correct (read: integrity) data quickly (read: availability) and securely (read: confidentiality). From a governance standpoint, one must know where and how the data is captured, transmitted, processed and stored; it must be made available as required and always managed securely, whether at rest (read: stored) or in motion. It is important to establish a set of controls around each phase, and the organization must be able to track, manage and demonstrate how the data is secured all along. This is what is mandated by multiple applicable statutes and regulations.
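As a small illustration of an integrity control, a checksum computed when data is written can be recomputed and compared on every read. This sketch uses Python’s standard hashlib; the record format is made up for the example:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest used as an integrity fingerprint."""
    return hashlib.sha256(data).hexdigest()

# Digest computed and stored when the record is written
record = b"account=12345;balance=9000.00"
stored_digest = sha256_of(record)

# Later, on read, recompute and compare to detect tampering or corruption
assert sha256_of(record) == stored_digest

tampered = b"account=12345;balance=90000.00"
assert sha256_of(tampered) != stored_digest
```

In practice such fingerprints would be stored separately from the data they protect, so an attacker cannot alter both together.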

Start with ISO 27K, which is widely applicable to most organizations for information/data security and dictates multiple control areas and objectives against which a certifying body examines or audits the organization’s readiness for risk identification and management. Apart from this, the other most common data protection compliance standards include SOX, PCI-DSS, FISMA/FedRAMP, HIPAA, SSAE 16, SOC 2 & 3 and OCC, which serve a wide range of common industries and verticals, all fundamentally sharing a common objective: to secure the information. Our CERT-In has been designated under Section 70B of the Information Technology (Amendment) Act, 2008 in the area of cyber security and information protection. (Refer to the National Cyber Security Policy 2013 for more details.)

As per the ISMS (read: information security management system), defining scope, policy and assessment is vital for securing access to identified confidential data, ensuring its availability and maintaining the integrity of the entire system. Given that risk identification and management is critical yet often not done as a focused activity, many organizations unknowingly risk data loss or data theft. A few key questions organizations should ask themselves are..

  • What does our ISMS framework tell us about “active data” protection? Does it clearly define what active data is, how it should be protected and which risks have been identified?
  • Do we have adequate policies and controls in place to secure our data across all stages of processing and storage?
  • Have we identified critical active data assets across the organization? Where does data processing take place? Where do we store this data and who all has the access?
  • Is there any third party involved in the transaction workflow with whom data is shared?
  • What mechanisms do we have in place to monitor access and incidents/events, and what preventive measures/remedial actions are in place to manage security incidents effectively?
  • Do the processing and storage platforms deliver high performance and use the latest technologies to ensure availability of the “active data”, guaranteeing quick and small transaction windows and thus minimizing the chances of data loss, session stickiness and exposure due to active external API connections?
  • What technologies and tools are deployed to drive the data protection strategy? How is effectiveness measured, and how are corrections made to address newly identified risks or technology changes?

On the governance front there are stringent laws, with associated penalties and potential lawsuits, in case neglect in protecting sensitive data (read: of customers, stakeholders or impacted parties), whether knowing or unknowing, is established. Our IT Act 2000 stipulates several sections to enforce data protection (refer Sections 11, 65, 66, 66A, 66E, 67C, 72, etc.) and the privacy thereof; thus it is no longer a choice but a mandate for sure.

Confused? Data protection is vast, and with multiple applications and modern technologies such as distributed computing, social media integration, mobile commerce, business intelligence and analytics coming our way, keeping track of active data assets is really becoming a challenge. So how can one still remain in control? Let me try and simplify this a bit..

What should be the focus areas for “Active Data” protection?

Active datasets are a crucial piece of the business and thus demand a robust strategy, analysis and good performance for backup/retrieval, storage/disks, access and processing capabilities. This can be achieved via the following key tenets..

  • Training & Awareness – Education, behavior & awareness: critical active dataset identification, classification, associated risk assessment, etc. Key to awareness is to know where data is processed, transmitted, stored or replicated, and whether there are any ways data can be copied without authorization..
  • Security framework & architecture – North to south and east to west: risk management (prevention, mitigation, etc.), securing both physical and logical layers, encryption, isolation, etc. There are ample tools to provide these services, but making effective choices, implementing them, monitoring the alerts and continuously keeping the preventive measures updated remains a challenging task even today.
  • Access control, management and governance – Assessment cycles, auditability & authorization, PIM/PAM/IAM toolsets. Getting the right audit cycles, and extending the span of audits to external processors/vendors as third-party risk audits, are key differentiators. Simply outsourcing the internal audits and filling checklists is a common practice and will not yield positive results in the long run. The risks of data loss must be identified across all levels and functions, and most users need guidance as to what counts as a risk and how their awareness itself can reduce most risks. How many organizations have a risk register with more than 25-30 risks identified and accepted for remediation? Identified risks can be analyzed, avoided or prevented, reduced or accepted, and suitably responded to for effective remediation; this, in short, is risk management.
  • Threat detection, management and incident response – Business continuity & VAPT, identifying attack surfaces within the setup/environment, external threat identification, endpoint security, HIPS/NIPS, etc. This is itself another topic for deliberation. Active data, if compromised due to a lapse in security, has a bigger impact on the business, since there are liabilities and legal consequences if negligence is established. It is hard to believe that major businesses lost millions due to recent cyber attacks and risk realization (yeah, if risks are not identified and result in loss or impact to the business, it is called risk realization; unfortunately, most of us experience zero-day attacks as an element of surprise).
  • High-performance, secured and redundant infrastructure – Architecture & deployment of the security stack, backup and replication tools, and OLTP applications on private & hybrid clouds; legacy systems must be deployed in a redundant configuration, thus providing failsafe access and availability. Most emerging technologies, such as distributed databases and open source systems, have tackled high availability as part of their fundamental design. High-availability databases are built to eliminate single points of failure and are optimized to ensure that the end user does not experience an interruption in service or a degradation in user experience in case of failure at either the hardware or software level. The CAP theorem describes three key characteristics that a distributed computer system cannot provide all at once: first, Consistency, meaning multiple values for the same data element do not occur; second, Availability, meaning the service operates and remains fully accessible; and third, Partition Tolerance, that is, responding correctly to node and/or connectivity failures. An example of a network partition is when two nodes can’t talk to each other, but clients are able to talk to either one or both of those nodes and still perform at the desired service level.
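The CAP trade-off described above can be sketched with a toy two-replica store. This is purely illustrative (real distributed databases use quorum and consensus protocols, not this simplification): during a partition, a "CP" store refuses the write to stay consistent, while an "AP" store accepts it and lets replicas diverge.

```python
class Node:
    """Toy replica holding a single value; names are illustrative only."""
    def __init__(self, name):
        self.name = name
        self.value = None

def replicate(nodes, value, partitioned, mode):
    """Write `value` to all reachable nodes.

    mode="CP": refuse the write if any replica is unreachable (consistent, not available).
    mode="AP": write to reachable replicas anyway (available, replicas diverge).
    """
    reachable = [n for n in nodes if not (partitioned and n.name == "remote")]
    if mode == "CP" and len(reachable) < len(nodes):
        return False  # reject: stay consistent at the cost of availability
    for n in reachable:
        n.value = value
    return True

local, remote = Node("local"), Node("remote")
ok = replicate([local, remote], "v2", partitioned=True, mode="AP")
# AP mode: the write succeeds, but `remote` is left holding the stale value
```

Swapping `mode="AP"` for `mode="CP"` in the same partition scenario makes the write fail instead, which is exactly the choice CAP forces on the designer.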

Degrees of Magnitude of Risks

What constitutes valuable active data may differ from company to company and industry to industry. It will still encompass device/user identification, application/database rights management, and virtually any descriptive, structural or administrative information that helps in the retrieval and preservation of the active data. Studies show hardware failure and human error are the two most common causes of data loss, accounting for roughly three quarters of all incidents, besides natural disasters.

We all know the impact to business of data loss: lost productivity, which in turn hurts company sales and profitability, can be severe depending on the value of the lost data to that business, with major legal consequences in case of breach. We have cited enough examples of loss of active production data due to human error, and of lost hopes of retrieval due to not taking regular backups or saving the backups on the same host, device or impacted storage.

There is another element of surprise: data theft. A good old example is that when you lose your mobile phone, you notice it; but when someone copies all the critical contacts, messages or passwords from your phone, you don’t even have a clue, unless of course your phone is encrypted and the stored data is not in a readable format in the wrong hands. While many organizations keep tight vigilance at the perimeter, the actual data is not protected, or remains vulnerable to inside attacks. DLP (read: data loss prevention) tools help protect data while in motion or at rest, but are very unlikely to do so while it is being processed or cached. Thus, the risk in modern times is evolving toward protecting distributed systems, parallel processing and third-party platform syndication for specialized services such as payment gateways. Let’s talk “Prevention is better than cure!” Here are three simple steps one must take to help prevent loss or corruption of active data.
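On keeping stored data unreadable in the wrong hands: one basic stdlib illustration is storing passwords as salted one-way hashes rather than plain text, so a stolen copy reveals nothing usable. A minimal sketch (the iteration count is illustrative; production systems should follow current key-derivation guidance):

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative; tune per current guidance

def hash_password(password: str, salt: bytes = None):
    """Derive a salted one-way hash; only salt + digest are stored."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the derivation and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("s3cret")
```

An attacker who copies `salt` and `digest` still cannot recover the password without a brute-force effort, which is the point of the one-way derivation.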


Your customers, business partners and investors will ask what your security posture looks like, so it makes sense to perform a thorough review of your environment to identify gaps where confidential data resides, perform a risk assessment and ensure participation from all stakeholders in your organization. Get professional help in case you are not equipped to handle this; the first principle of ISMS is to garner sponsorship from leadership for information protection and for providing all required resources (yeah, budgets included!!)


Tie up the loose ends via technology, best practices, and continuous improvement and assessment; you never know what changes in the background, so watch your back! The security domain is the most innovative, and most hackers deploy technologically advanced skills and techniques; keeping ahead of them warrants vigilance and proactiveness. On the other hand, keeping up fundamental hygiene can itself provide enough safeguards. These include, but are not limited to: anti-virus, anti-malware, BYOD management, control of portable devices/USBs, good password management, patched systems and service upgrades, hardening of networks, servers and OS, watching for unnoticed root privileges, multi-factor authentication, changing default configurations, isolation or separation of production, dev & test environments, sufficient policies, frequent file integrity monitoring, and, most common of all, eliminating default passwords and open firewalls.. Another critical aspect of data protection is to ensure diligence in access management and control, with regular access audits and reviews across all levels.. I can go on and on! The point is, just getting your fundamentals right can lower the chances of breaches and data loss significantly.
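File integrity monitoring, mentioned above, reduces to a simple idea: record a baseline of file hashes and flag any drift on subsequent scans. A minimal stdlib sketch (real FIM tools such as Tripwire or AIDE do far more, e.g. permissions and ownership tracking):

```python
import hashlib
from pathlib import Path

def fingerprint(paths):
    """Map each file path to its SHA-256 digest (the FIM baseline)."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest()
            for p in paths}

def changed_files(baseline, paths):
    """Return paths whose current digest differs from the stored baseline."""
    current = fingerprint(paths)
    return [p for p, digest in current.items() if baseline.get(p) != digest]
```

Take a baseline after a known-good deployment, store it off the monitored host, and alert on anything `changed_files` reports.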


Depending on the criticality of the active data, the identified active datasets must be secured via backups with different policy sets: daily or hourly incrementals, daily (some people call it nightly) or weekly fulls, and one copy at a remote location. This will ensure availability via a near-DR site, remote replication and active-active operation. Remember, replication and backup are two different things: if the data is corrupted, the replicated copy will also be corrupted, while backups are saved on offline media servers and can preserve the last good set of your data. Many enterprises saved themselves from ransomware simply because they had implemented good backup plans.
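The backup policy above (weekly full, daily incremental, one off-site copy on offline media) can be sketched as a simple scheduling rule. The choice of Sunday for the full backup is an assumption for illustration:

```python
from datetime import datetime

def backup_plan(now: datetime) -> dict:
    """Pick the backup type for a given day: weekly full (assumed Sunday),
    daily incremental otherwise; one copy always goes off-site, on media
    kept off the replication path so corruption cannot propagate to it."""
    is_full_day = now.weekday() == 6  # Monday=0 .. Sunday=6
    return {
        "type": "full" if is_full_day else "incremental",
        "offsite_copy": True,
        "offline_media": True,
    }

plan = backup_plan(datetime(2018, 4, 16))
```

A real scheduler would also encode retention (how many fulls and incrementals to keep) and verification (test restores), which this sketch omits.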

Do refer to the CIO best practices released by MeitY as an additional reference..

Disclaimer: Data protection and its strategies are ever changing, and so are the approaches and rulesets to counter the threats, keeping us on our toes as always!! Yet, action matters!!