Protecting data integrity means assuring the accuracy and reliability of data during its immediate involvement in processing, as well as over its longer-term lifecycle. Integrity must be accounted for in the design, implementation, and use of any system by implementing controls that validate data in processing, transit, and storage. The number and type of controls implemented to preserve integrity must match the importance of the data and the system's tolerance for error. If a system has very low tolerance for integrity issues, it will require a large number of controls, and non-repudiation becomes a critical feature. Non-repudiation means that a user cannot deny having performed an action or activity, and that the origin of an event or action can be verified. Even systems with a moderate tolerance for the occasional data integrity issue may require non-repudiation for some types of data transactions.
Accuracy, Reliability, and Validity
For data to remain accurate and reliable, it must be validated adequately at key points along its lifecycle; the term validation goes hand in hand with any consideration or discussion of data integrity. To ensure data integrity is to make certain data is recorded exactly as intended and, whenever it is later retrieved, is the same as it was when recorded. A system with data integrity is designed to prevent unintentional changes to information by authorized individuals, and also to prevent any creation, change, or removal by an unauthorized person or process. Some may think that guarding against unauthorized changes belongs to the concept of confidentiality, and the two do overlap, since both deal with protecting data from unauthorized parties. Strictly speaking, confidentiality concerns unauthorized disclosure while integrity concerns unauthorized modification, but in practice the distinction makes little difference, since anyone concerned with information security must address both.
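Validation at the point of recording can be illustrated with a short Python sketch. The field names and rules below are purely illustrative assumptions, not taken from any particular system; the point is that a record is checked against explicit expectations before it is ever committed.

```python
import re

def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record is valid."""
    errors = []
    if not record.get("id"):
        errors.append("missing id")
    email = record.get("email", "")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        errors.append("malformed email")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    return errors

# Reject invalid data before it is recorded.
assert validate_record({"id": "42", "email": "a@b.com", "amount": 10}) == []
assert "malformed email" in validate_record({"id": "1", "email": "oops", "amount": 5})
```

In a real system the same checks would typically run again at each trust boundary, not only at first entry.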
Lifecycle – Processing, Transit, and Storage
The lifecycle of data describes where and how it is processed, moved, and stored from its creation until its deletion. Mapping the data lifecycle involves noting how data is validated within each state, as well as across each boundary between states. Overlaying a timeline and its accompanying constraints can reveal a clear picture of the vulnerabilities to consider, and the resulting diagram makes it easier to compose policies and implement controls that account for data integrity adequately. Processing deals with granular transaction commands for creating, modifying, deleting, saving, and retrieving data. Integrity can be compromised by unintentional coding mistakes and by abrupt or unexpected failure scenarios such as a sudden power outage; building validation and controls that discover, handle, and perhaps recover data without corruption or loss, at layers close to the user initiating the transaction, is key. Threats in transit include viruses and hackers, but more often involve intermittent loss of connectivity. Storage loss or corruption may occur through disk drive failure, a virus or hacker attack (e.g. ransomware), or a logical failure in the database or file system. Most of the hardware and software employed today has many integrity controls built in and readily configurable, and best-practice instructions and advice for implementing them on your system are readily available. This makes designing a system with fairly good data integrity affordable in most instances.
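One of the most common built-in controls for detecting corruption at rest or in transit is a checksum computed at write time and re-verified at read time. The following is a minimal Python sketch of that idea using SHA-256; the `store`/`retrieve` functions are hypothetical stand-ins for a real storage or transfer layer.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex digest of the payload, used as an integrity fingerprint."""
    return hashlib.sha256(data).hexdigest()

def store(payload: bytes) -> tuple[bytes, str]:
    """Save the payload together with a digest computed at write time."""
    return payload, sha256_of(payload)

def retrieve(payload: bytes, expected_digest: str) -> bytes:
    """Re-compute the digest at read time; any mismatch signals corruption."""
    if sha256_of(payload) != expected_digest:
        raise ValueError("integrity check failed: data corrupted in storage or transit")
    return payload

data, digest = store(b"account balance: 100.00")
assert retrieve(data, digest) == data  # an intact copy passes verification
```

File systems, databases, and network protocols apply the same pattern internally (block checksums, TCP checksums, TLS MACs), which is why much of this protection comes preconfigured.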
Adequacy – Striking the right balance
Integrity might be considered synonymous with data quality. A system's adequate integrity target might be expressed as a point along a spectrum of tolerance for quality. At one end of the spectrum is 100% data integrity, representing no tolerance for errors at all; at the other end is 0% data integrity, representing complete data loss or corruption. No system lives permanently at either endpoint. However, if you were to place a random sample of 1,000 systems along that spectrum, most would cluster in a relatively narrow band near the high-quality, low-tolerance-for-error end. Few systems are viable anywhere close to the low-quality end, which represents a very high tolerance for error. That is due to improvements in, and adoption of, high-quality and affordable hardware and software: if information isn't mostly accurate and reliable most of the time, the system probably isn't viable for long. There are always affordability and practicality considerations at play. If the data or system involved carries implications of catastrophically large harm or loss, then there may be integrity and validity considerations and controls at every layer and scope imaginable. Less mission-critical systems will have integrity controls adequate to the relative importance of their data. In any case, people expect data to be correct and valid, or they will eventually abandon the system; therefore, basic integrity controls need to be designed into any system.
Non-repudiation and audit trails
Non-repudiation means being able to prove which user or process caused an event or change to data. It is ensured through the following five basic features of an information system:
- identification,
- authentication,
- authorization,
- accountability, and
- auditing.
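The proof-of-origin aspect of non-repudiation can be sketched in Python. Note the hedge in the comments: an HMAC (available in the standard library) only proves a message came from *some* holder of a shared key; full non-repudiation requires an asymmetric signature, where only the signer holds the private key, so the sketch below shows the shape of the check rather than a complete solution. The key and message are illustrative.

```python
import hmac
import hashlib

# An HMAC authenticates a message to anyone sharing the key. True
# non-repudiation needs an asymmetric signature (only the signer holds
# the private key, so the signer cannot claim a verifier forged it);
# this stdlib sketch only demonstrates the verification pattern.
KEY = b"demo-shared-secret"  # hypothetical key, for illustration only

def sign(message: bytes) -> str:
    """Produce an authentication tag binding the message to the key holder."""
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str) -> bool:
    """Constant-time check that the tag matches the message."""
    return hmac.compare_digest(sign(message), tag)

tag = sign(b"user alice approved transfer #981")
assert verify(b"user alice approved transfer #981", tag)
assert not verify(b"user mallory approved transfer #981", tag)
```

A production system would pair such signatures with the audit trail described below, so each recorded action carries verifiable evidence of its origin.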
Staying abreast of the best practices for ensuring each basic feature above is critical. An audit trail, or audit log, is a database table to which records are added chronologically, capturing security-relevant transactions, events, and operations. It is intended to serve as documentary evidence of the sequence of activities that occurred around and during a specific operation, procedure, or event. Financial systems, healthcare systems, and many other types need audit records to reconstruct what happened, and who or what was responsible, after a problem, an important transaction, or a security breach.

An audit trail process should always run in privileged mode, so it can access and supervise the actions of all users, and it should be checked routinely and automatically to ensure it is persistent and actively running. Users should not be allowed to stop it or to change any of the data it records. Best practice, where affordable, is to locate it in a separate database or on a different drive from the operational database, so it is more easily cordoned off and safe from users changing, deleting, or adding anything to it.
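The tamper-evidence requirement above can be demonstrated with a small Python sketch of an append-only log in which each entry includes a hash of the previous entry. This hash-chaining design is one common technique (not a prescription from the source): any later edit or deletion breaks the chain and is detectable on verification.

```python
import hashlib
import json
import time

class AuditTrail:
    """Append-only log; each entry carries the hash of the previous entry,
    so altering or removing any earlier record breaks the chain."""

    def __init__(self):
        self.entries = []

    def record(self, user: str, action: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"ts": time.time(), "user": user, "action": action, "prev": prev_hash}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "hash": digest})

    def verify(self) -> bool:
        """Walk the chain; any edited, removed, or reordered entry is detected."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: entry[k] for k in ("ts", "user", "action", "prev")}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True

log = AuditTrail()
log.record("alice", "updated invoice 7")
log.record("bob", "deleted draft 3")
assert log.verify()
log.entries[0]["user"] = "mallory"   # tampering with a recorded entry...
assert not log.verify()              # ...is detected by chain verification
```

In practice the same effect is usually achieved with database permissions and write-once storage rather than application code, but the chain makes the "users cannot alter recorded data" property concrete.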