By: Ken Adams
Data security is a vital part of successful data warehousing and business intelligence today, and a critical step in successfully securing a data warehouse is implementing proper data security levels. From a legal perspective, related acronyms like HIPAA and FERPA have become important legislation every data professional should know.
HIPAA is the “Health Insurance Portability and Accountability Act” and provides standards for the handling and protection of medical information. FERPA is the “Family Educational Rights and Privacy Act” and provides similar standards and protections for student information.
These are two of the more well-known federal laws that protect an individual’s data and establish guidelines for how that data is handled. Let’s dive a little deeper on each.
HIPAA deals with personal health information (PHI), which (under HIPPA) includes data elements such as names, phone numbers, and email addresses. When those data elements are combined with information derived from an individual’s medical record (or payment history), the information becomes protected under federal law.
In many cases, PHI is required to be anonymized prior to its inclusion in a data warehouse or access to the data should be tightly controlled at the user level. This might mean masking identifiable data elements or vertically partitioning those data elements into separate tables.
FERPA deals with personally identifiable information (PII), which (under FERPA) includes information that can be used to determine an individual’s identity, such as name, social security number, and date of birth. The federal law specifically provides protections on PII that can be derived from a student’s education records.
As a result, any records maintained by a school must be handled in a way that protects the privacy of the student. A student’s PII will need to be controlled in similar fashion when both loaded into a data warehouse and when reported on from the warehouse. Encrypting data at rest and in motion, in addition to scrubbing any PII from reports produced from the warehouse, are typical means of accomplishing the required security measures.
Data Security Levels
In short, HIPAA and FERPA are federal laws that stipulate how PHI and PII should be handled for patients and students. Both PHI and PII rank pretty high on the data classification scale.
Organizations, such as Harvard and Cal-Berkeley, have attempted to lay out data classification scales that start with a low protection level and increment to a high protection level. Their scales help provide guidance for data stewards and give database professionals an idea on how certain data should be stored.
PHI ranks high on these scales. The Harvard scale ranks PHI at a level 4 on a 5 level scale and Cal-Berkeley assigns a “high business impact” to any loss of PHI data. PII ranks medium to high, as it lands at a 3 on the Harvard scale and Cal-Berkeley indicates the risk of loss as “moderate”.
Why Data Security Levels Matter
Security levels (like ones mentioned) help data professionals plan how – and maybe even where – to store protected data. And proper storage might mean that federally-protected data doesn’t belong in your warehouse at all.
Removing these types of data from a data warehouse mitigates the probability for loss due to exposure. Exposure of PII might mean serious financial loss for an organization and most certainly a damage to the reputation and credibility of an organization.
Just ask Equifax if you don’t believe me.
So, What Should be in Your Data Warehouse?
In a data warehouse, it’s not enough to say “don’t take what you don’t need.” Most consumers of data from a warehouse don’t need to know the email address or birthdate for an individual but might need to know the state or zip code for completing necessary higher-level analysis.
The state and zip code are not PII, so storing those in the warehouse is less of a risk than storing the full home address. If the information is available via a query or report, then it’s more likely the sensitive information gets exposed, and thus more likely problems will be created for the organization.
In the end, it might be easier to scrub those data elements completely and create some bridge back to the underlying production data should the data warehouse users require those PII or PHI elements.
If the data warehouse is used for operational reporting in addition to higher-level analysis, then storing PII and PHI information could be a requirement. In that case, safeguards should be put into place. It might be useful for your company to create their own data classification similar to the ones I mentioned earlier.
Doing so would help your company understand which data has low sensitivity versus which is restricted. Then access to your data in the warehouse can be handled appropriately.
Client Example – Large Hospital System & PII
In the not too distant past, I had an engagement with a very large hospital system where we dealt with lots of sensitive patient PII data. To segregate the sensitive data, we created a separate database schema and set of tables. We stored PII data elements such as social security, street address, and names on this schema. We also created a non-sensitive schema and tables where we stored non-PII data elements. The majority of users did not require access to PII and were only granted access to the non-sensitive schema.
Those users who did require access to the sensitive PII were given access to the sensitive schema. That way, queries against sensitive data elements could only be executed by users who had access. Other users could only query and return non-sensitive data elements.
Client Example – Securing Student Assessment Data
On a prior engagement, my team and I were dealing with student assessment data and were determining how to store sensitive student data. Operational reporting was a requirement, so PII was reported on back to the individual students. Internally, though, many of the researchers had no need to analyze PII.
From a storage perspective, we were examining encryption as well as our processes for controlling user access. We were also working on a policy for destruction of PII data. Removing PII that is no longer needed reduces the risk of exposure while still allowing internal teams to query the non-sensitive information.
There are many layers to the data security onion, as illustrated in this article! There is really no one size fits all solution and understanding the nature of your company’s data is paramount in coming up with appropriate data security policies.
I have found that classifying the kinds of data in your organization and ranking those in terms of exposure risk is needed. If a data warehouse is in your future, you need to determine the reporting requirements and decide whether storing PHI and PII in the warehouse is important.
If it’s not, then leave those behind in the production system and bring over only what is required for analysis. This reduces the number of times this type of data is stored in your company and reduces your overall risk to attackers.
If you are struggling with storing private/restricted data or would like to get help on how to properly do so, please contact us and we’ll be in touch shortly.
Thanks for reading. We hope you found this blog post to be useful. Do let us know if you have any questions or topic ideas related to BI, analytics, the cloud, machine learning, SQL Server, (Star Wars), or anything else of the like that you’d like us to write about. Simply leave us a comment below, and we’ll see what we can do!
Keep Your Business Intelligence Knowledge Sharp by Subscribing to our Email List
Get fresh Key2 content around Business Intelligence, Data Warehousing, Analytics, and more delivered right to your inbox!
Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.