Case Study: Enterprise Information Management for a Large
Government Agency

The Context

Our client is a large, not-for-profit government agency that provides housing and housing assistance to low-income families. The organization serves over 20,000 families, including working families with children, seniors, and disabled households.

The agency engaged our company to build an enterprise information management (EIM) system. The system’s core is a data platform hosted in Microsoft Azure.

The Challenge

The agency’s reporting and information sharing had become a problem. Reports were assembled manually in Excel and distributed via email or file shares, which undermined both the timeliness and the accuracy of the information being presented. Because the reports were static, users had little ability to interact with the data.

Additionally, leadership was eager to share insights with the public. The agency wanted to be as transparent as possible with community leaders and others about the types of services provided and the number of households served. When we arrived, this was not possible.

Additional Challenge

Our client records sensitive data for the members it serves. This data is PII (personally identifiable information), and while useful for internal reporting, it cannot be shared in a public forum.

The PII needed to be stored and secured in the data platform but only reported where required. This demanded diligence in creating an appropriate data model and choosing the proper method for sharing reports and dashboards via the web.

The Solution

The government agency wanted to modernize its data operations and did not have an appetite for growing its on-premise footprint or adding staff to support the effort. The initial work for the engagement entailed determining the appropriate cloud platform. The two competing cloud providers were Amazon and Microsoft. We provided our client with an in-depth comparison of services and pricing for each platform, and our client chose to go with Microsoft.

The main components of the EIM platform are:

  • Azure Data Lake Storage Gen2 (ADLS)
  • Azure Data Factory (ADF)
  • Azure Databricks
  • Azure SQL
  • Azure Logic Apps
  • Power BI

The initial deliverable was an operational report that was created and delivered monthly. The report gave insight into significant metrics such as household and unit composition, work compliance, and member demographics broken down by the specific programs provided by the agency. A key requirement was the ability to review prior periods inside the same report.

The data for this report was sourced from two on-premise databases. The databases were fed from two disparate systems – a legacy system and a commercial-off-the-shelf (COTS) system. The data from those systems needed to be extracted, cleansed, and harmonized into a singular reporting model.

We worked closely with the on-premise reporting team to understand those metrics and how they were derived from each system. We then created logic in Databricks notebooks and Azure Data Factory ELT (extract-load-transform) pipelines to move the data and shape it to a conformed data lakehouse model. The same model would be stored redundantly in Azure SQL for reporting in Power BI.
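The harmonization step described above can be sketched in miniature. This is an illustrative example only, using hypothetical field names and plain Python for brevity; in the actual solution, equivalent logic lived in Databricks notebooks operating on full tables.

```python
# Sketch (hypothetical field names): harmonizing records from a legacy
# system and a COTS system into one conformed reporting schema.

def from_legacy(row: dict) -> dict:
    """Map a legacy-system record to the conformed model."""
    return {
        "household_id": row["HH_ID"].strip(),
        "program_code": row["PRG"].upper(),
        "unit_count": int(row["UNITS"]),
        "source_system": "legacy",
    }

def from_cots(row: dict) -> dict:
    """Map a COTS-system record to the same conformed model."""
    return {
        "household_id": row["householdId"],
        "program_code": row["program"]["code"].upper(),
        "unit_count": row["units"],
        "source_system": "cots",
    }

def harmonize(legacy_rows, cots_rows):
    """Produce one conformed dataset from both sources."""
    return [from_legacy(r) for r in legacy_rows] + [from_cots(r) for r in cots_rows]
```

The key idea is that each source gets its own mapping function, but both emit the same conformed shape, so everything downstream (the lakehouse model, Azure SQL, Power BI) sees a single schema.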

The IT department was very concerned about the security of this architecture. Much time was spent on ensuring that the communication between cloud and on-premise resources was secure and that only individuals from inside the agency could access data stored in Azure Data Lake Storage or Azure SQL. The architecture went before a review board and was determined to meet all the agency’s requirements.

The lakehouse model was able to support the initial deliverable along with other internal reporting needs. The ingestion process was based on source-aligned ADF template pipelines (i.e., SQL Server, CSV, Excel) that were informed by metadata configurations stored in a database repository. Onboarding a new source table or file became a matter of adding entries to a set of related configuration tables, which expedited the development process.
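The metadata-driven pattern can be illustrated with a small sketch. The configuration schema and handler names below are hypothetical stand-ins for the agency's actual repository tables and ADF template pipelines.

```python
# Sketch of metadata-driven ingestion: each source object is a row in a
# configuration table, and a template handler per source type does the load.

CONFIG = [
    {"source_type": "sql_server", "object_name": "dbo.Households", "enabled": True},
    {"source_type": "csv",        "object_name": "programs.csv",   "enabled": True},
    {"source_type": "excel",      "object_name": "waitlist.xlsx",  "enabled": False},
]

# One template per source type, analogous to the source-aligned ADF pipelines.
def load_sql_server(name): return f"loaded table {name}"
def load_csv(name):        return f"loaded file {name}"
def load_excel(name):      return f"loaded workbook {name}"

TEMPLATES = {"sql_server": load_sql_server, "csv": load_csv, "excel": load_excel}

def run_ingestion(config):
    """Dispatch each enabled config row to its source-aligned template."""
    return [TEMPLATES[c["source_type"]](c["object_name"])
            for c in config if c["enabled"]]
```

Adding a new source object means inserting a configuration row rather than writing a new pipeline, which is where the development time savings come from.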

Going Public

Leadership at the agency was most interested in developing a dashboard report that could be presented via their website. After delivering the initial reports, we quickly turned our attention to how to do this in a timely and secure fashion.

Power BI was the tool of choice for reporting. Reports developed in Power BI can be published on a website. Our client wanted the dashboard to be hosted externally and did not want users to have to log in to view the reports.

To meet this requirement, we created a dataset from the lakehouse that did not contain any sensitive information about the households being reported on. PII was scrubbed from the data so that the dashboard could be published publicly without exposing individual family members served by the agency.
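One common way to build such a public-safe dataset is to drop direct identifiers and replace the record key with a one-way hash so that rows can still be counted and aggregated without being traceable to a person. The sketch below illustrates that approach with hypothetical column names; it is not the agency's actual scrubbing logic.

```python
import hashlib

# Sketch (hypothetical column names): produce a public-safe record by
# dropping direct identifiers and replacing the household key with a
# salted one-way hash.

PII_COLUMNS = {"head_of_household_name", "ssn", "street_address"}
SALT = "rotate-me"  # a real deployment would manage this as a secret

def scrub(record: dict) -> dict:
    # Keep only non-PII columns.
    public = {k: v for k, v in record.items() if k not in PII_COLUMNS}
    # Replace the natural key with an opaque surrogate.
    public["household_key"] = hashlib.sha256(
        (SALT + record["household_id"]).encode()
    ).hexdigest()[:16]
    del public["household_id"]
    return public
```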

In a short amount of time, report pages were created against a focused set of requirements, and visuals such as maps were used to enhance the user experience. At the same time, we worked closely with IT to maintain the integrity and security of the dashboard. The dashboard is now hosted on the website and has become a centerpiece of the EIM platform.

Community leaders have heaped praise on the agency and challenged other agencies to pursue similar efforts.

Lessons Learned

One key lesson from this engagement was the importance of creating reusable, metadata-driven pipeline templates to speed up the development process. The initial effort to build the templates pays for itself in time savings as new source objects are added.

Another key lesson learned was the importance of understanding the security requirements of the internal IT department and communicating with full transparency about the architecture of your solution. Failure to do so upfront can add days and even weeks to creating a secure and acceptable solution!

Need help maximizing Azure?

We can help you make the most out of your Azure investment. Contact us today.
Copyright © 2024 Key2 Consulting