By: Ken Adams
Power BI Desktop is a free and powerful analytics tool that puts data visualization and data transformation right at your fingertips.
As we all know, 2020 was a contentious election year, especially here in Georgia, where we had runoff elections for two Senate seats.
During election season, my wife was perusing the Secretary of State’s website and pointed out to me that voter rolls are publicly available, including absentee ballot information. Every voter in the state that requested an absentee ballot is tracked and their information is available to anyone by going to the website and clicking on a link. They provide the files in CSV format at the county level and state-wide level.
For this exercise, I want to show you how to process the individual county files in Power BI and do some quick analysis of the data. This is a useful exercise and the Power BI Power Query Editor makes it simple.
- Go to the GA Secretary of State website.
- Make the following selections:
- Then click on and download the .zip file seen in the screenshot.
- Extract the files to a folder somewhere on your local machine – you will see 159 .csv files or one for each county in the state of Georgia. Please note: I removed the STATEWIDE.csv from the extract location as I want to demonstrate on the individual county files.
- Open Power BI Desktop and then select Get Data\More… and choose Folder from the All options on the left.
- Then click on Connect and navigate to the folder containing the voter files and click Ok and then Ok again.
- You are presented with the list of the files and you’ll want to select Combine and Load – note you have the option to apply transformations which allows you to filter or further manipulate the data.
- Power Query determines the connector to use automatically – leave the defaults on the Combine files dialog screen.
Power BI will begin the data import and will also detect any errors with the files. In the previous screenshot, you see a Skip files with errors option box but we’ll allow the errors to be filtered out individually rather than ignoring the entire file. The process to import the 159 files takes just a few minutes. Once imported you can now see the data that was imported. You can now begin to further transform the data if desired.
- Click on Transform data from the Home ribbon.
- In the preview, you can see that the source file is included and that file number corresponds to the county number in Georgia. For example – 001 APPLING County.
- You will see that the initial import has already applied steps in getting the data to this point.
At this point, I don’t see any additional transformations that need to be applied. Power Query has done a good job of selecting the appropriate data types, elevating the 1st row to a header, etc. Close the Power Query Editor and return to the Power BI Desktop visuals pane.
Let’s create some simple visuals.
- Choose the Map visual from the Visualizations selector. (We do want to increase the accuracy of our map so let’s make sure a handful of columns are categorized correctly. From the data tab match Zip Code with Postal code from the Data Category dropdown. Do the same for City and State. If you want go ahead and categorize the remainder of the address parts.)
- Back on the visuals tab, you will see in the fields selector that these columns now have a geographic indicator.
- From the fields selector, drag Zip Code as Location and Voter Registration # as Size. Also, select Count as the aggregation for Voter Registration #.
- The visual is now being populated but the view of the entire state is difficult to digest. Let’s add a data filter for County and we can start drilling in on the data and number of absentees requested for certain counties. Fulton County had almost 530K absentee ballots and you can zoom in to see areas of heavy concentration such as Johns Creek, GA, which is where Key2 Consulting happens to be located.
There are many scenarios that you can test and do so quickly. I was looking to see the number of absentee ballots issued in November. By the way, the election was 11/3 so not a lot of days had gone off the calendar in November before the general election was held. I’ve added filters on Ballot Issued Date for Month and Year.
I’ve highlighted an interesting data point in the visual above – Stephens County had over 4,000 absentee ballots issued in November. It might be an error in reporting or issues with the data. If we include the day along with the month and year the picture becomes a bit clearer. With November 2nd selected we now see:
So over 2,000 ballots were issued on 11/2/2020 in a single zip code. That works out to around 10% of the population of that particular zip code.
You could continue looking at different visuals or adding to the current visual. The data is there and Power BI makes it simple to work with.
Thanks for reading. We hope you found this blog post to be useful. Do let us know if you have any questions or topic ideas related to BI, analytics, the cloud, machine learning, SQL Server, Star Wars, or anything else of the like that you’d like us to write about. Simply leave us a comment below, and we’ll see what we can do!
Keep Your Business Intelligence Knowledge Sharp by Subscribing to our Email List
Get fresh Key2 content around Business Intelligence, Data Warehousing, Analytics, and more delivered right to your inbox!