By: Matt Wollner
KICKOFF! Fantasy football season is back, and if you’ve done any research at all, you’ve probably noticed that the number of websites offering player projections is endless. You – if you’re like most people – probably have a site or two that you prefer to get your data from.
But what if you want to aggregate the data across multiple sites and get a more comprehensive look at each player? For my personal fantasy prep, I pull data from 10+ websites into PowerBI Desktop and join all the data sets together. It gives me some added confidence that the numbers are accurate.
But what if the charts you are trying to pull from are paged across multiple pages (a paginated report)?
In this post, I will show you how to use PBI Desktop to import data from a chart that spans across multiple web pages.
The Game Plan:
First, I’m going to gather the fantasy football projections for all the major positions from CBSSports.com. Notice that the URL I am using only shows QBs, and part of the URL actually has the position name “QB” (highlighted):
If you click on another position in the dropdown, like Running Backs, notice that the URL updates but only the position changes (highlighted):
Note: This paging will only work if the charts are identical in structure. The Defense and Kicker positions will have a different column structure than the QB, RB, WR, and TE. For that reason, this example will be focused only on the QB, RB, WR, and TE.
Gameday – The Process:
To start, we will import this page into PBI, using the “Get Data From Web” functionality and selecting the Advanced option.
Split the URL into three lines so that:
- Line 1 – The beginning of the URL (https://www.cbssports.com/fantasy/football/stats/sortable/points/)
- Line 2 – The position in the URL – “QB/RB/TE/WR”
- Line 3 – the remainder of the URL (standard/projections/2018/ytd)
After selecting OK, the Navigator page will appear. Let’s select the data elements that we want to retrieve from the web page. You will see that there are 2 options. By clicking on the data elements, you can preview the data they contain. Table 0 contains the data we are looking for. Choose it, and select Load.
As you can see, the added data needs a little scrubbing. Let’s start by removing the 2 header rows, setting the 3rd row as the header row, removing the bottom row, and renaming the columns. (Need help? Check out our blog on How To Manipulate Data in Microsoft Power BI.)
Now let’s go back to edit the query and add a new Parameter. Under the Home Ribbon, select Edit Queries and then select Manage Parameters. Create a new parameter and call it “Position”. Set the Type to Text and set a Default Value.
Now we need to update the Source to use the Parameter. Under the queries, select the Table. Select Data Source Settings, and then Change Source. Now go back to the 2nd part of the URL that we defined earlier and enter the Parameter name (Position) that we created earlier. Select OK and then Close.
To test that the parameters are set up correctly, you can change the parameter value and update, then return to the Table View.
Finally, to get all of our data in one dataset, we need to add a function and invoke the function for each position. Right-click on the main table and click Create Function. Give the function a name, like GetPosition.
To invoke the function, we need to define a list of values the function will be called for. Let’s create a new table with the values QB, RB, WR, and TE.
Select Enter Data. Name the column Position and enter the values mentioned above.
Click on the new table and add a new Custom Column (under Add Column). The Custom Column is going to call our GetPosition function for each row in the table –
Fourth Quarter – Conclusion
And we’re done! The result (shown below) is a small table with an additional embedded table for each position. If you click the Expand button, all columns will be expanded out to give you a full table with a row for every player at each position.
While this example walked through how to pull data across multiple pages based on a parameter value, the same method can be applied for extracting data from any paginated report (published across multiple pages). For example, the NFL.com stats page is a paginated report and can be updated by changing the P=# at the end of the URL. If you set up a table with the values 1-7, you can extract and use all 7 pages of stats.
Overtime – Thanks for Reading
We hope this article was useful. If you come up with any other interesting uses for this method of extracting data, please be sure to tell us in the comments! We’d love to hear from you.
Keep your data analytics sharp by subscribing to our mailing list!
Thanks for reading! Keep your knowledge sharp by subscribing to our mailing list to receive fresh Key2 content around data analytics, business intelligence, data warehousing, and more!
Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.