By: Mason Prewett
 
 

I’m a SharePoint administrator in my organization and I’m often asked to answer many questions about the content that resides in the SharePoint farms that my team supports. My organization allows the site owners to manage all of their site content and permissions and thankfully so, as this would be an unmanageable task for the SharePoint team. However, doing so leaves our SharePoint team without an idea of the type of content that’s out there. Even though it’s unrealistic to be involved in all of the site content that’s hosted on our sites, I feel it’s the responsibility of my team to be aware of what content we’re hosting.

SharePoint doesn’t offer any type of tools that can assist with content inventory (that I’m aware of), so I chose to write some PowerShell code to export this information to CSV. I then used SSIS to import the data into a SQL table. From there, I can query all content using T-SQL.

The PowerShell code for pulling all inventory:

PowerShell code file: SharePointSystemInventory

PowerShell Code Notes:

This exports two files. One for sites and one for documents.

I pull sites into a different file versus documents because the document code will only pull site information for sites that actually have documents stored in them. Having the site info pulled separately ensures that all sites are inventoried.

I am only pulling inventory for all items in document libraries. Pulling all content in lists didn’t offer much value to the type of questions that I was trying to answer. To include all list info, remove the following condition if($list.BaseType -eq “DocumentLibrary”). Additional logic will also need to be written to handle list items differently since the code looks for the file extension of the item.

A sample of the CSV file that is exported for sites (Headers and 1 row):

“Farm”,”WebApplicationURL”,”SiteCollectionURL”,”WebURL” “SPProdFarm”,”https://spfarm.com/”,”https://spfarm.com/sites/hr”,”https://spfarm.com/sites/hr/employees”

A sample of the CSV file that is exported for documents (Headers and 1 row):

“Farm”,”WebApplicationURL”,”SiteCollectionURL”,”WebURL”,”DcoumentURL”,”FileExtension” “SPProdFarm”,”https://spfarm.com/”,”https://spfarm.com/sites/hr”,”https://spfarm.com/sites/hr/employees”,”Documents/EmployeeList.xlsx”,”.xlsx”

The Create statement for both SQL tables. These were the safe data types for each column after testing in my environment:

Create Code Notes:

To reduce redundancy, I didn’t pull in the WebApplication URL or SiteCollection URL for the document tables.

The Create statement for the DocumentInventory view to include WebApplicationURL and SiteCollectionURL:

The next step would be to import the CSV files into each table. This can be done manually in SQL Server Management Studio or by creating an SSIS package to pull the data in automatically. I created a SSIS package to make the process automated. This helped me a lot since I had 11 farms to inventory. I had to pull in 22 files – 2 files per environment.

Finally, I wrote a few SQL queries to discover facts about the content that is hosted in my SharePoint farms.

Some queries that I used:

Insert SQL code file: SQLQueries

SQL Query Code Notes:

I determined what type of content a document was by defining the “FileExtension IN” clause. This made sense for my environment, which is focused on SQL Server Reporting Services. This may need customizing in other environments based on what the focus of the SharePoint sites are.

These were the file types that showed up in my environment. Other file types may show up in a different environment that may need to be considered. Ex: My environment didn’t have any mp4 file types which would need to be considered as a media file type.

One enhancement that could be done is to create a mapping table for file types and their content definitions. This would eliminate the need to hard code this into IN clause. This will be added into version 2 of my solution.

You can query this data to get any information that’s needed from here. If I have a question about the SharePoint content in my environments, I can easily answer it by using this info. It includes high level farm details as well as down to the site level details.

Real world information about my SharePoint farms that I was able to provide using this data:

What type of content is being most/least used in my farm?

This helps prove that my Sharepoint farms are focused (or aren’t focused) on providing certain services.

What sites have the most/least content?

This helps show site stakeholders which of their teams are using Sharepoint as a tool and which are not.

This helps identify which site may have too much content in a situation where a content database is getting too big. (Also in my next version, I will be adding file size to the document export. This will provide the ability to see storage consumed at all levels.)

If history of this is being kept, content trending can be determined of all sites and documents?

This can tell site/document growth over time.