When maintaining a data warehouse, it is import understand the size and disk requirements of your tables over time. My current client has asked that I generate a report showing table disk usage over time, so that we can predict table growth.
I was looking around and found the procedure sp_spaceused. This returns the row count, data usage and index usage for a given table. I decided to create a fact table to save the table info on a daily basis. The SSIS package should not be too difficult to create but it turned out to be a much bigger challenge then I expected.
First I created a Fact Table and couple of Dimension Tables. I took my standard Dim.Date and Created a Dim.TableDefinition to store the table name and other table related info. I used sys.tables to get the base info.
CREATE TABLE Fact.TableSize(
TableSizeKey int IDENTITY(1,1) NOT NULL,
RunDateKey int NULL,
LastInsertDateKey int NULL,
TableDefinitionKey int NULL,
TableRows bigint NULL,
ReservedSpace bigint NULL,
DataSpace bigint NULL,
IndexSpace bigint NULL)
CREATE TABLE Dim.TableDefinition(
TableDefinitionKey int IDENTITY(1,1) NOT NULL,
SchemaName varchar(50) NOT NULL,
TableName varchar(50) NOT NULL,
TableSchemaName varchar(100) NOT NULL,
TableDescription varchar(max) NULL)
--Populate Dim.TableDefinition
SELECT
S.name + '.' + T.name AS TableSchemaName
,S.name AS SchemaName
,T.name AS TableName
,NULL AS TableDescription
FROM sys.tables T
INNER JOIN sys.schemas S ON T.schema_id = S.schema_id
WHERE LEFT(T.name, 3) IN ('dim','fac')
To create the SSIS package I started off in the Data flow. I first created an OLE DB Source, with the SQL command
sp_spaceused 'dbo.Table1'
and clicked on the Columns tab. I was sad to see that the Editor only pulled back the default set of columns that the procedure returns, when no parameters are passed in.
To get the proper output I stubbed out a select statement before calling the procedure.
IF 1 = 2 BEGIN
DECLARE @SchemaStub TABLE
(TableName VARCHAR(50)
,TableRows VARCHAR(50)
,ReservedSpace VARCHAR(50)
,DataSpace VARCHAR(50)
,IndexSpace VARCHAR(50)
,UnusedSpace VARCHAR(50))
SELECT TableName
,TableRows
,ReservedSpace
,DataSpace
,IndexSpace
,UnusedSpace
FROM @SchemaStub
END
EXEC sp_spaceused 'dbo.DimDate'
The next problem I encountered is when I tried to pull the table name from a variable.
IF 1 = 2
…
END
EXEC sp_spaceused @objname = ?
Error: Parameters cannot be extracted from the SQL command….
I also tried declaring @objname at the top, but that did not work either. So I decided to change the select statement into a variable.
Back on the Control Flow I created an Execute SQL Task. I used a SELECT against sys.tables to get a list of all fact and dimension tables. The Result Set of the task is saved to an Object variable, vTableList. I then used a For Each Loop, for each table value in the vTableList. I then used a Script Task to replace the table name in the select statement above.
Dts.Variables("vSpaceUsedSQL").Value = "IF 1 = 2 BEGIN DECLARE @SchemaStub TABLE( " & _
"TableName VARCHAR(50),TableRows VARCHAR(20),ReservedSpace VARCHAR(20),DataSpace VARCHAR(20),IndexSpace VARCHAR(20))" & _
" Select TableName,TableRows,ReservedSpace,DataSpace,IndexSpace " & _
" FROM @SchemaStub END" & _
" EXEC sp_spaceused '" & Dts.Variables("vTableName").Value & "'"
To wrap up the package
1. I added a SQL Task to delete any data from today, incase it is run twice in the same day.
2. Set the RunDateKey to the key value of GETDATE()
3. I created some pretty Reporting service reports to show my table sizes over time.

