By: Andy Kim
Introduction
The SELECT…INTO statement is an excellent tool to have in your arsenal if you’re a SQL developer. The SELECT…INTO statement:
- Creates a target table for you automatically, eliminating the need to manually create a table.
- Loads the target table lightning-fast through the use of minimal logging (under the correct conditions).
Alternatively, you could use the CREATE TABLE syntax to create the target table first, then use an insert statement to load it separately. This method can be much, much, slower – especially as the size of the data set grows larger (millions to billions of records). This can be the only option in some cases, however.
Although it is great that the SELECT…INTO statement automatically creates the target table for us, we lose a certain level of control on how the target table is created and defined. Certain tasks require the target table to be in a specific format in order to perform subsequent operations on it. Some examples of this might be but are not limited to:
- A column must be non-nullable in order to add a primary key constraint on it.
- An ALTER SWITCH statement requires the source and target table/column properties to be almost exactly the same.
Do we have any control over how the target table is created when using a SELECT…INTO statement? It turns out there are some ways we can manipulate the target table. I would like to demonstrate some of those manipulations in this blog post.
Let’s take a good look at the source table we will be extracting from. It will be the control table for our upcoming experiments. In particular, pay attention to the column names, data types, and nullability.
What happens when we execute the following statement? How is the OrderDetail_Temp table created?
1 2 3 |
SELECT * INTO dbo.OrderDetail_Temp FROM dbo.OrderDetail_Source |
If we take a look in object explorer, we can see that it was created based on the original table, and in fact, matches the source at this level of detail.
Changing Column Names in the Target Table
The first simple way you can manipulate the target table is to set a column name. You can achieve this by aliasing a column in the query as shown below.
After running this statement:
1 2 3 4 5 6 7 8 |
SELECT [RecordID] ,[CustomerID] as CustomerKey -- alias to another name ,[OrderID] as OrderKey -- alias to another name ,[OrderDetailID] as OrderDetailKey -- alias to another name ,[CreatedDateTime] ,[ModifiedDateTime] INTO dbo.OrderDetail_Temp FROM dbo.OrderDetail_Source s |
We can see that the target table is created with the aliased column names:
Creating New Data Types in the Target Table
Another simple way you can manipulate the target table is to cast or convert to another data type. You can achieve this by casting/converting a column in the query, as shown below.
After running this statement:
1 2 3 4 5 6 7 8 |
SELECT [RecordID] ,cast([CustomerID] as varchar(255)) as CustomerKey ,cast([OrderID] as bigint) as OrderKey ,[OrderDetailID] as OrderDetailKey ,cast([CreatedDateTime] as datetime2(0)) as CreatedDateTime ,[ModifiedDateTime] INTO dbo.OrderDetail_Temp FROM dbo.OrderDetail_Source s |
We can see that the target table is created with the specified data types:
When casting or converting, we will need to keep in mind that our data may limit our options. In addition, we will need to understand the implications, such as losing precision in some cases.
Controlling the Nullability Property of Columns
You can use the NULLIF statement to create a nullable column, while you use an ISNULL statement to create a non-nullable column, as shown below.
After running this statement:
1 2 3 4 5 6 7 8 |
SELECT [RecordID] ,NULLIF(cast([CustomerID] as varchar(255)),-1) as CustomerKey ,cast([OrderID] as bigint) as OrderKey ,[OrderDetailID] as OrderDetailKey ,cast([CreatedDateTime] as datetime2(0)) as CreatedDateTime ,ISNULL([ModifiedDateTime],'1/1/1900') as ModifiedDateTime INTO dbo.OrderDetail_Temp FROM dbo.OrderDetail_Source s |
We can see that the target table is created with the specified nullabililty:
Avoid Bringing Identity Property to the Resulting Table
Although we can’t see it from the screenshots, there is an identity on the source table on column RecordID. The identity is being carried over to the new table.
A common hack to avoid bringing an identity property over to the target table is to do a dummy join or union. The dummy join or union causes the identity property to not be carried over.
This statement
1 2 3 4 5 6 7 8 |
SELECT [RecordID] ,NULLIF(cast([CustomerID] as varchar(255)),-1) as CustomerKey ,cast([OrderID] as bigint) as OrderKey ,[OrderDetailID] as OrderDetailKey ,cast([CreatedDateTime] as datetime2(0)) as CreatedDateTime ,ISNULL([ModifiedDateTime],'1/1/1900') as ModifiedDateTime INTO dbo.OrderDetail_Temp FROM dbo.OrderDetail_Source s |
While this statement will NOT create the identity because there is a join:
1 2 3 4 5 6 7 8 9 10 |
SELECT [RecordID] ,NULLIF(cast([CustomerID] as varchar(255)),-1) as CustomerKey ,cast([OrderID] as bigint) as OrderKey ,[OrderDetailID] as OrderDetailKey ,cast([CreatedDateTime] as datetime2(0)) as CreatedDateTime ,ISNULL([ModifiedDateTime],'1/1/1900') as ModifiedDateTime INTO dbo.OrderDetail_Temp FROM dbo.OrderDetail_Source s INNER JOIN dbo.DummyTable dt on 1 = 0 -- condition never true |
Adding an Identity Column to the Resulting Table
Let’s say we want to add a new identity column in the target table. We can do it like this:
1 2 3 4 5 6 7 8 9 10 11 |
SELECT IDENTITY (INT, 1, 1) AS NewIdentityColumn ,[RecordID] ,NULLIF(cast([CustomerID] as varchar(255)),-1) as CustomerKey ,cast([OrderID] as bigint) as OrderKey ,[OrderDetailID] as OrderDetailKey ,cast([CreatedDateTime] as datetime2(0)) as CreatedDateTime ,ISNULL([ModifiedDateTime],'1/1/1900') as ModifiedDateTime INTO dbo.OrderDetail_Temp FROM dbo.OrderDetail_Source s INNER JOIN dbo.DummyTable dt on 1 = 0 -- condition never true |
We can see that the target table has a new identity column.
Directing Your Table to Be on a Specific Filegroup
With SQL Server 2017, you can now also use the ON clause to direct the target table to be created on a specific file group other than the default.
1 2 3 4 5 6 7 8 9 10 11 |
SELECT IDENTITY (INT, 1, 1) AS NewIdentityColumn ,[RecordID] ,NULLIF(cast([CustomerID] as varchar(255)),-1) as CustomerKey ,cast([OrderID] as bigint) as OrderKey ,[OrderDetailID] as OrderDetailKey ,cast([CreatedDateTime] as datetime2(0)) as CreatedDateTime ,ISNULL([ModifiedDateTime],'1/1/1900') as ModifiedDateTime INTO dbo.OrderDetail_Temp ON [AnotherFileGroup] FROM dbo.OrderDetail_Source s INNER JOIN dbo.DummyTable dt on 1 = 0 -- condition never true |
Conclusion
Now that we have seen some different ways to manipulate the target table when using a SELECT…INTO statement, I encourage you to try these out next time the opportunity arises. I would also love to hear about other manipulations on the SELECT…INTO statement that you have used. Please leave a comment!
References
Select Into Statement General Information
https://docs.microsoft.com/en-us/sql/t-sql/queries/select-into-clause-transact-sql?view=sql-server-2017
Select Into a Specific Filegroup
https://www.mssqltips.com/sqlservertip/5018/selectinto-enhancements-in-sql-server-2017/
Questions?
Thanks for reading! We hope you found this blog post to be useful. Do let us know if you have any questions or topic ideas related to BI, analytics, the cloud, machine learning, SQL Server, (Star Wars), or anything else of the like that you’d like us to write about. Simply leave us a comment below, and we’ll see what we can do!
Keep your data analytics sharp by subscribing to our mailing list
Get fresh Key2 content around Business Intelligence, Data Warehousing, Analytics, and more delivered right to your inbox!
Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.