By: Jharred Williams
 
 
 
To kick things off, let’s start with a quick refresh on SQL Server Integration Services (SSIS) and what’s new with the tool today.
 
To say SSIS is strictly a data extract, transformation and loading (ETL) tool would be understating the capabilities of the Microsoft business intelligence product. Yes, SSIS is a data integration tool used to ETL data. However it’s also a workflow application for data warehousing that can be used to automate maintenance tasks of SQL Server databases, as well as perform updates to multidimensional cube data.

SSIS is also source-agnostic, meaning it’s operative regardless of your sources/destinations being in the cloud, on-prem, or a hybrid of the two. SSIS was not built for a hybrid environment, but this “gap” can be bridged by Windows authentication VMs which make it possible for packages to be executed not only on premises but in the cloud as well.

So now that we’ve defined SSIS, let’s take a deeper dive and discuss more about the tool and how you can conduct performance tuning for SSIS to ensure that you’re achieving max performance.

We will evaluate the following topics pertaining to SSIS:

– SSIS Control Flow
– The Data Flow Task
– Performance Results / Maximizing Efficiency

I have chosen to present this information in an outline format for readability purposes.
 

1. SSIS Control Flow

Tasks are primarily limited by the environment in which they are debugged and by what they are developed to execute.

  • Example
    • When considering utilizing an FTP Task, a task that downloads and uploads data files and manages directories on servers, it can only FTP a file as fast as the network and FTP server will allow it to run.
      • In your workflow, you can use the package to download data files from a remote server or an Internet location.
      • Another practical application for using the FTP task would be copying directories and data files from one directory to another, before or after moving data and applying transformations to the data.
      • The FTP connection manager authentication
        • Supported authentications:
          • Anonymous authentication
          • Basic authentication
        • Non-supported authentications:
          • Windows authentication

 

  • Executes SQL Task:
    • Will only operate as performant as the database engine can execute the SQL query it contains
      • A slow stored procedure will still be a slow stored procedure in an execute SQL task
        • A turbo button to increase performance does not yet exist
    • Increasing performance
      • Change the environment
      • Modify queries / Stored procedures
  •  

  • Containers:
    • Used to group task
    • Provide structure to SSIS packages
    • Support repeating control flows
    • Can include other containers in addition to tasks
    • Enable mixtures of serial and parallel processing
      • By package design, multiple tasks can run at the same time
      • Precedent constraints
        • Can also be used to control the workflow task
        • Does not allow tasks to be executed successfully unless previous tasks have executed successfully and are viewed as green arrows
      •  

      • Types of Containers
        • Foreach Loop Container
          • To repeat tasks for each element in a collection
        • For Loop Container
          • To repeat tasks until a specified expression evaluates to false
        • Sequence Container
          • This container groups tasks and containers that must succeed or fail as a unit
        • Task Host Container
          • The task host container encapsulates a single task
  •  

  • Package property for concurrent executables
    • MaxConcurrentExecutable
      • Determines how many tasks can be executed at once in a package
      • Default is -1
        • This is the number of processors detected by the OS +2
          • So if you have 6 processors, you might see 8 tasks running in parallel
        • If this limit is exceeded, SSIS will randomly pick which task to begin processing first and then proceed to the remaining task once one task has completed.
    • If precedence constraints are removed, then potentially all tasks could execute at the same time
    • Parallelism can be controlled from the design of the package
  •  

  • Scale Out
    • This is a new feature of integration service from SQL Server 2017
    • The Scale Out Master is responsible for Scale Out management and receives package execution requests from users while the Scale Out Workers pull execution tasks from the Scale Out Master to run the packages
    • It enables you to distribute SSIS package execution more easily across multiple worker computers, and manage executions and workers from a single master computer
    • Abilities:
      • Run multiple packages deployed to the SSISDB catalog in parallel
      • Manage the Scale Out topology in the Scale Out Manager app that allows you to run multiple package executions simultaneously

 

Key Takeaway

[bctt tweet=”Performance in the control flow is generally based on the design of the package and the performance of the individual tasks cannot be affected except by a data flow task or by changing your environment.” quote=”Performance in the control flow is generally based on the design of the package, and the performance of the individual tasks cannot be affected except for a data flow task or changing your environment.” username=”Key2Consulting”]
 

2. Data Flow Performance

  • Source Adapters read data from sources
    • Simply put, they are the mechanism for grabbing data from the source
      • Uses a connection string
  • Connectivity improvements of SSIS 2017
    • You can now connect to the OData feeds of Microsoft Dynamics AX Online and Microsoft Dynamics CRM Online with the updated OData components
  • Data Source Performance
    • Limited by how fast the source can receive the data
    • Things to consider when pulling data
      • Query execution plan
      • Available resources on the server
      • Network performance
    • Data will move into the data flow only as fast as the source will allow it to be pulled

 
Question: How can I tell how fast my data is being pulled from the source?

Answer: In a simple data flow task, connect a row count to your source and then run it to a row count.

  • Row count
    • A low resource transformation that counts the rows that pass through the data flow and can be easily stored in a variable
  • Improving performance from your data source
    • Reduce the number of columns selected
      • Reframe from using SELECT *
    • Reduce the column width
    • Reduce data types of columns
    • Filter rows
    • Use a WHERE clause
    • Use “SQL Command” option instead of the “Table or View” option
      • Mainly for relational sources
    • Enable “Fast-Parse” option
  • Data is added to in-memory buffers
    • Buffers get processed through the transformations in the data flow task (DFT)

 

3. The Performance of SSIS Transformations

  • Based on what is done with the buffers of data from inside of the DFT
    • Synchronous transformations
      • Work on data within a buffer
    • Asynchronous transformations
      • Copy data from one buffer to another
      • Partially blocking asynchronous transformations
        • Work on a buffer by buffer basis
        • Pass buffers downstream as soon as possible
      • Fully blocking asynchronous transformations
        • Only creates output buffers after getting all input buffers
  • Improving the speed of your transformations
    • Eliminate unneeded transformations
    • Perform work in your source queries if possible
    • Remove unneeded columns
      • SSIS Debugger will give warnings of unused columns
    • Replace OLE DB Command transformation
      • Use staging table and Execute SQL task if possible
    • Don’t be afraid to redesign your data flow framework
  • Data in buffers passed to destinations
  • Destination then stores the data to be queried
    • Reminder: this is limited by how fast the destination can accept and store data
    • Data insertion performance, server resources, and the network performance are all factors to consider

 

4. Evaluating Performance

  • Auditing
    • Based on a common catalog introduced in the SQL Server 2012 release of SSIS, a complete monitoring and management solution was implemented to help you to design a distributed data movement solution and collect information around package execution statistics and results
    • SSIS package logs can be used. However, the logs can quickly become cumbersome depending on the complexity of the package
    • I often recommend building your own auditing tables
    • Capturing the ExecutionInstanceGUID will allow you to tie your audit table to the SSIS package logs

 

Contact Us

Thank you for reading! If you’re considering using SSIS in your environment or have already implemented Microsoft SQL Server and would like to see more gains from the tool, we can help you. Contact us today.
 

Keep your data analytics sharp by subscribing to our mailing list

If you haven’t subscribed to our email mailing list yet, be sure to do so for Key2 business intelligence, data warehousing, and custom application development insights, tips, best practices, and more.

 
 


Key2 Consulting is a data warehousing and business intelligence company located in Atlanta, Georgia. We create and deliver custom data warehouse solutions, business intelligence solutions, and custom applications.