The Stages of Query Compilation
Excerpt by Don Kiely | November 13, 2013
Compiling a query involves three stages: parsing, algebrization, and optimization. The process transforms the logical T-SQL code you write into the physical operators that define the steps that SQL Server will perform to execute the query and return results. Below shows a very high-level view of the entire process.
When SQL Server receives a batch of T-SQL statements from a client application (which includes Management Studio), it first checks to see if a plan exists in the plan cache for the query. If the query was recently executed by this or another session, then the query doesn't need compilation. The plan is passed to the execution engine, and the process is done. If no match is found in the plan cache, SQL Server parses the query to check the correct T-SQL syntax, including the use
of valid SQL identifiers for object names-although not yet to verify that all referenced objects exist-as well as the spelling and use of keywords. Then for each statement in the batch, the query processor sees if there is an existing plan guide and if it is in the cache. A plan guide is an object that you can define to specify a query plan for a query. If there is a match that is in the plan cache, the processor uses the plan guide for that statement and sends it off for execution. If there is no match, the query is parameterized, which essentially creates a version of the query with separate parameters, and it once again checks to see if the parameterized query is in the plan cache. If there is a version in the plan cache, it is sent to the execution engine. Otherwise, if there is no matching plan guide or parameterized query in the plan cache, the query processor has to go through the process of creating an execution plan for the query. The first step is algebrization, also called normalization.
This step creates a logical tree, sometimes called a parse tree, that represents the logical steps needed to execute the query. A primary task of algebrization is binding, which checks whether referenced tables and columns exist, loads metadata about those objects, and adds information about implicit conversions needed. The next step is optimization, which, as its name implies, performs any refinements that can make the execution plan more efficient. Then the plan is added to the plan cache and sent to the execution engine.
TIP: Not all execution plans are cached. For example, you can define a stored procedure with the WITH RECOMPILE option, which causes SQL Server to generate a new execution plan every time that code executes. In that case, SQL Server doesn't bother caching the plan since there is no point.
Caching execution plans is one of the major ways that SQL Server dramatically increases the efficiency of query execution. Creating an execution plan can be a time-consuming operation, so if most queries can use a cached plan the server will be able to process far more queries. Even if an execution plan for a query is in the plan cache, SQL Server may decide to recompile the query anyway and throw out the old plan. This process is called recompilation, and it occurs for two broad reasons, correctness and optimality.
- Correctness: The query processor might determine that an existing cached plan could return incorrect results. In that case, it recompiles the statement or batch. There are numerous reasons why a plan might return incorrect results, including schema changes; added, changed, or dropped indexes; statistics updates; use of WITH RECOMPILE; changed environment settings (usually SET statements); or an explicit call to the sp_recompile system stored procedure.
- Optimality: SQL Server tracks data distributions with statistics, including information about the number of rows in a table as well as changes to the data in individual columns. The query processor uses a threshold value to decide whether it should recompile a query. Essentially, if the data has changed too much, it could affect the choice of the optimal execution plan and so SQL Server recompiles it.
The benefit of recompilation, even though it is a hit against performance, is that a query with an execution plan that is incorrect or suboptimal is likely to perform far worse than with a correct, optimal plan. It could even return incorrect results in extreme cases. So the performance hit of recompilation is usually well worth the performance benefits.
NOTE: In SQL Server 2005 and later, individual statements can be recompiled instead of the entire batch, an enormous performance boost for long batches or complicated stored procedures and other code modules that can take some time to compile.
It is important to understand that the query processor is not looking for the absolute best execution plan possible. A complex query might have many thousands of execution plans that return correct results, and it might take hours to find the best plan that saves mere milliseconds off query execution time. One of the tasks that the optimizer does is to look for a trivial plan, which has a known and constant cost, involves no parameters, and only uses system functions. An example of a query with a trivial plan is:
SELECT * FROM dbo.Customers;
Because this query returns all rows and all columns from a relatively small table, no amount of analysis is likely to find one method of scanning the table that is substantially better, in terms of performance, than another. There just aren't that many ways to perform this query. If the optimizer finds a trivial plan, it returns that execution plan and does no further work.
This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer: LocalDB Databases course written by expert Don Kiely.