Components of the SQL Server Engine
Figure 1-1 shows the
general architecture of SQL Server, which has four major components (three of
whose subcomponents are listed): protocols, the relational engine (also
called the Query Processor), the storage engine, and the SQLOS. Every batch
submitted to SQL Server for execution, from any client application, must
interact with these four components. (For simplicity, I’ve made some minor
omissions and simplifications and ignored certain “helper” modules among the
subcomponents.)
Figure 1-1: The major components of the SQL Server database engine
The protocol layer receives the request and translates it into a form that the
relational engine can work with, and it also takes the final results of any
queries, status messages, or error messages and translates them into a form
the client can understand before sending them back to the client. The
relational engine layer accepts SQL batches and determines what to do with
them. For Transact-SQL queries and programming constructs, it parses,
compiles, and optimizes the request and oversees the process of executing the
batch. As the batch is executed, if data is needed, a request for that data
is passed to the storage engine. The storage engine manages all data access,
both through transaction-based commands and bulk operations such as backup,
bulk insert, and certain DBCC (Database Consistency Checker) commands. The
SQLOS layer handles activities that are normally considered to be operating
system responsibilities, such as thread management (scheduling),
synchronization primitives, deadlock detection, and memory management,
including the buffer pool.
Protocols
When an application communicates with the SQL Server Database Engine, the application programming interfaces (APIs) exposed by the protocol layer format the communication using a Microsoft-defined format called a tabular data stream (TDS) packet. There are Net-Libraries on both the server and client
computers that encapsulate the TDS packet inside a standard communication
protocol, such as TCP/IP or Named Pipes. On the server side of the
communication, the Net-Libraries are part of the Database Engine, and that
protocol layer is illustrated in Figure 1-1. On the client side, the
Net-Libraries are part of the SQL Native Client. The configuration of the
client and the instance of SQL Server determine which protocol is used.
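If you want to verify which protocol (and TDS packet size) a given connection ended up with, a quick check, assuming SQL Server 2005 or later where the dynamic management views exist, is to query sys.dm_exec_connections:

-- net_transport reports the protocol carrying each current connection
-- (Shared memory, Named pipe, TCP, or VIA); net_packet_size is the TDS packet size.
SELECT session_id, net_transport, net_packet_size, auth_scheme
FROM sys.dm_exec_connections;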
SQL Server can be configured to support multiple protocols simultaneously, coming from different clients. Each client connects to SQL Server with a single protocol. If the client program does not know which protocols SQL Server is listening on, you can configure the client to attempt multiple protocols sequentially. The following protocols are available (a brief client-side example follows the list):
· Shared Memory  The simplest protocol to use, with no configurable settings. Clients using the Shared Memory protocol can connect only to a SQL Server instance running on the same computer, so this protocol is not useful for most database activity. Use this protocol for troubleshooting when you suspect that the other protocols are configured incorrectly. Clients using MDAC 2.8 or earlier cannot use the Shared Memory protocol. If such a connection is attempted, the client is switched to the Named Pipes protocol.
· Named Pipes  A protocol developed for local area networks (LANs). A portion of memory is used by one process to pass information to another process, so that the output of one is the input of the other. The second process can be local (on the same computer as the first) or remote (on a networked computer).
· TCP/IP The most widely
used protocol over the Internet. TCP/IP can communicate across interconnected
networks of computers with diverse hardware architectures and operating
systems. It includes standards for routing network traffic and offers
advanced security features. Enabling SQL Server to use TCP/IP requires the
most configuration effort, but most networked computers are already properly
configured.
· Virtual Interface
Adapter (VIA) A protocol that works with VIA hardware. This is a
specialized protocol; configuration details are available from your hardware
vendor.
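As a rough illustration of client-side protocol selection (the server name and port below are placeholders), a client can force a specific protocol by prefixing the server name in its connection string or in sqlcmd's -S argument:

tcp:MYSERVER,1433                forces TCP/IP on port 1433
np:\\MYSERVER\pipe\sql\query     forces Named Pipes
lpc:MYSERVER                     forces Shared Memory (same computer only)

With no prefix, the client Net-Libraries try the protocols in the order configured in SQL Server Configuration Manager.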
Tabular Data Stream Endpoints
SQL Server 2005 also introduced a new concept for defining SQL Server
connections: the connection is represented on the server end by a TDS
endpoint. During setup, SQL Server creates an endpoint for each of the four
Net-Library protocols supported by SQL Server, and if the protocol is
enabled, all users have access to it. For disabled protocols, the endpoint
still exists but cannot be used. An additional endpoint is created for the
dedicated administrator connection (DAC), which can be used only by members
of the sysadmin fixed server role. (I'll discuss the DAC in more detail in the configuration chapter.)
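A hedged sketch of what this looks like on the server: sys.endpoints lists the TDS endpoints created during setup, including the one reserved for the DAC (which you reach by connecting with sqlcmd -A or by prefixing the server name with ADMIN:).

-- The system TDS endpoints; one per protocol plus the dedicated admin connection.
SELECT name, protocol_desc, type_desc, state_desc
FROM sys.endpoints
WHERE type_desc = 'TSQL';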
The Relational Engine
As mentioned earlier, the relational engine is also called the query processor.
It includes the components of SQL Server that determine exactly what your
query needs to do and the best way to do it. By far the most complex
component of the query processor, and maybe even of the entire SQL Server product,
is the query optimizer, which determines the best execution plan for the
queries in the batch.
The relational engine also manages the execution of queries as it requests data
from the storage engine and processes the results returned. Communication
between the relational engine and the storage engine is generally in terms of
OLE DB row sets. (Row set is the OLE DB term for a result set.)
The storage engine comprises the components needed to actually access and
modify data on disk.
The Command Parser
The command parser handles Transact-SQL language events sent to SQL Server. It checks for proper syntax and translates Transact-SQL commands into an internal format that can be operated on. This internal format is known as a query tree. If the parser doesn't recognize the syntax, a syntax error is immediately raised that identifies where the error occurred. However, non-syntax error messages cannot be explicit about the exact source line that caused the error, because only the command parser can access the source of the statement; the statement is no longer available in source format when the command is actually executed.
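To make the distinction concrete, here is a small illustration (the second table name is made up): a misspelled keyword is rejected by the command parser immediately, while a statement that is syntactically valid but refers to a nonexistent object parses cleanly and fails only later, when the name cannot be resolved.

-- Rejected by the command parser: the batch never compiles or runs.
SELECT * FRUM sys.objects;       -- syntax error reported at parse time

-- Parses cleanly, so the error surfaces later during name resolution;
-- dbo.NoSuchTable is a hypothetical name.
SELECT * FROM dbo.NoSuchTable;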
The Query Optimizer
The query optimizer takes the query tree from the command parser and prepares it for execution. Statements that can't be optimized, such as flow-of-control and DDL commands, are compiled into an internal form. The statements that are optimizable are marked as such and then passed to the optimizer. The optimizer is mainly concerned with the DML statements SELECT, INSERT, UPDATE, and DELETE, which can be processed in more than one way, and it is the optimizer's job to determine which of the many possible ways is the best. It compiles an entire command batch, optimizes queries that are optimizable, and checks security. Query optimization and compilation result in an execution plan.
The first step in producing such a plan is to normalize each query, which
potentially breaks down a single query into multiple, fine-grained queries.
After the optimizer normalizes a query, it optimizes it, which means
it determines a plan for executing that query. Query optimization is cost
based; the optimizer chooses the plan that it determines would cost the least
based on internal metrics that include estimated memory requirements, CPU
utilization, and number of required I/Os. The optimizer considers the type of
statement requested, checks the amount of data in the various tables
affected, looks at the indexes available for each table,
and then looks at a sampling of the data values kept for each index or column
referenced in the query. The sampling of the data values is called distribution
statistics. Based on the available information, the optimizer considers
the various access methods and processing strategies it could use to resolve
a query and chooses the most cost-effective plan.
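If you want to look at the raw material the optimizer works from, the following sketch (the table, index, and column names are hypothetical) displays the distribution statistics for one index and then the plan the optimizer chose, without actually executing the query:

-- Histogram and density information the optimizer consults for a hypothetical index.
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID');
GO

-- Return the chosen execution plan as XML instead of running the query.
SET SHOWPLAN_XML ON;
GO
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42;
GO
SET SHOWPLAN_XML OFF;
GO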
The SQL Manager
The SQL manager is responsible for everything related to managing stored
procedures and their plans. It determines when a stored procedure needs
recompilation, and it manages the caching of procedure plans so that other
processes can reuse them.
The SQL manager also handles auto parameterization of queries. In SQL Server
2008, certain kinds of ad hoc queries are treated as if they were
parameterized stored procedures, and query plans are generated and saved for
them. SQL Server can save and reuse plans in several other ways, but in some
situations using a saved plan might not be a good idea.
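One way to watch plan caching and auto parameterization at work (assuming SQL Server 2005 or later) is to look at the plan cache through the execution-related DMVs; auto-parameterized ad hoc statements show up as Prepared plans whose text contains placeholders such as @1:

-- objtype distinguishes Proc, Prepared (including auto-parameterized), and Adhoc plans;
-- usecounts shows how often each cached plan has been reused.
SELECT cp.objtype, cp.usecounts, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;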
The Database Manager
The database manager handles access to the metadata needed for query compilation and optimization, which makes it clear that none of these modules can run completely separately from the others. The metadata is stored as data and
is managed by the storage engine, but metadata elements such as the data
types of columns and the available indexes on a table must be available
during the query compilation and optimization phase, before actual query
execution starts.
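That same metadata is visible through the catalog views; as a small example (the table name is a placeholder), here are the column data types and the indexes the compiler would need to know about for one table:

-- Column names and data types for a hypothetical table.
SELECT c.name AS column_name, t.name AS data_type, c.max_length
FROM sys.columns AS c
JOIN sys.types AS t ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.Orders');

-- Indexes available on the same table.
SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.Orders');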
The Query Executor
The query executor runs the execution plan that the optimizer produced, acting as
a dispatcher for all the commands in the execution plan. This module steps
through each command of the execution plan until the batch is complete. Most
of the commands require interaction with the storage engine to modify or
retrieve data and to manage transactions and locking.
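To watch the executor work through a plan operator by operator, one option (the query below is just a placeholder) is SET STATISTICS PROFILE, which returns one row per plan operator, with actual row counts, after the query runs:

SET STATISTICS PROFILE ON;
GO
-- Placeholder query; any SELECT will do.
SELECT TOP (10) OrderID, OrderDate
FROM dbo.Orders
ORDER BY OrderDate DESC;
GO
SET STATISTICS PROFILE OFF;
GO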
The Storage Engine
The SQL Server storage engine has traditionally been considered to include all
the components involved with the actual processing of data in your database.
SQL Server 2005 separates out some of these components into a module called
the SQLOS. In fact, the SQL Server storage engine team at Microsoft actually
encompasses three areas: access methods, transaction management, and the
SQLOS.
Transaction Services
A core feature of SQL Server is its ability to ensure that transactions are atomic; that is, all or nothing. In addition, transactions must be durable, which means that if a transaction has been committed, it must be recoverable by SQL Server no matter what, even if a total system failure occurs 1 millisecond after the commit was acknowledged. There are actually four properties that transactions must adhere to, called the ACID properties: atomicity, consistency, isolation, and durability.
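As a minimal sketch of atomicity (the Accounts table and its columns are hypothetical), either both updates below take effect or, if anything fails, neither does:

BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
    COMMIT TRANSACTION;    -- once the commit is acknowledged, durability guarantees recovery
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;  -- atomicity: neither update remains
END CATCH;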
Locking Operations
Locking is a crucial function of a multi-user
database system such as SQL Server, even if you are operating primarily in
the snapshot isolation level with optimistic concurrency. SQL Server lets you
manage multiple users simultaneously and ensures that the transactions
observe the properties of the chosen isolation level. Even though readers
will not block writers and writers will not block readers in snapshot
isolation, writers do acquire locks and can still block other writers, and if
two writers try to change the same data concurrently, a conflict will occur
that must be resolved. The locking code acquires and releases various types
of locks, such as share locks for reading, exclusive locks for writing,
intent locks taken at a higher granularity to signal a potential “plan” to
perform some operation, and extent locks for space allocation. It manages
compatibility between the lock types, resolves deadlocks, and escalates locks
if needed. The locking code controls table, page, and row locks as well as
system data locks.
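The locks described above are visible while they are held; for example (assuming SQL Server 2005 or later), sys.dm_tran_locks lists every lock currently granted or waited on, along with its granularity and mode:

SELECT resource_type,       -- OBJECT, PAGE, RID, KEY, EXTENT, DATABASE, ...
       request_mode,        -- S (share), X (exclusive), IS/IX/SIX (intent), ...
       request_status,      -- GRANT, WAIT, or CONVERT
       request_session_id
FROM sys.dm_tran_locks
ORDER BY request_session_id;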
The SQLOS
Whether the components of the SQLOS layer are actually part of the storage engine
depends on whom you ask. In addition, trying to figure out exactly which
components are in the SQLOS layer can be rather like herding cats. I have
seen several technical presentations on the topic at conferences and have
exchanged e-mail and even spoken face to face with members of the product
team, but the answers vary. The manager who said he was responsible for the
SQLOS layer defined the SQLOS as everything he was responsible for, which is
a rather circular definition. Earlier versions of SQL Server have a thin
layer of interfaces between the storage engine and the actual operating
system through which SQL Server makes calls to the OS for memory allocation,
scheduler resources, thread and worker management, and synchronization objects.
However, the services in SQL Server that needed to access these interfaces
can be in any part of the engine. SQL Server requirements for managing
memory, schedulers, synchronization objects, and so forth have become more
complex. Rather than each part of the engine growing to support the increased
functionality, all services in SQL Server that need this OS access have been
grouped together into a single functional unit called the SQLOS. In general,
the SQLOS is like an operating system inside SQL Server. It provides memory
management, scheduling, IO management, a framework for locking and
transaction management, deadlock detection, and general utilities for
dumping, exception handling, and so on.
Another member of the product team described the SQLOS to me as a set of data
structures and APIs that could potentially be needed by operations running at
any layer of the engine. For example, consider various operations that
require use of memory. SQL Server doesn’t just need memory when it reads in
data pages through the storage engine; it also needs memory to hold query
plans developed in the query processor layer. Figure 1-1 (shown earlier)
depicts the SQLOS layer in several parts, but this is just a way of showing
that many SQL Server components use SQLOS functionality.
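Much of what the SQLOS manages can be observed through the sys.dm_os_* dynamic management views. The queries below are a sketch; the memory clerk column names are the ones used in SQL Server 2005 and 2008.

-- Schedulers: one per CPU that SQL Server schedules work on (ids below 255 are the visible ones).
SELECT scheduler_id, cpu_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;

-- Memory clerks: the largest consumers of SQLOS-managed memory.
SELECT TOP (10) type,
       SUM(single_pages_kb + multi_pages_kb) AS kb_allocated
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY kb_allocated DESC;

-- Waits: where sessions have spent time waiting on synchronization objects and I/O.
SELECT TOP (10) wait_type, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;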