Components of the SQL Server Engine
Figure 1-1 shows the
general architecture of SQL Server, which has four major components Protocols, the Relational engine (also called the Query Processor), the storage engine, and the SQLOS. Every batch submitted to SQL
Server for execution, from any client application, must interact with these
four components.
Figure 1-1: The major components of the SQL Server database engine
The
protocol layer receives the request and translates it into a form that the
relational engine can work with, and it also takes the final results of any
queries, status messages, or error messages and translates them into a form
the client can understand before sending them back to the client. The
relational engine layer accepts SQL batches and determines what to do with
them. For Transact-SQL queries and programming constructs, it parses,
compiles, and optimizes the request and oversees the process of executing the
batch. As the batch is executed, if data is needed, a request for that data
is passed to the storage engine. The storage engine manages all data access,
both through transaction-based commands and bulk operations such as backup,
bulk insert, and certain DBCC (Database Consistency Checker) commands. The SQLOS
layer handles activities that are normally considered to be operating system
responsibilities, such as thread management (scheduling), synchronization
primitives, deadlock detection, and memory management, including the buffer
pool.
Protocols
When
an application communicates with the SQL Server Database Engine, the
application programming interfaces (APIs) exposed by the protocol layer
formats the communication using a Microsoft-defined format called a tabular data stream
(TDS) packet. There are Net-Libraries on both the server and client
computers that encapsulate the TDS packet inside a standard communication
protocol, such as TCP/IP or Named Pipes. On the server side of the
communication, the Net-Libraries are part of the Database Engine, and that protocol
layer is illustrated in Figure 1-1. On the client side, the Net-Libraries are
part of the SQL Native Client. The configuration of the client and the
instance of SQL Server determine which protocol is used.
SQL
Server can be configured to support multiple protocols simultaneously, coming
from different clients. Each client connects to SQL Server with a single
protocol. If the client program does not know which protocols SQL Server is
listening on, you can configure the client to attempt multiple protocols
sequentially. The following protocols are available:
· Shared
Memory
The simplest protocol to use, with no configurable settings. Clients using
the Shared Memory protocol can connect only to a SQL Server instance running
on the same computer, so this protocol is not useful for most database
activity. Use this protocol for troubleshooting when you suspect that the
other protocols are configured incorrectly. Clients using MDAC 2.8 or earlier
cannot use the Shared Memory protocol. If such a connection is attempted, the
client is switched to the Named Pipes protocol.
· Named
Pipes
A protocol developed for local area networks (LANs). A portion of memory is
used by one process to pass information to another process, so that the
output of one is the input of the other. The second process can be local (on
the same computer as the first) or remote (on a networked computer).
· TCP/IP The most widely
used protocol over the Internet. TCP/IP can communicate across interconnected
networks of computers with diverse hardware architectures and operating
systems. It includes standards for routing network traffic and offers
advanced security features. Enabling SQL Server to use TCP/IP requires the
most configuration effort, but most networked computers are already properly
configured.
· Virtual Interface
Adapter (VIA) A protocol that works with VIA hardware. This is a
specialized protocol; configuration details are available from your hardware
vendor.
Tabular
Data Stream Endpoints
SQL
Server 2005 also introduces a new concept for defining SQL Server
connections: the connection is represented on the server end by a TDS
endpoint. During setup, SQL Server creates an endpoint for each of the four
Net-Library protocols supported by SQL Server, and if the protocol is
enabled, all users have access to it. For disabled protocols, the endpoint
still exists but cannot be used. An additional endpoint is created for the
dedicated administrator connection (DAC), which can be used only by members
of the sysadmin fixed server role. (I’ll discuss the DAC in more detail in
configuration chapter.)
The
Relational Engine
As
mentioned earlier, the relational engine is also called the query processor.
It includes the components of SQL Server that determine exactly what your
query needs to do and the best way to do it. By far the most complex
component of the query processor, and maybe even of the entire SQL Server
product, is the query optimizer, which determines the best execution plan for
the queries in the batch.
The
relational engine also manages the execution of queries as it requests data
from the storage engine and processes the results returned. Communication
between the relational engine and the storage engine is generally in terms of
OLE DB row sets. (Row set is the OLE DB term for a result set.)
The storage engine comprises the components needed to actually access and
modify data on disk.
The Command Parser
The command parser
handles Transact-SQL language events sent to SQL Server. It checks for proper
syntax and translates Transact-SQL commands into an internal format that can
be operated on. This internal format is known as a query tree. If the
parser doesn’t recognize the syntax, a syntax error is immediately raised
that identifies where the error occurred. However, non-syntax error messages
cannot be explicit about the exact source line that caused the error. Because
only the command parser can access the source of the statement, the statement
is no longer available in source format when the command is actually
executed.
The Query Optimizer
The query optimizer takes the query tree
from the command parser and prepares it for execution. Statements that can’t
be optimized, such as flow-of-control and DDL commands, are compiled into an
internal form. The statements that are optimizable are marked as such and
then passed to the optimizer. The optimizer is mainly concerned with the DML
statement SELECT, INSERT, UPDATE, and DELETE, which can be processed
in more than one way, and it is the optimizer’s job to determine which of the
many possible ways is the best. It compiles an entire command batch,
optimizes queries that are optimizable, and checks security. The query
optimization and compilation result in an execution plan.
The
first step in producing such a plan is to normalize each query, which
potentially breaks down a single query into multiple, fine-grained queries.
After the optimizer normalizes a query, it optimizes it, which means
it determines a plan for executing that query. Query optimization is cost
based; the optimizer chooses the plan that it determines would cost the least
based on internal metrics that include estimated memory requirements, CPU
utilization, and number of required I/Os. The optimizer considers the type of
statement requested, checks the amount of data in the various tables
affected, looks at the indexes available for each table,
and then looks at a sampling of the data values kept for each index or column
referenced in the query. The sampling of the data values is called distribution
statistics. Based on the available information, the optimizer considers
the various access methods and processing strategies it could use to resolve
a query and chooses the most cost-effective plan.
The SQL Manager
The SQL manager is responsible for
everything related to managing stored procedures and their plans. It
determines when a stored procedure needs recompilation, and it manages the
caching of procedure plans so that other processes can reuse them.
The
SQL manager also handles auto parameterization of queries. In SQL Server
2008, certain kinds of ad hoc queries are treated as if they were
parameterized stored procedures, and query plans are generated and saved for
them. SQL Server can save and reuse plans in several other ways, but in some
situations using a saved plan might not be a good idea.
The Database
Manager
The database manager handles access to the
metadata needed for query compilation and optimization, making it clear that
none of these separate modules can be run completely separately from the
others. The metadata is stored as data and is managed by the storage engine,
but metadata elements such as the data types of columns and the available
indexes on a table must be available during the query compilation and
optimization phase, before actual query execution starts.
The Query Executor
The query executor runs the execution plan
that the optimizer produced, acting as a dispatcher for all the commands in
the execution plan. This module steps through each command of the execution
plan until the batch is complete. Most of the commands require interaction
with the storage engine to modify or retrieve data and to manage transactions
and locking.
The Storage Engine
The SQL Server storage engine has
traditionally been considered to include all the components involved with the
actual processing of data in your database. SQL Server 2005 separates out
some of these components into a module called the SQLOS. In fact, the SQL
Server storage engine team at Microsoft actually encompasses three areas:
access methods, transaction management, and the SQLOS.
Transaction
Services
A core feature of SQL Server is its ability
to ensure that transactions are atomic–that is, all or nothing. In
addition, transactions must be durable, which means that if a transaction has
been committed, it must be recoverable by SQL Server no matter what–even if a
total system failure occurs 1 millisecond after the commit was acknowledged.
There are actually four properties that transactions must adhere to, called
the ACID properties: atomicity, consistency, isolation, and durability.
Locking
Operations Locking is a crucial function of a multi-user database
system such as SQL Server, even if you are operating primarily in the
snapshot isolation level with optimistic concurrency. SQL Server lets you
manage multiple users simultaneously and ensures that the transactions
observe the properties of the chosen isolation level. Even though readers
will not block writers and writers will not block readers in snapshot
isolation, writers do acquire locks and can still block other writers, and if
two writers try to change the same data concurrently, a conflict will occur
that must be resolved. The locking code acquires and releases various types
of locks, such as share locks for reading, exclusive locks for writing,
intent locks taken at a higher granularity to signal a potential “plan” to
perform some operation, and extent locks for space allocation. It manages
compatibility between the lock types, resolves deadlocks, and escalates locks
if needed. The locking code controls table, page, and row locks as well as
system data locks.
The
SQLOS
Whether
the components of the SQLOS layer are actually part of the storage engine
depends on whom you ask. In addition, trying to figure out exactly which
components are in the SQLOS layer can be rather like herding cats. I have
seen several technical presentations on the topic at conferences and have
exchanged e-mail and even spoken face to face with members of the product
team, but the answers vary. The manager who said he was responsible for the
SQLOS layer defined the SQLOS as everything he was responsible for, which is
a rather circular definition. Earlier versions of SQL Server have a thin
layer of interfaces between the storage engine and the actual operating
system through which SQL Server makes calls to the OS for memory allocation,
scheduler resources, thread and worker management, and synchronization
objects. However, the services in SQL Server that needed to access these
interfaces can be in any part of the engine. SQL Server requirements for
managing memory, schedulers, synchronization objects, and so forth have
become more complex. Rather than each part of the engine growing to support
the increased functionality, all services in SQL Server that need this OS
access have been grouped together into a single functional unit called the
SQLOS. In general, the SQLOS is like an operating system inside SQL Server.
It provides memory management, scheduling, IO management, a framework for
locking and transaction management, deadlock detection, and general utilities
for dumping, exception handling, and so on.
Another
member of the product team described the SQLOS to me as a set of data
structures and APIs that could potentially be needed by operations running at
any layer of the engine. For example, consider various operations that
require use of memory. SQL Server doesn’t just need memory when it reads in
data pages through the storage engine; it also needs memory to hold query
plans developed in the query processor layer. Figure 1-1 (shown earlier)
depicts the SQLOS layer in several parts, but this is just a way of showing
that many SQL Server components use SQLOS functionality.
|
No comments:
Post a Comment