Change Data Capture, also known as CDC, introduced the first time in SQL Server 2008 version, as a helpful feature to track and capture the changes that are performed on the SQL Server database tables, with no additional programming efforts. Before SQL Server 2016, Change Data Capture could be enabled on a SQL Server database only under the SQL Server Enterprise edition, which is not required starting from SQL Server 2016.
Change Data Capture tracks the INSERT, UPDATE and
DELETE operations on the database table, and records detailed information about
these changes in a mirrored table, with the same columns structure of the
source tables, and additional columns to record the description of these
changes. SQL Server writes one record for each INSERT statement showing the
inserted values, on record for each DELETE statement showing the deleted data
and two records for each UPDATE statement, the first one showing the data before
the change and the second one showing the data after performing the change.
For example, I have a simple dbo_Persons table here and have two records, Sathish & Abhinav. Now let's say that somebody goes and changes,
you know, Sathish to Sathish Chandra, then I would have to go and maintain some
kind of audit trail table, do you know, where I can get the old values and the
new values.
Yes, CDC
helps us to capture, insert, update and delete activities on table data. Now
enabling CDC is a two-step process.
The first
thing is we need to go and enable CDC on the database level and then we need to
go and define specific tables on which we need to go and enable CDC.
To enable CDC, we need to do go and fire this stored
procedure here,
--Enable on
database level
use sqldbahub
exec sys.sp_cdc_enable_db
Msg 22830, Level
16, State 1, Procedure sys.sp_cdc_enable_db_internal, Line 274 [Batch Start
Line 21]
Could not update
the metadata that indicates database sqldbahub is enabled for Change Data
Capture. The failure occurred when executing the command 'SetCDCTracked(Value =
1)'. The error returned was 15404: 'Could not obtain information about Windows
NT group/user 'AzureAD\SathishChandraBhanum', error code 0x54b.'. Use the
action and error to determine the cause of the failure and resubmit the
request.
Completion time: 2024-05-08T14:26:13.1966680+05:30
If you face above error:
--If you face
error on enabled CDC
EXEC sp_changedbowner 'sa'
So once you fire this stored procedure, It enables CDC on
the database level.
Once you have enabled CDC, it creates a couple of system tables
like in the system tables folder as below.
[cdc].[captured_columns]
[cdc].[change_tables]
[cdc].[ddl_history]
[cdc].[index_columns]
[cdc].[lsn_time_mapping]
Also it created a many stored procedures. if you go down below, you can see this
underscore CDC stored procedure, all of these are created for CDC.
Now the next step is now that we need to enable CDC on
the table level.
We have this simple table here called a dbo.Persons.
We'll let's go ahead and enable CDC on the dbo.Persons.
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Persons',
@role_name = N'null',
@filegroup_name = N'PRIMARY'
@supports_net_changes = 1 –Use this, If the table had primary key
--if you face
error
The error
returned was 22836: ‘Could not update the metadata for database [database name]
to indicate that a Change Data Capture job has been added. The failure occurred
when executing the command ‘sp_add_jobstep_internal’.
SELECT srvname AS OldName FROM master.dbo.sysservers
SELECT SERVERPROPERTY('ServerName') AS NewName
exec sp_dropserver 'SQLDBAHUB';
GO
sp_addserver 'ST21PUNELT1685', local;
GO
Once enabled
CDC on the table level, it will create one more table in the system tables
folder like [cdc].[dbo_Persons_CT]
(always table name underscore with CT. )
Underscore
CT table, which actually captures insert, update and delete activities for a
particular table which we enabled CDC.
For this
example, [cdc].[dbo_Persons_CT]
table will capture all
the activities of insert, update and delete for the sales table.
Now CDC has
been enabled on the [dbo_Persons] table, table data look like below.
PersonID |
LastName |
FirstName |
Address |
City |
1 |
M |
Sathish |
Ameerpet |
HYD |
2 |
A |
Abhinav |
LBNagar |
Hyderabad |
Now if I will do some modification on the [dbo_Persons] table data, like change FirstName Sathish to Sathish Chandra.
PersonID |
LastName |
FirstName |
Address |
City |
1 |
M |
Sathish Chandra |
Ameerpet |
HYD |
2 |
A |
Abhinav |
LBNagar |
Hyderabad |
Now if I go and open [cdc].[dbo_Persons_CT] table data looks like below:
__$start_lsn |
__$end_lsn |
__$seqval |
__$operation |
__$update _mask |
PersonID |
LastName |
FirstName |
Address |
City |
__$command_id |
0x000000310000BC580003 |
NULL |
0x000000310000BC580002 |
3 |
0x04 |
1 |
M |
Sathish |
Ameerpet |
HYD |
1 |
0x000000310000BC580003 |
NULL |
0x000000310000BC580002 |
4 |
0x04 |
1 |
M |
Sathish
Chandra |
Ameerpet |
HYD |
1 |
You can see
that there are lots of fields here, just
look at the last two fields. So here you can see that, it's showing that, the
old value was sathish and then later it was changed to Sathish Chandra and the remaining
fields are unchanged.
in the same way, if I go and delete Abhinav, then will get one entry here saying that Abhinav was deleted, this one stands for delete.
__$start_lsn |
__$end_lsn |
__$seqval |
__$operation |
__$update_mask |
PersonID |
LastName |
FirstName |
Address |
City |
__$command_id |
0x000000310000BC580003 |
NULL |
0x000000310000BC580002 |
3 |
0x04 |
1 |
M |
Sathish |
Ameerpet |
HYD |
1 |
0x000000310000BC580003 |
NULL |
0x000000310000BC580002 |
4 |
0x04 |
1 |
M |
Sathish
Chandra |
Ameerpet |
HYD |
1 |
0x000000310000C1A00003 |
NULL |
0x000000310000C1A00002 |
1 |
0x1F |
2 |
A |
Abhinav |
LBNagar |
Hyderabad |
1 |
in the same way, if I go and insert new record Bhanu,
Now we can monitor insert, update and delete activities on a table dbo_Persons_CT.
Now there is
one more important field in this table here called as the operation field.
The additional columns include
- __$start_lsn and __$end_lsn that show the commit log sequence
number (LSN) assigned by the SQL Server Engine to the recorded change
- __$seqval that shows the order of that change
related to other changes in the same transaction,
- __$operation that
shows the operation type of the change, where 1 = delete, 2 = insert,
3 = update (before change), and 4 = update (after change)
- __$update_mask that
is a bit mask defined for each captured column, identifying the updating
columns
This
operation field actually helps you to track that the kind of activity has
happened and when this transaction occurred.
In other
words, you can track which kind of the operation like insert, delete or update
performed.
Now there
are a couple of more tables here and probably these tables can be interesting.
[cdc].[captured_columns]
[cdc].[change_tables]
[cdc].[ddl_history]
[cdc].[index_columns]
[cdc].[lsn_time_mapping]
The first
table, you know, which is very interesting is the captured_columns table. In
this table, basically it just keeps the record of what kind of columns it is tracking.
you can see that currently we have
enabled CDC on the dbo_Persons table. So we are tracking the dbo_Persons.
Second, it
has one more table here called as the change_tables table,
here it tracks on basically which tables has been enabled for CDC.
The most
important table is the dbo_Persons_CT table where you will get the old
values and new values with the operation field saying that what kind of
activities was conducted on the record.
Limitations
Change Data Capture can be easily used to
audit only the database DML changes, albeit it with no
option to monitor SELECT statement, with the negligible configuration effort.
On the other hand, to consider CDC as a SQL Server Audit solution, it requires
significant maintenance and administration effort. This includes automating an
archiving mechanism, as the tracking data will be kept in the change table for
a configurable number of days and will be stored in the same or different data
file, that should be also monitored and maintained.
In addition, the change tables will be stored under
each database, and a function will be created for each tracked table. This
makes it cumbersome and requires significant programming effort to create a
consolidated auditing report that reads the DML changes information from all
tables under the same database, from all databases under the same instance, or
cross multiple instance.
Another limitation for CDC feature as a SQL Server
Audit solution is difficult process that is required to handle the DDL changes
on CDC enabled table, as having the Change Data Capture enabled on the source
table will not prevent performing DDL changes on that table.
Also, if the SQL Server Agent service is not running, CDC capture job will not work, and the database log file will grow rapidly, even if the database recovery model is Simple, as the log truncation will not advance, even if a CHECKPOINT is performed, till all the changes that are waiting for capture will be gathered by CDC capture process.
No comments:
Post a Comment