Clustering
What is Clustering?
A cluster is a group of
independent computer systems, referred to as nodes, working together as a unified computing resource. A cluster
provides a single name for clients to use and a single administrative
interface, and it guarantees that data is consistent across nodes.
Microsoft provides three server technologies that support clustering:
§ Network Load Balancing (NLB)
§ Component Load Balancing (CLB)
§ Microsoft Cluster Service (MSCS) Failover Cluster
Network Load Balancing
Network Load Balancing acts as a front-end cluster, distributing incoming IP traffic across a cluster of servers, and is ideal for enabling incremental scalability and outstanding availability for e-commerce Web sites. Up to 32 computers can share a single virtual IP address. NLB enhances scalability by distributing client requests across multiple servers within the cluster; as traffic increases, additional servers can be added, up to a maximum of 32 servers per cluster. NLB also provides high availability by automatically detecting the failure of a server and repartitioning client traffic among the remaining servers within 10 seconds, while providing users with continuous service.
Component Load Balancing
Component
Load Balancing distributes workload across multiple servers running a site's
business logic. It provides for
dynamic balancing of COM+ components across a set of up to eight identical
servers. Both CLB and Microsoft Cluster Service can run on the same group of
machines.
Failover Clustering
Cluster Service acts as a back-end cluster; it provides high availability for applications such as databases, messaging, and file and print services. MSCS attempts to minimize the effect of failure on the system when any node (a server in the cluster) fails or is taken offline.
Figure 1. Three Microsoft server technologies support clustering
MSCS failover capability is achieved through redundancy across the multiple connected machines in the cluster, each with independent failure states. The maximum number of nodes per failover cluster depends on the Windows Server version:
Windows 2003 - 8 Nodes
Windows 2008/2008R2 - 16 Nodes
Windows 2012/2012R2 - 64 Nodes
Each node has its own memory, system disk, operating system and subset of the cluster's resources. If a node fails, another node takes ownership of the failed node's resources (this process is known as "failover"). Microsoft Cluster Service then registers the network address for the resource on the new node so that client traffic is routed to the system that is available and now owns the resource. When the failed node is later brought back online, MSCS can be configured to redistribute resources and client requests appropriately (this process is known as "failback").
Microsoft Cluster Service is based on the shared-nothing clustering model. The shared-nothing model dictates that while several nodes in the cluster may have access to a device or resource, the resource is owned and managed by only one system at a time.
Microsoft Cluster Service comprises three key components: the Cluster Service, the Resource Monitor and resource DLLs.
The Cluster Service
The Cluster Service is the core component and
runs as a high-priority system service. The Cluster Service controls cluster
activities and performs such tasks as coordinating
event notification, facilitating communication between cluster components,
handling failover operations and managing the configuration. Each cluster
node runs its own Cluster Service.
The Resource Monitor
The Resource Monitor is an interface between the Cluster Service and the cluster resources,
and runs as an independent process. The Cluster Service uses the Resource
Monitor to communicate with the resource DLLs. The DLL handles all
communication with the resource, so hosting the DLL in a Resource Monitor
shields the Cluster Service from resources that misbehave or stop functioning.
Multiple copies of the Resource Monitor can be running on a single node,
thereby providing a means by which unpredictable resources can be isolated from
other resources.
The Resource DLL
The third key Microsoft Cluster Service
component is the resource DLL. The Resource Monitor and resource DLL
communicate using the Resource API, which is a collection of entry points,
callback functions and related structures and macros used to manage resources.
What is a Quorum?
Simply put, a quorum is the cluster's configuration database. The database resides in a file named \MSCS\quolog.log, and the quorum is sometimes also referred to as the quorum log.
Although the
quorum is just a configuration database, it has two very important jobs. First
of all, it tells the cluster which node
should be active. Think about it for a minute. In order for a cluster to
work, all of the nodes have to function in a way that allows the virtual server
to function in the desired manner. In order for this to happen, each node must
have a crystal clear understanding of its role within the cluster. This is where the quorum comes into play.
The quorum tells the cluster which node is currently active and which node or
nodes are in standby.
It is
extremely important for nodes to conform to the status defined by the quorum.
It is so important in fact, that Microsoft has designed the clustering service
so that if a node cannot read the
quorum, that node will not be brought online as a part of the cluster.
The other
thing that the quorum does is to intervene when communications fail between
nodes. Normally, each node within a cluster can communicate with every other
node in the cluster over a dedicated network connection. If this network connection
were to fail though, the cluster would be split into two pieces, each
containing one or more functional nodes that cannot communicate with the nodes
that exist on the other side of the communications failure.
When this
type of communications failure occurs, the cluster is said to have been
partitioned. The problem is that both partitions have the same goal: to keep
the application running. The application can’t be run on multiple servers
simultaneously though, so there must be a way of determining which partition
gets to run the application. This is where the quorum comes in. The partition
that “owns” the quorum is allowed to continue running the application. The
other partition is removed from the cluster.
Types of Quorums
Standard quorum
Majority Node Set Quorum (MNS)
So far in this article, I have been describing a quorum type known as a standard quorum. The main idea behind a standard quorum is that it is a configuration database for the cluster, stored on a shared hard disk that is accessible to all of the cluster's nodes.
In Windows Server 2003, Microsoft introduced a new type of quorum called the Majority Node Set (MNS) quorum. The thing that really sets an MNS quorum apart from a standard quorum is that each node has its own, locally stored copy of the quorum database.
Types:
1) Quorum Disk
2) Local Only Quorum
3) MNS (Majority Node Set)
Windows 2008/2008 R2/2012 support additional quorum types, which are described later in this document.
Cluster Aware Applications:
· SQL Server Database Services
· SQL Server Analysis Services
Cluster Unaware Applications:
· SQL Server Reporting Services
· Integration Services
· Notification Services
How Clustering Works
In a two-node Active/Active setup, if one of the nodes fails, the other active node takes over the active resources of the failed instance. When creating a two-node cluster, it is always preferred that each node be connected to a shared disk array using either Fibre Channel or SCSI cables.
The shared data in the cluster must be stored on shared disks; otherwise, when a failover occurs, the node taking over in the cluster cannot access it. As we are already aware, clustering does not protect the data or the shared disk array it is stored on, so it is very important to select a shared disk array that is very reliable and includes fault tolerance.
Both nodes of the cluster are also connected to each other via a private network. This private network is used by each node to keep track of the status of the other node. For example, if one of the nodes experiences a hardware failure, the other node will detect this and automatically initiate a failover.
When clients initiate a connection, how will they know what to do when a failover occurs? This is the most intelligent part of Microsoft Cluster Services. When a user establishes a connection with SQL Server, it is through SQL Server's own virtual name and virtual TCP/IP address. This name and address are shared by both of the servers in the cluster; in other words, both nodes can be defined as preferred owners of this virtual name and TCP/IP address.
Usually, a client will connect to the SQL Server cluster using the cluster's virtual name, and as far as the client is concerned, there is only one physical SQL Server, not two. Assuming that node X of the SQL Server cluster is the node running SQL Server 'A' in an Active/Active cluster design, node X will respond to the client's requests. But if node X fails and failover to the next node, Y, occurs, the cluster will still retain the same SQL Server virtual name and TCP/IP address 'A', although a new physical server will now be responding to clients' requests.
During the failover period, which can last up to several minutes, clients will be unable to access SQL Server, so there is a small amount of downtime when failover occurs. The exact amount of time depends on the number and sizes of the databases on SQL Server, and how active they are.
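A quick way to see this from a query window is to compare the virtual name with the physical node currently serving the connection. This is a minimal T-SQL sketch using standard SERVERPROPERTY values (nothing here assumes particular server or instance names); run it before and after a failover and the virtual name stays the same while the physical node name changes.
-- Connect using the SQL Server virtual name, then run:
SELECT
    @@SERVERNAME                                  AS VirtualSqlServerName,  -- virtual name clients connect to
    SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentPhysicalNode,   -- node currently hosting the instance
    SERVERPROPERTY('IsClustered')                 AS IsClustered;           -- 1 = clustered instance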
Clustering Terms
Cluster Nodes
A cluster node is a server within a cluster group. A cluster node can be active or passive with respect to a given SQL Server instance installation.
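On a clustered instance, the nodes can also be listed from T-SQL. A small sketch using the sys.dm_os_cluster_nodes DMV (available from SQL Server 2005 onwards; it returns no rows on a non-clustered instance):
-- Lists the Windows cluster nodes that can host this SQL Server instance
SELECT NodeName
FROM sys.dm_os_cluster_nodes;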
Heartbeat
The heartbeat is a check-up mechanism between the nodes, carried out over the private network, to verify that each node is up and running. It occurs at regular intervals known as time slices. If the heartbeat stops functioning, a failover is initiated and another node in the cluster takes over the active resources.
Private Network
The Private Network is available among cluster nodes only. Every node has a Private Network IP address, which can be pinged from one node to another. This is used to check the heartbeat between the nodes.
Public Network
The Public Network is available for external connections. Every node has a Public Network IP address, which can be connected to from any client within the network.
Shared Cluster Disk Array
A shared disk array is a collection of storage disks accessed by the cluster; this could be a SAN or SCSI RAID. Windows Clustering uses shared-nothing disk arrays: only one node can own a disk resource at any given time, and no other node may access it until it takes ownership of the resource (ownership changes during failover). This protects the data from being overwritten when two computers have access to the same drives concurrently.
Quorum Drive
This is a logical drive assigned on the shared disk array specifically for Windows Clustering. The clustering service constantly writes information about the state of the cluster to this drive. Corruption or failure of this drive can bring down the entire cluster setup.
Cluster Name
This name refers to Virtual Cluster Name, not the physical node
names or the Virtual SQL Server names. It is assigned to the cluster as a
whole.
Cluster IP Address
This IP address refers to the address that all external connections use to reach the active cluster node.
Cluster Administrator Account
This account must be configured at the domain level, with
administrator privileges on all nodes within the cluster group. This account is
used to administer the failover cluster.
Cluster Resource Types
This includes any services, software, or hardware that can be
configured within a cluster. Ex: DHCP, File Share, Generic Application, Generic
Service, Internet Protocol, Network Name, Physical Disk, Print Spooler, and
WINS.
Cluster Group
Conceptually, a cluster group is a collection of logically grouped
cluster resources. It may contain cluster-aware application services, such as
SQL Server 2000.
SQL Server Network Name (Virtual Name)
This is the SQL Server Instance name that all client applications
will use to connect to the SQL Server.
SQL Server IP Address (Virtual IP Address)
This refers to the TCP/IP address that all client applications will
use to connect to SQL Server; the Virtual Server IP address.
SQL Server 2000 Full-text
Each SQL Virtual
Server has one full-text resource.
Microsoft Distributed Transaction Coordinator (MS DTC)
Certain SQL Server components require MS DTC to be up and running. MS DTC is shared by all named and default instances in the cluster group.
SQL Server Virtual Server Administrator Account
This is the SQL Server service account, and it must follow all the
rules that apply to SQL Service user accounts in a non-clustered environment.
How to Cluster
Windows Server 2003
Before Installing Windows 2003 Clustering
Before you install Windows 2003 clustering, you need to perform a series of important preparation steps. This is especially important if you didn't build the cluster nodes, as you want to ensure everything is working correctly before you begin the actual cluster installation. Once these steps are complete, you can install Windows 2003 clustering. Here are the steps you must take:
- Double
check to ensure that all the nodes are working properly and are configured
identically (hardware, software, drivers, etc.).
- Check to
see that each node can see the data and Quorum drives on the shared array
or SAN. Remember, only one node can be on at a time until Windows 2003
clustering is installed.
- Verify that
none of the nodes has been configured as a Domain Controller.
- Check to
verify that all drives are NTFS and are not compressed.
- Ensure that
the public and private networks are properly installed and configured.
- Ping each
node in the public and private networks to ensure that you have good
network connections. Also ping the Domain Controller and DNS server to
verify that they are available.
- Verify that
you have disabled NetBIOS for all private network cards.
- Verify that
there are no network shares on any of the shared drives.
- If you
intend to use SQL Server encryption, install the server certificate with
the fully qualified DNS name of the virtual server on all nodes in the
cluster.
- Check all
of the error logs to ensure there are no nasty surprises. If there are,
resolve them before proceeding with the cluster installation.
- Add the SQL
Server and Clustering service accounts to the Local Administrators group
of all the nodes in the cluster.
- Check to
verify that no antivirus software has been installed on the nodes.
Antivirus software can reduce the availability of clusters and must not be
installed on them. If you want to check for possible viruses on a cluster,
you can always install the software on a non-node and then run scans on
the cluster nodes remotely.
- Check to
verify that the Windows Cryptographic Service Provider is enabled on each
of the nodes.
- Check to
verify that the Windows Task Scheduler service is running on each of the
nodes.
- If you
intend to run SQL Server 2005 Reporting Services, you must then install
IIS 6.0 and ASP .NET 2.0 on each node of the cluster.
That is a lot to check, but each of these steps is important. If skipped, any one of them could prevent your cluster from installing or working properly.
How to Install Windows Server 2003 Clustering
Now that all of your physical nodes and your shared array or SAN are ready, you can install Windows 2003 clustering. In this section, we take a look at the process from beginning to end.
To begin, you must start the Microsoft Windows 2003 Clustering Wizard from one of the nodes. While it doesn't make any difference to the software which physical node is used to begin the installation, I generally select the physical node that will be my primary (active) node and start working there. This way, I won't potentially get confused when installing the software.
If you are using a SCSI shared array, and for many SAN shared
arrays, you will want to make sure that the second physical node of your
cluster is turned off when you install cluster services on the first physical
node. This is because Windows 2003 doesn't know how to deal with a shared disk
until cluster services is installed. Once you have installed cluster services
on the first physical node, you can turn on the second physical node, boot it,
and then proceed with installing cluster services on the second node.
New in Windows Server 2003
These are some of the improvements Windows Server 2003 has made in clustering:
· Larger clusters: The Enterprise Edition now supports up to 8-node clusters; previous editions only supported 2-node clusters. The Datacenter Edition supports 8-node clusters as well, whereas in Windows 2000 it supported only 4-node clusters.
· 64-bit support: This feature allows clustering to take advantage of the 64-bit version of Windows Server 2003, which is especially important for optimizing SQL Server 2000 Enterprise Edition.
· High availability: With this update to the clustering service, the Terminal Server directory service can now be configured for failover.
· Cluster Installation Wizard: A completely redesigned wizard allows you to join and add nodes to the cluster. It also provides additional troubleshooting by allowing you to view logs and details if things go wrong. It can save you some trips to the Add/Remove Programs applet.
· MSDTC configuration: You can now configure MSDTC once and it is replicated to all nodes. You no longer have to run the comclust.exe utility on each node.
1) Installing a SQL Server Service Pack on a cluster (SQL Server 2005 and 2012)
When applying a Service Pack on a cluster, follow a rolling upgrade. In SQL Server 2005, the entire instance faces downtime when a patch is applied, because the resource database and all binaries are patched at the same time and there is only one resource database, located on the shared disk.
From SQL Server 2008 onwards, each node has its own resource database, so patching can be split between the nodes. We first patch the passive node and restart it (during this time the business runs from the active node), then perform a failover and patch the previously active node (the business now runs from the new active node).
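During a rolling upgrade it helps to confirm the build level and the node currently serving the instance after each failover. A minimal sketch using standard SERVERPROPERTY values (no assumptions beyond connecting to the clustered instance's virtual name):
-- Run before and after each failover in the rolling upgrade
SELECT
    SERVERPROPERTY('ProductVersion')              AS BuildNumber,       -- confirms the patched build
    SERVERPROPERTY('ProductLevel')                AS ServicePackLevel,  -- RTM / SPn
    SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentNode;       -- node now serving the instance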
2) Configuring backups on a cluster
Backups on a cluster should be taken to a dedicated SAN shared clustered disk that is part of the clustered group. SQL backups on a cluster cannot go to a local drive, so a clustered disk is the right configuration setting, considering the risk of a failover.
Jobs can be created and configured to use the shared disk, so that even if a failover occurs, the jobs will re-run as per schedule and continue to use the shared disk.
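As an illustration, a backup on a clustered instance simply targets a path on a clustered disk. The sketch below assumes the S: drive from the prerequisites list later in this document is a clustered disk in the SQL Server group; the folder and database names are hypothetical:
-- Back up to a clustered shared disk so the file remains reachable after a failover
BACKUP DATABASE [AdventureWorks]                   -- hypothetical database name
TO DISK = N'S:\Backups\AdventureWorks_FULL.bak'    -- S:\Backups is an assumed folder on the clustered disk
WITH INIT, CHECKSUM, STATS = 10;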
3) How many IPs are required for a cluster?
This question can be answered only when we know the number of nodes. If the number of nodes is N, the number of IP addresses required is 2(N) + 3: two per node (public and private) plus three additional IPs.
The three additional IPs are:
1) Windows Virtual IP
2) SQL Virtual IP
3) MSDTC IP
4) Multiple instance cluster (Active-Active)
If there are multiple instances on a cluster, installed to utilize the node hardware resources optimally, that configuration is called Active-Active or a multi-instance cluster.
5) Adding a disk on a cluster
Adding a disk is a multi-step process:
1) First add the disk in Cluster Administrator as a clustered disk.
2) After adding the clustered disk, make sure it is added to the SQL Server cluster group.
3) After adding it to the SQL Server cluster group, set a dependency from the SQL Server main service to the newly added clustered disk.
4) Verify in the sys.dm_io_cluster_shared_drives DMV in the SQL Server instance that the newly added drive is visible (a query sketch follows below).
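Step 4 can be scripted; this is a small sketch against the sys.dm_io_cluster_shared_drives DMV (available on clustered instances from SQL Server 2005 onwards):
-- Lists the shared drives usable by the clustered SQL Server instance
SELECT DriveName
FROM sys.dm_io_cluster_shared_drives;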
7) What are dependencies in a cluster? What is a dependency report?
Dependencies are important for cluster functionality. A typical chain is:
SQL Server Agent -> AND -> SQL Server Main Service -> AND -> All Disks + SQL Server Name
SQL Server Name -> AND -> SQL Server Virtual IP
As a rule of thumb, almost all dependencies in SQL Server clustered instances will ideally be AND dependencies (except in the case of multi-subnet failover clusters).
8) Possible Owners and Preferred Owners
Possible Owners:
This is the list of all the nodes that are configured for a clustered instance. If a failover occurs, the failover target must be one of the members of this list; if a node is not a possible owner, the failed-over instance will not come online on that node. If no possible-owner nodes are up, the group will still fail over to a node that is not a possible owner, but it will not come online there.
Preferred Owners:
Preferred owners are the nodes we would like the group to run on under ideal conditions, but not necessarily the only nodes it can run on. For example, if nodes 1 and 3 are preferred owners and nodes 1, 2 and 3 are possible owners, then if the service is on node 1 and node 1 fails, the service will move to node 3 and only go to node 2 if both 1 and 3 are unavailable.
10) Clustering Commands?
cluster /list
cluster node /status
cluster group /status
cluster network /status
cluster netinterface /status
cluster resource /status
cluster group "SQL Server (SQLSEENU143)" /move:Node2
11) How to read the Quorum log?
The cluster log can be read from the C:\Windows\Cluster\Reports\Cluster.log file on each node. Reading it from the quorum drive is not recommended, because as a local administrator we would not have admin rights on the MSCS cluster directory on the quorum drive.
cluster log /gen
This generates a recent cluster.log on Node1 and Node2.
12) Cluster Checks? IsAlive and LookAlive?
The LookAlive check (called the basic resource health check) verifies that SQL Server is running on the current node. By default it runs every 5 seconds. If the LookAlive check fails, the Windows cluster performs the IsAlive check.
The IsAlive check (called the thorough resource health check) runs every 60 seconds and verifies whether the instance is up and running by issuing SELECT @@SERVERNAME through the resource DLL. If this query fails, the check runs additional retry logic to avoid stress-related failures. From SQL Server 2012 onwards, the health check is based on sp_server_diagnostics.
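Both checks can be reproduced from a query window. A minimal sketch (sp_server_diagnostics exists only from SQL Server 2012 onwards):
-- Roughly what the IsAlive check issues on older versions
SELECT @@SERVERNAME;
-- SQL Server 2012+ health detection; @repeat_interval = 0 returns one snapshot and exits
EXEC sp_server_diagnostics @repeat_interval = 0;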
13) How to fail over a SQL Server cluster using a command?
cluster group "SQL Server (SQL2K12)" /move:Node22
14) Split-brain situation
A split-brain scenario happens when all the network communication links between two or more cluster nodes fail. In these cases, the cluster may be split into two or more partitions that cannot communicate with each other.
HA clusters usually use a private heartbeat network connection to monitor the health and status of each node in the cluster. If heartbeat communication fails for any network reason, a split-brain situation (partitioning) occurs: each node thinks the other nodes are down and there is a risk of each starting the services. To avoid this risk, the quorum keeps the nodes informed about the wellbeing of the other nodes; it acts as a point of communication until the private network is back up and running.
The node that
owns the quorum resource puts a reservation on the device every three seconds;
this guarantees that the second node cannot write to the quorum resource. When
the second node determines that it cannot communicate with the quorum-owning
node and wants to grab the quorum, it first puts a reset on the bus.
The reset breaks the reservation, waits about 10 seconds to give the first node time to renew its reservation at least twice, and then tries to place a reservation on the quorum for the second node. If the second node's reservation succeeds, it means the first node failed to renew its reservation, and the only reason for failing to renew is that the node is down. At this point, the second node can take over the quorum resource and restart all the resources.
15) What is the significance of MSDTC? Can we configure multiple MSDTCs?
MSDTC is used for distributed transactions between clustered SQL Server instances and any other remote data source. If we need to enlist a query on a clustered instance in a distributed transaction, we need MSDTC running on the cluster as a clustered resource (a sketch of such a transaction follows the list below). It can run on any node in the cluster; we usually have it running on the passive node.
1) Before installing SQL Server on a failover cluster, Microsoft strongly recommends that you install and configure Microsoft Distributed Transaction Coordinator (MS DTC).
2) SQL Server requires MS DTC in the cluster for distributed queries and two-phase commit transactions, as well as for some replication functionality.
3) Microsoft only supports running MSDTC on cluster nodes as a clustered resource. Running MSDTC in stand-alone mode on a cluster is neither recommended nor supported; using MSDTC as a non-clustered resource on a Windows cluster is problematic and can cause data corruption if a cluster failover occurs.
4) To help ensure availability between multiple clustered applications, Microsoft highly recommends that MS DTC have its own resource group and resources.
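For context, the kind of work that needs MSDTC looks like the sketch below: a transaction spanning the local clustered instance and a remote data source. The linked server, database and table names are purely hypothetical placeholders:
-- Requires MS DTC; REMOTESRV is a hypothetical linked server to another data source
BEGIN DISTRIBUTED TRANSACTION;
    UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = 42;                  -- local clustered instance
    UPDATE REMOTESRV.Sales.dbo.Orders SET Status = 'Shipped' WHERE OrderID = 42;  -- remote instance via linked server
COMMIT TRANSACTION;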
16) Why is the SQL Server service set to Manual on a cluster?
When a node restarts, the node itself should not attempt to start SQL Server; the cluster service controls on which node the instance comes online. Hence, by design, in SQL Server clustering the services are configured as Manual.
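This can be confirmed from T-SQL. A small sketch using sys.dm_server_services (available from SQL Server 2008 R2 SP1 onwards), where a clustered instance typically shows a Manual startup type:
-- Shows the SQL Server services, their startup type and service accounts
SELECT servicename, startup_type_desc, status_desc, service_account, is_clustered, cluster_nodename
FROM sys.dm_server_services;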
17) What to do if the Quorum fails? (Windows task)
A quorum crash or failure is usually disk corruption that has brought the quorum down. Ideally the Windows team addresses this issue through their monitoring implementation.
18) Intro to
Mirror/Log Shipping on Cluster?
19) Service SID?
A service SID is a mechanism that assigns privileges to the service itself, rather than to the account under which the service runs. Service SIDs improve security because they enable you to use a service account with only the least privileges required.
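On instances configured to use per-service SIDs, the SID appears as an NT SERVICE login. A minimal sketch to list such logins (the LIKE pattern is only an illustration):
-- Lists logins created for per-service SIDs, e.g. NT SERVICE\MSSQLSERVER
SELECT name, type_desc, create_date
FROM sys.server_principals
WHERE name LIKE N'NT SERVICE\%';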
20) How to troubleshoot a cluster?
1) Open Cluster Administrator (cluadmin.msc) and check the Cluster Events.
2) Issues can be disk related, network related, service related (SQL Server) or cluster related.
3) Depending on the issue, contact the respective team.
4) If SQL Server is the issue, check why the service went down:
i) Check the Event Viewer.
ii) Check the SQL Server Error Log (it can also be searched with a query; see the sketch after the links below).
iii) Look for any errors and troubleshoot as per the issue.
5) Additional sources of troubleshooting: the C:\Windows\Cluster\Reports\cluster.log file helps identify the underlying issue in the cluster on the specific node. The cluster log is present on both nodes.
http://www.sql-server-performance.com/articles/clustering/cluster_infrastructure_p1.aspx
http://www.sql-server-performance.com/articles/clustering/clustering_best_practices_p1.aspx
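As mentioned in step 4 above, the SQL Server error log can also be searched from a query window once the instance is back online. A small sketch using sp_readerrorlog (an undocumented but commonly used procedure; the first parameter is the log number, 0 = current, the second is 1 for the SQL Server log, and the remaining parameters are search strings):
-- Search the current SQL Server error log for error entries
EXEC sp_readerrorlog 0, 1, N'Error';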
Cluster Aware Applications:
SQL Server Database Services
SQL Server Analysis Services
Cluster Unaware Applications:
SQL Server Reporting Services
Integration Services
Notification Services
Instance Aware Services:
SQL Server Main Service
SQL Server Agent Service
SQL Server Full Text Search
Instance Unaware Services:
Browser Service
VSS Writer
SQL Server AD Helper
Prerequisites for Configuring a SQL Server Cluster:
1) SQL Server media on both nodes
2) Create three global groups (domain groups) (optional)
3) Create service accounts
4) SQL Server virtual IP (ask the network team)
5) SQL Server virtual name
6) S: drive for data/log files as shared iSCSI drives
7) Components that are cluster aware are Database Services and Analysis Services
8) Configure MSDTC as a cluster resource
9) Add a disk dependency in the SQL Server group to the SQL data drives
10) Hardware check on both nodes (equal)
11) Validate the Windows cluster
Sequence of Cluster Resources during Failover:
Stopping Order
1) SQL Server Agent Service
2) SQL Server Main Service
3) SQL Server IP
4) SQL Server Name
5) All Disk(s)
Starting Order
1) All Disks
2) SQL Server IP
3) SQL Server Name
4) SQL Server Main Service
5) SQL Server Agent Service
Scenarios in Cluster:
1) Applying an SP in a SQL Server 2005 cluster (v)
2) Adding a disk to the cluster for SQL Server (v)
3) Failovers and failbacks (v)
4) Adding/deleting a node in a cluster
5) Preferred owners and possible owners (v)
6) LookAlive and IsAlive (v)
7) Changing the virtual IP for the SQL Server cluster (v)
8) Master corruption in a SQL Server cluster
9) IP addresses needed for a two-node cluster configuration
If it is a two-node cluster: 2(N) + 3 => 2(2) + 3 = 7 IPs
1 Public, 1 Private at Node1
1 Public, 1 Private at Node2
1 IP for Windows Cluster
1 Virtual IP for SQL Cluster
1 IP for MSDTC
1 IP for Quorum
1 IP for Backups (if third party backup solutions
are used)
Adding a new disk to the cluster:
1) Contact the storage team to check the possibility of extending an existing disk or adding a new disk. Extending a disk sometimes involves downtime, so it depends on the customer providing downtime.
2) Once the disk is extended/added, ensure the Windows team makes the disk a clustered disk: Cluadmin.msc -> Storage -> Add Disk.
3) Add the disk as a resource under the SQL Server cluster group: Cluadmin.msc -> SQL Server Cluster Group -> Add Storage -> add the new clustered disk.
4) Set the disk dependency: Cluadmin.msc -> SQL Server Cluster Group -> right-click SQL Server Main Service -> Properties -> Dependencies -> Insert -> AND (disk number).
Single Instance:
Active/Passive clustering means having an instance running as active on one node while the second node remains passive, ready to take over responsibilities when the first node crashes. The terminology has been changed to Single Instance Cluster to avoid confusion.
Multiple Instance:
Active/Active clustering simply means having two separate instances running in the cluster—one (or more) per machine. The terminology has been changed to Multi-Instance Cluster.
MSDTC:
MSDTC is an acronym for Microsoft Distributed Transaction Coordinator. The MSDTC service tracks all parts of the transaction process, even across multiple resource managers on multiple computers. This helps ensure that the transaction is committed if every part of the transaction succeeds, or rolled back if any part of the transaction process fails.
Do we need MSDTC? Is it compulsory?
SQL Server 2005 does require MSDTC for setup, since it uses transactions to control setup on multiple nodes. However, SQL Server 2008/2008 R2/2012 and SQL Server 2014 setup does NOT require MSDTC to install SQL Server.
N+1:
Having one passive node dedicated for failovers. Assume a 3-node cluster: 2 nodes are active and 1 node is allocated to be passive.
N+M:
Having multiple passive nodes dedicated for failovers. Assume a 5-node cluster: 3 nodes are active and 2 nodes are allocated to be passive.
Geo Cluster:
A geo-cluster is a cluster between two different subnets or a group of subnets. These subnets may be in the same location or in different geographies. Typically, geo-clustering is used when clustering between different data centers.
The maximum number of instances on a cluster is 25; 50 instances are possible if we choose SMB file shares. The reason behind the 25-instance limit on a cluster is shared disk drive letter availability.
Number of Nodes on a Cluster:
Windows 2003 - 8 Nodes
Windows 2008/2008R2 - 16 Nodes
Windows 2012/2012R2 - 64 Nodes
Quorum:
1) The quorum stores cluster configurations; it is also called the cluster configuration database.
2) The quorum contains information about the active owner.
3) The quorum helps with communication during a heartbeat breakdown.
Types of Quorums:
1) Node Majority quorum mode -
This model requires an odd number of nodes (for example, 3); the cluster can then survive one node failure, because a majority of votes remains available to keep the cluster alive.
If we start the cluster with N nodes, then at any point in time at least (N + 1)/2 nodes must be alive and working, which means the cluster can sustain up to (N - 1)/2 node failures.
Example: if N = 11, then (11 + 1)/2 = 6, so at any point in time at least 6 working nodes are needed.
2) Node and Disk Majority quorum mode -
This model is a combination of node votes and a quorum disk, and it is used when there is an even number of nodes. It can be used for clusters where the nodes are all in one data center. An extra vote is added in the form of the disk so that the risk of failure is reduced; with 4 nodes, the extra disk vote lets the cluster survive 2 failures.
If we start the cluster with N nodes, then at any point in time at least (N + 1 + 1)/2 votes must be alive and working, which means the cluster can sustain up to (N + 1 - 1)/2 node failures.
Example: if N = 10, then (10 + 1 + 1)/2 = 6, so at any point in time at least 6 votes are needed, including the disk vote; that is, 5 working nodes plus the disk vote.
3) Node and File Share Majority quorum mode -
This model is a combination of node votes and a file share witness. An extra vote is added in the form of the file share so that the risk of failure is reduced.
4) No Majority: Disk Only quorum mode -
This is the traditional Windows 2003 quorum disk model; it is recommended to discontinue use of this model. Only the disk holds the quorum, so there is a high risk of the whole cluster failing if the quorum disk crashes.
Step-by-Step: Configuring a 2-node multi-site cluster on Windows Server 2008 R2
Option 1 – place the file share witness in the primary site.
Option 2 – place the file share witness in the secondary site.
Option 3 – place the file share witness in a 3rd geographic location.
Configure the Cluster
Add the Failover Clustering feature: Add the Failover Clustering feature to both nodes of your cluster from the Add Features Wizard.
Change the names of your
network connections: It is best if you rename the connections on
each of your servers to reflect the network that they represent. This will make
things easier to remember later.
Make sure your public
network is first: Go into the Advanced Settings of your Network
Connections (hit Alt to see Advanced Settings menu) of each server and make
sure the Public network is first in the list.
Private network settings:
Your private network should only contain an IP address and Subnet mask. No
Default Gateway or DNS servers should be defined. Your nodes need to be able to
communicate across this network, so make sure the servers can communicate
across this network; add static routes if necessary.
Validate a Configuration:
The first step is to “Validate a Configuration”.
Open up the Failover Cluster Manager and click on
Validate a Configuration.
Add the cluster nodes:
The Validation Wizard launches and presents you the first screen as shown
below. Add the two servers in your cluster and click next to continue.
Select “Run only tests I
select”: A multi-site cluster does not need to pass the storage
validation (see the Microsoft article). To skip the storage validation process, click
on “Run only the tests I select” and click Continue.
Unselect the Storage test:
In the test selection screen, unselect Storage and click next
Confirm your selection:
You will be presented with the following confirmation screen. Click Next to continue.
View the validation report: If you have done everything right, you should see a summary page like the one shown. Notice that the yellow exclamation point indicates that not all of the tests were run; this is to be expected in a multi-site cluster because the storage tests are skipped. As long as everything else checks out OK, you can proceed. If the report indicates any other errors, fix the problem, re-run the tests, and continue.
Create your cluster:
You are now ready to create your cluster. In the Failover Cluster Manager,
click on Create a Cluster.
Skip the validation test: The next step asks whether or not you want to validate your cluster. Since you have already done this, you can skip this step. Note that this will pose a bit of a problem later on when installing SQL Server, as it requires that the cluster has passed validation before proceeding. When we get to that point I will show you how to bypass this check via a command-line option in the SQL Server setup. For now, choose No and click Next.
Choose a unique name and IP address: Create a name and IP address for administering this cluster. This will be the name that you will use to administer the cluster, not the name of the SQL cluster resource, which you will create later. Enter a unique name and IP address and click Next.
Note: This is also the computer name that will need permission to the File Share Witness, as described later in this document.
Confirm your choices:
Confirm your choices and click next.
View the report to find out
what the warning is all about:
if you have done everything right you will see the Summary page. Notice
the yellow exclamation point; obviously something is not perfect. Click on View
Report to find out what the problem may be.
Implementing a Node and File Share Majority quorum
We need to identify the server that will hold our File Share witness. Remember, as we discussed earlier, this File Share witness should be located in a 3rd location, accessible by both nodes of the cluster. Once you have identified the server, share a folder as you normally would. In my case, I create a share called MYCLUSTER on a server named DEMODC.
The key thing to remember about this share is that you must give the cluster computer name read/write permissions to the share at both the share level and the NTFS level. If you recall back at Figure 13, I created my cluster and gave it the name "MYCLUSTER". You will need to make sure you give the cluster computer account read/write permissions.
Give the cluster computer account share-level permissions.
Change your quorum type: Now with the shared folder in place and the appropriate
permissions assigned, you are ready to change your quorum type. From Failover
Cluster Manager, right-click on your cluster, choose More Actions and Configure
Cluster Quorum Settings.
Choose Node and File Share Majority: On the next screen choose Node and File Share Majority and
click next.
Choose your file share
witness: In this
screen, enter the path to the file share you previously created and click next.
Click Next to confirm your quorum change to Node and File Share
Majority: Confirm that
the information is correct and click next.
A successful quorum change: Assuming you did everything right, you should see the
following Summary page.