Rambling thoughts

All About My thoughts…..

The Curious Case of SQL Server Disaster February 19, 2012

Filed under: Uncategorized — Aditya Badramraju @ 3:53 pm

Many people especially in IT industry won’t realize the importance of the DBA and his advices until they see DISASTER of their databases.

Last week I am evident of 3 continuous disasters of databases in 3 successive days and following is my experience with this curious case

First Disaster (Wednesday):

I am yawning in home to see a TV channel forcibly by Grand Ma, at that very moment I got a call from office saying that they are 823 Error Mails flooding in to mailbox.

I immediately got rid of my grand ma and switched on my desktop to see the mails and found that the databases were not accessible.

As I don’t have access from my home I asked my pal to just run the following command to check the status of the databases

SELECT state_desc,* FROM SYS.DATABASES

 

And the output we got is

clip_image002[8]

I then asked to run the following command on which the 823 error is reported

As the database is online in state why still my database is not opening? I know the following table from BOL

State

Definition

ONLINE

Database is available for access. The primary filegroup is online, although the undo phase of recovery may not have been completed.

OFFLINE

Database is unavailable. A database becomes offline by explicit user action and remains offline until additional user action is taken. For example, the database may be taken offline in order to move a file to a new disk. The database is then brought back online after the move has been completed.

RESTORING

One or more files of the primary filegroup are being restored, or one or more secondary files are being restored offline. The database is unavailable.

RECOVERING

Database is being recovered. The recovering process is a transient state; the database will automatically become online if the recovery succeeds. If the recovery fails, the database will become suspect. The database is unavailable.

RECOVERY PENDING

SQL Server has encountered a resource-related error during recovery. The database is not damaged, but files may be missing or system resource limitations may be preventing it from starting. The database is unavailable. Additional action by the user is required to resolve the error and let the recovery process be completed.

SUSPECT

At least the primary filegroup is suspect and may be damaged. The database cannot be recovered during startup of SQL Server. The database is unavailable. Additional action by the user is required to resolve the problem.

EMERGENCY

User has changed the database and set the status to EMERGENCY. The database is in single-user mode and may be repaired or restored. The database is marked READ_ONLY, logging is disabled, and access is limited to members of the sysadmin fixed server role. EMERGENCY is primarily used for troubleshooting purposes. For example, a database marked as suspect can be set to the EMERGENCY state. This could permit the system administrator read-only access to the database. Only members of the sysadmin fixed server role can set a database to the EMERGENCY state.

 

We don’t have any of the above except online. So I then decided to access something and see what actually error is

Alter database adventureworks set online

 

Then I got the error

The operating system returned error 21(The device is not ready.) to SQL Server during a read at offset 0×00000000074000 in file ‘H:\MSSQL\Data\AdventureWorks.mdf’. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

So what is this?????

This is one more state of database that is being INCONSISTENT:

In this case the database isn’t accessible at all, because recovery hasn’t run or completed and so the database is in an inconsistent state. It could just be logically inconsistent (e.g. a transaction modifying data hasn’t recovered) or worse it could structurally inconsistent (e.g. a system transaction modifying index linkages has’t recovered). Either way, SQL Server wants to prevent you from getting into the database because it doesn’t know what state the data and structures in the database are in. But if you don’t have a backup, you need to get into the database, no matter what state things are in.(courtesy : Paul)

The way to get rid of this is to bring the database in to Emergency,Run CheckDB and then if no errors bring the database online

 

Alter database adventureworks

set EMERGENCY

go

Once the database is in Emergency mode then it is only accessible by sysadmin J . Now it’s time to run DBCC CHECKDB

DBCC CHECKDB(‘AdventureWorks’,with_no_info_msgs)

 

Once you find there are no errors then it’s time to bring the database online

 

Alter database adventureworks

set ONLINE

go

 

Second Disaster (Wednesday):

We are happy that databases are online and I documented the issue to client by finding the root cause. After that I am on my way to home when I got the phone  from my pal saying we encountered same situation with another server, then I asked him to take the action as yesterday.

But by then we had a situation where one of our databases went to suspect state and we did approach with the following procedure

Alter database adventureworks

set EMERGENCY

go

 

Alter database adventureworks

set SINGLE_USER

go

 

DBCC CHECKDB(‘AdventureWorks’,with_no_info_msgs)

 

Here we saw bulk of allocation and consistency errors. Then we had two options

1.       Restoring the latest backup

2.       Running database consistency allowing data loss

Unfortunately we have a  SUPER backup team who failed to give us backup immediately. After a long discussion we decided to run database consistency as our client is confident that the data load will be done next day any ways.

 

So we then ran the following steps

 

DBCC CHECKDB (AdventuerWorks, REPAIR_ALLOW_DATA_LOSS) (Check before adapting to your environment)

 

We got the output as the following

 

The log for database ‘AdventureWorks’ has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

 

Followed by long chain where the database repaired and where it deleted the corrupted database.

We then brought database as multi user and online.

 

Alter database adventureworks

set MULTI_USER

go

 

Alter database adventureworks

set Online

go

Third Disaster (Wednesday):

Thus because of poor backup strategy we ended up in repairing the database with data loss. Fortunately we didn’t see huge data loss.

The disaster happened today is worst where the total system data files disk got corrupted. Again here I am explaining a situation where you should not be(HAVE BACKUPS PLEASE!!!!!)

Here we don’t have backups and somehow the server team managed to get sql server installed and now our task is to attach the production database

We tried to attach the database with single mdf file then we got the error as following

The log cannot be rebuilt because there were open transactions/users when the database was shutdown, no checkpoint occurred to the database, or the database was read-only. This error could occur if the transaction log file was manually deleted or lost due to a hardware or environment failure.

Then what we tried to do is

·         Create a new database with same name

·         Stop SQL Server

·         Swap the MDF Files and delete the log file

·         Start SQL Server

·         Take the database in to Emergency mode

·         Then in to Single User mode

·         Run Database consistency with allow data loss so that the log can be rebuilt forcibly

·         Yayyyyyyyyyy!!! We now have the log file J

·         Bring the database back to multi user and online.

Thus you have the database online.

Above methods are applicable ONLY when

·        You don’t have valid database backups

As an administrator its our first duty to satisfy above constraint but if we have left with no choice we always have Hacking methods J

Happy Troubleshooting J

About these ads
 

7 Responses to “The Curious Case of SQL Server Disaster”

  1. Viswanath Says:

    Your explaination is superr buddy

  2. Vamsi Says:

    Good Work !!!! Keep up the good show :)

  3. Pradeep Says:

    Brother, don’t you think, behind this disaster there is poor storage configuration?

  4. Pradeep Says:

    anyways i like to work on such kind of issues. Keep posting brother and will discuss sometime.

  5. manu0417 Says:

    Bro, the explanation is good. But repair_allow_data_loss option is not applicable to business. Good for a Test Stage environments though. The last line is the most important and should be highlited first in this post.

    HAVE A VALID BACKUP
    As an administrator its our first duty to satisfy above constraint :)

    Good going. Keep it up.

    • True buddy!!! Believe me there are companies who run important data and doesn’t have proper backups… and also there will be very lucky ppl who are so fortunate(;)) that their backup drive will also get corrupted…In that case this works…. anyways thanks for reading Bro.. Also another post is following on the same topics …off late discovered pretty interesting things..will discuss with you in person :)


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 668 other followers