Tuesday, 2 July 2019

Unable to connect to SQL servers Error 17300


I recently worked on issue in which we were unable to connect to the SQL instance and we were getting below error while we try to connect  even though the SQL service was running

A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233)

While analyzing the  error log I was able to find error 17300 which points to memory pressure , but the instance  was a newly provisioned one and there was no chance for any workloads causing memory pressure and while analyzing the logs further I found that while doing the post install tasks Junior DBA accidentally set  max memory to a very small value 

OK.. So now we know what change caused the issue and max memory need to be increased to fix the issue  . How can we increase the max memory where we are not able to connect to instance ?

These are the situations in which Dedicated Administrator Connections comes as savior

I took the DAC session by giving 'admin:Instnace name' in the connection window which popups while i open a new query window[Ctrl +N] in SSMS( if you try to take DAC connection by connecting to instance from object explorer it will fail with the message :"Dedicated administrator connections are not supported via SSMS as it establishes multiple connections by design" as object explorer make multiple connection to instance) another option to connect to DAC is using SQLCMD.



Once you have DAC connection run the command to change the max memory setting 

EXEC sp_configure 'show advanced option', '1'
GO
RECONFIGURE WITH OVERRIDE
GO
EXEC sp_configure 'max server memory', '10240'
GO
RECONFIGURE WITH OVERRIDE
GO 

After increasing max memory setting  normal SQL connection was working fine but i have an interesting question from my curios Junior DBA " what if he would have tried to restart the instance and instance failed to start due to lack of memory? " 

So let me show you how we can resolve such situation. I changed the max memory back to 128 MB and restarted the instance and  the instance failed to start and as expected there was 17300 logged in the error log

So how to change max memory setting now  as DAC is not an option when the SQL itself is down 😝

We still have a Savior "Minimal configuration Mode" which helps us when the SQL fails to start due to configuration issue , we can start Sql service in minimal configuration mode by giving -f in startup parameter or running "net start MSSQLSERVER /f" from an elevated command prompt.

Once the SQL Service was started in minimal configuration mode we took the DAC connection and changed the memory setting ,stopped and restarted the service in normal mode and now were are able to connect to SQL instance.



 















Monday, 18 March 2019

Taking log backup when there is no disk space for taking backup


Have you come across a situation in which log file filled up disk and you don't have enough space to take a log backup ? I have come across one ,let me share my learning here
 
One of my application team member contacted me and told their applications development environment is down. When i checked the SQL health i found that database log file have filled up the disk and log reuse wait was showing log backup

So what's next take a log backup right ? but unfortunately in my case since this  was non prod server log backups was never configured and we didn't have enough space for taking large log backup so what else can be done here ? even though there was  no point in time requirement but switching to simple recovery model was not an easy option as this database was part of availability group. Backup to 'NUL' option came to help in this scenario. I triggered a log backup to NUL backup log [databasename] to disk='NUL' which truncated the log file and  i was able shrink the database to  create space in log drive.


What is backup device NUL?
The NUL disk device( it is not NULL)  will discard all information sent to it,even you won't be able to create a file or folder NUL in windows. NUL is a system reserved device name. 



In short we will be able to mimic a backup but actually we won't be  having a backup file and it really don't server the purpose of a backup 😆. If you are taking a log backup to NUL SQL Server will treat it as a valid log backup(it doesn't take database to pseudo-simple recovery model like the truncate_only option we used before SQL 2008). Subsequent log backups will succeed   but you won't be able to restore the subsequent log backups as you took the log backup to NUL so take a differential or full backup if you want to have a restore able log backup chain.

Let me share backup and restore scenario created in my lab machine  where i took a log back to NUL

I took a log backup to NUL and next log backup completed successfully as you you can see below 


When i try to restore all the backup files ,log backup after the backup to NUL fails as earlier log backup with a lesser LSN is missing .


You will end up in same situation if you are trying to restore a differential backup taken after a full backup to NUL




















Sunday, 25 March 2018

Troubleshooting startup issues with process monitor

Recently i came across a startup issue with my SQL Server agent where it was not starting up with the error message 'The request failed or the service did not respond in a timely fashion consult the event log or other applicable error logs for details'

There was no useful information from the event viewer the only log i found for sql agent service in application log  is  'SQLServerAgent service successfully stopped.' which doesn't help me here

SQL Agent logs also didn't give any clue

2018-03-26 09:58:40 - ? [129] SQLSERVERAGENT starting under Windows NT service control
2018-03-26 10:12:24 - ? [131] SQLSERVERAGENT service stopping due to a stop request from a user, process, or the OS...
2018-03-26 10:12:26 - ? [000] Configuration option 'show advanced options' changed from 0 to 1. Run the RECONFIGURE statement to install. [SQLSTATE 01000] (Message 15457)  Configuration option 'Agent XPs' changed from 1 to 0. Run the RECONFIGURE statement to install. [SQLSTATE 01000] (Message 15457)  Configuration option 'show advanced options' changed from 1 to 0. Run the RECONFIGURE statement to install. [SQLSTATE 01000] (Message 15457)
2018-03-26 10:12:26 - ? [098] SQLServerAgent terminated (normally)

Since logs didn't give any clue i  though of checking for any registry or filesystem access/permission issue using process monitor

Process monitor can be download from this path : Download link

To list only Registry and filesystem activity we can chose show registry and filesystem activity from main toolbar


To list  events of  only SQLAGENT   i added a filter (CTRL+L) to display details only of process sqlagent.exe



Then i started event capture and tried starting the SQL server Agent service .I could see all the registry and filesystem activities captured in the procmon and i stopped the capture once the service failed to start
While analyzing the event i found an access denied message  on a registry path
On checking event properties i found that sqlserveragent was missing read and set value in registry path 'HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL13.MSSQLSERVER\SQLServerAgent' once i gave set value and read permission to this path SQL Agent started up successfully. On comparing the permission in few other servers i found that sqlserveragent was having full control in the path so i gave full control in order to avoid  issues which may occur due to lack of full access


Thursday, 25 January 2018

The '-K ' Startup parameter in SQL Server

Last week i created my first  SQL Instance in Amazon RDS,  Its pretty simple to  create an instance  and like any other Managed database service it does offers automated backups ,patching,monitoring etc and its easy to scale up as per your needs.

As Sql Error log is DBA's great friend i thought of  going through the error log and it always used to give me some great information and something interesting stroked me, the startup parameter they have used...........

2018-01-24 15:55:47.620 Server       Registry startup parameters: 
-d D:\RDSDBDATA\DATA\master.mdf
-e D:\RDSDBDATA\Log\ERROR
-l D:\RDSDBDATA\DATA\mastlog.ldf
-k 20.000000
-T 3226
-T 7806

yes its says about the master db file,error log location , Trace flag 3226 used to suppress logging of successful backup in errorlog , Trace Flag 7806 which is used to enable DAC on SQL Express edition(i created a express instacen). So whats -K option ??? i checked in  Microsoft docs on Database Engine Service Startup Options but didn't get any information and i didn't  get any usefully information from worlds favorite search engine too 😓 or i didn't user the correct key words to search  😋

After a long time searching got  details on '-K'  from the Checkpoint documentation It is used to throttle I/O  behavior based on the throughput of the I/O subsystem for some types of checkpoints.

In some cases, checkpoints might issue a substantial amount of I/O, causing the I/O subsystem to get flood with write requests, which can severely affect read performance.SQL Server includes a commandline option that allows throttling of checkpoint I/O by specifying  –k parameter, followed by a decimal number, which can be added to the list of startup parameters for the SQL Server service. The value specified indicates the number of megabytes per second that the checkpoint process can write. By using this –k option, the I/O overhead of checkpoints can be spread out and have a more measured effect.By default, the checkpoint process makes sure that SQL Server can recover databases within the recovery interval that you specify.So by using -k option may cause longer recovery time based on the specified value.The –k option doesn’t apply to indirect checkpoints. So in this case checkpoint I/O  is throttled to 20 megabytes per second

I have discussed this with a few DBA's on this option and no one had used this option or came across a scenario where they have observed checkpoint was causing I/O issues. If you guys have come across such a scenario please do share them in comments 

Wednesday, 22 March 2017

Is your database really in FULL/BULK recovery model?



We are aware of three recovery model (simple,bulk and full) and each recovery model's characteristics  in SQL Server. But do they show this characteristic always??

No, If we have not taken a full backup  after changing database from simple to bulk/Full or a database is created in bulk/Full recovery model, Database will not wait for a log backup to truncate log file but it truncates on checkpoints   which is behavior of database in simple recovery model and this mode is sometimes referred as pseudo simple recovery model(not documented). If you try to make a log backup of database in this state it will fail with below message

Msg 4214, Level 16, State 1, Line 1
BACKUP LOG cannot be performed because there is no current database backup.
Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.


That makes sense right , Database engine show this behavior as we are aware that it requires a full backup to do a point in time recovery, then why  let the log continue to grow when there is no way that the database can be recovered to a point in time



How can i spot if my database is in pseudo simple mode?

There is no direct way to find if a database is in pseudo simple mode, We can find this by checking last_log_backup_lsn column in sys.database_recovery_status , if  this column in NULL it means a transaction log back up cannot be performed because either the database is in SIMPLE recovery or there is no current database backup  and by checking recovery_model_desc column in sys.databases whether the database is in Full or BULK logged model


Below script will help us in identifying database in pseudo simple recovery model

select db.name,db.recovery_model_desc,drs.last_log_backup_lsn from sys.database_recovery_status drs join sys.databases db on db.database_id=drs.database_id 
where db.recovery_model in (1,2) and drs.last_log_backup_lsn IS NULL



In my instance below are the databases in pseudo recovery model