Within my company we have a Windows-based client-server application which uses IIS at the back end.
For months (since we virtualised) we have been randomly experiencing a ‘503 Service Unavailable’ error which stops the application completely.
As we didn’t have a great deal of information on the problem and we had many users jumping up and down, we resolved the issue temporarily by restarting IIS using the ‘iisreset /noforce’ command. This would keep the issue at bay for a few days and then it would occur again. One day I had the problem four times, so I decided to get to the bottom of it.
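For anyone stuck with the same stopgap, the manual restart can be sketched as a small check that could be run on a schedule. This is only a rough illustration, not what we actually ran: the URL and the function names (`fetch_status`, `needs_restart`, `check_and_restart`) are made up for the example, and it assumes Python is available on the server and that the script runs with enough rights to call ‘iisreset’.

```python
import subprocess
import urllib.error
import urllib.request

# Hypothetical URL served by the affected application pool.
HEALTH_URL = "http://localhost/"


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url`, including error statuses.

    Connection-level failures (urllib.error.URLError) are not handled here
    and will propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return err.code


def needs_restart(status):
    """Treat only a 503 as the symptom described above."""
    return status == 503


def check_and_restart(url=HEALTH_URL, fetch=fetch_status, restart=None):
    """Check the site once and restart IIS if it answers 503.

    Returns True if a restart was triggered, False otherwise.
    `fetch` and `restart` can be swapped out, e.g. for a dry run.
    """
    if restart is None:
        def restart():
            # Same command as the manual workaround.
            subprocess.run(["iisreset", "/noforce"], check=True)
    if needs_restart(fetch(url)):
        restart()
        return True
    return False
```

Calling `check_and_restart()` every few minutes would only automate the workaround; like the manual restart, it does nothing about the underlying cause.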
I found that the problem was not being caused by an error but by a safety mechanism built into IIS application pools called Rapid Fail Protection. Here is an explanation I found of why the default application pool was being disabled:
“Rapid-fail protection is unique in that it doesn't apply to recycles based on requests and resources. Instead, an application pool is placed into rapid-fail protection if the IIS 6.0 W3SVC fails to ping the worker process, if the worker process crashes, or if its startup or shutdown time limit is exceeded. This is a sign to the service that the worker process is no longer available and for health reasons should be shut down. When a pool is placed into rapid-fail protection, W3SVC actually stops the app pool, and HTTP.sys returns 503 responses to all queued requests and to all new requests to that pool.”
OK, so this explains part of the problem:
- Several worker processes encounter issues.
- W3SVC fails to get an adequate response.
- W3SVC shuts the application pool down.
- Users get the ‘503 Service Unavailable’ error.
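The sequence above can be sketched as a toy model. This is not how IIS is implemented, just a simplified illustration: the class and method names are invented, and the thresholds (5 failures within 5 minutes) are what I understand the IIS 6.0 defaults to be, so check them against your own app pool's Health tab.

```python
from collections import deque


class RapidFailProtection:
    """Toy model of an application pool guarded by rapid-fail protection."""

    def __init__(self, max_failures=5, window_seconds=300):
        # Assumed thresholds: 5 failures within a 5 minute window.
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failure_times = deque()
        self.pool_running = True

    def record_failure(self, now):
        """Register a worker-process failure seen at time `now` (in seconds)."""
        self.failure_times.append(now)
        # Forget failures that have aged out of the time window.
        while self.failure_times and now - self.failure_times[0] > self.window_seconds:
            self.failure_times.popleft()
        # Too many failures close together: the pool is stopped.
        if len(self.failure_times) >= self.max_failures:
            self.pool_running = False

    def handle_request(self):
        """Return the HTTP status a client would see."""
        return 200 if self.pool_running else 503


# Failures spread far apart never accumulate inside one window.
quiet = RapidFailProtection()
for t in (0, 1000, 2000, 3000, 4000):
    quiet.record_failure(t)
print(quiet.handle_request())  # 200

# The same number of failures in a short burst stops the pool.
busy = RapidFailProtection()
for t in (0, 10, 20, 30, 40):
    busy.record_failure(t)
print(busy.handle_request())  # 503
```

The point of the model is that it is the clustering of failures, not their total number, that takes the pool down. Once the pool is stopped, every request gets a 503 until someone starts it again.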
This is good as it explains what is happening, but why has this started happening since we virtualised?
In our case it was a simple tick box. Before we virtualised, we had the application running on a physical server which didn’t have a lot of RAM, so we would often get ‘No system memory’ errors and our application would stop working.
When we virtualised, one of our engineers tried to address the issue by limiting each worker process to a specified amount of RAM, after which it would be recycled.
Note: To access the DefaultAppPool properties, open IIS Manager on your server > expand Application Pools > right-click DefaultAppPool > select Properties > Recycling tab.
As a result, the worker processes would often use up their allocated memory, terminate, and become uncontactable. In busy periods this would happen to several worker processes at the same time, which would trigger Rapid Fail Protection.
So the solution was to untick the ‘Maximum used memory’ setting and restart IIS. As we had virtualised and increased our memory, we could afford to give the application free rein.
Mystery solved.