In some cases, the maintenance page can get indexed!
This is a screenshot of the top result for [jobfinder], our “green jobs” listing at jobs.care2.com:
The problem is that the site was crawled by Googlebot while we had it offline. The title and snippet in the SERP came from our maintenance page, which is hosted by a secondary webserver that answers all requests when the production websites are offline. We retained our #1 ranking for [jobfinder], despite the irrelevant title and META description.
How can this be prevented?
robots.txt could be used to disallow all spider access during maintenance, but this is not a good answer. Google caches the file's contents, so Googlebot probably wouldn't re-check robots.txt during the maintenance window; it would rely on the copy cached from a previous visit. Worse, if the disallow-all robots.txt were read from the maintenance server, Googlebot might not return to the production site for a long time (until the cached copy expired).
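For reference, the blanket disallow described above is just two lines of robots.txt (shown here only to illustrate the approach being rejected):

```
User-agent: *
Disallow: /
```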
On-page NOINDEX meta tags might work, but these too could be cached and honored long after maintenance ends, which we don't want.
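To make the NOINDEX idea concrete, here is an illustrative sketch of what a maintenance page carrying that directive would contain (the HTML and wording are hypothetical, not our actual page):

```python
# Illustrative only: a maintenance page carrying the on-page robots
# directive discussed above. The tag asks crawlers not to index this
# page -- but it only helps if the crawler fetches the page fresh
# rather than relying on a cached copy.
MAINTENANCE_HTML = """\
<html>
  <head>
    <title>Site maintenance</title>
    <meta name="robots" content="noindex">
  </head>
  <body><h1>We'll be back shortly.</h1></body>
</html>
"""
```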
I think the best answer is to use proper HTTP status codes. We had been serving 200 OK from the maintenance server, which is simply incorrect. So we reconfigured the webserver that hosts our maintenance page to send HTTP 503 Service Unavailable instead. According to w3.org, HTTP 503 means:
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay.
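As a minimal sketch of the behavior described above (using Python's standard library; the port and Retry-After value are illustrative, and this is not our actual server configuration), a maintenance server that answers everything with 503 instead of 200 might look like:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MAINTENANCE_PAGE = b"<html><body><h1>Down for maintenance</h1></body></html>"

class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answers every GET with 503 Service Unavailable instead of 200 OK."""

    def do_GET(self):
        # 503 signals a temporary outage, so crawlers should keep their
        # old index entry instead of replacing it with this page.
        self.send_response(503)
        self.send_header("Retry-After", "3600")  # seconds; illustrative value
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(MAINTENANCE_PAGE)))
        self.end_headers()
        self.wfile.write(MAINTENANCE_PAGE)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

# To serve on port 8080:
# HTTPServer(("", 8080), MaintenanceHandler).serve_forever()
```

Human visitors still see the maintenance page body, but the status line tells crawlers the condition is temporary.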
I have not seen confirmation from Google, but after making this change I did discover a relevant thread at WMW: Pages indexed with our temporary “site update” message