I recently migrated a server to a new VHost that was supposed to improve performance. After the upgrade, however, performance was actually worse.
Looking at the system load, I discovered that the load average was about 3.5. With only 2 cores available, this corresponds to an overload of almost 2x.
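The estimate above is simply the load average divided by the number of cores; roughly speaking, a value above 1.0 means more work is waiting than the cores can handle. Here is a minimal Python sketch of that arithmetic. The function name `load_per_core` is made up for illustration, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


if __name__ == "__main__":
    # The numbers from this post: load 3.5 on 2 cores.
    print(load_per_core(3.5, 2))  # 1.75

    # The same check against the machine this script runs on (Unix only).
    if hasattr(os, "getloadavg"):
        one_minute, _, _ = os.getloadavg()
        print(load_per_core(one_minute, os.cpu_count() or 1))
```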
Looking further at the logs revealed that, unfortunately, this was not due to users taking an interest in the site but to various bots hammering the server. Actual users would probably be driven away by the awful page load times at this point.
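One way to get this kind of overview is to count requests per User-Agent in the access log. The following is a minimal sketch, assuming the log uses the common "combined" format, where the User-Agent is the last double-quoted field on each line. The sample lines, IP addresses and the function name `count_user_agents` are invented for illustration; they are not taken from my actual logs.

```python
import re
from collections import Counter

# In the combined log format the User-Agent is the last double-quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each User-Agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines (documentation IP ranges, illustrative agents).
SAMPLE_LOG = [
    '203.0.113.7 - - [01/Jan/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.7 - - [01/Jan/2020:10:00:01 +0000] "GET /a HTTP/1.1" 200 900 "-" "ExampleBot/1.0"',
    '198.51.100.4 - - [01/Jan/2020:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    # Print the busiest agents first.
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common(5):
        print(hits, agent)
```

To run it against a real log, pass an open file instead of `SAMPLE_LOG`, since iterating over a file yields its lines.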
Asking the bots to leave
To improve page loading times, I configured my robots.txt as follows:
User-agent: *
Disallow: /
This effectively tells all bots to skip my site. You should not normally do this, as your site will no longer be discoverable through search engines such as Google.
But here I just wanted to let my existing users use the site. Unfortunately, the situation improved only slightly; the system load was still over 2.
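If you want to double-check how a well-behaved crawler would interpret these rules, Python's standard library ships a robots.txt parser. This is a small sketch under the assumption that the crawler follows the standard rules; `is_allowed` is a hypothetical helper name and `example.com` is a placeholder for your own domain.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Every agent is asked to stay away from every path.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/"))
    print(is_allowed(ROBOTS_TXT, "SemrushBot", "https://example.com/index.php"))
```

Note that this only tells you what a compliant bot should do; it says nothing about whether a particular bot actually complies.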
From the logs I could tell that all bots were actually gone, except for
- SemrushBot by semrush.com
- MJ12Bot by majestic.com
- DotBot by Moz.com
But those were enough to keep the site (PHP+MySQL) overloaded.
The bots above crawl the web for their respective SEO analytics companies, which sell this information to webmasters. This means that unless you are already a customer of one of these companies, you do not benefit from having your site crawled.
In fact, if you are interested in SEO analytics for your website, you should probably look elsewhere. In the next section we will block these bots, and I am far from the first to recommend this.
Making the bots leave
Since these bots do not respect the robots.txt, you will have to block them forcefully. Instead of the actual web pages, we will serve them a 410 (Gone) or 403 (Forbidden), which keeps them from touching any PHP or MySQL resources.
On nginx, add this to your server section:
if ($http_user_agent ~* (SemrushBot|MJ12Bot|DotBot)) {
    return 410;
}
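The `~*` operator makes this a case-insensitive regular expression match anywhere in the User-Agent header. To illustrate which requests the rule catches, here is a rough Python equivalent of the match. The helper name `is_blocked` and the sample User-Agent strings are illustrative assumptions, not values copied from my logs.

```python
import re

# Same alternation as in the nginx rule; IGNORECASE mirrors the "~*" operator.
BLOCKED_AGENTS = re.compile(r"(SemrushBot|MJ12Bot|DotBot)", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the User-Agent contains one of the blocked bot names."""
    return BLOCKED_AGENTS.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; SemrushBot; +http://www.semrush.com/bot.html)",
        "Mozilla/5.0 (compatible; mj12bot; http://mj12bot.com/)",  # lowercase on purpose
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    ]
    for agent in samples:
        print(is_blocked(agent), agent)
```

The case-insensitive match matters because a bot may not spell its name exactly as written in the rule.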
For Apache 2.4+, use the following (inside a <Directory> block or .htaccess):
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase MJ12Bot bad_bot
BrowserMatchNoCase DotBot bad_bot
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
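After reloading the web server, it is worth checking that the block actually works by sending a request with a spoofed User-Agent. The sketch below uses only the Python standard library; `build_request` and `status_for` are hypothetical helper names, and the URL you pass in is whatever your own site is.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Create a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends to this User-Agent."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is what we want here.
        return error.code
```

With the rules above active, `status_for("https://example.com/", "SemrushBot")` (with your domain in place of the placeholder `example.com`) should return 410 on nginx and 403 on Apache, while a browser-like User-Agent should still get 200.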
For additional fun, you could also give them a 307 (redirect) to their own websites here.