Originally, the Indy Library is a programming library which is available at http://www.nevrona.com/Indy or http://indy.torry.net under an Open Source license. This library is included with Borland Delphi 6, 7, C++Builder 6, plus all of the Kylix versions. Unfortunately, this library is hi-jacked and abused by some Chinese spam bots. All recent user-agents with the unmodified "Indy Library" string were of Chinese origin.
The following may help against rude and robots.txt-ignorant bots. However, there are more bad but smarter bots out there, which need more sophisticated countermeasures.
Trap them in a special /bot-trap directory:
Disclaimer: The following scripts are working for me. I share them just for your information. If you use them, you are doing so at own risk. Be sure not to screw up your web site or web server.
[*]Set up a special subdirectory /bot-trap (use another own name),
[*]Put an exclude statement into your /robots.txt,
Code: Select all
User-agent: *
Disallow: /bot-trap/
[*]Include a hidden link (a transparent 1x pixel GIF) at the beginning of your main entrance page /
Code: Select all
<a href="/bot-trap/"><img src="images/pixel.gif" border="0" alt=" " width="1" height="1"></a>
Caveats: this method is not bullet-proof, and there may be collateral damage, so check the trapped addresses regularly and in time to research and unblock those that may have been special and innocent user tools instead of bots.
Own experience here is about 1 or 2 false ones among 100 hits.
To prevent certain user-agents or IP address ranges from being trapped you could add some sort of whitelisting.
[*]Additionally, we have put a /bot-trap/index.php to notify the hostmaster by mail and automatically append the bot's IP address to an active blacklist file blacklist.dat. For the first start, create an empty ../blacklist.dat file and make it readable and writeable for the web server. Here is the text of the /bad-bot/index.php:
Code: Select all
<html>
<head><title> </title></head>
<body>
<p>There is nothing here to see. So what are you doing here ?</p>
<p><a href="http://your.domain.tld/">Go home.</a></p>
<?php
/* whitelist: end processing end exit */
if (preg_match("/10\.22\.33\.44/",$_SERVER['REMOTE_ADDR'])) { exit; }
if (preg_match("Super Tool",$_SERVER['HTTP_USER_AGENT'])) { exit; }
/* end of whitelist */
$badbot = 0;
/* scan the blacklist.dat file for addresses of SPAM robots
to prevent filling it up with duplicates */
$filename = "../blacklist.dat";
$fp = fopen($filename, "r") or die ("Error opening file ... <br>\n");
while ($line = fgets($fp,255)) {
$u = explode(" ",$line);
$u0 = $u[0];
if (preg_match("/$u0/",$_SERVER['REMOTE_ADDR'])) {$badbot++;}
}
fclose($fp);
if ($badbot == 0) { /* we just see a new bad bot not yet listed ! */
/* send a mail to hostmaster */
$tmestamp = time();
$datum = date("Y-m-d (D) H:i:s",$tmestamp);
$from = "[email protected]";
$to = "[email protected]";
$subject = "domain-tld alert: bad robot";
$msg = "A bad robot hit $_SERVER['REQUEST_URI'] $datum \n";
$msg .= "address is $_SERVER['REMOTE_ADDR'], agent is $_SERVER['HTTP_USER_AGENT']\n";
mail($to, $subject, $msg, "From: $from");
/* append bad bot address data to blacklist log file: */
$fp = fopen($filename,'a+');
fwrite($fp,"$_SERVER['REMOTE_ADDR'] - - [$datum] \"$_SERVER['REQUEST_METHOD'] $_SERVER['REQUEST_URI'] $_SERVER['SERVER_PROTOCOL']\" $_SERVER['HTTP_REFERER'] $_SERVER['HTTP_USER_AGENT']\n");
fclose($fp);
}
?>
</body>
</html>
[*]To exclude those known bad bots from further visits of my pages, all my active PHP pages begin with a check of the blacklist.dat file by including this statement at the beginning:
Code: Select all
<?php include($DOCUMENT_ROOT . "/blacklist.php"); ?>
Code: Select all
<?php
$badbot = 0;
/* look for the IP address in the blacklist file */
$filename = "../blacklist.dat";
$fp = fopen($filename, "r") or die ("Error opening file ... <br>\n");
while ($line = fgets($fp,255)) {
$u = explode(" ",$line);
$u0 = $u[0];
if (preg_match("/$u0/",$_SERVER['REMOTE_ADDR'])) {$badbot++;}
}
fclose($fp);
if ($badbot > 0) { /* this is a bad bot, reject it */
sleep(12);
print ("<html><head>\n");
print ("<title>Site unavailable, sorry</title>\n");
print ("</head><body>\n");
print ("<center><h1>Welcome ...</h1></center>\n");
print ("<p><center>Unfortunately, due to abuse, this site is temporarily not available ...</center></p>\n");
print ("<p><center>If you feel this in error, send a mail to the hostmaster at this site,<br>
if you are an anti-social ill-behaving SPAM-bot, then just go away.</center></p>\n");
print ("</body></html>\n");
exit;
}
?>
Article courtesy of Kloth