How to identify and block unwanted visitors to your website - use Spamblocker
by Alec Scaresbrook (March 2009)
Background to Spamblocker
I recently wrote a piece of PHP code (called a class in programming lingo) to use on this website to block unwanted and potentially harmful visits. I hadn't realised the extent of these visits until I set up the online cycle shop - the shop statistics revealed a huge number of visits that did not relate to sales. On exploring further, I discovered some web bots and spiders were crawling the shop several times a second. Acceptable robots, such as Google's, visited much less often.
We had to stop this huge amount of useless traffic, and our Internet research led us to Project Honey Pot, which is a clever device to identify dodgy visitors and put their IP addresses on a blacklist.
As a result, I developed Spamblocker, which automatically sifts through visitors before they enter the site, compares them with the blacklist and blocks their IP addresses, thus denying access to this site.
I published this PHP program on www.phpclasses.org (Spamblocker) in January 2009 to share with others. In February 2009, it resulted in my nomination for a PHP Programming Innovation Award, which was set up to provide recognition to developers who make outstanding contributions to the PHP community. At the end of voting, I came equal fifth.
Who are the unwanted visitors?
Unwanted visitors are the automated web crawlers and spambots that constantly prowl the Internet and your website, bringing you no benefit and possibly harm.
Why do they search?
- for e-mail addresses to harvest for later inbox spamming and spoofing attempts.
- for content to steal (scrape) for link-farm websites and other money-spinning sites, to boost text content and search engine results for sites that masquerade as authoritative information sources.
- for the opportunity to fill comment boxes and forums with nonsense or links to unsavoury or irrelevant websites.
- for search results for new search engines.
- for students who have been set projects to create search engines.
- for academics researching trends and statistics related to Internet use.
Why block these unwanted visitors?
For you:
- to reduce excessive load on your server, invisible (to you) but a damaging drain on your ISP's resources
- to reduce spam and phishing attempts.
- to prevent comment spam so your site remains authoritative and acceptable to all.
- to restrict your valuable bandwidth to genuine visitors who can benefit you in some way (with information or, for commercial sites, with revenue).
- remember: your bandwidth may be low cost today, but this could all change.
For the Internet community:
- to reduce excessive load on servers - thus keeping the Internet fast
- to reduce the wasted time and fraud associated with spam and phishing
- to deter and reduce this nuisance
- remember: bandwidth may be low cost today, but this could all change as the system comes under more and more pressure from these bots.
Spamblocker - summary
N.B. Joining Project Honey Pot and PHPclasses is free. Using my Spamblocker software is also free. Please respect the open source initiative BSD licence.
- You need PHP 5 running on your server (ask your web service provider).
- Spamblocker obtains a visitor's IP address.
- It searches Project Honey Pot's RBL (Realtime Black List) DNS server for the visitor's IP address. You need a Project Honey Pot API key for this.
- Spamblocker analyses the search result and allows access or not according to the criteria that you specify.
- The criteria that you specify are the threat level and the last time (in days) that the IP address was used for unacceptable action on the Internet. The time lag is used because owners of IP addresses change and the new owner may not be responsible for the previous bad behaviour. The two criteria are used together to determine access.
- Each visitor is only checked against the RBL database once per visit - minimising programming overheads.
- A message is displayed to a blocked visitor to alert any unsuspecting owner of a problem with their IP address history.
- The blocked and allowed addresses are logged to a file or to tables in a MySQL database. Search engines and visitors that are not in the RBL database are logged too.
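The lookup and decision steps in the summary above can be sketched in a few lines of PHP. This is a minimal illustration, not the Spamblocker class itself: the function names and the example thresholds are my own assumptions, and the query and answer layout follows Project Honey Pot's published http:BL format as I understand it (API key, then the IP address with its parts reversed, then dnsbl.httpbl.org, with an answer of the form 127.days.threat.type).

```php
<?php
// Sketch only: function names and thresholds are illustrative assumptions,
// not taken from the Spamblocker class.

// Build the DNS name to query: <key>.<reversed IPv4 octets>.dnsbl.httpbl.org
function httpbl_query_name($apiKey, $visitorIp)
{
    $reversed = implode('.', array_reverse(explode('.', $visitorIp)));
    return $apiKey . '.' . $reversed . '.dnsbl.httpbl.org';
}

// Decode an answer such as "127.3.40.4" into its parts.
// Returns null when the address is not listed or the answer is malformed.
function httpbl_parse_answer($answer)
{
    $parts = explode('.', $answer);
    if (count($parts) !== 4 || $parts[0] !== '127') {
        return null;
    }
    return array(
        'days'   => (int) $parts[1], // days since the last recorded activity
        'threat' => (int) $parts[2], // threat score
        'type'   => (int) $parts[3], // 0 = search engine, otherwise a visitor type
    );
}

// Apply the two criteria together: block only when the record is
// both threatening enough and recent enough.
function httpbl_should_block($record, $minThreat, $maxDays)
{
    if ($record === null || $record['type'] === 0) {
        return false; // unlisted visitors and search engines are allowed
    }
    return $record['threat'] >= $minThreat && $record['days'] <= $maxDays;
}

// Perform the DNS lookup. gethostbyname() returns its argument unchanged
// on failure, which httpbl_parse_answer() then treats as "not listed".
function httpbl_check($apiKey, $visitorIp, $minThreat, $maxDays)
{
    $answer = gethostbyname(httpbl_query_name($apiKey, $visitorIp));
    return httpbl_should_block(httpbl_parse_answer($answer), $minThreat, $maxDays);
}
```

A page could then call something like httpbl_check('yourapikey', $_SERVER['REMOTE_ADDR'], 25, 30) and show the blocked-visitor message when it returns true. The values 25 and 30 are placeholders only - choose your own threat level and time lag.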
Tutorial: using my blocking code 'Spamblocker'
N.B. Your server has to be running PHP 5, unless you change the Spamblocker class code to run on PHP 4.
1. Join PHPclasses and download Spamblocker.
2. Make a folder on the root of your server - call it 'class'.
3. Obtain a Project Honey Pot API key:
3.1 Join Project Honey Pot.
3.2 Contribute to the project (there are various ways - the simplest is to put spam-trapping code on a web page) so you can obtain an API key for searching the Project Honey Pot black list.
4. From the Spamblocker download, open 'spam_blocker_class.php' in Notepad so you can edit it as follows:
4.1 Find var $apikey = 'abc'; in the Spamblocker class code.
4.2 Replace 'abc' with your Project Honey Pot API key.
5. Upload your modified 'spam_blocker_class.php' file to your 'class' folder on your server.
6. If you want your visitors logged in text files, make another folder on the root of your server and name it 'spamblocker_logs' or similar. If you don't make a folder for visitor logs, this data will be automatically written to the root of your server.
7. Put the following code at the beginning (above the header) of each web page to be protected. I've put it on every page.
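<?php
// Start the session first; this must run before any output is sent.
session_start();
// Load the Spamblocker class from the 'class' folder made in step 2.
include("class/spam_blocker_class.php");
// Check the visitor. The empty string means no log folder is specified.
$ip->spam_blocker_control("");
?>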
8. Within the two double quotes of the spam_blocker_control("") code you can specify a path to the folder where the visitor files are to be written.
Example: $ip->spam_blocker_control("spamblocker_logs/");
9. If you want data to be sent to a MySQL database instead of text-based log files, edit the file 'spam_blocker_connection.php'. This file writes text-based log files by default, but also automatically creates tables in a MySQL database if you specify one.
9.1 Edit 'spam_blocker_connection.php' (from the download) by including your database connection details.
9.2 Put your edited 'spam_blocker_connection.php' file in the 'class' folder you have already made and placed on the server.
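Step 7 starts a session before Spamblocker runs, which fits the point in the summary that each visitor is checked against the RBL only once per visit. The sketch below shows one way such once-per-visit caching could work. It is an assumption about the general technique, not a copy of Spamblocker's code: the function name, the session key and the $lookup callback are all hypothetical.

```php
<?php
// Sketch only: the function name, session key and $lookup callback are
// hypothetical and are not part of the Spamblocker download.

// Return the verdict for this visit, running the (slow) lookup only once.
// $session is passed by reference so that $_SESSION can be handed in on a live page.
function blocked_this_visit(array &$session, $visitorIp, $lookup)
{
    $key = 'spamblocker_verdict';

    // Look the visitor up only if there is no stored verdict for this IP address.
    if (!isset($session[$key]) || $session[$key]['ip'] !== $visitorIp) {
        $session[$key] = array(
            'ip'      => $visitorIp,
            'blocked' => (bool) call_user_func($lookup, $visitorIp),
        );
    }

    return $session[$key]['blocked'];
}
```

On a live page, this would be called after session_start() with $_SESSION as the first argument, the visitor's address from $_SERVER['REMOTE_ADDR'] as the second, and any function that returns true for a blacklisted address as the third.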
Notes:
It is easy to forget that Spamblocker is working in the background all the time, which means that a large amount of visitor data accumulates, so I've included a switch for disabling the writing of this data.
You can find the switch ('var $write_data = true;') just beneath 'var $apikey = 'abc';' in the lines of code. To disable the writing of data, change 'true' to 'false'.
Analyse your traffic regularly to check you are only screening out the baddies - you can adjust the threat level if you think you are erring too much on the side of caution.

