An email address honeypot to poison spammers' databases
I wrote this program to make it easy for anybody to create a honeypot for spambots. Unpack the tar.gz file and it is ready to go: the directories it needs are already in the archive. Because the output of this program is a set of static web pages, you can also run it on a Windows machine, so there is a .zip file as well. Then edit the Perl script so that it has the domain name and company name you want to use, set the number of pages and the number of contacts per page, save it and run it.
Spambots go around spidering the Internet, looking for websites that have email addresses. Some of these spiders are written more intelligently than others: in addition to collecting addresses, they check that the domain name exists, either by comparing it with the domain name part of the URL that they obtained the addresses from, or by doing a quick DNS lookup for the domain name.
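The first of those checks can be sketched in a few lines. This is only an illustration of what such a spider might do, written in Python rather than taken from any real spambot; the function name `domain_matches_page` and the rule that a "www."-style subdomain counts as a match are assumptions.

```python
from urllib.parse import urlsplit


def domain_matches_page(address: str, page_url: str) -> bool:
    """Return True if the address's domain matches the host the page came from."""
    if "@" not in address:
        return False
    # Everything after the last "@" is treated as the domain part.
    domain = address.rpartition("@")[2].lower()
    host = (urlsplit(page_url).hostname or "").lower()
    if not domain or not host:
        return False
    # Assumption: an exact match or a subdomain of the address's domain passes.
    return host == domain or host.endswith("." + domain)
```

This is why the honeypot pages should use a domain name that really points at the web server they are harvested from.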
There are two approaches to keeping spambots happy:
- write a script that produces a self-referencing web page full of bogus email addresses that are produced at random; and,
- produce static pages that are full of bogus email addresses.
One advantage of the first method is that it makes the spambot grind to a halt. Another is that, because the page is produced on the fly, it doesn't take up much disk space. The disadvantages are that if the spambot goes there again it will never see the same page, so it cannot confirm any of the addresses, and that producing the pages takes up processor time.
The second method has the advantage that it is static so, once it is there, the spambot can confirm its existence as many times as it likes and it will always get the same result. The disadvantage is that it can take up a lot of disk space.
As far as learning things about spambots goes, the first method can keep a connection open while you put some sort of tracing process into action, but you have to catch the spambot in the act.
So, why bother with a honeypot in the first place? The answer is simple: spammers need a connection to the Internet and they also need to be able to get their spam into the mailing system somehow. To keep their access to mail relays, they need to keep their noses clean as far as bounces are concerned. This is why we see so many spams that have no subject and no content: there is nothing in them that looks like spam, so they get through and don't bounce. By getting stuff through, the spammers improve their non-bounce rate. The addresses in a spambot honeypot will always bounce, no matter how spam-free the content is.
The program uses four plain-text database files (lists of names, some with weightings) to produce thousands of unique email addresses.
It uses a countries file, which provides labels and page names. You can add to this by adding a name and a number; the number is a weighting that alters the probability of that entry being used. You can see that the United States entry has state names as well; you can add to these and change the numbers as you wish. The only requirement is that, when you configure your sundew program, you don't ask it to choose more pages than you have lines in the countries text file.
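To show how a weighting changes the chance of an entry being picked, and why you cannot ask for more pages than there are lines, here is a rough Python sketch. It is not the Perl script's own code. The line format (a name followed by a number) and the function names `parse_weighted_lines` and `choose_pages` are assumptions made for illustration, so check the real countries file for its actual layout.

```python
import random


def parse_weighted_lines(text: str) -> dict[str, int]:
    """Parse hypothetical 'Name weight' lines into {name: weight}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the weight is the last whitespace-separated field,
        # so multi-word names such as "United States" stay intact.
        name, _, weight = line.rpartition(" ")
        entries[name.strip()] = int(weight)
    return entries


def choose_pages(entries: dict[str, int], pages: int, rng: random.Random) -> list[str]:
    """Pick `pages` distinct names, favouring those with larger weights."""
    if pages > len(entries):
        raise ValueError("more pages requested than lines in the countries file")
    remaining = dict(entries)
    chosen = []
    for _ in range(pages):
        names = list(remaining)
        weights = [remaining[n] for n in names]  # weights must be positive
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # each country gets at most one page
    return chosen
```

With weights of 20 for one entry and 1 for another, the first is twenty times as likely to be drawn on any single pick.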
It also uses a last names list text file which just has names - no weightings. Again, you can add to this but remember, the more names, the easier it is to produce a list of unique names.
In addition, it uses male and female name lists with weightings - if you want your own, you can find such lists on the Internet - this year's baby names and so on.
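The way the name lists combine into unique addresses can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the program's Perl code: the `first.last@domain` address shape, the function name `make_addresses` and its `unique` switch are all hypothetical.

```python
import random


def make_addresses(first_names, last_names, domain, count, rng, unique=True):
    """Build `count` bogus addresses from weighted first names and plain last names.

    first_names: {name: weight}; last_names: list of names without weightings.
    """
    names = [n for n, w in first_names.items() if w > 0]
    weights = [first_names[n] for n in names]
    # The number of distinct addresses is limited by the size of both lists.
    max_unique = len({n.lower() for n in names}) * len({n.lower() for n in last_names})
    if unique and count > max_unique:
        raise ValueError("not enough names to produce that many unique addresses")
    seen = set()
    result = []
    while len(result) < count:
        first = rng.choices(names, weights=weights, k=1)[0]
        last = rng.choice(last_names)
        address = f"{first}.{last}@{domain}".lower()
        if unique and address in seen:
            continue  # duplicate: draw again
        seen.add(address)
        result.append(address)
    return result
```

The retry loop is the reason a longer surname list helps: the closer `count` gets to the number of possible combinations, the more draws are wasted on duplicates.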
Finally, you can configure the program to produce as many names on as many pages as you like. You can alter the HTML that the program produces quite easily if you are familiar with hard-coding HTML. If you are not worried about humans looking at the pages, you can leave it as it is. A small-scale run of the program is viewable and you can see the type of output it produces by clicking here.
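For a feel of what one static page might contain, here is a small Python sketch that turns a list of addresses into an HTML page. The markup, the function names `render_page` and `page_filename`, and the underscore used for multi-word names are assumptions for illustration; the HTML that the Perl script really emits may look quite different.

```python
from html import escape


def render_page(company: str, country: str, addresses: list[str]) -> str:
    """Return a simple static contacts page as an HTML string."""
    title = escape(f"{company} contacts: {country}")
    items = "\n".join(
        f'    <li><a href="mailto:{escape(a)}">{escape(a)}</a></li>'
        for a in addresses
    )
    return (
        "<html>\n"
        f"<head><title>{title}</title></head>\n"
        "<body>\n"
        f"  <h1>{title}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


def page_filename(country: str) -> str:
    """Map a country label to a file name such as 'brazil.html'."""
    return country.lower().replace(" ", "_") + ".html"
```

Writing each returned string to a file under the countries directory would give the same kind of result as the second approach described earlier: pages that never change between visits.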
Once you have finished fiddling around with the program and you are happy with the results, delete all of the files in the /contacts/countries/ directory (they have names like 'brazil.html') and then run your sundew. Next, copy the contacts directory and its subdirectories to where you want them in your server subdirectories. Finally, put a 1 x 1 pixel image link pointing to the /contacts/index.html file and you are all ready.
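The 1 x 1 pixel link is just an anchor wrapped around a tiny image, so humans are unlikely to notice it but a spider following links will. A hedged Python sketch of building that snippet is below; the function name `hidden_link` and the image path `/images/pixel.png` are hypothetical, so substitute whatever image you actually use.

```python
from html import escape


def hidden_link(target: str = "/contacts/index.html",
                image: str = "/images/pixel.png") -> str:
    """Return an HTML anchor around a 1 x 1 pixel image pointing at the honeypot."""
    return (
        f'<a href="{escape(target)}">'
        f'<img src="{escape(image)}" width="1" height="1" alt="" border="0">'
        "</a>"
    )
```

Paste the resulting string into one of your ordinary pages so that spiders crawling your site find their way to the honeypot index.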
This program is designed so that the pages it outputs will sit on a server that does not normally run any SMTP service. Normally, this would be part of the SMTarPit so that any mail would be tarpitted. However, if you do run an SMTP server, the hits on it from these email addresses will bounce and have a negative effect on the spammer's relay statistics.
You need...
- a computer that runs Perl
- a web server (it doesn't have to be apache, it can be anything - there are no SSIs unless you decide to add them yourself).
- port 80 open on the firewall
- a domain name pointing to that IP address (even a domestic broadband machine can use this - go to DynDNS.Org to see how to get your own domain name for free)
You don't need...
- root access
- to have Perl in the directory that this program runs in because it doesn't call anything else (you are not running anything other than your web server when these pages are accessed anyway)
- to run it on a mainframe - a home, broadband machine will do it
- to spend money
You should have (anyway)...
- a firewall that you can configure to point port 80 traffic to your server
- Perl (nothing fancy is needed here, the basic install that comes with your OS should do) or, if you are running Windows, there is a Windows version.
- a 24/7 connection to the Internet
- a machine that you run all of the time
Just click here.
Click here to download the tarball...
Length: 259,572 Bytes.
Click here to download the zip file...
Length: 258,056 Bytes.
Click here to download an alternative image file (right) to use in the honeypot website...
Use this instead of the default image. Human visitors, even IE users doing their genealogy, will see that the email addresses are not real (and so will not try to mail their unexpected relative from Zimbabwe or Cameroon on your server), but the spambots will not be able to read it. The image is white writing on a transparent background (here, the image is in a blue-backgrounded table), so PNG-reading browsers will pick it up; for IE, it has a background colour of blue so that it is readable there as well.
Length: 3,029 Bytes.
If you are running Windows and you need to install Perl, look at the ActiveState website for a free download of ActivePerl.
0.0.2 Released 11/01/2005 259,572 Bytes
- This version has a hugely extended surname list so there is a substantially lower chance of producing the same email address. In addition, you can make it ignore the part of the process that makes sure that each email address is unique if you think it is taking too long - a quick and dirty solution but who cares? - it is their funeral.
0.0.1 Released 29/01/2005 38,837 Bytes
- This version produces an HTML list that contains many bogus email addresses in a number of pages. All you have to do is tell it how many, the (possibly bogus) company name and a domain name that points to the web server. It's as simple as that.
email paul-grosse at ntlworld dot com