Wez Furlong

I am Wez Furlong, Director of Engineering at Message Systems. My team is responsible for the fastest MTA on Earth.

I'm also a PHP Core developer and OpenSource contributor, residing in Maryland with Juliette, Xander and Lily.

2nd February 2005 @ 04:40 EDT

Today, this blog got its first ever spam, via the trackback interface. How annoying. Here's how I've stopped it (yes, the regexes could be better, and the parse_url() call eliminated, but it's late and this is a quick hack):

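<?php
// Returns true if $host is clean, false if it is listed in one of the block lists.
function ne_rbl_check($host) {
   static $lists = array('.sbl-xbl.spamhaus.org');
   // Resolve host names to an IPv4 address; gethostbyname() returns its
   // argument unchanged on failure (and for anything that is already an IP).
   $ip = gethostbyname($host);
   if (!preg_match('/^\d+\.\d+\.\d+\.\d+$/', $ip)) {
      return true; // could not resolve, so there is nothing to look up
   }
   // Block lists are queried with the octets reversed: 1.2.3.4 => 4.3.2.1.<list>
   $reversed = implode('.', array_reverse(explode('.', $ip)));
   foreach ($lists as $bl) {
      $h = $reversed . $bl;
      $x = gethostbyname($h);
      if ($h != $x) {
         // the name resolved, so the address is listed
         return false;
      }
   }
   return true;
}
// Returns true if none of the arguments is, or refers to, a listed address.
function ne_surbl_checks()
{
   $things = func_get_args();
   foreach ($things as $thing) {
      if (preg_match('/^\d+\.\d+\.\d+\.\d+$/', $thing)) {
         if (!ne_rbl_check($thing)) return false;
      }
      // preg_match_all() takes $m by reference, so it must be a plain variable
      $m = array();
      if (preg_match_all('~(http|https|ftp|news|gopher)://([^\s]+)~si',
            $thing, $m, PREG_SET_ORDER)) {
         foreach ($m as $match) {
            $url = parse_url($match[0]);
            // parse_url() can fail, or find no host, on malformed input
            if (empty($url['host'])) continue;
            if (!ne_rbl_check($url['host'])) return false;
         }
      }
   }
   return true;
}
?>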

These two functions implement RBL and SURBL checks. RBLs, as you probably already know, are real-time block lists; you can look up an IP address in a block list using DNS, and if you get a record back, that address is in the block list. The first of the two functions implements this, in a bit of a lame hackish way.
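The name that gets looked up is the fiddly part, so here is a minimal sketch of how it is built. rbl_query_name() is a hypothetical helper that is not part of the code above, and 192.0.2.10 is only a documentation address; the helper just builds the name and never touches DNS.

```php
<?php
// Hypothetical helper: build the DNS name to query for an IPv4 address
// in a block list zone. Returns null if $ip is not a dotted-quad address.
function rbl_query_name($ip, $zone) {
   if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
      return null;
   }
   // Block lists expect the octets reversed: 192.0.2.10 => 10.2.0.192
   $reversed = implode('.', array_reverse(explode('.', $ip)));
   // Accept the zone with or without a leading dot
   return $reversed . '.' . ltrim($zone, '.');
}

echo rbl_query_name('192.0.2.10', 'sbl-xbl.spamhaus.org'), "\n";
// prints 10.2.0.192.sbl-xbl.spamhaus.org
```

If resolving that name gives you an address back, the IP is listed; if the lookup fails, it is not.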

The second function implements content-based checks in the style of SURBL. It scans the text for things that look like IP addresses or URLs, extracts those IP addresses or host names, and looks each one up in the RBL using the first function (host names are resolved to an IP address first).
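To see the extraction step on its own, here is a rough sketch that only pulls the host names out of a piece of text and does no lookups. extract_url_hosts() is a hypothetical name, and the regex is a slightly stricter variant of the quick hack above, so treat it as one possible approach rather than what the blog actually runs.

```php
<?php
// Hypothetical helper: return the unique, lower-cased host names of the
// URLs found in $text, in order of first appearance.
function extract_url_hosts($text) {
   $hosts = array();
   $m = array();
   // Stop a URL at whitespace, quotes or angle brackets
   if (preg_match_all('~\b(?:https?|ftp|news|gopher)://[^\s<>"\']+~i', $text, $m)) {
      foreach ($m[0] as $url) {
         // parse_url() returns null or false when there is no usable host
         $host = parse_url($url, PHP_URL_HOST);
         if (is_string($host) && $host !== '') {
            $hosts[strtolower($host)] = true;
         }
      }
   }
   return array_keys($hosts);
}

print_r(extract_url_hosts('Cheap pills at http://spam.example/buy and https://Spam.Example/more'));
```

Each host name returned this way could then be handed to the RBL check.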

Why is this good? A comment spammer will typically want to inject a link to their site onto your blog, and you can be fairly sure that their site is listed in a good RBL. The RBL used in my sample above is an aggregation of the SBL and XBL lists which contain known spammers and known zombie/exploited machines, so it should do the job perfectly.

Now to hook it up to the blog; this snippet is taken from my trackback interface:

<?php
if (!ne_surbl_checks(get_ip(), $_REQUEST['excerpt'], $_REQUEST['url'], $_REQUEST['blog_name'])) {
   respond('you appear to be on SBL/XBL, or referring to content that is', 1);
}
?>

get_ip() is a function that determines the IP address of the person submitting the page. I haven't included it here for the sake of brevity; it's fairly simple to code one, but keep in mind that it needs to be aware of HTTP proxies. respond() returns an appropriate error message to the person making the trackback and exits the script.
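For what it's worth, here is one way a proxy-aware get_ip() might look. This is a sketch and not the function this blog uses: the $server and $trusted_proxies parameters are assumptions added so it can be tried without a web server, and it only believes the X-Forwarded-For header when the request arrives from a proxy you have listed yourself, since anyone can send that header.

```php
<?php
// A sketch of a proxy-aware get_ip(). $trusted_proxies is an assumption:
// the addresses of proxies you run yourself and therefore trust.
function get_ip($server = null, $trusted_proxies = array()) {
   if ($server === null) {
      $server = $_SERVER;
   }
   $remote = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '';
   // Only look at X-Forwarded-For when the connection came from our own proxy
   if (in_array($remote, $trusted_proxies, true)
         && !empty($server['HTTP_X_FORWARDED_FOR'])) {
      // The header is a comma-separated chain; our proxy appended the last entry
      $chain = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
      $candidate = end($chain);
      if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
         return $candidate;
      }
   }
   // Otherwise fall back to the address that actually connected to us
   return $remote;
}
```

Called with no arguments it reads $_SERVER and simply returns REMOTE_ADDR, which is the right answer when there is no proxy in front of the blog.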

And that's all there is to it; you can do similar things with your comments submission and pingback interfaces.

Enjoy.

2nd February 2005 @ 22:17 EDT

You get gopher trackbacks? You rule!

3rd February 2005 @ 17:38 EDT

Completely off-topic (apologies in advance)

On the UAE homepage I noticed that you (I think) wrote a patch to give that emulator MMU support. I wish to run NetBSD on UAE to use the Sun3 compatibility layer (long story) to execute an ancient ADA cross-compiler.

Do you still have the binaries from when you were working on this project? I briefly tried to compile it last night, and ran into problems. I will try to resolve those tonight, but I decided to try the easy approach as well by contacting you in the meantime.

Also, might you have any other ideas for running NetBSD on virtual 68k hardware?

Thank you, Toby

by Toby Hackstock kc8imd(a)comcast(.)net
3rd February 2005 @ 17:46 EDT

The MMU emulation wasn't complete enough to get to init on Linux though, and was really really slow. IIRC, there was an Atari m68k emulator that might have had MMU support. You might be better off just trying to find someone actually running NetBSD/m68k hardware and asking them for a shell.

6th February 2005 @ 05:54 EDT

As you should already be aware, using an RBL (or any number of RBL combinations) is hardly an effective solution. I had 17 attempts, each from a unique IP address, within a 3-minute time frame. Only half of them showed up in a multi-RBL check.

by c. s.