Defeat Spam Blogs With IP Based Content Delivery

Posted 1432 days ago - Traffic Building, Wordpress

The majority of bloggers are forced to deal with spam blogs (splogs, aka scraper blogs), and even though a variety of countermeasures exist, they just don't seem to do the trick. Most of the time, splogs scrape only an excerpt of the post, which makes a permalink at the bottom of the post useless. Some harvesting software is even smart enough to strip out these attempts at foiling the scrapers, so what's a blogger to do? Today, I introduce a way to deliver entirely different content to these spammers via IP based content delivery.

First Things First

In order for this to work, we'll need a list of IP addresses that known offenders use. For your convenience, I've compiled this massive list of 5,780 offending IPs (I highly recommend you use your own unique list compiled from your own server logs). Copy those IPs and save them to your server's root directory with a filename of your choice. Remember the filename; you'll need it in just a second. Now that you've got your enemy plotted, let's get to the code.
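
If you go the server-log route, something along these lines can give you a starting point. This is only a rough sketch: the helper name top_ips(), the log path, and the hit threshold are all made up for illustration, and it assumes the client IP is the first field on each log line (as in the common and combined log formats).

```php
<?php
// Hypothetical helper: count requests per client IP in an access log.
// Assumes the IP is the first whitespace-delimited field on each line
// (true for the common/combined log formats; adjust if yours differs).
function top_ips(array $log_lines, int $min_hits): array {
    $hits = [];
    foreach ($log_lines as $line) {
        $fields = preg_split('/\s+/', trim($line));
        $ip = $fields[0];
        // Skip anything that isn't a valid IPv4 address
        if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
            continue;
        }
        $hits[$ip] = ($hits[$ip] ?? 0) + 1;
    }
    // Keep only IPs at or above the threshold, busiest first
    $hits = array_filter($hits, function ($count) use ($min_hits) {
        return $count >= $min_hits;
    });
    arsort($hits);
    return $hits;
}

// Usage (path, threshold, and output filename are examples only):
// $suspects = top_ips(file('/var/log/apache2/access.log'), 500);
// file_put_contents('my-ip-list.txt', implode("\n", array_keys($suspects)) . "\n");
```

A high hit count doesn't prove an IP is a scraper (search engine crawlers hit hard, too), so eyeball the results before anything goes on your list.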

Backup, Modify, Test

Back up your theme's single.php to single.post.original.txt or something of your choice. Now, open up single.php with a text editor and insert the code below at the very top of your single.php file.

Please note: I take absolutely no credit for this code. The original source is located here.

<?php
function chkiplist($ip) {
    $lines = file("THE-FILENAME-OF-IP-LIST.txt");
    $found = false;
    // Convert the visitor's IP into a zero-padded number (1.2.3.4 becomes 1001002003004).
    // explode() is used throughout because split() was removed in PHP 7.
    $split_it = explode(".", $ip);
    $ip = "1" . sprintf("%03d", $split_it[0]) .
        sprintf("%03d", $split_it[1]) . sprintf("%03d", $split_it[2]) .
        sprintf("%03d", $split_it[3]);
    foreach ($lines as $line) {
        $line = chop($line);
        $line = str_replace("x", "*", $line);
        // NOTE: this pattern was cut off in the post. Stripping letters plus the
        // comment characters # and / is an assumption; check the original source.
        $line = preg_replace('|[A-Za-z#/]|', "", $line);
        $max = $line;
        $min = $line;
        // Wildcards: * covers a whole octet, ? covers a single digit.
        // Both replacements build on $max/$min so a line can mix the two.
        if (strpos($line, "*") !== false) {
            $max = str_replace("*", "999", $max);
            $min = str_replace("*", "000", $min);
        }
        if (strpos($line, "?") !== false) {
            $max = str_replace("?", "9", $max);
            $min = str_replace("?", "0", $min);
        }
        // Skip blank lines and lines that held nothing but a comment
        if (trim($max) == "") { continue; }
        // Ranges written as "first - last"
        if (strpos($max, " - ") !== false) {
            $split_it = explode(" - ", $max);
            if (!preg_match('|\d{1,3}\.|', $split_it[1])) {
                $max = $split_it[0];
            } else {
                $max = $split_it[1];
            }
        }
        if (strpos($min, " - ") !== false) {
            $split_it = explode(" - ", $min);
            $min = $split_it[0];
        }
        // Padded upper bound; an octet written as "4-9" contributes its high end
        $split_it = explode(".", $max);
        for ($i = 0; $i < 4; $i++) {
            if ($i == 0) { $max = 1; }
            if (strpos($split_it[$i], "-") !== false) {
                $another_split = explode("-", $split_it[$i]);
                $split_it[$i] = $another_split[1];
            }
            $max .= sprintf("%03d", $split_it[$i]);
        }
        // Padded lower bound; an octet written as "4-9" contributes its low end
        $split_it = explode(".", $min);
        for ($i = 0; $i < 4; $i++) {
            if ($i == 0) { $min = 1; }
            if (strpos($split_it[$i], "-") !== false) {
                $another_split = explode("-", $split_it[$i]);
                $split_it[$i] = $another_split[0];
            }
            $min .= sprintf("%03d", $split_it[$i]);
        }
        if (($ip <= $max) && ($ip >= $min)) {
            $found = true;
            break;
        }
    }
    return $found;
}
$status = chkiplist($_SERVER['REMOTE_ADDR']);
?>

Ok, now what? Change the third line (the file() call) so it points to the file you saved your IPs to, then look for:

<?php the_content(); ?>

Immediately before that line, add something similar to the following:

<?php if ($status == 1): ?>
Hey, thanks for scraping my post:
<a href="<?php the_permalink(); ?>" title="<?php the_title(); ?>">
<?php the_title(); ?><br />
Click here to see the site this content was stolen from!
<?php the_excerpt(); ?> <?php the_title(); ?></a>
Original Source: <?php the_permalink(); ?>
<?php else: ?>

The spam bots will see this:

Hey, thanks for scraping my post:

Defeat Spam Blogs With IP Based Content Delivery

Click here to see the site this content was stolen from!
The majority of bloggers are forced to deal with spam blogs
(splogs, aka scraper blogs), and even though[...]

Defeat Spam Blogs With IP Based Content Delivery

Original Source:

www.nullamatix.com/defeat-spam-blogs-with-ip-based-content-delivery/

We're not done yet. To prevent any PHP errors, you'll need to add this:

<?php endif; ?>

immediately after:

<?php the_content(); ?>

The whole thing, minus the chkiplist() function defined above, should look something like this:

<div class="entry-content">
<?php if ($status == 1): ?>
Hey, thanks for scraping my post:
<a href="<?php the_permalink(); ?>" title="<?php the_title(); ?>">
<?php the_title(); ?><br />
Click here to see the site this content was stolen from!
<?php the_excerpt(); ?> <?php the_title(); ?></a>
Original Source: <?php the_permalink(); ?>
<?php else: ?>
<?php the_content(); ?>
<?php endif; ?>
</div>

To test everything out and make sure your blog is up and running properly, just visit a post like you normally would. If the content displays as usual, you're good to go. To test the scraper's view, just add your IP to the list of known spammers. The script above also supports wildcards, among other variations. Check out the original source mentioned above for more details.
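
To get a feel for how the wildcard matching works, here's a stripped-down sketch of the same idea: pad each octet to three digits so an IP becomes one long number, then turn a wildcard entry into a minimum and a maximum. The helper names (ip_to_padded, wildcard_range, ip_matches) are hypothetical and not part of the script above; the sketch assumes well-formed IPv4 input and PHP 7.1 or later.

```php
<?php
// Hypothetical helpers (not part of the original script) that mirror its idea:
// pad each octet to three digits so IPs compare as plain numbers.
function ip_to_padded(string $ip): string {
    $octets = explode('.', $ip);
    return '1' . vsprintf('%03d%03d%03d%03d', $octets);
}

// Expand a wildcard entry such as "192.168.1.*" into [min, max].
function wildcard_range(string $entry): array {
    $min = ip_to_padded(str_replace('*', '000', $entry));
    $max = ip_to_padded(str_replace('*', '999', $entry));
    return [$min, $max];
}

function ip_matches(string $ip, string $entry): bool {
    [$min, $max] = wildcard_range($entry);
    $n = ip_to_padded($ip);
    // All three are 13-digit strings, so string order equals numeric order
    return strcmp($n, $min) >= 0 && strcmp($n, $max) <= 0;
}

var_dump(ip_matches('192.168.1.77', '192.168.1.*')); // bool(true)
var_dump(ip_matches('192.168.2.77', '192.168.1.*')); // bool(false)
```

So an entry like 192.168.1.* in your list file should catch every address from 192.168.1.0 through 192.168.1.255, which makes it easy to test the scraper's view without typing your exact IP.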

To Conclude...

This won't immediately work on every new splog that comes out, but if you actively check your server logs, you can stop most of 'em by adding the offending IP(s). Now for the real question: what other purposes might this nifty little script serve? Just use your imagination - there is a hidden agenda behind this entire post ;)


3 Comments

Jonathan Bailey @ 2008-03-07 14:09:12

This idea is interesting, but I have a few concerns.

First, shouldn't this code go in the RSS feed and not the single.php page? Most scraping is through the RSS feed itself and not the Web site, so I'm not sure how much is gained by putting this code in the site itself.

Second, I worry about the possibility, or perhaps probability, of false positives. Spammers change IP addresses regularly and it is entirely possible that once an IP is abandoned, it could be picked up by someone who wants to make legitimate use of the content. If the RIAA can't consistently pinpoint a person by their IP address, I don't see what hope we have.

Finally, in the same vein, since spammers change IP addresses so regularly, many will just dodge the list. It is a big part of the reason why fighters of email spam have done away with IP detection as a tool.

I like the idea in principle, but I don't think that this implementation of it is going to be effective enough to warrant the potential risks.

That is just my opinion though, I'm sure many others will disagree with me.

Thank you for writing this and for providing another tool. Even though I won't be using it, I hope that, perhaps, others are able to find it productive!

Guy Patterson @ 2008-03-08 08:59:51

Jonathan,

Thanks for taking the time to comment on the post. Your first suggestion makes complete sense and I honestly can't believe I failed to mention that in the article. You're right, most scrapers probably do scrape RSS feeds rather than the HTML.

False positives are inevitable, no doubt about it.

Your third and final remark again makes total sense, but geographically speaking, if you're not concerned with Russian and/or Romanian visitors, this little script would work perfectly.

The real reason I wrote this article wasn't because I wanted to block splog bots. Just use your creativity and imagine what else you could use this for :)

 
 
Victoria Whitehead @ 2009-02-09 11:24:54

Hi There,

I am very interested in your article, as I have been experiencing huge amounts of spam recently and am looking for a solution. I currently use the CMS Joomla and had a forum. The forum has since been taken down, because I was unable to control it.

Since then we are still receiving spam through our online mail form. Is there anything you can suggest that would prevent this? I was thinking about the picture verification tools, where people have to enter the letters displayed in the image.

Also, I'm intrigued to know what else this script could be used for. Is it to display different content to people from different locations...? I.e. have a UK page, a US page etc.

 
