
Fighting back against the scrapers

Posted on 31st Aug 2007 | 13 comments

I’ve caught one or two blogs scraping BloggingTips in the last week or so. Unfortunately, it’s becoming more and more common. Truth be told, there is no foolproof way to stop someone taking your content; if someone is completely committed to ripping you off, they will! However, scrapers are by their very nature lazy people who would rather plagiarize someone else’s work than get off their ass and write their own content.

Therefore, by implementing a few deterrents you can discourage the scrapers from ripping off your blog.

Use a partial feed

Providing only a summary of your blog posts in your feed will not please your readers, but it will discourage scrapers from using your content. It’s probably not the best solution, but it’s certainly an option.
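To make the idea concrete, here is a minimal Python sketch of what a partial feed does to a post: strip the markup and keep only the first few words. Most blogging platforms have a setting for this, so you would not normally write it yourself; the function name `make_summary` and the 55-word default are assumptions for illustration only.

```python
import re


def make_summary(post_html, max_words=55):
    """Return a plain-text excerpt of a post for use in a partial feed."""
    # Crude tag stripping: fine for a sketch, not a full HTML parser.
    text = re.sub(r"<[^>]+>", " ", post_html)
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    # Mark the cut so readers know there is more on the blog itself.
    return " ".join(words[:max_words]) + " [...]"


if __name__ == "__main__":
    print(make_summary("<p>Scrapers are lazy people who copy feeds.</p>", max_words=4))
```

A scraper pulling this feed only gets the excerpt, which is also all your subscribers get, so the trade-off described above still applies.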

Place a copyright in your feed footer

By placing a copyright notice in the footer of your RSS feed, you are letting everyone know that if they are not reading the content through their newsreader, the content has been ripped off from your blog.

For example, I recently added Angsuman’s Feed Copyrighter for WordPress. Here’s an example of the copyright notice it adds to the end of your feed:

Copyright © 2007 Blogging Tips. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at is guilty of copyright infringement. Please contact legal at bloggingtips.com so we can take legal action immediately.

The copyright notice just uses basic HTML, so it would also be very easy to place an image in the copyright footer if you wanted.
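If you are curious what a footer like this amounts to, the sketch below appends a small HTML notice to each feed item. It is not how the Feed Copyrighter plugin is implemented; the function name `add_feed_footer`, the shortened notice text and the link back to the original post are my own assumptions.

```python
from html import escape

# Shortened version of the notice quoted above; adjust the wording to taste.
NOTICE = (
    "This feed is for personal, non-commercial use only. "
    "If you are not reading this material in your news aggregator, "
    "the site you are looking at is guilty of copyright infringement."
)


def add_feed_footer(item_html, site_name, year, post_url):
    """Append a copyright footer (plain HTML) to one feed item."""
    footer = (
        f"<p><small>Copyright &copy; {year} {escape(site_name)}. {NOTICE} "
        f'Originally published at <a href="{escape(post_url, quote=True)}">'
        f"{escape(post_url)}</a>.</small></p>"
    )
    return item_html + "\n" + footer


if __name__ == "__main__":
    print(add_feed_footer("<p>Hello</p>", "Blogging Tips", 2007,
                          "http://example.com/my-post"))
```

Because the footer travels with every item, a scraper who republishes the feed untouched also republishes the notice and a link back to you.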

Link to your blog posts

Not only is linking internally a great way to distribute PageRank and promote older posts, it also hurts scrapers because the copied posts carry more links back to the author’s blog. It would be very time-consuming for them to remove all references to your blog, so most won’t bother. If they leave the links in, then at least you will get some traffic in return: it’s pretty obvious to readers that they are on a scraper blog when they see tons of references to the same site, so they are more likely to visit your blog than to keep browsing posts on the scraper’s website.

Delete their pingbacks

One of the most common ways to find out someone is scraping your blog is through pingbacks. When they publish your post on their blog, you will get a pingback on each of your own posts that the copied post links to. To catch them this way, you need to be receiving pingback notifications via email, and you need to link internally in your posts so that a pingback is generated.

When the pingback is generated, they get a link to their blog from yours. Don’t let them get any traffic from you: mark their pingback as spam (or delete it) so that their copycat blog gets nothing from you.
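Once you know which domains are scraping you, sorting their pingbacks from the genuine ones is easy to automate. The Python sketch below shows the idea, assuming you keep your own list of scraper domains; the function names and the `copycat.example` domain are made up for illustration, and your blogging platform’s comment blacklist may already do the same job.

```python
from urllib.parse import urlparse


def is_scraper_pingback(source_url, blocked_domains):
    """True if the pingback comes from a blocked domain or one of its subdomains."""
    host = (urlparse(source_url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def split_pingbacks(pingback_urls, blocked_domains):
    """Split pingback source URLs into (keep, spam) lists, preserving order."""
    keep, spam = [], []
    for url in pingback_urls:
        if is_scraper_pingback(url, blocked_domains):
            spam.append(url)
        else:
            keep.append(url)
    return keep, spam


if __name__ == "__main__":
    keep, spam = split_pingbacks(
        ["http://www.copycat.example/stolen-post", "http://friend.example/nice-post"],
        {"copycat.example"},
    )
    print("keep:", keep)
    print("spam:", spam)
```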

Stop people hotlinking images from your site

Hotlinking is when a webmaster embeds images hosted on someone else’s website without permission. This can be a big problem for some websites because it can use up a lot of bandwidth. Thankfully, there are ways to reduce hotlinking, and these methods can also be used to discourage scrapers from stealing your content.

To stop hotlinking you need to edit your .htaccess file. You can block every other site on the web from using your images, or you can specify which sites are not allowed to hotlink. You can also serve a replacement image to let everyone know the other person is hotlinking. For example, I could link to your blog’s logo from here using the img tag. If you did not like me doing this, you could edit your .htaccess file to show a warning image instead of your logo.

This is very useful against scrapers because the scraped copy of your post still loads its images from your server. You can have some fun and make all your images appear as donkeys on their blog!
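The decision behind a hotlink rule is simple: look at the Referer header the browser sends with the image request and swap the image if the request comes from a page on someone else’s site. In practice this lives in your .htaccess file, as the tutorials listed below describe; the Python sketch here only illustrates the logic. The host names, the warning image path and the function name `image_to_serve` are all assumptions.

```python
from urllib.parse import urlparse

# Hypothetical values: replace with your own domain(s) and replacement image.
ALLOWED_HOSTS = {"example.com", "www.example.com"}
WARNING_IMAGE = "/images/hotlink-warning.gif"


def image_to_serve(requested_path, referer):
    """Return the path to serve for an image request, based on the Referer header."""
    if not referer:
        # No Referer at all: direct visits and many feed readers. Let them through.
        return requested_path
    host = (urlparse(referer).hostname or "").lower()
    if host in ALLOWED_HOSTS:
        return requested_path
    if requested_path == WARNING_IMAGE:
        # Never replace the warning image itself, or it could not be displayed.
        return requested_path
    return WARNING_IMAGE


if __name__ == "__main__":
    print(image_to_serve("/images/logo.png", "http://www.example.com/some-post"))
    print(image_to_serve("/images/logo.png", "http://scraper.example/stolen-post"))
```

Letting requests with an empty Referer through is a deliberate choice in this sketch, so that subscribers reading in a desktop newsreader still see your images.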

Here are some hotlinking tutorials which will help you stop blog scrapers :)

Stop Hotlinking with htaccess
How To Stop Hotlinking and Bandwidth Leeching
Hot Link Checker

Respond legally

From my experience, scraper blogs don’t last too long. The blogs I caught scraping me two months ago are no longer doing so; I suspect that most scrapers get bored and move on to something else.

However, if you have one or two people scraping your blog on a consistent basis, it might be worthwhile going down the legal route. Legally, hosting companies need to remove websites with illegal or copied content from their servers.

Check out the 3 steps of a legal response by Advanced Business Blogging.

  1. Legal response 1.0: email and letter to offending party
  2. Legal response 2.0: direct your complaint to the offender’s host/ISP
  3. Legal response 3.0: consult with a lawyer

You can read the complete guide for the above steps here.

Video blog more often

The Gypsy Bandito thinks the best way to stop scrapers is to do video blogs instead of writing posts.

Overview

Having your blog scraped can be really frustrating but a few easy steps can discourage content stealers from trying to rip off your blog.

I hope that blogging platforms will start to include features that stop or reduce blog scraping, but until then we will have to deal with these idiots ourselves.

Good luck :)


Kevin Muldoon is a professional blogger with a love of travel. He writes regularly about topics such as WordPress, Blogging, Productivity and Social Media on his personal blog and provides support to bloggers at Rise Forums. He can also be found on Twitter @KevinMuldoon.

13 comments
  • Posted by BeachBum on 31st Aug 2007

    I also put a few advertisements in my feed footer. Not enough to get readers to unsubscribe, but enough that I can make some money from the scrapers. I have a story about a scraper that stopped and moved on to easier prey, but too long for here.

    BeachBum

  • Posted by Cash Quests on 31st Aug 2007

    Kevin,

    That .htaccess to stop hotlinking is gold! Thank you!

  • Posted by Kevin on 31st Aug 2007

    BB – That's a good suggestion. An ad in the footer is surely gonna piss off the scrapers.

    Kumiko – no problemo :)

  • Posted by Bill Cook on 31st Aug 2007

    Another idea is that if lots of people are hotlinking an image, you simply replace it with an advert for your blog (but with the same filename). Maybe you'll get some extra traffic? I did this once on my recipe blog, though there's no way to know how much traffic this generated.

    Cheers,

    Bill.

  • Posted by CT Moore on 31st Aug 2007

    Good post, but the thing about partial feeds is that they don't really stop anyone. I know for a fact that the SearchAnyway Blog gets scraped regardless of a partial feed.

    And you're right, it really pisses off readers. For example: http://www.wolf-howl.com/blogs/putting-partial-fe

  • Posted by Mani Karthik on 31st Aug 2007

    Kev, Thanks for notifying about the feed copyrighter plugin.

  • Posted by Kevin on 31st Aug 2007

    Bill – How did you do this? It's fairly straightforward to use a banner or logo instead of an image when they hotlink, but how do you turn that image into a link to your blog? Did you use htaccess?

    CT – I'm surprised that a scraper would choose a blog which uses a partial feed. If I were a scraper I would want to rip off a feed that had more content. Though I probably shouldn't be surprised; some of these guys are too lazy to even check the feed they're ripping off!

    I actually agree with the wolf on this one. I think that providing a good experience for your readers should be above trying to get more page views.

    Mani – no problem bud. :)

  • Posted by Charles Lau on 31st Aug 2007

    It is terrible to see pingbacks only from scrapers… Some of them even claimed to be "inspired" by our blogs…

    Thanks for the tip…

  • Posted by Allan on 1st Sep 2007

    I have just written a post on RSS republishing.

    What are your views on webmasters republishing your feed with attribution or part of your feed?

  • Posted by Allan on 1st Sep 2007

    Allan – Can you define what you mean by republishing your feed? If you mean someone reproducing all the content from a blog I run without permission then I'd label them the same as scrapers

  • Posted by Flug Brasilien on 14th Aug 2008

    It's a really nasty problem, especially since Google will punish your blog for duplicate content. The only thing you can do is prosecute for copyright violation, which can be a lengthy process.

  • Posted by kredit move on 5th Nov 2008

    Thanks for your tips!

  • Posted by Martin Bay on 14th Sep 2010

    Thanks for the good post. I've just added some of these tips to my site. Google just dropped my site from page three to nowhere. I'm really not impressed with Google; it should not be that difficult to identify scrapers!