March 9th, 2008
If You’re Going To Rip Me Off, Get My Name Right
With the amount of information we can readily find on the internet, it’s not hard to find authors that have very similar writing styles to ours and discuss the same topics. But how often do we read an article on these sites only to find that we were the original authors and our story was scooped without permission?
Content theft is nothing new in the world of publishing, and it’s gone on for thousands of years. There are all kinds of people that are too lazy to think for themselves and wish to make a buck off someone else’s brainpower, robbing the true creator of the credit and respect they may (or may not) deserve. But that said, many of these lazy content thieves are also quite successful in their subversive craft.
I’ve recently found a few sites that are clearly filled with content scraped from other sites. What bugs me the most about these sites is that the dolts who operate the things rip off an entire post, change the name of the author, put some lame introductory sentence in place saying “they found it”, and then have the audacity to post a link back to the original article. When I see this pop up in my “Incoming Activity” box, I check out the site to find that my name has been replaced with something stupid like “Yahoo! News Search Results for surfer”. In this particular case, the blog that’s stealing my content and changing my name is run by someone named “surfer”.
First off, surfer — if that is your real name, which I doubt — don’t post sections of my posts without permission unless you have something of value to add. If something I’ve written warrants a post on your blog, that’s great. I really have no problem if someone posts parts of my articles to poke holes in my arguments. But to take a section and add no value whatsoever, change my name, and then post a link back to my site is just insulting.
There have been many, many more instances of this in the past, but they all follow the very same formula. I’m quite certain that anyone who has been blogging for more than six months has had at least one post completely ripped off and posted elsewhere.
Google’s Not Stupid, So Don’t Be Lazy
Naturally, this “poster” is nothing more than a bot that rips through RSS feeds searching for content that matches certain criteria (perhaps), makes a few subtle changes and then automatically posts it to anywhere from one to one hundred other sites. Heck, how else could these relatively new blogs build 1000+ articles in a month? What I don’t understand, though, is how the site creators actually expect to make money.
Duping people online can be pretty easy, but if nobody knows your site exists, then you’re not going to get very far.
I’m hardly an expert on stealing content for my own financial gain, but if I were, I could tell you that I’d do it very differently. I’d want Google to pick up my site. I’d want Google to index it. Heck, these splogs that take our work are often loaded with AdSense. The amount of work that goes into building one of these sites is perhaps minimal, and the initial startup cost can be as low as $6 if you just need to buy a domain. So why the rush to fill the site with content?
Building a successful splog should be done in baby steps. This way, it looks and feels legit. Here’s how I’d do it:
- Crawl for key words and steal just the ones that fit into the most lucrative markets
- Eliminate all links in the post and change the name of the author to whatever the main “author’s” name of the splog should be
- Add some automated SEO keyword tools
- Have the automated agent post no more than 3 articles per day, queuing the others to ensure a steady posting schedule
- Crawl for comments, and post some legit-sounding comments on the site every few days
- Don’t ever link to the original content
- Give the site six months to take hold
Sure, the site would have a lousy PageRank, and it would be a splog to anyone that looked closely at it. But it would be one of those sites that could fly under the radar for a long, long time. This could potentially offer a greater return on investment, as the site would eventually emerge from the Google Sandbox and start appearing (potentially) in some pretty juicy spots in Google’s search results.
I’d be willing to bet that someone could easily buy themselves ten domain names for various niches and set up a crawler to do this in the span of a single weekend. Initial cost would be about $100 USD, and the return would likely appear four months after pressing “Go”.
So why rush? If you’re going to steal our work, why not take your time and our credit?
It’s Jason with a ‘J’, Thank You
I like to see people link to this site. It lets me know if I’ve done something right, or if I’m just being an idiot on a subject. When people post snippets of my articles, I’m quite happy to see that they’ve spelled my name correctly and provided a link to the source, regardless of what they’re saying about my mental faculties. That said, it’s terribly disappointing when I follow that new “Incoming” link and find that my content was stolen along with two thousand articles from other writers around the blogosphere.

There’s a WordPress plugin that automatically adds a copyright message to your RSS feed. You’ve probably seen it before, but it is worth mentioning anyway. I doubt it would stop the “bots” completely, but it might help curb the plagiarism a little.
Yep, I have Taranga’s copyright plugin installed and active on all of my sites. When it was first installed, I noticed that a few sites stopped ripping my entire post. However, since December, I’ve noticed that it’s not really stopping people anymore.
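For anyone curious what a plugin like this does under the hood, here is a minimal sketch in Python. It is not the plugin’s actual code: the function name `add_copyright_notice`, the wording of the notice, and the assumption of a plain RSS 2.0 feed with `item`, `link`, and `description` elements are all mine, for illustration only.

```python
import xml.etree.ElementTree as ET


def add_copyright_notice(rss_xml: str, author: str, site_url: str) -> str:
    """Append a copyright line to every item description in an RSS 2.0 feed.

    Hypothetical helper: assumes un-namespaced <item>, <link> and
    <description> elements, as in a basic RSS 2.0 document.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        # Point the notice at the post's own URL, falling back to the site URL.
        link = item.findtext("link", default=site_url)
        notice = f"<p>Copyright {author}. Originally published at {link}.</p>"

        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        # ElementTree escapes the markup when serializing, which is how
        # HTML is normally carried inside an RSS description.
        description.text = (description.text or "") + notice
    return ET.tostring(root, encoding="unicode")
```

A scraper that republishes the feed verbatim carries the notice along with the post, which at least tells readers where the article came from. A scraper that rewrites the text can strip it just as easily as it swaps out the author’s name.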
Perhaps it’s time to open a can of Whoop-Ass and have Chuck Norris hunt them down. Less than an hour after posting this article, it appeared on five freakin’ splogs, and each splog said the content was authored by someone different.
I wonder if these are the same people that download music and then try to pass off the lyrics as their own.
Wow, it’s interesting to read this, as I currently have the same issue. The worst part is that my incoming links and Technorati make it look like the entire post is being ripped, but when I try to browse to the site, it won’t load. Makes me wonder what they are really doing with my content (not that I’ve been creating much lately).
Regardless of how often we create content, it sucks when somebody else steals it and doesn’t give us any credit aside from a worthless backlink.
I really wish web hosts would make it easy for people to report splogs and offending content. Rather than try and complain to a fictional “admin” on these sites about their use of our content, I’d rather have some other body go into the database and delete our work.
Of course that would open up a whole can of worms … but still. It’d be nice to have some options for dealing with these dolts.
Two plugins you might be interested in are Antileech, which can detect some scraper bots and redirect them to a fake feed, and CopyFeed, which adds the IP address of the scraper to the feed and lets you ban them from your site.
Both plugins seem to work pretty well and are highly recommended by my readers. They’re both worth checking out.
On that note, I’m sorry that you’ve had such a major problem with this. If there is anything I can do to help, please let me know!
Thank you for the suggestions, Jonathan. I’ll check them out tonight.
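The IP-stamping idea is simple enough to sketch in a few lines of Python. This is not CopyFeed’s real code, just a rough illustration under my own assumptions: the `feed-request-ip` marker format and the names `stamp_feed`, `find_scraper_ips`, and `serve_feed` are all hypothetical, and it only works if the scraper republishes the feed without stripping the marker.

```python
import ipaddress
import re
from typing import Optional, Set

# Hypothetical marker format; a real plugin may embed the address differently.
MARKER = re.compile(r"feed-request-ip:([0-9a-fA-F:.]+)")


def stamp_feed(feed_text: str, requester_ip: str) -> str:
    """Append a marker recording which IP address requested this copy of the feed."""
    ipaddress.ip_address(requester_ip)  # raises ValueError if the address is malformed
    return f"{feed_text}\n<!-- feed-request-ip:{requester_ip} -->"


def find_scraper_ips(scraped_page: str) -> Set[str]:
    """Collect every stamped address that shows up in a republished copy."""
    return set(MARKER.findall(scraped_page))


def serve_feed(feed_text: str, requester_ip: str, banned: Set[str]) -> Optional[str]:
    """Return a stamped feed, or None when the requester has been banned."""
    if requester_ip in banned:
        return None
    return stamp_feed(feed_text, requester_ip)
```

Once a post turns up on a splog, running `find_scraper_ips` over the copied page gives the addresses to add to the ban set, and `serve_feed` then refuses them on their next visit.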
Here’s an evil idea: if they are just scraping and not filtering at all, create a post with a ton of backlinks to your site and what-not, talking about how great your site is (build yourself up) and noting that it was originally posted on your site.
See what happens then.
That’s not a bad idea, Nick … it seems that almost everything I’ve written in the last few weeks has appeared on a splog within 20 minutes of being posted here, so even if it’s just a temporary thing, I’ll be laughing.