Last week, shortly after I posted this, I started to receive something like 50 trackbacks/pingbacks per day that had nothing to do with that post. After a little inspection I began to suspect that it had to do with the word combination “Flash issues”, which was maybe considered close enough to “flesh issues”, as many of the blogs linking to me were in some way connected to spam about drugs, operations, weight loss etc.

What are pingbacks/trackbacks?

They are an automatic feedback mechanism for the blogging community: an automatic way to notify a blog post that other blog posts have linked to it. A pretty important and useful thing, huh?

The problem is the “automatic” part. Blogs do it automatically for the user: you link to some post, and your blog tries to send a pingback/trackback comment to it. Because it is automatic, means of filtering such as CAPTCHAs obviously will not work.
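To make the “automatic” part concrete, here is a minimal sketch of what a pingback looks like on the wire. As far as I understand the Pingback 1.0 spec, the linking blog makes an XML-RPC call named pingback.ping with two strings: the URL of the linking post and the URL of the post being linked to. The URLs below are made-up placeholders, and the helper name build_pingback_request is my own. Trackbacks use a different mechanism and are not covered here.

```python
import xmlrpc.client


def build_pingback_request(source_url: str, target_url: str) -> str:
    """Return the XML-RPC request body for a pingback.

    source_url: the post that contains the link (the sender).
    target_url: the post being linked to (the receiver).
    """
    return xmlrpc.client.dumps((source_url, target_url), methodname="pingback.ping")


# Placeholder URLs, for illustration only.
payload = build_pingback_request(
    "https://example.org/their-post",
    "https://example.com/my-post",
)
print(payload)
```

No human is involved at any step, which is why there is nowhere to put a CAPTCHA. Nothing is sent here; the sketch only builds the request body.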

As a first measure I turned off pingbacks/trackbacks for that post. Then I started to investigate more.

Further investigation

The first thing I noticed was that the text of those posts did not link to me at all. Then, a little below the text, I noticed a “Related blog posts” block where the link to my post was listed… At that point some doubt hit me. What if this was some weird blog plugin that for some reason found my post related to theirs, and the trackback/pingback was just a side effect? But the posts were so different in nature that it was hard to believe a plugin would fail so miserably. Those blogs were also very weird and spammy, so my conclusion was that it is just a tricky way to hide the spamming nature of all this. I also suspected that those links to me were irrelevant and of no use to my blog.

The next thing, which I found funny, was what happened to the visits/traffic on my blog over the following days. Since this type of spam only works if those blogs link to me, they were also sending people in my direction… So visits and traffic jumped to some 5-20 times what I had before… So why stop them if they drive traffic my way? Well, because those visitors have nothing to do with my blog. They don’t subscribe to RSS, don’t leave relevant comments etc etc… They only eat traffic and don’t bring any value. They are not my clients. It’s like crowds of people interested in drugs coming to a book store. The book store does not get much from it, or rather may even lose real customers…

After some more inspection I found out that 50% of visitors were spending less than 30 seconds on my site, and those are exactly the ones who are not truly my visitors.

Then again, almost 20% spend half an hour or even more, and those probably are my visitors. I don’t know how AWStats collects this info, so I can’t be certain.

Anyway, it is pretty clear that I don’t want to be involved in that kind of activity.

What have I done so far?

So what ways are there to battle this?

The first one, and the one I was using before, is CAPTCHA. But as I mentioned, it does not and should not work for trackbacks/pingbacks. And that’s something those people exploit.

Another thing, which I switched on the day after it started, was moderation. This way no comment is shown until I personally approve it. I hate that, to be honest: on my blog because I need to spend time approving comments, and on other sites because it slows down communication, adds a feeling of censorship etc etc. So I did turn it on, but as a temporary solution. In the next 3 days I received ~150 spam posts to disapprove. But at least they were not shown publicly.

Akismet

Another, and probably the best, solution to spam is collaborative spam filtering, something I think was first tried by Google in Gmail. What is the aim of spam? To push some information to as many people as possible, to advertise it almost for free. The same or almost the same info sent to as many people as possible who don’t want to receive it? Don’t you see a weakness here? What if this mass unwillingness to receive something could be exploited to build something like a learning, collaborative spam filtering service? Simply put, imagine it like this. Each mail you receive goes through the service. You receive spam. You mark it as spam. Your report is sent to the service. The service receives some 100 such complaints about that message. It starts to filter that message and automatically mark it as spam for other users. A kind of collaborative filtering. Of course it is not as simple as that, but that is part of it.
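The idea can be sketched in a few lines of Python. This is only a toy model of the “100 complaints” story above, not how Gmail or any real service works. The class name, the fingerprinting by normalized text and the threshold parameter are all my own assumptions for illustration; real services have to cope with messages that differ slightly from copy to copy.

```python
import hashlib
from collections import defaultdict


class CollaborativeSpamFilter:
    """Toy model: a message counts as spam once enough distinct users report it."""

    def __init__(self, threshold: int = 100):
        self.threshold = threshold
        # fingerprint -> set of user ids that reported this message
        self._reporters = defaultdict(set)

    @staticmethod
    def fingerprint(message: str) -> str:
        # Assumption: lowercasing and collapsing whitespace is enough to
        # recognise "the same" message. Real spam varies far more than that.
        normalized = " ".join(message.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report_spam(self, user_id: str, message: str) -> None:
        # A set means one user reporting twice still counts once.
        self._reporters[self.fingerprint(message)].add(user_id)

    def report_not_spam(self, user_id: str, message: str) -> None:
        # "Unspam": withdraw this user's complaint, if there was one.
        self._reporters[self.fingerprint(message)].discard(user_id)

    def is_spam(self, message: str) -> bool:
        reporters = self._reporters.get(self.fingerprint(message), set())
        return len(reporters) >= self.threshold


# Small threshold so the example is easy to follow.
spam_filter = CollaborativeSpamFilter(threshold=3)
for user in ("alice", "bob", "carol"):
    spam_filter.report_spam(user, "Buy cheap pills now!")

print(spam_filter.is_spam("buy   cheap pills NOW!"))
print(spam_filter.is_spam("Nice post about Flash issues"))
```

Counting distinct reporters rather than raw reports is a deliberate choice here: one annoyed user clicking “spam” a hundred times should not be able to ban a message on their own.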

After looking around for a bit I stumbled on Akismet. Akismet is a plugin/service that works as an automatic filter on the comments your blog receives. It marks spam automatically and puts other messages into the approval queue. I am not sure whether it can approve comments automatically without my approval, as I did not receive any real comments during that week. When it misses something and you mark it as spam, that comment is sent back to the service to learn from. As a result spam detection becomes better for you and for other users of the service. A worse situation is possible though: it may automatically mark a real comment as spam. In this case you can unspam it, and the comment will be sent to the service so it learns that it is not spam.
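On WordPress the plugin does all of this for you, but the check / mark-as-spam / unspam loop can also be pictured as three web requests. The sketch below only builds such requests and sends nothing. The base URL, the action names (comment-check, submit-spam, submit-ham) and the field names follow my reading of Akismet's public REST API, so treat them as assumptions and check the current Akismet documentation before relying on them. The API key, blog URL, IP address and comment values are placeholders, and the helper build_akismet_request is my own.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: base URL as I understand it from Akismet's public docs.
AKISMET_BASE = "https://rest.akismet.com/1.1"

# check a comment / report missed spam / report a false positive ("unspam")
ACTIONS = {"comment-check", "submit-spam", "submit-ham"}


def build_akismet_request(action: str, api_key: str, blog_url: str, comment: dict) -> Request:
    """Build (but do not send) a POST request for one of the three actions."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    fields = {"api_key": api_key, "blog": blog_url, **comment}
    body = urlencode(fields).encode("utf-8")
    return Request(f"{AKISMET_BASE}/{action}", data=body, method="POST")


# Placeholder data describing one incoming pingback.
pingback = {
    "user_ip": "203.0.113.7",
    "user_agent": "ExampleBlog/1.0",
    "comment_type": "pingback",
    "comment_author_url": "https://example.org/their-post",
    "comment_content": "[...] related blog posts [...]",
}

check = build_akismet_request("comment-check", "YOUR_API_KEY", "https://example.com", pingback)
missed = build_akismet_request("submit-spam", "YOUR_API_KEY", "https://example.com", pingback)
print(check.full_url)
print(missed.full_url)
```

The same comment data goes to all three actions; only the endpoint changes. That is what lets the service learn from both kinds of mistake.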

In the end it is all about numbers. There are fewer spammers than there are targets, so such a service succeeds.

I had heard about Akismet before but did not have a need for it. Now I had both the need and a use case to test it on. How well does it perform? So far it has filtered out 95% of comments, and all of them were spam as far as I can tell. Sadly the other 5%, which it did not filter out, were spam too. Still, it is 90-95% less work for me :)

What now?

Even though I filter out those comments and they do not get published, my blog is still mentioned on all those spammy blogs. And I still get way too many irrelevant visits. I wonder when they will realize that they are not linked back and do not gain anything from my blog anymore.

But I will keep a curious eye on it all. It is kind of interesting, isn’t it?