
Are Those Robots or People Clicking on Your Site?

The use of robots to crawl the Internet and grab as much information as possible, often with malicious intent, is nothing new. Website owners getting smarter and ensuring that they are protecting their consumers (from both the robots and third-party deals) is nothing new, either, but the numbers are getting out of control.

The robots have landed.

It should come as no surprise to you that some of the traffic to your websites and mobile apps is not coming from real human beings. There are spiders out there crawling websites to index them, there are malicious hackers poking and prodding away to find a moment of vulnerability and, of course, there are technologies in place to track people and their usage. It sounds a little too Orwellian for some, but it's a functional part of the digitization of media.

The bigger question: Is all of this non-human traffic getting to be a little too much?

Last week, Tom Foremski had a post over at ZDNet titled, "Report: 51 per cent of web site traffic is 'non-human' and mostly malicious." The title of the news piece tells the entire story. Before looking at the two major issues that need to be thought about moving forward, here is how website traffic breaks down, according to a study by Incapsula, a company that provides cloud-based security for websites, based on a sample of 1,000 of its client websites:

  • 5 per cent is hacking tools searching for an unpatched or new vulnerability in a website.
  • 5 per cent is scrapers.
  • 2 per cent is automated comment spammers.
  • 19 per cent is from "spies" collecting competitive intelligence.
  • 20 per cent is from search engines -- non-human traffic, but benign.
  • 49 per cent is from people browsing the Internet.
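
Incapsula's figures come from its own network, but a publisher can get a rough sense of the split on its own site from ordinary server logs. The sketch below is only a minimal illustration of the idea, not Incapsula's methodology: it assumes logs in the common "combined" format (with the user-agent as the last quoted field), the keyword lists and the access.log path are hypothetical placeholders, and it will only catch robots that identify themselves -- the malicious kind routinely spoof their user-agent.

```python
import re
from collections import Counter

# Illustrative keyword lists only: real bot detection needs more signals
# than the user-agent string, since malicious crawlers routinely spoof it.
BENIGN_BOT_HINTS = ("googlebot", "bingbot", "slurp", "duckduckbot")
SUSPECT_BOT_HINTS = ("scrapy", "python-requests", "curl", "wget", "httpclient")

# Combined log format: the user-agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def classify(user_agent: str) -> str:
    ua = user_agent.lower()
    if any(hint in ua for hint in BENIGN_BOT_HINTS):
        return "search engine"
    if any(hint in ua for hint in SUSPECT_BOT_HINTS):
        return "suspect bot"
    return "presumed human"

def summarize(log_path: str) -> Counter:
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = UA_PATTERN.search(line)
            ua = match.group(1) if match else ""
            counts[classify(ua)] += 1
    return counts

if __name__ == "__main__":
    totals = summarize("access.log")  # hypothetical log path
    grand_total = sum(totals.values()) or 1
    for label, count in totals.most_common():
        print(f"{label}: {count} ({100 * count / grand_total:.1f}%)")
```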

The high cost of living.

Who pays for this traffic? You do. Along with the server and usage costs, all of this non-human traffic affects overall performance as well. The more people and technology sucking up bandwidth, the slower your servers respond. If over half of this traffic isn't even real people, just imagine what your bandwidth and server costs could look like. Above and beyond that, what is the likelihood of this non-human traffic decreasing?
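
To make that concrete with purely made-up numbers (the transfer volume and per-gigabyte price below are hypothetical, and the arithmetic assumes bandwidth scales roughly with request count):

```python
# Back-of-the-envelope illustration with hypothetical numbers, not real pricing.
monthly_transfer_gb = 2_000   # hypothetical monthly data transfer
cost_per_gb = 0.12            # hypothetical bandwidth price, in dollars
non_human_share = 0.51        # the figure from the Incapsula study

bot_cost = monthly_transfer_gb * cost_per_gb * non_human_share
print(f"Estimated spend on non-human traffic: ${bot_cost:,.2f}/month")
# Prints: Estimated spend on non-human traffic: $122.40/month
```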

Marketing is becoming that much more sophisticated and technologically inclined, so these types of pings and pokes are clearly going to increase over the short (and long) term. Once this gets to the point where more website owners are aware of this intrusion, the government will step in and legislate it. Nobody wants government intervention here, but this is another prime case of technology and new media companies stepping over the line by the sheer act of overdoing it.

The third-party problem.

It's one thing for websites to track their own usage and to allow non-human crawlers from search engines to index their pages in order to rank higher. But -- if you look at the list above -- you'll note that scrapers, automated comment spammers, and spies are all third parties trying to leverage the website for their own marketing initiatives. That makes up over 25 per cent of all traffic. And this infiltration by uninvited third parties is only a small fraction of the issue.

What about the other third parties that the website has partnered with and given access to its pages and its users? It's hard to imagine what that combined piece of website traffic may look like. We have to remember that most consumers simply don't understand the terms and conditions of a website and have little knowledge or understanding of all of the tracking that is happening. The number must be nothing short of astounding.

It's time for fair play.

If we, as the New Media collective, do not start governing ourselves, you can rest assured that public outcry will increase and the government will step in. What information are we keeping, what information are we tracking, and do we need it all? Understandably, it will be next to impossible to stop the malicious spies and infiltrators that are leveraging this information for spam (and knowing that this clocks in at over twenty-five per cent of all website traffic, it should come as a rude awakening for publishers), but the crawling and sniffing that we can control should be looked at with a discerning eye.

The use of robots to crawl the Internet and grab as much information as possible, often with malicious intent, is nothing new. Website owners getting smarter and ensuring that they are protecting their consumers (from both the robots and third-party deals) is nothing new, either, but the numbers are getting out of control and they're only going to increase.

It's time to act. What are we going to do about it?

Mitch Joel is president of Twist Image -- an award-winning digital marketing agency. His first book, Six Pixels of Separation, named after his highly successful blog and podcast, is a business and marketing bestseller. His next book, CTRL ALT DEL, comes out in Spring 2013.
