Pushing Bad Data – Google's Latest Black Eye

Article by Hribar Thornbrugh

Google stopped counting, or at least publicly displaying, the number of pages it indexed in September of 2005, after a school-yard "measuring contest" with rival Yahoo. That count topped out around 8 billion pages before it was removed from the homepage. News broke recently through various SEO forums that Google had suddenly, over the past few weeks, added another few billion pages to the index. This might sound like a reason for celebration, but this "accomplishment" does not reflect well on the search engine that achieved it.

What had people buzzing was the nature of the fresh few billion pages. They were blatant spam, containing Pay-Per-Click (PPC) ads and scraped content, and they were, in many cases, ranking well in the search results, pushing out far older, more established sites in the process. A Google representative responded to the issue on the forums by calling it a "bad data push," something that met with assorted groans throughout the SEO community.

How did someone manage to dupe Google into indexing so many pages of spam in such a short period of time? I'll provide a high-level overview of the process, but don't get too excited. Just as a diagram of a nuclear explosive isn't going to teach you how to build the real thing, you're not going to be able to run off and do this yourself after reading this article. Yet it makes for an interesting story, one that illustrates the ugly problems cropping up with ever-increasing frequency in the world's most popular search engine.

A Dark and Stormy Night
Our story begins deep in the heart of Moldova, sandwiched scenically between Romania and Ukraine. In between fending off local vampire attacks, an enterprising local had a brilliant idea and ran with it, presumably away from the vampires... His idea was to exploit how Google handled subdomains, and not just a little bit, but in a big way.

The heart of the issue is that currently, Google treats subdomains much the same way it treats full domains: as unique entities. This means it will add the homepage of a subdomain to the index and return at some point later to do a "deep crawl." Deep crawls are simply the spider following links from the domain's homepage deeper into the site until it finds everything, or gives up and comes back later for more.
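At its core, a "deep crawl" of this kind is just a breadth-first traversal of a site's link graph that stays on one host. A minimal sketch in Python, under the assumption that the caller supplies a `get_links` callback (a real spider would fetch and parse HTML; all names here are illustrative):

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def deep_crawl(homepage, get_links, max_pages=1000):
    """Breadth-first 'deep crawl' starting from a site's homepage.

    get_links(url) -> list of href strings found on that page.
    """
    seen = {homepage}
    queue = deque([homepage])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            absolute = urljoin(url, href)
            # A deep crawl covers one site, so stay on the same host.
            if urlparse(absolute).netloc == urlparse(homepage).netloc:
                if absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
    return order
```

The `max_pages` cutoff stands in for the "gives up and comes back later" behavior, which matters once a site generates pages faster than any crawler can exhaust them.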

Briefly, a subdomain is a "third-level domain." You've probably seen them before; they look something like this: subdomain.domain.com. Wikipedia, for instance, uses them for languages; the English version is "en.wikipedia.org", the Dutch version is "nl.wikipedia.org". Subdomains are one way to organize large sites, as opposed to multiple directories or even separate domains altogether.
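To make "third-level" concrete, here is a tiny Python helper that numbers a hostname's labels from the right. It is a deliberate simplification: it ignores multi-label public suffixes such as `co.uk`, which a real implementation would have to handle.

```python
def domain_levels(hostname):
    """Map each label of a hostname to its level: the TLD is level 1,
    the registered domain level 2, and anything further left (the
    subdomain labels) level 3 and up."""
    labels = hostname.rstrip(".").split(".")
    # Walk right-to-left so 'org' is level 1, 'wikipedia' level 2, etc.
    return {label: level for level, label in enumerate(reversed(labels), start=1)}

print(domain_levels("en.wikipedia.org"))
# {'org': 1, 'wikipedia': 2, 'en': 3}
```

The key point for the story: everything at level 3 and above is freely chosen by whoever controls the level-2 domain, with no registration cost per name.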

So, we have a kind of page Google will index virtually "no questions asked." It's a wonder no one exploited this situation sooner. Some commentators believe the reason may be that this "quirk" was introduced after the recent "Big Daddy" update. Our Eastern European friend got together some servers, content scrapers, spambots, PPC accounts, and some all-important, very inspired scripts, and mixed them all together thusly...

5 Billion Served – And Counting...
First, our hero here crafted scripts for his servers that would, when GoogleBot dropped by, begin generating an essentially endless number of subdomains, all with a single page containing keyword-rich scraped content, keyworded links, and PPC ads for those keywords. Spambots were sent out to put GoogleBot on the scent via referral and comment spam to tens of thousands of blogs around the world. The spambots provide the broad setup, and it doesn't take much to get the dominoes to fall.
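The article doesn't reproduce the actual scripts, but the mechanism it describes can be sketched. With a wildcard DNS record, every hostname a crawler requests resolves to the same server, so one handler can synthesize a "unique" single-page site per hostname. A hypothetical sketch, in which all domain names, keywords, and helper names are invented for illustration:

```python
import hashlib

# Hypothetical keyword list the spammer targets.
KEYWORDS = ["cheap widgets", "widget reviews", "buy widgets online"]

def page_for_host(host, scraped_snippets):
    """Synthesize one keyword-stuffed HTML page for any requested hostname."""
    # Hash the hostname so the same subdomain always gets the same keyword,
    # keeping content stable across repeat crawler visits.
    digest = int(hashlib.md5(host.encode()).hexdigest(), 16)
    keyword = KEYWORDS[digest % len(KEYWORDS)]
    # Keyworded links pointing at yet more generated subdomains,
    # giving the crawler an endless trail to follow.
    links = "".join(
        f'<a href="http://{kw.replace(" ", "-")}.spam.example/">{kw}</a> '
        for kw in KEYWORDS
    )
    return (
        f"<html><head><title>{keyword}</title></head><body>"
        f"<h1>{keyword}</h1><p>{' '.join(scraped_snippets)}</p>"
        f"{links}<!-- PPC ad block would be injected here -->"
        "</body></html>"
    )
```

Because every generated page links to further made-up subdomains, the crawler's own link-following does the rest of the work: each fetch manufactures more URLs to fetch.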

GoogleBot finds the spammed links and, as is its purpose in life, follows them into the network. Once GoogleBot is sent into the web, the scripts running the servers simply keep generating pages, page after page, all on a unique subdomain, all with keywords, scraped content, and PPC ads. These pages get indexed and suddenly you've got yourself a Google index 3-5 billion pages heavier in under 3 weeks.

Reports indicate that, at first, the PPC ads on these pages were from AdSense, Google's own PPC service. The ultimate irony, then, is that Google benefits financially from all the impressions being charged to AdSense customers as the ads appear across these billions of spam pages. The AdSense revenues from this endeavor were the point, after all: cram in so many pages that, by sheer force of numbers, people would find and click on the ads on those pages, making the spammer a nice profit in a very short amount of time.

Billions or Millions? What Is Broken?
Word of this achievement spread like wildfire from the DigitalPoint forums, within the search engine optimization community, to be specific. The "general public" is, as of yet, out of the loop, and will probably remain so. A response by a Google engineer appeared on a Threadwatch thread about the matter, calling it a "bad data push." Basically, the company line was that they have not, in fact, added 5 billion pages. Later claims include assurances that the issue will be fixed algorithmically. Those following the situation (by monitoring the known domains the spammer was using) see only that Google is removing them from the index manually.

The tracking is done using the "site:" command, a command that, theoretically, shows the total number of indexed pages from the site you specify after the colon. Google has already admitted there are problems with this command, and "5 billion pages," they seem to be claiming, is merely another symptom of it. These problems extend beyond the site: command to the displayed result counts for many queries, which some feel are highly inaccurate and in some cases fluctuate wildly. Google admits it has indexed some of these spammy subdomains, but so far has not offered any alternate numbers to dispute the 3-5 billion shown initially via the site: command.
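For anyone wanting to replicate that tracking by hand, the `site:` operator is just an ordinary query term. A small sketch that builds the results URL for each monitored domain (the domains listed are placeholders; reading the "about N results" figure off the page still has to be done by eye):

```python
from urllib.parse import urlencode

def site_query_url(domain):
    """Build the Google results URL for a `site:` query on one domain.

    This only constructs the query URL; it does not fetch anything.
    """
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})

# Hypothetical domains known to belong to the spammer.
tracked = ["spam-network-1.example", "spam-network-2.example"]
for domain in tracked:
    print(site_query_url(domain))
```

Since the article itself notes that site: counts are unreliable and fluctuate, any numbers gathered this way are a trend indicator at best.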

Over the past week the number of spammy domains and subdomains indexed has steadily dwindled as Google personnel remove the listings manually. There's been no official statement that the "loophole" is closed. This poses the obvious problem that, since the method has been shown, there will be a number of copycats rushing to cash in before the algorithm is changed to deal with it.

There are, at minimum, two things broken here: the site: command, and the obscure, tiny bit of the algorithm that allowed billions (or at least millions) of spam subdomains into the index. Google's current priority should probably be to close the loophole before they're buried in copycat spammers. The issues surrounding the use or misuse of AdSense are just as troubling for those who may be seeing little return on their advertising budget this month.

Can we "keep the faith" in Google in the face of these events? Most likely, yes. It's not so much whether they deserve that faith, but that most people will never know this happened. Days after the story broke there's still little or no mention in the "mainstream" press. Some tech sites have mentioned it, but this isn't the kind of story that will end up on the evening news, mostly because the background knowledge required to understand it goes beyond what the average citizen is able to muster. The story will probably end up as an interesting footnote in that most esoteric and neoteric of worlds, "SEO History."

About the Author

For more information on google plus 1 services, pay a visit to Harry D Yigeranitz's site.

Use and distribution of this article is subject to our Publisher Guidelines
whereby the original author’s information and copyright must be included.

© 2012, All Rights Reserved.
