My recent post about Wikipedia’s Wikia linking brought on some emotional responses. Since there seem to be some misunderstandings about what I’m arguing, in this post I’ll lay out what is hopefully a more succinct description of why Wikipedia’s actions are unfortunate and, ironically enough, promote spam on a large scale rather than combat it.

First of all, please note that nowhere am I arguing that ‘spam is good’, nor that Wikipedia should be a platform for spam. You will notice that Playing With Wire is not linked to by Wikipedia, and that we have no direct self-interest in whether Wikipedia uses ‘nofollow’ on its links. What I do have an interest in is the health of the internet as a whole, and I believe Wikipedia’s recent actions risk harming it.

At the crux of the matter is the way modern search engines separate useful content from spam: Google and other search engines inspect how the World Wide Web is interlinked. A site is considered ‘trusted’ if it has many inbound links from other trusted sites. The theory is that since humans make most links, sites that are useful to actual people collect plenty of human links over their lifespan, while spam sites only get links from other spam sites. If you score sites based on the quality of their incoming links, you will over time see some sites rise above the general noise. As far as Google is concerned, these are the ‘non-spam’ sites: other trusted websites have confirmed their validity.

You will notice that there is something circular about this system, a catch-22 if you will: to know which sites on the internet can be trusted, you already have to know which sites are trusted enough to vote. To solve this apparent paradox, Google seeds the system so that every site starts out with some base trust. From there on Google starts to count: outgoing links ‘give’ trust to other sites, and incoming links conversely ‘receive’ trust from other sites. A mathematical formula (the best-known being Google’s original PageRank) redistributes the total amount of trust, round after round, until a stable structure crystallizes.
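To make the counting concrete, here is a minimal Python sketch of the classic published PageRank iteration. It is a simplification, not Google’s actual (unpublished) ranking. Every site is seeded with the same base trust, and in each round a site passes a share of its trust along its outgoing links until the scores settle. The damping factor of 0.85 is the value suggested in the original PageRank paper; the site names and links are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank: seed every site equally, then let trust flow along links.

    links maps each site to the list of sites it links to. Sites with no
    outgoing links have their trust spread evenly over all sites, which is
    one common way of handling such 'dangling' pages.
    """
    sites = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(sites)
    rank = {site: 1.0 / n for site in sites}  # the seed: equal base trust
    for _ in range(iterations):
        # trust held by sites that link nowhere is shared out evenly
        dangling = sum(rank[s] for s in sites if not links.get(s))
        # every site receives a small baseline share each round
        new_rank = {s: (1.0 - damping) / n + damping * dangling / n for s in sites}
        for source, targets in links.items():
            for target in targets:
                # outgoing links 'give' trust, split evenly between their targets
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical link graph: three sites that link to each other, plus a spam
# site that links out but that nobody links to.
links = {
    "blog": ["news", "recipes"],
    "news": ["blog", "recipes"],
    "recipes": ["blog"],
    "spam": ["blog"],
}
ranks = pagerank(links)
for site, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{site:8s} {score:.3f}")
```

In this toy graph the spam site ends up at the bottom with only the baseline share of 0.15 / 4 = 0.0375, because no trusted site votes for it, while the three interlinked sites divide the rest of the trust between them.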

You may think of this trusted structure as a sea with little islands of trust rising out of it. Google gives you good search results because most of the time it can find you an island rather than having to dive into the sea-floor mud of spam and noise that makes up the general internet.

It is this structure and balance that makes Wikipedia’s choice of anti-spam technique so unfortunate. Since a lot of trusted sites have given their vote to Wikipedia, they have essentially lowered themselves a little into the sea in the process. Normally this would be fine, because when a trusted site lowers itself in this way, it causes other islands to rise. These islands in turn give away some of their buoyancy to yet other islands, and so forth. In the greater scheme of things, the mud stays on the bottom and the islands stay on top.

Wikipedia has over time built a very strong position within this system: as far as Google is concerned, it is one of the most trusted sites on the web. It still amazes me how often Wikipedia comes up right on top in search results. In our sea analogy, Wikipedia is one of very few mountains. But by not voting for other valid sites, Wikipedia pushes every other island back toward the mud by its own sheer weight. Google can no longer give us as many valid search results, because relative to Wikipedia the islands now sit closer to the mud. These sites gave away their trust in the greater balance to a site that gives nothing back.
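The same kind of simplified PageRank sketch can illustrate what happens when a heavily linked site stops passing trust on. Below, three hypothetical blogs vote for a stand-in “wikipedia” node, which in turn cites two hypothetical reference sites. Modelling nofollow as simply deleting those links from the graph is an assumption of this sketch; it does not capture how Google actually treats nofollow links internally.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank; sites with no outgoing links share their trust evenly."""
    sites = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(sites)
    rank = {site: 1.0 / n for site in sites}  # equal seed trust
    for _ in range(iterations):
        # trust held by sites that link nowhere is spread over all sites
        dangling = sum(rank[s] for s in sites if not links.get(s))
        new_rank = {s: (1.0 - damping) / n + damping * dangling / n for s in sites}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: blogs vote for "wikipedia", and the reference sites
# that Wikipedia cites link back to the blogs that mention them.
base_links = {
    "blog1": ["wikipedia", "ref_a"],
    "blog2": ["wikipedia"],
    "blog3": ["wikipedia", "ref_b"],
    "ref_a": ["blog1"],
    "ref_b": ["blog3"],
}
followed = pagerank({**base_links, "wikipedia": ["ref_a", "ref_b"]})
# Assumption: a nofollow link is treated as if it did not exist at all.
nofollow = pagerank({**base_links, "wikipedia": []})

for site in sorted(followed):
    print(f"{site:10s} followed={followed[site]:.3f}  nofollow={nofollow[site]:.3f}")
```

In this toy graph, switching Wikipedia’s links to nofollow raises the stand-in Wikipedia’s own score and lowers the scores of both sites it cites. With this particular convention for dead-end pages, the trust Wikipedia no longer passes on is spread evenly over every site, so “blog2”, which nobody links to, gains as well: the islands sink a little while the mud rises.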

In a system where you measure importance as the relative difference between the average and the peaks, an enormous peak reduces the effectiveness of the system. What’s worse, Wikipedia is setting a very distressing example. Imagine for a moment that every site on the internet did what Wikipedia is doing now and put nofollow on all of its external links. Each site would be acting in its own interest, since it would no longer give away its votes. But in doing so, the whole system would be ruined, as every site would be reduced to the level of the mud.
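The end state of that thought experiment can be checked with a simplified PageRank sketch. Modelling “every site uses nofollow” by giving every hypothetical site an empty list of outgoing links is an assumption; under it, no votes are cast anywhere and the calculation can no longer tell one site from another.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank; sites with no outgoing links share their trust evenly."""
    sites = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(sites)
    rank = {site: 1.0 / n for site in sites}  # equal seed trust
    for _ in range(iterations):
        # trust held by sites that link nowhere is spread over all sites
        dangling = sum(rank[s] for s in sites if not links.get(s))
        new_rank = {s: (1.0 - damping) / n + damping * dangling / n for s in sites}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical web where every site has put nofollow on all of its links,
# modelled here (an assumption) by giving every site an empty link list.
sites = ["wikipedia", "blog", "news", "recipes", "spam"]
ranks = pagerank({site: [] for site in sites})
for site, score in ranks.items():
    print(f"{site:10s} {score:.3f}")  # every site prints 0.200
```

Every site ends up with the same score of 1/5, so nothing separates Wikipedia from the spam site any more: with no votes left to count, the ranking has nothing to measure.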

Ironically, Wikipedia is promoting spam on the internet in the process of trying to rid itself of it.

  1. Curt Sampson says:

    Looking at the comments on the previous post, I think a specific example might better demonstrate what the problem is.

    The Wikipedia article Atomic Bombings of Hiroshima and Nagasaki includes 84 footnotes, a fairly good demonstration of how reliant Wikipedia is on external sources. It also includes 92 links to pages on other sites that the authors either used as references or felt offered further useful information on the topic.

    Wikipedia, with its nofollow strategy, is telling Google that every one of those sites is equal in importance to a spam site. Is that a good thing? Is that even an appropriate thing, given how heavily Wikipedia used information from those sites in its article?

