Filtering the Web of Noise

by Louis Marascio on January 1, 2010

[Photo: a man holding his ears in an attempt not to hear]

The modern Internet citizen is faced with what I’ll call a participation dichotomy. The social web, through its emphasis on connections and sharing, has emerged as a predominant force driving findability on the web. Unfortunately, to participate in this valuable human filtering of information, one must connect to and follow vast streams of input data. This has the side effect of creating massive amounts of noise, potentially making the goal of learning more time-consuming than before.

During Internet 1.0 we found things on the web in two ways:

  1. A personally directed effort through search and browsing
  2. Sharing amongst directly connected, real-life peers

Contrast this with today, where there is a third option that is becoming more and more dominant: the sharing of information amongst loosely connected peers, namely people you’ve chosen to follow via asynchronous means (Twitter, RSS, whatever).

My Google Reader is filled with over 200 subscriptions to blogs and news outlets. I’m fairly selective about what I subscribe to, but even this modest amount of daily input is overwhelming. I follow approximately 800 people on Twitter, and that is beyond manageable. Why do I do this and why not reduce my information input by unsubscribing and unfollowing? Simply, that would disconnect me from the social web, something that I find value in participating in. Even though I can’t read every incoming tweet, I find nuggets of wisdom on a daily basis. And even though I can only scan my Google Reader, I still find valuable information.

Unfortunately, it takes entirely too much time to maintain. This is the Participation Dichotomy of the Social Web. We are motivated to participate, connect, and follow so that we can gain the benefits of human filtering and information sharing; however, the tools to manage the information inflows have not evolved to handle the amount of data we are dealing with on a daily basis. The noise level grows in proportion to the amount of input, and eventually that noise level makes finding personally valuable information in the input stream unbearable.

There are ways to solve this today. You can meticulously organize your inputs so that they are categorized by topic or priority, or you can delegate the task of first-line filtering to someone else and read only the information that passes the equivalent of a narrow-band human information filter. I find both of these options inadequate, unscalable, and potentially at odds with the goals of the social web.

First, it is very hard to maintain an accurate categorization of incoming data. I follow people, and those people share a vast array of different types of information. One person might share links to articles about entrepreneurship, machine learning, and NBA basketball. Categorizing this manually is next to impossible, especially when the sharer provides no help in the form of tagging.

Second, having an assistant filter my information is, personally, unappealing. Delegating the task of learning to another human, and allowing them to judge whether a piece of the web is worthy of my attention, goes against my nature. I expect that I would find it challenging to manage the configuration of that person so that they could filter effectively for me. My interests are dynamic, and the priority I would assign to each changes daily. This is not a scalable solution.

Fred Wilson, a prominent venture capitalist and blogger, recently wrote a blog post entitled “People First, Machines Second”. I found the article very interesting and the discussion within the comments even more so. I think Fred is right: the first level of filtering on the Internet needs to occur within people. This has been happening for a long time: Google’s PageRank algorithm relies on the supposedly human action of linking to rank the search results it returns. To summarize Fred’s point:

Someday machines may be smart enough that they don’t need humans to give them cues, but today I believe the state of the art in machine intelligence right now is ‘humans first, machines second’ as Google did it… [You] need humans first, then the machines can take over.

This filtering is happening across the social web every day on Twitter, in blog posts, and on Facebook. Information of value is being shared by people who found utility in a site, blog post, PDF article, or YouTube video. It is this sharing that is the fuel for the social web’s engine.

How, then, do we evolve our tools to handle our personal information inflow? The tools need to become smart. Machine learning, through personal reinforcement, should be used to create a dynamic information filter that I can apply to my daily life. Science fiction is riddled with examples of this. One of my favorites comes from Peter Hamilton’s Night’s Dawn Trilogy. In it, the future-people have “e-butlers” that manage the information flow to them on a continuous basis. BTW, the books are great, and I highly recommend them.

This is different from having a human filter my daily information inflow because the tool can continuously learn what’s interesting to me. Online learning algorithms already exist and can continuously learn and adapt. The key to making this all work is the passivity of the filter in terms of feedback. I don’t want to always have to tell the tool what’s most important to me as that changes. I want the tool to do its best at learning, and then ask me to reinforce (positively or negatively) whether it made the correct choice.
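To make the idea a little more concrete, here is a minimal sketch of what such a filter could look like, assuming one particular online learning algorithm (logistic regression updated one example at a time) and a crude bag-of-words view of each item. The class name `OnlineInterestFilter` and its methods are hypothetical, made up for illustration; this is not a description of any existing tool.

```python
import math


class OnlineInterestFilter:
    """Illustrative online logistic regression over the words in an item."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.weights = {}  # one learned weight per word, implicitly 0.0
        self.bias = 0.0

    @staticmethod
    def features(text):
        # Assumption: the set of lowercased words is a good enough signal.
        return set(text.lower().split())

    def score(self, text):
        """Estimated probability (0 to 1) that the item is interesting."""
        z = self.bias + sum(self.weights.get(w, 0.0) for w in self.features(text))
        z = max(-30.0, min(30.0, z))  # keep math.exp well within range
        return 1.0 / (1.0 + math.exp(-z))

    def feedback(self, text, interesting):
        """Take one gradient step from a single thumbs-up or thumbs-down."""
        error = (1.0 if interesting else 0.0) - self.score(text)
        step = self.learning_rate * error
        self.bias += step
        for w in self.features(text):
            self.weights[w] = self.weights.get(w, 0.0) + step


# The filter guesses first, then learns only when I bother to correct it.
inbox_filter = OnlineInterestFilter()
for _ in range(20):
    inbox_filter.feedback("machine learning startup", interesting=True)
    inbox_filter.feedback("nba basketball scores", interesting=False)

print(inbox_filter.score("machine learning paper"))
print(inbox_filter.score("nba basketball trade"))
```

The point of the sketch is the shape of the interaction rather than the model: `score` lets the tool make its best guess on every incoming item, and `feedback` is called only when I choose to reinforce or correct a guess. Because each update is a single cheap step, the filter can drift along with my interests instead of being retrained from scratch.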

Building such a tool is not trivial from an algorithmic, technological, or user experience standpoint. That’s probably why it has yet to be done in a broadly successful way. In my mind, this represents a major problem with the scalability of the social web and will, in turn, be a large opportunity for smart entrepreneurs in the next decade.

(photo credit: striatic on Flickr)
