There's been a bit of discussion lately about search and tagging (or "folksonomy"). Recently my friends Danny Sullivan (here and here) and Gary Price (here) have commented. I had dinner with Gary a few months ago in New York, so in addition to his published comments I know where he's coming from. Now, I'm the first to agree that tagging has a long way to go before it's accessible to millions of web users, but it holds a ton of promise. They both make a lot of great points about how usable this stuff is for the general web population.
But, sadly, I think they both miss the mark. Danny has picked a poor analogy for tagging - meta keywords. The meta keyword tag allows content publishers to insert keywords into their pages that describe the content. Since the keyword tag is invisible in a browser, it promised to be a mechanism for delivering metadata to search engines. It was a dismal failure. As Danny says:
In addition, none -- NONE! -- of these search engines now or ever has made use of the tag in a way to let you perhaps see all the pages "tagged" to be on a particular subject. Why not? The data is largely useless.
Thinking that tagging would lead to top rankings, some people misused the tag. Other people didn't misuse the tag intentionally, but they might poorly describe their pages.
He's exactly right. The data, for the most part, are garbage. Then Danny makes a flawed syllogistic leap - since meta keywords failed, tagging is doomed. But there's a subtle difference between meta keywords and tagging, and it's an important one. Meta keywords allowed content publishers to describe their own content. Tagging allows anyone to describe anyone's content. And this, as they say, makes all of the difference.
Let's take an illustrative example. Ask a random friend to describe his or her personality with a series of adjectives (like "funny," "quiet," and "arrogant"). Now, ask 50 people who know him or her well to do the same thing. Which is the better source of information? The 50 people, of course. Even if your friend doesn't lie, you'll still get better data from the friends. Why? Because other people know us better than we know ourselves. Responses from a large, diverse group of people tend to be more accurate than responses from any one individual (it's the wisdom of crowds). Even better, the more people you ask, the more accurate the picture gets: with more responses you can identify trends and popular opinions, and the data starts to follow a power law distribution, with a few descriptions used by many people and a long tail used by only a few. If you notice that 60% of the people describe your friend as "abrasive," that will tell you something. (How many people would describe themselves as abrasive?)
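To make the crowd example concrete, here is a minimal sketch in Python of how you might tally the adjectives. The function name `tag_shares` and the sample responses are made up for illustration; nothing here reflects how any real search engine or tagging service works.

```python
from collections import Counter

def tag_shares(responses):
    """Return the fraction of respondents who used each tag.

    `responses` is a list of tag lists, one list per respondent
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter()
    for tags in responses:
        # Count each tag once per respondent, ignoring case and repeats.
        counts.update({t.strip().lower() for t in tags})
    total = len(responses)
    # most_common() orders the result from most to least used.
    return {tag: n / total for tag, n in counts.most_common()}

# Invented data: five friends describing the same person.
responses = [
    ["funny", "abrasive"],
    ["abrasive", "quiet"],
    ["Funny", "abrasive"],
    ["arrogant"],
    ["funny", "quiet"],
]

shares = tag_shares(responses)
for tag, share in shares.items():
    print(f"{tag}: {share:.0%}")
```

With only five respondents the shares are noisy; the argument is that they become more trustworthy as the number of respondents grows.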
It's the same with tagging. Several hundred users (or even a thousand, or a million) can be an amazing source of valuable metadata about content on the web. But there are challenges. Here are some of the objections commonly raised: People will lie! People aren't using the same vocabulary - what if you say "bonds" and mean Barry Bonds and I say "bonds" and mean stocks and bonds?! People are terrible spellers! Welcome to the world of search.
I made this point at a recent BayCHI panel - tagging isn't new; the web is full of tags. But they're not in meta keywords, they're in the links. The text of a link pointing to another web page is simply the web publisher's best effort to describe the page she's linking to. And it turns out those links are some of the most valuable metadata we have to work with in search. And you know what? They're subject to all of the flaws people say will doom tagging. Spammers lie. The spelling is atrocious. And there's ambiguity everywhere. But given a huge population of links, you can begin to make sense of the madness. Why? Well, there are humans on both ends of the search rope. There's a person searching, and there's a person who's written some content. The job of the search engine is simply to connect the two. Traditional software engineers, in their endless pursuit of the elimination of ambiguity, sometimes forget this. Search engineers embraced it. Take an example - the misspellings can actually be useful to a search engine. If somebody misspelled Britney Spears' name when they blogged about her, there's a good chance somebody will misspell her name when they search for her.
Now, I don't want to underrepresent the enormity of the search problem. Thousands of engineers at the major search engines (and lots of startups) are constantly battling to make the data valuable. PageRank, for example, is effectively a system for overlaying reputation on top of web links. We'll need the same thing for tagging. But tagging is promising because it brings a new voice into the equation - the users. Link text is great, but link text only represents the voices of the webmasters and the content publishers. It's the users who ultimately consume the content, and there are orders of magnitude more of them. That's why I disagree with Danny - if we can make tagging work, it could be a killer solution for search.
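What might "reputation on top of tags" look like? Here is a deliberately simple sketch in Python. To be clear, this is not PageRank and not anyone's production approach; it is a plain weighted vote. It assumes each user already has a reputation score from somewhere, and the names `weighted_tag_scores`, `reputation`, and `default_weight` are all hypothetical.

```python
from collections import defaultdict

def weighted_tag_scores(taggings, reputation, default_weight=0.1):
    """Sum each tag's votes, weighting every vote by the tagger's reputation.

    `taggings` is a list of (user, tag) pairs for a single page.
    `reputation` maps user -> weight; unknown users get `default_weight`.
    Both the input shapes and the weights are assumptions for this sketch.
    """
    scores = defaultdict(float)
    for user, tag in taggings:
        scores[tag.lower()] += reputation.get(user, default_weight)
    return dict(scores)

# Invented data: two trusted users and three unknown accounts tag one page.
taggings = [
    ("alice", "baseball"),
    ("bob", "baseball"),
    ("spam1", "cheap-pills"),
    ("spam2", "cheap-pills"),
    ("spam3", "cheap-pills"),
]
reputation = {"alice": 1.0, "bob": 0.8}

scores = weighted_tag_scores(taggings, reputation)
best_tag = max(scores, key=scores.get)
print(best_tag)
```

In this toy data the tag from two trusted users outscores the tag pushed by three unknown accounts, even though the latter has more raw votes. The hard part, of course, is coming up with the reputation scores in the first place.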