Kosmix and the Semantic Web

Saturday, February 07, 2009

I recently ran across an “interview”:http://www.beet.tv/2008/06/kosmix-topical.html with Anand Rajaraman, founder of Kosmix, and something said toward the end of it piqued my interest. The subject of the Semantic Web came up, and Anand claimed that it would far more likely be brought about by apps “mining intelligence” out of the internet’s squall of information than by the universal adoption of a common semantic ontology like RDF. We certainly agree with that, as we believe that the winners of the race to establish the next generation of web search will be the ones who mine intelligence the most efficiently. However, something stuck in my craw about Kosmix being held up as an example of the various expeditions presently being made in this general direction. That isn’t to say I think Kosmix is not on such an expedition. Rather, I feel the same vague perturbation I felt when I first made an expedition of my own through the flurry of noise on Kosmix, after hearing about the explorative experience their search engine is meant to support.

Kosmix can certainly be counted among the search engines taking steps, in one way or another, toward bringing to life what folks call the Semantic Web. But the steps it is taking range from insufficiently ambitious for my taste to simply misguided (if one’s goal is to intelligence-mine one’s way to the Semantic Web). Let’s start with a property of Kosmix I’ve already complained about: the noise. In my last blog post, I responded to a blog article by Mr. Rajaraman in which he likened Kosmix searches to “exploring haystacks,” as opposed to the needle-hunting of most keyword searches. A series of commenters on that article, whom I quoted in my post, noted in various terms that there was simply too much noise in the search results, and that it was too hard to bring a search into focus when one wanted to. These complaints remind me of an “article”:http://www.techcrunch.com/2008/04/17/web-30-will-be-about-reducing-the-noise%E2%80%94and-twhirl-isnt-helping/ I caught on TechCrunch a long while back by Erick Schonfeld, entitled “Web 3.0 Will be about Reducing Noise—And Twhirl Isn’t Helping.” As one would expect, alongside Erick’s various complaints about Twhirl is the claim that the next generation of web search/exploration will be very much concerned with noise reduction, and that this is not incompatible with many people’s view that it will be about the establishment of a Semantic Web. It seems pretty clear to me that these two visions of the future are not only compatible but necessarily intertwined.

Tim Berners-Lee has described the transition to the Semantic Web as centered on a shift from viewing the internet as a collection of documents to viewing it as a collection of data or knowledge. Building the Semantic Web means turning the internet from a collection of documents, from which keyword searches retrieve pieces of information, into a coherent, navigable corpus of knowledge, from which one can retrieve pieces or bodies of information of arbitrary breadth and depth. Making this transition means seeking to represent every piece of information in every document online, as well as the many ways in which each piece relates to the others. The sort of web search we dream about is one in which a user can retrieve anything from a simple answer to a simple question to a crash course in an academic discipline. Establishing the coherence needed to support it means applying an overwhelming amount of structure to the web, and presenting users with a representation of the web (or small parts of it at least) that reveals that underlying structure and lets them freely navigate it.

For any query submitted, Kosmix assembles a “profile page” for that topic, which amounts to a little bit of everything you could possibly have been asking for. By seeking out every type of search result the user may be looking for and amassing them all on that page, Kosmix may seem to apply this kind of structure to the web. But if one spends a fair amount of time really trying to explore a subject, whatever structure is represented only becomes increasingly obscured. Attempting to refine the search by selecting a related item “In the Kosmos” from the right margin only leads to Kosmix casting yet another wide net, and returning another mess of results that could be what you were looking for.

As I’ve said before, I respect their attempt to support the sort of web exploration we want to see made possible, but the user’s utter lack of dexterity in navigating the search results makes it impossible to truly explore. Furthermore, simply retrieving every peripherally related search result for a given query, and trying to fit as many of them on one page as possible, does nothing to reveal any underlying structure. Any intelligence they may be mining out of the internet on the back end is lost in the noise on the front.

Again, we count Kosmix among the groups participating in the gradual progression toward the Semantic Web. We greatly respect what they’re trying to do, and Anand’s metaphorical contrast between exploring haystacks and searching for needles on the web certainly resonates with our goals. But as I’ve written previously, how effectively the user can engage in such exploration depends on the dexterity with which the user can sift through search results. When there’s no way for the user to focus a search at will and clear out the noise, that dexterity is greatly limited. Kosmix needs to restructure the way they present their search results, as well as how they let users navigate those results, so that they’re not just returning haystacks and the user can effectively explore “The Kosmos.”

In the interview I mentioned above, Anand notes that the Semantic Web will most likely be brought about by the efforts of a number of different companies. Without a doubt, the push toward the Semantic Web will draw upon the collective efforts of a diverse range of organizations taking different approaches to mining intelligence out of the web, and many of these organizations would greatly benefit from collaborating with, or learning from, others taking different approaches. We’re always looking for ways to better allow our users to forage the web, and for new audiences or organizations who would find our search and research services particularly useful. In the same spirit, we hope the Kosmix team will consider venturing a bit further from the traditional way users interact with a search engine, in order to maximize the usefulness of their own.