Footsteps in the Snow
By Pita Enriquez Harris
A lack of history is something to suffer from; this has made the Internet economy an environment that is simultaneously exhilarating and uncomfortable in which to operate.
Now that the Web has all of (phew) six years under its belt, it is interesting for those of us who remember the first days to see some of the early technology promises come good. For me, the experience recalls one I enjoyed in my last year as a scientist when, attending a seminar by my doctoral supervisor, I was delighted to see that, years after I had left his lab, experiments which we'd only dreamt of had finally been done and had even demonstrated the results which we'd predicted.
What you spend minutes dreaming about in technology inevitably takes many years to achieve.
Web searching technology has made huge strides in just a few years. Searching for pictures of Arsenal and Liverpool FC badges recently, I remembered with amusement the sorts of errors such a search would have been prone to in the past. Hundreds of results about weapons and Beatles memorabilia perhaps? And why not - there are many other contexts for the words 'arsenal' and 'Liverpool'. Search engine technology has moved on since those days, thank goodness, and some of the increased relevancy features we were promised back then have now come good.
The most significant breakthroughs have come from recognising that searching the Web is NOT like searching a 'traditional' electronic database (using the word 'traditional' in its loosest sense here!).
The Web is a literally finite space but might more usefully be thought of as an infinite space. A search engine's index of the Web is finite yet unfinished - there are always more pages to spider and the index is always out of date.
As an information resource, the Web is a very human construct. The extensions to its networks and pathways develop stochastically, like a society. Efficient searching of the Web ought therefore to take this into account and as far as possible, mirror the 'real' human experience of information retrieval.
In a primitive society, people's information retrieval strategies involve following the line of least resistance - asking someone who might know. But few of us live in such a society, and few have for hundreds of years. There's too much to know for any one person, even for a few people, and the chances of finding all the information you need from just your immediate social group are now minuscule. So we have books, catalogues, libraries, newspapers, archives.
With the Web, however, we can get back to behaving in our 'primitive society' mode. The number and types of people whose knowledge we now have easy access to have quite suddenly ballooned.
It is appropriate that Web searching technology should adapt to that particular fact of the Web-enabled society.
In 1998, I co-authored a paper with Andreas Dieberger of Emory University, in which he expounded upon the metaphor of the Web as an 'Information City'.
Andreas wrote: "There are two main reasons why the Information City is an interesting metaphor for an information space like the Web: A city is both a spatial construct as well as a dynamic environment that lives from the interaction of the people in it. On the Web today you might be the only person looking at a particular Web page, or there might be 50 other users looking at that very same page at the same time. You will never find out.
... Essentially a metaphor like an Information City thus serves both as a structuring device for the information space, as well as a framework to support social interactions that influence navigation decisions. It also can serve as a skeleton for sub-metaphors describing enclosure, forbidden access and so forth. The city itself may not be the best of all spatial metaphors for representing the Web, but it has definite advantages: it is a conceptually big metaphor, consisting of many sub metaphors that can be used to represent concepts like enclosure, access, transportation etc. It is also a metaphor most users are very familiar with and it is a space where we are used to encounter many people we might not know yet."
In the Information City, we can once more seek information by asking "Who else knows this?" or, "Who else knows this because they too have asked this question?". Thus, by direct and indirect 'social navigation' we can follow the 'footsteps in the snow' of people who have queried search engines for the same or similar information.
Google, Direct Hit and CLEVER are three search engines whose relevancy algorithms incorporate some form of 'social navigation' - and all in different ways.
Google uses 'link popularity' to rank pages, which is something like Web pages voting for each other. Put simply, when you search for a word using Google, your list of results is ranked according to which pages score most highly because other Web sites link to those pages.
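The 'voting' idea can be sketched in a few lines of Python. This is only an illustration of counting inbound links under simple assumptions, not Google's actual algorithm, which is considerably more sophisticated; the function names and the page names in the sample data are invented for the example.

```python
from collections import defaultdict


def link_popularity(links):
    """Count, for each page, how many distinct other pages link to it.

    `links` is a list of (source, target) pairs; each pair is one 'vote'.
    """
    inbound = defaultdict(set)
    for source, target in links:
        if source != target:  # a page cannot vote for itself
            inbound[target].add(source)
    return {page: len(sources) for page, sources in inbound.items()}


def rank_results(matching_pages, links):
    """Order the pages that matched a query by their inbound-link count."""
    scores = link_popularity(links)
    # sorted() is stable, so pages with equal scores keep their original order
    return sorted(matching_pages, key=lambda page: scores.get(page, 0), reverse=True)


# Hypothetical link data: two pages vote for 'arsenal-fc', one for 'weapons-museum'
links = [
    ("fanzine", "arsenal-fc"),
    ("news", "arsenal-fc"),
    ("news", "weapons-museum"),
    ("fanzine", "arsenal-fc"),  # a repeated link counts only once
]
ranked = rank_results(["weapons-museum", "arsenal-fc", "unlinked-page"], links)
print(ranked)
```

In this toy data the football page collects two votes and so rises above the museum page, while a page nobody links to sinks to the bottom.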
Direct Hit uses this popularity-based method to enhance search results, measuring what people click on in the search results presented at its own site and at partner sites, such as HotBot. To click on a site is to raise it in Direct Hit's rankings.
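A rough sketch of click-based ranking, again in Python, might look like the following. It assumes nothing about how Direct Hit is really built: the `ClickRanker` class, its methods and the example URLs are all hypothetical, and a real system would surely weigh more than raw click counts.

```python
from collections import Counter


class ClickRanker:
    """Toy click-popularity ranker: pages clicked more often for a query rise."""

    def __init__(self):
        # Maps (query, url) to the number of clicks seen so far
        self.clicks = Counter()

    def record_click(self, query, url):
        """Remember that a searcher chose `url` from the results for `query`."""
        self.clicks[(query.lower(), url)] += 1

    def rerank(self, query, results):
        """Return `results` ordered by past clicks for this query, most first."""
        q = query.lower()
        # Unclicked pages count as zero; ties keep their original order
        return sorted(results, key=lambda url: self.clicks[(q, url)], reverse=True)


ranker = ClickRanker()
for _ in range(3):
    ranker.record_click("liverpool", "liverpoolfc.example")
ranker.record_click("Liverpool", "beatles.example")

reranked = ranker.rerank(
    "liverpool",
    ["beatles.example", "city-guide.example", "liverpoolfc.example"],
)
print(reranked)
```

Each click is one more footprint in the snow: the more searchers who took a path for a given query, the higher that path appears for the next person asking the same thing.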
CLEVER is a development of the IBM Almaden research labs and has yet to be licensed to provide core technology to any of the major Web search engines, although presumably this is the eventual plan. In contrast to Google, CLEVER first searches and then ranks the results according to link analysis of the 'live' results. The process is slower, but apparently more successful at identifying what IBM researchers call 'authority' and 'hub' sites; hubs have many good links out, and authorities have many good links in. CLEVER gives each page both a hub score and an authority score, in contrast to Google, which looks only for authorities.
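The mutual definition of hubs and authorities lends itself to a simple iterative calculation. The Python sketch below follows the general hub-and-authority iteration described in the published link-analysis literature; it is not IBM's CLEVER code, and the function name, the iteration count and the sample pages are assumptions made for illustration.

```python
import math


def hubs_and_authorities(links, iterations=50):
    """Score pages as hubs and authorities from (source, target) link pairs."""
    pages = sorted({page for edge in links for page in edge})
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A good authority is linked to by good hubs
        auth = {page: 0.0 for page in pages}
        for source, target in links:
            auth[target] += hub[source]

        # A good hub links out to good authorities
        new_hub = {page: 0.0 for page in pages}
        for source, target in links:
            new_hub[source] += auth[target]
        hub = new_hub

        # Rescale each set of scores so the numbers do not grow without bound
        for scores in (auth, hub):
            norm = math.sqrt(sum(value * value for value in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm

    return hub, auth


# Hypothetical 'live' result set: a directory and a fan page linking to club sites
links = [
    ("directory", "club-a"),
    ("directory", "club-b"),
    ("fan-page", "club-a"),
]
hub, auth = hubs_and_authorities(links)
best_hub = max(hub, key=hub.get)
best_authority = max(auth, key=auth.get)
print(best_hub, best_authority)
```

Here the directory emerges as the strongest hub because it points at both club sites, and the club site that both pages link to emerges as the strongest authority.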
I used the Web a great deal before Google and Direct Hit; trust me, things have improved a great deal. Moreover, I look forward to the benefits that CLEVER technology will bring. But meaning is a complex issue; even humans can rarely agree on the meaning of subtly different information. Technology has a long way to go before it solves search - but until we have the highway, we at least have footsteps in the snow.




