Re-thinking Research Strategies in the Global Information Village: suddenly everyone's an Information Professional
By Dr. Pita Enriquez Harris
From: Proceedings of Online Information
Abstract
The Internet has made 'information professionals' of everyone who spends any of their salaried time looking up information on the web. With interfaces designed for the mass market, traditional information professionals throughout the world are having to readjust from the logic of data mining using Boolean operators, to the broader idiosyncrasies of Web research.
Fundamental to any attempt to use the web as a research tool, should be the recognition of the differences between the web and the traditional information database. That there was no such recognised ailment as 'Database Addiction' whilst there is apparently such a thing as 'Internet Addiction' should tell us something: unlike the online database, the Web is more analogous to a real geographic 'place' than a simple collection of documents. The Web may be described as the first home of the 'Global Village'.
This talk will examine five aspects of the Web which complicate web-based information research, developments which reflect the behaviour patterns of the Web-based business community as much as the technologies which they use. These are:
1. The capricious nature of web page authorship;
There is no guarantee that the information has been correctly classified, meta-tagged or even authored, which immediately implies a greater responsibility for quality control on behalf of the information 'consumer'. Disinformation can be the result of human error as well as intentionally misleading information. The world does not yet speak the same language, let alone the same meta-language; education in the use of meta-data will help to improve the way we understand and relate to information on the web. The printed word has come to play a huge role in modern life; however, it has had four hundred years to establish itself. Similarly, electronic information must undergo an evolution within society before cohesiveness is truly achieved;
2. The endless stream of new web pages;
No search engine can guarantee to include every page - and is unlikely to without a fundamental and draconian change in Internet publishing legislation. The libertarian nature and origins of the Internet make it unlikely that such moves will take place. At present the only solution for wide research is to use an overlapping panel of search engines, or the meta-search sites;
3. The reinvention of search engines as content-rich sites with brand identities;
Driven by advertising revenues, search engines are cultivating an audience among the mass of Internet users and not necessarily the information professional. In the end these sites do not necessarily attract more visits by providing the right information first time and may increasingly find that complex search facilities do not appeal to the majority of their users. The expertise which an information professional can offer lies in knowing not just how to look but where;
4. The increase in search engine 'spamming';
Web site marketers conspire to increase artificially the relevancy of their web sites by crafty use of meta-tags and 'bridge' pages. When most search engines offer results which default to ten-per-page, it will be simply impossible for all the relevant sites to be included in the first results-page view. Yet this is the most coveted position and it is not surprising that businesses will tweak the system to achieve it. Relevancy scores will soon become meaningless unless measures are taken to prevent these types of actions.
5. The wheat and the chaff;
Potential distraction by advertising and irrelevant search results can cause time-wasting to a greater extent than experienced when searching traditional databases. With information, too much can be as useless as none at all. An evolution of the Web towards communities interested in similar information has already begun and provides perhaps the best route to information research. Search engines which take advantage of these communities as well as the naturally hyperlinked architecture of the Web (such as the citations-based HITS engine) will be most suited to providing solutions to the problem of how to locate a web site.
Keywords
information, knowledge, research, disinformation, www, Web, "internet searching", "searching the web", "search engines", "global village", "spatial navigation", "information city", "spamming"
1. Living in the Global Information Village
The Internet has been compared to the Wild West, a new frontier, a land of opportunity. People head there in droves, seeking to change their lives and make their fortune. In interacting with other 'wired' individuals, communities are formed. The information space has become a place.
For anyone whose introduction to the Internet was in the pre-graphical browser days of mailing lists, newsgroups and real-time chat forums, information-finding strategies developed as a natural consequence of existing in this 'place'. For newer users of the Internet, and those accustomed to finding information in the more rigorously categorised world of databases, old habits may need to be unlearned and new ones adopted. Susan Feldman of Datasearch provides an example, as she advises in a byline "Why we need to learn a new searching language" to her revealing article "The Internet Search-Off" [2], which compares the searchability and results obtained from the WWW vs. Dow-Jones Interactive and DIALOG.
Far from being at a disadvantage, however, information professionals have an opportunity to share the benefit of their experience with the rest of the digital community. For if the Web is to become more navigable, organised and easy to use, every web publisher must acquire skills in information categorising and the attachment of meta-data.
1.1 Suddenly everyone's an 'information professional'
We can use an example from a popular US animated sitcom, "King of the Hill" [3], to illustrate three issues about information retrieval on the Web.
One episode recounts how, in an innocent attempt to explore his own ethnic roots, Bobby (aged 12) goes to the Internet to search for information on 'white' humour. He types in keywords (suddenly everyone, even a 12-year-old, is an 'information professional'), finds a page about White supremacists (the search engine returning results with the content but not the meaning required) and, because he doesn't know enough about the subject, uses it as a source for a stand-up comedy act (the Web, as a source of unclassified information, offers no easy validation, and wisdom is needed to interpret it)... with unfortunate results.
This story also illustrates the increasing universality of the experience of 'database searching' - something now shared by information professional and child, alike.
At one time "information professionals" were the only people who were familiar with the use of keyword and Boolean operators in text-searching. Now everyone who uses the Internet needs to have some understanding of these issues.
1.2 The Web Is Not A Database - the Information City metaphor
What is the Internet? 320 million linked documents with a search interface? A more personalised, slightly more interactive version of television? People can become addicted, it is said, to the Internet, whiling their lives away surfing the Web, emailing, reading newsgroups and chatting. Was anyone ever addicted to Medline-on-a-SilverPlatter? The fact that people become addicted is an important indication of one key aspect of the Internet - it fulfils the huge drive of human beings to interact socially. This is not a characteristic of any simple database. To regard the Web merely as a large, disorganized database is therefore to miss the point, and with it the main value of the Internet.
One metaphor which one might use for the Web is that of a City. Such a metaphor should naturally suggest strategies for navigation and also for information discovery.
There are two main reasons why the Information City is an interesting metaphor for an information space like the Web: A city is both a spatial construct as well as a dynamic environment that lives from the interaction of the people in it. On the Web today you might be the only person looking at a particular Web page, or there might be 50 other users looking at that very same page at the same time. You will never find out.
Wouldn't it be an interesting idea to create "awareness" of the other people accessing the same information at the same time? If you think one step further you let people interact and communicate, either directly or indirectly. The interactions of a whole population of users on an information space can change the structure and the character of that space. Instead of a space it becomes a place and the residue of people's interactions with the space and the information itself can influence the space and therefore your navigation behavior. We can call this "social navigation" because navigation in such a social place is influenced by social processes. Social navigation can be direct, as when somebody recommends certain information or guides another person to that information or indirect, when we decide to access certain information because many other people did (it seems to be interesting) or because many people recommended this document, or when we follow a certain path through the information space because an especially large number of people used that same path before (very much like a well-trodden path in a forest).
The spatiality of a city metaphor is not strictly necessary for such social processes to happen, but it supports them. People have a tendency to organize things spatially and spatial concepts have a big influence on social interactions and behavior. Just think of the many chat-"rooms" on the Net which are not spatial at all, but people feel more comfortable using a spatial metaphor.
Essentially a metaphor like an Information City thus serves both as a structuring device for the information space and as a framework to support social interactions that influence navigation decisions. It can also serve as a skeleton for sub-metaphors describing enclosure, forbidden access and so forth. The city itself may not be the best of all spatial metaphors for representing the Web, but it has definite advantages: it is a conceptually big metaphor, consisting of many sub-metaphors that can be used to represent concepts like enclosure, access, transportation etc. It is also a metaphor most users are very familiar with and it is a space where we are used to encountering many people we might not know yet.
One of the interesting aspects of spatial metaphors and social navigation is the concept of trust and authority. The important question is how much a recommendation of a person you never saw before is worth.
While it may well be that a large number of people strongly recommend a certain document, this does not at all mean that this particular document is indeed of interest to me. If however a highly regarded colleague recommends this document, then this one vote may well weigh more heavily than all the other recommendations together. This is a point where the strength of a direct (non-anonymous) social interaction between people becomes apparent.
It becomes apparent, then, that trust must somehow be built into the relationship between users and the web directories and search engines which try to incorporate human judgement (see section 2.4.5) into the ranking system. Whilst people still use it primarily as a directory, Yahoo does not stand or fall on the basis of its interface design but by the quality of its information classification. This is because the wired community has come to know and trust its methodology. As soon as people, their experience, wisdom and knowledge become part of the equation, opinion and values will be involved. And the lack of universal agreement on anything is one of the universal constants - as well as a welcome source of business opportunities!
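The point about a trusted colleague outweighing a crowd of strangers can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the trust weights, the default weight for unknown recommenders and the names are hypothetical, and no existing search engine or directory is claimed to work this way.

```python
def recommendation_score(recommenders, trust, default_trust=0.01):
    """Sum the trust placed in each person who recommended a document.

    `trust` maps known people to a weight; anyone unknown to the reader
    receives the small, hypothetical `default_trust` weight.
    """
    return sum(trust.get(person, default_trust) for person in recommenders)

# Hypothetical weights: one highly regarded colleague, everyone else unknown.
trust = {"respected_colleague": 5.0}

popular_document = [f"stranger{i}" for i in range(100)]  # 100 anonymous votes
colleague_document = ["respected_colleague"]             # a single trusted vote

popular_score = recommendation_score(popular_document, trust)
colleague_score = recommendation_score(colleague_document, trust)
```

With these illustrative weights the single trusted recommendation scores higher than the hundred anonymous ones, which is the behaviour described above; different weights would of course give a different outcome.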
1.3 The Real Home of the Global Village
Since Marshall McLuhan's popularization of the term 'global village' [4] people have pointed to manifestations of the electronic media - radio, television, now the Internet, saying 'here, finally is the Global Village'.
Can the Internet be said to be the Real Home of the Global Village? Unlike radio or television, which depend on a strict hierarchy of powerful producers to select who or what is given air-time, the Internet can at least claim to be the opposite, built from the ground up rather than from the corporate down. An Internet presence - participation in the Global Village - is a possibility for a far larger proportion of the population than was ever possible through television.
Yet it is still far from being universal. And the commercialization of the Internet will necessarily force a battle between its inhabitants - one that we are already seeing.
2. The Use and Abuse of Search Engines
2.1 The Glorious Tyranny of the Search Engines
Search engines are simultaneously the saviours and tyrants of the Web. Without them we would either drown in the unstructured maze of information, or else remain rather provincial, staying in our own neighbourhood of interests. We would be almost totally reliant on our virtual communities to provide a guide to Internet resources, knowing only the same group of sites, only very slowly branching out to find new information.
Search engines allowed us to discover, suddenly, sites about anything for which we were able to construct a query.
But in a web becoming populated with increasing millions of documents, search engines are doomed to fall behind the task - it has been estimated that even now less than a third of the documents are indexed by any one search engine.
Faced with such an overwhelming number of references, all that is possible is what the semiotician and author Umberto Eco has called "the art of decimation" [5] - killing one person in ten or, more accurately here, killing nine hundred and ninety-nine thousand, nine hundred and ninety (999,990) search results in one million.
On our behalf, search engines conduct this decimation: this is the source of their tyranny.
2.2 Search Engines and the Art of Deconstruction
The tangible inhabitants of the Internet are the ideas, the writings of the individuals who sit behind the desktop terminals. These inhabitants have substance - words and phrases - a material divisible into its summary components, which are then available for analysis, dissection, evaluation.
This is the job of search engines: to break down each document (indexing), to dismember even the most carefully crafted sentences until everything is stripped down to its most unpretentious components. Postmodernist writers have already raised the question of whether or not this is a valid way of reading a document. In Italo Calvino's influential novel, "If On A Winter's Night A Traveller" [6], the literal deconstruction of texts is satirized in the story of a character who only reads literature after it has been indexed, claiming to read as much or more into the book from a simple list of frequently used words as from the entire text itself. (For an excerpt, see the appendix.)
It is clearly a ridiculous idea, worthy of such a satire; it is, however, what the vast majority of us are doing each time we use a search engine. We are deciding that we want to read a document in which such-and-such appears often, and so-and-so never, as though this can be some guarantee of meaning. Unless it is possible to be completely certain about the context in which a particular phrase will appear, the chances of finding exactly what you are looking for will depend on serendipity as well as good indexing.
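A minimal sketch may help to show how little survives this kind of reading. It reduces each text to a list of word counts, much as Lotaria does in the appendix, and then selects documents in which one word appears often and another never. The two sample documents, the function names and the threshold are all invented for illustration; real search engines index in far more elaborate ways.

```python
import re
from collections import Counter

def index_document(text):
    """Reduce a text to word counts, discarding word order and context."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def matches(word_counts, must_appear, min_count, must_not_appear):
    """True if `must_appear` occurs at least `min_count` times and
    `must_not_appear` does not occur at all."""
    return (word_counts[must_appear] >= min_count
            and word_counts[must_not_appear] == 0)

# Two hypothetical documents, keyed by made-up names.
documents = {
    "comedy-history": "White humour: a history of humour in the American "
                      "south. Humour, jokes and comedy.",
    "paint-catalogue": "White paint, off-white paint and cream paint "
                       "for every room.",
}

index = {name: index_document(text) for name, text in documents.items()}

# "humour" at least twice, "paint" never.
hits = [name for name, counts in index.items()
        if matches(counts, "humour", 2, "paint")]
```

The query selects the first document purely on the strength of its word counts. Nothing in the index records what either document actually says about its subject, which is exactly the limitation the satire points to.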
2.3 Hunting vs. Harvesting
There are two distinct behaviour patterns employed by people looking for information on the Internet.
There are the hunters, who are looking for a particular piece of information. This type already knows what the information will look like; what will appear in the title, the URL, which keywords will be used. This type will benefit from the most powerful and largest search engines, which allow field restricted searches (e.g. title:The Oxford Knowledge Company), image research, all such features.
Then there are the gatherers. This type 'doesn't know exactly what I'm looking for but I'll know it when I see it.' They are gathering relevant information, often more than one document. In an ideal world this type would never need to type a query into a search engine but would simply use an intelligently organized directory and find everything there.
From the beginning searchers have been offered the two different models - search engine vs. directory - the text-searching power of Altavista vs. the human categorization of Yahoo.
Now search engine companies are beginning to blend in a combination of the two: witness Lycos's Web Guides powered by the Wisewire spidering/information categorising technology.
There is no argument that people need both directories and text-searching engines. The question is whether or not the technology or manpower available is adequate for the task of logging all the new web sites that are constantly being added.
In the Global Information Village we have come to rely on the search engines and directories to lead us to the information we need. We have made them powerful - and like all powerful entities they now fall prey to the uses and abuses of power.
2.4 Understanding Five Issues Which Complicate the Web
Any system for indexing the information on the WWW must currently face a number of complications, which arise from the freedom and ease of web publishing.
We will review these issues and discuss possible remedies.
1. The capricious nature of web page authorship - misinformation and disinformation
The potential for disinformation via the Internet has been written about by Luciano Floridi in his 1996 paper (revised in Jan 1998) "Brave.Net.World: The Internet as a Disinformation Superhighway" [7, 8]. He examines two types of disinformation, involuntary and voluntary, which occur in all media but are especially prolific on the Internet.
Floridi argues that any information management system will inevitably involve the incorporation of some human-based errors into a document that may end up being accessed by many readers. Any business which uses a knowledge management package that relies on end-users attaching meta-data will appreciate that errors in document classification often occur when such classification is carried out by non-information professionals. Any end-user who has searched even professional databases will appreciate that even professionally classified information can be mistakenly labeled.
On the Internet this problem is potentially massive. Assuming that the majority of information is published in good faith with no wish to mislead, even so the majority of such information is, to all intents and purposes, ill-classified by the authors.
Meta-data that is commonly attached to any web page includes the description and keywords fields. Used correctly, this data can help search engines to carry out the task of returning the most relevant results to a search. However, a survey by SiteMetrics showed that only about 30 percent of sites on the web seem to make use of meta-tags [9]. Based on a study of 40,000 home pages, the survey found that only 30 percent used the meta-keywords tag and 27 percent used the meta-description tag, a finding almost identical with a similar study carried out last year.
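As an illustration of what the description and keywords fields look like to a program, the following sketch reads them from a page using Python's standard html.parser module. The sample page is invented for the purpose, and the sketch says nothing about how any particular search engine actually treats these fields.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect the content of <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content

def extract_meta(html):
    """Return a dict of meta-tag names to their content strings."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta

# A hypothetical page, written only for this example.
sample_page = """<html><head>
<title>Example Widgets</title>
<meta name="description" content="Hand-made widgets for every occasion.">
<meta name="keywords" content="widgets, hand-made, gifts">
</head><body>...</body></html>"""

meta = extract_meta(sample_page)
keywords = [k.strip() for k in meta.get("keywords", "").split(",") if k.strip()]
```

A page with no such tags simply yields an empty dictionary, which according to the survey figures quoted above is what a program of this kind would find on most sites.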
With no or inappropriate meta-tagging, a document may sink lower in the relevancy ranking of a search engine's returned results. Involuntary disinformation, then, serves to complicate the task of locating a relevant document and increases the requirement to use a variety of search engines for any serious research, spreading one's risk by making use of the fact that each engine's method of relevancy ranking differs slightly from another's.
There is clearly scope for improvement here. Better technique in information classification and more widespread use of the meta-tag are factors that can feasibly be brought about by education. A movement backed by the World-Wide Web Consortium to establish meta-data standards for the WWW has resulted in the Resource Description Framework (RDF). This standard draws upon the XML [XML] design as well as proposals from Microsoft [XMLDATA] and Netscape [MCFXML]. Other meta-data efforts, such as the Warwick Framework and the Dublin Core, have been influential in the design of the RDF [10, 11].
Whilst there is little doubt that universal adoption of such meta-data attachment would greatly improve the searchability of the Web, the entire movement makes one potentially erroneous assumption: that people can be incentivised to categorise information and attach meta-data. Most word-processing or spreadsheet programs have long offered the ability to include meta-data and yet it is likely that only the tiniest minority of most companies' documents are so categorised.
Once it becomes more widely appreciated that this is a problem, it is possible, even likely that a 'law of the jungle' will operate, in which webmasters and other information publishers learn that unless their information is properly categorised, it is less likely to be read.
Voluntary disinformation brings its own set of issues. When this happens in a truly interactive environment, such as a newsgroup or an unmoderated but well cited forum like The Motley Fool, an erroneous post is very likely to be followed up by multiple denials. Indeed, sites like The Motley Fool rely on diligent and well-informed investors to repudiate any false information that may be posted about a company. So long as a researcher allows the constraints of a search query to return rebuttals as well as other statements, no harm is likely to be done.
When this type of activity takes place on a web site, which for all intents and purposes is that of a company (albeit a bogus one), it is much harder to detect and even professional journalists have been taken in and encouraged to write stories based on totally false information.
Guidelines for evaluating information found on the Internet have been published on the web sites of many academic libraries. In his paper "The Six Quests for the Electronic Grail: Current Approaches to Information Quality in WWW Resources", T. Matthew Ciolek [12] compares the Internet to a "hall of mirrors, each reflecting a subset of the larger configuration". The metaphor aptly illustrates a world populated by truths, half truth, outright falsehood, some larger and brighter than others, some reflected, or as the term itself implies, "mirrored", many times across the world. He also points out that standards for quality in the publication of books and journals have taken many years (400 since the invention of the printing press) to come about and that "it would be reasonable to assume that perhaps some fraction of that time, perhaps 10-15 years, might be enough to see all the current content, structural and organisational problems of the Web diminish and disappear".
2. The endless stream of new web pages
In a publication entitled "Searching the World Wide Web" [13], scientists at the NEC Research Institute confirmed what until then had merely been suspicion: the majority of the pages on the Web are not indexed by any individual search engine and search engines themselves each index non-identical fractions of the Web. The authors estimated the size of the indexable Web at 320 million pages. For reference it can be noted that at the time of writing, the current largest search engine, Altavista, claims to index 140 million pages [14].
An immediate implication for information researchers is that it becomes necessary to rely not on one engine but many, or instead to use meta-search engines of the online or desktop variety.
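The idea behind a meta-search can be sketched in a few lines: take the ranked results from several engines, remove duplicates and produce a single combined ranking. The engine names and URLs below are placeholders, and the scoring rule (each page earns 1/rank from every engine that returns it) is only one simple choice made for this sketch, not a description of how any existing meta-search site works.

```python
from collections import defaultdict

def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list.

    Each URL scores the sum of 1/rank over the engines that returned it,
    so pages found by several engines, or ranked highly by one, rise to
    the top. Ties are broken alphabetically by URL.
    """
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for rank, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists; no real engine is being queried here.
results_by_engine = {
    "engine_a": ["http://a.example/", "http://b.example/", "http://c.example/"],
    "engine_b": ["http://b.example/", "http://d.example/"],
    "engine_c": ["http://e.example/", "http://b.example/", "http://a.example/"],
}

merged = merge_results(results_by_engine)
```

In this example the page returned by all three engines comes first, and the merged list contains five distinct pages where no single engine returned more than three.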
The nature of the spidering technology used by most search engines is such that it is not feasible to expect that any will ever have a complete index of the web. They rely principally on the seeding of a site by its creators or else, the linking to it of another, already indexed site. Only draconian legislation of Internet publishing could change this - for example, FTP software that bundled compulsory submission of a new document to search engines. The libertarian nature of the Internet is likely to resist any such move - and rightly so. If change is to come about it must be from within the community itself and driven by enlightened self-interest.
3. The reinvention of search engines as content-rich sites with brand identities
The past year has witnessed the rise of the 'portal' sites such as Netscape's NetCenter; NBC/Cnet's Snap; the new-look Lycos; the promised synthesis of content from Disney, Infoseek and Starwave [15].
The consistent pattern is that content providers have teamed with search engine companies to produce something that is a cross between a television channel and a newspaper, with the added attraction of personalization. The ability to search from these sites has become a logical accessory to a multimedia experience, rather than the other way around.
However, whereas initially search engines were created with information professionals and scientists in mind, equipped to support complex and technical search parameters, they have evolved into something quite different. Driven by advertising revenues, search engines are cultivating an audience among the mass of Internet users and not necessarily the information professional. Interestingly, most of the earlier search facilities are still available, if less well publicized than previously. But rather than educate the mass of new Internet users in the use of more traditional search methods, search engines have attempted to adjust to the reality of these newer users, increasingly building natural language searching into the interface.
For the information professional, then, it may well become easier to increase the level of expertise compared with what one might expect from a database-searching novice. The former is more likely to delve into the full text searching possibilities of a search engine. And provided that natural language searching always returns some documents of relevance, the novice is likely to be satisfied.
Search engines take great advantage of the 80:20 rule - for most situations a solution needs only to reach 80% to be adequate. In practice, however, they may provide significantly less than this. In a paper entitled "Metadata, an overview", Warwick Cathro [11] writes "… in most circumstances, searchers would be content with a small number of relevant documents, and would be willing to scan through a few dozen citations to identify them. Recall and precision factors of 10-20% are often acceptable for most purposes. However, our own experiences with Web search engines frequently involve precision factors of much less than one percent."
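For readers less familiar with the two measures Cathro refers to, precision is the fraction of retrieved documents that are relevant and recall is the fraction of relevant documents that are retrieved. The short sketch below computes both; the document identifiers and counts are invented to give a precision of the order he describes and are not taken from any real search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 1,000 results, of which 3 are among the 5 documents
# that are actually relevant.
retrieved = [f"doc{i}" for i in range(1, 1001)]
relevant = ["doc3", "doc250", "doc999", "doc5000", "doc6000"]

precision, recall = precision_recall(retrieved, relevant)
```

Here recall is a respectable 60 percent while precision is 0.3 percent: most of what is relevant has been found, but it is buried among 997 documents that are not.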
4. The increase in search engine 'spamming'
Since search engines employ word frequency as part of the relevancy ranking of their results, they are susceptible to techniques that artificially increase the apparent relevancy of a document.
Search engine 'spam' comes in the form of meta-tags which repeat keywords many times or else, include non-relevant yet often requested keywords (such as words with erotic connotations). Or else, so-called 'bridge' pages are submitted for indexing - pages which do not hold actual content of a web site but which lead directly to a web site whilst themselves containing improbably high occurrences of key words under which that web site would like to be indexed.
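The first form of 'spam', keywords repeated many times within a meta-tag, is simple to detect in principle. The sketch below counts the terms in a comma-separated keywords field and reports any that recur more often than a chosen limit. The limit of three and the sample field are arbitrary choices for illustration; the countermeasures actually used by search engines are not described in this paper and are not implied here.

```python
from collections import Counter

def repeated_keywords(keywords_field, max_repeats=3):
    """Return the keywords that occur more than `max_repeats` times in a
    comma-separated meta-keywords field, with their counts."""
    terms = [t.strip().lower() for t in keywords_field.split(",") if t.strip()]
    counts = Counter(terms)
    return {term: n for term, n in counts.items() if n > max_repeats}

# A hypothetical keywords field stuffed with one repeated term.
stuffed = "widgets, Widgets, cheap widgets, widgets, widgets, widgets, gadgets"
flagged = repeated_keywords(stuffed)
```

A check of this kind does nothing about irrelevant keywords or 'bridge' pages, which require some judgement about what the page is actually for.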
Danny Sullivan's Search Engine Watch Update [16] is an excellent resource for surveys and case studies relating to the 'spamming' of search engines. One example described a company which combined clever use of IP restriction (allowing pages to be viewed only by computers connected to a certain IP address) and so-called 'bridge' pages, to tailor sites for the major search engines. Each 'bridge' page was maximised for a different topic relating to the company's activities. The overall result is that in fact a good number (over 60) of the company's web pages were designed only for the aim of drawing traffic from search engines, and not to provide any useful information in their own right.
Such sophisticated means of manipulating the search engines are a sign of the frustration that web designers inevitably feel at the difficulty of getting a web site into the 'Top ten' results. The deal between Altavista and Real Name, whereby individuals can pay for a Real Name address which automatically guarantees a number one rating at Altavista for that name, shows that businesses in particular are likely to pay for a decent search engine ranking.
Those who can afford the time, money and expertise to elevate their web sites in the rankings may be able to use clever and legitimate types of 'spam' to gain good ratings. The quality of information does not, however, automatically equate with the budget available to publicize it.
5. The wheat and the chaff
Umberto Eco is fond of telling a story that illustrates the danger into which the Internet might fall as a result of the excess retrievability of information [17]. In a lecture to the Italian Academy for Advanced Studies in America, he states, "...certainly the Sunday NYT is the kind of newspaper where you can find everything fit to print. Its 500 pages tell you everything you need to know about the events of the past week and ideas for the new one. However, a single week is not enough to read the whole Sunday NYT. Is there a difference between a newspaper which says everything you cannot read, and a newspaper which says nothing, is there a difference between NYT and Pravda?"
The search engine companies and now, the Web portals, tread a fine line between helping someone to find information and distracting them. It cannot be in these companies' ultimate interest for someone to type in a query, find exactly what they were looking for and immediately leave the search site. There would surely be no revenue from advertising if not for the fact that people demonstrably do click through to the sponsors' sites.
And the truth is that for many people, information of almost any kind has the potential to be distracting. The net result is to cause time-wasting to a greater extent than experienced when searching traditional databases. Until better ways of navigating the information on the web are implemented, this is time that must simply be factored into the cost of gathering information from the Web.
Perhaps the most exciting and encouraging development is the new citations-based search, Hyperlink-Induced Topic Search (HITS), developed by Jon Kleinberg whilst at Cornell University [18, 19]. This algorithm takes the result set of a standard text-based search and then expands the set to include all the pages linked to by pages in the root set. Then pages are ranked according to the number of pages that link to them and by how many pages they link to. The assumption is that citations in the form of hyperlinks to a page can be used as a measure of the relevancy of a page, within a particular field of knowledge. The result is a search that combines the power of a text-search with relevancy-ranking based on human judgement.
Whilst this approach is also open to abuse by web developers eager to increase a site's ranking in a search engine, it marks a real step towards systems for collecting and gauging knowledge about a subject on the Web. Such a system does not require 'voting' for sites (such as the Wisewire guides used on Lycos), but takes advantage of the natural architecture of the Web and the reality of it as a virtual community, an information city, a global village.
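The ranking step of HITS can be sketched on a toy set of pages. In Kleinberg's formulation each page receives two scores that are refined together: an authority score, the sum of the hub scores of the pages linking to it, and a hub score, the sum of the authority scores of the pages it links to. The sketch below assumes the expanded set of pages has already been assembled; the page names and links are invented, and the fixed number of iterations is a simplification chosen for the example.

```python
import math

def normalise(scores):
    """Scale scores so that their squares sum to one."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}

def hits(links, iterations=20):
    """Compute hub and authority scores for a small link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hubs = {page: 1.0 for page in pages}
    authorities = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages that link in.
        new_authorities = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_authorities[target] += hubs[source]
        # Hub: sum of the authority scores of the pages linked to.
        new_hubs = {page: sum(new_authorities[t] for t in links.get(page, []))
                    for page in pages}
        authorities = normalise(new_authorities)
        hubs = normalise(new_hubs)
    return hubs, authorities

# A hypothetical neighbourhood of pages on one topic.
links = {
    "guide1": ["siteA", "siteB"],
    "guide2": ["siteA", "siteB", "siteC"],
    "guide3": ["siteA"],
    "siteC": ["siteA"],
}

hubs, authorities = hits(links)
best_authority = max(authorities, key=authorities.get)
best_hub = max(hubs, key=hubs.get)
```

In this toy graph the page cited by every other page emerges as the strongest authority, and the guide that links to the most well-cited pages emerges as the strongest hub. The same mechanism also shows where abuse could enter: a set of pages created solely to link to one another would raise each other's scores.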
Conclusions
Improvements to the quality of meta-information on the Internet could simplify the job of finding information. However, to judge relevance one needs wisdom and experience. To stay on top of developments on the Internet, it becomes necessary to 'inhabit' the Internet. Virtual and actual Internet guides, machine-built as well as human, will be key in building sense into the chaos of the Internet. Information consultants and other professional filters will increasingly act as diplomats of the Web, representing their clients in a world of enormous complexity and interpreting that world for them.
Dr. Pita Enriquez Harris
Director, The Oxford Knowledge Company Limited,
Oxford Centre for Innovation, Mill Street, Oxford OX2 0JX, England
Phone +44 (0) 1865 251566
Email pita@oxford-knowledge.com
WWW http://www.oxford-knowledge.co.uk
Appendix
From Italo Calvino's novel "If On A Winter's Night A Traveller" [6], Chapter Eight.
Lotaria shows me another series of lists. "This is an entirely different novel. It's immediately obvious. Look at the words that recur about fifty times:
had, his, husband, little, Riccardo (51), answered, been, before, has, station, what (48), all, barely, bedroom, Mario, some, times (47), morning, seemed, went, whom (46), should (45), hand, listen, until, were (43), Cecilia, Delia, evening, girl, hands, six, who, years (42), almost, alone, could, man, returned, window (41), me, wanted (40), life (39)
"What do you think of that? An intimatist narration, subtle feelings, understated, a humble setting, everyday life in the provinces… As a confirmation, we'll take a sample of words used a single time:
chilled, deceived, downward, engineer, enlargement, fattening, ingenious, ingenuous, injustice, jealous, kneeling, swallow, swallowed, swallowing…
"So we already have an idea of the atmosphere, the moods, the social background…. We can go on to a third book:
according, account, body, especially, God, hair, money, times, went (29), evening, flour, food, rain, reason, somebody, stay, Vincenzo, wine, death, eggs, green, hers, legs, sweet, therefore (36), black, bosom, children, day, even, ha, head, machine, make, remained, stays, stuffs, white, would (35)
"Here I should say we're dealing with a full-blooded story, violent, everything concrete, a bit brusque, with a direct sensuality, no refinement, popular eroticism. But here again, let's go on to the list of words with a frequency of one. Look, for example:
ashamed, shame, shamed, shameful, shameless, shames, shaming, vegetables, verify, vermouth, virgins…
"You see? A guilt complex, pure and simple! A valuable indication: the critical inquiry can start with that, establish some working hypotheses…. What did I tell you? Isn't this a quick, effective system?"
Biographical details
A graduate of St. Catherine's College and St. Cross College, Oxford, Dr. Pita Enriquez Harris worked for five years as a research molecular biologist at the University of Oxford, during which time she published several scientific papers. Inspired by the revolutionary potential for networking and communication inherent in the growing popularity of the Internet, Pita decided to put her experience and knowledge of the biomedical field to another use. In June 1997 Pita co-founded The Oxford Knowledge Company Limited, a company that exists to help businesses and individuals extract relevant information from external sources. As well as working as a biotechnology industry analyst, in the past year Pita has devoted her time to cracking the secrets of finding information on the Internet and to helping design the database technology which powers OKSYS (Oxford Knowledge System), the flagship knowledge-discovery service from The Oxford Knowledge Company.
References
1. Most of the references listed are to URLs, which were accurate at the time of writing (July 1998). If any of these sites are no longer up-to-date, please try the following URL, where our database technology is being used to ensure that the references are current: http://www.oxford-knowledge.com/online98
2. Feldman, Susan (1998) "The Internet Search-Off" in Searcher Vol 6, No. 2, p28-38 http://www.infotoday.com/searcher/feb/story1.htm
3. Fox TV's "King of the Hill" episode "Traffic Jam", for summary see: http://www.foxworld.com/koth/guide214.htm
4. McLuhan, Eric, "The source of the term, 'Global Village'" in McLuhan Studies Issue 2 http://www.chass.utoronto.ca/mcluhan-studies/v1_iss2/1_2art2.htm
5. Coppock, Patrick (1995) in "'A Conversation on Information', an interview with Umberto Eco by Patrick Coppock, February 1995" for Multimedia World http://www.cudenver.edu/~mryder/itc_data/eco/eco.html
6. Calvino, Italo (1979) "If On A Winter's Night A Traveller" English translation by William Weaver, published 1981 by Picador ISBN 0330267150
7. Floridi, Luciano (1998) "Brave.Net.World: The Internet as a Disinformation Superhighway" (Version 3.1) http://www.wolfson.ox.ac.uk/~floridi/disinfor.htm
8. For a long list of articles about guidelines for evaluating the quality of information on the Web, see http://www.lib.auburn.edu/madd/docs/eir.html
9. SiteMetrics Web Content Survey http://www.sitemetrics.com/contentsurvey/
10. Miller, Eric (1998) "An Introduction to the Resource Description Framework" in D-Lib Magazine May 1998 http://www.dlib.org/dlib/may98/miller/05miller.html
11. Cathro, Warwick (1997) "Metadata: An Overview" in paper to Libraries Division at the Standards Australia Seminar, "Matching Discovery and Recovery" August 1997 http://www.nla.gov.au/nla/staffpaper/cathro3.html
12. Ciolek, T. Matthew (1996) "The Six Quests for the Electronic Grail: Current Approaches to Information Quality in WWW Resources" http://www.ciolek.com/PAPERS/QUEST/QuestMain.html
13. Lawrence, Steven and Giles, Lee (1998) "Searching the World Wide Web", Science vol 280 April 1998, p98-100
14. Ni hEilidhe, Sorcha (1998) in an editorial to NUA Internet Surveys, June 20th 1998 http://www.nua.ie/surveys/analysis/weekly_editorial/archives/issue1no30.html
15. Altavista 140 million pages (June 1998) http://altavista.digital.com/av/oneweb
16. Sullivan, Danny (1998) Search Engine Watch Report, #15 Feb 1998 http://searchenginewatch.com/sereport/
17. Eco, Umberto (1996) "From Internet to Gutenberg", a lecture to the Italian Academy for Advanced Studies in America at Columbia University, November 1996 http://www.columbia.edu/cu/casaitaliana/internet.htm
18. Gibson, David, Kleinberg, Jon and Raghavan, Prabhakar (1998) "Inferring Web Communities from Link Topology" in the Proceedings of the 9th ACM Conference on Hypertext and Hypermedia 1998
19. "Hits and misses" (1998) in The Economist, June 20th 1998




