The early world wide web was chiefly used by companies to publish content and advertisements about themselves. In Ireland, it used to be the “worldwide wait” while our weak broadband links retrieved pages of interest. Today, the web has developed into a social network in which all of us can easily contribute content ourselves, via tweets, Facebook posts, online comments and so on. However, sometimes I wonder whether it has become the “worldwide what the heck”, as inane and puerile content is frequently and automatically presented to us as a frustrating distraction.
Too often when we search the web, completely irrelevant results are presented, hiding the real answers we seek. Frequently when using a social network, pointless advertisements are served up to most of us, for extraneous offers such as animal print clothing, teeth implants and half marathons. Why is the web so weird and witless?
Sir Tim Berners-Lee is the founding father of the web, and built the world’s first web site in 1991. Since 2001, he has been promoting the “semantic web” as an extension of the current world wide web to give well-defined meaning to the information available via the web, so enabling better co-operation between computers and people. Much of today’s web software does not understand the meaning of web pages: while it may understand that a page should be formatted in a certain way to look pretty on a screen, it may not understand that, for example, my current page relates to treatment prescribed by my mother’s doctor and now requires identifying a reputable physiotherapist within 10km of her home. Sir Tim advocates that if a reasonably large amount of the information and data available worldwide on the web can be categorised, sorted and understood by computers, then the web would become immeasurably more valuable as a global resource.
The first step for the semantic web has been to classify information using taxonomies (akin to the Dewey Decimal Classification system widely used in libraries). This can then be augmented by ontologies, which are akin to equipping a computer with concepts: for example, the concept of a “company” is a “legal entity” owned by “shareholders” and a set of “places” where “people” come together to offer a “service” or “product” bought by “customers” in conjunction with “partners” and “suppliers”, obeying “regulations” established by “government authorities”. In principle, once data and content are labelled and tagged using these approaches, then more intelligent software tools could be engineered not only to understand how to lay out a web page for a screen, but also to understand what each page is actually describing.
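To make the ontology idea concrete, here is a minimal sketch of how a computer might store and use such concepts as subject–predicate–object triples, the basic unit of the semantic web’s RDF data model. The entity and relation names below (“AcmeLtd” and so on) are invented purely for illustration, not drawn from any real ontology:

```python
# A tiny knowledge base of subject-predicate-object triples.
# "is_a" links build the concept hierarchy described in the article:
# a company is a legal entity, AcmeLtd is a company.
triples = [
    ("Company", "is_a", "LegalEntity"),
    ("Company", "owned_by", "Shareholder"),
    ("Company", "offers", "Service"),
    ("Company", "sells_to", "Customer"),
    ("AcmeLtd", "is_a", "Company"),
]

def types_of(entity, facts):
    """Follow 'is_a' links to find every class an entity belongs to."""
    found = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in facts:
            if subj == current and pred == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found

# Software that can run this query "understands", in a limited sense,
# that a page about AcmeLtd is a page about a legal entity.
print(sorted(types_of("AcmeLtd", triples)))
```

Even this toy query shows the payoff the article describes: once pages are tagged with such concepts, software can reason that a page about one thing is also, implicitly, about its broader categories.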
There are some obvious challenges. Some of the content on a specific web page could be vague and incomplete – or even sarcastic, ironic or deceitful. Taxonomies and ontologies should work across all languages. In the absence of a global central authority, independently developed ontologies may be inconsistent: is the concept of a “company” in Ireland entirely correct for, say, an IFSC back-office operation? Less obvious, but a key technical point, is that on the one hand web pages are currently structured as “trees” (a page contains sections, which in turn contain paragraphs, which in turn contain sentences…), but on the other hand knowledge is structured as “graphs” (for example, people do not contain other people, but rather could be related, and/or friends, and/or share interests, and/or work for the same company).
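The tree-versus-graph mismatch can be sketched in a few lines of code. The page structure and the names below are invented for illustration; the point is only that a tree nests cleanly while social knowledge needs arbitrary cross-links, including cycles that no tree can express:

```python
# A web page is a tree: every node has exactly one parent,
# so we can walk it with simple recursion.
page = ("page", [
    ("section", [("paragraph", []), ("paragraph", [])]),
    ("section", [("paragraph", [])]),
])

def count_paragraphs(node):
    """Recursively count paragraph nodes in a page tree."""
    label, children = node
    return (label == "paragraph") + sum(count_paragraphs(c) for c in children)

# Knowledge is a graph: people relate in many directions at once.
knows = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Carol"},   # Alice and Bob know each other -
    "Carol": {"Alice", "Bob"},   # a cycle, which no tree can represent.
}

def mutually_know(a, b, graph):
    """A mutual link is the simplest cycle: a -> b and b -> a."""
    return b in graph.get(a, set()) and a in graph.get(b, set())

print(count_paragraphs(page))           # counting works because pages nest
print(mutually_know("Alice", "Bob", knows))
```

This is why semantic web tooling layers graph-shaped data models (such as RDF triples) on top of the web’s tree-shaped pages, rather than trying to force knowledge into the page hierarchy itself.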
The “social semantic web” adds human intervention via social networks to the semantic web. The automatic classification of web content using taxonomies and ontologies can be augmented by collaborative labelling and tagging of data by humans. Some strategists believe that exploiting social networking can lead to higher quality results: for example, if I am seeking a good physiotherapist within 10km of my mother’s home, would asking my social circle of friends and acquaintances lead to a better result than that from an automatic search? If physiotherapists want to advertise online, which is the optimal online advertiser to use?
The quest for higher quality online advertising – the “right ad at the right time in the right place” – is a strong commercial catalyst for a better and wiser web. Google attempts to solve the right-ad challenge by inferring our interests from what we search for. Facebook attempts to solve it by analysing our chit-chat with our friends. The more that an online advertiser can encourage us to directly or indirectly tell it about our interests, the more likely it is to become highly successful – and useful! Despite what many in the traditional newspapers believe, I believe that for them there is a substantial opportunity online, since they could then observe which specific articles we each read, and tailor advertisements to each one of us accordingly.
However, it would be disappointing if the sole benefit of the semantic web were better targeted advertising. Rather, we should expect the semantic web to actively assist us, gently intervening when appropriate, politely bringing things to our attention. Currently we browse the web based on keywords we give the search engines, what our friends recommend, and what we come across. Rather than relying on simple keyword matching and our friends’ likes, imagine if web software tools were sufficiently powerful to unearth the latent intelligence already in the web. Medical clinicians may discover new links between diseases, deduced from research results already available today but currently lost in the mass of the web. Historians may realise that particular events are related, based on evidence whose importance had been overlooked. Researchers, journalists, genealogists and indeed all of us may discover new relationships between stories, events and data which were latent but hitherto unrecognised in the web.
There have been two decades of the world wide web, and a decade of the semantic web, but the web still has many “what the heck” moments. There is substantial opportunity for innovation to make the web wise and intelligent.
[Disclosure: Chris Horn is an Advisory Board member of the SFI funded DERI project at NUIG researching the semantic web. He is also Chairman of Sophia Search, a Belfast based company with semantic search and discovery solutions.]