Sunday, September 26, 2010

An encounter with Tim Berners-Lee and the Semantic Web

The students file into an ordinary, medium-sized classroom in Building 4, near the center of campus. Outside, it's a beautiful afternoon, a few days before the autumnal equinox. The room is brightly lit, thanks to its tall windows. Muffled sounds of trumpets and horns can be heard nearby -- there is an active music community at MIT, and some students take classes in music and the performing arts in Building 4.

After everyone has settled into their seats, the professor gets up in front of the class. He is thin, has gray hair, and wears the standard faculty attire -- khakis and a long-sleeved, light blue button-down shirt without a tie. Seeing him walking down the corridor, most would have no idea who he is, but to a few he's given away by the large MacBook Pro tucked under one arm, covered with stickers, including one from the W3C -- the World Wide Web Consortium.

The man is actually the director of the W3C and has played a remarkable role in the history of computing, and, indeed, the course of human history. He's Tim Berners-Lee, the inventor of the World Wide Web -- arguably the most important communications invention since Gutenberg used movable type to create the first printed Bible.

Everyone reading this post has been touched by the Web in untold ways. For some people, including me, the Web has changed their lives. Now I am about to hear about another Internet technology that Berners-Lee hopes will make as big an impact: the Semantic Web.

Tim Berners-Lee in the classroom

Berners-Lee starts talking. He has an English accent, I'm guessing from somewhere in the Southeast. In front of this new audience he talks quickly, the thoughts sometimes tumbling out faster than he can speak them.

The first thing he writes on the chalkboard is http:// and a domain name -- two of the fundamental elements of the World Wide Web. He adds an anchor tag.

"To a certain extent, when you go to the Semantic Web, you'll have to leave that all behind," he says.

Berners-Lee writes a URI, http://www.w3.org/People/Berners-Lee/card#i, and explains that it returns data, not a Web page.

"This," he says, pointing to the URI, "is me."
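
For readers who want to see what that means in practice, here is a minimal Python sketch of my own (it was not shown in class). It splits the URI into the document that would be fetched and the fragment that names the person described inside it, then builds, without sending, a request that asks for data instead of HTML. The `text/turtle` media type and the idea that the server will honor it are my assumptions.

```python
from urllib.parse import urldefrag
from urllib.request import Request

# The URI Berners-Lee wrote on the board.
TBL_URI = "http://www.w3.org/People/Berners-Lee/card#i"

# The part before '#' is the document to fetch; the fragment ('i')
# names the thing described inside that document.
document_url, fragment = urldefrag(TBL_URI)

# Build (but do not send) a request asking for RDF rather than a Web page.
# Assumption: the server honors this Accept header.
request = Request(document_url, headers={"Accept": "text/turtle"})

print(document_url)                  # http://www.w3.org/People/Berners-Lee/card
print(fragment)                      # i
print(request.get_header("Accept"))  # text/turtle
```

Passing `request` to `urllib.request.urlopen` would actually perform the fetch; I left that out so the sketch runs offline.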

As a muffled horn ensemble begins to warm up in the next room, he gives a primer on the Semantic Web, how it's different from the World Wide Web, and some of the basic concepts that make it work -- URIs (not URLs), XML, RDF (see my post from earlier in the week), triples, ontologies. These technologies can turn the World Wide Web into a linked, queryable database, and give relationships and meaning to otherwise unstructured data on the Web.

Berners-Lee likes to draw diagrams of the RDF graphs, and sometimes uses the circle/arrow notation that's used to model Linked Data relationships (I am using "Semantic Web" and "Linked Data" interchangeably, per the usage employed by one of the other instructors later in the class). He shows the standard "Subject-Predicate-Object" (aka subject-verb-object) format used for triples, and describes how they might be used to describe certain relations:

Semantic Web Triple
Tim Berners-Lee (subject) has an assistant (predicate) Amy (object) . 

And vice-versa, with Amy as the subject and Berners-Lee as the object.

Each one of the elements in these relationships will be a link. For unique entities, like a person, there should be a document that describes all of the properties of that individual. As noted above, Tim Berners-Lee's is http://www.w3.org/People/Berners-Lee/card#i, and contains information such as his public home page, photographs, projects he's participated in, and even the people he knows. Everything in the list is a link. For common verbs or relations, there are definitions already in existence that can also be referenced by a link, so new definitions need not be created from scratch. The idea of the Semantic Web is that these machine-readable entities, relationships, and descriptions can be used for queries or specialized applications -- for instance, "Who is Tim Berners-Lee's current assistant?" or "What is TBL's assistant's email address?" or "Return a list of all of the email addresses of current MIT faculty assistants." The beauty of the Semantic Web is that the data is (ideally) readily available on the Web, instead of in a proprietary database somewhere, and can be manipulated by software agents.
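
To make the query idea concrete, here is a rough Python sketch of my own, not anything presented in class. It stores triples as plain tuples of links and answers "What is TBL's assistant's email address?" by pattern matching. Only Berners-Lee's URI comes from the lecture; Amy's URI, the three predicate URIs, and the email address are hypothetical placeholders on example.org, and a real application would presumably use an RDF library and an existing vocabulary instead.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

TBL = "http://www.w3.org/People/Berners-Lee/card#i"  # from the lecture

# Everything below is hypothetical, invented for this illustration.
AMY = "http://example.org/people/amy#i"
HAS_ASSISTANT = "http://example.org/vocab#hasAssistant"
ASSISTANT_OF = "http://example.org/vocab#assistantOf"
MBOX = "http://example.org/vocab#mbox"

TRIPLES: list[Triple] = [
    (TBL, HAS_ASSISTANT, AMY),
    (AMY, ASSISTANT_OF, TBL),
    (AMY, MBOX, "mailto:amy@example.org"),
]


def match(
    triples: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def assistant_emails(triples: list[Triple], person: str) -> list[str]:
    """Follow person -> assistant -> mailbox, one hop at a time."""
    emails = []
    for _, _, assistant in match(triples, s=person, p=HAS_ASSISTANT):
        emails.extend(obj for _, _, obj in match(triples, s=assistant, p=MBOX))
    return emails


print(assistant_emails(TRIPLES, TBL))  # ['mailto:amy@example.org']
```

The same `match` helper covers the other sample questions: leaving the subject as `None` and fixing the predicate to `HAS_ASSISTANT` would list every assistant relationship in the data.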

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Students in the class ask questions. They vary in complexity. The audience is a mixed bunch of Computer Science graduate students, Sloan MBAs, and the odd LGO and Sloan Fellow. Some of the CS students already get this. To others with non-technical backgrounds, it's completely new. I fall somewhere in between -- I can code HTML and am familiar with XML, but other Semantic Web technologies were unknown to me before I registered for the course.

An MBA asks: What happens when inconsistencies arise in linked data? For instance, what if Amy leaves her job, but only one of the reciprocal links above is adjusted to reflect that?

"This is the Web!" Berners-Lee declares. "It's not consistent!"

This leads to a discussion of the value of having links in both directions from RDF graphs talking about the same thing, and then his "five-star" system of rating sites (or organizations?) on their ability to post data openly on the Web, especially machine-readable data.
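
The MBA's scenario is easy to reproduce in a few lines. The Python sketch below is my own illustration, not something from the lecture: it looks for "has an assistant" links that lack the matching link in the other direction. The predicate URIs and Amy's URI are hypothetical placeholders, and note that the check can only flag a mismatch; it cannot say which side is out of date.

```python
TBL = "http://www.w3.org/People/Berners-Lee/card#i"  # from the lecture

# Hypothetical identifiers, invented for this illustration.
AMY = "http://example.org/people/amy#i"
HAS_ASSISTANT = "http://example.org/vocab#hasAssistant"
ASSISTANT_OF = "http://example.org/vocab#assistantOf"

# Amy's own document was updated when she left, so her "assistant of"
# triple is gone, but Berners-Lee's document still points at her.
triples = {(TBL, HAS_ASSISTANT, AMY)}


def missing_reciprocals(triples, forward, inverse):
    """Return the inverse triples implied by a forward link but absent."""
    present = set(triples)
    return sorted(
        (obj, inverse, subj)
        for subj, pred, obj in present
        if pred == forward and (obj, inverse, subj) not in present
    )


for subj, pred, obj in missing_reciprocals(triples, HAS_ASSISTANT, ASSISTANT_OF):
    print(f"no reciprocal link: {subj} {pred} {obj}")
```

Adding `(AMY, ASSISTANT_OF, TBL)` back to `triples` makes the function return an empty list.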

Trust and the Semantic Web

I want to ask a lot of questions, but I hesitate. My background is online media, and the creator of the Web is standing in front of the class. It's like being able to ask Gutenberg a question about his next generation of printing presses.

"Can you talk a little bit about trust?" I finally ask. I'm thinking about the reliability of the relationships identified in triples, and the potential for the linked data system to be abused, much as earlier Internet platforms such as email and the Web have been overrun by spam and malware.

Berners-Lee pauses, expressionless. A few people laugh. Have I really asked that stupid a question, or does everyone think I am talking about the broader concept of trust?

I make a clarification. "At last week's lab, we were shown the layers of the Semantic Web, and one of them was --"

He interrupts me, and gestures toward the blackboard. "I can talk about it, but I am afraid it would take hours," he says. The long and short of it: It's a complex area, and the subject of much of the current Semantic Web research. "There's a big social element," he concludes, and leaves the discussion at that.

1 comment:

  1. Once you grok what his URI implies, you can digest:

    1. http://esw.w3.org/WebID - WebID (synonym for Personal URI)
    2. http://www.w3.org/DesignIssues/CloudStorage.html -- Socially aware Cloud Storage
    3. http://www.w3.org/DesignIssues/ReadWriteLinkedData.html -- ReadWriteWeb of Linked Data.

    Kingsley
