The Semantic Web: semantic when?

Developers always like to know what’s happening next with the World Wide Web. For a long time, it’s all been about front-end presentation: standards like CSS gradients and HTML5 video. While these things are great, they are far from a step change. Then we have the broad classifications such as ‘Web 2.0’, loosely defined as being about collaboration, standards and the user experience. Blogs and wikis are often associated with Web 2.0, heralding the rise of user-generated content.

So what really is next now? Well, we have HTML5, but that’s all about improving the HTML standard, providing tags for content such as video along with some new APIs for JavaScript. This sort of incremental front-end standards change is what a lot of people consider the evolution of the web to be all about. What’s that? They added a header tag? Great! Now that’s one less div on the page. These changes are nice, but I find the term HTML5 often gets abused. ‘Why didn’t they just make an HTML5 app?’ ‘Wow, this is a cool HTML5 site!’ ‘We don’t need Flash now there is HTML5!’ It’s just an evolution of the standard. Much of what can be done with HTML5 can be done in the previous version of the standard, HTML4. Okay, so they added a video tag. But that’s all the standard did: define the standard syntax for the tag (among others). It’s up to browsers to handle the video, which is why we saw the big disagreement over which video codec to use. Having said this, front-end standards are, of course, still important. But they don’t really advance the web as a whole; the general concept remains the same.

The world of the front-end Web is getting better: JavaScript performance has sky-rocketed, standards support has improved dramatically and web apps can now challenge their desktop counterparts. But then there is Web 3.0, a shift in a very different direction. The curious web developer may be very disappointed when they Google Web 3.0. It’s not shiny, it’s not even that well defined and for the most part it remains a pipe dream.

So what is Web 3.0? Known as the Semantic Web, it’s all about creating a web which can be understood by machines. Semantics is all about meaning. The current web is overflowing with information, but that information is all targeted at humans. To a machine the web is a flat and boring place: reams of text, images and other media, none of which share any kind of explicit relation. Of course, clever algorithms can dig into that content and make inferences based upon the language used (this is what search engines do), but there are big limitations. A question as simple as ‘give me the details about the person who is described on this page’ is complicated with the current web. A scraper has to go over the page, digging through markup that was put there to make it look nice, trying to find words which imply a person is being described. The same goes for a question like ‘what is the price of the product on this web page?’. Because of this, websites are like silos: data goes in but it’s hard to get out, and sharing data between sites is a pain.

The Semantic Web gives us the RDF standard, the Resource Description Framework: a standard for describing resources in an explicit, machine-understandable way. Using RDF, we can provide a description of data on a website in a format which other websites can consume and process.

But before we talk details, it’s time to sell the vision. You have some ingredients and you’d like to search the web for recipes which use them but do not contain nuts. This is basically impossible to do today; it’s a very manual process at best. ‘Ah ha’, you say, ‘I can put the ingredients into Google followed by -nuts’. Uh huh, but Google doesn’t really know what you mean; it’s doing a textual search. You just filtered out every recipe page that says something like ‘this recipe contains no nuts’. And Google will have a hard time determining if a page is actually a recipe or just mentions a bunch of foodstuffs. And it’s really going to struggle when it tries to work out if an ingredient is in a particular recipe.

Enter the Semantic Web. You pose your query via a semantic search engine and it finds recipes from various sites, all of which match what you want. ‘Hmm’, you think, ‘I have this wine, I wonder which recipes would go best with it?’. So you ask and are provided with a list of recipes which go best with the wine you stated. And just like that the Semantic Web has saved the day.

So how does it work? Well, the example above made the Semantic Web sound mighty clever. It isn’t. It is just a way to describe data in an explicit, agreed way. That is, it is all about building up a machine-readable web of data. Like the existing web, but just for machines. An extra, better-defined dimension if you like. Applications can then do clever stuff with the data, just as a database isn’t intelligent but a website which uses the database may do some very clever things with the data. Does the database understand the meaning of the data it holds? No. Does the website that uses it? No. Then who does? The person who made the website, of course. They made it do clever things with the data even though the website naturally has no idea what it is doing. Why do I make this distinction? Because the idea of a ‘machine-understandable web’ may make one think that machines will understand what it means for something to be, for example, a person. That’s highly advanced artificial intelligence. To the Semantic Web, it’s all just a bunch of text, but luckily we clever developers can apply meaning to that text and manipulate it in such a way that cool things happen. Even cooler than PowerPoint slide transitions.

So how do we make this web of data work? If you consider a database, primary keys are used to identify records uniquely. Well, with the Semantic Web, the entire web is our database, so picking non-conflicting keys would be very hard. Enter URIs: Uniform Resource Identifiers. A resource on the Semantic Web is identified uniquely using a URI. A URI is just like a URL except it does not need to dereference to a web page. That is, a URI looks exactly like a URL except that when you type it into your browser it may (and probably will) go nowhere. So I may have an RDF description of myself identified uniquely by the URI ‘http://ashleyfaulkner.co.uk/ashley’. Now whenever someone talks about me on the web, they can provide some RDF with that as the subject. In fact the Semantic Web is all about triples: subject, predicate and object. So I may say:

http://ashleyfaulkner.co.uk/ashley   a   Person
http://ashleyfaulkner.co.uk/ashley   name   Ashley Faulkner

So here I’ve used the URI ‘http://ashleyfaulkner.co.uk/ashley’ as the key to myself. Yes, the actual me. This is why you would expect the URI to go nowhere when put into a browser. Browsers retrieve resources and in this case I am the resource. Short of teleportation being invented, I’m not going to show up when you type that in. If the URI actually went anywhere (didn’t 404) then the assumption is that the resource that turns up is what is being described. So if that URI went to a page about me, then I’d have just said that the page is a person and is called Ashley Faulkner.
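To make the triple idea concrete, here is a minimal sketch in Python. It is not how any real RDF library works; it simply assumes a triple can be held as a plain three-element tuple and a ‘graph’ as a set of them. The helper name describe is made up for this illustration.

```python
# A triple is just (subject, predicate, object); a "graph" is a set of them.
# Plain strings stand in for everything here, as in the two lines above.
ASHLEY = "http://ashleyfaulkner.co.uk/ashley"  # the URI used as the key

graph = {
    (ASHLEY, "a", "Person"),
    (ASHLEY, "name", "Ashley Faulkner"),
}


def describe(graph, subject):
    """Collect every predicate/object pair stated about one subject.

    Hypothetical helper: a dict keeps only one object per predicate,
    which is enough for this tiny example.
    """
    return {predicate: obj for s, predicate, obj in graph if s == subject}


description = describe(graph, ASHLEY)
print(description["name"])
```

The URI plays the part of the primary key: anything anyone says about that subject ends up in the same description.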

But enough with going all Alice in Wonderland: where do I put that data, and how? Well, I could embed it in a web page using RDFa. This standard allows RDF to be put into HTML pages. Or I could provide it as a file formatted in one of the many RDF serialisations. Yes, it gets a bit more complicated, because RDF is an abstract model with no single concrete syntax. There are various ways you may express RDF, but one of the most popular is XML. So I could put the above into an RDF/XML file and have it accessible via a URL. Then an application can come along, read the file and say ‘aha! Ashley is a Person with name Ashley Faulkner’.
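As a rough illustration of what a serialisation looks like, the sketch below writes triples out as N-Triples rather than RDF/XML. N-Triples is another W3C serialisation, swapped in here only because it is line-based and easy to produce by hand. It needs full URIs for predicates, so the sketch uses the URIs this post settles on in the next section. The function names and the ‘starts with http’ test for telling URIs from literals are assumptions made for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ASHLEY = "http://ashleyfaulkner.co.uk/ashley"

triples = [
    (ASHLEY, RDF_TYPE, "http://ashleyfaulkner.co.uk/Person"),
    (ASHLEY, "http://ashleyfaulkner.co.uk#name", "Ashley Faulkner"),
]


def to_term(value):
    """Render one value as an N-Triples term.

    Assumption for this sketch: anything starting with http:// or
    https:// is a URI, everything else is a plain string literal.
    """
    if value.startswith(("http://", "https://")):
        return "<" + value + ">"
    escaped = (
        value.replace("\\", "\\\\")  # backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """One 'subject predicate object .' statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        lines.append(f"{to_term(subject)} {to_term(predicate)} {to_term(obj)} .")
    return "\n".join(lines) + "\n"


document = to_ntriples(triples)
print(document)
```

Saved to a file and served at a URL, that text is something another application could fetch and read back into triples.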

Which brings us to another problem. What is a person? In fact, what does ‘name’ mean anyway? If we are being completely explicit we need unique keys for these too. So, I use a URI for the predicate and object too:

http://ashleyfaulkner.co.uk/ashley   http://www.w3.org/1999/02/22-rdf-syntax-ns#type   http://ashleyfaulkner.co.uk/Person
http://ashleyfaulkner.co.uk/ashley   http://ashleyfaulkner.co.uk#name   Ashley Faulkner

Notice I use the URI for RDF type. As this is a URI with an agreed meaning, anyone anywhere knows that it means ‘the subject is an instance of a class’, because the W3C said so. So now somebody wants to get some information from my site. How do they do so? Well, they need to know what they are looking for. Specifically, if they are looking for descriptions of people on my site, they need to know I arbitrarily decided that ‘http://ashleyfaulkner.co.uk/Person’ means a human being in the usual sense. How will they know that? They either manually looked into what URIs I use or they are stuck.
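The problem can be sketched in a few lines of Python. This assumes triples are held as plain tuples and uses a made-up match helper; the ‘http://example.org/Human’ URI is a hypothetical guess by a consumer who has not read my site.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ASHLEY = "http://ashleyfaulkner.co.uk/ashley"
MY_PERSON = "http://ashleyfaulkner.co.uk/Person"  # my own arbitrary class URI

triples = [
    (ASHLEY, RDF_TYPE, MY_PERSON),
    (ASHLEY, "http://ashleyfaulkner.co.uk#name", "Ashley Faulkner"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# A consumer who knows my private class URI finds me...
people = [s for s, _, _ in match(triples, predicate=RDF_TYPE, obj=MY_PERSON)]

# ...but one who guesses a different (hypothetical) URI finds nothing.
guess = match(triples, predicate=RDF_TYPE, obj="http://example.org/Human")

print(people, guess)
```

The query itself is trivial; the hard part is knowing which URI to put in it.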

What is needed is an agreed URI for people and names, so that when a machine wants to find out my name, it just asks for it using the standard URI, and the application can assume that it means the same as any other use of that URI on any other site. And this is the issue: people need to come together and agree on what URIs to use for particular resources. Well, as it happens, in this instance the generally agreed URI for people comes courtesy of the Friend of a Friend (FOAF) project. So I can say I’m a person using the following triple:

http://ashleyfaulkner.co.uk/ashley   http://www.w3.org/1999/02/22-rdf-syntax-ns#type   http://xmlns.com/foaf/0.1/Person
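Here is a small sketch of why the shared URI matters, again assuming triples are plain Python tuples. The second site, its ‘bob’ and ‘kettle’ resources and its Product class are all hypothetical, invented only to show data from two sources being combined.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Data published by this site.
site_a = {
    ("http://ashleyfaulkner.co.uk/ashley", RDF_TYPE, FOAF_PERSON),
}

# Data published by a second, hypothetical site.
site_b = {
    ("http://example.org/people/bob", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/products/kettle", RDF_TYPE, "http://example.org/vocab/Product"),
}

# Because both sites use URIs as keys, merging is just a set union.
merged = site_a | site_b


def instances_of(graph, class_uri):
    """Every subject declared to be an instance of the given class."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == class_uri)


people = instances_of(merged, FOAF_PERSON)
print(people)
```

One query with one agreed URI finds the people on both sites, with no per-site knowledge needed.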

The issue is that agreeing upon common meanings and URIs for the vast array of resources people may want to describe is a big task. There has to be a critical mass: we need more people to start publishing data using Semantic Web standards. RDF, in Web terms, is ‘old’ but it still hasn’t seen widespread adoption. The technologies and standards are there, but the use simply isn’t, not on any large scale. Thus it may still be a very long time before we see the Semantic Web reach its potential, if it ever does.
