Developers always like to know what’s happening next with the World Wide Web. For a long time, it’s all been about front-end presentation: standards like CSS gradients and HTML5 video. While these things are great, they are far from a step change. Then we have broad classifications such as ‘Web 2.0’, loosely defined as being about collaboration, standards and the user experience. Blogs and wikis are often associated with Web 2.0, heralding the rise of user-generated content.
So what is Web 3.0? Known as the Semantic Web, it’s all about creating a web which can be understood by machines. Semantics is all about meaning. The current web is overflowing with information, but that information is all targeted at humans. To a machine the web is a flat and boring place: reams of text, images and other media, none of which share any kind of relation. Of course, clever algorithms can dig into that content and make inferences based upon the language used (this is what search engines do), but there are big limitations. A question as simple as ‘give me the details about the person who is described on this page’ is complicated with the current web. A scraper has to go over the page, digging through markup that was put there to make it look nice, trying to find words which imply a person is being described. The same goes for a question like ‘what is the price of the product on this web page?’. Because of this, websites are like silos: data goes in but it’s hard to get out, and sharing data between sites is a pain.
The Semantic Web gives us the RDF standard: the Resource Description Framework, a standard for describing resources in an explicit, machine-understandable way. Using RDF, we can provide a description of data on a website in a format which other websites can consume and process.
But before we talk details, it’s time to sell the vision. You have some ingredients and you’d like to search the web for recipes which use them but do not contain nuts. This is basically impossible to do today; it’s a very manual process at best. ‘Ah ha’, you say, ‘I can put the ingredients into Google followed by “-nuts”’. Uh huh, but Google doesn’t really know what you mean; it’s doing a textual search. You just filtered out every recipe page that says something like ‘this recipe contains no nuts’. Google will also have a hard time determining whether a page is actually a recipe or just mentions a bunch of foodstuffs, and it’s really going to struggle when it tries to work out whether an ingredient is in a particular recipe.
Enter the Semantic Web. You pose your query via a semantic search engine, and it finds recipes from various sites, all of which match what you want. ‘Hmm’, you think, ‘I have this wine, I wonder which recipes would go best with it?’. So you ask, and are provided with a list of recipes which go best with the wine you stated. And just like that, the Semantic Web has saved the day.
So how does it work? Well, the example above made the Semantic Web sound mighty clever. It isn’t. It is just a way to describe data in an explicit, agreed way. That is, it is all about building up a machine-readable web of data. Like the existing web, but just for machines. An extra, better-defined dimension, if you like. Applications can then do clever stuff with the data, just as a database isn’t intelligent but a website which uses the database may do some very clever things with the data. Does the database understand the meaning of the data it holds? No. Does the website that uses it? No. Then who does? The person who made the website, of course: they made it do clever things with the data even though the website naturally has no idea what it is doing. Why do I make this distinction? Because the idea of a ‘machine understandable web’ may make one think that machines will understand what it means for something to be, for example, a person. That would take highly advanced artificial intelligence. To the Semantic Web, it’s all just a bunch of text, but luckily we clever developers can apply meaning to that text and manipulate it in such a way that cool things happen. Even cooler than PowerPoint slide transitions.
So how do we make this web of data work? If you consider a database, primary keys are used to identify records uniquely. Well, with the Semantic Web, the entire web is our database, so picking non-conflicting keys would be very hard. Enter URIs: Uniform Resource Identifiers. A resource on the Semantic Web is identified uniquely using a URI. A URI is just like a URL except it does not need to dereference to a web page. That is, a URI looks exactly like a URL, except that when you type it into your browser it may (and probably will) go nowhere. So I may have an RDF description of myself identified uniquely by the URI ‘http://ashleyfaulkner.co.uk/ashley’. Now whenever someone talks about me on the web, they can provide some RDF with that as the subject. In fact the Semantic Web is all about triples: subject, predicate and object. So I may say:
http://ashleyfaulkner.co.uk/ashley a Person
http://ashleyfaulkner.co.uk/ashley name Ashley Faulkner
So here I’ve used the URI ‘http://ashleyfaulkner.co.uk/ashley’ as the key to myself. Yes, the actual me. This is why you would expect the URI to go nowhere when put into a browser. Browsers retrieve resources and in this case I am the resource. Short of teleportation being invented, I’m not going to show up when you type that in. If the URI actually went anywhere (didn’t 404) then the assumption is that the resource that turns up is what is being described. So if that URI went to a page about me, then I’d have just said that the page is a person and is called Ashley Faulkner.
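To make the idea of triples concrete, here is a minimal sketch that holds the two statements above as plain Python tuples and looks up everything said about one subject. The `Triple` type and the `describe` helper are names invented for this illustration; they are not part of any RDF library, and the bare strings ‘a’, ‘Person’ and ‘name’ are used exactly as informally as in the example above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


ASHLEY = "http://ashleyfaulkner.co.uk/ashley"

# The two statements from the text, one tuple per triple.
triples = [
    Triple(ASHLEY, "a", "Person"),
    Triple(ASHLEY, "name", "Ashley Faulkner"),
]


def describe(subject, data):
    """Return every (predicate, object) pair stated about one subject."""
    return [(t.predicate, t.obj) for t in data if t.subject == subject]


description = describe(ASHLEY, triples)
print(description)
```

The URI plays the role of the primary key: anything that wants to say something about the same resource only has to reuse the same subject string.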
But enough with going all Alice in Wonderland: where do I put that data, and how? Well, I could embed it in a web page using RDFa, a standard which allows RDF to be put into HTML pages. Or I could provide it as a file formatted in one of the many RDF serialisations. Yes, it gets a bit more complicated, because RDF is an abstract data model with no single concrete syntax. There are various ways you may express RDF, but one of the most popular is XML. So I could put the above into an RDF/XML file and have it accessible via a URL. Then an application can come along, read the file and say ‘aha! Ashley is a Person with name Ashley Faulkner’.
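As a rough illustration of what ‘read the file’ could look like, the sketch below parses a small RDF/XML document with Python’s standard XML parser and recovers the two triples. It is an assumption-laden toy rather than a real RDF parser: it handles only top-level nodes with an `rdf:about` attribute and plain text properties, and the `http://example.org/vocab#` namespace is a placeholder made up for this example, since RDF/XML requires class and property names to live in some namespace.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The "ex" vocabulary namespace is a placeholder, not a published vocabulary.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/vocab#">
  <ex:Person rdf:about="http://ashleyfaulkner.co.uk/ashley">
    <ex:name>Ashley Faulkner</ex:name>
  </ex:Person>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def read_triples(xml_text):
    """Read triples from the small subset of RDF/XML used above."""
    triples = []
    for node in ET.fromstring(xml_text):
        subject = node.get("{%s}about" % RDF)
        node_type = expand(node.tag)
        # A typed node such as <ex:Person> is shorthand for an rdf:type triple.
        if node_type != RDF + "Description":
            triples.append((subject, RDF + "type", node_type))
        for prop in node:
            triples.append((subject, expand(prop.tag), prop.text))
    return triples


for triple in read_triples(RDF_XML):
    print(triple)
```

A real application would use a full RDF library instead, since RDF/XML has many more forms than this sketch understands.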
Which brings us to another problem. What is a person? In fact, what does ‘name’ mean anyway? If we are being completely explicit we need unique keys for these too. So, I use a URI for the predicate and object too:
http://ashleyfaulkner.co.uk/ashley http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://ashleyfaulkner.co.uk/Person
http://ashleyfaulkner.co.uk/ashley http://ashleyfaulkner.co.uk#name Ashley Faulkner
Notice I use the URI for RDF type. As this is a URI with an agreed meaning, anyone anywhere knows that it means ‘the subject is an instance of a class’; the W3C said so. So now somebody wants to get some information from my site. How do they do so? Well, they need to know what they are looking for. Specifically, if they are looking for descriptions of people on my site, they need to know I arbitrarily decided that ‘http://ashleyfaulkner.co.uk/Person’ means a human being in the usual sense. How will they know that? Either they manually look into what URIs I use, or they are stuck.
What is needed is an agreed URI for people and names, so that when a machine wants to find out my name, it just asks for it using the standard URI, and the application can then assume that it means the same as any other use of that URI on any other site. And this is the issue: people need to come together and agree on what URIs to use for particular resources. Well, as it happens, in this instance the generally agreed URI for people comes courtesy of the Friend of a Friend (FOAF) project. So I can say I’m a person using the following triple:
http://ashleyfaulkner.co.uk/ashley http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/Person
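The payoff of a shared URI can be sketched in a few lines. Below, triples from three sites are pooled and queried with a single pattern. Only the first triple comes from the text; the other two sites, their subjects and the private `http://example.net/terms#Human` class are hypothetical, made up to show what happens when a publisher does not use the agreed vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# The triple from the text.
site_a = [("http://ashleyfaulkner.co.uk/ashley", RDF_TYPE, FOAF_PERSON)]

# Hypothetical second site that also uses the FOAF class.
site_b = [("http://example.org/people/bob", RDF_TYPE, FOAF_PERSON)]

# Hypothetical third site that invented its own class URI for people.
site_c = [("http://example.net/staff/carol", RDF_TYPE,
           "http://example.net/terms#Human")]


def instances_of(class_uri, *sources):
    """Return the subjects typed as class_uri across all given sources."""
    return sorted(
        subject
        for source in sources
        for (subject, predicate, obj) in source
        if predicate == RDF_TYPE and obj == class_uri
    )


people = instances_of(FOAF_PERSON, site_a, site_b, site_c)
print(people)
```

The query finds the two subjects that use the FOAF class and misses the third, even though all three describe people. Nothing in the code knows what a person is; it only matches strings, which is exactly why agreeing on the strings matters.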
The issue is that agreeing upon common meanings and URIs for the vast array of resources people may want to describe is a big task. There has to be a critical mass: we need more people to start publishing data using Semantic Web standards. RDF is, in web terms, ‘old’, but it still hasn’t seen widespread adoption. The technologies and standards are there, but the use simply isn’t, not on any large scale. Thus it may still be a very long time before we see the Semantic Web reach its potential, if it ever does.