xmlhack: Practical RDF Town Hall

Around 30 people attended the Practical RDF Town Hall at the XML 2003 conference this week, watching demos and asking questions.

Norm Walsh inaugurated the festivities, noting that "I initially ignored RDF for a number of years, and eventually Dan Connolly showed me a few clever things you could do with it, and then I realized I could come up with a few clever things as well without waiting for the Semantic Web to emerge in the fullness of time."

Dan Connolly presented on "The Semantic Web and its applications at W3C." After noting the advantages of RDF being XML, he explored RDF's mergeability (including namespaces), its Web-friendliness, its consistent technology stack, and the advantage of riding the URI network effect. Connolly noted that "The interesting thing about this layer cake is that we're really using this technology." The Technical Reports page used to be managed by hand, despite holding over 400 entries. The W3C started to automate this with RDF, and combined it with technologies like XSLT to test conformance to publication rules.

Eventually, the process shifted to a completely automatic one, now in production. Beyond using XSLT to publish the actual documents, however, the W3C also uses XSLT to generate metadata about the documents (based on style information), which can be used for reports and for automating bibliography creation. It also creates an overview of W3C groups and their dependencies using SVG. "The data is never in one place - it flies all over the Web."

Dan Brickley then explored RDF hyperlinking using rdfs:seeAlso, a part of the RDF Schema work that has proven useful in other fields. It provides hyperlinks between RDF files, permitting the exploration of related descriptions. By adding the extra reference - something not strictly necessary to using RDF - it becomes possible to move from one set of graphs to another, so "seeAlso gets us from semantics to the Semantic Web."

Given these pieces, you can combine and compile data into ever-larger webs of related materials. Dan presented HTML and SVG views of the same information sets, some individually and some in enormous (for a slide) aggregate. The seeAlso approach lets the software deal with complex issues like provenance and the scattered nature of data in an open Web, as well as the sometimes difficult question of figuring out when two things talk about the same thing. Crawlers can use the information to build much richer and more coherent collections of information, in whatever domain that information happens to describe.
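The crawling idea can be illustrated with a minimal sketch. It is not Brickley's software: the in-memory `FAKE_WEB` dictionary, the example.org URLs, and the `crawl` function are all hypothetical stand-ins, and a real crawler would fetch and parse RDF/XML rather than read tuples from a dictionary. Only the rdfs:seeAlso and foaf:name property URIs are real.

```python
from collections import deque

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical stand-in for the Web: document URL -> triples in that document.
FAKE_WEB = {
    "http://example.org/alice.rdf": [
        ("http://example.org/alice#me", FOAF_NAME, "Alice"),
        ("http://example.org/alice#me", RDFS_SEE_ALSO, "http://example.org/bob.rdf"),
    ],
    "http://example.org/bob.rdf": [
        ("http://example.org/bob#me", FOAF_NAME, "Bob"),
        ("http://example.org/bob#me", RDFS_SEE_ALSO, "http://example.org/alice.rdf"),
        ("http://example.org/bob#me", RDFS_SEE_ALSO, "http://example.org/missing.rdf"),
    ],
}


def crawl(start_url, fetch=FAKE_WEB.get):
    """Follow rdfs:seeAlso links breadth-first, merging every triple found."""
    graph = set()   # merged triples; a set, so merging is plain union
    seen = set()    # URLs already attempted, which guards against link cycles
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        triples = fetch(url)
        if triples is None:  # unreachable document: skip it and carry on
            continue
        for s, p, o in triples:
            graph.add((s, p, o))
            if p == RDFS_SEE_ALSO:
                queue.append(o)
    return graph, seen


graph, visited = crawl("http://example.org/alice.rdf")
names = sorted(o for s, p, o in graph if p == FOAF_NAME)
print(names)
```

Starting from one file, the crawler ends up with names from both documents, and the circular link between them and the dead link are handled without special cases.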

Kal Ahmed asked some tough questions about how to scale this to the level Google reaches in crawling the entire Web. Brickley noted that the approach has the benefit of dealing with much smaller and more structured data, but acknowledged that it isn't a magic way to reach Google's capabilities without the same level of build-out.

Next, xmlhack editor Edd Dumbill explored how he applies RDF to his personal data integration problems, running personal information through the Friend-of-a-Friend (FOAF) RDF vocabulary, using the Redland framework as a foundation for processing.

His first use case was IRC messaging, building an IRC bot to "help out, lubricate these social situations where you don't know people." His foafbot scoops up RDF information from around the Web. People can describe themselves as well as other people, so there's lots of information available. Provenance is important, of course, and foafbot keeps track of whose files made which assertions. It's not trivial to associate information with a particular person (or to gauge their seriousness in saying it), though he uses digital signatures as an indicator that someone means what they wrote. Escaping the problem of lies is very difficult, however - if someone wants to make false assertions and sign them, they can.
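One way to picture this kind of provenance tracking is a store that remembers which source document made each assertion. The sketch below is an illustration only, not foafbot's or Redland's actual design: the `ProvenanceStore` class, its method names, and the example URLs and prefixed names are all hypothetical.

```python
from collections import defaultdict


class ProvenanceStore:
    """Toy triple store that records which source made each assertion."""

    def __init__(self):
        self._sources = defaultdict(set)  # triple -> set of source URLs

    def add(self, triple, source):
        self._sources[triple].add(source)

    def who_says(self, triple):
        """Return the sorted list of sources asserting this triple."""
        return sorted(self._sources.get(triple, ()))

    def retract_source(self, source):
        """Forget a source; drop triples that no other source asserts."""
        for triple in list(self._sources):
            self._sources[triple].discard(source)
            if not self._sources[triple]:
                del self._sources[triple]


store = ProvenanceStore()
claim = ("ex:alice", "foaf:knows", "ex:bob")
nick = ("ex:bob", "foaf:nick", "bobby")
store.add(claim, "http://example.org/alice.rdf")
store.add(claim, "http://example.org/carol.rdf")
store.add(nick, "http://example.org/carol.rdf")

sources = store.who_says(claim)
store.retract_source("http://example.org/carol.rdf")
remaining = store.who_says(claim)
print(sources, remaining)
```

Keeping the source alongside each statement means an untrusted file can be dropped later without disturbing what other files assert. It does not, as noted above, stop a source from asserting something false.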

Also, the co-depiction experiment lets you use this same information to identify which people go with what pictures. Asking the foafbot for pictures of someone returns a list of URIs to pictures, built from Dublin Core information in the FOAF files. The bot can process the assertions to identify multiple people in the same figure, hence co-depiction.
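A co-depiction query reduces to grouping "this image depicts this person" statements by image. The following sketch assumes the data uses the FOAF `depicts` property, which the talk did not specify; the image URLs, the `ex:` identifiers, and the function names are hypothetical, and this is not the foafbot implementation.

```python
from collections import defaultdict

FOAF_DEPICTS = "http://xmlns.com/foaf/0.1/depicts"

# Hypothetical sample data: (image, property, person) triples.
TRIPLES = [
    ("http://example.org/img/1.jpg", FOAF_DEPICTS, "ex:alice"),
    ("http://example.org/img/1.jpg", FOAF_DEPICTS, "ex:bob"),
    ("http://example.org/img/2.jpg", FOAF_DEPICTS, "ex:bob"),
    ("http://example.org/img/2.jpg", FOAF_DEPICTS, "ex:carol"),
    ("http://example.org/img/3.jpg", FOAF_DEPICTS, "ex:alice"),
]


def depictions(triples):
    """Group depicted people by image URI."""
    by_image = defaultdict(set)
    for image, predicate, person in triples:
        if predicate == FOAF_DEPICTS:
            by_image[image].add(person)
    return by_image


def pictures_of(triples, person):
    """Image URIs that depict the given person."""
    return sorted(
        image for image, people in depictions(triples).items() if person in people
    )


def codepicted_with(triples, person):
    """Everyone who appears in at least one image with the given person."""
    others = set()
    for people in depictions(triples).values():
        if person in people:
            others |= people - {person}
    return sorted(others)


print(pictures_of(TRIPLES, "ex:alice"))
print(codepicted_with(TRIPLES, "ex:bob"))
```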

Dumbill also emphasized the usefulness of the C-based Redland framework, which has sprouted context features and Python bindings to support this work. (Other bindings are available for C, Java, Perl, Ruby, and C#.) Like the C-based expat parser, it's easy to compile for particular applications, and it also relies on the Berkeley DB XML database from Sleepycat.

Next Dumbill showed off Dashboard, Ximian's desktop search tool for personal information, which links into various apps to collect context. While Dashboard's internals aren't RDF, "the mentality similarly involves joining things together." He demonstrated how Dashboard could respond to activity in a Web browser, using a meta element in HTML for hints. Dumbill is also working on cellphone and Bluetooth integration for this kind of interapplication communication, as well as building stronger links with IM, email, bookmarks, and more. Bluetooth between cell phones seems to be helping people meet in bars already.

In the last presentation, Norm Walsh explained how he was using RDF to make better use of information he already had. Walsh explained that he had lots of data in various devices about a lot of people and projects, but no means of integrating it. With various RDF toolkits, however, "just by dumping it into RDF, it just kind of happens for free." Aggregation and inference are easy - and Walsh can get convenient notifications of people's birthdays without duplicating information between a file on a person and a calendar entry noting that.
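The birthday example can be sketched in a few lines. This is an illustration under stated assumptions rather than Walsh's actual setup: the two sample graphs, the short property names (`name`, `birthday`, `worksOn`), and the `MM-DD` birthday format are all invented for the example.

```python
from datetime import date

# Hypothetical data from two separate sources, as sets of triples.
ADDRESS_BOOK = {
    ("ex:alice", "name", "Alice"),
    ("ex:alice", "birthday", "12-10"),  # assumed MM-DD format
}
PROJECT_NOTES = {
    ("ex:alice", "worksOn", "ex:docbook"),
    ("ex:alice", "name", "Alice"),  # repeated fact; harmless after merging
}


def merge(*graphs):
    """Merging graphs of triples is set union; duplicates collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def birthdays_on(graph, today):
    """Derive reminders from person data instead of storing calendar entries."""
    key = today.strftime("%m-%d")
    names = {s: o for s, p, o in graph if p == "name"}
    return sorted(
        names.get(s, s) for s, p, o in graph if p == "birthday" and o == key
    )


merged = merge(ADDRESS_BOOK, PROJECT_NOTES)
print(birthdays_on(merged, date(2003, 12, 10)))
```

The birthday lives only in the description of the person; the reminder is computed from the merged data when needed, which is the duplication Walsh said he avoids.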

The cwm ("closed world machine") tool at the heart of Walsh's site combines all kinds of information and integrates it with the essays he writes for his weblog. He showed a photograph taken at the very beginning of the Town Hall, its technical metadata, and then information linking that photo to its taker and details about who is in the photo, making co-depiction queries possible.

In the question period, James Clark confessed that "I'm not a complete RDF convert yet," and asked about the generation of RDF using XSLT: "Is there some way to point to an XML and XSLT file rather than an RDF file?" The answers were sort of yes and sort of no. rdfs:seeAlso was one possible route, though it had only one URI for the data source and none for the transformation. Dan Connolly also pointed out that running an XSLT program is easier to support as a service, "safer by default", than most other programming environments.

There were also a few questions about merging, integration, and the rules for combining rules, as well as questions about the process of creating RDF/XML and the XSLT to do so.