Monday, February 1, 2016

RDF for talking about people

History

RDF has now been around for over 15 years, and over that time there have been many efforts to use RDF to describe many kinds of things (or rather, should I say rdfs:Resource's or owl:Thing's?). One of the most fundamental types of thing that one might want to describe is a person.  One of the early efforts to describe people was the Friend-of-a-Friend (FOAF) project.  If you want some historical perspective, you can read the original project description from 2000 here. To make it easy to describe a person, sometime before 2003 Leigh Dodds created FOAF-a-Matic, a little JavaScript-based web page that generated an XML-serialized RDF description of a person using FOAF properties.  But what good is a description of a person that is denoted by a blank node?

In 2006, Tim Berners-Lee, the creator of the web and promoter of RDF, suggested that everyone "Give yourself a URI".  His suggested method: create a FOAF page, add a hash fragment identifier to the page URL so that the URI for you will be different from the URI for the FOAF document, and you're done!  You can see his FOAF document at http://www.w3.org/People/Berners-Lee/card.rdf, which you should get when you dereference his URI: http://www.w3.org/People/Berners-Lee/card#i, but currently don't because the content negotiation seems to be broken.
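The reason the hash trick works is that an HTTP client drops the fragment before it makes the request, so the URI for the person and the URI for the document can never collide. Here is a minimal sketch of that split, using only the Python standard library and TimBL's URI from above:

```python
from urllib.parse import urldefrag

# TimBL's URI for himself, as given above.
person_uri = "http://www.w3.org/People/Berners-Lee/card#i"

# The fragment is never sent to the server, so what actually gets
# requested is the URI of the document, not the URI of the person.
document_uri, fragment = urldefrag(person_uri)

print(document_uri)  # http://www.w3.org/People/Berners-Lee/card
print(fragment)      # i
```

The person and the document end up with different identifiers even though only one resource (the document) is ever served.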

This strategy is pretty good if you are TimBL and are the dictator for life of the W3C, which would pretty much allow you to guarantee that your URI in the http://www.w3.org/ domain would be stable.  For the rest of us mere mortals, it is a bit harder.  When I made my first FOAF profile, I used http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me as my URI.  This was not a very good URI for several reasons:
1. Including the .rdf was bad form for a cool URI.  I was stuck with it because I didn't have any control over content negotiation on the server.
2. Who put the stupid tilde in there???
3. One day Vanderbilt suddenly decided that they preferred the my.vanderbilt.edu subdomain to the people.vanderbilt.edu subdomain.  So they shut down people.vanderbilt.edu.  ... URI permanently broken.

After that, I used (and still use) http://bioimages.vanderbilt.edu/contact/baskauf as a URI for myself, which is probably more stable since I control the bioimages.vanderbilt.edu subdomain (at least until Vanderbilt decides that I don't control it any more).

VIAF and ORCID

To my surprise, I discovered one day that I had a VIAF URI (http://viaf.org/viaf/63557389)!  I'm not sure exactly how that happened, but somebody at OCLC decided that I should have one.  This was really great, because VIAF URIs are well-known and probably pretty persistent.  The downside was that I had no control over what RDF was delivered when a client dereferenced that URI.  A couple of years later, I learned about ORCID and was able to set up an ORCID ID for myself (http://orcid.org/0000-0003-4365-3135).  To some extent, one can control the RDF content that is delivered from ORCID by adding things like publications and links to a website to your profile.  However, I can't indicate everything I want, including important things like the fact that there are two other URIs that refer to me!  I've put that in the text of my ORCID biography, but that won't generate a machine-readable link.  In the RDF that I provide when http://bioimages.vanderbilt.edu/contact/baskauf is dereferenced, I assert owl:sameAs relationships to the other two URIs.  But anyone dereferencing either the VIAF or ORCID URIs won't discover this relationship.

What RDF do you get from ORCID and VIAF?

I was curious to know what kind of properties VIAF and ORCID used in the RDF they provide and to what extent you get the same information from both places.  Our Semantic Web Working Group has been working through the book Semantic Web for the Working Ontologist, and as an exercise for chapter 3, we were looking at a prototype RDF description of our colleague Cliff Anderson.  Since Cliff was agreeable to being our guinea pig, I decided to dereference his VIAF and ORCID URIs to see what kind of RDF I got.  I won't go into the details of how I did it, since the methods and the actual RDF are already posted in the SWWO Chapter 3 notes.
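For readers who don't want to dig through the notes, "dereferencing with a request for RDF" just means sending an HTTP GET with an Accept header that names an RDF media type. The following is a rough sketch only, not the method recorded in the SWWO notes: the helper names build_rdf_request and fetch_rdf are made up for illustration, and whether a given server honors text/turtle is an assumption you would have to test.

```python
from urllib.request import Request, urlopen


def build_rdf_request(uri, media_type="text/turtle"):
    """Build a GET request that asks the server for RDF rather than HTML."""
    return Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri, media_type="text/turtle"):
    """Send the request, following redirects; return (final URL, body text)."""
    with urlopen(build_rdf_request(uri, media_type)) as response:
        return response.geturl(), response.read().decode("utf-8")


# Building the request does not touch the network.
request = build_rdf_request("http://orcid.org/0000-0003-0328-0792")
print(request.get_header("Accept"))  # text/turtle

# Actually sending it requires network access, so it is left commented out:
# final_url, body = fetch_rdf("http://orcid.org/0000-0003-0328-0792")
```

Comparing the final URL with the one you started from is a quick way to see whether the server redirected you from the person's URI to a document about the person.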

ORCID

Here are a few things that I noticed about the record that was delivered when Cliff's ORCID was dereferenced with a request for RDF:

  1. The actual ORCID ID http://orcid.org/0000-0003-0328-0792 definitely refers to Cliff, a foaf:Person.  Other things related to Cliff are distinguished from Cliff by adding hash fragment identifiers to Cliff's URI in order to make their URIs different.  This is the opposite approach to what TimBL suggested, but is now perfectly kosher under the resolution to the httpRange-14 issue (see this email for the original description of the issue). ORCID differentiates between the ORCID ID for Cliff (http://orcid.org/0000-0003-0328-0792#orcid-id), a document about Cliff's publications (http://orcid.org/0000-0003-0328-0792#workspace-works), Cliff's personal profile document (http://pub.orcid.org/orcid-pub-web/experimental_rdf_v1/0000-0003-0328-0792), and Cliff himself.
  2. ORCID primarily uses FOAF vocabulary terms to describe Cliff.  This has been standard practice for many years, despite the fact that FOAF isn't any kind of standard.  ORCID also uses rdfs:label to provide Cliff's name, which is nice since this is a fairly universal way to label resources.  
  3. ORCID uses PROV and PAV ontology terms to describe the provenance information about the record.  The personal profile document is the subject of the provenance triples.  PROV is a W3C Recommendation.  PAV is not; see this article for its rationale.
  4. The foaf:primaryTopic property is used to link the personal profile document (and hence provenance information) to Cliff.
  5. A GeoNames URI is used to refer to the United States.

VIAF

Here are things I noticed about the record from VIAF:

  1. The actual VIAF ID http://viaf.org/viaf/168432349 definitely refers to Cliff, a schema:Person.  VIAF differentiates between Cliff and the document about Cliff by including a trailing slash after the VIAF ID to generate a URI for the document: http://viaf.org/viaf/168432349/ .  This is a clever trick to cause redirection to the document when Cliff's URI is dereferenced.  However, the two URIs are probably harder for humans to tell apart than the hash URIs used by ORCID.
  2. VIAF primarily uses Schema.org terms to describe Cliff, although it still uses FOAF vocabulary terms to describe other relationships and classes.  It uses skos:prefLabel rather than rdfs:label to indicate the preferred label for Cliff in two languages.  However, since skos:prefLabel is a subproperty of rdfs:label, the more well-known relationship would be entailed if a client performed reasoning.  
  3. VIAF could provide provenance information about the record as properties of the document.  But it doesn't.
  4. The foaf:primaryTopic property is used to link the document to Cliff.
  5. The record includes definitions of skos:Concept's that have Cliff as their foaf:focus.  The implications of these semantics are not clear to me.
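The remark in item 2 about skos:prefLabel entailing rdfs:label can be made concrete with a toy reasoner. This is only a sketch of the single RDFS subproperty rule over plain Python tuples, not a real reasoner, and the label string is a placeholder rather than the value VIAF actually returns:

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
SKOS_PREFLABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def entail_subproperties(triples):
    """Apply the rule: (s p o) and (p subPropertyOf q) entail (s q o)."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triples appear
        changed = False
        sub_props = {(s, o) for s, p, o in inferred if p == RDFS_SUBPROP}
        for s, p, o in list(inferred):
            for sub, sup in sub_props:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


cliff = "http://viaf.org/viaf/168432349"
triples = {
    # Placeholder label: NOT the actual value in the VIAF record.
    (cliff, SKOS_PREFLABEL, "placeholder label"),
    # Declared in the SKOS vocabulary itself.
    (SKOS_PREFLABEL, RDFS_SUBPROP, RDFS_LABEL),
}

inferred = entail_subproperties(triples)
print((cliff, RDFS_LABEL, "placeholder label") in inferred)  # True
```

A client that only looks for rdfs:label would miss the VIAF label unless it (or the triple store) does this kind of reasoning and also has the SKOS subproperty declaration loaded.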

FOAF vs. Schema.org

The current version of FOAF (Paddington Edition - if you read this far you now know why I put his picture at the top of the post!) is 0.99 and was issued in January 2014.  There must be something psychological about working on a vocabulary for 14 years and still holding off on calling it version 1.0!  Will the next version be 1.0, or will there be no more versions after this one?  The change log of version 0.99 notes that it declares equivalence between foaf:Person and schema:Person, between foaf:Image and schema:ImageObject, and between foaf:Document and schema:CreativeWork (click here to see the FOAF RDF).  The declaration seems to be reciprocal for schema:Person in the Schema.org RDF, but not for schema:ImageObject and schema:CreativeWork. However, schema:ImageObject is an equivalent class to dcmitype:Image.  I'm a bit uncertain about all this because I'm not 100% sure where the authoritative RDF for Schema.org resides (see this for some Turtle RDF of Schema.org term definitions).

The question that I'm wondering about here is whether the Schema.org terms are destined to replace the FOAF terms.  Dan Brickley, one of the FOAF authors, now runs the daily operations of Schema.org and chairs the Schema.org Community Group.  So his efforts now are clearly focused on Schema.org.  Schema.org also has buy-in from Google, Microsoft, and Yahoo, vs. FOAF, which has no particular organizational support.  Schema.org terms can be used not only with RDF, but with various other Linked Data technologies including Microdata and JSON-LD - while FOAF is only for RDF. So it looks like Schema.org may rule the future.  But FOAF has been so widely used for so long that it probably isn't going away soon.  We seem doomed to having two competing RDF vocabularies to describe people for a long time to come.  

VIAF vs. ORCID

Are we also doomed to having two competing systems of URI identifiers for people?  In 2013, ORCID and ISNI (which has a core relationship with the VIAF database) issued a Joint Statement on Interoperation and committed to investigate the feasibility of a shared identifier scheme for a single number to represent an individual in both databases.  The ORCID Registry assigns IDs from a block of numbers that ISNI has set aside to avoid having the same number assigned to different people in the two systems.  However, as far as I can tell, there has been no progress since 2013 in dealing with the opposite problem: avoiding having the same person being assigned different numbers in the two systems.  As with the two vocabulary schemes, we seem to be doomed to having two competing systems for assigning identifiers to people. 

owl:sameAs or SPARQL as a solution to dealing with duplicate information?

There doesn't seem to be any way to force either of the ID systems (VIAF or ORCID) to link to the other's IDs.  However, anyone can declare two resources to be equivalent by linking the two URIs of the resources by owl:sameAs in triples that they assert.  I hope that our Semantic Web Group can play around with merging information from VIAF and ORCID using the Stardog reasoner to materialize triples entailed by the use of owl:sameAs and other terms of equivalence such as owl:equivalentProperty.  Alternatively, we could work out some SPARQL queries that would merge information from both types of records and make it possible to assert properties from both sources using either FOAF or Schema.org properties.  I plan to report in a future blog post if we come up with anything interesting.
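To give a feel for what "materializing" owl:sameAs triples means, here is a small sketch in plain Python. It is not how Stardog does it, it ignores owl:equivalentProperty entirely, and the two name values are placeholders rather than data actually returned by VIAF or ORCID. The three URIs are the ones for me mentioned earlier in the post.

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def materialize_same_as(triples):
    """Copy every triple onto all URIs declared equivalent by owl:sameAs."""
    groups = {}  # maps each URI to the set of URIs it is equivalent to

    def group_of(node):
        return groups.setdefault(node, {node})

    # Build equivalence classes (sameAs is symmetric and transitive).
    for s, p, o in triples:
        if p == OWL_SAMEAS:
            merged = group_of(s) | group_of(o)
            for member in merged:
                groups[member] = merged

    # Restate each triple for every equivalent subject and object.
    result = set()
    for s, p, o in triples:
        for s2 in groups.get(s, {s}):
            for o2 in groups.get(o, {o}):
                result.add((s2, p, o2))
    return result


ME = "http://bioimages.vanderbilt.edu/contact/baskauf"
VIAF = "http://viaf.org/viaf/63557389"
ORCID = "http://orcid.org/0000-0003-4365-3135"

triples = [
    (ME, OWL_SAMEAS, VIAF),
    (ME, OWL_SAMEAS, ORCID),
    # Placeholder values, for illustration only.
    (VIAF, "http://schema.org/name", "name from VIAF (placeholder)"),
    (ORCID, "http://xmlns.com/foaf/0.1/name", "name from ORCID (placeholder)"),
]

merged = materialize_same_as(triples)
print((ORCID, "http://schema.org/name", "name from VIAF (placeholder)") in merged)  # True
print((VIAF, OWL_SAMEAS, ORCID) in merged)  # True
```

After merging, a query against any one of the three URIs sees the properties asserted about the other two, which is the effect we would want from the reasoner.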

