Friday, April 25, 2014

Some thoughts after encountering Digital Humanists



As I have been thrashing around with RDF in the context of TDWG over the past couple of years, I have wondered whether anyone at Vanderbilt besides me was working on anything remotely related to RDF, Linked Data, or the Semantic Web. I never searched systematically, but when I brought the issue up with science and computer people, I usually encountered blank stares and the question "What?"

Recently, I started following Clifford Anderson (@andersoncliffb) and David Michelson (@davidamichelson) on Twitter and got interested in the "Topics In Digital Humanities" course they were teaching this semester. I decided to get out of my biodiversity informatics silo and attend part of their students' final presentations this past Monday. I couldn't stay for the whole thing, but I was fascinated by the part I saw.

The three talks I saw dealt with digitizing metadata and images of early Christian artifacts - particularly in the context of the syriaca.org project. Although it seems like there would be little relationship between those projects and biodiversity informatics, one thing that really struck me as I watched the presentations was how similar the problems they faced were to those involved in digitizing natural history museum specimens and recording species occurrence metadata. They struggled to find terms in controlled vocabularies to describe their artifacts. They dealt with the problem of demarcating segments of an image that documented several features of interest. They were figuring out how to work collaboratively on common data sources.

At the same time, I was struck by how different their tools were from those that are used or talked about in TDWG.

First and foremost, all of their work involved using XML. I've heard almost nothing positive about XML in the context of TDWG: it's too verbose and takes too long to transmit, it's confusing and not readable, etc. So I was surprised to see that it was central to what they were doing. There seems to be a simple reason for this: it enables them to mark up text using very simple tags (which at a glance looked to me like XHTML) and then use existing technology (primarily XQuery, I think) to search the marked-up text. In other words, they are immediately accomplishing useful things using off-the-shelf technology. This is in marked contrast to the biodiversity informatics community, where years have been spent arguing about whether GUIDs and RDF are going to solve our problems or are a useless waste of time, with nothing functional to show for all of the arguing and effort.
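To make the "mark up text with simple tags, then search it" idea concrete, here is a minimal sketch. It is not what the students actually did: it uses Python's standard-library ElementTree as a stand-in for XQuery, and the tag names (`persName`, `placeName`), the sample sentences, and the `find_tagged` helper are all invented for illustration rather than taken from the syriaca.org data.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up text. The tags and content are assumptions made
# for this example, not the project's actual schema or data.
SAMPLE = """
<text>
  <p>The inscription mentions <persName>Ephrem</persName> of
     <placeName>Nisibis</placeName>.</p>
  <p><persName>Jacob</persName> later taught at
     <placeName>Edessa</placeName>.</p>
</text>
"""


def find_tagged(xml_source, tag):
    """Return the text of every element with the given tag, in document order."""
    root = ET.fromstring(xml_source.strip())
    return [element.text for element in root.iter(tag)]


people = find_tagged(SAMPLE, "persName")
places = find_tagged(SAMPLE, "placeName")
print(people)
print(places)
```

The point is only that once the text carries even very light markup, pulling out every person or place is a few lines with tools that already exist. A real XQuery processor would presumably allow much richer queries over a whole collection of documents.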

The second thing that struck me was how little emphasis there was on URIs or any sort of GUID, including DOIs.  I was a bit surprised by that.  I asked a question about URIs and it seemed to go right past the speaker.  I suppose this is a function of the fact that the documents on which they are working exist in a local database and there isn't a requirement at this point for them to link to records elsewhere.  But it seems that they will have to face that issue at some point.

The final thing that seemed really odd to me was the whole identification as "digital humanists". I have to say that I don't exactly understand what that means, but after looking at things like https://www.hastac.org/ and https://my.vanderbilt.edu/digitalhumanities/ I'm getting a better idea. I think one reason this puzzles me is that the Linked Data world (with which I'm more familiar) is fixated on connecting all information of all kinds. Linked Data advocates in the biodiversity informatics community therefore aren't interested in calling themselves "Digital Museum Curators", "Digital Scientists", or something like that, because they consider their interests to include agents, literature references, and geography in addition to collections.

I think that some of the differences in approach I've seen here are related to a difference in scale: biodiversity informatics involves assembling many small individual records that are scattered in many places, whereas digital humanists mark up larger works that are localized in a few places. In any case, I'm impressed with what the Digital Humanists at Vanderbilt have accomplished and I'm looking forward to learning more from them.
