Trevor Owens

Linked Open Crowdsourced Description: A Sketch

Systems and tools for crowdsourcing transcription and description proliferate, and libraries and archives are getting increasingly serious about collectively figuring out how to let others describe and transcribe their stuff. At the same time, there continues to be a lot of interest in the potential for linked open data in libraries, archives, and museums. I thought I would take a few minutes to try to sketch out a way that I think these things could fit together.

I’ve been increasingly thinking it would be really neat if we could come up with some lightweight conventions for anyone anywhere to describe an object that lives somewhere else. At this point, things like the Open Annotation Collaboration presumably provide a robust grammar to actually get into markup and whatnot if folks wanted to really blow it out, but I think there are likely some very basic things we could do to kick off an ecosystem for letting anyone mint URLs that carry descriptive metadata about objects that live at other URLs.

My hope in this is that instead of everyone building or standing up their own systems, we could have a few different hubs and places across the web where people describe, transcribe and annotate, and that work could then be woven back into the metadata records associated with digital objects at their home institutions. In some ways this is really the basic set of promises and aspirations that Linked Open Data is intended to help with. Here I am just trying to think through how this might fit together in a potential use case.

A Linked Open Crowdsourcing Description Thought Experiment

With a few tweaks, we are actually very close to being able to connect the dots between one setting in which people further describe archival materials (in this case, to create bibliographies) and a repository that could take that enhanced metadata back. I’ll talk through how a connection might be forged between Zotero and one online collection, but I think the principles here are generic enough that if folks just agreed on some conventions we could do some really cool stuff.

The Clara Barton papers are digitized in full, but in keeping with archival practice, they are not described at the item level. In this case, the collection has folder level metadata. So since it’s items all the way down in a sense, the folders are the items.

As a result, you get things that look like this, Clara Barton Papers: Miscellany, 1856-1957; Barton (Clara) Memorial Association; Resolutions and statements, 1916, undated. This is great. I am always thrilled to see folks step back from feeling like they need item level description to make materials available on the web. Describe to whatever level you can and make it accessible.

With that said, I’m sure there are people who are willing to pitch in and make some item level metadata for the stuff in that folder. Beyond that, if a scholar is ever going to actually use something in that folder and cite it in a book or a paper, they are going to have to create item level description. Wouldn’t it be great if there was a generic way for the item level description that happens as a matter of course, whenever someone puts a footnote in an article or a book, to be leveraged and reused?

Scholars DIY Item Level Description in Zotero

Every day, a bunch of scholars key in item level description for materials in reference managers like Zotero. To that end, I’ll briefly talk through what would happen if someone wants to capture and cite something from the Clara Barton Papers in Zotero. Because there is some basic embedded metadata in that page, if you click the little icon by the URL you get that initial data, which you can then edit. You can also then directly save the page images into your personal Zotero library.

So you can see what that would look like below. I started out by saving the metadata that was there, logged the URL at which the actual item starts inside the folder, changed it from a web page to a document, and keyed in the title and the author of the document. I also saved, as attachments to my Zotero item, the 2 images (out of the 19 in the folder) that actually make up the item I am working with.

describe-in-zotero — Creating an item level record for materials in the Clara Barton papers folder in Zotero for the purpose of citing it.

So, now I can go ahead and drag and drop myself a citation. Here is what that looks like. This is what I could put in my paper or wherever.

Logan, Mrs. John. A. “Affidavit of Mrs. John A. Logan,” 1916. Miscellany, 1856-1957; Barton (Clara) Memorial Association; Resolutions and statements, 1916. Clara Barton Papers. http://www.loc.gov/resource/mss11973.116_0449_0467/#seq-3.

Now, wouldn’t it be great if there was a way for Zotero to ping, or do some kind of trackback to, the repository to notify folks that a description of this resource potentially now exists in Zotero? That is, what if I could ask Zotero’s API to see every public item they have that is associated with a loc.gov URL, and in particular every item that someone actually went to the trouble to tweak and revise, as opposed to the ones that just carry the default information that came out to begin with?
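To make that filtering idea a bit more concrete, here is a minimal sketch in Python. It does not call Zotero at all; it just works over a list of item records that some query is assumed to have already returned. The field names (url, dateAdded, dateModified), the item keys, and the second and third URLs are illustrative assumptions, and so is the heuristic that a record modified after it was added has been hand-edited.

```python
from urllib.parse import urlparse


def describes_host(item, host):
    """True if the item's URL points at `host` or one of its subdomains."""
    netloc = urlparse(item.get("url", "")).netloc.lower()
    return netloc == host or netloc.endswith("." + host)


def was_revised(item):
    """Assumed heuristic: modified after being added means someone edited it.

    Timestamps are ISO 8601 strings in one format, so string comparison
    orders them correctly.
    """
    return item.get("dateModified", "") > item.get("dateAdded", "")


def candidate_descriptions(items, host):
    """Items that describe something at `host` and look hand-edited."""
    return [i for i in items if describes_host(i, host) and was_revised(i)]


# Hypothetical records, for illustration only.
items = [
    {"key": "AAAA1111",
     "url": "http://www.loc.gov/resource/mss11973.116_0449_0467/#seq-3",
     "dateAdded": "2014-09-05T14:00:00Z",
     "dateModified": "2014-09-05T14:20:00Z"},
    {"key": "BBBB2222",  # saved with default metadata, never touched
     "url": "http://www.loc.gov/item/example/",
     "dateAdded": "2014-09-05T15:00:00Z",
     "dateModified": "2014-09-05T15:00:00Z"},
    {"key": "CCCC3333",  # edited, but about some other site
     "url": "http://example.org/page",
     "dateAdded": "2014-09-05T16:00:00Z",
     "dateModified": "2014-09-05T16:30:00Z"},
]

print([i["key"] for i in candidate_descriptions(items, "loc.gov")])
```

A real service would probably want a better signal for "someone improved this" than a timestamp comparison, but even a crude filter like this would cut the pile down to records worth a human look.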

Connecting Back from the Zotero instance of the Item

At this point, I added in descriptive information, and because I have the two actual image files, I also know that the information I have refers directly to mss/mss11973/116/0400/0451.jp2 and mss/mss11973/116/0400/0452.jp2. So, from this data we have enough information to actually create a sub-record for 2 of the 19 images in that folder.

Because I have a public Zotero library, anyone can actually go and see the item level record I created for those 2 images from the Clara Barton Papers. You can find it here https://www.zotero.org/tjowens/items/itemKey/IHKBH5WQ/. In this case, the URL tells you a lot about what this is right off the bat. It’s an item record from Zotero.org user tjowens and it has a persistent arbitrary item ID in tjowens’ library (IHKBH5WQ). That page could track back to the URL it is associated with, or do something even simpler than that: carry a token in the link that a repository owner could look for in their HTTP referrer logs as an indicator that there is some data out there at some URL that describes data at a URL that the repository has minted. So for instance, just stick ?=DescribesThis or something on the URL, like http://www.loc.gov/resource/mss11973.116_0449_0467/?=DescribesThis#seq-3 (the token has to come before the #seq-3 fragment, since browsers do not send the fragment to the server). Then tell folks who run online collections to go and check out their referrer traffic for any incoming links that have ?=DescribesThis in them. From there, it would be relatively trivial to review the incoming links from logs and decide if any of them were worth pulling over as added descriptive metadata.
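Here is a rough sketch of the token idea using only Python's standard library. The token name DescribesThis is just the placeholder from this post, not an agreed convention, and the helper names are made up for illustration. The one detail the sketch tries to get right is placement: the token only reaches the repository's server if it sits in the query string ahead of any #fragment, so that it shows up in the requested URL in the access log, next to the referrer that says where the describing page lives.

```python
from urllib.parse import urlsplit, urlunsplit

TOKEN = "DescribesThis"  # placeholder marker, not an established convention


def add_token(url):
    """Append the marker to the query string, keeping any #fragment last."""
    parts = urlsplit(url)
    query = parts.query + ("&" if parts.query else "") + TOKEN
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


def has_token(requested_url):
    """True if the marker appears in the query string.

    Accepts both the bare form (?DescribesThis) and the form with a
    leading equals sign (?=DescribesThis).
    """
    query = urlsplit(requested_url).query
    return TOKEN in query.replace("=", "&").split("&")


link = add_token("http://www.loc.gov/resource/mss11973.116_0449_0467/#seq-3")
print(link)
print(has_token(link))

# A marker written after the fragment is part of the fragment, so a
# server-side check never sees it.
print(has_token("http://www.loc.gov/resource/mss11973.116_0449_0467/#seq-3?=DescribesThis"))
```

A repository could run something like has_token over the requested URLs in its logs and then look at the referrer on each matching entry to find the page that carries the description.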

zotero-item-page — Here is an image of the Item page created for the record I made in Zotero

Aside from just having this nice looking page about my item, the Zotero API means that it’s trivial to get the data from this marked up in a number of different formats. For instance, you can find the JSON of this metadata at https://api.zotero.org/users/358/items/IHKBH5WQ?format=json

zoter-json — The JSON from the Zotero API for the item I created there. It’s easy enough to parse that you can pick out the added info I have in there, like the title and author.
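To give a sense of how little parsing is involved, here is a small Python sketch. The JSON below is a trimmed, hand-written stand-in that uses the values from the citation above; it is not the actual API response, and the exact shape (a top-level key plus a data object holding title, creators, date and url) is an assumption about the format.

```python
import json

# Hand-written stand-in for an item record; the structure is assumed.
sample = """
{
  "key": "IHKBH5WQ",
  "data": {
    "itemType": "document",
    "title": "Affidavit of Mrs. John A. Logan",
    "creators": [
      {"creatorType": "author", "firstName": "Mrs. John A.", "lastName": "Logan"}
    ],
    "date": "1916",
    "url": "http://www.loc.gov/resource/mss11973.116_0449_0467/#seq-3"
  }
}
"""


def summarize(raw):
    """Pick out the fields a repository would care about."""
    record = json.loads(raw)
    data = record.get("data", record)
    authors = [
        ", ".join(p for p in (c.get("lastName"), c.get("firstName")) if p)
        for c in data.get("creators", [])
        if c.get("creatorType") == "author"
    ]
    return {
        "key": record.get("key"),
        "title": data.get("title"),
        "authors": authors,
        "date": data.get("date"),
        "url": data.get("url"),
    }


print(summarize(sample))
```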

So, if someone back at the repository liked what they saw here, they could just decide to save a copy of this record, and then ingest it or integrate it with the existing records in their index through an ETL process.

What I find particularly cool about this on a technical level is that it becomes trivial to retain the provenance of the record. That is, an organization could say “description according to Zotero user tjowens” and link out to where it shows up in my Zotero library. This has the triple value of 1) giving credit where credit is due, 2) offering a statement of caveat emptor regarding the accuracy of the record (that is, it’s not minted in the authority of the institution but is instead the description of a particular individual), and 3) providing a link out to someone’s Zotero library that could likely enable discovery of related materials from other institutions.
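As a sketch of what keeping that provenance might look like at ingest time, the Python below wraps a harvested description in a sub-record for the two image files. The record layout, the field names, and the authoritative flag are all made up for illustration; a real repository would map this onto whatever schema its index already uses.

```python
def with_provenance(description, zotero_user, item_url, image_paths):
    """Wrap a harvested description in a sub-record that says where it came from.

    The layout here is hypothetical, not any repository's actual schema.
    """
    return {
        "describes": list(image_paths),
        "title": description["title"],
        "authors": list(description["authors"]),
        "provenance": {
            # 1) credit where credit is due
            "statement": f"description according to Zotero user {zotero_user}",
            # 3) a link back out to the describer's library
            "source": item_url,
            # 2) caveat emptor: not minted in the authority of the institution
            "authoritative": False,
        },
    }


sub_record = with_provenance(
    {"title": "Affidavit of Mrs. John A. Logan",
     "authors": ["Logan, Mrs. John A."]},
    zotero_user="tjowens",
    item_url="https://www.zotero.org/tjowens/items/itemKey/IHKBH5WQ/",
    image_paths=[
        "mss/mss11973/116/0400/0451.jp2",
        "mss/mss11973/116/0400/0452.jp2",
    ],
)

print(sub_record["provenance"]["statement"])
```

Keeping the source link and the non-authoritative flag alongside the description means the display layer can always show who said what, rather than blending outside description into the institution's own record.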

Linked Open Crowdsourced Description

The point of that story isn’t so much about Zotero and the Clara Barton Papers, but more about how, with a little bit of work, those two platforms could better link to each other in a way that lets the repository benefit from the description of its materials that happens elsewhere. If a repo could just get a sense of which of its materials people are describing, it could start playing around with ways to link to, harvest, and integrate that metadata. From there, organizations could likely move away from building their own platforms to enable users to describe or transcribe materials and instead start promoting a range of third party platforms that simply enable users to create and mint descriptions of materials.

Published by

tjowens

Responses

Ed Summers

September 6, 2014 at 4:09 pm

Thanks for the interesting ideas Trevor. It does feel like there is a lot of untapped potential here, specifically in Zotero, and it’s super to see IMLS support Tiltfactor to explore what a consortium for crowdsourcing might look like.

I reckon we (the cultural heritage sector) often think of crowdsourcing as happening in platforms, and it’s easy to forget that the Web is itself a crowdsourcing platform that allows anyone to say what they want about anything, with a link. But like you point out, it’s often difficult to hear what people are saying about your stuff.

Web server logs often have hints in the form of referrals. So when I follow a link on a page at http://www.trevorowens.org to inkdroid.org I see something like this in my server log:

173.79.152.48 - - [06/Sep/2014:19:15:30 +0000] "GET http://inkdroid.org/ HTTP/1.1" 200 509 "http://www.trevorowens.org/2008/11/using-zotero-as-a-personal-library-catalog/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"

This same referral information is often tallied up in web metrics reports that help website owners determine who is driving traffic to them, and what content is being used the most. But we don’t often think about how we can pull that information back into our own website — to surface the conversations people are having elsewhere about the things we publish.
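As a rough sketch of pulling the referral out of a line like the one above, the Python below uses a regular expression for the widely used "combined" access log layout. It assumes every quoted field is closed and that fields contain no embedded double quotes; real logs can be messier, so treat the pattern as a starting point.

```python
import re

# Pattern for the common "combined" access log layout (an assumption about
# how the server is configured to log).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = (
    '173.79.152.48 - - [06/Sep/2014:19:15:30 +0000] '
    '"GET http://inkdroid.org/ HTTP/1.1" 200 509 '
    '"http://www.trevorowens.org/2008/11/using-zotero-as-a-personal-library-catalog/" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"'
)


def parse(log_line):
    """Return the requested target and the referrer, or None if no match."""
    m = COMBINED.match(log_line)
    if m is None:
        return None
    parts = m.group("request").split()
    return {
        "target": parts[1] if len(parts) >= 2 else "",
        "referrer": m.group("referrer"),
        "status": int(m.group("status")),
    }


hit = parse(line)
print(hit["referrer"], "->", hit["target"])
```

Grouping the parsed hits by target would give, for each thing a site publishes, the list of pages elsewhere that sent a visitor to it.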

Part of the problem is that if you run a popular website there are a lot of people saying things about your stuff; and some of those people are just spammers looking for you to link back to them, so they can get ranked better in Google. This is the push model, where annotations are pushed to what is being annotated. As you know, trackbacks are a form of this, and sites like arXiv have been using them; it would be interesting to hear how useful they have been.

I think your instinct about Zotero is a good one. As you say, there are a lot of people (in theory, I haven’t seen it myself) citing things on the Web in Zotero. Instead of requiring Zotero to push me a notification when someone cites something on my website, and for me to listen for them, what if I could ask Zotero (api.zotero.org?) what public citations are pointing at my website, or any website? Perhaps it’s as simple as a search API call, or maybe it’s a dedicated call. This way I and other interested parties could choose to pull those results into our own systems, and choose to integrate them.

I’m thinking of something like what Pinterest do. I can ask Pinterest (once I’ve logged in) for all the pins that pulled content from loc.gov. Or similarly I can ask Wikipedia for all the articles that link to loc.gov. It’s not as pretty as the Pinterest page, but there’s a way to do it. What if Zotero offered a similar page to get people thinking?

I think it’s tempting to work on the technical solution first, and then expect people to start using it — but showing people how their stuff is being used in popular platforms like Zotero is an important first step to getting people to understand why it’s important to work on a technical solution. Hopefully this can be a big component to the Open Annotation work and the Crowdsourcing Consortium for Libraries and Archives.

Oh, and before I forget, on the technical side it might be worth tuning into the work that the indieweb community are doing around webmention and comment. This is the same community of people that worked on microformats, and are very grassroots and web centric (or is that de-centric) in their approach (which I like a lot).

Lise Summers

September 8, 2014 at 5:17 am

A different Summers, and from a different perspective 🙂

This is a fascinating piece, not just because of the way that it explores potential ways of linking data, but also because it exposes some of the different ways of thinking about archives description which underscores the need for us to develop a common vocabulary long before we get to linking data. It also exposes the different ways in which we might, even with a common library like Zotero, describe things, which leads to a need for standardised and standards based descriptive practices.

You talk about the need, or, rather, the desire, for item level description. As an Australian, I go, “But we already provide that?”, because our item level is your folder level. We don’t do fonds (although that is implicit in the intellectual arrangements that we develop) and we concentrate on the way in which records are aggregated at series, and then item (file/folder/dossier) level. In my archives, if letters were kept as single entities, rather than placed on a file, then we might list them as items, much as we would list individual photographs in a box, but also a photo album as one item. I suspect that this is true for most of the larger institutions, whereas a page by page, letter by letter description may be more appropriate for manuscript collections and small private archives.

In my reference work, I save the item (file) id, title, creating agency and the archival location data (archive location, series and accession. If using archives internationally, then I would have the fonds details, too). If I save information about individual letters, I may create individual references or I might put the information in the notes field. My way of saving information is therefore slightly different to yours, and the metadata that is collected will be different too. Not only that, but I don’t record the details of every letter on a file – just the ones of interest to me.

So, the description of the contents of an item may be in several different styles, and is possibly incomplete, and in considering if we should choose to include this metadata in our listing, then the question must then be asked if half a loaf is indeed better than none. We also need to think about whether or not we have the resources to match the metadata created with the description standards used by the archives institution. Coming from a resources perspective, I like Ed’s idea of pulling descriptions rather than having a push notification.

However, you have already said that you digitised the letters. In that case, provided the item (file/folder/dossier) metadata is correct, could we link your images to our item description, and let individual researchers do what they want with the contents? (At this point, copyright and the international differences in use and licencing also rear their ugly heads, so it’s definitely time to stop. And I haven’t even gone into whether or not this will privilege certain collections or information over others.)

As I said at the beginning, this is a fascinating idea. I can see some advantages and a not inconsiderable bunch of issues in the model you propose, but this is true for any form of crowdsourced information. However, I think it important that we engage in these discussions, and I thank you for starting this conversation.

Rebe Taylor

September 9, 2014 at 2:55 am

Thanks Trevor for such an interesting reflection, and thanks too to the two Summers for their excellent comments.

I thought it might be helpful to add how we dealt with the question of citing records in the web resource Stories in Stone: an annotated history and guide to the collections and papers of Ernest Westlake (1855-1922), which was created by me with Michael Jones and Gavan McCarthy and published by the eScholarship Research Centre (ESRC), University of Melbourne, with the Pitt Rivers Museum, Oxford (PRM) and the Oxford University Museum of Natural History (OUMNH).

Like the Clara Barton Papers, Stories in Stone includes the entire papers of English amateur scientist Ernest Westlake held in the PRM and the OUMNH.

Also like the Barton papers, the archive is described down to ‘item’ level, which may mean an individual photo, or it might mean a folder of letters that comprises many images.

While the metadata for many of the items in Stories in Stone is pretty detailed, the image viewer, in its current form, does not transcribe each page within the folders. The image viewer does, however, include a ‘cite this image’ button for each image. A page of a letter might provide a citation such as this one:

Image 42, WEST00017, Correspondence to Edward Burnett Tylor: August 1893 – October 1898, Series 2, Pitt Rivers Museum: Manuscript Collections, Westlake Papers, Folder 1, Folios 1-30. Accessed online via: Stories in Stone: an annotated history and guide to the collections and papers of Ernest Westlake (1855-1922).
http://www.westlakehistory.info/viewer/WEST/item/WEST00017/42

The ‘cite this image’ button was designed by the ESRC at the request of the Pitt Rivers Museum in order that scholars would cite their records ‘properly’ (and not, as Trevor puts it, ‘have to create’ them). But the ESRC also ensured that the citation included the name of the web resource and our own control codes and URL, and not only the folder and folio numbers and name of the collection and repository. This was not only to include reference to our scholarly publication, but also to follow the wider protocol that we hold to be so important, ‘cite what you see’: if you are not sitting at the table in the archive, then don’t cite the digitised record as if you are, but cite where you found it online, and when.
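For illustration, a citation like the one above could be assembled from a handful of fields along these lines. This is only a guess at the general shape in Python, not the ESRC's actual implementation; the function name, the parameter names, and the URL pattern (base address, then control code, then image number) are assumptions inferred from the single example.

```python
def cite_image(image_no, item_id, title, series, repository,
               collection, location, resource, base_url):
    """Build a 'cite what you see' style citation for one page image.

    The field order mirrors the single example citation; the URL pattern
    is inferred from that same example.
    """
    url = f"{base_url}/{item_id}/{image_no}"
    return (
        f"Image {image_no}, {item_id}, {title}, {series}, {repository}, "
        f"{collection}, {location}. Accessed online via: {resource}.\n{url}"
    )


citation = cite_image(
    image_no=42,
    item_id="WEST00017",
    title="Correspondence to Edward Burnett Tylor: August 1893 – October 1898",
    series="Series 2",
    repository="Pitt Rivers Museum: Manuscript Collections",
    collection="Westlake Papers",
    location="Folder 1, Folios 1-30",
    resource=("Stories in Stone: an annotated history and guide to the "
              "collections and papers of Ernest Westlake (1855-1922)"),
    base_url="http://www.westlakehistory.info/viewer/WEST/item",
)

print(citation)
```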

Of course, the ‘cite this image button’ helps to ensure that people cite things ‘properly’ in accordance with the creators of the web resource, and with the keepers of the records, but it does not do what I agree with Trevor would be very interesting: give the repository ‘the sense of what people are describing of its materials’. Such an ability would not only allow for more linked data, it would give repositories and creators of archive guides more understanding of the multiplicity of ways, and even reasons, that records are accessed and referenced. Such information might help strengthen proposals for further digitisation projects, and influence how they are designed.

Antoine Isaac

September 9, 2014 at 4:21 am

Great post.
It’s important indeed to get the basics right: give proper web identification to objects, and make sure the description can be accessed.
Otherwise this seems to go in the direction taken by Pundit and the DM2E project:
https://thepund.it/
http://dm2e.eu/
especially on the technical side. Of course the ‘business’ side (motivations for transcription, scholarly annotation, etc) could differ.

One important point is the linking to relevant authorities. For example, Annotorious has recently been updated with a plug-in that allows users to pick concepts from SKOS vocabularies:
https://github.com/ait-ngcms/annotorious-openskos-demo/
The recent British Museum / UCL crowdsourcing experiment also re-uses the Pleiades Vocabulary for ancient places:
http://crowdsourced.micropasts.org/app/phototaggingHorsfield/

…Three’s a Crowdsource | Librarian Squared

September 10, 2014 at 7:08 pm

[…] Owens, Trevor. “Linked Open Crowdsourced Description: A Sketch.” Trevor Owens. Accessed September 10, 2014. http://www.trevorowens.org/2014/09/linked-open-crowdsourced-description-a-sketch/ […]
