Crowdsourcing Cultural Heritage: The Objectives Are Upside Down

Still not the droid… By Stéfan: Our crowdsourcing conversation is upside down, much like how Calculon is holding these stormtroopers upside down.

Some fantastic work is going on in crowdsourcing the transcription of cultural heritage collections. After some recent thinking and conversation on these projects, I want to push a point about this work more forcefully. This is the same line of thinking I started nearly a year ago in Meaningification and Crowdscafolding: Forget Badges. I’ve come to believe that conversations about the objective of this work, as broadly discussed, are exactly upside down. Transcripts and other data are great, but when done right, crowdsourcing projects are the best way of accomplishing the entire point of putting collections online. I think a lot of the people who work on these projects think this way, but we are still in a situation where we need to justify this work by the product instead of justifying it by the process.

Getting transcriptions, or for that matter getting any kind of data or work, is a by-product of something that is actually far more amazing than being able to better search through a collection. The process of crowdsourcing projects fulfills the mission of digital collections better than the resulting searches. That is, when someone sits down to transcribe a document they are actually better fulfilling the mission of the cultural heritage organization than anyone who simply stops by to flip through the pages.

Why are we putting cultural heritage collections online again?

There is a range of reasons that we put digital collections online. With that said, the single most important reason to do so is to make history accessible and invite students, researchers, teachers, and anyone in the public to explore and connect with our past. Historians, librarians, archivists, and curators who share digital collections and exhibits can measure their success toward this goal in how people use, reuse, explore and understand these objects.

In general, crowdsourcing transcription is first and foremost described as a means by which we can get better data to help enable the kinds of use and reuse that we want people to make of our collections. In this respect, the general idea of crowdsourcing is described as an instrument for getting data that we can use to make collections more accessible. Don’t get me wrong, crowdsourcing does this. With that said, it does so much more than this. In the process of developing these crowdsourcing projects we have stumbled into something far more exciting than speeding up or lowering the costs of document transcription. Far better than being an instrument for generating data that we can use to get our collections more used, it is actually the single greatest advancement in getting people using and interacting with our collections. A few examples will help illustrate this.

Increased Use, Deeper Use, Crowdsourcing Civil War Diaries

Last year, the University of Iowa Libraries crowdsourced the transcription of a set of Civil War diaries. I had the distinct privilege of interviewing Nicole Saylor, the head of Digital Library Services, about the project. From any perspective the project was very successful. They got great transcriptions and they ended up attracting more donors to support their work.

The project also succeeded in dramatically increasing site traffic. As Nicole explained, “On June 9, 2011, we went from about 1000 daily hits to our digital library on a really good day to more than 70,000.” As great as all this is, as far as I’m concerned, the most valuable thing that happened is that when people come to transcribe the diaries they engage with the objects more deeply than they would have if transcription were not an option. Consider this quote from Nicole explaining how one particular transcriptionist interacted with the collection. It is worth quoting her at length:

The transcriptionists actually follow the story told in these manuscripts and often become invested in the story or motivated by the thought of furthering research by making these written texts accessible. One of our most engaged transcribers, a man from the north of England, has written us to say that the people in the diaries have become almost an extended part of his family. He gets caught up in their lives, and even mourns their deaths. He has enlisted one of his friends, who has a PhD in military history, to look for errors in the transcriptions already submitted. “You can do it when you want as long as you want, and you are, literally, making history,” he once wrote us. That kind of patron passion for a manuscript collection is a dream. Of the user feedback we’ve received, a few of my other favorites are: “This is one of the COOLEST and most historically interesting things I have seen since I first saw a dinosaur fossil and realized how big they actually were.” “I got hooked and did about 20. It’s getting easier the longer I transcribe for him because I’m understanding his handwriting and syntax better.” “Best thing ever. Will be my new guilty pleasure. That I don’t even need to feel that guilty about.”

The transcriptions are great, they make the content more accessible, but as Nicole explains, “The connections we’ve made with users and their sustained interest in the collection is the most exciting and gratifying part.” This is exactly as it should be! The invitation of crowdsourcing and the event of the project are the most valuable and precious user experiences that a cultural heritage institution can offer its users. It is essential that the project offer meaningful work. These projects invite the public to leave a mark and help enhance the collections. With that said, if the goal is to get people to engage with collections and engage deeply with the past, then the transcripts are actually a fantastic by-product that is created by offering meaningful activities for the public to engage in.

Rationing out Transcription

Part of what prompted this post is a talk that Ben Brumfield gave on crowdsourcing transcription at the recent Institute of Museum and Library Services WebWise conference. It was a great talk, and when they get around to posting it online you should all go watch it. There was one particular moment in the talk that I thought was essential for this discussion.

At one point in a transcription project he noticed that one of his most valuable power users was slowing down on their transcriptions. The user had started to cut back significantly on the time they spent transcribing this particular set of manuscripts. Ben reached out to the user and asked about it. Interestingly, the user responded to explain that they had noticed that there weren’t as many scanned documents showing up that required transcription. For this user, the 2-3 hours they spent each day working on transcriptions was such an important experience, such an important part of their day, that they had decided to cut back and deny themselves some of that experience. The user needed to ration out that experience. It was such an important part of their day that they needed to make sure that it lasted.

At its best, crowdsourcing is not about getting someone to do work for you; it is about offering your users the opportunity to participate in public memory.

Crowdsourcing Is Better at Digital Collections than Displaying Digital Collections

What crowdsourcing does, that most digital collection platforms fail to do, is offer an opportunity for someone to do something more than consume information. When done well, crowdsourcing offers us an opportunity to provide meaningful ways for individuals to engage with and contribute to public memory. Far from being an instrument which enables us to ultimately better deliver content to end users, crowdsourcing is the best way to actually engage our users in the fundamental reason that these digital collections exist in the first place.

Meaningful Activity is the Apex of User Experience for Cultural Heritage Collections

When we adopt this mindset, the money spent on crowdsourcing projects in terms of designing and building systems, in terms of staff time to manage, etc. is not something that can be compared to the costs of having someone transcribe documents on Mechanical Turk. Think about it this way: the transcription of those documents is actually a precious resource, a precious bit of activity that would mean the world to someone. It isn’t that any task or obstacle for users to take on will do. For example, if you asked users to transcribe documents that could easily be OCRed, the whole thing would lose its meaning and purpose. It isn’t about Sisyphean tasks; it is about providing meaningful ways for the public to enhance collections while more deeply engaging with and exploring them.

Just as Ben’s user rationed out the transcription of those documents, we might actually think about crowdsourcing experiences as one of the most precious things we can offer our users. Instead of simply offering them the ability to browse or poke around in digital collections, we can invite them to participate. We are in a position to let our users engage in a personal way that is only for them at that moment. Instead of browsing through a collection they literally become a part of our historical record.

The Important Difference between Exploitation-ware and Software for the Soul

Slide from Ruling the World

As a bit of a coda, what is tricky here is that there is (strangely) an important but somewhat subtle line between exploiting people and giving people the most valuable kinds of experience that we can offer for digital collections. The trick is that gamification is (for the most part) bullshit. You can trick people into doing things with gimmicks, but when you do so you frequently betray their trust and can ruin the innately enjoyable nature of being a part of something that matters to you, in our case, the way that users could deeply interact with and explore the past via your online collections. What sucks about what has happened in the idea of gamification is that it is about the least interesting parts of games. It’s about leaderboards and badges. As Sebastian Deterding has explained, many times and many ways, the best parts of games, the things that we should actually try to emulate in a gamification that attempts to be more than pointsification or exploitationware, are the parts of games that let us participate in something bigger. They are the parts of games that invite us to playfully take on big challenges. Be wary of anyone who tries to suggest we should trick people or entice them into this work. We can offer users an opportunity to deeply explore, connect with and contribute to public memory, and we can’t let anything get in the way of that.

Explore and Share Cultural Heritage Collections with Notes for WebWise Talk

This is just a quick post to share the slides and links from the talk I am giving at WebWise today.

The talk starts by explaining the idea behind the tool: how making it easy to build interfaces to cultural heritage collections can help librarians, archivists, curators, and historians both better understand relationships between objects in a collection and communicate those ideas to audiences. After explaining the kinds of interfaces you can make, I walk through a detailed example of what one of these views can do by looking at a prototype interface to the Samuel H. Kress Collection History Database, created by an archivist at the National Gallery of Art.

I wanted to make sure that everyone had links to all the views I mention. So here are all the links.

NDIIPP Partners Collections Interface: (On Viewshare) (Embedded on NDIIPP’s site): This is an interface to a collection of collections. It acts as a kind of directory for digital collections and it was created from a spreadsheet.

Fulton Street Trade Card View: (On Viewshare)
The Fulton Street Trade Card collection features 245 late 19th and early 20th century illustrated trade cards from merchants along the Fulton Street retail thoroughfare in Brooklyn, NY. Using a Viewshare pie chart view, the user is able to run queries and faceted search on the cards’ metadata in ways a simple catalog or scroll would not allow. Using the facets you can limit the chart to a certain element, such as business type, and then get numbers and percentages about the subjects, format, or other elements of the cards’ content.
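
For readers curious about what "limit by a facet, then get numbers and percentages" amounts to, here is a minimal sketch in Python. It is not how Viewshare itself is implemented, and the records, field names, and the facet_breakdown helper are all invented for illustration; it only mirrors the idea of filtering on one element and then tallying another.

```python
from collections import Counter

# Hypothetical trade card records; the fields and values are made up
# for illustration and are not taken from the actual collection.
cards = [
    {"business_type": "Clothing", "subject": "Children"},
    {"business_type": "Clothing", "subject": "Children"},
    {"business_type": "Clothing", "subject": "Animals"},
    {"business_type": "Clothing", "subject": "Flowers"},
    {"business_type": "Grocer", "subject": "Animals"},
    {"business_type": "Grocer", "subject": "Flowers"},
]


def facet_breakdown(records, filters, chart_field):
    """Keep records matching every facet filter, then return
    {value: (count, percentage)} for the field being charted."""
    selected = [
        r for r in records
        if all(r.get(field) == value for field, value in filters.items())
    ]
    counts = Counter(r[chart_field] for r in selected)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        value: (n, round(100 * n / total, 1))
        for value, n in counts.items()
    }


# Limit the "chart" to clothing merchants, then break down by subject.
for subject, (n, pct) in facet_breakdown(
    cards, {"business_type": "Clothing"}, "subject"
).items():
    print(f"{subject}: {n} cards ({pct}%)")
```

Each pie slice in a view like this corresponds to one entry in the returned dictionary, and changing the facet selection simply changes which records are counted.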

History of Fairfax County in Postcards: (On Viewshare): A very simple view from a simple spreadsheet. If you like, you can find the spreadsheet this is based on in the Viewshare documentation and work from it to get a sense of how the tool works.

Cason Monk-Metcalf Funeral Directors View: (On Viewshare): (My View on Viewshare): (Embedded on East Texas Digital Archives & Collections Site) This is one of the most interesting datasets uploaded to Viewshare. It is a set of data transcribed from historic funeral records.

Samuel H. Kress Collection History Database Prototype View: National Gallery of Art (On Viewshare) This view allows users to explore the relationships between purchase information for a work of art and other aspects of the object, including its current location. This data comes from the Samuel H. Kress Collection History and Conservation Database. The relational database documents the art collection’s acquisition, dispersal, and conservation over time and was created by the National Gallery of Art’s Gallery Archives with funding from the Samuel H. Kress Foundation. The data shared here is not complete. Viewshare data and views are intended only for preliminary demonstration of the data and should not be cited in research.