Some fantastic work is going on in crowdsourcing the transcription of cultural heritage collections. After some recent thinking and conversation on these projects I want to more strongly and forcefully push a point about this work. This is the same line of thinking I started nearly a year ago in Meaningification and Crowdscafolding: Forget Badges. I’ve come to believe that conversations about the objective of this work, as broadly discussed, are exactly upside down. Transcripts and other data are great, but when done right, crowdsourcing projects are the best way of accomplishing the entire point of putting collections online. I think a lot of the people who work on these projects think this way but we are still in a situation where we need to justify this work by the product instead of justifying it by the process.
Getting transcriptions, or for that matter getting any kind of data or work, is a by-product of something that is actually far more amazing than being able to better search through a collection. The process of crowdsourcing projects fulfills the mission of digital collections better than the resulting searches. That is, when someone sits down to transcribe a document they are actually better fulfilling the mission of the cultural heritage organization than anyone who simply stops by to flip through the pages.
Why are we putting cultural heritage collections online again?
There are a range of reasons that we put digital collections online. With that said the single most important reason to do so is to make history accessible and invite students, researchers, teachers, and anyone in the public to explore and connect with our past. Historians, Librarians, Archivists, and Curators who share digital collections and exhibits can measure their success toward this goal in how people use, reuse, explore and understand these objects.
In general, crowdsourcing transcription is first and foremost described as a means by which we can get better data to help better enable the kinds of use and reuse that we want people to make of our collections. In this respect, the general idea of crowdsourcing is described as an instrument for getting data that we can use to make collections more accessible. Don’t get me wrong, crowdsourcing does this. With that said, it does so much more than this. In the process of developing these crowdsourcing projects we have stumbled into something far more exciting than speeding up or lowering the costs of document transcription. Far more than an instrument for generating data that gets our collections used more, crowdsourcing is actually the single greatest advancement in getting people using and interacting with our collections. A few examples will help illustrate this.
Increased Use, Deeper Use, Crowdsourcing Civil War Diaries
Last year, the University of Iowa Libraries crowdsourced the transcription of a set of Civil War diaries. I had the distinct privilege of interviewing Nicole Saylor, the head of Digital Library Services, about the project. From any perspective the project was very successful. They got great transcriptions and they ended up attracting more donors to support their work.
The project also succeeded in dramatically increasing site traffic. As Nicole explained, “On June 9, 2011, we went from about 1000 daily hits to our digital library on a really good day to more than 70,000.” As great as all this is, as far as I’m concerned, the most valuable thing that happened is that when people come to transcribe the diaries they engage with the objects more deeply than they would have if transcription were not an option. Consider this quote from Nicole explaining how one particular transcriptionist interacted with the collection. It is worth quoting her at length:
The transcriptionists actually follow the story told in these manuscripts and often become invested in the story or motivated by the thought of furthering research by making these written texts accessible. One of our most engaged transcribers, a man from the north of England, has written us to say that the people in the diaries have become almost an extended part of his family. He gets caught up in their lives, and even mourns their deaths. He has enlisted one of his friends, who has a PhD in military history, to look for errors in the transcriptions already submitted. “You can do it when you want as long as you want, and you are, literally, making history,” he once wrote us. That kind of patron passion for a manuscript collection is a dream. Of the user feedback we’ve received, a few of my other favorites are: “This is one of the COOLEST and most historically interesting things I have seen since I first saw a dinosaur fossil and realized how big they actually were.” “I got hooked and did about 20. It’s getting easier the longer I transcribe for him because I’m understanding his handwriting and syntax better.” “Best thing ever. Will be my new guilty pleasure. That I don’t even need to feel that guilty about.”
The transcriptions are great, they make the content more accessible, but as Nicole explains, “The connections we’ve made with users and their sustained interest in the collection is the most exciting and gratifying part.” This is exactly as it should be! The invitation of crowdsourcing and the event of the project are the most valuable and precious user experiences that a cultural heritage institution can offer its users. It is essential that the project offer meaningful work. These projects invite the public to leave a mark and help enhance the collections. With that said, if the goal is to get people to engage with collections and engage deeply with the past then the transcripts are actually a fantastic byproduct that is created by offering meaningful activities for the public to engage in.
Rationing out Transcription
Part of what prompted this post is a talk that Ben Brumfield gave on crowdsourcing transcription at the recent Institute of Museum and Library Services WebWise conference. It was a great talk, and when they get around to posting it online you should all go watch it. There was one particular moment in the talk that I thought was essential for this discussion.
At one point in a transcription project he noticed that one of his most valuable power users was slowing down on their transcriptions. The user had started to cut back significantly in the time they spent transcribing this particular set of manuscripts. Ben reached out to the user and asked about it. Interestingly, the user responded to explain that they had noticed that there weren’t as many scanned documents showing up that required transcription. For this user, the 2-3 hours they spent each day working on transcriptions was such an important experience, such an important part of their day, that they had decided to cut back and deny themselves some of that experience. The user needed to ration out that experience. It was such an important part of their day that they needed to make sure that it lasted.
At its best, crowdsourcing is not about getting someone to do work for you, it is about offering your users the opportunity to participate in public memory.
Crowdsourcing is better at Digital Collections than Displaying Digital Collections
What crowdsourcing does, that most digital collection platforms fail to do, is offer an opportunity for someone to do something more than consume information. When done well, crowdsourcing offers us an opportunity to provide meaningful ways for individuals to engage with and contribute to public memory. Far from being an instrument which enables us to ultimately better deliver content to end users, crowdsourcing is the best way to actually engage our users in the fundamental reason that these digital collections exist in the first place.
Meaningful Activity is the Apex of User Experience for Cultural Heritage Collections
When we adopt this mindset, the money spent on crowdsourcing projects, in terms of designing and building systems, staff time to manage them, and so on, is not something that can be compared to the cost of having someone transcribe documents on Mechanical Turk. Think about it this way: the transcription of those documents is actually a precious resource, a precious bit of activity that would mean the world to someone. It isn’t that any task or obstacle for users to take on will do. For example, if you asked users to transcribe documents that could easily be OCRed, the whole thing would lose its meaning and purpose. It isn’t about Sisyphean tasks; it is about providing meaningful ways for the public to enhance collections while more deeply engaging with and exploring them.
Just as Ben’s user rationed out the transcription of those documents we might actually think about crowdsourcing experiences as one of the most precious things we can offer our users. Instead of simply offering them the ability to browse or poke around in digital collections we can invite them to participate. We are in a position to let our users engage in a personal way that is only for them at that moment. Instead of browsing through a collection they literally become a part of our historical record.
The Important Difference between Exploitation-ware and Software for the Soul
As a bit of a coda, what is tricky here is that there is (strangely) an important but somewhat subtle line between exploiting people and giving people the most valuable kinds of experience that we can offer for digital collections. The trick is that gamification is (for the most part) bullshit. You can trick people into doing things with gimmicks, but when you do so you frequently betray their trust and can ruin the innately enjoyable nature of being a part of something that matters to you, in our case, the way that users could deeply interact with and explore the past via your online collections. What sucks about what has happened with the idea of gamification is that it is about the least interesting parts of games. It’s about leaderboards and badges. As Sebastian Deterding has explained, many times and many ways, the best parts of games, the things that we should actually try to emulate in a gamification that attempts to be more than pointsification or exploitationware, are the parts of games that let us participate in something bigger. It is the part of games that invites us to playfully take on big challenges. Be wary of anyone who tries to suggest we should trick people or entice them into this work. We can offer users an opportunity to deeply explore, connect with and contribute to public memory and we can’t let anything get in the way of that.
40 Replies to “Crowdsourcing Cultural Heritage: The Objectives Are Upside Down”
This is a terrific post, Trevor. You’ll want to keep an eye out for a very smart paper on “Crowd-sourcing ‘true meaning'” by Jan-Christoph Meister, coming out soon in a festschrift for Harold Short, to be published by Ashgate. And the “Prism” project being taken on by our Praxis Program grad students in the Scholars’ Lab is a direct response to the kinds of issues you highlight here. We see the project as a gesture toward “crowd-sourcing interpretation.”
Love the notes in there about “interpretive energy” and I will be excited to read “Crowd-sourcing ‘true meaning’”. I keep feeling like there are some major implications in things like participatory culture that we can get in the heart of our designs.
This is very eloquently put and a great direction to pursue. For me, the basic question is how to engage people. How can crowdsourcing projects tap this kind of intrinsic motivation when people aren’t already bored? At THATCamp AHA’s Crowdsourcing session, Chris Lintott of Galaxy Zoo called this “finding the bacon.” I hate this phrase but continue to use it b/c it’s just too good of a way to describe it–bacon being fatty and salty and triggering human nutritional instincts, and the popularity of bacon being fad-like but reaching meme status to make it recognizable as a cultural good, except to those (like me) who don’t eat it. I don’t play many video games, either, but I understand an engaging story is also a part of gamification, and I won’t discount it completely as a useful and not-evil crowdsourcing method.
I really enjoyed reading your post, especially your points about inviting individuals to participate in public memory, but my pushback is that finding how to make that happen on a large scale is not just a design issue. Where would the Civil War Diaries Transcription project be without Reddit? Reddit knows bacon better than anyone. If a cultural heritage project relies solely on the intrinsic motivation of citizen scholars, it is going to have a hard time achieving its goals. And worse, the projects that can achieve scale-up are going to be the projects that are popular, while other historically important collections won’t ever find or be able to mobilize their fan base.
Thanks for the comment! One of the points that Ben made at WebWise is that a lot of the crowdsourcing transcription projects are much more about a small number of individuals doing a lot of work over a period of time. So, part of the answer here is that at the end of the long tail of potential crowdsourcing projects there are still people that are potentially interested in any number of topics.
With all this said, I wholeheartedly agree that projects can’t succeed solely on the intrinsic motivation of citizen scholars. In my mind, the future of this involves great teams of librarians, archivists, curators, scholars, programmers, UIUX folks and system administrators working together to create meaningful experiences and invite participation with historical materials on the web. In any given case, and for any given project those experiences should take on different characteristics and will likely connect with distinct but overlapping audiences.
Thanks for writing this Trevor. The way crowdsourcing and gamification usually gets presented often makes me feel like “users” are being tricked into giving something (of questionable value) for free. It feels kinda sleazy and cheap. Thinking of it as a strategy for increasing meaningful engagement with content is much more productive.
Great post. I couldn’t agree more with the argument that these projects engender a type of engagement that is far more meaningful than whatever discovery-enhancement the transcription data itself produces. That said, I am going to voice some concerns, partially just to stir up debate. This will probably come across as painfully relativist, but I’m wary of any overt assignations or definitions of meaning when it comes to the use of collections, if only because it presumes we can identify (which usually means track) such meanings. I don’t wonder if these usage examples don’t seem more meaningful only because they are more quantifiable. There seems an inherent danger to quantifying engagement as “number of letters transcribed” or “time spent on site” or any metrically-derived demonstrable. This is by no means an argument for some head-in-the-sand, po-mo, namby-pamby view of “who knows what people are doing with our collections so why bother to find out.” But I’m not so sure that someone transcribing a document is “better fulfilling the mission of the cultural heritage organization” than someone who reads it without transcribing it or someone who makes use of it in a way that collection managers cannot gauge and monitor. Perhaps one of those lowly flipper-throughers goes on to write a meaningful blog post about the letters, or tweets the collection link, or uses some old-timey menus on their graphic design tumblr or pinboard or whatever; and say these people repeatedly revisit the collection for continued image rips, readings/inspiration, or other not-directly-contributory activities. Distributed participation, as it were. Are those lesser uses? Would we say these users are leaving the institutional mission unfulfilled? I wonder (again, for argument’s sake, b/c I know you’re going to love this track) if we overvalue this form of participation because it happens in our interface, for our database, within… wait for it… our control (and measurements).
Also, on the flip side, it would be interesting to do some UIUX study to see if the GUI apparatus of transcription (or its/any gamification) doesn’t disengage some users or how that apparatus might unintentionally obstruct less contributory means of exploration.
A couple of other points that came out of the WebWise crowdsourcing presentations were that the bulk of actual transcription is done by a small number of “well-informed enthusiasts” and that digitization is/should/could be driven by a collection’s potential to serve those enthusiasts. I guess it is here that I see some potential dangers, both in organizing delivery methods/interfaces around a small coterie of users (either because of their data productivity or because their “deep engagement” is the only engagement we can chart-ify; this is somewhat reminiscent of archivists’ old “the scholar is our only user” mentality) and the possibility of collections being appraised for digitization purely by their potential for crowdsourcing. It can be a consideration, sure, but I’m wary of collection managers unduly prioritizing narrowly defined participatory evidence when defining mission fulfillment.
There is nothing in my prattle that runs counter to your argument and as you note in response to Melody, “for any given project those experiences should take on different characteristics and will likely connect with distinct but overlapping audiences.” Finally, I’m not much of a transcriptionist myself (though someone did once suggest, strangely enough, that I pursue a career as a court reporter), so maybe this is a wordy, rambling self-justification of the browsing/poking/screwing around approach to collection exploration. And our larger point here is not about transcription/crowdsourcing itself, but about modeling participation. Those models will no doubt change and maybe it is because of those unforeseeable changes that I’m inclined to take an overly judicious approach towards how we define our successes.
Love this post…you make such a compelling case for transcription projects it’s hard to argue! I did focus in on your warning about gamification and the proverbial line between tricking users and engaging them. It seems to me that with the success of projects like you describe, institutions might spread themselves out and create transcription tasks “just because.” This type of reactive course of action reminds me of my days in the world of corporate web consulting, when suddenly every client wanted a blog/Facebook/Twitter/comments/badges/friends on their site because that was seen as the golden ticket to engaging users and bringing them back to the site. Caution is needed in libraries as well. Part of what makes a good transcription project is not the transcription in and of itself but the existence of a compelling, historically interesting collection that needs crowd-sourced labor to be fully utilized. A project without that, or one where we just throw any old collection online for users to poke at, represents a lost opportunity… just my gut reaction to this post. I look forward to watching the related WebWise talk!
Wonderful post – thanks for reminding us to tie our projects back to our missions, not just our “work.”
Hi Trevor: I love the idea of crowd-sourcing as engaging with public memory. It’s not just about building Archive X or Database Z. That makes so much sense. Thank you.
Thanks for posting this. I’m a big fan of crowd-sourcing and just worked on a mapping project released by the British Library. It’s a great way to pool resources and knowledge and encourage more engagement with texts and materials rather than just consuming them.
I really appreciate your blog. I would like to pick all of your brains. I am one of those “citizen archivists”, and I have been working on a project for over 10 years. While it is a massive project, it has the potential to get every federal record created during the administration of Abraham Lincoln digitized and fully searchable (all 14 million+ of them), finally making use of technology to link related records in a matter of seconds.
I am in the process of converting my website to Drupal, adding a transcription feature because all of these records are handwritten. But more than that, I want to add a comment feature for each document so that people can have conversations centered around that record. I have hesitated in the past to do much in the way of crowdsourcing because I just don’t have the manpower to monitor those who might choose to put derogatory comments or transcriptions online.
I would appreciate your comments and suggestions on how to make this project truly interactive.
Thanks in advance.