Tag Archives: crowdsourcing

Lessons on the Internet for LAMs from The Oatmeal: Or, Crowdfunding and the Long Geeky Tail

Yesterday Matthew Inman (sole proprietor of the generally hilarious webcomic The Oatmeal) put up a post on his site to help raise funds to buy Tesla’s lab, Wardenclyffe Tower, preserve it, and make it into a Tesla Museum. At the time I’m writing this, 10,900 people have committed a total of $480,000 to help make this happen.

I think folks who work at libraries, archives and museums need to pay attention to this. In particular, people who work at libraries, archives and museums that have a science and technology focus need to pay attention to this.

The Oatmeal and Tesla as the Geek of Geeks

If you don’t follow The Oatmeal, you should; it’s a fun comic. If you do, you’ll know that Inman recently posted a funny and exuberant ode to Nikola Tesla as the geek of all geeks. It’s a story about an obsessive desire to make the world a better place through science and technology. (If you check that story out you should also check out this response from Alex Knapp and Inman’s critique of the critique.) The original cartoon uses Tesla to define what being a geek is. I like the sincerity in this particular quote at the front of it.

Geeks stay up all night disassembling the world so they can put it back together with new features.

They tinker and fix things that aren’t broken.

Geeks abandon the world around them because they are busy soldering together a new one.

For someone who cares about the history of science and technology and the preservation and interpretation of the cultural record of science and technology, it is neat to see this kind of back and forth happening on the web. With that said, it is unbelievably exciting to see what happens when that kind of geekiness can be turned into a firehose of funding to support historic preservation.

How is this so amazingly successful?

As cultural heritage organizations get into the crowdfunding world, it makes a lot of sense to study what about this is working so well. While one might not have the kind of audience Inman has, part of why he has that audience is that he’s a funny guy and he knows how to create something that people want to talk about all over the web. Even the name of the project, Let’s Build a Goddamn Tesla Museum, is funny. It is also participatory in the name alone. He is asking us to be a part of something. He is asking us to help make this happen.

Shortly after the post went up, there were posts about this on a range of major blogs. It’s a great story, and Inman is already a big deal on the web. Most importantly, Inman’s fans are the kind of people who can get really excited about supporting this particular cause. Aside from that, he publicly called out a series of different organizations that might get involved as sponsors, at least one of which was excited to sign on. Aside from asking interested folks to give money, he also asked them to reach out to those organizations. It just so happened that someone who had both Inman’s email address and that of the head of Tesla Motors was thrilled to have the opportunity to connect the dots and help make this thing happen. The project not only mobilizes supporters, it mobilizes people to mobilize supporters, and in so doing lets everybody be a part of the story of making this thing happen.

Is this just a one off thing?
So Inman has been able to turn his web celebrity into a huge boon for a particular cultural heritage site. The next question in my mind is, is this a one-time thing? I think there is good reason to believe that this is actually replicable in a lot of instances.

First off, Inman’s love for science and his audience’s love for science isn’t an oddity. The web is full of science and tech fans and other web celebrities who might be game for doing this kind of thing to connect with fans and help support worthy causes.

Off the top of my head, here are three people I think could, and very likely would, be up for this sort of thing for other projects related to scientists and engineers.

Jonathan Coulton

I would hazard to guess that Jonathan Coulton fans would be thrilled to support an archive in accessioning, digitizing, and making available parts of Benoit Mandelbrot’s personal papers. I’m not sure exactly who has those papers, but I am sure they are awesome, and I would hazard to guess that the man who wrote an ode to the Mandelbrot Set, and the fans who love it, would come out in droves to support preserving his legacy.

If you haven’t heard Coulton sing the song take a minute and listen to it.

When you get to the end, you find the same kind of sincerity about the possibility of science making our world a better place.

You can change the world in a tiny way
And you’re just in time to save the day
Sweeping all our fears away
You can change the world in a tiny way
Go on, change the world in a tiny way
Come on, change the world in a tiny way

We can change the world in a tiny way, and that is a message that Coulton’s fans want to hear. It’s really the same message for Inman’s geeks who are taking apart and rebuilding the world with new features.

Randall Munroe
I would similarly hazard to guess that XKCD fans would follow Randall in any campaign he wanted to start around a scientist or a technologist. You can see the same enthusiasm for science and technology in a lot of the XKCD comics. Here are a few of my favorites. For a sense of what people will do based on XKCD comics, I would suggest reading the section on “Inspired Activities” in XKCD’s Wikipedia article.

For starters, there is the ever popular “Science: It Works” comic.

For a specific example featuring actual scientists, check out this Zombie Curie comic.

Kate Beaton

Kate Beaton makes funny, clever, and rather nice looking historical comics. Many of those comics, like the comic about Rosalind Franklin below, are about scientists. I would hazard to guess that her fans would follow her to support these kinds of projects as well.

So these are just a few examples of other folks who I think could potentially pull this kind of thing off, and I could imagine all three being up for it. In all three cases, you have geeks who have been able to do their long-tail thing and find the other folks who geek out about the same kinds of things.

As a result, I think we could be looking at something that has the makings of a model for libraries, archives and museums to think about. Who has an audience and the idealism to help champion your cause? The web is full of people who care about science. Just take a look at what happened when someone remixed Carl Sagan’s Cosmos into a song. There are some amazing people out there making a go of a career by targeting geeky niches on the web. If they are up for helping, I think they have a lot to offer. I’m curious to hear folks’ thoughts about how these kinds of partnerships might be brokered. What can we do to help connect these dots?

The Key Questions of Cultural Heritage Crowdsourcing Projects

To sum up my series of posts on different considerations for crowdsourcing in cultural heritage projects, I thought it would be helpful to lay out a set of questions to ask when developing or evaluating projects. I think if a project has good answers to each of these four areas of questions it is well on its way toward success.

Four Areas of Questioning

Human Computation Key Questions: 

  • How could we use human judgment to augment computer-processable information? 
  • What parts of a given task can be handled through computational processing, and for the parts that can’t, how can we create structured tasks that allow people to do this work?

It would be a waste of the public’s time to invite them in to complete a task that a computer could already complete. The value human computation offers lies in asking how the unique capabilities of people can be integrated into systems for the creation of public goods.

Wisdom of Crowds Key Questions:

  • How could we empower and consult with the people who care about this?
  • What models of user moderation and community governance do we need to incorporate?

Unlike human computation, the goal here is not to tap users’ ability to process information or make judgments but their desire to offer their opinion. Here the key issues involve finding ways to invite users to help define and develop the norms and rules for participation.

Scaffolding Users Key Questions:

  • How can our tools act as scaffolds to help make the most of users’ efforts?
  • What expertise can we embed inside the design of our tools to magnify our users’ efforts?
  • How can our tools put a potential user in exactly the right position, with the right just-in-time knowledge, to accomplish a given activity?

All of these questions require us to think about amplifying the activity and work of participants through well-designed tools. In a sense, these questions are about thinking through the interplay of the first two issues.

Motivating Users Key Questions:

  • Whose sense of purpose does this project connect to? What identities are involved?
  • What kinds of people does this matter to and how can we connect with and invite in the participation of those people?
  • Are we clearly communicating what the sense of purpose is in a way that the users we are trying to work with will understand?

I think it is critical that cultural heritage projects that engage in crowdsourcing do so by connecting to our sense of purpose. I would strongly suggest that projects articulate the sense of purpose a given project connects to when developing user personas, and that this sense of purpose be evident in the way the project is presented and described to the public.

Example Cultural Heritage Crowdsourcing Projects

Along with these questions I figured I would share a list of different kinds of projects I consider to be crowdsourcing projects in the cultural heritage domain. I’ve only included projects that I think are doing some of these things very well, and I have also tried to list a diverse set of different kinds of projects.

Citizen Archivist Dashboard http://www.archives.gov/citizen-archivist/
Citizen archivists can tag, transcribe, edit articles, upload scans, and participate in contests, all related to the records of the US National Archives.

Trove http://trove.nla.gov.au/
Users correct OCR’ed newspaper text, upload images, tag items, post comments, and add lists.

GLAM Wiki http://outreach.wikimedia.org/wiki/GLAM/Model_projects
The GLAM-WIKI project supports GLAMs and other institutions who want to work with Wikimedia to produce open-access, freely-reusable content for the public.

Old Weather http://www.oldweather.org/
Old Weather invites you to help reconstruct past climate by transcribing old weather records from ships’ logs.

Galaxy Zoo http://www.galaxyzoo.org/
An interactive project that allows users to participate in large-scale research: classifying millions of images of galaxies found in the Sloan Digital Sky Survey.

UK Sound Map http://sounds.bl.uk/Sound-Maps/UK-Soundmap http://britishlibrary.typepad.co.uk/archival_sounds/uk-soundmap/
The UK Soundmap invited people to record the sounds of their environment, be it at home, work or play.

What’s on the menu http://menus.nypl.org/
Help The New York Public Library improve a unique collection: “We’re transcribing our historical restaurant menus, dish by dish, so that they can be searched by what people were eating back in the day. It’s a big job so we need your help!”

STEVE http://tagger.steve.museum/
A place where you can help museums describe their collections by applying keywords, or tags, to objects.

Further Reading & Viewing

My thinking on these issues has been shaped by a range of different talks, presentations and papers. The list below is more of a greatest hits than a comprehensive bibliography.

Ahn, L. von. (2006). Human Computation. Google TechTalks.

Brumfield, B. W. (2012, March 17). Collaborative Manuscript Transcription: Crowdsourcing at IMLS WebWise 2012. Collaborative Manuscript Transcription. Retrieved April 25, 2012.

Clark, A. (2008). Supersizing the Mind: Embodiment, Action, and Cognitive Extension. Oxford University Press, USA.

Crowdsourcing Cultural Heritage: The Objectives Are Upside Down

Deterding, S. (2011, February 19). Meaningful Play: Getting Gamification Right.

Ford, P. (2011, January 6). The Web Is a Customer Service Medium (Ftrain.com).

Gee, J. P. (2000). Identity as an analytic lens for research in education. Review of research in education, 25(1), 99.

Gee, James Paul. (2003). What Video Games Have to Teach Us About Learning and Literacy (New Ed.). Palgrave Macmillan.

Holley, R. (2010). Crowdsourcing: How and Why Should Libraries Do It? D-Lib Magazine, 16(3/4). doi:10.1045/march2010-holley

Hutchins, E. (1995). How a Cockpit Remembers Its Speeds. Cognitive Science, 19(3), 265–288.

Juul, J. (2011, April 2). Gamification Backlash Roundup. The Ludologist.

Smith-Yoshimura, K. (2012). Social Metadata for Libraries, Archives, and Museums: Executive Summary. Dublin, Ohio: OCLC Research.

Oomen, J., & Aroyo, L. (2011). Crowdsourcing in the cultural heritage domain: Opportunities and challenges. Proceedings of the 5th International Conference on Communities and Technologies (pp. 138–149).

Software as Scaffolding and Motivation and Meaning: The How and Why of Crowdsourcing

Libraries, archives and museums have a long history of participation and engagement with members of the public. I have previously suggested that it is best to think about crowdsourcing in cultural heritage as a form of public volunteerism, and that much discussion of crowdsourcing is more specifically about two distinct phenomena, the wisdom of crowds and human computation. In this post I want to get into a bit more of why and how it works. I think understanding both the motivational components and the role that tools serve as scaffolding for activity will let us be a bit more deliberate in how we put these kinds of projects together.

The How: To be a tool is to serve as scaffolding for activity

Helping someone succeed is often largely about getting them the right tools. Consider the image of scaffolding below. The scaffolding these workers are using puts them in a position to do their job. By standing on the scaffolding they are able to do their work without thinking about the tool at all. In the activity of the work the tool disappears and allows them to go about their tasks taking for granted that they are suspended six or seven feet in the air. This scaffolding function is a generic property of tools.

All tools can act as scaffolds that enable us to accomplish a particular task. At this point it is worth briefly considering an example of how this idea of scaffolding translates to a cognitive task. I will briefly describe part of a park ranger’s regular work: measuring the diameter of a tree. This example comes from Roy Pea’s “Practices of Distributed Intelligence and Designs for Education.”

If you want to measure a tree’s diameter, you take a standard tape measure and do the following:

  1. Measure the circumference of the tree
  2. Remember that the circumference of an object is related to its diameter according to the formula circumference = π × diameter
  3. Set up the formula, replacing the variable circumference with your value
  4. Cross-multiply
  5. Isolate the diameter by dividing
  6. Reduce the fraction
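Those six steps boil down to a single division. Here is a minimal sketch in Python (the 94.2-inch circumference is just an illustrative value):

```python
import math

def tree_diameter(circumference):
    """Return a trunk's diameter given its measured circumference.

    Steps 2 through 6 above reduce to one operation, since
    circumference = pi * diameter implies diameter = circumference / pi.
    """
    return circumference / math.pi

# A trunk measuring 94.2 inches around is roughly 30 inches across.
print(round(tree_diameter(94.2), 1))  # 30.0
```

This single division is what the smarter measuring tape described below embeds physically in its markings.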

Alternatively, you can just use a measuring tape that has the algorithm for diameter embedded inside it. In other words, you can just get a smarter tape measure. You can buy a tape measure designed for this particular situation that can think for you (see the image below). Not only does this save you considerable time, but you end up with far more accurate measurements. There are far fewer moments for human error to enter into the equation.

The design of the tape measure has quite literally embedded the equations and cognitive actions required to measure the tree. As an aside, this kind of cognitive extension is a generic component of how humans use tools and their environments for thought.

This has a very direct translation into the design of online tools as well. For example, before joining the Library of Congress I worked on the Zotero project, a free and open source reference management tool. Zotero was translated into more than 30 languages by its users. The translation process was made significantly easier through BabelZilla. BabelZilla, an online community for developers and translators of Firefox extensions, has a robust community of users who work to localize various extensions. One of the neatest features of this platform is that it strips out the strings of text that need to be localized from the source code and then presents the potential translator with a simple web form where they just type in translations of the lines of text. You can see an image of the translation process below.

This not only makes the process much simpler and quicker, it also means that potential translators need zero programming knowledge to contribute a localization. Without BabelZilla, a potential translator would need to know how Firefox extension locale files work and be comfortable editing XML files in a text editor. BabelZilla scaffolds the user over that required knowledge and just lets them fill out translations in a web form.
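The core of that string-stripping scaffold can be sketched in a few lines. This is a hypothetical example of pulling localizable strings out of a Firefox-style .properties locale file; the keys and strings are made up, and this is not BabelZilla’s actual code:

```python
def extract_strings(properties_text):
    """Pull translatable key/value pairs out of a Firefox-style
    .properties locale file so a translator only ever sees plain
    strings in a simple form, never the surrounding source code."""
    strings = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # comments and blank lines are not translatable
        key, _, value = line.partition("=")
        strings[key.strip()] = value.strip()
    return strings

sample = """
# Illustrative locale strings
general.save = Save
general.cancel = Cancel
"""
print(extract_strings(sample))
# {'general.save': 'Save', 'general.cancel': 'Cancel'}
```

Everything the translator does not need to see, such as comments and file structure, is filtered out before the form is ever rendered.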

Returning, as I often do, to the example of Galaxy Zoo, we can now think of the classification game as a scaffold which allows interested amateurs to participate at the cutting edge of scientific inquiry. In this scenario, the entire technical apparatus, all of the equipment used in the Sloan Digital Sky Survey, the design of the Galaxy Zoo site, and the work of all of the scientists and engineers that went into those systems are all part of one big hunk of scaffolding that puts a user in the position to contribute to the frontiers of science through their actions on the website.

I like to think of scaffolding as the how of crowdsourcing. When crowdsourcing projects work, it is because a nested set of platforms, stacked one on top of the other, lets people offer up their time and energy to work that they find meaningful. That word, meaningful, is the central component of the next question: why do people participate in crowdsourcing projects?

The Why: A Holistic Sense of Human Motivation

Why do people participate in these projects? Let’s start with an example I have appealed to before from a crowdsourcing transcription project.

Ben Brumfield runs a range of crowdsourcing transcription projects. At one point he noticed that one of his power users was slowing down, cutting back significantly on the time they spent transcribing manuscripts. The user explained that they had seen that there weren’t many manuscripts left to transcribe. The 2-3 hours a day this user spent working on transcriptions was such an important part of their day, and contributing to the project such an important part of how they saw themselves, that they had decided to ration out the remaining pages. They wanted to make sure that the experience lasted as long as it could. When Ben found that out he quickly put up some more pages. This particular story illustrates several broader points about what motivates us.

After a person’s basic needs are covered (food, water, shelter, etc.) they tend to be primarily motivated by things that are not financial. People identify with and support causes and projects that provide them with a sense of purpose. People define themselves and establish and sustain their identity and sense of self through their actions. People get a sense of meaning from doing things that matter to them. People find a sense of belonging by being a part of something bigger than themselves. For a popular account of much of the research behind these ideas, see Drive: The Surprising Truth About What Motivates Us; for some of the more substantive academic research on the subject, see the essays in The Handbook of Competence and Motivation and Csíkszentmihályi’s work on Flow.

Projects that can mobilize these identities (think genealogists, amateur astronomers, philatelists, railfans, etc.) and senses of purpose, and that offer a way for people to make meaningful contributions, are far from exploiting people: they provide us with the kinds of things we define ourselves by. We are what we do, or at least we are the stories we tell others about what we do. The person who started rationing out their work transcribing those manuscripts did so because that work was part of how they defined themselves.

This is one of the places where libraries, archives and museums have the most to offer. As stewards of cultural memory, these institutions have a strong sense of purpose, and their explicit mission is to serve the public good. When we take this call seriously, and think about what the collections of cultural heritage institutions represent, crowdsourcing stops looking like a kind of exploitation of labor and starts looking like a way for cultural heritage institutions to connect with the public and provide meaningful experiences with the past.


Human Computation and Wisdom of Crowds in Cultural Heritage

Libraries, archives and museums have a long history of participation and engagement with members of the public. In my last post, I charted some problems with terminology, suggesting that the cultural heritage community can re-frame crowdsourcing as engaging with an audience of committed volunteers. In this post, I get a bit more specific about the two different activities that get lumped together when we talk about crowdsourcing. I’ve included a series of examples and a bit of history and context for good measure.

For the most part, when folks talk about crowdsourcing they are talking about two different kinds of activities: human computation and the wisdom of crowds.

Human Computation

Human Computation is grounded in the fact that human beings are able to process particular kinds of information and make judgments in ways that computers can’t. To this end, there are a range of projects that are described as crowdsourcing that are anchored in the idea of treating people as processors. The best way to explain the concept is through a few examples of the role human computation plays in crowdsourcing.

ReCaptcha is a great example of how the processing power of humans can be harnessed to improve cultural heritage collection data. Most readers will be familiar with the little ReCaptcha boxes we fill out when we need to prove that we are in fact a person and not an automated system attempting to log in to some site. Our ability to read the strange and messed-up text in those little boxes proves that we are people; the same capability that allows people to be differentiated from machines also allows us to correct the OCR’ed text of the digitized New York Times and Google Books collections, improving their full-text search.
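The trick that makes this work is pairing one word the system already knows with one it does not. Here is a simplified, hypothetical sketch of that control-word logic, not ReCaptcha’s actual implementation:

```python
def check_answer(answer, control_word, unknown_votes):
    """Validate a human and harvest a transcription in one step.

    The user is shown two words: one the system already knows (the
    control word) and one the OCR engine failed on. If the control
    word is typed correctly, we treat the user as human and record
    their reading of the unknown word as one vote toward its
    correct transcription.
    """
    typed_control, typed_unknown = answer.split()
    if typed_control.lower() != control_word.lower():
        return False  # failed the human test; discard the vote
    unknown_votes.append(typed_unknown)
    return True

votes = []
check_answer("morning overlooks", control_word="morning", unknown_votes=votes)
print(votes)  # ['overlooks']
```

Once enough independent users agree on the unknown word, it can be promoted into the corrected text.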

The principles of human computation are similarly on display in the Google Image Labeler. From 2006 to 2011, the Google Image Labeler game invited members of the public to describe and classify images. For example, in the image below a player is viewing an image of a red car. Somewhere else in the world another player is also viewing that image. Each player is invited to key in labels for the image, avoiding a series of “off-limits” words that have already been associated with the image. Each label a player enters that matches a label entered by the other player results in points in the game. The game has inspired an open source version specifically designed for use at cultural heritage organizations. The design of this interaction is such that, in most cases, it results in generating high-quality descriptions of images.
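The agreement mechanic at the heart of the game can be sketched roughly like this; it is a simplified, hypothetical version of the scoring, not Google’s code:

```python
def score_round(labels_a, labels_b, off_limits):
    """Score one round of a two-player labeling game.

    A label counts only if both players entered it independently and
    it is not already "off-limits" (previously associated with the
    image). Agreement between strangers is what makes the surviving
    labels trustworthy descriptions.
    """
    off = {w.lower() for w in off_limits}
    matches = {w.lower() for w in labels_a} & {w.lower() for w in labels_b}
    return sorted(matches - off)

# Both players typed "red", but it is off-limits; "car" is a fresh match.
print(score_round(["red", "car", "fast"], ["Car", "red"], off_limits=["red"]))
# ['car']
```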

Both the image labeler and ReCaptcha are fundamentally about tapping into the capabilities of people to process information. Where I had earlier suggested that the kind of crowdsourcing I want us to be thinking about is not about labor, these kinds of human computation projects are often fundamentally about labor. This is most clearly visible in Amazon’s Mechanical Turk project.

The tagline for Mechanical Turk is that it “gives businesses and developers access to an on-demand, scalable workforce” where “workers select from thousands of tasks and work whenever it’s convenient.” The labor focus of the site should give pause to those in the cultural heritage sector, particularly those working for public institutions. There are very legitimate concerns about this kind of labor serving as a kind of “digital sweatshop.”

While there are legitimate concerns about the potentially exploitative properties of projects like Mechanical Turk, it is important to realize that many of the same human computation activities one could run through Mechanical Turk are not really the same kind of labor when they are situated as projects of citizen science.

For example, Galaxy Zoo invites individuals to identify galaxies. The activity is basically the same as in the Google Image Labeler game: users are presented with an image of a galaxy and invited to classify it based on a simple set of taxonomic information. While the interaction is more or less the same, the change in context is essential.

Galaxy Zoo invites amateur astronomers to help classify images of galaxies. While the image identification task here is more or less the same as the image identification tasks previously discussed, at least in the early stages of the project this site often gave amateur astronomers the first opportunity to see these stellar objects. These images were all captured by a robotic telescope, so the first Galaxy Zoo participants who looked at them were actually the first humans ever to see them. Think about how powerful that is.
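Because each galaxy is shown to many volunteers independently, individual mistakes wash out in aggregate. Here is a much-simplified sketch of that consensus step; Galaxy Zoo’s real pipeline weights individual classifiers, while this just takes a majority with a confidence cutoff:

```python
from collections import Counter

def consensus(classifications, threshold=0.8):
    """Reduce many volunteers' independent labels for one galaxy
    to a consensus label, or None if agreement is too weak."""
    counts = Counter(classifications)
    label, votes = counts.most_common(1)[0]
    confidence = votes / len(classifications)
    return (label, confidence) if confidence >= threshold else (None, confidence)

votes = ["spiral", "spiral", "spiral", "elliptical", "spiral"]
print(consensus(votes))  # ('spiral', 0.8)
```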

In this case, the amateurs who catalog these galaxies do so because they want to contribute to science. Beyond this classification activity, the Galaxy Zoo project also invites members to discuss the galaxies in a discussion forum. That forum ends up representing a very different kind of crowdsourcing, one based not so much on the idea of human computation but instead on a notion I refer to here as the wisdom of crowds.

The Wisdom of Crowds, or Why Wasn’t I Consulted

The Wisdom of Crowds comes from James Surowiecki’s 2004 grandiosely titled book, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. In the book, Surowiecki talks about a range of examples of how crowds of people can create important and valuable kinds of knowledge. Unlike human computation, the wisdom of crowds is not about highly structured activities. In Surowiecki’s argument, the wisdom of crowds is an emergent phenomenon resulting from the ways discussion and interaction platforms, like wikis, enable individuals to add to and edit each other’s work.

The wisdom of crowds notion tends to come with a bit too much utopian baggage for my taste. I find Paul Ford’s reformulation of the notion particularly compelling. Ford suggests that the heart of the matter is that the web, unlike other mediums, is particularly well suited to answering the question “Why wasn’t I consulted?” It is worth quoting him here at length:

“Why wasn’t I consulted,” which I abbreviate as WWIC, is the fundamental question of the web. It is the rule from which other rules are derived. Humans have a fundamental need to be consulted, engaged, to exercise their knowledge (and thus power), and no other medium that came before has been able to tap into that as effectively.

He goes on to explain a series of projects that succeed because of their ability to tap into this human desire to be consulted.

If you tap into the human need to be consulted you can get some interesting reactions. Here are a few: Wikipedia, StackOverflow, Hunch, Reddit, MetaFilter, YouTube, Twitter, StumbleUpon, About, Quora, Ebay, Yelp, Flickr, IMDB, Amazon.com, Craigslist, GitHub, SourceForge, every messageboard or site with comments, 4Chan, Encyclopedia Dramatica. Plus the entire Open Source movement.

Each of these cases taps into our desire to respond. Unlike other media, the web, through the comments section on news articles or our ability to sign up for an account and start sharing our thoughts and ideas on Twitter or Tumblr, is fundamentally built around this desire to be consulted.

Duty Calls

The logic of Why Wasn’t I Consulted is evident in one of my favorite XKCD cartoons. In Duty Calls we find ourselves compelled to stay up late and correct the errors of others’ ways on the web. In Ford’s view, this kind of compulsion, this need to jump in and correct things, to be consulted, is something that we couldn’t act on with other kinds of media, and it is ultimately one of the things that powers and drives many of the most successful online communities and projects.

Returning to the example of Galaxy Zoo: where the carefully designed human computation classification exercise provides one kind of input, the project’s very active web forums capitalize on this desire to be consulted. Importantly, some of the most valuable discoveries in the Galaxy Zoo project, including an entirely new kind of green-colored galaxy, were the result of users sharing and discussing images from the classification exercise in the open discussion forums.

Comparing and Contrasting

To some extent, you can think about human computation and the wisdom of crowds as opposing poles of crowdsourcing activity. I have tried to sketch out some of what I see as the differences in the table below.

                     Human Computation          Wisdom of Crowds
Tools                Sophisticated              Simple
Task Nature          Highly structured          Open-ended
Time Commitment      Quick & Discrete           Long & Ongoing
Social Interaction   Minimal                    Extensive Community Building
Rules                Technically Implemented    Socially Negotiated

When reading over the table, think about the difference between something like the Google Image Labeler for human computation and Wikipedia for the wisdom of crowds. The former is a sophisticated little tool that prompts us to engage in a highly structured task for a very brief period of time. It comes with almost no time commitment, and there is practically no social interaction. The other player could just as well be a computer for our purposes, and the rules of the game are strictly moderated by the technical system.

In contrast, something like Wikipedia makes use of, at least on the user experience side, a rather simple tool. Click edit, start editing. While the tool is very simple, the nature of the task is huge and open-ended: help write and edit an encyclopedia of everything. While you can do just a bit of Wikipedia editing, its open-ended nature invites much longer-term commitment. Here there is an extensive community-building process that results in the social development and negotiation of rules and norms for what behavior is acceptable and what counts as inside and outside the scope of the project.

To conclude, I should reiterate that we can and should think about human computation and the wisdom of crowds not as an either/or decision for crowdsourcing but as two components that are worth designing for. As mentioned earlier, Galaxy Zoo does a really nice job of this. The image labeling game is quick, simple and discrete, and it generates fantastic scientific data. Beyond this, the open web forum, where participants can build community through discussion of the things they find, brings in the depth of experience possible in the wisdom of crowds. In this respect, Galaxy Zoo represents the best of both worlds. It invites anyone interested to play a short and quick game, and if they want to, they can stick around and get much more deeply involved; they can discuss and consult and in the process actually discover entirely new kinds of galaxies. I think the future here is going to be about knowing which parts of a crowdsourcing project are about human computation and which parts are about the wisdom of crowds, and getting those two things to work together and reinforce each other.

In my next post I will bring in a bit of work from educational psychology that I think helps us better understand the psychological components of crowdsourcing. Specifically, I will focus on how tools serve as scaffolding for action and on contemporary thinking about motivation.

The Crowd and The Library

Libraries, archives and museums have a long history of participation and engagement with members of the public. In a series of blog posts I am going to work to connect these traditions with current discussions of crowdsourcing. Crowdsourcing is a bit of a vague term, one that comes with potentially exploitative ideas related to uncompensated or undercompensated labor. In this series I'll try to put together a set of related concepts: human computation, the wisdom of crowds, thinking of tools and software as scaffolding, and understanding and respecting end users' motivation. Together these can help clarify what crowdsourcing can do for cultural heritage organizations while also articulating an ethical approach to inviting the public to help in the collection, description, presentation, and use of the cultural record.

This series of posts started out as a talk I gave at the International Internet Preservation Consortium's meeting earlier this month. I am sharing these ideas here in the hope of getting some feedback on this line of thinking.

The Two Problems with Crowdsourcing: Crowd and Sourcing

There are two primary problems with bringing the idea of crowdsourcing into cultural heritage organizations. Both the idea of the crowd and the notion of sourcing are terrible terms for folks working as stewards of our cultural heritage. Many of the projects that end up falling under the heading of crowdsourcing in libraries, archives and museums have not involved massive crowds, and they have very little to do with outsourcing labor.

Most successful crowdsourcing projects are not about large anonymous masses of people. They are not about crowds. They are about inviting participation from interested and engaged members of the public. These projects can continue a long-standing tradition of volunteerism and involvement of citizens in the creation and continued development of public goods.

For example, the New York Public Library’s menu transcription project, What’s on the Menu?, invites members of the public to help transcribe the names and costs of menu items from digitized copies of menus from New York restaurants. Anyone who wants to can visit the project website and start transcribing the menus. However, in practice it is a dedicated community of foodies, New York history buffs, chefs, and otherwise self-motivated individuals who are excited about offering their time and energy to help contribute, as volunteers, to improving the public library’s resource for others to use.

Not Crowds but Engaged Enthusiast Volunteers

Far from a break with the past, this is a clear continuation of a longstanding tradition of inviting members of the public in to help refine, enhance, and support resources like this collection. In the case of the menus, years ago it was actually volunteers who sat at a desk in the reading room to catalog the original collection. In short, the menu transcription project is not about crowds at all; it is about using digital tools to invite members of the public to volunteer in much the same way they have volunteered to help organize and add value to the collection in the past.

Not Sourcing Labor but an Invitation to Meaningful Work

The problem with the term sourcing is its association with labor. Wikipedia's definition of crowdsourcing helps further clarify this relationship: "Crowdsourcing is a process that involves outsourcing tasks to a distributed group of people." The key word in that definition is outsourcing. Crowdsourcing is a concept that was invented and defined in the business world, and it is important that we recast it and think through what changes when we bring it into cultural heritage. Cultural heritage institutions do not care about profit or revenue; they care about making the best use of their limited resources to act as stewards and storehouses of culture.

At this point, we need to think for a moment about what we mean by terms like work and labor. While it might be OK for commercial entities to coax or trick individuals into providing free labor, the ethical implications of such trickery should give pause to cultural heritage organizations. It is critical to pause here and unpack some of the different meanings we ascribe to the term work. When we use the phrase "a day's work" we are referring directly to labor, to the kind of work that one engages in as a financial transaction for pay. In contrast, when we use the term work to refer to someone's "life's work" we are referring to something significantly different. The former is about acquiring the resources one needs to survive. The latter is about the activities that give our lives meaning. In cultural heritage we have clear values and missions, and we are in an opportune position to invite the public to participate. However, when we do so we should not treat them as a crowd, and we should not attempt to source labor from them. When we invite the public we should do so under a different set of terms: a set of terms focused on providing meaningful ways for the public to interact with, explore, and understand the past.

Citizen Scientists, Archivists and the Meaning of Amateur

Some of the projects that fit under the heading of crowdsourcing have chosen very different kinds of terms to describe themselves. For example, the Galaxy Zoo project, which invites anyone interested in astronomy to help catalog a million images of stellar objects, refers to its users as citizen scientists. Similarly, the United States National Archives and Records Administration's recently launched crowdsourcing project, the Citizen Archivists Dashboard, invites citizens, not members of some anonymous crowd, to participate. The names of these projects highlight the extent to which they invite participation from members of the public who identify with the characteristics and ways of thinking of particular professional occupations. While these citizen archivists and scientists are not professionals, in the sense that they are unpaid, they connect with something a bit different from volunteerism. They are amateurs in the best possible sense of the term.

Amateurs have a long and vibrant history as contributors to the public good. Coming to English from French, the term amateur means a "lover of." The primarily negative connotations we place on the term are a relatively recent development. In other eras, the term simply meant that someone was not a professional, that is, they were not paid for these particular labors of love. Charles Darwin, Gregor Mendel, and many others who made significant contributions to the sciences did so as amateurs. As a continuation of this line of thinking, the various Zooniverse projects see the amateurs who participate as peers, in many cases listing them as co-authors of academic papers published as a result of their work. I suggest that we think of crowdsourcing not as extracting labor from a crowd, but as a way for us to invite the participation of amateurs (in the non-derogatory sense of the word) in the creation, development and further refinement of public goods.

Toward a better, more nuanced, notion of Crowdsourcing

With all this said, fighting against a word is rarely a successful project, so from here on out I will continue to use and refine a definition of crowdsourcing that I think works for the cultural heritage sector. In the remainder of this series of posts I will explain what I think of as the four key components of this ethical crowdsourcing, a crowdsourcing that invites members of the public to participate as amateurs in the production, development and refinement of public goods. For me these fall into the following four considerations, each of which suggests a series of questions to ask of any cultural heritage crowdsourcing project. The four concepts are:

  1. Thinking in terms of Human Computation
  2. Understanding that the Wisdom of Crowds is "Why Wasn't I Consulted"
  3. Thinking of Tools and Software as Scaffolding
  4. A Holistic Understanding of Human Motivation

Together, I believe these four concepts provide us with the descriptive language to understand what it is about the web that makes crowdsourcing such a powerful tool, not only for improving and enhancing data related to cultural heritage collections, but also for deep engagement with the public.

In the next three posts I will talk through and define these four concepts and offer up a series of questions to ask and consider in imagining, designing and implementing crowdsourcing projects at cultural heritage institutions.


Crowdsourcing Cultural Heritage: The Objectives Are Upside Down

Still not the droid… (photo by Stéfan): Our crowdsourcing conversation is upside down, much like how Calculon is holding these stormtroopers upside down.

Some fantastic work is going on in crowdsourcing the transcription of cultural heritage collections. After some recent thinking and conversation about these projects, I want to push a point about this work more forcefully. This is the same line of thinking I started nearly a year ago in Meaningification and Crowdscafolding: Forget Badges. I've come to believe that conversations about the objective of this work, as broadly discussed, are exactly upside down. Transcripts and other data are great, but when done right, crowdsourcing projects are the best way of accomplishing the entire point of putting collections online. I think a lot of the people who work on these projects think this way, but we are still in a situation where we need to justify this work by the product instead of justifying it by the process.

Getting transcriptions, or for that matter getting any kind of data or work, is a by-product of something that is actually far more amazing than being able to better search through a collection. The process of a crowdsourcing project fulfills the mission of digital collections better than the resulting searches do. That is, when someone sits down to transcribe a document they are actually better fulfilling the mission of the cultural heritage organization than anyone who simply stops by to flip through the pages.

Why are we putting cultural heritage collections online again?

There are a range of reasons that we put digital collections online. With that said, the single most important reason to do so is to make history accessible and to invite students, researchers, teachers, and anyone in the public to explore and connect with our past. Historians, librarians, archivists, and curators who share digital collections and exhibits can measure their success toward this goal by how people use, reuse, explore and understand these objects.

In general, crowdsourcing transcription is first and foremost described as a means by which we can get better data to enable the kinds of use and reuse that we want people to make of our collections. In this respect, the general idea of crowdsourcing is described as an instrument for getting data that we can use to make collections more accessible. Don't get me wrong, crowdsourcing does this. With that said, it does so much more. In the process of developing these crowdsourcing projects we have stumbled into something far more exciting than speeding up or lowering the costs of document transcription. Far from merely being an instrument for generating data that gets our collections used more, crowdsourcing is actually the single greatest advancement in getting people to use and interact with our collections. A few examples will help illustrate this.

Increased Use, Deeper Use, Crowdsourcing Civil War Diaries

Last year, the University of Iowa libraries crowdsourced the transcription of a set of Civil War diaries. I had the distinct privilege of interviewing Nicole Saylor, the head of Digital Library Services, about the project. From any perspective the project was very successful. They got great transcriptions, and they ended up attracting more donors to support their work.

The project also succeeded in dramatically increasing site traffic. As Nicole explained, "On June 9, 2011, we went from about 1000 daily hits to our digital library on a really good day to more than 70,000." As great as all this is, as far as I'm concerned, the most valuable thing that happened is that when people come to transcribe the diaries they engage with the objects more deeply than they would have if transcription were not an option. Consider this quote from Nicole explaining how one particular transcriptionist interacted with the collection. It is worth quoting her at length:

The transcriptionists actually follow the story told in these manuscripts and often become invested in the story or motivated by the thought of furthering research by making these written texts accessible. One of our most engaged transcribers, a man from the north of England, has written us to say that the people in the diaries have become almost an extended part of his family. He gets caught up in their lives, and even mourns their deaths. He has enlisted one of his friends, who has a PhD in military history, to look for errors in the transcriptions already submitted. “You can do it when you want as long as you want, and you are, literally, making history,” he once wrote us.  That kind of patron passion for a manuscript collection is a dream. Of the user feedback we’ve received, a few of my other favorites are: “This is one of the COOLEST and most historically interesting things I have seen since I first saw a dinosaur fossil and realized how big they actually were.” “I got hooked and did about 20. It’s getting easier the longer I transcribe for him because I’m understanding his handwriting and syntax better.” “Best thing ever. Will be my new guilty pleasure. That I don’t even need to feel that guilty about.”

The transcriptions are great, and they make the content more accessible, but as Nicole explains, “The connections we’ve made with users and their sustained interest in the collection is the most exciting and gratifying part.” This is exactly as it should be! The invitation of crowdsourcing and the event of the project are the most valuable and precious user experiences that a cultural heritage institution can offer its users. It is essential that the project offer meaningful work. These projects invite the public to leave a mark and help enhance the collections. With that said, if the goal is to get people to engage with collections and engage deeply with the past, then the transcripts are actually a fantastic byproduct created by offering meaningful activities for the public to engage in.

Rationing out Transcription

Part of what prompted this post is a talk Ben Brumfield gave on crowdsourcing transcription at the recent Institute of Museum and Library Services WebWise conference. It was a great talk, and when they get around to posting it online you should all go watch it. There was one particular moment in the talk that I thought was essential for this discussion.

At one point in a transcription project he noticed that one of his most valuable power users was slowing down. The user had started to cut back significantly on the time they spent transcribing this particular set of manuscripts. Ben reached out to the user and asked about it. Interestingly, the user explained that they had noticed there weren't as many scanned documents showing up that required transcription. For this user, the 2-3 hours they spent each day working on transcriptions was such an important experience, such an important part of their day, that they had decided to cut back and deny themselves some of it. They needed to ration out that experience to make sure it lasted.

At its best, crowdsourcing is not about getting someone to do work for you; it is about offering your users the opportunity to participate in public memory.

Crowdsourcing is better at Digital Collections than Displaying Digital Collections

What crowdsourcing does, and what most digital collection platforms fail to do, is offer an opportunity for someone to do something more than consume information. When done well, crowdsourcing offers us an opportunity to provide meaningful ways for individuals to engage with and contribute to public memory. Far from being an instrument that merely enables us to better deliver content to end users, crowdsourcing is the best way to actually engage our users in the fundamental reason these digital collections exist in the first place.

Meaningful Activity is the Apex of User Experience for Cultural Heritage Collections

When we adopt this mindset, the money spent on crowdsourcing projects, in terms of designing and building systems, staff time to manage them, and so on, is not something that can be compared to the cost of having someone transcribe documents on Mechanical Turk. Think about it this way: the transcription of those documents is actually a precious resource, a precious bit of activity that would mean the world to someone. It isn't that any task or obstacle for users to take on will do; for example, if you asked users to transcribe documents that could easily be OCRed, the whole thing would lose its meaning and purpose. It isn't about Sisyphean tasks, it is about providing meaningful ways for the public to enhance collections while more deeply engaging with and exploring them.

Just as Ben’s user rationed out the transcription of those documents, we might actually think of crowdsourcing experiences as among the most precious things we can offer our users. Instead of simply offering them the ability to browse or poke around in digital collections, we can invite them to participate. We are in a position to let our users engage in a personal way that is only for them at that moment. Instead of browsing through a collection, they literally become a part of our historical record.

The Important Difference between Exploitation-ware and Software for the Soul

Slide from Ruling the World

As a bit of a coda, what is tricky here is that there is (strangely) an important but somewhat subtle line between exploiting people and giving people the most valuable kind of experience we can offer for digital collections. The trick is that gamification is (for the most part) bullshit. You can trick people into doing things with gimmicks, but when you do so you frequently betray their trust and can ruin the innately enjoyable nature of being a part of something that matters to you, in our case, the way users can deeply interact with and explore the past via your online collections. What sucks about what has happened with the idea of gamification is that it is about the least interesting parts of games. It's about leaderboards and badges. As Sebastian Deterding has explained, many times and in many ways, the best parts of games, the things we should actually try to emulate in a gamification that attempts to be more than pointsification or exploitationware, are the parts that let us participate in something bigger, the parts that invite us to playfully take on big challenges. Be wary of anyone who suggests we should trick or entice people into this work. We can offer users an opportunity to deeply explore, connect with, and contribute to public memory, and we can't let anything get in the way of that.