Tag Archives: digital libraries

Designing Online Communities: Read My Accepted Dissertation Proposal

Wisdom of the Ancients: the web-comic-epigraph for my dissertation proposal, from XKCD

As of last Monday, I have successfully defended my dissertation proposal. In the context of my doctoral program, that means there is just one more hurdle to clear before I finish. I’m generally rather excited about the project, and would be thrilled to have more input and feedback on it (Designing Online Communities Proposal PDF). I would be happy to receive any and all feedback in the comments of this post.

Designing Online Communities: How Designers, Developers, Community Managers, And Software Structure Discourse And Knowledge Production On The Web

Abstract: Discussion on the web is mediated through layers of software and protocols. As scholars increasingly turn to study communication, learning and knowledge production on the web, it is essential to look below the surface of interaction and consider how site administrators, programmers and designers create interfaces and enable functionality. The managers, administrators and designers of online communities can turn to more than 20 years of technical books for guidance on how to design and structure online communities toward particular objectives. Through analysis of this “how-to” literature, this dissertation intends to offer a point of entry into the discourse of design and configuration that plays an integral role in structuring how learning and knowledge are produced online. The project engages with and interprets “how-to” literature to help study software in a way that respects the tension that exists between the structural affordances of software and the dynamic and social nature of software as a component in social interaction.

What’s Next? 

At some point in the next year I will likely defend a completed dissertation. Places do dissertations differently; in my program the idea is that what I just defended is actually the first three chapters of a five-chapter dissertation. So, at this point I need to follow through on what I said I would do in my methods section (to create chapter 4, results) and then write up how it connects with the conceptual context section (to create chapter 5, conclusions). I should be able to grind this out in relatively short order.

At this point, I think this project should be interesting enough to warrant a book proposal. So I’ll likely start exploring putting together a book proposal for it in the next year as well. With that in mind, any suggestions for who might be interested in receiving a proposal on this topic are welcome.

Human Computation and Wisdom of Crowds in Cultural Heritage

Libraries, archives and museums have a long history of participation and engagement with members of the public. In my last post, I charted some problems with terminology, suggesting that the cultural heritage community can re-frame crowdsourcing as engaging with an audience of committed volunteers. In this post, I get a bit more specific about the two different activities that get lumped together when we talk about crowdsourcing. I’ve included a series of examples and a bit of history and context for good measure.

For the most part, when folks talk about crowdsourcing they are talking about two different kinds of activities: human computation and the wisdom of crowds.

Human Computation

Human Computation is grounded in the fact that human beings are able to process particular kinds of information and make judgments in ways that computers can’t. To this end, there are a range of projects that are described as crowdsourcing that are anchored in the idea of treating people as processors. The best way to explain the concept is through a few examples of the role human computation plays in crowdsourcing.

ReCaptcha is a great example of how the processing power of humans can be harnessed to improve cultural heritage collection data. Most readers will be familiar with the little ReCaptcha boxes we fill out when we need to prove that we are in fact a person and not an automated system attempting to log in to some site. Our ability to read the strange and messed up text in those little boxes proves that we are people, but in the case of ReCaptcha it also helps us correct the OCR’ed text of digitized New York Times and Google Books transcripts. The same capability that allows people to be differentiated from machines is what allows us to help improve the full text search of the digitized New York Times and Google Books collections.

The principles of human computation are similarly on display in the Google Image Labeler. From 2006 to 2011 the Google Image Labeler game invited members of the public to describe and classify images. For example, in the image below a player is viewing an image of a red car. Somewhere else in the world another player is also viewing that image. Each player is invited to key in labels for the image, avoiding a series of “off-limits” words which have already been associated with the image. Each label I enter that matches a label entered by the other player earns points in the game. The game has inspired an open source version specifically designed for use at cultural heritage organizations. The design of this interaction is such that, in most cases, it generates high quality descriptions of images.
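For readers who think in code, here is a minimal sketch of the matching rule described above: two players label the same image, off-limits words are ignored, and only labels both players entered earn points. This is an illustration under my own assumptions, not the game's actual implementation; the function names, the 100-points-per-match value, and the sample labels are all hypothetical.

```python
def normalize(label):
    """Lowercase and collapse whitespace so 'Red ' and 'red' compare equal."""
    return " ".join(label.lower().split())


def score_round(labels_a, labels_b, off_limits, points_per_match=100):
    """Return the labels both players agreed on and the points they earn.

    points_per_match is a made-up value for illustration only.
    """
    set_a = {normalize(label) for label in labels_a}
    set_b = {normalize(label) for label in labels_b}
    banned = {normalize(label) for label in off_limits}
    # Only labels entered by BOTH players count, and off-limits words never do.
    agreed = (set_a & set_b) - banned
    return agreed, len(agreed) * points_per_match


# Hypothetical round: an image of a red car where "car" is already off-limits.
agreed, points = score_round(
    ["Red", "car", "convertible", "road"],
    ["red", "vehicle", "Convertible", "car"],
    off_limits={"car"},
)
print(sorted(agreed), points)
```

Because neither player can see what the other types, agreement between two strangers is a reasonable signal that a label actually describes the image, and the off-limits list pushes players past the most obvious words.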

Both the Image Labeler and ReCaptcha are fundamentally about tapping into the capabilities of people to process information. Where I had earlier suggested that the kind of crowdsourcing I want us to be thinking about is not about labor, these kinds of human computation projects are often fundamentally about labor. This is most clearly visible in Amazon’s Mechanical Turk project.

The tagline for Mechanical Turk is that it “gives businesses and developers access to an on-demand, scalable workforce” where “workers select from thousands of tasks and work whenever it’s convenient.” The labor focus of this site should give pause to those in the cultural heritage sector, particularly those working for public institutions. There are very legitimate concerns about this kind of labor as serving as a kind of “digital sweatshop.”

While there are legitimate concerns about the potentially exploitive properties of projects like Mechanical Turk, it is important to realize that many of the same human computation activities which one could run through Mechanical Turk are not really the same kind of labor when they are situated as projects of citizen science.

For example, Galaxy Zoo invites individuals to identify galaxies. The activity is basically the same as the Google Image Labeler game. Users are presented with an image of a galaxy and invited to classify it based on a simple set of taxonomic information. While the interaction is more or less the same, the change in context is essential.

Galaxy Zoo invites amateur astronomers to help classify images of galaxies. While the image identification task here is more or less the same as the image identification tasks previously discussed, at least in the early stages of the project, this site often gave amateur astronomers the first opportunity to see these stellar objects. These images were all captured by a robotic telescope, so the first Galaxy Zoo participants who looked at these images were actually the first humans ever to see them. Think about how powerful that is.

In this case, the amateurs who catalog these galaxies do so because they want to contribute to science. Beyond engaging in this classification activity, the Galaxy Zoo project also invites members to discuss the galaxies in a discussion forum. This discussion forum ends up representing a very different kind of crowdsourcing, one based not so much on the idea of human computation but instead on a notion which I refer to here as the wisdom of crowds.

The Wisdom of Crowds, or Why Wasn’t I Consulted

The Wisdom of Crowds comes from James Surowiecki’s grandiosely titled 2004 book, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. In the book, Surowiecki talks about a range of examples of how crowds of people can create important and valuable kinds of knowledge. Unlike human computation, the wisdom of crowds is not about highly structured activities. In Surowiecki’s argument, the wisdom of crowds is an emergent phenomenon resulting from how discussion and interaction platforms, like wikis, enable individuals to add to and edit each other’s work.

The wisdom of crowds notion tends to come with a bit too much utopian baggage for my tastes. I find Paul Ford’s reformulation of this notion particularly compelling. Ford suggests that the heart of this matter is that the web, unlike other mediums, is particularly well suited to answer the question “Why wasn’t I consulted?” It is worth quoting him here at length:

“Why wasn’t I consulted,” which I abbreviate as WWIC, is the fundamental question of the web. It is the rule from which other rules are derived. Humans have a fundamental need to be consulted, engaged, to exercise their knowledge (and thus power), and no other medium that came before has been able to tap into that as effectively.

He goes on to explain a series of projects that succeed because of their ability to tap into this human desire to be consulted.

If you tap into the human need to be consulted you can get some interesting reactions. Here are a few: Wikipedia, StackOverflow, Hunch, Reddit, MetaFilter, YouTube, Twitter, StumbleUpon, About, Quora, Ebay, Yelp, Flickr, IMDB, Amazon.com, Craigslist, GitHub, SourceForge, every messageboard or site with comments, 4Chan, Encyclopedia Dramatica. Plus the entire Open Source movement.

Each of these cases taps into our desire to respond. Unlike other media, the comments section on news articles, or our ability to sign up for an account and start providing our thoughts and ideas on Twitter or on a Tumblr, is fundamentally about this desire to be consulted.

Duty Calls

The logic of Why Wasn’t I Consulted is evident in one of my favorite XKCD cartoons. In Duty Calls we find ourselves compelled to stay up late and correct the errors of others’ ways on the web. In Ford’s view, this kind of compulsion, this need to jump in and correct things, to be consulted, is something that we couldn’t act on with other kinds of media, and it is ultimately one of the things that powers and drives many of the most successful online communities and projects.

Returning to the example from Galaxy Zoo, where the carefully designed human computation classification exercise provides one kind of input, the project’s very active web forums capitalize on the opportunity to consult. Importantly, some of the most valuable discoveries in the Galaxy Zoo project, including an entirely new kind of green-colored galaxy, were the result of users sharing and discussing some of the images from the classification exercise in the open discussion forums.

Comparing and Contrasting

To some extent, you can think about human computation and the wisdom of crowds as opposing poles of crowdsourcing activity. I have tried to sketch out some of what I see as the differences in the table below.

                    Human Computation        Wisdom of Crowds
Tools               Sophisticated            Simple
Task Nature         Highly structured        Open ended
Time Commitment     Quick & Discrete         Long & Ongoing
Social Interaction  Minimal                  Extensive Community Building
Rules               Technically Implemented  Socially Negotiated

When reading over the table, think about the difference between something like the Google Image Labeler for human computation and Wikipedia for the wisdom of crowds. The former is a sophisticated little tool that prompts us to engage in a highly structured task for a very brief period of time. It comes with almost no time commitment, and there is practically no social interaction. The other player could just as well be a computer for our purposes, and the rules of the game are strictly enforced by the technical system.

In contrast, something like Wikipedia makes use of, at least from the user experience side, a rather simple tool. Click edit, start editing. While the tool is very simple, the nature of our task is huge and open-ended: help write and edit an encyclopedia of everything. While you can do just a bit of Wikipedia editing, its open-ended nature invites much more long-term commitment. Here there is an extensive community building process that results in the social development and negotiation of rules and norms for what behavior is acceptable and what counts as inside and outside the scope of the project.

To conclude, I should reiterate that we can and should think about human computation and the wisdom of crowds not as an either/or decision for crowdsourcing but as two components that are worth designing for. As mentioned earlier, Galaxy Zoo does a really nice job of this. The image labeling game is quick, simple and discrete and generates fantastic scientific data. Beyond this, the open web forum where participants can build community through discussion of the things they find brings in the depth of experience possible in the wisdom of crowds. In this respect, Galaxy Zoo represents the best of both worlds. It invites anyone interested to play a short and quick game, and if they want to, they can stick around and get much more deeply involved: they can discuss and consult and in the process actually discover entirely new kinds of galaxies. I think the future here is going to be about knowing which parts of a crowdsourcing project are about human computation and which parts are about the wisdom of crowds, and getting those two things to work together and reinforce each other.

In my next post I will bring in a bit of work in educational psychology that I think helps to better understand the psychological components of crowdsourcing. Specifically, I will focus in on how tools serve as scaffolding for action and on contemporary thinking about motivation.

The Crowd and The Library

Libraries, archives and museums have a long history of participation and engagement with members of the public. In a series of blog posts I am going to work to connect these traditions with current discussions of crowdsourcing. Crowdsourcing is a bit of a vague term, one that comes with potentially exploitative ideas related to uncompensated or undercompensated labor. In this series I’ll try to put together a set of related concepts (human computation, the wisdom of crowds, thinking of tools and software as scaffolding, and understanding and respecting end users’ motivation) that can help clarify what crowdsourcing can do for cultural heritage organizations while also laying out a clearly ethical approach to inviting the public to help in the collection, description, presentation, and use of the cultural record.

This series of posts started out as a talk I gave at the International Internet Preservation Consortium’s meeting earlier this month. I am sharing these ideas here with the hope that I can get some feedback on this line of thinking.

The Two Problems with Crowdsourcing: Crowd and Sourcing

There are two primary problems with bringing the idea of crowdsourcing into cultural heritage organizations. Both the idea of the crowd and the notion of sourcing are terrible terms for folks working as stewards for our cultural heritage. Many of the projects that end up falling under the heading of crowdsourcing in libraries, archives and museums have not involved massive crowds, and they have very little to do with outsourcing labor.

Most successful crowdsourcing projects are not about large anonymous masses of people. They are not about crowds. They are about inviting participation from interested and engaged members of the public. These projects can continue a long standing tradition of volunteerism and involvement of citizens in the creation and continued development of public goods.

For example, the New York Public Library’s menu transcription project, What’s on the Menu?, invites members of the public to help transcribe the names and costs of menu items from digitized copies of menus from New York restaurants. Anyone who wants to can visit the project website and start transcribing the menus. However, in practice it is a dedicated community of foodies, New York history buffs, chefs, and otherwise self-motivated individuals who are excited about offering their time and energy to help contribute, as volunteers, to improving the public library’s resource for others to use.

Not Crowds but Engaged Enthusiast Volunteers

Far from a break with the past, this is a clear continuation of a longstanding tradition of inviting members of the public in to help refine, enhance, and support resources like this collection. In the case of the menus, years ago, it was actually volunteers who sat at a desk in the reading room to catalog the original collection. In short, crowdsourcing the transcription of the menus is not about crowds at all; it is about using digital tools to invite members of the public to volunteer in much the same way members of the public have volunteered to help organize and add value to the collection in the past.

Not Sourcing Labor but an Invitation to Meaningful Work

The problem with the term sourcing is its association with labor. Wikipedia’s definition of crowdsourcing helps further clarify this relationship: “Crowdsourcing is a process that involves outsourcing tasks to a distributed group of people.” The keyword in that definition is outsourcing. Crowdsourcing is a concept that was invented and defined in the business world, and it is important that we recast it and think through what changes when we bring it into cultural heritage. Cultural heritage institutions do not care about profit or revenue; they care about making the best use of their limited resources to act as stewards and storehouses of culture.

At this point, we need to think for a moment about what we mean by terms like work and labor. While it might be OK for commercial entities to coax or trick individuals into providing free labor, the ethical implications of such trickery should give pause to cultural heritage organizations. It is critical to pause here and unpack some of the different meanings we ascribe to the term work. When we use the term “a day’s work” we are directly referring to labor, to the kinds of work that one engages in as a financial transaction for pay. In contrast, when we use the term work to refer to someone’s “life’s work” we are referring to something that is significantly different. The former is about acquiring the resources one needs to survive. The latter is about the activities that we engage in that give our lives meaning. In cultural heritage we have clear values and missions and we are in an opportune position to invite the public to participate. However, when we do so we should not treat them as a crowd, and we should not attempt to source labor from them. When we invite the public we should do so under a different set of terms, one that is focused on providing meaningful ways for the public to interact with, explore, and understand the past.

Citizen Scientists, Archivists and the Meaning of Amateur

Some of the projects that fit under the heading of crowdsourcing have chosen very different kinds of terms to describe themselves. For example, the Galaxy Zoo project, which invites anyone interested in astronomy to help catalog a million images of stellar objects, refers to its users as citizen scientists. Similarly, the United States National Archives and Records Administration’s recently launched crowdsourcing project, the Citizen Archivists Dashboard, invites citizens, not members of some anonymous crowd, to participate. The names of these projects highlight the extent to which they invite participation from members of the public who identify with the characteristics and ways of thinking of particular professional occupations. While these citizen archivists and scientists are not professional, in the sense that they are unpaid, they connect with something a bit different than volunteerism. They are amateurs in the best possible sense of the term.

Amateurs have a long and vibrant history as contributors to the public good. Coming to English from French, the term amateur means a “lover of.” The primarily negative connotations we place on the term are a relatively recent development. In other eras, the term amateur simply meant that someone was not a professional, that is, they were not paid for these particular labors of love. Charles Darwin, Gregor Mendel, and many others who made significant contributions to the sciences did so as amateurs. As a continuation of this line of thinking, the various Zooniverse projects see the amateurs who participate as peers, in many cases listing them as co-authors of academic papers published as a result of their work. I suggest that we think of crowdsourcing not as extracting labor from a crowd, but as a way for us to invite the participation of amateurs (in the non-derogatory sense of the word) in the creation, development and further refinement of public goods.

Toward a better, more nuanced, notion of Crowdsourcing

With all this said, fighting against a word is rarely a successful project, so from here on out I will continue to use the term and refine a definition of crowdsourcing that I think works for the cultural heritage sector. In the remainder of this series of posts I will explain what I think of as the four key components of this ethical crowdsourcing, this crowdsourcing that invites members of the public to participate as amateurs in the production, development and refinement of public goods. For me these fall into the following four considerations, each of which suggests a series of questions to ask of any cultural heritage crowdsourcing project. The four concepts are:

  1. Thinking in terms of Human Computation
  2. Understanding that the Wisdom of Crowds is Why Wasn’t I Consulted
  3. Thinking of Tools and Software as Scaffolding
  4. A Holistic Understanding of Human Motivation

Together, I believe these four concepts provide us with the descriptive language to understand what it is about the web that makes crowdsourcing such a powerful tool, not only for improving and enhancing data related to cultural heritage collections, but also as a way to engage deeply with the public.

In the next three posts I will talk through and define these four concepts and offer up a series of questions to ask and consider in imagining, designing and implementing crowdsourcing projects at cultural heritage institutions.


Explore and Share Cultural Heritage Collections with Viewshare.org: Notes for WebWise Talk

This is just a quick post to share the slides and links from the talk I am giving at WebWise today.

The talk starts by explaining the idea behind the tool. Specifically, how making it easy to make interfaces to cultural heritage collections can help librarians, archivists, curators, and historians both better understand relationships between objects in a cultural heritage collection and how the tool can help them communicate those ideas to audiences. After explaining the kinds of interfaces you can make, I walk through a detailed example of what one of these views can do by looking at a prototype interface created by an Archivist at the National Gallery of Art to the Samuel H. Kress Collection History Database.

I wanted to make sure that everyone had links to all the views I mention. So here are all the links.

NDIIPP Partners Collections Interface: (On Viewshare) (Embedded on NDIIPP’s site): This is an interface to a collection of collections. It acts as a kind of directory for digital collections and it was created from a spreadsheet.

Fulton Street Trade Card View: (On Viewshare)
The Fulton Street Trade Card collection features 245 late 19th and early 20th century illustrated trade cards from merchants along the Fulton Street retail thoroughfare in Brooklyn, NY. Using a Viewshare pie chart view, the user is able to run queries and faceted search on the cards’ metadata in ways a simple catalog or scroll would not allow. Using the facets you can limit the chart to a certain element, such as business type, and then get numbers and percentages about the subjects, format, or other elements of the cards’ content.

History of Fairfax County in Postcards: (On Viewshare): A very simple view from a simple spreadsheet. If you like, you can find the spreadsheet this is based on in the Viewshare documentation and work from it to get a sense of how the tool works.

Cason Monk-Metcalf Funeral Directors View: (On Viewshare) (My View on Viewshare) (Embedded on East Texas Digital Archives & Collections Site): This is one of the most interesting datasets uploaded to Viewshare. It is a set of data transcribed from historic funeral records.

Samuel H. Kress Collection History Database Prototype View: National Gallery of Art (On Viewshare) This view allows users to explore the relationships between purchase information for a work of art and other aspects of the object, including its current location. This data comes from the Samuel H. Kress Collection History and Conservation Database. The relational database documents the art collection’s acquisition, dispersal, and conservation over time and was created by the National Gallery of Art’s Gallery Archives with funding from the Samuel H. Kress Foundation. The data shared here is not complete. Viewshare data and views are intended only for preliminary demonstration of the data and should not be cited in research.
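To give a sense of what a faceted pie chart like the Fulton Street Trade Card view is doing, here is a rough sketch of the idea: limit the records to a facet value, then count and compute percentages for another field. This is not Viewshare's code or API. The record fields (business_type, subject) and the sample rows are hypothetical stand-ins for the kind of metadata a spreadsheet of trade cards might hold.

```python
from collections import Counter

# Hypothetical rows standing in for trade-card metadata from a spreadsheet.
cards = [
    {"business_type": "Clothing", "subject": "Children"},
    {"business_type": "Clothing", "subject": "Animals"},
    {"business_type": "Clothing", "subject": "Children"},
    {"business_type": "Grocer", "subject": "Animals"},
    {"business_type": "Grocer", "subject": "Flowers"},
]


def apply_facets(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [
        record for record in records
        if all(record.get(name) == value for name, value in selected.items())
    ]


def slice_shares(records, field):
    """Return {value: (count, percent)} for one field, largest slice first."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {
        value: (count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    }


# Limit the chart to one business type, then summarize the subjects.
clothing = apply_facets(cards, business_type="Clothing")
shares = slice_shares(clothing, "subject")
print(shares)
```

Each entry in the result corresponds to one slice of the pie: the count is the number of matching cards and the percentage is that slice's share of the filtered set.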