Digital Library Infrastructures, Cosmos Exhibition, Digital Folklore & Dissertation -> Book: 2014 in Review

Another year, another chance to push the pause button for a bit and try to make some sense of what it is I’ve been doing. As I did in 2012 and 2013, I am taking a few minutes to sift and categorize. If you are interested in a recap of what I’ve done this year, this post is for you; if not, I imagine you have already stopped reading.

Digital Infrastructure for Libraries and Archives

An image of me gesturing into space with a cartoon mecha archivist from the Radcliffe alumni magazine.

The bulk of my work this year falls under the broad category of exploring and improving digital infrastructure for cultural heritage institutions. I published 13 posts on this blog, including pieces on the leadership roles that digital archivists should play, how research questions work in the digital humanities, and knowledge infrastructures in digital humanities centers. If you search for my name on the Library of Congress website and restrict the results to 2014, you find 74 things I’m associated with from the year. That includes a mixture of blog posts, reports, and, in a few cases, people mentioning me in videos of talks from the Digital Preservation conference, for which I served on the planning committee. Below I’ve tried to break some of the things I worked on this year into a few different areas.

Cosmos Online Collection Launched

An example item from the cosmos collection.

January saw the launch of Finding Our Place in the Cosmos, an online collection/exhibition that I spent the previous year curating and project managing.

As part of the launch of the collection I did a lot of writing about it for various communications channels at the Library of Congress. I interviewed astronomer David Grinspoon about his connections and relationship with his mentor Carl Sagan. I wrote about some of Sagan’s course materials for the Library of Congress science blog, about notions of technology and progress evident in primary sources for science teachers on the Library of Congress teachers blog, and about Carl Sagan’s childhood writings on science and poetry for the Poet Laureate’s blog.

Along with writing on it for more general audiences, I also put together two reflective pieces about the process of working on the exhibition: a draft style guide for digital collection hypertexts, and a piece that worked through how I used Pinterest to open up the process of identifying and selecting items for the exhibition. I loved working on this project: the opportunity to explore the collections at LC, to dig deep into Sagan’s papers, to think through the best way to assemble the technology to tell the story, and the chance to work with so many smart people from across the institution.

Becoming Dr. Owens and From Dissertation to Book 

A hat, a hood, and a large document to commemorate the end of 23 years of school.

In February, I successfully defended my dissertation. I checked the box to make my dissertation available directly from George Mason University’s digital repository, and that in no way held me back from landing a book contract. So ended 23 continuous years of schooling.

In spring of 2015 a revised version of my dissertation study is on track to be published as a book in the New Literacies and Digital Epistemologies series which Colin Lankshear and Michele Knobel edit.

I’m thrilled. I’ve been following Colin and Michele’s work on new literacies for nearly ten years, and books in the series like Rebecca Black’s Adolescents and Online Fan Fiction played a significant role in informing the study. My dissertation research on the history, structure and ideology of software platforms enabling online communities was informed by this body of scholarship and I am excited to see that it will end up as part of this list.

After receiving some feedback on the state of the dissertation itself, I spent six months of weekends and evenings revising it and further transitioning it into book form. After the series editors approved the text, I recently reviewed a copyedited version and will likely be looking at proofs in the next few months. So everything seems on track for the book to come out in the middle of 2015.

Born Digital Folklore and Vernaculars

A screen shot of an archived copy of Memegenerator in the Library of Congress web archives.

One of the best parts of working in NDIIPP has been the opportunity to connect with the various custodial divisions of the Library of Congress to work through issues in their particular born-digital content domains. This year it was my pleasure to collaborate with some of my colleagues in the American Folklife Center to explore the role that cultural heritage organizations can play in collecting and providing access to records of digital vernaculars and folk cultures.

I was able to plan, host and co-unchair Curate Camp Digital Culture with folklorist Trevor Blank and Tumblr’s Meme Librarian Amanda Brennan. The camp was both a ton of fun and enlightening. It was a distinct pleasure to help guide NDIIPP’s junior fellow, Julia Fernandez, through her fantastic work continuing a series of interviews on these issues that I had started a few years back. All told, the 13 interviews we did on this topic are functionally a brief 35,000-word book on the subject. In them you can find serious discussion and exploration of everything from Bronies to deviantArt, an exhibit of animated GIFs, Yelp reviews of restaurants, and LOLCats.

Informed by the unconference and the interviews, I was able to help the American Folklife Center develop some new collections. Of the many ideas we explored, two seem to have really had legs so far. First, I’ve been able to make some significant contributions to scoping the Library of Congress Digital Culture Web Archive. So whatever else I do in my career, I’ve been instrumental in making sure that a bunch of reaction GIFs, meme images, creepypastas, and sites that document the meaning of various emoji and Facebook symbols persist as part of the Library of Congress web archives. Second, I’ve even contributed to the Library of Congress’ illustrious collection of bibles with the acquisition of The Lolcat Bible.

An Award Winning Year

Aside from all of this, I won two awards this year! The Society of American Archivists proclaimed me the Archival Innovator for 2014, and I won the C. Herbert Finch Award for an Online Publication for my work curating and managing the Finding Our Place in the Cosmos online collection.

Looking back on 2014 I feel incredibly lucky to have had the opportunities I’ve had. I’ve been able to work with some amazing people, on a range of projects that I think are the right fit for my talents and interests.

To that end, I’m looking forward to the new adventures that 2015 has in store.

Discovery and Justification are Different: Notes on Science-ing the Humanities

Computer Scientist: “You can’t do that with Topic Modeling.”

Humanist: “No, I can because I’m not a scientist. We have this thing called Hermeneutics.”

Computer Scientist: “…”

Humanist: “No, really, we get to do what we want. We read texts against each other, and then there is this hermeneutic circle grounded in intersubjectivity.”

Computer Scientist: “Ok, but you still can’t make a claim using this as evidence.”

Humanist: “I think we are going to have to agree to disagree here, I think we have different ideas about how evidence works.”


While watching the tweets from the Digital Humanities Topic Modeling meeting a few weeks ago, I started to feel the above dialog play out. I wasn’t there, and I am not trying to pigeonhole anyone. I’ve seen this kind of back and forth happen in a range of situations where humanities types start picking up and using algorithmic, computational, and statistical techniques. What counts as evidence, and for what? What can you say based on the results of a given technique? One way to resolve this is to say that humanists and scientists should have different rules for what counts as evidence. I increasingly feel the need to reject this different-rules approach.

I don’t think the issue here is different ways of knowing, incompatible paradigms, or anything big and lofty like that. I think the issue at the heart of this back and forth is about two different contexts: what you can do in the generative context of discovery vs. what you can do in the context of justifying a set of claims.

Anything goes in the generative world of discovery
If something helps you see something differently, then it’s useful. If you stuff a bunch of text into Wordle and see a word really big that catches you by surprise, you can go back to the texts with this different way of thinking and see why that would be the case. If you shove a bunch of text through MALLET and see some strange clumps that make you think differently about the sources, and you go back to work with them, great. You have used the tool to spark a different way of seeing and thinking.

If you aren’t using the results of a digital tool as evidence, then anything goes. More specifically, if you aren’t trying to attribute particular inferential value to a particular process, that process is simply producing another artifact, which you can then go about considering, exploring, probing, and analyzing. I take this to be one of the key values of the idea of “deformance.” The results of a particular computational or statistical tool don’t need to be treated as facts but can instead be used as part of an ongoing exploration. With this said, the moment you turn from exploration and theorizing to justifying an interpretation, the whole game changes.

Justification is About Argument and Evidence
If you want to use something as evidence, then it is really important that you can back up the quality of that evidence in supporting the specific claims you want to make. In the case of topic modeling, you need to make judgment calls about how many topics to look for, and you decide which texts from which sources go into the mix to generate your topics. If you want to talk about these topics as evidence to support particular inferences, then you had better be able to justify your reasons for those decisions, or be able to explain what you did with your data to warrant the interpretation you are putting forward. You will also need to explain how different decisions and different inputs could have produced different results. (I am mostly going off the discussion in and around Ben Schmidt’s When you have a MALLET, everything looks like a nail.)

The net result is that if you want to use the results of something like topic modeling as evidence, you really need a good understanding of exactly what you can and can’t say based on how the tool produced your evidence. There are a lot of different roads to go down when you start working with data as evidence, but in any event you need to be able to justify your decisions and defend against alternative explanations. Ultimately, this is where the validity of inferences lives. Validity is always about the quality of the inferences you draw and your ability to defend them against alternative explanations.
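A much simpler case than topic modeling shows how small decisions shape what you see. A Wordle-style cloud sizes words by frequency, so even a choice like whether to drop common stopwords changes which word looms largest. The sketch below is only an illustration under stated assumptions: the toy sentence, the tiny hand-picked stopword list, and the `top_words` helper are all hypothetical, and this is word counting, not topic modeling.

```python
import re
from collections import Counter

def top_words(text, stopwords=frozenset(), n=3):
    """Return the n most frequent lowercase words, skipping any in stopwords.

    Hypothetical helper: a stand-in for the frequency counts behind a word cloud.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Toy input (an assumption for illustration, not real data).
text = "The sea and the ice: the ice melts, the ice thins, and the sea rises."

# Same text, two preprocessing decisions, two different "biggest words".
print(top_words(text, n=1))                               # [('the', 5)]
print(top_words(text, stopwords={"the", "and"}, n=1))     # [('ice', 3)]
```

Neither output is wrong; the point is that if you wanted to argue from the "biggest word," you would need to justify the stopword decision that produced it.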

It’s the Scientists that Realized they were Humanists
At the heart of this remain some issues around what it means to do the humanities or to do science. (Fred and I got into this a bit in our Hermeneutics of Data essay.) I still hear a persistent fear that using computational analysis in the humanities will bring about scientism or positivism. The specter of Cliometrics haunts us. This is completely backwards.

Scientists, at least the sharp ones, have given up on their holy grail. They have given up on the null hypothesis. The sophisticated ones have realized that what they do is really just argument and evidence too. When it comes time for justification, you need to carefully build an argument grounded in evidence and defend it against alternative explanations. If you want a great recent example of this sort of argument and evidence grounded in statistics, I would suggest Nate Silver’s Simple Case for Obama as the Favorite; if you want a natural science example, read about this paper on Arctic sea ice. Both are great examples of defending against different interpretations of evidence.

What you can get away with depends on what you are doing

When we separate the context of discovery and exploration from the context of justification, we clarify the terms of our conversation. There is a huge difference between “here is an interesting way of thinking about this” and “this evidence supports this claim.” Both scientists and humanists make both of these kinds of assertions. In general, I think humanists’ fear of the humanities becoming scientific is largely based on an outmoded idea of what happens in science. At the end of the day, both are about generating new ideas and then exploring evidence to see to what extent we can justify our interpretations over a range of other potential interpretations.

Do Less More Often: An Approach to Digital Strategy for Cultural Heritage Orgs

Everybody is trying to do too much at once. Find the low-hanging fruit and pick it. Get the boxes off the floor. Release early and release often. Put things out there and find out how you should be doing things. I think this idea cuts across all parts of digital cultural heritage work: collecting, processing, arranging, preserving, making available, and exhibiting can all be reframed in this mindset. This was the primary sentiment I put forward in my keynote talk at the Connecticut Digital Initiatives Forum. At some point I might sit down and write this out, but for now I figured I would share the talk here.

Also, here are the slides in case you would prefer to see the presentation instead of sitting through my yammering.

I went up to talk about Viewshare, but was then also delighted/dismayed to be asked to give the keynote. I think it went well, and I was apparently on TV across the great state of Connecticut.

Are Online Communities Places or Artifacts?

I’m sympathetic to two ways of thinking about online communities that are somewhat inconsistent with each other. On one hand, the web is a stack of communication technologies (both software and hardware) and should be studied in the same way that one would study the pony express, telegraphy, or the book. Yet the web has communities, formed through ongoing social interaction, in which people spatialize the communication technology: they “lurk,” “hang out,” and talk about the other kinds of people who do things differently over there. Online communities end up feeling like places, and when we interact in those places with people who are similar to us in some ways and different in others, we end up with cultures.

The Myth of Cyberspace and Possibility of Being There

I fully realize that the web isn’t a space. I’m with PJ Rey on the whole Myth of Cyberspace. It doesn’t have dimensions; it is a stack of technologies (hardware and software). More specifically, it is a constellation of technologies assembled in different arrangements by different individuals. However, that stack/constellation clearly creates cultures. Now, sure, books create cultures, telephones create cultures, and the postal service creates cultures. With that said, those republics of letters and literary cultures aren’t really the same kind of culture that one studies in an ethnography. I mean, imagine pen-pal-nography or telegram-nography; they just sound wrong. You can talk about a republic of letters all you like, but the moment you start saying you are doing an ethnography of letters, someone is going to tell you you’re doing it wrong. When you study letters you are studying documents, and we study documents as a species of artifact. Yes, we learn about culture through that study (that is the entire idea of material culture), but we don’t think of reading letters as “participant observation.”

With all this said, I still think the idea of “netnography” totally makes sense in a way that all those other -nographies don’t. Something about the medium of the web (I’d hazard its immediacy, its two-way nature, and the placey-ness of URLs as locations) gives us what we need to think about it as a place and gives us the experiences we need to really make cultures happen. That is, we are thrown into something that works like proximity to others, in which we interact with them and develop some shared ways of being in the world while retaining a whole host of dissonant and contradictory feelings about things.

Putting the Field in Computer Mediated Field Work

If you are unfamiliar with the idea of netnography, I would suggest Kozinets’s book, Netnography: Doing Ethnographic Research Online. In contrast to the idea of “virtual ethnography,” Kozinets is part of a group of researchers who get behind the idea of “netnography.” (Rightly, these folks acknowledge that there is nothing “virtual” about the web; it’s a real thing.) The decision to use netnography instead of ethnography comes from a sense that studying online communities is so substantively different from studying communities in physical space that it needs a whole different term. That is, you can study how existing communities use the web alongside other modes of interaction, but there are also communities that exist solely as a result of particular web forums, listservs, and such.

In the last few weeks I’ve read and re-read Netnography, alternating between enthusiastic underlining (YES! That is it!) and disapproving marginalia. I underlined, for example, when Kozinets talks about “alteration,” recognizing that in online communities “the nature of the interaction is altered—both constrained and liberated—by the specific nature and rules of the technological medium in which it is carried” (68). On the other hand, I scrawled in the margins when I saw terms like “online-fieldsite” (NO, the web is not a place and we shouldn’t pretend it is!). I think I can get behind “computer mediated fieldwork,” which he uses in other places, but I’m not sure I can go as far as “fieldsite.”

Can we talk of “Participant Observation” when we aren’t observing people?

I’ve gone back and forth in my head about Kozinets’s idea that we do “participant observation” when we study interactions in an online community. How can we talk of observing participants when we are actually observing artifacts? He suggests that our actions in online communities, our clicks and our keystrokes, are effectively utterances. That is true, but when we study those utterances it isn’t like experiencing someone talking to us; documents are being created and we are reading them. It is effectively the same as reading a letter. Still, I think the specific features of the web as a medium make this a situation where we can get away with the “participant observation” metaphor. If a netnographer jumps into an online community and starts to engage in the ebb and flow of exchange, they are doing something that may have more in common with direct participation than with the hermeneutic interpretation of documents.

Theorizing and Interpreting Kinds of Online Community Data

Kozinets discusses three types of data: archival data (data copied from “pre-existing computer-mediated communications of online community members”), elicited data (data co-created with “culture members through personal and communal interaction”), and fieldnote data (the researcher’s own notes, observations, and self-reflections). He suggests that his categories are similar to Wolcott’s notion of qualitative researchers “watching, asking and examining” and to Miles and Huberman’s focus on studying “documents, interviews and observations” as kinds of data to interpret. These are potentially useful comparisons, and since we need to find ways to fit new things into old boxes to make sense of them, I can get behind the impulse here.

What’s at issue here is how much the experience of participating in an online community is like participating in a community that occupies physical space. I think this is particularly tricky because some of the features that make the web a rather unique medium are the very things that give online communities their place-like qualities. To attend to the mediality of the web is to recognize that it has this set of place-like or place-affording qualities.

“Archival data”: Transcript, Recording, or Encoding

Kozinets struggles a bit to explain “archival data.” He does not mean data that is being collected and organized by an archive, but archival in the much more nebulous sense that has come to mean old-stuff-that-is-still-around-for-some-reason. At one point, he suggests that the wide availability of this archival data, in previous discussions on the boards or old email threads from listservs, is the equivalent of “every public conversation being recorded and made available as transcripts.” Importantly, however, a listserv archive and old posts to discussion boards are not “recordings” of what transpired; they are what transpired. The creation of the “archive” is to some extent embedded in the act of communicating through these media. With that said, if you aren’t experiencing these exchanges as they happen, you will need to reconstruct context and make sure that what you are looking at is authentically what was created at the time you want to make inferences about. People edit their posts on discussion boards, users delete their accounts and the contextual information about who they were is often erased, and site administrators prune or remove posts over time. Generally, what we colloquially call an archive in these kinds of online communities is really a pile of things that have some connection to the past but haven’t really been worked over or documented. In any event, it is critical not to take for granted that you are looking at accurate recordings of the past, and instead to think about the provenance and the particular constellations of technologies and users that made it possible for you to look at records of previous interactions between members of an online community.

So what can we do with these records of discourse? Kozinets suggests that “Archival cultural data provide what amounts to a cultural baseline. Saved communal interactions provide the netnographer with a convenient bank of observational data that may stretch back for years.” (104) I’m not sure that this works. I don’t think we can talk about this archival data as “observational data.” It is not something you observed; it is a set of documentary evidence whose provenance and context you need to establish before you can interpret it the way a historian interprets any textual record. When it isn’t currently happening, you aren’t observing it. These utterances become documents as they slide out of the present and into the past.

So are Online Communities Places or Objects?

I feel like the answer here has to be something like: they are objects (or, more specifically, assemblages of hardware and software technologies and protocols) that produce place-like experiences. So it makes sense to try to figure out what it is like to be “a redditor,” or to study how redditors interact with each other and the kinds of communities that emerge there. With that said, reddit isn’t a fieldsite. Reddit is software, a database, and a set of bits on a series of servers accessible over HTTP.

All of that stuff, those objects, create and log communication in such a way that they take on place-like qualities. People lurk in some subreddits, build relationships with the folks they come into contact with, and develop some shared and conflicting ideas about the world. In short, people create cultures through the affordances of the technologies. That cultural component, the way people use these things, gets rolled back into changing the structure and nature of the technologies that afford the place-like qualities.

A Note on Determinisms and Co-Construction

Importantly, this does not mean that they “co-construct” each other, an idea Kozinets nods to at the beginning of the book. The idea that technological determinism and the social construction of technology have come together in a kumbaya moment, where technology and culture each construct the other, feels too wishy-washy. Objects and artifacts afford and resist, people interact and interpret (often drawing on their own cultural tool kits or their internal representations of generalized others), and the social or the cultural emerges through this network of actors and actants. That’s at least my best stab at this for now. So yes, it’s not an either/or, but I think it’s too much of a gloss to call it co-construction.

Open questions?

I’d love to hear how other folks parse out these distinctions. What kind of thing is an online community and where are the limits of talking about them as places, as cultures, as technologies and as documents? Do you agree with how I am parsing this out? Or do you think I’m way off base here?

Glitch, Circuit Bending & Breaking as a Way of Knowing

A big idea in the digital humanities is that building is a hermeneutic, an iterative interpretive process that leads toward knowing and understanding. I saw this great video on The Art of Glitch today that made me think a bit more about how breaking is an essential, related way of knowing. I realize I’m not necessarily breaking any new ground here, but I think the few examples I’ve pulled together do a nice job of getting at what we learn when we break the slick world of computing a bit.

You should watch the whole thing; it’s great (you should also watch their video on animated GIFs). But the part that I found most compelling was Scott Fitzgerald’s basic demonstration of how to glitch some files: change a .mp3 to a .raw and open it in Photoshop, or open a .jpg in a text editor and delete some chunks of it. It’s fun, in that it is something you can follow along with at home, but doing these things actually teaches something about the nature of digital files. He does a good job of explaining this in the following statement.

“Part of the process is empowering people to understand the tools and underlying structures you know what is going on in the computer. As soon as you understand the system enough to know why you’re breaking it then you have a better understanding of what the tool was built for.”
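The "delete some chunks" trick can also be done in a few lines of code, which makes it a little easier to see that a file is just a sequence of bytes a program agrees to interpret. The sketch below is a minimal illustration, not Fitzgerald's method: `glitch_bytes` and `glitch_file` are hypothetical names, and the default of leaving the first 1024 bytes untouched is an arbitrary guess at how much of a JPEG's header to protect so that viewers still recognize the file.

```python
import random
from pathlib import Path

def glitch_bytes(data: bytes, header_len: int = 1024, n_cuts: int = 5,
                 cut_len: int = 16, seed: int = 0) -> bytes:
    """Delete n_cuts random chunks of cut_len bytes from everything after the header.

    Hypothetical helper. The first header_len bytes are kept intact (an assumption
    about how much of the file a viewer needs to identify it); the seed makes a
    given glitch reproducible.
    """
    rng = random.Random(seed)
    header, body = data[:header_len], bytearray(data[header_len:])
    for _ in range(n_cuts):
        if len(body) <= cut_len:
            break  # not enough data left to cut
        start = rng.randrange(0, len(body) - cut_len)
        del body[start:start + cut_len]
    return header + bytes(body)

def glitch_file(src, dst, **kwargs):
    """Write a glitched copy of src to dst, leaving the original untouched."""
    data = Path(src).read_bytes()
    Path(dst).write_bytes(glitch_bytes(data, **kwargs))
```

For example, `glitch_file("photo.jpg", "photo_glitched.jpg", n_cuts=10)` would write a damaged copy alongside the original. Some viewers may refuse to open a heavily damaged file; fewer or smaller cuts are a reasonable first thing to try.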

In short, breaking the files exposes their logic. In a way, it helps us escape screen essentialism and see a different side of files, file formats, compression algorithms, and the structure of digital objects. The whole experience reminded me that I never got around to sharing the amazingly cool exhibit on circuit bending at Milwaukee’s Discovery Zone.

Breaking and Bending the Hardware

Circuit Bending at Discovery Zone

If you are unfamiliar, here is how Wikipedia describes Circuit Bending.

Circuit bending is the creative customization of the circuits within electronic devices such as low voltage, battery-powered guitar effects, children’s toys and small digital synthesizers to create new musical or visual instruments and sound generators.

Here is a little video I took of messing with the dials on the bent NES.

Here, messing with the hardware is what produces the glitches. The artist (Luke Reddington) bent a series of different devices. On this NES he added a bunch of toggles that let you flip switches inside the device that no one is supposed to be messing around with.

In my mind, this works just the same as changing the file extensions. When you poke around inside the Nintendo and set a few different switches to toggle things that aren’t supposed to be toggled you can get this. Sure it’s art, there is an aesthetics to the whole thing, but there is also an element of coming to know in here. I think these are all examples of the ways in which breaking is as much a way of knowing as building.

Breaking & Bending as Knowing & Learning about the Machine

In each case, much like what happens when you set an augmented reality app like Word Lens to the wrong language and have it try to read things that aren’t text, or when you go on a quest to find oddities in the digitized corpus of Google Books, circuit bending and glitch art draw our attention away from the way things are intended to be presented, away from seamless things that obscure their nature, and get us to peek behind the curtain of the technologies and see a bit of the logic of computing.

The Key Questions of Cultural Heritage Crowdsourcing Projects

To sum up my series of posts on different considerations for crowdsourcing in cultural heritage projects I thought it would be helpful to lay out a set of questions to ask when developing or evaluating projects. I think if a project has good answers to each of these four genres of questions it is well on its way toward success.

Four Areas of Questioning

Human Computation Key Questions: 

  • How could we use human judgment to augment computer processable information? 
  • What parts of a given task can be handled through computational processing and which can’t? For the parts that can’t, can we create structured tasks that allow people to do this work?

It would be a waste of the public’s time to invite them in to complete a task that a computer could already complete. The value of human computation lies in figuring out how the unique capabilities of people can be integrated into systems for the creation of public goods.
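One way to make the split between computer work and human work concrete: let the machine keep what it is confident about and turn only the rest into small, structured tasks for people. The sketch below assumes a hypothetical OCR engine that reports a confidence score for each word; the `OcrToken` type, the 0.9 threshold, and the task prompt are all illustrative assumptions, not the design of any particular project listed here.

```python
from dataclasses import dataclass

@dataclass
class OcrToken:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine

def split_work(tokens, threshold=0.9):
    """Accept high-confidence OCR words automatically; queue the rest for volunteers.

    Returns (accepted, human_tasks): accepted is a list of (position, text) pairs,
    human_tasks is a list of small structured tasks, one per uncertain word.
    """
    accepted, human_tasks = [], []
    for i, tok in enumerate(tokens):
        if tok.confidence >= threshold:
            accepted.append((i, tok.text))
        else:
            human_tasks.append({
                "position": i,
                "ocr_guess": tok.text,
                "prompt": "Type the word exactly as it appears in the image.",
            })
    return accepted, human_tasks
```

The threshold is the design lever: raise it and volunteers see more words (more human effort, fewer machine errors), lower it and the computer keeps more of the work. Either way, people only see the parts the computer could not reliably handle.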

Wisdom of Crowds Key Questions:

  • How could we empower and consult with the people who care about this?
  • What models of user moderation and community governance do we need to incorporate?

Unlike human computation, the goal here is not to harness users’ ability to process information or make judgments but their desire to provide their opinions. Here the key issue is finding ways to also invite users to help define and develop norms and rules for participation.

Scaffolding Users Key Questions:

  • How can our tools act as scaffolds to help make the most of users’ efforts?
  • What expertise can we embed inside the design of our tools to magnify our users’ efforts?
  • How can our tools put a potential user in exactly the right position with the right just in time knowledge to accomplish a given activity?

All of these questions require us to think about amplifying the activity and work of participants through well designed tools. In a sense, these questions are about thinking through the interplay of the first two issues.

Motivating Users Key Questions:

  • Whose sense of purpose does this project connect to? What identities are involved?
  • What kinds of people does this matter to and how can we connect with and invite in the participation of those people?
  • Are we clearly communicating what the sense of purpose is in a way that the users we are trying to work with will understand?

I think it is critical that cultural heritage projects that engage in crowdsourcing do so by connecting to our sense of purpose. I would strongly suggest that projects articulate the sense of purpose a given project connects to when developing user personas, and that this sense of purpose be evident in the way a project is presented and described to the public.

Example Cultural Heritage Crowdsourcing Projects

Along with these questions, I figured I would share a list of the kinds of projects I consider to be crowdsourcing projects in the cultural heritage domain. I’ve only included projects that I think are doing some of these things very well, and I have tried to list a diverse set of projects.

Citizen Archivist Dashboard
Where citizen archivists can tag, transcribe, edit articles, upload scans, and participate in contests, all related to the records of the US National Archives.

Users correct OCR’ed newspapers, upload images, tag items, post comments, and add lists.

The GLAM-WIKI project supports GLAMs and other institutions who want to work with Wikimedia to produce open-access, freely-reusable content for the public.

Old Weather
Old Weather invites you to help reconstruct the climate by transcribing old weather records from ships’ logs.

Galaxy Zoo
An interactive project that allows the user to participate in a large-scale research project: classifying millions of images of galaxies found in the Sloan Digital Sky Survey.

UK Sound Map
The UK Soundmap invited people to record the sounds of their environment, be it at home, work, or play.

What’s on the Menu?
Help The New York Public Library improve a unique collection: “We’re transcribing our historical restaurant menus, dish by dish, so that they can be searched by what people were eating back in the day. It’s a big job so we need your help!”

A place where you can help museums describe their collections by applying keywords, or tags, to objects.

Further Reading & Viewing

My thinking on these issues has been shaped by a range of different talks, presentations and papers. The list below is more of a greatest hits than a comprehensive bibliography.

Ahn, L. von. (2006). Human Computation. Google TechTalks.

Brumfield, B. W. (2012, March 17). Collaborative Manuscript Transcription: Crowdsourcing at IMLS WebWise 2012. Collaborative Manuscript Transcription. Retrieved April 25, 2012, from

Clark, A. (2008). Supersizing the Mind: Embodiment, Action, and Cognitive Extension. Oxford University Press, USA.

Crowdsourcing Cultural Heritage: The Objectives Are Upside Down

Deterding, S. (2011, February 19). Meaningful Play: Getting Gamification Right.

Ford, P. (2011, January 6). The Web Is a Customer Service Medium.

Gee, J. P. (2000). Identity as an analytic lens for research in education. Review of Research in Education, 25(1), 99.

Gee, J. P. (2003). What Video Games Have to Teach Us About Learning and Literacy (New Ed.). Palgrave Macmillan.

Holley, R. (2010). Crowdsourcing: How and Why Should Libraries Do It? D-Lib Magazine, 16(3/4). doi:10.1045/march2010-holley

Hutchins, E. (1995). How a Cockpit Remembers Its Speed. Cognitive Science, 19, 265–288.

Juul, J. (2011, April 2). Gamification Backlash Roundup. The Ludologist.

Smith-Yoshimura, K. (2012). Social Metadata for Libraries, Archives, and Museums: Executive Summary. Dublin, Ohio: OCLC Research. Retrieved from

Oomen, J., & Aroyo, L. (2011). Crowdsourcing in the cultural heritage domain: Opportunities and challenges. Proceedings of the 5th International Conference on Communities and Technologies (pp. 138–149).

The Crowd and The Library

Libraries, archives and museums have a long history of participation and engagement with members of the public. In a series of blog posts I am going to connect these traditions with current discussions of crowdsourcing. Crowdsourcing is a bit of a vague term, one that comes with potentially exploitative ideas related to uncompensated or undercompensated labor. In this series I’ll try to put together a set of related concepts: human computation, the wisdom of crowds, thinking of tools and software as scaffolding, and understanding and respecting end users’ motivations. Together these concepts can help clarify what crowdsourcing can do for cultural heritage organizations while also articulating a clearly ethical approach to inviting the public to help in the collection, description, presentation, and use of the cultural record.

This series of posts started out as a talk I gave at the International Internet Preservation Consortium’s meeting earlier this month. I am sharing these ideas here in the hope of getting some feedback on this line of thinking.

The Two Problems with Crowdsourcing: Crowd and Sourcing

There are two primary problems with bringing the idea of crowdsourcing into cultural heritage organizations. Both the idea of the crowd and the notion of sourcing are terrible terms for folks working as stewards of our cultural heritage. Many of the projects that end up falling under the heading of crowdsourcing in libraries, archives, and museums have not involved massive crowds, and they have very little to do with outsourcing labor.

Most successful crowdsourcing projects are not about large anonymous masses of people. They are not about crowds. They are about inviting participation from interested and engaged members of the public. These projects can continue a longstanding tradition of volunteerism and involvement of citizens in the creation and continued development of public goods.

For example, the New York Public Library’s menu transcription project, What’s on the Menu?, invites members of the public to help transcribe the names and costs of menu items from digitized copies of menus from New York restaurants. Anyone who wants to can visit the project website and start transcribing the menus. However, in practice it is a dedicated community of foodies, New York history buffs, chefs, and otherwise self-motivated individuals who are excited about offering their time and energy to help contribute, as volunteers, to improving the public library’s resource for others to use.

Not Crowds but Engaged Enthusiast Volunteers

Far from a break with the past, this is a clear continuation of a longstanding tradition of inviting members of the public in to help refine, enhance, and support resources like this collection. In the case of the menus, it was actually volunteers who, years ago, sat at a desk in the reading room to catalog the original collection. In short, the menu transcription project is not about crowds at all; it is about using digital tools to invite members of the public to volunteer in much the same way members of the public have volunteered to help organize and add value to the collection in the past.

Not Sourcing Labor but an Invitation to Meaningful Work

The problem with the term sourcing is its association with labor. Wikipedia’s definition of crowdsourcing helps further clarify this relationship: “Crowdsourcing is a process that involves outsourcing tasks to a distributed group of people.” The key word in that definition is outsourcing. Crowdsourcing is a concept that was invented and defined in the business world, and it is important that we recast it and think through what changes when we bring it into cultural heritage. Cultural heritage institutions do not care about profit or revenue; they care about making the best use of their limited resources to act as stewards and storehouses of culture.

At this point, we need to think for a moment about what we mean by terms like work and labor. While it might be acceptable for commercial entities to coax or trick individuals into providing free labor, the ethical implications of such trickery should give cultural heritage organizations pause. It is worth stopping here to unpack some of the different meanings we ascribe to the term work. When we use the phrase “a day’s work” we are directly referring to labor, to the kind of work that one engages in as a financial transaction for pay. In contrast, when we refer to someone’s “life’s work” we mean something significantly different. The former is about acquiring the resources one needs to survive. The latter is about the activities we engage in that give our lives meaning. In cultural heritage we have clear values and missions, and we are in an opportune position to invite the public to participate. However, when we do so we should not treat them as a crowd, and we should not attempt to source labor from them. When we invite the public we should do so under a different set of terms, terms focused on providing meaningful ways for the public to interact with, explore, and understand the past.

Citizen Scientists, Archivists and the Meaning of Amateur

Some of the projects that fit under the heading of crowdsourcing have chosen very different terms to describe themselves. For example, the Galaxy Zoo project, which invites anyone interested in astronomy to help catalog a million images of galaxies, refers to its users as citizen scientists. Similarly, the United States National Archives and Records Administration’s recently launched crowdsourcing project, the Citizen Archivist Dashboard, invites citizens, not members of some anonymous crowd, to participate. The names of these projects highlight the extent to which they invite participation from members of the public who identify with the characteristics and ways of thinking of particular professional occupations. While these citizen archivists and scientists are not professionals, in the sense that they are unpaid, they connect with something a bit different than volunteerism. They are amateurs in the best possible sense of the term.

Amateurs have a long and vibrant history as contributors to the public good. Coming to English from French, the term amateur means a “lover of.” The primarily negative connotations we place on the term are a relatively recent development. In other eras, the term amateur simply meant that someone was not a professional, that is, they were not paid for these particular labors of love. Charles Darwin, Gregor Mendel, and many others who made significant contributions to the sciences did so as amateurs. As a continuation of this line of thinking, the various Zooniverse projects see the amateurs who participate as peers, in many cases listing them as co-authors of academic papers published as a result of their work. I suggest that we think of crowdsourcing not as extracting labor from a crowd, but as a way to invite the participation of amateurs (in the non-derogatory sense of the word) in the creation, development, and further refinement of public goods.

Toward a better, more nuanced, notion of Crowdsourcing

With all this said, fighting against a word is rarely a successful project, so from here on I will continue to use, and refine, a definition of crowdsourcing that I think works for the cultural heritage sector. In the remainder of this series of posts I will explain what I think of as the four key components of this ethical crowdsourcing, a crowdsourcing that invites members of the public to participate as amateurs in the production, development, and refinement of public goods. Each of these four considerations suggests a series of questions to ask of any cultural heritage crowdsourcing project. The four concepts are:

  1. Thinking in terms of Human Computation
  2. Understanding that the Wisdom of Crowds is “Why Wasn’t I Consulted”
  3. Thinking of Tools and Software as Scaffolding
  4. A Holistic Understanding of Human Motivation

Together, I believe these four concepts provide us with the descriptive language to understand what it is about the web that makes crowdsourcing such a powerful tool, not only for improving and enhancing data related to cultural heritage collections, but also for engaging deeply with the public.

In the next three posts I will talk through and define these four concepts and offer up a series of questions to ask and consider when imagining, designing, and implementing crowdsourcing projects at cultural heritage institutions.

The New Aesthetic and the Artifactual Digital Object

I’ve had a lot of fun following The New Aesthetic, and I think there are some neat parts of it that relate to digital preservation. If you haven’t seen much on this neologism, the post-colon part of the title of the recent SXSW talk is a bit more explanatory: The New Aesthetic: Seeing Like Digital Devices. For further reading I would suggest this, this, or this. In my work on digital preservation I tend to spend a good bit of time thinking about digital objects, and there are a few points of connection between the two that I wanted to spend a moment teasing out.

The part of the New Aesthetic that I am excited about is the recognition that the digital is fundamentally artifactual, not simply informational, and that the formal and forensic materiality of digital and material objects leaves traces that offer potential for aesthetic, interpretive, and evidentiary provocation. The characteristics of mediums, of processes, and of interfaces all offer this potential.

Digitization is an Act of Artifact Creation Not of Information Translation

People (quite justifiably) love to get in a huff about poor scans in Google Books. Poor scans, and poorly processed scans, lose considerable informational qualities of the books. With that said, information translation is only part of what is happening. The Art of Google Books does a great job of getting us to think about the artifactual qualities of the newly created digital objects and the process through which they were created. I find it particularly amusing that in computing we use the term “artifact,” as in compression artifact, to signal a failure to represent the thing. The New Aesthetic can be a way to think about those artifacts not as defects, but as an aesthetic in and of themselves.

Consider “Image mistaken as the finger of an employee, with attempted autocorrect.” The pixelated section of the image that is removed tells us a bit about how the algorithm sees the image. We can try to fill in the gaps with our minds and think about what it was about this particular illustration that made Google’s software think it was a finger that should be removed.

Similarly, a “Black-and-white frontispiece photographed in color and through tissue” creates fundamentally different ways of seeing the black-and-white image. The accidentally colored image looks great, and in the context of a black-and-white book it is completely unexpected. The ghostly image through the tissue paper almost looks like a kind of static problem in the scanning process, but in context we know that we are actually seeing an attempt to scan through tissue paper. What is one supposed to do to digitize tissue paper? In both cases, we are reminded that digitization is not simply copying information, and that through digitization we can see the pages of a book through different physical and computational processes.


Reading the Products of Reading Machines

Similarly, as our devices extend our ability to read the world and to layer information on top of it, the products of that layering can also be captured and reflected on.

So when I used Word Lens to attempt to translate things it shouldn’t translate, I ended up learning about how Word Lens sees and also found serendipitous reinterpretations of texts and environments. For example, when I had it translate the title Reading Machines: Toward an Algorithmic Criticism from Spanish (a language the book is not in), it became Reading Machetes. Personally, I think Reading Machetes is a perfectly serviceable alternate title for the book. The image capture of that moment captures part of the meaning of the book: we are now reading, with a machine, a text that is about reading and deforming texts with machines.

What is particularly fun about Word Lens is that it can even read non-text. For example, here is Word Lens attempting to read the mirrors above the mantel in my living room. What exactly is it that describes the grape? I’m not sure, but what I do know is that when you flip Word Lens on, turn it to the “wrong” setting and walk around the world you start to see how it sees. You start to see that you can get rows of square things to be read as text, and you start to be able to guess how it might read.

The Authenticity of Performing Music on the Game Boy

There is also something here about the recursive loop wherein particular computing devices, like Game Boys, become imbued with authenticity, a really strange kind of authenticity. For example, when people compose chiptunes with Little Sound DJ on Game Boys, they are using the actual physical device to create the sound, but at the same time they are using a program that is not from the era of the Game Boy. While you can emulate the device, we end up wanting to see video of the performance on the Game Boy to really know that it was actually played on one.

I’m just excited to see these kinds of things being herded together, and optimistic that it is part of a broader move toward thinking about and playing with the thingy-ness of things.

Explore and Share Cultural Heritage Collections with Notes for WebWise Talk

This is just a quick post to share the slides and links from the talk I am giving at WebWise today.

The talk starts by explaining the idea behind the tool: specifically, how making it easy to create interfaces to cultural heritage collections can help librarians, archivists, curators, and historians better understand relationships between objects in a collection, and how the tool can help them communicate those ideas to audiences. After explaining the kinds of interfaces you can make, I walk through a detailed example of what one of these views can do by looking at a prototype interface to the Samuel H. Kress Collection History Database created by an archivist at the National Gallery of Art.

I wanted to make sure that everyone had links to all the views I mention. So here are all the links.

NDIIPP Partners Collections Interface: (On Viewshare) (Embedded on NDIIPP’s site): This is an interface to a collection of collections. It acts as a kind of directory for digital collections, and it was created from a spreadsheet.

Fulton Street Trade Card View: (On Viewshare)
The Fulton Street Trade Card collection features 245 late 19th and early 20th century illustrated trade cards from merchants along the Fulton Street retail thoroughfare in Brooklyn, NY. Using a Viewshare pie chart view, the user can run queries and faceted searches on the cards’ metadata in ways a simple catalog or scroll would not allow. Using the facets you can limit the chart to a certain element, such as business type, and then get counts and percentages for the subjects, format, or other elements of the cards’ content.

History of Fairfax County in Postcards: (On Viewshare): A very simple view from a simple spreadsheet. If you like, you can find the spreadsheet this view is based on in the Viewshare documentation and work from it to get a sense of how the tool works.

Cason Monk-Metcalf Funeral Directors View: (On Viewshare): (My View on Viewshare): (Embedded on East Texas Digital Archives & Collections Site) This is one of the most interesting datasets uploaded to Viewshare. It is a set of data transcribed from historic funeral records.

Samuel H. Kress Collection History Database Prototype View: National Gallery of Art (On Viewshare) This view allows users to explore the relationships between purchase information for a work of art and other aspects of the object, including its current location. This data comes from the Samuel H. Kress Collection History and Conservation Database. The relational database documents the art collection’s acquisition, dispersal, and conservation over time and was created by the National Gallery of Art’s Gallery Archives with funding from the Samuel H. Kress Foundation. The data shared here is not complete. Viewshare data and views are intended only for preliminary demonstration of the data and should not be cited in research.
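The Fulton Street Trade Card view above describes faceted search: restrict the records to one facet value (say, a business type), then count how the remaining records break down on another field. Viewshare’s own implementation is not described here, so what follows is only a minimal Python sketch of that general idea. The records, the field names (`business_type`, `format`), and the `facet_counts` function are all hypothetical, invented for illustration rather than taken from the actual collection data.

```python
from collections import Counter

# Hypothetical trade-card records invented for illustration; the field
# names are assumptions, not Viewshare's actual data model.
CARDS = [
    {"title": "Card A", "business_type": "Grocer", "format": "Chromolithograph"},
    {"title": "Card B", "business_type": "Grocer", "format": "Engraving"},
    {"title": "Card C", "business_type": "Clothier", "format": "Chromolithograph"},
    {"title": "Card D", "business_type": "Grocer", "format": "Chromolithograph"},
]


def facet_counts(records, field, filters=None):
    """Count values of `field` among records matching every facet filter.

    Returns {value: (count, percentage)}, roughly what a pie chart
    limited by the selected facets would display.
    """
    filters = filters or {}
    # Keep only records whose fields equal every selected facet value.
    matching = [r for r in records
                if all(r.get(k) == v for k, v in filters.items())]
    counts = Counter(r.get(field) for r in matching)
    total = sum(counts.values())
    # Percentages are rounded to one decimal place; no matches -> {}.
    return {value: (n, round(100 * n / total, 1))
            for value, n in counts.items()}


if __name__ == "__main__":
    # Limit to grocers, then break their cards down by format.
    print(facet_counts(CARDS, "format", {"business_type": "Grocer"}))
    # → {'Chromolithograph': (2, 66.7), 'Engraving': (1, 33.3)}
```

Filtering first and counting second mirrors the way a user narrows a view with facets before reading the chart; a real tool would also need to handle multi-valued fields such as lists of subjects.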

The Value of Design Narratives: The Case of Environmental Detectives

In Please Write it Down: Design and Research in the Digital Humanities I suggested that there are some valuable ways of thinking about the connections between building/designing and creating knowledge and scholarship. In particular, I suggested that those interested in learning through building in the digital humanities might find value in work in educational research over the last decade that has tried to define exactly what a design based research methodology might look like.

This is the first post, in what I imagine might be an ongoing line of thought here, that tries to put ideas from design based research in conversation with the digital humanities. As a point of entry, I am going to walk through one emerging genre of writing in design based research, the design narrative. Before getting there, however, I would briefly pause to note that the journal the piece I discuss appeared in, Educational Technology Research and Development, is itself an interesting model for the digital humanities. I, for one, would love to see a journal in the digital humanities similarly situated as a place for sharing and disseminating R&D knowledge.

The Case of Environmental Detectives

In Environmental Detectives: The Development of an Augmented Reality Platform for Environmental Simulations Eric Klopfer and Kurt Squire offer a summative and reflective report on their work developing the augmented reality game Environmental Detectives. The paper makes some valuable suggestions for how we might better design augmented reality games, but I think its primary strength is as an example of a particularly novel and useful genre of design based research report. 

Brenda Bannan-Ritland’s article, The role of design in research: The integrative learning design framework, offers a robust framework for thinking through how the design process and the research process can fit together. See her diagram below (don’t get lost in the details). The intellectual work that the diagram and her approach do is to illustrate what happens if you mush together the steps in an array of design processes and research approaches. The diagram illustrates how the features of product development, research design, and user centered design can mesh together.

If you look at the top part of the diagram carefully you will notice that practically every step in this process has an arrow that points over to the publish results box. This is a key concept here: the idea behind design based research is not that the design process is itself a research method, but that throughout the design process a series of publishable results and lessons learned emerge that warrant being refined, shared, and communicated. Squire and Klopfer’s article is a great example of the kind of piece one would want to write as a summative result of an extended design research process.

Design Narrative as a Genre of Design Based Research Article

Design based research can generate publishable results in any particular research tradition. You can find interviews, ethnographic and micro-ethnographic approaches, case studies, randomized clinical trials, and methods from usability studies like eye tracking used at different points in the design and development process. In short, there are any number of ways to use existing research methods to reflect on and report out results of research in the process of informing design. Part of what is particularly interesting about Klopfer and Squire’s paper is that it represents a somewhat novel mode of research writing, the design narrative.

Drawing from Hoadley’s 2002 piece, Creating context: Design-based research in creating and understanding CSCL, Klopfer and Squire offer a reflective narrative account of their work designing, developing, and researching the Environmental Detectives game. Unlike other papers they published, which might report parts of this research as a case study, as pre-post test scores, or as the results of a particular evaluative test of the game’s outcomes, this summative piece reflects on the design process and offers an account of the context and lessons learned along the way. It is worth describing the actual structure of the piece.

Review of literature that informed the design: After explaining background on the idea of design narrative, Klopfer and Squire offer an account of both the extant literature on augmented reality games and the existing game projects that informed their design. This provides the conceptual context they began from; it sets the reader up to understand exactly where the project started while also providing information on what theory and knowledge looked like at the time of the project’s start.

Retrospective and Reflective Design Narrative: The bulk of the paper then reports on each phase of their design process. In their particular case they describe six phases of their research: brainstorming and designing the first instantiation, developing a first-generation prototype, classroom field trials, classroom implementations, expanding to new contexts, and a sixth phase in which they added customized dynamic events to the game. It is not necessary to go into the details of each section for this review. What matters is that each section begins by explaining how they went about their work in the given phase and reports a bit on what they learned in that phase. What is essential in this approach is that each section explains what did and didn’t work in a given phase and how exactly Klopfer and Squire decided to adjust their approach and design in response to problems.

As is generally the case with qualitative research, the moments when things don’t go according to plan, and exactly how we make sense of and work through those moments, are often the most valuable parts of the process. The value in this kind of retrospective account is two-fold. It provides a context for understanding why the game they made does what it does, but more importantly, the design narrative serves as a guide to other designers on which parts of the design process were particularly valuable. This kind of narrative helps us refine our ideas not only about this particular design situation, but more broadly about our own design practices.

Conclusions and Implications from Reflection: After reporting the design narrative the paper presents a set of technological and pedagogical implications. In much the way that the discussion section and conclusion sections of research reports function, this section attempts to suss out and distill the lessons learned from the work. In their case, they present a range of specific implications for the design of augmented reality games that emerged from their design approach.

The Value of Design Narratives

If you read through their references, you can see that they have published about this work on a few previous occasions. It is not that they are double dipping on publications; instead, those other publications report results from subsets of this project, some of the earlier findings, or points in the design process that produced interesting findings. This paper is really a summative report, retracing the design narrative of the entire project.

I see this particular design narrative approach as having two primary values, both of which I think are particularly useful to the still emerging world of the digital humanities. Composing these narratives serves an internal value to designers as part of reflective practice. Sharing these narratives makes the kind of essential tacit knowledge that comes about as part of doing design accessible to others.

Reflective Practice is Best Practice: If you can hold yourself to some sound practices for documenting the stages in your design process (the ideas that you had, how you went about implementing and revising them, and the results), you are in a good position to use that documentation to reflect on your practice. In this sense, the design narrative, the retrospective account of what you did, why you did it, and what you learned, is an essential piece of reflective design practice. When you go back and think through your own process you are not simply reporting on what you learned; you are actually making sense of your trajectory and coming to understand what it is that you learned. As in much qualitative and hermeneutic research, writing is not a process of transmitting knowledge but of discovering it. Writing a design narrative is the process by which we come to know and learn from our work.

Making Tacit Practical Design Knowledge Explicit and Available: It is essential that the knowledge developed in the design process is documented and shared. While the individual studies that come out of a design research process provide evidence of value, or of particular lessons learned in part of a design project, they leave a considerable amount of the bigger-picture knowledge off the table. Quite frankly, many of the most essential parts of design are not about demonstrating that something works. If someone wants to get into design, they need access to the deeply pragmatic, heuristic-driven knowledge that develops over time in the process of design. The design narrative is an essential medium for capturing and disseminating this kind of tacit knowledge.

In short, I would suggest that this particular piece of scholarship serves as a great example of the value of reporting design narratives and an exemplar for others to use as a model for composing their own design narratives.