Macroscopes & Distant Reading: Implications for Infrastructures to Support Computational Humanities Scholarship

The following is a rough transcript of a talk I gave at Fostering the Transatlantic Dialogue on Digital Heritage and EU Research Infrastructures: Initiatives and Solutions in the USA and in Italy, held at the Library of Congress in November 2014 (back when I worked there).

As scholars become increasingly interested in approaching digital collections and digital objects as data for computational analysis, it becomes critical for libraries, archives and museums to rethink some of their paradigms for providing access to materials. Two related concepts from emergent methodologies in the digital humanities, macroscopes and distant reading, provide a point of entry for identifying the requirements for digital library platforms to support this kind of scholarship.

Josh Greenberg of the Sloan Foundation described the concept of macroscopes this way: “Telescopes let you see far, microscopes let you see small, a macroscope lets you see big and complex.” That is, it’s about zooming out to visualize and explore relationships and patterns in aggregates and networks. Relatedly, literary scholar Franco Moretti famously coined the term “distant reading” to describe similar kinds of activities. In contrast to close reading, distant reading involves studying trends and patterns in things like graphs, maps and tree diagrams of features of texts. These two neologisms are part of a common trend, a push by scholars to make use of tools to explore and interpret patterns in wholes.

Parts and Wholes: Objects, Items, Aggregates, Collections
By and large, the web has been great for the item and the object in cultural heritage organizations. In hypermedia, every resource is the first resource; every item’s URL is potentially the front door to everything else. As far as Google’s search algorithms are concerned, the page for each of the thousand individual items in a collection is as important as the page about the collection they form part of. This non-hierarchical and rhizomatic nature of the web, and of much of digital media more broadly, has been a bit disconcerting to librarians and archivists long committed to the coherence of collections and the importance of the context of fonds.

To this end, the growing interest in macroscopes and distant reading offers a shift in approach to interpretation and analysis that could better respect the value that comes from aggregates, that is, the parts in the whole of a particular archive or collection and their relationship to each other. Importantly, this makes it all the more critical that the structure and completeness of any given archive or collection is front and center for analysis. The pattern in any distant reading of an archive is as much a map of the processes by which records were created, appraised, selected, and organized as it is a map of relationships in the content.

Three Examples for Going Forward

Data Dumps: In the emerging literature on historians’ use of digital collections for data analysis, a common theme is the desire to download data as quickly as possible and take it away for use in their own tools on their own systems. Ian Milligan, who works with web archives, has referred to this as “Looking for the big red button.” To this end, whenever possible, the best first step for systems to support this kind of scholarly use is to provide easy ways for someone to export aggregate data. With this noted, for particularly large sets of data, or data that is restricted to certain kinds of use, it’s likely a good idea to provide smaller sample sets of data.

With this said, it is important to note that data dumps are not the bulk access silver bullet that one might hope for, for three reasons: rights, scale, and the skills necessary to make use of them. In terms of rights, many collections, particularly of modern materials, come with rights restrictions that make it impossible to provide direct downloads of full content. In terms of scale, while it is possible to allow someone to download increasingly large sets of data, there are still aggregates of data that require significant resources to provide access to. Importantly, in many humanities cases this kind of analysis is still possible at scales that are modest in comparison to the requirements that scientists have for working with data sets. Lastly, there is a significant skills gap around working with “raw” data. That is, of the possible field of users of a data set in the humanities, only a rather small community has the necessary chops to work at the command line to iron out issues and process collection and object data into computable information. With that said, there are a range of ongoing projects and initiatives focused on bootstrapping humanities scholars into the competencies required to do this kind of work. To this end, there are two other primary methods for working around these three limitations that I think are promising in a variety of ways.

Sandboxes & Multi-Purpose and Purpose Built Platforms: A tool like the Bookworm, the software that powers the Google Books N-Gram viewer, illustrates the potential for two related approaches to enable scholars with limited command line chops to engage in this kind of analysis. Set up against the derived set of n-grams, a derivative data product created from the Google Books corpus which notes the frequency of sequences of words in that corpus, the viewer lets a user search for terms and compare their relative frequency in a corpus over time. In this case, by producing a derivative data set, the n-grams, they have sidestepped the rights issues that would have arisen if they had provided raw full text access to the underlying works. To this end, the n-grams can themselves be downloaded and used with other tools. Along with that, the Bookworm platform provides a way for scholars who do not have any command line expertise to make use of the data.
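To make the idea of a derivative data product a bit more concrete, here is a minimal sketch of how n-gram counts by year could be derived from full text and then queried for relative frequency. It is only an illustration under simple assumptions (whitespace tokenization, a toy two-document corpus); the function names `ngram_counts_by_year` and `relative_frequency` are hypothetical and this is not how Google or Bookworm actually build their data.

```python
from collections import Counter


def ngram_counts_by_year(documents, n=2):
    """Return {year: Counter of n-grams} for an iterable of (year, text) pairs."""
    counts = {}
    for year, text in documents:
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        tokens = text.lower().split()
        # zip over n shifted copies of the token list yields each n-gram once.
        grams = zip(*(tokens[i:] for i in range(n)))
        counts.setdefault(year, Counter()).update(" ".join(g) for g in grams)
    return counts


def relative_frequency(counts, phrase):
    """Share of each year's n-grams that match the given phrase."""
    result = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        result[year] = counter[phrase] / total if total else 0.0
    return result


# Hypothetical toy corpus standing in for a rights-restricted collection.
docs = [
    (1900, "the steam engine and the steam ship"),
    (1950, "the jet engine and the jet age"),
]
counts = ngram_counts_by_year(docs, n=2)
freq = relative_frequency(counts, "steam engine")
```

The point of the sketch is that only `counts` would need to be shared: it supports comparisons of relative frequency over time without exposing the running text of the underlying works.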

There are a range of tools and platforms that I would put in this category; for example, this is the kind of thing that the HathiTrust Research Center is working to support. With this noted, it is important to recognize the limitations of these kinds of purpose built tools. In cases where one does not provide the data product underlying the tool, there are clear limits to what scholars can do with the underlying data. Furthermore, the reason that the Google n-gram viewer works is that considerable work was put into the preparation of the underlying dataset. In contrast, many digital collections are a bit of a mess, so a researcher who wants to do sophisticated computational work with them would likely need to engage in this kind of data cleanup and processing to get materials into a form fit for analysis.

Analysis as a Service and Onsite Research Facilities: Something like the National Software Reference Library, a project of the US National Institute of Standards and Technology, models a third example of supporting this kind of computational work. The NSRL provides an onsite research environment where researchers can come in to engage in computational analysis of the tens of millions of files from commercial software in the collection. Staff in this research environment can also run algorithms created by remote researchers and provide them with the outputs and results. In this case, for a collection of materials at an organization with particularly high concerns about limiting access to the corpus, creating an onsite research space and setting up staff to run the jobs that researchers around the world create provides a solution that ensures that rights are protected while computational scholarship is enabled. The significant limitations here are the resources required to stand up and staff such a research center and the fact that the process is much less immediate than manipulating a platform or interface on the web or directly downloading data.

Unpacking Implications

  1. Whenever possible, move toward providing bulk access to data. That means, ideally, exploring ways to offer downloads of arbitrary aggregates of both metadata and digital objects. Given that some of these aggregates could be massive in size, it is likely best to explore ways to queue large requests up and use things like BitTorrent as a way to limit the resources they would consume. Provide persistent identifiers for those aggregates to enable dataset citation.
  2. Consider deriving intermediary or transformative data products, like n-grams, in cases where one cannot provide access directly to works, and explore ways to create purpose built tools, like the Google n-gram viewer, that can be deployed to enable exploratory analysis of intermediary products.
  3. In cases with particularly thorny rights situations, consider establishing in-house services whereby researchers can give you their algorithms and you run them against a corpus and provide the outputs back to them.

Becoming Digital Public Historians

The second week of my Digital Public History seminar at the University of Maryland is called “Becoming Digital Public Historians.” I think that kind of identity work is at the core of what graduate education is supposed to be about, and I feel like the eight students in my seminar have made great strides toward being able to take on and inhabit that way of being and seeing in the world.

To that end, I thought I would take a few minutes to share and celebrate what they were able to accomplish over the course of a semester as they synthesized a massive amount of reading and exploration of digital projects into the development of their own digital projects.

Given the constraints of creating something from scratch in a single semester while taking a range of other courses, and often also working part-time or full-time jobs, I am really amazed at what each of my students was able to do.

Life on the Line: A Historypin Tour of Little Rock’s West Ninth Street. Created by Julie McVey.

Julie created a historical tour and online collection that documents “the line,” a black business district in Little Rock, Arkansas. In her original proposal, Julie had already identified partners: the Mosaic Templars Cultural Center, a museum of African American history, and the Department of Arkansas Heritage. She had also identified Historypin as a platform that would suit her needs to create a place based way of engaging with the past to “ideally increase awareness of the history of West Ninth Street by making photographs, the historical narrative of the area, and community memories available to a wider audience by utilizing a free web site and mobile app.” I was impressed from the beginning by her intention to develop a digital history project intended to “help the community see the past more clearly to re-envision the possibilities of the future.”

The Archives & Research Center of the Historical Society of Frederick County Blog & Catalog by Marian Currens

Marian, who works as the Archives and Research Center Coordinator for the Historical Society of Frederick County, began by planning to create an online Omeka collection of a particular set of items. In the process, given that the historical society had developed a set of finding aids for collections that weren’t particularly easy to find, and that she had been following along with a range of work on “catabloging,” she decided to create a site to publish the finding aids and to start up a blog for the historical society. Hosted on WordPress, the result is an easy-to-use site that can serve as a web presence for collections at the Historical Society and for sharing “blog posts about our collections, new accessions, and projects going on in the Research Center.”

Stereomap: Mapping Stereograms from NYC by Joe Carrano

Joe created Stereomap, a site which presents animated stereographs in their historical and geographical context through maps. His idea was that “many stereographs document specific places that are best experienced in their geographic location and in juxtaposition to what is built there today.” The site looks great, and it nicely builds on the work of the Stereogranimator and the stereogram collections from New York Public Library.

Ending the American Civil War in 1865: A Podcast by Andrew Neal Barker

In an attempt to create an engaging digital public history resource, Andrew created a podcast and website which tracked events as they unfolded over the month of April at the end of the Civil War. The project is marked by some very solid production values. Andrew shared with the class how he created the recordings in Audacity and how he used SoundCloud to host them. A considerable part of the work around this project focused on getting the word out. He created a mailing list and did a lot of work to push each new episode out through a range of social media channels.

Vicky Rex: Or, Queen Victoria’s Vlog by Catherine Bloom

Catherine created Vicky Rex, a vlog series grounded in the diaries and letters of Queen Victoria. The series was inspired by some of the points Michael Edson makes about the Vlog Brothers and by other historical reenactment style work from previous semesters.

I think the series is a rather amazing project. It is neat to see creative ways to bring historical stories to life and attempt to make them relatable.

The Middle Sex County Oral History Project by Jamie Mears

Jamie conducted a series of oral histories, transcribed them and published the interviews as an Omeka collection. Along with that, she developed guidance for others from the county to conduct and submit their own interviews, and some guidance for educators on how to use the histories.

Altogether it’s a great looking site. It makes good use of the features of Omeka and, given how much work conducting and transcribing oral histories is, it is really a feat to pull something like this off.

The Archive of Immigrant Voices by Caitlin Haynes

Caitlin created an Omeka online collection for the Archive of Immigrant Voices, a project of the Center for the History of the New America established to collect stories of the experience of migration. The site includes an initial set of oral histories, which she described and provided transcripts for, as well as information on the process of collecting oral histories and a series of ideas for how K-12 educators can make use of the collection. It’s a great looking online collection and a nice start to a project which will likely be sustained by the Center for the History of the New America beyond the life of the course.

Submarine Capitol of the World by Stephanie Harry

Inspired by some of the functionality of sites like PhilaPlace, Stephanie decided to prototype a version of a site with an interactive map that would explore the history of Groton, Connecticut, the “submarine capitol of the world.” One of her particular interests was to trace out a history that was very much tied to the Thames River and explore how it played out in relation to the surrounding communities.

Going Forward: 

I’m thrilled to have had the opportunity to teach digital history again. It took me a bit of time to shift gears from teaching a digital history course to public history students to teaching a digital public history course to iSchool students. With that said, the experience made me realize how relevant I think digital public history is to the future of libraries and archives. 

It was a delight to have a course of eight graduate students to think through all of this with. I can’t wait to see what all of them end up doing with their newfound digital public history chops.









The Invention & Dissemination of the Transparent GIF: Traces in Web Archives

Tiny transparent image files have played a significant role in the history of the Web. Digital folklorist Olia Lialina has done some great work exploring the presence of spacer GIFs in the Geocities web archive and on how those GIFs persisted, in some cases, beyond the deletion of Geocities. These invisible files have a story to tell, and I think exploring their presence and traces in web archives can end up illustrating some ideas for modes of researching in web archives.

Exhibiting the Invisible 

Here is a picture of an exhibition Olia mounted of various historically significant transparent GIFs.

[Image: screenshot of Olia’s online exhibition of transparent GIFs]

They are all hot linked from their original URLs. As a result, the broken image symbol is the thing that alerts us to the presence of the ones that are still alive and out there.

Here is how Olia explained their role in an interview:

I remember, everybody who made pages in the 90s had cgif, maybe it was called clear gif, some people would call it zero-dot-gif, but it was this transparent one that would help you to make layouts, and now we can say that this, we found, we can maybe try to build now something out of this invisible gif, just implement it in our work, whatever it is, and make it in our own pages, this to prolong the life of Geocities.

In an effort to get outside screen essentialism I’ve been a bit smitten with the idea of looking at things like cryptographic hashes to show how two things that look the same are, at a lower level, not the same. So I did a bit of experimenting with

Was there an Original Transparent GIF?

So I generated SHA-1 hashes for all of the .gifs that Olia shared in her online exhibition to see the extent to which some of them are actually the same original file with different names. (It is also possible that they were generated through exactly the same process, but I imagine that is unlikely.) I’ve got a picture of it below, but here is a link to the spreadsheet.
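For anyone who wants to try the same kind of check, here is a minimal sketch of grouping files by SHA-1 hash using Python’s standard `hashlib` module. The file names and byte strings below are stand-ins for illustration, not the actual GIFs from the exhibition, and `group_by_hash` is a hypothetical helper name.

```python
import hashlib
from collections import defaultdict


def sha1_of_bytes(data):
    """Return the hexadecimal SHA-1 digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


def group_by_hash(files):
    """Group file names by digest. files: mapping of name -> bytes."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[sha1_of_bytes(data)].append(name)
    return dict(groups)


# Stand-in contents (assumptions for the example, not real GIF data).
files = {
    "clear.gif": b"GIF89a-example-one",
    "cleardot.gif": b"GIF89a-example-one",
    "spacer.gif": b"GIF89a-example-two",
}
groups = group_by_hash(files)

# Names that share a digest are byte-for-byte identical files.
duplicates = [sorted(names) for names in groups.values() if len(names) > 1]
```

With real files one would read each file in binary mode and pass its bytes in; files that land in the same group have identical contents regardless of what they are named.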

[Image: screenshot of the spreadsheet of SHA-1 hashes]

I’ve colored in the hashes that are the same as each other. The result shows that there aren’t really 10 different files here; instead, three of these GIFs are identical. Interestingly, we can also see which of these has been present and associated with the provided URLs the longest in the Internet Archive. So, the two Geocities files there predate their crawl of Google’s first clear.gif.

What was the break out transparent GIF?

Andy Jackson from the British Library was generous enough to take a look for these SHA-1 values in the UK Web Archive. He published the data and the scripts to visualize it online. One of them, cleardot.gif, appears in the UK Web Archive over a million times!

It’s likely that we are seeing a lot of the underlying crawl dynamics here as much as we are seeing trends in the history of the GIFs. With that said, we can see the scale at which these GIFs appear around the UK web.

From the data we learn of three extant examples of GIFs in the archive dating from 1996. These include 2 instances of blank.gif, 3 instances of pixel.gif and 46 instances of spaceball.gif. So, at least in terms of what the UK Web Archive collected, spaceball.gif was the early breakout hit.
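As a rough illustration of how counts like these can be produced, here is a small sketch that tallies captures per file per year from a list of capture records. The record layout (a 14-digit timestamp string paired with a digest), the placeholder digests, and the name `instances_per_year` are all assumptions made for the example; this is not Andy’s actual script or the archive’s real index format.

```python
from collections import Counter

# Hypothetical lookup from digest to a familiar file name.
# The digests are short placeholders, not real SHA-1 values.
NAMES_BY_HASH = {
    "aaa111": "spaceball.gif",
    "bbb222": "pixel.gif",
}


def instances_per_year(records, names_by_hash):
    """Count captures per (file name, year).

    records: iterable of (timestamp, digest) pairs, where timestamp is a
    string whose first four characters are the year (an assumed format).
    """
    counts = Counter()
    for timestamp, digest in records:
        name = names_by_hash.get(digest)
        if name is not None:  # skip captures of files we are not tracking
            counts[(name, int(timestamp[:4]))] += 1
    return counts


# Toy capture records made up for the example.
records = [
    ("19961105123000", "aaa111"),
    ("19961201080000", "aaa111"),
    ("19970314090000", "bbb222"),
    ("19970401000000", "ccc333"),  # unknown digest, ignored
]
counts = instances_per_year(records, NAMES_BY_HASH)
```

A tally like this only reflects what a crawler captured, so a year with zero instances may say more about how the crawl was scoped than about the web itself.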

Transparent GIF trends

The trends over time (pictured below) are interesting in their own right. I’m curious what folks think we learn from this?

Look at the scale on cleardot.gif. In 2008 more than a million instances of that GIF appear in the UK Web Archive. At the same time, why do most of the other transparent GIFs all but disappear in 2008? What do we make of the resurgence of blank.gif in 2010? What is up with the massive spike of pixel.gif in 2004?

Interestingly, each one of these exists in the UK Web Archive by 1997, which makes sense based on where they come from in Olia’s research. That raises the question of when and how they made their way across the web. They were all there, for the most part, at the beginning, so what circumstances led one to appear so often and the others not? Why do they all but disappear from the archive in 1999 and 2000?

If we were to zero in on that early year, we could well pinpoint the URL at which each of these images first appeared in the archive and the day they first appeared.


The Shape of the Trend, The Shape of History and the Scope of the Crawl

What is it that we are actually mapping out here? That is, where are we seeing the history and spread of these files and where are we seeing the history of decisions about what is crawled and how those crawls are scoped? To that end, the note Andy sent me after doing the check is important context. He suspects that some of the dramatic drop offs in the charts may be the result of scoping decisions to exclude them from the crawls at different points in time.

It is entirely possible that the lack of any of these for the two-year period (1999 and 2000) reflects decisions to exclude these files from crawls. Along with that, Andy brought up another interesting point: likely the reason many of these GIFs show up so often is that, aside from being used for spacing and layout, they were also used to track hits as “web bugs.” Which opens an interesting question: does the massive increase in the presence of these files around 2001 illustrate the beginning of that trend? To that end, here is an article from July of 2000 on the phenomenon.

Counting Things You See Straight Through

I don’t know what this all means, but I think it opens up some interesting questions and comes with some implications. Who would have thought a bunch of tiny files that you see straight through would have so much to teach us? Here is my first run at some implications.

  • Lots of Potential in Exploring Web Archives by Hash Value: We learn a lot when we see the traces of these images in the archives over time. I think it would be neat to see other explorations of hashes as a way to study web archives, and I also think it might be interesting to see this become something that web archives consider as a way of providing access to their collections. When we know that two URLs held identical files at a particular date, we could start to track and trace the replication and movement of information. Importantly, this is all derivative information about the content. So even in a situation where you can’t offer global access to the content itself, you could very well provide the hashes for this kind of work.
  • Essential Need to Document Crawling Practices & for Looking for Traces of Crawling Practices: Andy’s point about changes in scoping the crawls dramatically changes the way that one interprets the data. With that said, the drop off to 0 across the board in 1999 and 2000 is likely also a good thing to file away for web archive researchers. Something that dramatic should prompt one to consider whether some factor in the collecting process is coming through. This both underscores the value of creating scope and content notes, keeping logs, and maintaining all kinds of other records of crawling practice, and gestures toward the need to develop methods and techniques for interpreting web archives that respect the nature of web archiving practice.
  • The Value of Multiple Archives: Given that crawling practices are going to be different in different archives there is a ton of value to having a lot of different archives. If we looked at trends for these files in other web archives we would start to see the trends that cross different approaches to crawling and get closer to understanding what parts of what we are seeing are part of the crawler and the collection and what are part of the web as it was.
  • The Value of Records of Multiple Copies: The trends in the appearance of these files opens up all kinds of questions. Think about similar approaches to all kinds of other files. That is, trends in identical copies of files are themselves telling about the movement, dissemination, and popularity of practices and approaches. So there is informational content in the files, but the history of the appearance of a given file in a given place also comes with a lot of potential informational value.
  • Hashes are Still Just One Way of Characterizing: While hashes are exciting, it’s important to remember that there are many other ways to characterize similarity. It might be interesting to just look for .gif files that are really tiny to see how many other SHA values we can identify for other transparent GIFs. When one moves further into hash based approaches to studying files it’s going to be important to remember that minor changes in a file are going to give it a totally new hash. So it will still be interesting to continue exploring ways of suggesting that two images are likely versions of the same thing through different methods of characterizing the files and then exploring how to store that information to aid in research and discovery.
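To illustrate that last point, here is a short sketch showing that changing a single byte gives a file a completely different SHA-1 hash, alongside a cruder characterization: filtering for very small files that start with a GIF signature. The 100-byte threshold, the stand-in byte strings, and the name `tiny_gif_candidates` are assumptions for the example, not an established method.

```python
import hashlib

# Stand-in data: a GIF signature followed by zero bytes. These are not
# valid images, just byte strings that are convenient to compare.
original = b"GIF89a" + bytes(37)
altered = original[:-1] + b"\x01"  # same length, last byte changed

h1 = hashlib.sha1(original).hexdigest()
h2 = hashlib.sha1(altered).hexdigest()


def tiny_gif_candidates(files, max_bytes=100):
    """Return sorted names of files that start with a GIF signature
    and are no larger than max_bytes (an arbitrary, assumed cutoff)."""
    signatures = (b"GIF87a", b"GIF89a")
    return sorted(
        name
        for name, data in files.items()
        if data[:6] in signatures and len(data) <= max_bytes
    )


files = {
    "a.gif": original,
    "b.gif": altered,
    "photo.gif": b"GIF89a" + bytes(5000),
    "note.txt": b"hello",
}
candidates = tiny_gif_candidates(files)
```

Here `h1` and `h2` differ even though the two byte strings are nearly identical, while the size-and-signature filter picks up both `a.gif` and `b.gif`. A looser characterization like this can surface candidates whose hashes could then be looked up in an archive.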

What do you think?

I would love to hear from other folks about what you think these trends suggest. Also, if folks want to do their own explorations of some of this stuff please share results and thoughts on that back here in the comments too.


Review: Preserving Complex Digital Objects

The following is a pre-print of a book review of Preserving Complex Digital Objects which I was invited to write for the Journal of Academic Librarianship. I wrote it in December of 2014. In keeping with everything on this site, it is not written in any kind of official capacity or role.

Preserving Complex Digital Objects, by Janet Delve and David Anderson, editors. Greenwich, CT: Facet Publishing, 2014.

Ensuring long term access and usability of complex digital objects is of critical importance to the future of nearly every area of arts, culture, the humanities and the sciences. With that noted, to date there is a surprisingly small amount of basic and applied research and scholarship that explicitly engages with issues in this area. To this end, the 25 essays in Preserving Complex Digital Objects are invaluable as documentation and presentation of work on this topic. With origins in a 2010 JISC funded workshop and further work funded by the European Commission, the book is anchored in the UK and European context but includes a series of essays about several related projects from across the globe.

In the realm of complex digital objects, the book is particularly focused on three kinds of born digital content: simulations and visualizations, software art, and gaming environments/virtual worlds. It includes essays by content creators on the significance of their objects, essays from cultural institutions on issues in archiving these materials, discussion of tools and practices, a series of case studies and two essays on some of the significant legal issues.

From my perspective, the strongest and most valuable essays in the volume come from the section focused on practices and tools for software preservation. Many of the other essays, while interesting in their own right, read mostly as reporting out on work that was done instead of developing frameworks and material that is useful for someone actually focused on preserving a given piece of software. It is worth underscoring that “complex digital objects” is in some ways synonymous with software. Of particular interest is “Digital preservation and curation: the danger of overlooking software” by Neil Chue Hong of the Software Sustainability Institute, which succinctly explains seven approaches to software preservation (preserving original hardware, emulation, migration, cultivation, hibernation, deprecation and procrastination) and compares and contrasts the relative strengths and weaknesses of each approach for different contexts. Similarly, Brian Matthews, Arif Shaon and Esther Conway’s chapter “How do I know that I have preserved software?” expands on those categories and offers some valuable initial discussion of how the community should go about assessing whether what has been preserved in a given context is going to be good enough for a given set of future uses. These essays are both valuable building blocks for what should be a whole field of software preservation scholarship.

It should be noted that work in software preservation has recently picked up and that a range of recent and more U.S. focused projects in this area are not significantly discussed or considered. This is not so much a criticism, since it isn’t really necessary for a book to cover the entire state of the field, as a note to potential readers that there is a good bit of related work that isn’t represented here. Several significant U.S. based software preservation projects are notably absent: the National Software Reference Library run by the US National Institute of Standards and Technology, the open source JSMESS emulation platform which was recently implemented broadly by the Internet Archive, and the Olive Executable Archive platform under development at Carnegie Mellon University. Similarly, recent acquisitions of software based art at MoMA and of mobile application source code at the Cooper Hewitt Design Museum are helping to move the state of the practice forward.

There remains a critical need for work on the preservation of software and other complex digital objects. To that end, this book is invaluable. With that noted, given the report-like nature of this book, I think its audience is really the relatively small community of practice and research that is forming around the preservation of software and other complex objects. The book provides considerable insight into ongoing work in the UK and Europe more broadly. I hope that this is the first in an entire library of books to engage with these issues.

To Show For It: Links, Lists & Paper from Four Years at LC

I had originally intended this to be a post for The Signal, but it ended up having more of a personal bent to it so it made more sense for it to go here. Like everything on this site, nothing here reflects any official anything of any org or institution. These are just my own personal thoughts/reflections.

For the last four years I had the distinct pleasure of working as a digital archivist with NDIIPP at The Library of Congress. Apparently, I’ve been up to a lot here too. A search for mentions of me in The Library of Congress search box since 2010 currently turns up 187 “available online results.” After a quick skim of the results, I think that is mostly things this Trevor Owens was involved in. I’ve learned so much about this field and this work and a big part of that has been writing up my ideas and perspectives and conducting interviews for this blog.

As you might imagine, there is a form for leaving the LC.

I recently started a new position as the Senior Library Program Officer for the Institute for Museum and Library Services (IMLS) tasked with steering the national digital platform portfolio for libraries. I don’t know about you, but to me that sounds like too much fun and too great of a chance to make a national impact on the field to pass up!

Four years of name badges from conferences, summits & workshops I participated in and/or planned. These little pieces of paper & plastic that accrue in your desk pile into something that reflects back who you are and what your work has been.

As part of my work at NDIIPP, I’d been eagerly following developments around the emerging vision to support a national digital platform for libraries (PDF). A lot of incredibly smart and talented folks have been working on this and I am thrilled to be able to play a part in this effort. But the cool new thing isn’t what this post is about. This one is about looking back at what accrued over the last four years. The things I made, wrote, and did and what I find useful from that time and place and what I think is hopefully useful to others.

It has been a privilege and an honor to be able to work at The Library of Congress. The staff and the collections are both treasures from which I have learned so much. People at the institution, and throughout the national and international community of folks working on digital library issues, continue to be generous in sharing their time and ideas. I count myself very lucky to be a part of this field. With that noted, what do I have to show for my time?

Some Lists of Posts and Projects

"New business cards" Uploaded on November 15, 2010

Given that we just finished out one year and started another, a time of top lists, and that leaving a job naturally pushes one to glance backward on their work, I thought it would make sense to share some of the interviews, posts and projects I’ve worked on in my time at LC. These are the things that keep popping up in my mind over time. So, here are some lists of work I did on The Signal and more broadly at LC that you can access over the web.

10 of the Interviews I Revisit

  1. Open Source Software and Digital Preservation: An Interview with Bram van der Werf of the Open Planets Foundation April 4, 2012
  2. Digital Strategy Catches up With the Present: An Interview with Smithsonian’s Michael Edson August 9, 2012
  3. Life-Saving: The National Software Reference Library May 4, 2012
  4. We’re All Digital Archivists Now: An Interview with Sibyl Schaefer September 24, 2014
  5. Historicizing the Digital for Digital Preservation Education: An Interview with Alison Langmead and Brian Beaton May 6, 2013
  6. The Metadata Games Crowdsourcing Toolset for Libraries & Archives: An Interview with Mary Flanagan April 3, 2013
  7. The PDF’s Place in a History of Paper Knowledge: An Interview with Lisa Gitelman June 16, 2014
  8. Archivematica and the Open Source Mindset for Digital Preservation Systems October 16, 2012
  9. Exhibiting .gifs: An Interview with curator Jason Eppink June 2, 2014
  10. Collecting and Preserving Digital Art: Interview with Richard Rinehart and Jon Ippolito November 26, 2014
When I found Ed Summers’s name plate after he left but before I did. Taken on October 3, 2014.

5 Posts I Wrote That I Revisit

  1. The is of the Digital Object and the is of the Artifact October 25, 2012
  2. Interface, Exhibition & Artwork: Geocities, Deleted City and the Future of Interfaces to Digital Collections January 28, 2014
  3. What Do you Mean by Archive? Genres of Usage for Digital Preservers February 27, 2014
  4. All Digital Objects are Born Digital Objects May 15, 2012
  5. Glitching Files for Understanding: Avoiding Screen Essentialism in Three Easy Steps November 5, 2012

5 Other Things I Loved Working On

  1. The NDSA Levels of Digital Preservation (PDF): I feel like this project illustrates the potential of an organization like the NDSA to make an impact on the practice of digital preservation. I love that I had a part in getting it off the ground and shaping it.
  2. Preserving.exe Report/Meeting: I think this meeting and report on collecting, preserving and providing access to software turned out really well.
  3. Working on the Digital Culture Web Archive with the American Folklife Center and the #FolklifeHalloween2014 Photo Project.
  4. CURATECamps: Shortly after I started I pitched the idea that we should host some unconferences and I think the CURATECamp Processing, CURATECamp Exhibition, and CURATECamp Digital Culture have all been great events.
  5. Finding Our Place in the Cosmos: They let me spend 60% of my time for a year doing research and writing about Martians, the history of models of the cosmos, and Carl Sagan.

Advance Praise for Designing Online Communities

The cover for my book, which I’m rather happy with. I like that it looks like it could well be the cover of one of the books I focused my analysis on 🙂

My book proofs have been finalized and it now has a cover!

Along with that, Amazon seems to think it will be out in March. So in advance of that, I thought I would share the “advance praise” quotes I collected for the publisher here.

“Can media archaeology have a methodology? Does software studies need data sets? In Designing Online Communities, Trevor Owens presents a bracing case study that not only contributes to our understanding of lives lived online, but also joins the empirical rigor of applied social science with leading-edge digital and media studies.” Matthew Kirschenbaum, University of Maryland

“Designing Online Communities is a must-have for anyone designing or researching online communities, particularly for learning. Owens’ work is both comprehensive and eminently readable, a sweeping look at the technologies, design patterns, and cultural forms they produce that is both theoretically ambitious and grounded in examples and tools that will help you develop, research, and manage online communities.” Kurt Squire, University of Wisconsin

“At a time when online communities are ubiquitous, and in some cases larger than most countries, it is critical that we understand how they are composed—technologically, psychologically, and sociologically. Trevor Owens shrewdly looks back to early bulletin boards and web forums to grasp the nature of these modern communities, how they arose, how they dealt with bad behavior and the inevitable disagreements between members, and how all of this was represented in rhetoric and code. This book provides essential context for our shared online existence.” Dan Cohen, Digital Public Library of America

“Part enabler, part denier, full-on technological mediation, web forums offer a fascinating entry point into the interplay of software and social interaction. In Designing Online Communities, Owens deftly mixes actor-network theory, discourse analysis, and other approaches, writing with clear language and insight to expose the ideologies inherent in seemingly pedestrian historical artifacts — how-to books for web forum administrators. His engaging analysis gives clarity to how the design strategies implicit in code influence the ways we build conversations, relationships, and communities on the web.” Jefferson Bailey, Internet Archive

“An important read for educators interested in using and building online communities. Trevor Owens asks us to consider how technologies reflect and shape permissions and control, and how the managers and builders of online communities wield power beyond simply an offer of ‘connectivity.’” Audrey Watters, Hack Education


Mobile, Bots, Sound Studies & Video Games: Things In My New Digital Public History Grad Seminar

Huge thanks to everyone who weighed in on what I should add to my Digital Public History Graduate Seminar. I thought folks here might be interested in seeing how that all turned out. So, you can check out the course blog/website and you can read the syllabus embedded here below. I figured I would also share the topics of the weekly schedule to give a quick sense of the sorts of things we are going to be getting into. There is of course always room for improvement, but we have reached the time when the semester is going to start so I think aside from fixing typos and such this is going to be the course 🙂

The course blog is going to be a public thing. Think of it as something like a semester-long, student-run Review of Digital History. So if you are interested, you should subscribe to the feed and join in that conversation.

Weekly Topics

  • Becoming digital public historians
  • Defining digital history & public history
  • The Web: Participatory? Collaborative? Exploitive?
  • Distant reading, text analysis, visualization as scholarly communication
  • Designing digital projects
  • “MTV Cops” proposal pitch week
  • Digital media, materiality and formats
  • Spring Break
  • What are digital archives and what do they have to do with the public?
  • Digital exhibition, hypermedia narrative & bots
  • Digital audio, oral history and sound studies
  • Mobile media, place & mapping in public history
  • Playing the Past: Videogames, Interactivity & Action
  • Class Conference Week

Digital Public History Syllabus UMD 2015

Digital Library Infrastructures, Cosmos Exhibition, Digital Folklore & Dissertation -> Book: 2014 in Review

Another year. Another chance to push the pause button for a bit and try and make some sense of what it is I’ve been doing. As I did in 2012 and 2013, I am taking a few minutes to try to sift and categorize. So if you are interested in a recap of things I’ve done this year this post is for you, if not, I imagine you have already decided to stop reading.

Digital Infrastructure for Libraries and Archives

An image of me gesturing into space with a cartoon mecha archivist from the Radcliffe alumni magazine.

The bulk of my work this year falls under the broad category of exploring/improving digital infrastructure for cultural heritage institutions. I published 13 posts on this blog, including pieces on the leadership roles that digital archivists should play, how research questions work in the digital humanities, and knowledge infrastructures in digital humanities centers. If you search for my name in the Library of Congress and restrict it to the year 2014 you find there are 74 things I’m associated with from the year. That includes a mixture of blog posts, reports, and in a few cases people mentioning me in videos of talks from the Digital Preservation conference I served on the planning committee for. Below I’ve tried to break up some of the things I worked on this year into a few different areas.

Cosmos Online Collection Launched

An example item from the cosmos collection.

January saw the launch of Finding Our Place in The Cosmos, an online collection/exhibition that I spent the previous year curating and project managing.

As part of the launch of the collection I did a lot of writing about it for various communications channels at The Library of Congress. I interviewed astronomer David Grinspoon about his connections and relationship with his mentor Carl Sagan, and I wrote about some of Sagan’s course materials for the Library of Congress science blog. I wrote about notions of technology and progress evident in primary sources for science teachers for the Library of Congress teachers blog. I wrote about Carl Sagan’s childhood writings on science and poetry for the Poet Laureate’s blog.

Along with writing on it for more general audiences, I also put together two reflective pieces about the process of working on the exhibition: a draft style guide for digital collection hypertexts, and a piece that worked through how I used Pinterest to open up the process of identifying and selecting items for the exhibition. I loved working on this project, the opportunity to explore the collections at LC, to dig deep into Sagan’s papers, to think through the best way to assemble the technology to tell the story and the chance to work with so many smart people from across the institution.

Becoming Dr. Owens and From Dissertation to Book 

A hat, a hood, and a large document to commemorate the end of 23 years of school.

In February, I successfully defended my dissertation. I checked the box to provide my dissertation directly from George Mason University’s digital repository, and that in no way held me back from landing a book contract. So ends my continuous 23 years of schooling.

In spring of 2015 a revised version of my dissertation study is on track to be published as a book in the New Literacies and Digital Epistemologies series which Colin Lankshear and Michele Knobel edit.

I’m thrilled. I’ve been following Colin and Michele’s work on new literacies for nearly ten years, and books in the series like Rebecca Black’s Adolescents and Online Fan Fiction played a significant role in informing the study. My dissertation research on the history, structure and ideology of software platforms enabling online communities was informed by this body of scholarship and I am excited to see that it will end up as part of this list.

After receiving some feedback on the state of the dissertation itself, I took six months of weekends and evenings to revise and further transition it into more of a book form. After receiving approval on the text by the series editors I recently reviewed a copy edited version of it and will likely be looking at proofs in the next few months. So it sounds like everything is on track for the book to come out in the middle of 2015.

Born Digital Folklore and Vernaculars

A screen shot of an archived copy of Memegenerator in the Library of Congress web archives.

One of the best parts of working in NDIIPP has been the opportunity to connect with the various custodial divisions of The Library of Congress to work through their issues in particular born digital content domains. This year it was my pleasure to collaborate with some of my colleagues in the American Folklife Center to explore the role that cultural heritage organizations can play in collecting and providing access to records of digital vernaculars and folk cultures.

I was able to plan, host and co-unchair Curate Camp Digital Culture with folklorist Trevor Blank and Tumblr’s Meme Librarian Amanda Brennan. The camp was both a ton of fun and enlightening. It was a distinct pleasure to help guide NDIIPP’s junior fellow, Julia Fernandez, through her fantastic work conducting a series of interviews I had started off a few years back exploring these issues. All told, the 13 interviews we did on this topic are functionally a brief 35,000 word book on the subject. In them you can find serious discussion and exploration of everything from Bronies, to deviantArt, to an exhibit of animated GIFs, Yelp reviews of restaurants and LOLCats.

Informed by the unconference and the interviews, I was able to help the American Folklife Center develop some new collections. Of the many ideas we explored, two seem to have really had legs so far. First, I’ve been able to make some significant contributions to scoping the Library of Congress Digital Culture Web Archive. So whatever else I do in my career, I’ve been instrumental in making sure that a bunch of reaction gifs, meme images, creepypastas, and sites that document the meaning of various emoji and Facebook symbols persist as part of the Library of Congress web archives. I’ve even helped contribute to the Library of Congress’ illustrious collection of bibles with the acquisition of The Lolcat Bible.

An Award Winning Year

Aside from all of this, I won two awards this year! The Society of American Archivists proclaimed me the Archival Innovator for 2014 and I won the C. Herbert Finch Award for an Online Publication for my work curating/managing the Finding Our Place in the Cosmos Online Collection.

Looking back on 2014 I feel incredibly lucky to have had the opportunities I’ve had. I’ve been able to work with some amazing people, on a range of projects that I think are the right fit for my talents and interests.

To that end, I’m looking forward to the new adventures that 2015 has in store.

Curating in the Open: Martians, Old News, and the Value of Sharing as you go

Speculation about the “vast thinking vegetable” on Mars from The Salt Lake Tribune

This is ultimately a story about how doing research for an online exhibition ended up sparking articles on Boing Boing, io9, and The Atlantic which explored a theme from the exhibit eight months before the exhibit would launch. I think the story has some lessons for thinking about the future of digital collections and exhibitions.

Finding our place in the cosmos

I spent 60% of my time at work in 2013 curating an online exhibition/collection/hypertext contextualizing the Carl Sagan papers in the history of astronomy and life on other worlds as evident in objects from across the Library of Congress collections. I’ve written before about what I think that project has to say about how to compose such online things, but I haven’t shared much about how I went about identifying and selecting materials for it.

Through the process of working on the collection, I think I stumbled into something that has considerable potential to impact the way we should go about doing the work of creating such thematic narrative explorations of content in digital collections of libraries, archives and museums.


A big part of the interesting story about the idea of life on other worlds is that, for a good while, it was completely reasonable, if not expected, to think that there would be intelligent life on the other planets in our solar system. One great episode in this story is the history of the Martian canals. Knowing how big of a topic this would be for the popular press, I realized I could just turn to Chronicling America, the website for a partnership between the NEH, LC and a network of libraries and archives from around the country to provide access to millions of digitized newspapers. I knew there would be a good bit of material here, and I was thrilled to find that a search for “martians” in the millions of digitized newspaper pages from 1836 to 1922 turned up a trove of pieces to explore. So I noted the pieces in this search that were particularly relevant for the collection. Instead of keeping these in a document on what my institution lovingly calls a “workstation,” I went ahead and just used Pinterest to keep track of them.

Work in Progress on Pinterest Progresses the Work

So I made Pinterest boards for each of the thematic sections of the collection I was working on. Below is an image of the Pinterest board I created on free and publicly available materials from across LC’s digital collections related to ideas of life on Mars. I liked using Pinterest for this as it created a visual way for me to track and organize these things. A big part of the project was to find what I could do with already publicly available digitized content, so it seemed like it would be fine to track these public materials using a personal Pinterest account. It had an interesting side benefit too.


I started using Pinterest for this purpose because it was easy, but it being public had an interesting secondary effect. As you can see from the image below, the board I started on Mapping Mars & Life on Mars ended up with 191 followers. It’s not a part of any official anything, but it turned out that many of the historians of science and history-of-science-curious folks who follow me on Twitter were interested enough to review and share some of the raw material I was pulling together on Pinterest. I needed to do this kind of aggregation for my own work for the essays and online collection, so it made sense to keep that up and out there for others to benefit from.


What the Vast Thinking Vegetable of Mars Taught Me

Which brings me to the vast thinking vegetable that lives on Mars. One of the newspaper pages I found ended up showing up in my feed reader on Ptak Science Books.


If you don’t read John Ptak’s blog and you are into cool quirky history of science object stuff you are missing out. He is always sharing interesting finds. As you can see above, one day he found the article I found. It wasn’t just a coincidence, either. As you can see from the image below, John credited both the Chronicling America site and my Pinterest board in the post.


That alone was a hoot. What a success. I set out to use Pinterest to keep track of and organize materials I might work with, but in the process I found an audience interested in the topics on Pinterest, and that rolled into John getting in there and not only sharing what I had found but digging in and interpreting and explicating what about that article was interesting. While I hadn’t provided any interpretive frame, the things that I found interesting about the article were the same ones that John focused on. But it didn’t stop there. It turned out that Alexis Madrigal also reads John’s blog and that he thought this was interesting enough to take it to an even larger audience. It also hit Boing Boing and io9.


From my Pinterest board, to Ptak’s blog and from there to The Atlantic. At this point, The Atlantic article ended up generating a surge of web traffic to the Chronicling America website. So much so, that one of the project leaders noted the spike and went looking to see where it was coming from. The work I was doing to organize my notes, on a project that at that point had yet to be announced, had helped to punch a bunch of traffic and eyeballs back onto the content. That is, eight months before the launch, the research process itself was hitting home a core objective of much of our work: spurring engagement and use of the collections. The traffic was nice, but importantly, it also had the effect of promoting thinking about the exact set of issues that the essays I was working on were focused on.

Both Ptak’s blog post and Alexis Madrigal’s piece on The Atlantic are brief but substantive. They contextualize and explore the issues of what it was and wasn’t reasonable to think about the existence of life on Mars in the early 20th century. To this end, before I had even gotten close to publishing my essays, simply sharing the way I was organizing my resources and tweeting about them had prompted public scholarship exploring the same issues in the same resources.

Succeeding before You’ve Even Launched

So, before anyone had even formally announced this project, I was already meeting many of my objectives to spark conversations about the history of ideas of life on other worlds and generating significant use of the Library of Congress collections. I see a few different implications of this process.

  1. Defaulting to sharing serves the mission: The research that goes into preparing a thematic collection/exhibition is itself something that can be made into a public project that contributes to the objectives of exhibiting materials. Using Pinterest to organize my research made that research into its own resource. While you can’t plan to have this kind of thing happen, you can plan to enable the possibility of it.
  2. There is great stuff on the cutting room floor that can have a life of its own: It ended up that I didn’t even use that giant vegetable eye story for the exhibition. It wasn’t the right fit in the end. The Pinterest boards I made are loaded with items that didn’t make the final cut but they still found their own audiences. This is to say, if I hadn’t shared the process there is little reason to believe this story would have gotten much attention. Just think about all the objects that someone considered featuring. Just the fact that it was considered is likely an interesting link that someone might be interested in following.
  3. Sharing Objects in the Research Process Encouraged Deeper Use: In the thematic essays, I work out what the objects mean and people scroll through and read that. However, just sharing the items I was working with in progress ended up inviting others to take those materials and interpret and explicate them on their own. Intriguingly, less became more there. It helped encourage others to explicate and contextualize.

PastPlay as the Digital Humanities

I was invited to review Kevin Kee’s new edited volume Pastplay: Teaching and Learning History with Technology for the current issue of The American Historian. The author agreement allows one to post the “manuscript” version of this kind of thing to one’s personal website, so it’s shared here to that end. As I note, I think the concept of play at the heart of the volume is of potential interest for defining a perspective on play as something that defines the ever-nebulous digital humanities.

Play can and should be a core part of both historical research and the teaching of history. This is the central argument the historian Kevin Kee frames around the fifteen essays gathered together in Pastplay: Teaching and Learning History with Technology.

The thesis of this collection emerges by stringing together the titles of the four sections of the book. Historians should be 1) teaching and learning history, 2) playfully, 3) with technology, 4) by building. Teaching and Learning History includes four case studies of historical educational games. Playfully focuses on how play, or what author Stephen Ramsay calls the “Hermeneutics of Screwing Around,” can function as part of the practice of research and writing. With Technology explores board games, 3D printing, and simulation computer games as instruments for teaching history and engaging in historical scholarship. Finally, By Building provides four essays that argue that making things, from historical hoaxes to digital models of Victorian homes, can be powerful tools for historical inquiry. The Playfully section of Pastplay includes three essays that argue that play itself is an instrument for learning about the past. William J. Turkel and Devon Elliott connect work with 3D printing and fabrication with the value that historians of science have found in re-creating historical experiments. Ramsay argues for the value of serendipitous “screwing around” as a response to the massive scale of source material offered by millions of digitized books. Bethany Nowviskie explores a medieval device that served as a “mechanical aid to hermeneutics and interpretive problem solving” as inspiration for how humanists might make use of digital technologies (p. 140).

Pastplay focuses more on teaching and learning than it does historical scholarship, and as a result, the book is somewhat thin on addressing how play can and should be a component of historical inquiry. From my perspective, the most valuable contribution of Pastplay isn’t really articulated in the text. The book offers a framework for defining the ever-nebulous digital humanities. Many of the contributors are leading thinkers in the digital humanities, and their ideas about the playful use of technology to experiment, dabble, and explore the past offer insight into digital humanities epistemology. The digital humanities are often simply described as the application of computing technologies to humanistic inquiry. The playful hermeneutics described here, and the implication that there is no substantive difference between student learners and historians as perpetual learners, allow us to pin down what is different and significant about how these digital humanists approach the understanding of the past.

Pastplay is a book about teaching history, but the most intriguing parts of it deal primarily with historiography and method. In this respect, I might have liked to see two separate books: one focused on the educational possibilities of play and the other on how playful approaches to building models and exploring texts can provide value to the practices of historical research. While I’m still not entirely sure where this book belongs on my bookshelf, or what kind of course it is best suited for, I am glad to know it is in my collection.