Curating Science, Software and Strides in Digital Stewardship: A Personal 2013 Year in Review

It’s that time of year. Time to take stock and provide an accounting. Looking back, all the themes I noted from 2012 carried through in 2013. That kind of continuity is itself exciting; it makes me think I’ve got a career/body of work emerging from what at times can feel like a flurry of activity and projects.

What follows is a quick rundown of things I’ve been working on. This includes work from the office, from school, and from those moments stolen away on the commuter train to write and work on a range of independent projects. In looking back I think I’ve spent a good bit of time focusing on the future of primary sources and scholarship in history, on infrastructure and strategy for digital stewardship, and on interpreting and presenting the history of science on the web.

Showing Bill Nye Carl Sagan's Papers, a personal highlight of the year.

Future History

Orchestrating the Preserving.exe Software Preservation Summit: I’m very proud of the software preservation summit I played a role in this year. It was great to be able to take an idea from its inception about a year and a half ago through to its completion. There was great lead up to the meeting on the Signal blog, including this interview with Henry Lowood on video game preservation at scale. Discussions and presentations at the summit were well received. I know everybody left with a lot of excitement about some of the collections being developed and the role that emulation and virtualization are likely to play in the future of access for these collections. I’m thrilled with how well the Preserving.exe report for the meeting came out.

Meditations on Digital Objects as Primary Sources: Continuing some of my work from last year, I wrote a bit about the future of significance and equivalence, about the recursive nature of items and collections, about traces, significance and preservation, about connections between archival theory, stratigraphy and disk images, and learned a ton doing this interview about historicizing digital preservation with perspectives from media studies and science and technology studies.

Three books that essays of mine appeared in this year: Writing History in the Digital Age, Playing with the Past, and Rhetoric, Composition, Play.

Digital History and the Future of Historical Scholarship: I started this year remotely offering the perspective of an early career digital historian at the annual meeting of the American Historical Association. I ended up throwing down a bit on the American Historical Association’s dissertation embargo statement and was asked to comment on the Organization of American Historians’ recent similar statement. In short, I’m becoming increasingly interested in working on the modes by which historians access and work with primary sources and the kinds of scholarly communication products they create as a result.

Closing in on the Dissertation: Earlier this year I defended my dissertation proposal. If you are at all interested in the history of the design and rhetoric of online communities, consider reading my proposal. I’m looking forward to carrying some of that thesis work forward into my job next year, further exploring preserving online communities and the vernacular web. I’m thrilled to report that I have a full draft of my thesis in hand and that it has already gone through one round of review by my thesis committee. I’m looking at defending the thesis in the early spring. I won’t be embargoing it, so you can expect to be able to download it in full from GMU’s open access dissertation repository and here on my website as soon as it’s done.

Some scratches from my notebook where I was figuring out some themes for my dissertation conclusions.

Exhibition in and of the Digital Age: Alongside the Digital Preservation 2013 meeting, I had the chance to coordinate CURATEcamp Exhibition: Exhibition in and of the Digital Age. Together with my un-conference-chairs Michael Edson from the Smithsonian Institution and Sharon Leon from the Roy Rosenzweig Center for History and New Media, I kept the plates spinning on a great and far-ranging set of discussions on the future of exhibition. There were sessions on the future of online exhibits, on visualization as a mode of exhibition, on exhibition of born digital works, and a range of other issues. You can read notes from many of the sessions up on the CURATEcamp wiki. I’m still processing and digesting some of the ideas shaken loose from the camp, so expect more from me next year on some of the ideas and implications of those discussions. Some of this percolated up in thinking through a museum’s acquisition of an historic iPhone.

From Past Player to Past Editor: This year I took on the role of co-editor of Play the Past, alongside Shawn Graham. It’s been a lot of work, and I appreciate everything Ethan Watrall did to get the blog up and running and keep it running. When I started, my primary goal was to get more activity through guest posts and getting new bloggers into the fold. I’m thrilled to have Angela Cox and David Hussey join the blog and contribute a lot of amazing work alongside a range of great guest posters. In short, I think we have seen a lot of great and diverse work on the blog and I’m looking forward to seeing where it goes in the future.

Three chapters I wrote ended up in dead tree volumes. The Hermeneutics of Data and Historical Writing, Modeling Indigenous Peoples, and Mr. Moo’s First RPG: Rules, Discussion and the Instructional Implications of Collective Intelligence on the Open Web

Infrastructures and Strategy for Digital Stewardship

Crowds & Roles for Public in Digital Library, Archives and Museum Projects: The year started off with the publication of a lot of my ideas on public participation in cultural heritage in Digital Cultural Heritage and the Crowd, in Curator: The Museum Journal. I interviewed Arfon Smith of Galaxy Zoo and the Adler Planetarium about the role of citizen science projects in digital stewardship and cultural heritage. I also wrote a bit about the role that citizen science projects can play in informing science education. My conversation with Mary Flanagan about her Metadata Games crowdsourcing platform ended up being one of the top Signal posts for the year. This year at THATcamp prime, a group of us thought through how crowdsourcing might be applied to explore images from inside the wealth of digitized books out there, and then actually stood up an instance of Metadata Games to run against images we stripped out of some Project Gutenberg books. I tried to spark some conversation about how cultural heritage orgs could shift their workflows to better anticipate activity of the crowd, but it didn’t really go anywhere. Yet.

Dominic McDevitt-Parks talking about partnerships between Wikipedia and the National Archives at #DCHDC.

Open Source and Digital Stewardship: I had a nice set of interviews on the role of open source in digital preservation and stewardship come out. I talked with Peter Murray on when OSS is the right choice for cultural heritage orgs. Tom Cramer and I discussed the approach that Hydra is taking. I talked with Don Mennerich from NYPL about his work on born digital manuscript materials and got some of Cal Lee’s perspective on the same issue in this interview on BitCurator.

Pushing Out the Levels of Digital Preservation: Earlier this year saw the publication of the first version of the NDSA levels of digital preservation and a paper on them. It’s the result of a great little sub group of folks from NDSA member organizations and I think we have a lot to be proud of in it. I’ve been thrilled to see all the ways this guidance is being used to inform practice at organizations all over the place (ex. at USGS, ARTstor, TRC Canada, MetaArchive, and Mississippi’s Archives).

Contributing to the National Agenda for Digital Stewardship: I’m thrilled to have a part in shaping the first National Agenda for Digital Stewardship. I think the document is a real triumph for the NDSA. It outlines a lot of issues that matter and it’s unique in getting more than a hundred organizations to speak with one voice about national priorities. As the co-chair of the NDSA Infrastructure working group, I had a hand in shaping a good bit of the infrastructure section.

Special Curator for a History of Science Project

This year I’ve been thrilled to have the chance to spend the bulk of my work time on a history of science project. The work is mostly finished, but it’s not out yet so I can’t talk about it much right now. But I can talk about a few pieces of that work that are public. 

The most important thing in the universe by L.M. Glackens. Cover from Puck, v. 60, November 7, 1906.

You can get a taste of some of the work I’ve been engaged in up on two of the LC blogs. I’m rather happy with this piece I wrote about visions of earth from space before we went there, which was picked up by Smithsonian magazine and by Popular Science. I also wrote about the history of imaginary space ships.

I also wrote a series of pieces on how science teachers can use some historical astronomy items as teaching tools. I’m really happy with how each of these turned out.

Not officially a part of my work, but Marjee and I pitched a script for a TED-Ed video called Is there a center of the universe?, which I think turned out to be amazingly cool.

Center of universe ted video

Display for the Carl Sagan Event: As part of my work I was thrilled to curate a presentation of items from the Carl Sagan papers alongside some rare astronomy books and comics and prints to illustrate how Sagan’s papers fit into both historical and fictional ideas about life on other worlds in the Library of Congress collections. A high point there for me was when I got to show Bill Nye through some of the Sagan papers.


Mass Digitization, Archives, and a Multiplicity of Orders & Arrangements

Quick, drop everything and read All Text Considered: A Perspective on Mass Digitizing and Archival Processing. It helped me think through some of what I was getting into in Implications for Digital Collections Given Historian’s Research Practices.

The abstract of the paper does a great job at explaining its objective, “coupling robust collection-level descriptions to mass digitization and optical character recognition to provide full-text search of unprocessed and backlogged modern collections, bypassing archival processing and the creation of finding aids.” The key point in the piece is that it’s becoming plausible to see digitization costs as being on par with the actual processing costs of a collection. You can read this as an even more extreme take on MPLP, where digitization would potentially replace a significant part of processing itself. That is exciting/intriguing for a number of reasons, one of which is as a prompt for thinking through a different kind of future for archival description and access.

The possibility of actual original order and a multiplicity of orders

Most of archival original order ends up being its own kind of new order. So if/when you do get around to doing some form of arrangement on the scans, it’s strictly intellectual arrangement: you do so without actually moving anything. That is, if you did still want to do processing you could do it on the digital files and then provide any number of different identifiers that resolve to the digital files. In essence, the information about original order and any further arrangement would be demoted from the central organizing factor to a relevant and important piece of metadata alongside any other pieces of metadata. So you have the order things came in and the order the archivist worked out after processing. One would likely do some coarse level of weeding and deaccessioning in many cases before digitizing, but then once digitized a processing archivist would be able to further decide which of the scanned files should be kept and what the permissions for viewing the images are. From there, you just set different permissions: say onsite access, reading room only access, dark archive for x years, or complete public access. You could then just work from a blacklist/whitelist approach at whatever level of granularity an archive decided to process a given collection to. Not to mention, with OCRable archival material the OCR itself could be used to set up some heuristics for what kinds of materials to show to what users in what circumstances.
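To make the blacklist/whitelist idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the path-like identifiers, the `Access` levels, the `RULES` table and the `access_for` function are all hypothetical, and a real system would also need to handle things like embargo dates. The idea is simply that a collection gets a default permission, and any box, folder or single image can override it, with the most specific rule winning.

```python
from enum import Enum


class Access(Enum):
    """Hypothetical access levels, named after the ones listed above."""
    PUBLIC = "complete public access"
    ONSITE = "onsite access"
    READING_ROOM = "reading room only access"
    DARK = "dark archive"


def access_for(path, rules, default=Access.DARK):
    """Return the access level of the most specific rule covering `path`.

    `rules` maps a path prefix (collection, box, folder or single image)
    to an Access level. Anything not covered by a rule gets `default`.
    """
    best_prefix, best_level = "", default
    for prefix, level in rules.items():
        # A rule covers the path itself or anything nested beneath it.
        covers = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if covers and len(prefix) > len(best_prefix):
            best_prefix, best_level = prefix, level
    return best_level


# Hypothetical rules: whitelist the whole collection, then restrict one
# box and blacklist a single image inside it.
RULES = {
    "/mss/example-papers": Access.PUBLIC,
    "/mss/example-papers/series-2/box-04": Access.READING_ROOM,
    "/mss/example-papers/series-2/box-04/folder-01/0007": Access.DARK,
}

print(access_for("/mss/example-papers/series-1/box-01/folder-01/0001", RULES))
print(access_for("/mss/example-papers/series-2/box-04/folder-01/0003", RULES))
print(access_for("/mss/example-papers/series-2/box-04/folder-01/0007", RULES))
```

Defaulting to the most restrictive level for anything without a rule is a design choice in this sketch, not something the paper prescribes.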

The container list for an archive enforces a single linear hierarchy on the contents of the archive. Each sheet of paper can only be in one folder, in one box, in one series.

Linked Open Description

The Herbert A. Philbrick Papers in unprocessed form. Manuscript Division, Library of Congress

If the archive just commits to minting a URL structure then this process opens an exciting new future for description. That is, if every image has a URL, and the folder and collection are named in the URL (ex. /division/collection/series/box/folder/image), then you (or anyone else for that matter) can create a range of descriptions and relationships of those digitized objects. If something comes in substantial disorder, like the Herbert A. Philbrick Papers, many of which came in the trash cans pictured here, then you just make a directory for the trash can and number the images based on the order you pull them out of the can. When you do go ahead and arrange the scans, you can do so while retaining the order they were pulled out of the trash can as a parallel, persistent piece of metadata.
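As a rough sketch of what that could look like, here is some Python that mints paths in the /division/collection/series/box/folder/image pattern and keeps the as-received order alongside a later arrangement. The `ImageRef` class, the `parse_image_path` helper and all of the sample values (division, collection and directory names) are hypothetical, made up for illustration rather than taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRef:
    """One scanned image, identified by where it sat when it was scanned."""
    division: str
    collection: str
    series: str
    box: str
    folder: str
    image: str

    def url_path(self) -> str:
        """Mint the path portion of the image's persistent URL."""
        parts = [self.division, self.collection, self.series,
                 self.box, self.folder, self.image]
        return "/" + "/".join(parts)


def parse_image_path(path: str) -> ImageRef:
    """Turn a minted path back into its six named parts."""
    parts = path.strip("/").split("/")
    if len(parts) != 6:
        raise ValueError(f"expected 6 path segments, got {len(parts)}: {path}")
    return ImageRef(*parts)


# Hypothetical example: a trash can gets its own directory, and images are
# numbered in the order they were pulled out of it.
received = [
    ImageRef("mss", "example-papers", "unprocessed", "trashcan-01",
             "as-received", f"{n:04d}")
    for n in range(1, 4)
]
received_order = [ref.url_path() for ref in received]

# A later intellectual arrangement just points at the same URLs. Nothing
# moves, and the received order stays available as parallel metadata.
arranged_order = {
    "Correspondence": [received_order[2], received_order[0]],
    "Clippings": [received_order[1]],
}

print(received_order[0])
```

Because the arrangement only references URLs, any number of additional arrangements could be added next to `arranged_order` without touching the scans themselves.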

The net result is that you are no longer limited by the fact that one atom is stuck in one spot. You just index the content in as many ways as you like. Much like the chaotic storage principles at the heart of how Amazon organizes its warehouses, you use the logic, structure and order of the database to transform the order of physical materials into something akin to the random access nature of a hard drive. The result:

  1. You get the benefit of not being limited by the fact that a thing can only be in one place at a time.
  2. You are also not limited to one linear/narrative/sequential way to find things.
  3. Anyone inside or outside an organization can then set up in house, or third party services, to let stewards/curators add any level of description to any arbitrary set of images. That is, internal and external agents could provide distinct data to organize and structure collection content, which the institution could choose to harvest and display to the extent they were interested. Since you are actually minting URLs you could then start to watch inbound links to your items from things like citations and pull those links in as a kind of descriptive trackback.
If everything is digitized and each image is given an ID, then any number of different modes of arrangement could be minted and maintained referencing the images, making it function much more like this distributed network. The Network by @nancywhite, CC-BY

Paralyzing or Paralleling Workflows for Archives

I think this could also help to break up much of the serial nature of workflows for cultural heritage orgs. That is, if you digitize everything and give the images persistent URLs that mean things, then you could have any number of processes like arrangement, description, OCR, and even processes for automated description like topic modeling run against your materials in a much more parallel fashion. If we started giving persistent URLs to these images at the beginning of our workflows instead of at the end, we could reap the benefit of running any number of jobs and processes against them simultaneously. Furthermore, these could happen on a rolling basis; that is, you wouldn’t need to wait for any one process to finish before moving on to another. I wrote a bit about this idea in Paralyzing or Paralleling Workflows for THATcamp leadership, and a lot of these ideas came up and were discussed at CURATEcamp Processing: Processing Data/Processing Collections.
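Here is a small Python sketch of that parallel, rolling style of workflow, using the standard library's `concurrent.futures`. The two job functions are placeholders I made up (they do no real OCR or description work), and the URLs are hypothetical. The point is only the shape: every job is keyed to a persistent URL, all jobs are submitted at once, and results are filed as each one finishes rather than in a fixed sequence.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def placeholder_ocr(url):
    """Stand-in for an OCR job; a real one would fetch and read the image."""
    return f"(text recognized from {url})"


def placeholder_describe(url):
    """Stand-in for a description job; here it just echoes the folder name."""
    return {"folder": url.strip("/").split("/")[-2]}


# Hypothetical job registry. More jobs (topic modeling, checksums, ...)
# could be added without changing run_jobs.
JOBS = {"ocr": placeholder_ocr, "description": placeholder_describe}


def run_jobs(urls, jobs, max_workers=4):
    """Run every job against every URL and file results under the URL."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(fn, url): (url, name)
            for url in urls
            for name, fn in jobs.items()
        }
        # as_completed yields each job as it finishes, so no job waits
        # on any other before its result is recorded.
        for future in as_completed(futures):
            url, name = futures[future]
            results.setdefault(url, {})[name] = future.result()
    return results


urls = [
    "/mss/example-papers/series-1/box-01/folder-01/0001",
    "/mss/example-papers/series-1/box-01/folder-02/0001",
]
results = run_jobs(urls, JOBS)
print(sorted(results[urls[0]]))
```

Threads are used here just to keep the example short; heavier jobs would more likely run in separate processes or on a queue.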

All Kinds of Cans of Worms Opened

All Text Considered: A Perspective on Mass Digitizing and Archival Processing opens all kinds of different cans of worms. For some kinds of materials, the prospect of digitization and OCR could make material accessible in shorter order. With that said, it throws open the doors to figuring out what exactly intellectual control means in those circumstances, what kind of further processing and arrangement one would want to do, and how to go about integrating automated techniques for summarizing and describing content that an archivist might use to complement and extend their efforts to make an archive’s structure legible to their users.

I’d love to hear your reactions to some of my provocations here, and any other thoughts and reflections the essay prompts, in the comments.

Thanks to Jefferson Bailey, Thomas Padilla, and Ed Summers for comments on a draft of this post. They each had some great ideas and input. I hope they’ll bring some of their more extended comments into the comments here.

6 Digital Historiography and Strategy Grad Seminars I’d Love to Teach

As I’ve been working on finishing my dissertation over the last two years I haven’t had the chance to teach graduate seminars and I really miss it. I’ve twice taught American University’s History in the Digital Age course for their History and Public History program and I’d love to do that sort of thing again. Partially inspired by other very cool courses I see folks sharing syllabi from, and as a fun thought experiment, here are a few ideas for six grad seminars I’d love to develop and teach.

Visualizations of the Enron Email Archive Dataset

Understanding and Interpreting Born Digital Primary Sources: Web archives, software collections, video games, digital photographs, email archives, historical laptops, floppy disks; the world (and institutions of cultural memory) are now flush with born digital primary sources. Working directly with digital artifacts, students would explore and develop practices and processes for making sense of born digital materials.

Public Digital History: Scholarly Communication, Explication and Participation on the Web: Historians and public historians write books and articles and develop exhibitions to communicate to audiences about the past. The web brings with it a range of modes for communication and dialog and significant opportunities for historians to engage with and invite participation from the people formerly known as the audience.

A photo of the Einstein Memorial shared on Flickr

Sites of Memory: Museums, Monuments and Memory in the Digital Age: What do you make of the TripAdvisor page for the Albert Einstein Memorial? Of all the selfies people take in museums? What does the potential for augmented reality mean for the setup and presentation of historic homes? The course explores what changes as public sites of memory become part of networked publics.

Historicizing the Digital in Digital Preservation: It’s easy to fall into the trap of thinking that digital objects are a stable and straightforward thing. In practice, electronic records, software, and digital objects have meant different things at different points in the history of computing. This would basically be a take on Allison, Brian and Jefferson’s course.

Studying the Vernacular Web: Making Sense of Records of Everyday Life from the Web: Folklorists, anthropologists, sociologists and other adherents to ethnographic research methods have developed approaches for netnography and virtual ethnography to study the ways that people are creating and developing cultures on the web. The course would focus in particular on the methodological questions inherent to studying the records of computer mediated communication.

Digital Strategy for Cultural Heritage Organizations: Digital is increasingly becoming a key part of nearly every function of cultural heritage organizations (Libraries, Archives, Museums etc.). We are increasingly acquiring, preserving and exhibiting born-digital and digitized materials, using social media for outreach and public relations, supporting researchers and fielding reference questions through digital channels, and supporting all of that work with a substantive IT infrastructure. Looking across each of these areas, this course would focus on exploring ideas for how organizations should be structured, the role software development should play, embedding “digital into the design, decision making, strategy and all the operations” of cultural heritage orgs, and the role that the web should play as a platform and organizing principle for orgs.

So, if anyone from a D.C. metro area institution of higher learning wants someone to teach an awesome special topics course in the evenings after work drop me a line. Oh and please feel free to run with any of these as ideas for your own courses. There is no higher flattery than having