Some class notes from Alice Rogers in my digital preservation seminar.

This has been brewing for a while, but it’s now enough of a thing that I can share about it. I am excited to announce that I’m on the hook with Johns Hopkins University Press to produce a short book (30-40k words) called The Theory & Craft of Digital Preservation: An Introduction.

I have about half of the book together in a really rough draft form. Much of my nights and weekends for about the next six months will be spent working up the rest of it and getting the whole thing together.

The genesis of the book came when I was designing my digital preservation seminar and realized that I feel like much of the beaten path for talking about digital preservation has more to do with how we got to what we do now than how it would make sense to explain the issues and topics to folks from scratch. So the course has given me a chance to try out the road-map for the book.

I’ve gotten the OK to share drafts of the chapters as they start to come together. I’ve found that I benefit dramatically from doing my writing in the open where folks can help me refine and sharpen my ideas before they end up fixed in any particular medium.

To that end, I figured I would share most of the book proposal I worked up. In working on drafting, some of this has started to shake out a bit differently, but I thought folks might be interested in a preview. I’m thinking I will start posting a chapter or two a month early-ish in the new year.

Overview of the Book

The historical record is increasingly digital. Over the last half century, under headings of “electronic records management” and “digital preservation,” librarians, archivists, and curators have established practices to ensure that our digital scientific, social and cultural record will be available to scholars and researchers into the future. This book is intended as a point of entry into that theory and practice.

Through years of leading collaborative national digital strategy efforts to ensure long-term access to digital content, I have observed that many experts in digital media and libraries, archives and museums often end up talking past each other as they work toward their mutual goals. All too often, discussions of digital preservation fail to fully state and engage with the nature of digital objects and media, thereby undermining our ability to do this work in a common and coherent fashion.

This failure of understanding is rooted in two key fundamental issues: First, that preservation itself is not a single area of activity, but has always been historically intertwined with distinct disciplines that have grappled with the affordances of various historically “new” mediums. Second, that there are distinct affordances of digital media that require rethinking those diverse perspectives on preservation and conservation. The central contribution of this book is to put the lineages of preservation in dialog with the affordances of digital media as a basis to articulate a theory and craft of digital preservation.

As a guidebook and an introduction, this text is a synthesis of extensive reading, research, writing, and speaking on the subject of digital preservation. It is grounded in my work on digital preservation at the Library of Congress and before that, working on digital humanities projects at the Center for History and New Media at George Mason University.  The first section of the book synthesizes work on the history of preservation in a range of areas (archives, manuscripts, recorded sound, etc.) and sets that history in dialog with work in new media studies, platform studies, and media archeology. The later chapters build from this theoretical framework as a basis for an iterative process for the practice of doing digital preservation.

This book serves as both a basic introduction to the issues and practices of digital preservation and a theoretical framework for deliberately and intentionally approaching digital preservation as a field with multiple lineages.  The intended audience is current and emerging library, archive, and museum professionals as well as the scholars and researchers who interface with these fields. As such, the book will be useful as assigned reading for graduate courses in digital preservation and digital curation in library science, museum studies, and public history programs. This book is also highly relevant to digital humanities programs and courses as the work of digital humanists increasingly results in the development of digital platforms, tools and resources which face significant sustainability challenges and thus require an understanding of digital preservation planning to succeed.

There are a handful of books on digital preservation, but this book is significantly different in two key ways. First, it is intentionally brief. Because of this, it is more accessible and usable by a wide range of stakeholders in digital preservation. This is not an exhaustive work on the subject, but a clear and focused perspective and approach. Second, it treats digital preservation as a craft and anchors it in work in humanities scholarship on media and mediums. Much of the extant work on digital preservation approaches the subject as one that is highly technical, which continues to obfuscate many key issues and assumptions, particularly for humanities scholars interested in understanding digital preservation. While the book has a practical bent, it is not a how-to book that would quickly become outdated. It establishes and offers stages and processes for doing digital preservation, but it is not tied to particular tools, methods, or techniques. Instead, it is anchored in an understanding of the traditions of preservation and the nature of digital objects and media.

Sections of the Book 

Introduction: Getting Beyond Digital Hyperbole

At a summit on digital preservation at the U.S. Library of Congress in the early 2000s, a participant from a technology company proposed, “Why don’t we just hoover it all up and shoot it into space.” The “it” in this case being any and all historically significant digital content. Many participants laughed, but it wasn’t intended as a joke. Many have sought, and continue to seek, similar “moon-shots,” singular technical solutions to the problem of enduring access to digital information.

More than a decade later, we find ourselves amid the same set of stories we have heard for at least thirty years. Among the public, there is a persistent belief that if something is on the Internet, it will be around forever.  At the same time, warnings of a potential impending “digital dark age,” where records of the recent past become completely lost or inaccessible appear with regular frequency in the popular press as well.

To many, it seems like the world needs someone to design a system that can “solve” the problem of digital preservation. The wisdom of the cohort of digital preservation practitioners in libraries, archives, and museums who have been doing this work for half a century suggests this is an illusory dream not worth chasing. Working to ensure long-term access to digital information is not a problem for a tool to solve. It is a complex field with a significant ethical dimension. It is a vocation.

The purpose of this book is to offer a path for getting beyond the hyperbole and the anxiety of the digital and establish a baseline for practice in this field. To do this, one needs to first unpack what we mean by preservation. It is then critical to establish a basic knowledge of the nature of digital media and digital information. With these in hand, anyone can make significant and practical advances toward mitigating the most pressing risks of digital loss. For more than half a century, librarians, archivists, and curators have been establishing practices and approaches to ensure long-term access to digital information. Building from this work, this book provides both a sound theoretical basis for digital preservation and a well-grounded approach to its practices and craft.

Section One: Historicizing Preservation and Digital Media

Chapter One: Preservation’s Divergent Lineages

Interdisciplinary dialog about digital preservation often breaks down when an individual begins to protest “but that’s not preservation.” Preservation means a lot of different things in different contexts. Each of those contexts has a history. Those histories are tied up in the changing nature of the mediums and objects for which each conception of preservation and conservation was developed. All too often, discussions of digital preservation start by contrasting digital media to analog media. This contrast forces a series of false dichotomies. Understanding a bit about the divergent lineages of preservation helps to establish the range of competing notions at play in defining what is and isn’t preservation.

Building on work in media archeology, this chapter establishes that digital media and digital information should not be understood as a rupture with an analog past. Instead, digital media should be understood as part of a continual process of remediation embedded in the development of a range of new mediums which afford distinct communication and preservation potential. Understanding these contexts and meanings of preservation establishes a vocabulary to articulate what aspects of an object must persist into the future for a given preservation intent.

To this end, this chapter provides an overview of many of these lineages. This includes: the culture of scribes and the manuscript tradition; the bureaucracy and the development of archival theory for arranging archives and publishing records; the differences between taxidermy and insect collecting in natural history collections and living collections like butterfly gardens and zoos; the development of historic preservation of the built environment; the advent of recorded sound technology and the development of oral history; and the development of photography, microfilming and preservation reformatting. Each episode and tradition offers a mental model to consider deploying in different contexts in digital preservation.

The purpose here is not a detailed history of lineages of preservation and the development of media, but instead to illustrate that many different conceptions of preservation exist and how those conceptions are anchored in different objectives. This overview provides readers with a focus on the distinct conceptions of what matters about an object and the innate material properties and affordances of different kinds of media as they relate to preservation.

Chapter Two: Understanding Digital Objects

Doing digital preservation requires a foundational understanding of the structure and nature of digital information and media. This chapter works to provide such a background through three related strands of new media studies scholarship. First, all digital information is material. Second, digital information is best understood as existing in and through a nested set of platforms. Third, the database is an essential media form and metaphor for understanding the logic of digital media.

Given that digital information is always physically encoded on digital media, it is critical to recognize that the raw bit stream (the sequence of ones and zeros encoded on the original medium) has a tangible and objective ability to be recorded and copied. This provides an essential first level basis for digital preservation. It is possible to establish what the entire sequence of bits is on a given medium, or in a given file, and use techniques to create a kind of digital fingerprint for it that can then be used to verify and authenticate perfect copies.
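To make the fingerprint idea a little more concrete, here is a minimal sketch in Python. It assumes SHA-256 from the standard library as the fingerprinting technique; the proposal does not name a particular algorithm, and the function name `fingerprint` is made up for illustration.

```python
import hashlib


def fingerprint(path, chunk_size=1024 * 1024):
    """Return a SHA-256 hex digest of the bytes stored in the file at `path`.

    SHA-256 is an assumption here; any cryptographic hash could play this role.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If two files produce the same fingerprint, they can for practical purposes be treated as identical sequences of bits, which is what makes it possible to verify a copy against its source.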

With that noted, those bit streams are animated, rendered, and made usable through nested layers of platforms. In interacting with a digital object, computing devices interact with the structures of file systems, file formats and various additional layers of software, protocols and drivers. Drawing on examples from net art, video games, and born digital drafts of literary works, I explore multiple ways to approach them anchored in different layers of their digital platforms. The experience of the performance of an object on a particular screen, like playing a video game or reading a document, can itself obfuscate many of the important aspects of digital objects that are interesting and important but much less readily visible, like how the rules of a video game actually function or deleted text in a document which still exists but isn’t rendered on the screen.

As a result of this nested platform nature, the boundaries of digital objects are often completely dependent on what layer one considers to be the most significant for a given purpose. In this context, digital form and format must be understood as existing as a kind of content. Across these platform layers digital objects are always a multiplicity of things. For example, an Atari video game is a tangible object you can hold, a binary sequence of information encoded on that medium identical to all the other copies of that game, source code authored as a creative work, a packaged commodity sold and marketed to an audience, and a signifier of a particular historical moment. Each of these objects can coexist in the platform layers of a tangible object, but depending on which is significant for a particular purpose one should develop a different preservation approach.

Lastly, where the index or the codex can provide a valuable metaphor for the order and structure of a book, new media studies scholarship has suggested that the database is and should be approached as the foundational metaphor for digital media. From this perspective, there is no “first row” in a database, but instead the presentation and sorting of digital information is based on the query posed to the data. Given that libraries and archives have long based their conceptions of order on properties of books and paper, embracing this database logic will have significant implications for making digital material available for the long term.

Chapter Three: Challenges  & Opportunities of Digital Preservation

With an understanding of digital media and some context on various lineages of preservation, it is now possible to break down what the inherent challenges, opportunities and assumptions of digital preservation are.

We can’t count on long-lived media, interfaces, or formats. Popular digital media of all kinds (disc, disk, and NAND flash wafers) degrade rather quickly, in terms of years, not decades or centuries. Many of these media are relatively complex to read, so the interfaces required to interpret them are likely to not be particularly long lived. The costs of trying to either repair these media or to fix and repair interfaces to read them rapidly become prohibitive. As a result, traditional notions of conservation science are, outside of some niche cases, going to be effectively useless for the long-term preservation of digital objects.

Going back to the discussions of preservation lineages, this means that digital preservation is an enterprise that can only focus on the allographic digital object. While all digital information is material, the conservation of that material over the long haul is not broadly practical. Where conservation science is concerned with the chemical and material properties of mediums and artifacts, the science of digital preservation is and will be computer science. With that said, because bitstreams are always originally encoded on tangible media and then created by, acted on and interpreted by all kinds of human-made layers of software, they end up presenting an extensive range of seemingly artifactual and not simply informational qualities. That is, the physical and material affordances of different digital mediums will continue to shape and structure digital content long after it has been transferred and migrated to new mediums.

Section Two: Doing Digital Preservation

Chapter Four: Articulating preservation intent

What is it about the thing you want to preserve that matters and what do you need to do to make sure it is there in the future? To many, this seems like a simple question. It is not. Too often we take for granted that there is a de facto answer to this question. However, as a result of the nested platform nature of digital information and the fact that most of what we care about is the meaning that can be made from collections of objects, it is critical to be deliberate about how we answer this question in any given situation. This is why digital preservation must be continually grounded in the articulation of preservation intent.

In some cases, someone can clearly articulate this intent at the start of a project. But  for most preservation projects it is often best to be purposeful and strategic around the preservation intention. This is particularly critical given that deciding what matters most about some set of material can lead to radically different approaches to preserving and describing it.

Through examples of the diverse types of content that different kinds of cultural heritage organizations are preserving and their intent for doing so, this chapter establishes how to articulate preservation intent and how well-articulated preservation intent makes the resulting collections easier to evaluate and more transparent for future users.

Chapter Five: From Bit Preservation to Digital Preservation

Taking into account the challenges and opportunities of digital preservation, it is important to bracket the work into two different challenges: bit preservation and digital preservation. Bit preservation, ensuring authentic copies of digital objects, is the most pressing problem. Thankfully, it is a relatively straightforward problem for which there are a range of simple solutions. With that said, ensuring those authentic copies are interpretable, comprehensible and usable is far more challenging. Thankfully, this work of digital preservation is a much less time sensitive activity.

Bit preservation is accomplished by managing multiple copies of the digital objects you want to preserve, regularly comparing digital fingerprints for those files to ensure that they are all identical, repairing or replacing copies when they fail those checks, and migrating the copies to newer media and continuing to ensure that the digital fingerprints still match. With more resources, there are better ways to systematize and automate these processes, but with relatively small collections it is still possible to do this and be confident you have authentic copies as long as someone continues to mind and tend to them.
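The routine described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fingerprint is SHA-256, a trusted fingerprint was recorded when the object was first ingested, and the names `sha256_of` and `audit_copies` are hypothetical rather than taken from any particular tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Fingerprint a file (SHA-256 is an assumed choice of algorithm)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def audit_copies(copies, expected_digest):
    """Compare every copy to the recorded fingerprint and repair failures.

    Returns the list of paths that had to be replaced from a good copy.
    """
    paths = [Path(p) for p in copies]
    good = [p for p in paths if p.exists() and sha256_of(p) == expected_digest]
    if not good:
        # Nothing left to repair from; a person needs to step in.
        raise RuntimeError("no copy matches the recorded fingerprint")

    repaired = []
    for path in paths:
        if path not in good:
            # Replace the missing or damaged copy with a verified one.
            shutil.copyfile(good[0], path)
            if sha256_of(path) != expected_digest:
                raise RuntimeError(f"repair of {path} failed verification")
            repaired.append(path)
    return repaired
```

Migrating to newer media follows the same pattern: copy the object to the new location, then check that its fingerprint still matches before retiring the old copy.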

Digital preservation is much less straightforward.  The central challenge of digital preservation is that software runs. The active and performative nature of that running is only possible through a regression of dependencies on different pieces of software that are typically tightly coupled with specific pieces of hardware. Along with this, it is important to think through if there is enough context for the digital objects for someone in the future to be able to make sense of them. Two primary strategies exist for approaching these issues: emulation and format migration. Both are discussed and a case is made for why in many cases organizations are hedging their bets and pursuing both strategies.

Chapter Six: Arranging and Describing Digital Objects

The story goes that shortly after the Library of Congress signed an agreement with Twitter to begin archiving all of the tweets, a cataloger asked “But who will catalog all those tweets?” The idea of describing billions of objects was dauntingly incomprehensible to those who lacked experience with the nature of digital media. Like most digital objects, tweets come with a massive amount of transactional metadata: timestamps, usernames, unique identifiers, links out to URLs on the web. Like most digital objects, the tweets can largely describe themselves.

The usability of digital information will be largely dependent on how we organize, arrange, and describe it.  Arranging and describing digital objects needs to conceptually shift to embrace the nature of digital media and to recognize a distinct transition which has occurred in terms of computability. Digital media continually generates massive amounts of metadata and because it is computable, it is also increasingly possible to process digital data to derive descriptive information and metadata. As a result, arranging and describing digital content should increasingly be focused on limited amounts of expert intervention in chunking and describing content in aggregate and leaving lower levels of description to the objects themselves.
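As a rough sketch of what describing content in aggregate could look like, the Python below builds a collection-level summary from metadata the items already carry. The record fields (`id`, `username`, `created_at`) and the function name `describe_collection` are hypothetical; they do not reflect any real export format.

```python
from datetime import datetime


def describe_collection(records):
    """Summarize a set of items using only the metadata they already carry.

    Each record is assumed to be a dict with hypothetical "username" and
    ISO 8601 "created_at" fields.
    """
    if not records:
        return {"item_count": 0, "earliest": None, "latest": None, "accounts": []}

    timestamps = [datetime.fromisoformat(r["created_at"]) for r in records]
    return {
        "item_count": len(records),
        "earliest": min(timestamps).isoformat(),
        "latest": max(timestamps).isoformat(),
        "accounts": sorted({r["username"] for r in records}),
    }
```

The expert work in this picture is deciding which records belong together and writing the scope note around the summary. The item-level detail stays with the items.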

In terms of arranging digital objects, their database nature means that unlike folders in a box or books on a shelf, by their very nature digital media come with a multiplicity of orders. This complicates core archival principles around original order. It also requires thinking through how to chunk content into reasonable and coherent sets of information that are easier to manipulate and work with for all kinds of current and future users.

In this context, it is critical to revisit the levels of description at which librarians, archivists, and curators work to evaluate in what cases something should be treated as an “item” or a “collection” and what levels of descriptive work should be employed. Given how much objects are self-describing, it makes much more sense to take up archival practices of describing content at the collection level and explaining the scope of a collection, the context of its acquisition, and how and why that collection was collected and preserved and to let the lower levels of description be left to the content itself.

Similarly, many digital objects actually index, describe, and annotate other digital objects. For instance, if you take all of the links that appear in articles published in the Drudge Report, the fact that the Drudge Report linked out to those sites tells you something about them. This affords the possibility of starting to think of nearly all digital objects as both data in their own right and metadata that describes other objects. To this end, we must increasingly think of “description” and “the described” as a fuzzy boundary.
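One way to picture an object acting as metadata for other objects is to pull the outbound links from an archived page and count the sites it points to. The Python below is a minimal sketch using only the standard library; `LinkCollector` and `linked_hosts` are hypothetical names, and counting hosts is just one assumed way of summarizing the links.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def linked_hosts(html_text):
    """Count how often each external host is linked from the page."""
    collector = LinkCollector()
    collector.feed(html_text)
    hosts = (urlparse(link).netloc for link in collector.links)
    # Relative links have no host and are skipped.
    return Counter(host for host in hosts if host)
```

Run over a whole archive of pages, counts like these start to describe the linked sites as much as the pages doing the linking.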

Chapter Seven: Divergent and Multimodal Access and Use

When a user in a research library asks to see a book in an obscure language a librarian will generally bring it out and let them look at it. That librarian may have no idea how to make sense of the text, but they know how to provide access to it and it is assumed that the researcher needs to come with the skills to make sense of it. At the most basic level, we can provide this kind of access to any digital objects we are preserving.

The affordances of digital media open up significant potential for access and use of digital content. At the same time, our experience with commercial software can lead us to hold off on letting others access digital content until we can provide a simple way for any user to double click on a digital object and have it “just work.” It is critical for us to get over the assumptions that are embedded in this mentality and embrace the divergent and multimodal nature of access that digital media present us with.

This means digital preservation practitioners need to be OK with just saying, “Here it is, have at it” and also with consistently exploring the potential for new tools and methods for providing access to digital content. Even if you don’t know how to open a given file, there are a range of emerging techniques and approaches that researchers today and in the future will be able to use in working with digital content. In addition, it is important to think through the types of access restrictions or redaction of information that may be necessary.

This means we should be continually exploring ways to make digital content broadly accessible and usable, whether as individual files, as bulk aggregates, or through a range of other modes. Researchers are increasingly interested in approaching all kinds of digital content as data sets for computational analysis and this requires adopting new ways of thinking about access.

Conclusions: The Theory & Craft of Digital Preservation

Digital preservation is not an exact science. It is a craft in which experts must reflexively deploy and refine their judgment to appraise digital content and implement strategies that make the most sense for minimizing the most pressing risks of loss while working to make it as widely usable and useful as it can be to its respective audiences. At least, that is the case I have sought to make in this book. As Stacy Eardman, digital archivist at Beloit College has noted, digital preservation is much like a lyric from the song The Have Nots, “This is the game that moves as you play.”

The craft of digital preservation is anchored in the past. It builds off of the records, files, and works of those who came before us and those who designed and set up the systems that enable the creation, transmission and rendering of their work. At the same time, the craft of digital preservation is also the work of a futurist. We must look to the past trends in the ebb and flow of the development of digital media and hedge our bets on how digital technologies of the future will play out.

My former supervisor, Martha Anderson, who worked as the Managing Director of the National Digital Information Infrastructure and Preservation Program at the Library of Congress, liked to describe digital preservation as a relay race. Digital preservation is not about a particular system, or a series of preservation actions. It is about preparing content and collections for hand offs. We cannot predict what future digital mediums and interfaces will be, or how they will work, but we can select materials from today, articulate aspects of them that matter for particular use cases, make perfect copies of them, and then work to hedge our bets on digital technology trends to try and make the next hand off as smooth as possible.



For many, this is where we find ourselves in organizations just starting to work on digital preservation.

I’m working on drafting up the syllabus for my digital preservation graduate seminar for the University of Maryland’s iSchool for this coming fall. I am a firm believer in learning-by-doing. I also think talking about digital preservation in the abstract, outside the very real resource and time constraints of organizations largely misses the point. As a result, I am planning to have each student work through a series of assignments where they serve as digital preservation consultants to small cultural heritage organizations.

My hope is that this will be a meaningful learning opportunity for the students, as well as a way for them to start building out a portfolio of work that will be relevant to potential future employers. I am also optimistic that this can be a way to provide some help to small cultural heritage organizations that could benefit from having additional help to think through and develop plans for making the best use of resources to make their digital content more long-lived.

I wanted to share a draft of the series of assignments I am putting together for two reasons:

  • First, to get feedback and input on how to improve the assignment.  I’ve posted it as a Google Doc too, so if you have suggestions for it please feel free to write comments or suggestions directly into the doc.
  • Second, pairing students with individuals who are interested in participating in this work is going to be key. I wanted to circulate this document as a means to identify people and organizations interested in working with a student as a digital preservation consultant for their organization.

Requesting a Graduate Student Digital Preservation Consultant

I think the finish line for digital preservation is a little too close to the starting line here. But it gets at the idea :)


If you (and your organization) would be interested in having a University of Maryland graduate student in my digital preservation seminar focus their digital preservation consultant project on your organization, please take two minutes to fill in this five-question form. I think this is a great opportunity for organizations for a few different reasons.

Here are some reasons to consider filling in the form for your organization. This project is a chance to:

  1. Solicit assistance thinking through digital preservation issues and planning for your organization.
  2. Provide a meaningful learning experience to someone just getting started in the field
  3. Learn more about digital preservation as the student shares what they are learning through the class

Through the course of the assignments, students will:

  1. Document and review current practices with an organization’s digital content
  2. Draft suggestions for potential next steps to improve management of digital content grounded in the resources an organization has access to
  3. Draft a digital preservation policy for consideration for the organization

On the first day of class (September 1st), I will present the organizations that have filled out the survey to my students. In the first few weeks of class I will help to pair each student with an organization for the semester.

If you are matched up with a student, the idea would be that you would commit to doing an interview or two with them about your organization’s collection and current practices for digital material and that you would review and provide input on several of their assignments (listed below).

I should underscore that it is completely fine for organizations to be literally at square one in terms of digital preservation practices and planning. So many cultural heritage organizations are just getting started with their digital preservation planning, and while it can be a bit intimidating to take some first steps in this space, there are many simple and inexpensive things organizations can be doing to mitigate risks of loss. The assignment will be most valuable for both students and organizations in cases where there is little current work being done in digital preservation. As part of this project, students will be blogging about their work, so you and your organization will need to be OK with them sharing information about the project. This can be a bit intimidating, but having students work on their public writing skills and inviting a broader audience into discussion about how to do this work in organizations will help to ensure that the quality of that work is stronger and more useful. Through this public writing process, the results of the work will be more useful to both the student and to your organization.

What follows are details about the design of this assignment. This is also available in the google doc if you would like to suggest edits or make comments.

Digital Preservation Consultant Project

Here you can see a student working on synthesizing what they have found and drafting a plan.


An academic understanding of the issues in digital preservation is necessary but not sufficient for  professional digital preservation work. Digital preservation is fundamentally about making the best use of what are always limited resources to best support the mission of an organization. As such, to really learn how to do digital preservation you need to apply these concepts in the practical realities of an organizational context.

Aside from participating in discussion of the course readings through the course blog, the other course assignments will require you to act as a digital preservation consultant for a cultural heritage organization. For a variety of reasons I suggest this be a small institution. Below are the five assignments you must complete over the course of the semester as part of this project.

  1. Identify Small Cultural Heritage Organization and Establish Partnership (by week 3): For most of the course assignments, you will need to find a small cultural heritage organization that you can work with as a digital preservation consultant. I have identified a list of organizations that are up for participating, but you are free to find other organizations as well. The key requirements here are that 1) they have consented to working with you, 2) they have some set of digital content, and 3) their collections are not so complex that you couldn’t possibly do the project. Example institutions include an independent organization (like a house museum, a community archive or library) or a small department or subset of an institution (say the archives of a student newspaper or radio station, the special collections department at a public library, or the archives in a museum).
    1. Deliverable: The output of this phase is to identify this organization and confirm that you have a commitment from them to participate. We will check in on this in class as we go, but by the date of this assignment you need to have confirmed participation of an organization that meets these requirements and have posted what organization you are working on in a list on the course website. On the site, post the name of the organization, your name (or handle) and two or three sentences about the organization and its digital content.
  2. Institutional Digital Preservation Survey (Draft by week 6 and send to your org, publish with their comments incorporated by week 8): For your organization, interview one or two staff members to get a handle on their digital collections and practices. Draw from the NDSA levels of preservation as an overall framework for conducting your survey. You will want to focus on gathering information about their practices in five key areas.
    1. First, what is the scope of their digital holdings?
    2. Second, how is that digital content currently being managed?
    3. Third, what are the staff at the organization’s perceptions of the state of their digital content (are they concerned about it, do they see it as mission critical or a nice to have, what do they see as their own self efficacy and their organization’s capacity for sustaining their content)?
    4. Fourth, what kinds of digital content would the organization like to be collecting but currently isn’t?
    5. Fifth, what resources, if any, do they have that they could bring to bear on this problem? If they have some significant potential resources that’s great, but realize that there may well be very meaningful smaller resources that could be brought to bear. For example, could one staff member spend 2-4 hrs a week on digital preservation, could they bring in community volunteers, how much could they spend on things like extra hard drives, etc.? Throughout all of this, it will be important to understand what the organization’s collecting mission is. You want to begin to probe all the questions above, but you need to be able to map their answers to the NDSA levels.
    6. Deliverable: You will write and publish a post to the course blog (1200-3000 words) in which you present the findings of your survey. The post should first provide context: what is this organization, what are its digital holdings, and what does it want to be collecting? From there, work through presenting an accurate and coherent report of the themes and issues that came through in your interviews. At this point you are primarily interested in accurately representing the state of their work. Do not get into making recommendations. Simply do your best to succinctly and coherently explain what you found about the five areas of questioning discussed above. Before publishing this, you must present it to your org for their feedback to make sure you have their input on how you are describing the state of their work.
  3. Institutional Digital Preservation Next Steps Preservation Plan (Week 10): Now that you have the results of your survey, it is time to take out the NDSA levels of digital preservation and the rest of our course readings and figure out what a practical set of next steps would be for your organization.
    1. Deliverable: Post your next steps plan to the course blog (1200-3000 words). After a brief introduction providing context about the organization and its collections, you should work through reviewing the organization’s current work on digital content using each of the areas of the NDSA levels of digital preservation. Complete by identifying three different levels (low, medium and high resource requirement) of next steps they could take to improve their rating on the NDSA levels of digital preservation. Be creative here. For example, could they upload collection items to the Internet Archive or Wikimedia Commons? Or could they buy an extra hard drive, make copies and swap it with a backup buddy at another organization in a different region of the country? The point here is to think about how to get them the furthest up some of the levels with the resources at hand. Before publishing this, you should present it to your organization for them to review and provide input.
  4. Draft a Digital Preservation Policy for Your Org (Week 12): Now that you have put in place a set of recommendations, it is important to also draft up a set of digital preservation policies and practices for the organization. If this is to have any impact you are going to need to be able to articulate what the organization’s policies could be going forward.
    1. Deliverable: Drawing on the example digital preservation policies we read in class, draft up a short policy document for your institution tuned to what you have learned from working with them. Draw from the examples for models for aspects of this document. Share it with them for some input and feedback. Then post it to the blog (800-1500 words).
  5. Reflecting on Lessons Learned (Week 13): After doing this work, presenting it, and getting feedback from your organization, you need to think through what worked and didn’t work for the project. Take time to reflect and tease out the lessons you’ve learned about both digital preservation and working with a cultural heritage organization.
    1. Deliverable: Return to each of the documents you created thus far and synthesize 3-5 points about what did or didn’t work or what your takeaway lessons are from this process. Think through what you will do differently the next time you help an organization improve its digital preservation practices. Bring in references to what you’ve learned from readings in the course and from what you have learned from your classmates’ work on their projects (800-1400 words).

I’ve found that interdisciplinary dialog about digital preservation often breaks down when someone protests “but that’s not preservation.”

Preservation means a lot of different things in different contexts. Each of those contexts has its own history. Those histories are tied up in the changing nature of the mediums and objects for which each conception of preservation and conservation was developed. All too often, discussions of digital preservation start by contrasting digital media to analog media. This contrast forces a series of false dichotomies. I’m feeling like better understanding a bit about the divergent lineages of preservation could help to establish the range of competing notions at play in defining what is and isn’t preservation.

I’m curious to start building out some of my understanding of the lineages of different kinds of preservation. So I would love if folks could share any examples of writing in this area that might be helpful. I think a lot of this context looks to be in something like Preserving our Heritage: Perspectives from Antiquity to the Digital Age (which I am still digging into). However, I also think the story is even broader here, and that there is a media archaeology aspect that is missing. That is, my sense is that a series of old new media, like photography, film and recorded sound technologies, have been interacting with ideas about what preservation is or should be for more than a century.

What follows is not so much a coherent final product as it is me openly sharing some of my notes on different strands I see at play in this space.

  • The manuscript tradition: A situation where the allographic nature of a work is primarily what matters, that something is the work if it has the same spelling and where copying is the basis of preservation. In this case, something like the Evolution of Manuscript Traditions could be useful.
  • The history of archival traditions: In this case, something like What is Past is Prologue: A History of Archival Ideas Since 1898, and the Future Paradigm Shift is useful. Also relevant is the history of publishing records in documentary editions vs. arranging and describing records, and ideally a bit on the interventions that came with microfilming. That is, while we generally think of archives as holding unique and original records, in this space there is a lengthy tradition of documentary edition work focused on publishing records and a history of photographic reproduction of records for both access and preservation purposes.
  • The history of art conservation and restoration: For example, Changing Approaches in Art Conservation: 1925 to the Present. I’ve seen a lot on the history of conservation of things like paintings. However, the history of the development of variable media art works, art installations, and works made of materials that rapidly deteriorate has resulted in very smart thinking about what it is about art works one wants to conserve. In this space, see Re-collection: Art, New Media, and Social Memory.
  • Preservation of dance and live performance: There are, at this point, long standing traditions in how to preserve and document works of art that produce lived experience. In this space, the Dance Heritage Coalition’s Documenting Dance: A Practical Guide nicely illustrates the continuity that exists between a variety of modes of documentation technologies, from textual notation, to moving image technologies, to new digital methods like motion capture.
  • The history of conservation of living creatures: Everything from taxidermy and insect collecting to living collections like butterfly gardens and zoos as well as things like the Svalbard Global Seed Vault. I don’t really have good resources on the history and theory here. I’m thinking about digging into some history of science journals. In any event, I think there is an interesting story about which techniques are intended for what purposes and what is significant about a living thing that must be preserved toward that particular purpose. That is, when and why do you pin and preserve butterflies as a collection and when and why would you choose to run a butterfly garden? So I’m looking for any ideas folks might have for work in this space.
  • The development of historic preservation of the built environment: I know some good stuff here, like Giving Preservation a History: Histories of Historic Preservation in the United States. In this case, it’s interesting to me that some newer technologies like photogrammetry or 3D point cloud technologies are being explored as ways to “digitize” or create recordings to preserve and document physical spaces. I find historic preservation particularly interesting in that it often focuses on turning back the clock on a particular building to make it appear as it was at a particular moment in time. In this vein, it can involve recreation and fabrication. Similarly, historic preservation connects in interesting ways to reenactment and living history. In this space, I am a huge fan of Abraham Lincoln as Authentic Reproduction: A Critique of Postmodernism, which explores fascinating sets of issues around authenticity in the New Salem Historic reconstructed village and outdoor museum in Illinois.
  • The advent of recorded sound technology and the development of oral history: There is some good stuff on recorded sound technology in Gramophone, Film, Typewriter and MP3: The Meaning of a Format, but they aren’t really explicitly about oral history. In contrast, The History of Oral History isn’t so much focused on the role that recorded sound media have played in the history of oral history. The Media Archaeology work points to how our conceptions of “memory” have themselves been shaped by the advent of these new technologies. It was said of Edison’s phonograph that “Speech has become, as it were, immortal,” and an article on Memory and the Phonograph from 1880 would “define the brain as an infinitely perfected phonograph”.
  • The development of photography and microfilming and preservation reformatting: There is some good stuff on this in Lisa Gitelman’s Paper Knowledge: Toward a Media History of Documents. In particular, there is discussion of the work of the “Joint Committee on Enlargement, Improvement and Preservation of Data,” a joint effort of the American Council of Learned Societies and the Social Science Research Council, which ended up publishing Robert Binkley’s 1931 Manual on Methods of Reproducing Research Materials. The book is particularly interesting in that it is a cover-page over a photo-offset printing of a type-written manuscript. To this end, the book itself illustrates how changes in the technologies for photo-duplication of documents were affecting access to documents.
  • The history of newspaper conservation: Closely related to the last point is the push to microfilm newsprint based on some of its inherent vices. While Double Fold is over the top, it did prompt some really great reactions, like Don’t Fold Up: Responding to Nicholson Baker’s Double Fold.
  • Scientific data and records of observations: Astronomers draw on records of observations of the motion of celestial objects dating back to the ancient world. Lorraine Daston’s “Sciences of the Archives” research group has produced some fascinating work in this vein. I like how this quote from Daston’s research group captures the continuity that exists in these traditions which bridges analog and digital practices and incorporates other new media like photography. “Since ancient times, cultures dispersed across the globe have launched monumental data-centered projects: the massive collections of astronomical observations in ancient China and Mesopotamia, the great libraries from Alexandria to Google Book Search, the vast networks of scientific surveillance of the world’s oceans and atmosphere, the mapping of every nook and cranny of heaven and earth.” They have a great 2012 paper in Osiris that works through this in more depth.

So in all these contexts, I think a few preliminary points start to emerge that I keep thinking about.

  1. Preservation’s meaning is contextual and tradition dependent: As a concept, preservation has situated meanings in particular traditions and contexts, so it’s important to really articulate what one means by the term and what traditions one is drawing on. In this vein, the different traditions have emerged in dialog with the development of media and have their own ideas of what is significant about objects for their use.
  2. Digital vs. Analog Preservation is a false dichotomy: There were already a lot of divergent ideas of what preservation meant in play before digital technology came into play. In this vein, the intervention of digital technology is just one of a series of technological interventions which have disrupted preservation practices and traditions.
  3. New media is older than digital media: Related to the last point, various media/technologies of reproduction (and their affordances) have had significant impacts on the traces of the past that can be created and our ability to preserve them. In this vein, scholarship in Media Archaeology focused on reinterpreting and understanding these old new media is likely of considerable value for unpacking those impacts.

So those are some working thoughts and rough notes. I’m curious and interested to hear about 1) other resources you think are relevant in some of these areas, 2) other ways of slicing and characterizing these points, and 3) other ideas about what the takeaways are.


A few years back, Curatememe set out on a mission to create a space “Where Curators Curate Memes about Curation. Where will the absurdity of our use of the term Curation go next? This Tumblr speculates wildly.” I think it’s now time to declare curation accomplished. Now that curation means whatever, I thought I would curate the best of the best from the tumblr here in a listicle. It seemed appropriate. It will also make it easier for me to find ones I want to use sometimes.

The memes are arranged more or less reverse chronologically, which offers a sense of how they developed as a body of work over time and preserves the experience of reading back into a tumblr.

It is looking like I may end up teaching a graduate seminar on digital preservation for the University of Maryland’s iSchool. There is an existing syllabus, but I will have some flexibility in terms of how I shape and design the course and I am curious what thoughts different folks have for what would be the most effective way to teach a graduate seminar on the subject.

Below are a few of the big picture course design questions I am thinking through and some of my initial thoughts on them. I’m curious for any and all input folks might have.

Organizing Principles: How best to organize a digital preservation course?

  • To what extent should a grad seminar like this be about frameworks and principles vs examples and cases? I’m thinking that I should cover the frameworks, but I’m also thinking that too many of those models fail to address the idea that digital preservation is fundamentally about risk mitigation from future loss. That is, it’s less about a process and more about how to make the best use of available resources and identifying the best opportunities to systematically work to further lessen the risk of loss. I also think that the frameworks often get in the way of first grasping a fundamental understanding of the nature and structure of digital information and digital media. So I’m entertaining the idea of getting to the frameworks at the end as a way to understand the issues but working through the core issues first.
  • How would you organize and structure such a course? If I don’t start with the frameworks, I’m thinking it makes sense to start by working through a core understanding of digital information and digital media and work from there into the various issues in the NDSA levels of digital preservation.

Particular Tools & Software: What role should they play in the course? 

  • What approach should I take toward particular tools? On the one hand, it is very pragmatic to leave a course like this understanding how to use particular tools, but at the same time, the tools are always going to be changing and everyone needs to be able to plan for how to swap in and out different tools to meet the underlying objective. In my digital history courses I have required students to each figure out how to use and then teach the class how to use particular tools and software. I like this approach as teaching yourself how to use new software and evaluating it is an important skill in its own right. With that said, some of the digital preservation tools out there are complex enough that I’m not entirely sure this method would do them justice.
  • How much should a course like this require/push students to develop some basic command line literacy? My sense is that many students will not have this, but it is challenging to think through how to do much work in this area without that. With that said, the course isn’t about developing that command line literacy, so I’m not sure how far to delve into this kind of thing.

Kinds of assignments: What would be the most useful for the students? 

  • I’m curious for what folks think would be the most useful kinds of assignments. I’m thinking that given the context of planning for risks and the need to make such plans inside the constraints of an institution that it might make the most sense to have students serve as consultants for small cultural heritage organizations and have them develop plans for options to improve their approaches to ensure long term access to their digital content. So I think many of the assignments might be fit around that. With that said, I am curious for any other ideas for how to either improve this idea of a course project or for other kinds of assignments.

Folks at one of the first DCHDC meetups, September 2012

Four years ago, some of my colleagues in the NDIIPP program thought it could be neat to try and start up a monthly meetup for digital cultural heritage professionals in DC. Butch Lazorchak found a bar in DC that would give us free space upstairs once a month and signed up for a meetup account and we were off.

I love that we ended up sparking something that has become an anchor monthly event for folks from libraries, archives, museums, universities and related non-profits to share ideas and perspectives. I know it’s been a key element in various people finding internships and jobs and for sharing ideas and approaches to working in this area. To that end, I decided it would be worth looking back and checking in with folks who have joined the group. So a few months ago I put together a survey.

4 Years, 40 Meetups, Almost 500 Members


Jamie Mears talking about personal digital archiving at DCPL at a recent meetup.

Over the last four years the Digital Cultural Heritage Meetup group has hosted more than 40 meetups. It seemed like a good point to do some legwork to figure out how the meetup is working.

The meetup continues to draw anywhere between 20-30 some folks a month and I thought it would be useful to survey the 492 people who have signed up to follow the meetup. The loosely organized group of folks who organize the events are working to improve them based on the survey.

Along with that, I thought folks in other cities might be interested in the results too. For an event that makes use of free space and takes a bit of time each month from a handful of people to volunteer to organize I think it has been having a rather substantial impact on the scene in DC.

Info on the survey sample

68 people responded to a survey I put together. This is about 14% of the total set of people who have signed up to the meetup, but given the way meetup works I would hazard a guess that something like 60 or 70% of the people who sign up for the meetup don’t ever end up coming. This is to say, I think responses from 68 people likely give a good view into the whole of who participates.
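
As a rough check on those numbers, here is a small Python sketch of the arithmetic. The 492 sign-ups and 68 responses come from this post; the 60% and 70% never-attend shares are only my guesses from above, not measured values, and the function name is made up for illustration.

```python
def response_rate(responses, population):
    """Return responses as a fraction of the given population."""
    return responses / population


SIGNED_UP = 492   # people following the meetup (from the post)
RESPONSES = 68    # survey responses received (from the post)

# Share of everyone who has signed up, whether or not they ever attend.
overall = response_rate(RESPONSES, SIGNED_UP)
print(f"Share of all sign-ups: {overall:.1%}")

# Assumption: 60-70% of sign-ups never attend (a guess, not a measurement).
for never_attend in (0.60, 0.70):
    likely_attendees = SIGNED_UP * (1 - never_attend)
    rate = response_rate(RESPONSES, likely_attendees)
    print(
        f"If {never_attend:.0%} never attend: "
        f"about {likely_attendees:.0f} attendees, {rate:.1%} responded"
    )
```

Under those assumptions, the 68 responses would cover somewhere between roughly a third and a half of the people who likely show up at all.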

In the interest of transparency, you can see survey results (PDF), download the tabular data and see what the survey form looked like. As an aside, I would love to see other people take a look at the responses and write up their own reactions and interpretations of the survey results. Along with that, I would love to get further discussion of the results of the survey in the comments on this post.

Who participates in DCHDC and to what extent? 

Survey respondents represented a range of different profiles of DCHDC participants both in how frequently they participate and in where they are at in their careers.

In terms of the frequency of participation, they represented a range of levels of engagement.

  • 19 had participated more than six times,
  • 12 had come at least 4 times,
  • 21 had come two or three times,
  • 14 had participated once
  • 1 respondent had never participated

In terms of their stage in their careers, the survey mostly drew in folks who were either established professionals or in the first five years of their careers.

  • 3 respondents were current students,
  • 25 were in the first five years of their career and
  • 38 were established professionals who had worked in their field for at least five years

There weren’t that many students, but I think that likely represents trends in who participates. What I would note here is that this underscores how well the meetup functions as a middle ground between established and emerging professionals. I would also underscore that the students who do come have clearly gotten a ton out of being able to network with established and early career professionals. So grad students, if you’re listening, I think there is a huge opportunity here for you.

How DCHDC Matters

Across the board, respondents to the survey were largely united on the positive aspects of participating in DCHDC. For those participating, it seems clear that there is consensus that it has become a community that plays an important role in their careers.

  • 97% of respondents either agreed or strongly agreed that through DCHDC they have learned about projects and issues that are relevant to their work.

  • 97% of respondents reported that DCHDC has become a community they value participating in.

  • 95% of the respondents either agreed or strongly agreed that participating in DCHDC has expanded their professional network.

  • 80% of respondents either agreed or strongly agreed that participating in DCHDC has made them more aware of career opportunities.

Example quotes of how DCHDC has been helpful:

The free text responses that respondents provided give some of the best specifics of what has been working about the meetup. I thought I would include some of those inline here.

  • Connecting with professionals at different stages in their careers: “As I am just now beginning a career as a librarian specializing in digital preservation, having the opportunity to hear presentations on related to this area in librarianship is really helpful, as it is still evolving (and will continue to do so). Furthermore, actually having the ability to speak with individuals about their workflows, the politics of advocacy, standards, etc. has enabled me to gain a better understand my work.”
  • Finding professional opportunities: “DCHDC helped me get an internship in my field and lead to a greater understanding of what kind of jobs were out there and what direction I’d like to head in. Beyond the networking and professional advancement aspects, DCHDC has given me the opportunity to learn more about technology and aspects of cultural heritage that weren’t touched on in my program. While I have been unable to attend DCHDC in recent months I speak highly of it and recommend it often.”
  • Getting perspectives from outside a particular field: “I’ve strengthened my digital humanities network outside of the museum sector, and I’ve been able to bring a digital humanities perspective to my museum work. (I also discovered that one of my DCHDC friends was living upstairs from me. :))”
  • Personal/emotional support in career pathways: “The breadth of my knowledge has been expanded. I’ve made friends that have helped me emotionally through some hard career-related stuff. DCHDC has also helped me maintain consistent relationships with key people in the community, and I truly believe this helped me get my last job.”

Common Requests for improving the Meetup:

Along with understanding what people were getting out of DCHDC, I was also interested to learn a bit about how to improve the event.

  • Shorter talks: Originally the idea was to do 5 minute lightning talks, but over time they have become longer. So we decided to shift back to short talks with a quick bit of time for Q&A.

  • Further planned out schedule: This is an entirely volunteer run and organized event series, so planning is a bit of a challenge. That said, if we can get better at lining up a schedule then folks can make sure they plan on coming to weeks that are of particular interest. I think it will also help to bring new folks into the fold who might be drawn in by a particular topic.

  • Recaps/notes/links from talks shared online: This was a request that came through from several people. I don’t have the bandwidth for it, but if anyone wanted to take on something like this it would be welcome and appreciated.

Example Suggestions from Survey: Below are some examples from the survey responses of particular individual requests.

  • “I think it’s great as-is! I’m happy with whatever meeting time/place. earlier was mentioned last time like 6:00 and that would be great too.”

  • “Back to the short talk format. An hour lecture that starts after 7:30 pushes the event FAR TOO LATE, and removes the opportunities for networking and socializing that the above questions address. 20 minutes socializing, 20 minutes max presentation (including Q&A), a few minutes for announcements, and 20 minutes to whenever for after-socializing and closer convo with presenters would make it much more valuable.”

  • “At the beginning dchdc had *very* short presentations bookended by plenty of time to meet people and have free-ranging discussions. It seems that over time the presentations have gotten longer and more dependent on power point. Since not every presentation is relevant to everyone in the community, some might be less likely to attend based on topic, whereas before they might come just for the excellent company.”

  • “Be clearer on the MeetUp about time to socialize vs. presentation time, so that people who want to chat know they either need to come a little early or stay later. When the only information is the time and who’s talking, it makes it seem like the talk is at that time or just shortly after.”

  • “Scheduling or at least soliciting ideas for presentations a bit further in advance could be a good step. A formal call for ideas/volunteers could help bring some new faces and organizations to the fore. That said, I really have no idea how scheduling works and don’t want to mess up a good thing.”

  • “Given the locations in which we’ve met over the past couple of years, a consistent audio-visual/computer setup is key. Ad hoc talks are valuable but are almost always enhanced with graphic examples.”

  • “A Facebook group or some other way to share information about jobs, events, etc in between meetings would be a good supplement.l, especially for when we can’t make it to meetings in person every time.”

Distributing Credit for DCHDC

I should note that while I’ve been one of the co-organizers of the group since the beginning, I have not been one of the folks who have really carried the water on this. At various points I’ve missed big chunks of the meetups when I have had to teach classes that meet on the same night. On that front, Bill Lefurgy gets credit for scheduling and running the events for most of our run so far. This is a torch which has recently largely been passed to Atiba Pertilla. There are also several other folks who have been involved almost all the time and stepped up to run events at various points; I’m thinking of Jennifer Serventi and Patrick Murray-John. There are probably about 5-7 more folks I could list out here, but this is just to say that I think the strongest part of the group comes from a core set of folks that are incredibly generous with their time and welcoming to anyone and everyone who we can encourage to participate.

Going forward

The survey largely confirmed the things I hear from lots of folks about what is valuable and useful about this group. I don’t think we had any clear expectations of what this would be or how long it would run when we launched it. But here we are, four years out, having moved between three different venues and still going strong. I’m personally very excited to see how this keeps going into the future and always interested in talking more with folks about how it can be improved/enhanced. I’m also happy to talk with anyone who might want to set up similar meetups in other places.

The Insights Interviews: First Person Perspectives on Ensuring Long Term Access to Our Digital Heritage

Back when I was working for the Library of Congress I did, and helped coordinate, a ton of interviews with practitioners and thinkers working in digital preservation for the National Digital Stewardship Alliance’s innovation working group. At one point, there was discussion of making a book out of the then 33 interviews. As with many ideas, it stalled out at some point. In any event, I worked up an intro for that and a table of contents at one point. So I figured I would just post that here, as I think it makes the interviews a bit easier to navigate. Together they form a whole that is, I think, more useful than just looking at them as part of a serial publication. For context, I wrote the intro below in 2014 and the interviews range from 2011-2014. To the initial draft I also added a set of 9 additional fantastic interviews that Julia Fernandez did as a Junior Fellow focused on understanding, documenting and preserving digital culture. I also added in links to guest posts that Sharon Leon and Mackenzie Smith wrote about approaches to developing open source software; these are slightly different in that they predate the focus on interviews as an approach.

I don’t claim the credit for this massive amount of work. A ton of people did a lot of work on running, planning and coordinating these. Off the top of my head, Jane Mandelbaum, Martha Anderson, Abbey Potter, Erin Engle, Butch Lazorchak, Jefferson Bailey, Lori Emerson, Julia Fernandez, Ricky Padilla, and Barbra Taranto come to mind as people who did significant work in running or coordinating interviews. I know there are many others from the NDSA innovation working group who contributed to doing these as well.

The Insights Interviews: First Person Perspectives on Ensuring Long Term Access to Our Digital Heritage

Innovation can be a terrible buzzword. It can be a stand in for flavors of the month, and trendy ideas on the upswing of this year’s hype scale. With that said, it remains a critical concept, particularly in a field like digital preservation, where the idea of even keeping up with the scale and deluge of digital media, along with an ever changing series of new forms, formats, tools and platforms, is often dizzying and overwhelming. In some of my first conversations with Jane Mandelbaum, the Library of Congress co-chair for the National Digital Stewardship Alliance Innovation working group, we struck on the idea of focusing on how exactly people are making it work.

Across a range of disciplines and areas, an amazing set of professionals have emerged to ensure long term access to digital information and are doing amazing things with that information. When we did our first interviews for the then new Signal digital preservation blog I had no idea how useful and valuable many of them would become as touchstones for our field. Some of these interviews were topical and primarily of interest in the moment, but many of them share important and profound insights (the term the then Director of NDIIPP Martha Anderson suggested for the series). When an NDIIPP colleague approached me about helping to shape a volume out of the best of these interviews I thought it was a great idea. In reflecting on them, I think there are four particular cross cutting reasons that these interviews, organized as they are here, are particularly useful for emerging and established professionals in and around the work of digital stewardship.

First Person Perspectives from an Emerging Interdisciplinary Field

Everyone in this volume has launched or established a career in this new and interdisciplinary field of work. As digital technologies reshape work across every sector, ensuring long term access to information now touches on nearly every one of them. Our education and training systems are responding to these changes, but beyond that, it is invaluable to use these interviews as a point of entry into the work of individuals in this field. In this respect, every interview here is an opportunity to understand someone’s career trajectory and in many cases a chance to gain insight into the skills and knowledge required to take on the hybrid roles that many of these innovative individuals are engaged in. As such, the collection is of particular interest to young professionals and students looking to establish the course for their careers.

Practical Dispatches from the Front Lines of Digital Stewardship

A considerable amount of ink and pixels has been spilt over the theory of digital stewardship. Models, frameworks and certification criteria abound. These are great resources, but given the rapid pace at which technologies and systems are evolving, understanding how individuals are working to ensure long term access to digital information provides insight into how people on the ground are actually making this work happen. In this respect, each interview in this volume is an illustration of how theory comes into practice. Each interview provides a firsthand frontline narrative of how the models and frameworks of the field are calibrated into the messy realities of resource constraints and practical limitations of the world.

A Cross Section of Work and Issues Involved in Digital Stewardship

The interviews span big picture strategy, perspectives of content specialists, exploration of issues in the design and maintenance of infrastructure and systems, and the needs and desires of researchers, scholars and other end users. While the interviews were not done to create a comprehensive picture of the field, they have accrued over time, and when we set about sorting out the best of them into different buckets I was thrilled to see how well they covered the waterfront of collecting, organizing, preserving and providing access to digital information.

Disaggregating Digital Stewardship and Preservation

It’s not as tidy as it would be if this all hung together from a single perspective. There is a lot of messiness in the different objectives, frameworks and perspectives which different participants come from. That is something which I think is a particular strength of the volume. The rhetoric of the digital often makes it seem like we should be moving into the clean lines and clear cut universe of a science of digital stewardship. But when we zoom in to the work at each layer of the infrastructure for digital stewardship we are building, it becomes evident that the same professional values and approaches that made for idiosyncratic visions of preservation and access in our analog past are just as present in our digital future. Digital stewards engage in their work toward differing objectives through differing means. For instance, considerations about the authenticity of digital artworks are not the same as concerns about the authenticity of electronic records.

Table of Contents

Chapter One: Digital Strategy

  1. Digital Strategy Catches up With the Present: An Interview with Smithsonian’s Michael Edson, August 9, 2012
  2. Open Source Software and Digital Preservation: An Interview with Bram van der Werf of the Open Planets Foundation, April 4, 2012
  3. Solving Problems and Saving Bits: An Interview with Jason Scott, August 20, 2013
  4. Digital Humanities Connections to Digital Preservation: Interview with Brett Bobley of the Office for Digital Humanities at the NEH, October 11, 2011

Chapter Two: Understanding Digital Objects

  1. BitCurator’s Open Source Approach: An Interview With Cal Lee, December 2, 2013
  2. What’s a Nice English Professor Like You Doing in a Place Like This: An Interview With Matthew Kirschenbaum, August 12, 2013
  3. Media Archaeology and Digital Stewardship: An interview with Lori Emerson, October 11, 2012
  4. Archives, Materiality and the “Agency of the Machine”: An Interview with Wolfgang Ernst, February 8, 2013
  5. Historicizing the Digital for Digital Preservation Education: An Interview with Alison Langmead and Brian Beaton, May 6, 2013

Chapter Three: The Curator’s View

  1. Web Archiving and Mainstreaming Special Collections: The Case of the Latin American Government Documents Archive, June 6, 2012
  2. Crossing the River: An Interview With W. Walker Sampson of the Mississippi Department of Archives and History, December 9, 2013
  3. ArtBase and the Conservation and Exhibition of Born Digital Art: An Interview with Ben Fino-Radin, May 1, 2012
  4. Exhibiting Video Games: An interview with Smithsonian’s Georgina Goodlander, September 25, 2012
  5. “The Digital Data Backbone for the Study of Historical Places”: An Interview with Matt Knutzen of the New York Public Library, February 27, 2013
  6. Challenges in the Curation of Time Based Media Art: An Interview with Michael Mansfield, April 9, 2013
  7. Insights Interview with Beverly Emmons, Lighting Design Preservation Innovator, February 10, 2012
  8. Born Digital Archival Materials at NYPL: An Interview with Donald Mennerich, April 22, 2013
  9. Curating Extragalactic Distances: An interview with Karl Nilsen & Robin Dasler, August 18, 2014

Chapter Four: Designing Infrastructures

  1. Engineering Digital Preservation: Interview with David Rosenthal, June 15, 2011
  2. Lessons Learned for Sustainable Open Source Software for Libraries, Archives and Museums, September 15, 2011 (From Mackenzie Smith)
  3. Hydra’s Open Source Approach: An Interview with Tom Cramer, May 13, 2013
  4. Digital Stewardship and the Digital Public Library of America’s Approach: An Interview with Emily Gore, October 28, 2013
  5. The Foundations of Emulation as a Service: An Interview with Dirk von Suchodoletz, December 11, 2012
  6. WWI Linked Open Data: An Interview with Thea Lindquist, July 29, 2013
  7. Toward a Library of Virtual Machines: Insights interview with Vasanth Bala and Mahadev Satyanarayanan, September 21, 2011
  8. Imagine What We’ll Know This Time Next Week: An Interview with Bailey Smith and Anne Wootton of Pop Up Archive, December 6, 2012

Chapter Five: Working with the Public

  1. Crowdsourcing the Civil War: Insights Interview with Nicole Saylor, December 6, 2011
  2. Understanding User Generated Tags for Digital Collections: An Interview with Jennifer Golbeck, May 1, 2013
  3. Galleries, Libraries, Archives, Museums with Wikipedia (GLAM-Wiki): Insights Interview with Lori Phillips, April 20, 2012
  4. The Metadata Games Crowdsourcing Toolset for Libraries & Archives: An Interview with Mary Flanagan, April 3, 2013

Chapter Six: Scholar and Researcher Perspectives

  1. Quest for the Critical E-dition: An interview with Leonardo Flores, March 20, 2013
  2. Machine Scale Analysis of Digital Collections: An Interview with Lisa Green of Common Crawl, January 29, 2014
  3. Sharing, Theft, and Creativity: deviantART’s Share Wars and How an Online Arts Community Thinks About Their Work, September 17, 2012
  4. Astronomical Data & Astronomical Digital Stewardship: Interview with Elizabeth Griffin, October 8, 2014

Chapter Seven: The Digital Vernacular and Digital Folklore

  1. Born Digital Folklore and the Vernacular Web: An Interview with Robert Glenn Howard, February 22, 2013
  2. Understanding Folk Culture in the Digital Age: An interview with Folklorist Trevor J. Blank, June 30, 2014
  3. LOLCats and Libraries: A Conversation with Internet Librarian Amanda Brennan, July 14, 2014
  4. Understanding the Participatory Culture of the Web: An Interview with Henry Jenkins, July 24, 2014
  5. Computational Linguistics & Social Media Data: An Interview with Bryan Routledge, August 1, 2014
  6. Networked Youth Culture Beyond Digital Natives: An Interview With danah boyd, August 11, 2014
  7. Netnography and Digital Records: An Interview with Robert Kozinets, August 13, 2014
  8. Research is Magic: An Interview with Ethnographers Jason Nguyen & Kurt Baer, August 15, 2014
  9. Studying, Teaching and Publishing on YouTube: An Interview with Alexandra Juhasz, September 5, 2014
  10. Archiving from the Bottom Up: A Conversation with Howard Besser, October 10, 2014



Huge thanks to everyone who shared ideas about what to include in my upcoming digital art curation grad seminar. I’ve decided to use the same course blog that I’ve been using for my digital history seminars, so if you haven’t already, you can tune in to what we will work on at

I’ve embedded a copy of the draft syllabus below and I think I have more or less all of the readings in this Zotero collection.

Curation and Conservation of Digital Art Syllabus

Getting Out There: 2015 in Review

Showing off a red velvet cupcake with the POTUS seal on it at the White House.

Another year. Another chance to do a quick look back and make sense of what I’ve been doing and where I think it’s taking me. As I did in 2012, 2013, and 2014, I am taking a few minutes to try to sift and categorize. So if you are interested in a recap of things I’ve done this year, this post is for you; if not, I imagine you have already decided to stop reading.

Looking back, I feel like the move from the Library of Congress to IMLS has been a huge chance to better connect with and learn from the field. While NDIIPP was always outward facing, it was still inside an institution that acts as such a center of gravity that it was challenging to really be out there. In contrast, as the core role of IMLS is to serve and support libraries and museums across the nation, it has been exhilarating and rewarding to be out in these communities much more.

Dropping the “IIP”: From NDIIPP to NDP

Presenting a framework for the National Digital Platform at the IMLS Focus convening at DCPL’s MLK library.

I started the New Year with a new job. I left NDIIPP to “head National Digital Platform responsibilities across programs” at the Institute of Museum and Library Services. As NDIIPP stood for the National Digital Information Infrastructure and Preservation Program, I smile a bit thinking that even though I was changing jobs I was keeping the first two words of the program and dropping two “i’s” and a p.

My last day on the job at the Library of Congress was New Year’s Eve 2014. In the four and a half years I spent at the Library of Congress I had amazing opportunities to work and learn, and made a lot of friends and colleagues I know I will have for the rest of my life. With that said, it was just impossible for me to pass up the chance to be a part of the emerging National Digital Platform work at IMLS.

IMLS and I go back. When I started working at the Center for History and New Media in 2006 my job was, in part, funded by an IMLS national leadership grant. You can see a bit of what we were up to in this interview I did for the IMLS blog in 2007. Over the years I’ve given talks at the IMLS WebWise conferences and had the opportunity to review for the agency. In all those interactions, I was consistently impressed by all that the IMLS team could accomplish.

In my first year at IMLS, I’ve had the chance to co-develop and publish a vision for the priority, shape a convening and the resulting report on the National Digital Platform priority, and support IMLS investing nearly ten million dollars in more than a dozen grants and cooperative agreements. As a push for transparency, I’m also thrilled that we were able to publish both the first and second rounds of funded project proposals online.

Through all of this, I have been so lucky for guidance and leadership from my boss Maura Marx and the insights of my colleague and constant collaborator Emily Reynolds. Along with that, I’m thrilled to find myself surrounded by the dedicated and exceptional staff of the Office of Library Services and the rest of the agency. The experience has confirmed what I’d always imagined, that I really like helping people think through and refine ideas for their projects and work and thinking about how different areas of research and practice connect and add up to more than the sum of their parts. I can’t imagine any other place where I could get to do exactly that kind of work and help all kinds of libraries across the country keep advancing in the 21st century.

Teaching Digital History & Digital Curation

Outside the office, I was thrilled to be able to continue teaching. I was able to teach a digital history graduate seminar in the Public History program at American University and as a special topics course for the University of Maryland’s iSchool’s digital curation program. I was totally impressed by what my students were able to do on their projects over the course of a semester. I also started developing a digital art curation and conservation course, which I will be teaching at UMD in the Spring.

Digital History & Preservation: Research, Writing & Speaking

A stack of the author copies I received after my book, Designing Online Communities, was published at the beginning of the year.

My book, Designing Online Communities, dropped! And some super smart people claim I said some smart things in it. Along with that, I wrote about the history of transparent GIFs in web archives, and about the implications of distant reading for developing digital infrastructures to support computational humanities scholarship.

My essay Zombies on Flickr, Lego, Handcraft, and Costumed Zombies: What Zombies do on Flickr, was published in New Directions in Folklore. An article I contributed to exploring learning in makerspaces was published in the Harvard Educational Review. I reviewed Preserving Complex Objects for the Journal of Academic Librarianship. I drafted an essay titled Digital Sources & Digital Archives: The Evidentiary Basis of Digital History for a forthcoming Companion to Digital History.

The big talk this year was People, communities and platforms: Digital cultural heritage and the web at the National Digital Forum in New Zealand. Aside from that, I planned and ran a daylong workshop on Roles & Responsibilities for Sustaining Open Source Platforms & Tools at the International Digital Preservation conference.

Love is… …accepting he’s a zombie, featured in my Flickr Zombies article. by _Matn.

I gave a lot of shorter talks about the National Digital Platform priority at a range of conferences, including Linked Data for Libraries and Museum Computer Network. Along with that, I wrote up some of my takeaways from five conferences I participated in as posts for the IMLS Blog.

These included,

All told, it has been a really great year.


Below is a draft of an essay I am contributing to a forthcoming book titled A Companion to Digital History. I have permission to share drafts on my personal website, so I thought it would be good to get this up and out there 1) for folks to be able to read it and 2) to see if I could get any substantive commentary and discussion about it to help me revise it. If you would like, you can comment directly on the draft in this Google Doc.

Digital Sources & Digital Archives: The Evidentiary Basis of Digital History

In an early draft of my undergraduate thesis I wrote that a source “spoke for itself.” My advisor crossed that out and wrote in the margin something like “sources almost never speak for themselves, you have to explicate what the source means for your argument and justify your interpretation.” I imagine this sort of experience is how many individuals learn the ropes of historical research and writing. The task of the historian is to interpret sources.

The world is full of objects, archives, records and texts which historians can study and interrogate to develop and refine our understanding of the past. These are the primary sources of history: materials, relics, and texts that testify to and provide traces of the past. Almost anything could be a primary source. The rings of a tree testify to weather conditions and changes in climate. Probate records document the material goods individuals held at the end of their lives. Court proceedings offer insight into the experiences of the oppressed through the moments they are dragged in front of the justice systems that control and marginalize them[1]. Just as any kind of physical object might serve as a source, as society increasingly produces digital relics, documents, artifacts and other objects, the evidentiary basis of history will become increasingly digital.

While things like the rings of a tree have their own value as historical sources, the bulk of historical work continues to be anchored in archives. Historians’ ability to study the past is largely and directly indebted to archivists and the range of individuals involved in the production and management of historical records. Archives come in all shapes and sizes: massive federal agencies, small local historical societies, and manuscript collections at research libraries, to name a few examples. The same digital shift occurring in sources is occurring in archives.

At this point, historians have access to an ever-expanding wealth of digitized versions, or digital surrogates, of a selection of primary sources through online collections. At the same time, an explosion of born-digital materials is being produced and collected at unprecedented scale (websites, the contents of hard drives, collections of emails, digital video and photos, etc.). While these new forms of sources are emerging, so too are notions of digital archives. Organizations like the Internet Archive, and projects like the September 11th Digital Archive and the Rossetti Digital Archive, have emerged with the archive name attached. However, each of these varieties of digital archives represents a somewhat different vision of the nature of the concept of an archive.

So, what happens to history when the basis of its sources and evidence becomes increasingly digital? Similarly, what happens to history when its archives become digital? Backing up a bit, given how the very form of archives as an institution is anchored in the management of paper documents, what does it even mean to have a “digital archive”? What follows is an attempt to identify and discuss issues in the evidentiary basis of history that arise as the materials and systems that manage those materials become digital. In looking at different kinds of sources and archives I work to suggest practical advice on the kinds of issues and questions one should ask when working to interpret, to find out what one can say, based on digital sources and digital archives.

What are Digital Sources?

When you hold a letter in your hand and read the words on it you can imagine what it was like when the recipient of that letter held it in their hands in the past. As an interpreter of the record, you can think about what it must have been like to receive it and follow a chain of correspondence to understand the exchange of thoughts and ideas. How does this interaction change when you have a digitized copy of a letter? Similarly, how does it change when you are looking at the text of an e-mail message?  

Making sense of a source and making a defensible inference based on the content of a source requires context. That is, knowing a letter was sent from one individual to another and that you found it in the papers of the recipient, you can likely infer that it represents a perspective that the author wanted to communicate to the recipient, and you likely have reason to assume that the recipient read it. In contrast, if a historian of the future had access to an archived copy of my Gmail account they would need to know a bit about many of the automated rules I’ve set up that “mark as read” emails from a range of individuals and organizations and in some cases are set to “skip my inbox” entirely. Without knowing about those rules, one could end up making all kinds of problematic inferences about what I had or had not read based on what was in my email. Understanding my email thus requires an understanding of how people like me used email at a particular period of time and the set of features and functionality that different email clients came with.

As Martha Howell and Walter Prevenier explain in their introduction to historical analysis of sources, “to make wise choices among potential sources, historians must thus consider the ways a given source was created, why and how it was preserved, and why it has been stored in an archive, museum, library or any such research site.[2]” The same kinds of questions need to be asked of digital sources. This is particularly challenging given that the pace of change in the mediums and context of communications technologies seems to continue to accelerate. Historians need to develop an understanding of digital source criticism and provenance.

Digital Source Criticism & Provenance

Given the range of digital sources and the complexities of their production and use, the future of historiography will require a good bit of work in digital source criticism.[3] German historian Johann Gustav Droysen’s 1867 book Outline of the Principles of History explains the concept and importance of source criticism as a part of historical practice: “The task of Criticism is to determine what relation the material still before us bears to the acts of will whereof it testifies. The forms of the criticism are determined by the relation which the material to be investigated bears to those acts of will which gave it shape.[4]” That is, a key part of historical research and writing involves not simply identifying sources of history but working to understand the context in which they were produced.

To this end, working with digital sources prompts the historian to ask the same kinds of questions they have long asked of sources. What is a source’s provenance? How was it created and stored? Why does it persist today? These kinds of questions are essential for interpreting a source. This is not simply an issue for those studying society after the advent of computing technology. There are a range of key source criticism questions one should ask of both digitized primary sources and born digital sources. What follows is an exploration of some of the key issues for consideration related to both of these kinds of sources.

Digitized Primary Sources

For anyone studying the world before the emergence of digital media the primary role that digital media will play is as a transmitter of digital surrogates. Libraries, archives and museums have now been actively digitizing sources for thirty years and the result is that one can find millions of digital surrogates of books, maps, photographs and manuscripts in a range of online digital collections. In working from these sources there are a few critical questions to ask from the perspective of provenance and source criticism.

Why was this digitized and not something else?

It has always been important for historians to ask why a particular source has been preserved. It is critical to think through why we have access to some kinds of sources and not others, and this is a key part of that reasoning exercise. The same kinds of selection questions need to be asked of any digitized source.

In some cases, archives have digitized full runs of materials; in other cases they have digitized highlights or selections. Generally, libraries, archives and museums have only digitized a sliver of their entire holdings. It’s not enough to find a source; one must be able to contextualize it and understand why one has it at hand. As such, it’s important to think through the limits that an organization’s digitization policies place on the inferences one can make from a source.

For example, because of copyright restrictions many institutions in the United States are focusing efforts on digitizing materials from before 1923. Similarly, an archive might have the rights cleared to digitize one particular collection, or the writings of one person instead of another. In each of these cases, if one wants to work primarily from digitized materials it is critical to think through how the selection policies for what was digitized can shape and limit one’s ability to make inferences based on those materials.

Is this copy of sufficient quality for my purpose?

All digitized objects are surrogates for the originals. That’s fine. Historians have a long tradition of working from surrogates. In many cases, the only access historians have to extant historical materials is through copies of reprintings, and copies of copies created through the manuscript tradition. Similarly, when microfilm technology developed in the 1930s historians were thrilled with the prospect of reproductions of sources. Public historian Ian Tyrrell used the same rhetoric often used regarding digitization and the web to describe microfilm in the 30s. In his words, microfilm “democratized access to primary sources by the 1960s and so put a premium on original research and monographic approaches.[5]” The reproduction of sources played a key part in historians’ increased focus on working from primary sources. In this vein, it’s worth remembering that the development of the technologies that provide access to sources will continue to play a role in shaping the norms and expectations of the composition of history. So, surrogates are nothing new; in many ways they are the norm for many areas of historical practice. With that said, it’s always critical to ask if the surrogate is good enough for the questions a historian is asking.

Historians often want to do straightforward things with a source. So if one wants to be able to say an individual wrote a particular thing in a particular document, then being able to make out the words in a digitized copy is likely enough. In this case, it is worth differentiating the informational qualities of a source from its artifactual qualities[6]. The informational qualities of a source are generally the words inscribed on it. The artifactual qualities of a source can consist of any number of different features one might study. As historians have become increasingly interested in sources as part of material culture, the need to consider artifactual qualities has become increasingly important. Every physical object contains a nearly infinite amount of information in its artifactual qualities. For example, beyond the legibility of words on an object, characteristics of handwriting, fingerprints, watermarks, the chemical composition of inks or of paper or vellum can all be interrogated to provide valuable information. All of that information is anchored in the artifactual qualities of the source.

As an example, you can find some rather ugly looking, but for the most part legible, copies of Hamlet in Early English Books Online. They are black and white images created from scans of old microfilm. You can also find much nicer looking copies of the same work in the Folger Shakespeare Library’s online collections. If what you care about is the text of the work, you are mostly fine in either case. With that said, researchers have used high quality full color scans, like those Folger provides, to study the placement of dirt on the margins of the page. The dirt on the pages, which comes from people handling the books, attests to the use of the books over time. That is, there are material traces of use of the books left on them that can be studied. Most interestingly, these traces can actually only be studied when high quality scans of the book are created. That is, aspects of the source only become available for analysis through the production of a very high quality digital surrogate. To that end, the better the quality of the scans, the more potential there is to examine traces of other physical properties of a source[7]. The question for someone working from a digitized surrogate of a source is thus: are the significant properties of the source necessary for the sorts of questions you are interested in asking present? Similarly, it is important to consider how some aspect of the quality of a source might be obfuscated in how it was digitized or provided.

How did I find it and how does that affect what I can say about it?

At this point one can visit the Library of Congress, the Digital Public Library of America, Europeana or Google Books on the web and plug in some obscure search terms and find digital surrogates of records, artifacts and a variety of other primary sources. This is amazing. You can find things that you would never have been able to find before[8]. Searching across millions of sources at once is transforming many historians’ methods for research and scholarship[9]. At the same time, full text search presents a whole new set of challenges for reasoning from and interpreting sources.

Where in the past one would develop an explicit strategy to explore a given collection or archive, or to systematically look at all the newspapers from a given date range, search encourages researchers to stumble around and find something that looks interesting. This is all fine if all one wants to do is make an existence proof argument. That is, if one just wants to make the case that something was said at a particular point in time. However, this is a rather low bar for historical argumentation. The extent to which something is representative of a particular moment in time, or a particular community or place is tied explicitly to a range of contextual questions.

To be able to make broader claims based on a given source it is important to work to contextualize it after it is discovered through search. Feel free to search for idiosyncratic terms and, as Stephen Ramsay suggests, to “screw around” in searching through digitized sources. However, it then becomes necessary to do the legwork required to understand the original context from which that source emerged and to think through the limitations that come from why that source was digitized and not something else. To do this, it is necessary to work backward from a digitized source to understand where it came from and the extent to which it is or isn’t representative of the collection it comes from.

Born Digital Sources

Born digital is the rather clumsy term we have to discuss sources that started off digital: email messages, digital photographs, websites, databases, etc. Going forward, the bulk of the primary sources historians will work with to understand the world in the 21st century are going to be things that started off digital. This is not to suggest that we will ever get away from paper sources, but it is to note that much of that paper source material will have started out as digital as well. In those cases, the paper will often be a surrogate for the digital. While archivists and historians are still only just figuring out how to collect, preserve and provide access to born digital primary sources, there is already an emerging set of key questions to ask of such sources. What follows is an initial exploration of some key source criticism questions to ask of born digital sources.

What are you not seeing on the screen?

When working with digital objects it’s essential to remember that what they look like on the screen is a performance[10]. The actual digital object is a sequence of markings registered on a medium. Hard drives, CDs, flash drives, etc. are all things that register sequences of markings (bits) that are read by software to show up on a computer screen. In any digital file and any digital file system there is additional encoded information that one could be looking at and reading.

In contrast to looking at a handwritten letter, where you can see how hard someone pressed and get a feel for their handwriting, when you look at an email message on a screen all you see is the words. However, if you poke around in the email headers, or in the metadata associated with a message, you can find a wealth of information that isn’t rendered on the screen. New media scholar Nick Montfort has dubbed the focus on what things look like on the screen “screen essentialism,” and a growing body of work is emerging to provide basic tools and approaches for getting beyond simply taking things as they appear[11]. Two examples of working with particular primary sources will help underscore what historians have to gain by getting beyond screen essentialism.
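Before turning to those examples, it may help to see how little effort it takes to look past the rendered message. What follows is a minimal sketch, using Python’s standard email module and an invented message. The addresses, server names, message ID and X-Mailer value are all made up for illustration, and the list of headers that a mail client “usually shows” is an assumption rather than a rule.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# An invented message; every name, address and header value is hypothetical.
RAW_MESSAGE = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby mx.example.net with ESMTP id abc123;
\tTue, 03 Mar 2015 09:15:02 -0500
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.net>
Subject: Draft of chapter two
Date: Tue, 03 Mar 2015 09:14:58 -0500
Message-ID: <20150303141458.1234@example.org>
X-Mailer: ExampleMail 4.2

Here is the draft. Talk soon.
"""

# Assumption: the headers a typical mail client renders on the screen.
USUALLY_SHOWN = {"from", "to", "subject", "date"}


def hidden_headers(message):
    """Return (name, value) pairs for headers a reader would not normally see."""
    return [(name, value) for name, value in message.items()
            if name.lower() not in USUALLY_SHOWN]


message = message_from_string(RAW_MESSAGE)
sent_at = parsedate_to_datetime(message["Date"])  # timezone-aware datetime

if __name__ == "__main__":
    print("Rendered on screen:", message.get_payload().strip())
    print("Sent at:", sent_at.isoformat())
    for name, value in hidden_headers(message):
        print(f"Hidden header {name}: {value!r}")
```

In this made-up case the hidden headers record which servers relayed the message, when each of them handled it, and what software composed it, none of which appears in the body a reader sees.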

When curator Doug Reside first opened a file he found on a floppy disk in playwright Jonathan Larson’s papers at the Library of Congress he must have been shocked. Right there on the screen was a different set of text for a famous song from one of the musicals Larson had created[12]. What was it that he was looking at? Was this an alternative version of the song? As Reside dug deeper, and came to understand the nature of the word processing software that Larson had used and the software that Reside was using to render the text, he came to understand exactly what had happened. The word processing software that Larson had used would save a record of changes in the text inside the file. So an individual word-processing file would actually contain a record of the edits to a file over time.

The only way Reside could interpret what he saw on the screen was to learn a bit more about the software that was used to write it and the software he was using to render it. Ultimately, this is a rather fascinating result; works written in this particular word-processing application have within them records of their creation and editing.

The implications of this kind of work extend beyond the structure of individual files. In working to understand the material properties of digital objects, digital humanities scholar Matthew Kirschenbaum opened up a ROM (a copy of a floppy disk) in a Hex editor[13]. This ROM had a copy of an early video game called Mystery House. A Hex editor renders, in hexadecimal notation, a recording of each byte on the medium. So the Hex editor showed how the information in the ROM was laid out on the original floppy disk it was saved on. As he explored the disk he found something intriguing: a sequence of text that did not appear in the game he was studying. What had he found? Was this hidden text in the game that wasn’t used? After googling the text he was able to identify that it came from a completely different game. From this, he was able to infer that the disk the ROM had been created from had once held a copy of the other game, which had then been overwritten by the second game. Kirschenbaum downloaded a copy of that other game and was able to figure out what had been on the original disk before the game was saved on it.
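To give a sense of what that view looks like, here is a minimal sketch of the two things a Hex editor makes visible: each byte written out in hexadecimal, and any runs of readable text sitting among those bytes. It is not the tool Kirschenbaum used, and the sample bytes are invented for illustration; the function names are likewise hypothetical.

```python
import re


def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hexadecimal values, and printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as "." in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Find runs of printable ASCII at least min_len characters long."""
    pattern = re.compile(rb"[ -~]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Invented bytes standing in for part of a disk image.
SAMPLE = b"\x00\x01CURRENT GAME TEXT\x00\xff\xfeab\x00LEFTOVER TEXT\x00"

if __name__ == "__main__":
    print(hex_dump(SAMPLE))
    print(printable_strings(SAMPLE))
```

Run against a real disk image, the second function would surface text regardless of whether any program on the disk still uses it, which is the kind of unexpected find described above.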

Understanding how this happened requires background on how floppy disks and hard drives function. When a file is deleted it generally isn’t really deleted. Instead, the computer marks the space where the file is stored as available to be overwritten. The result is that if you poke around in what is actually written on a disk you will find that all sorts of areas the operating system reports as empty actually contain readable information. As a result, as archives increasingly begin accessioning this kind of born digital material they are deciding whether to create forensic copies of this kind of media (that is, copies that contain all of that information, including information that is hidden from the user) or logical copies of disks and drives that contain only what the operating system thinks is there. In either event, this suggests a whole new set of skills for interpreting primary sources that historians are going to need to become adept in. When working with born digital sources it is important to understand them beyond what they look like on the screen. It is critical to move past the performance of a file or a file system and to understand the additional information that may not be immediately revealed. The performance of digital content similarly opens a set of questions about the set of technologies used to interpret it.
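The difference between a forensic copy and a logical copy can be sketched with a toy model. The class below is a deliberate simplification and not how any real file system is implemented: it assumes fixed-size blocks and a simple table mapping file names to blocks, and every name in it is hypothetical. What it does show is that deleting a file only removes the table entry, so the old bytes remain on the medium until something overwrites them.

```python
class ToyDisk:
    """A deliberately simplified disk: fixed-size blocks plus a file table."""

    def __init__(self, n_blocks, block_size=16):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(n_blocks)]  # the medium
        self.table = {}  # file name -> (block numbers, length in bytes)

    def _free_blocks(self):
        used = {i for numbers, _ in self.table.values() for i in numbers}
        return [i for i in range(len(self.blocks)) if i not in used]

    def write(self, name, data):
        size = self.block_size
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        free = self._free_blocks()
        if len(chunks) > len(free):
            raise OSError("toy disk is full")
        numbers = free[:len(chunks)]
        for number, chunk in zip(numbers, chunks):
            self.blocks[number] = chunk.ljust(size, b"\x00")
        self.table[name] = (numbers, len(data))

    def delete(self, name):
        # Only the table entry goes away; the bytes stay where they were.
        del self.table[name]

    def logical_copy(self):
        """What the operating system says is there: the listed files only."""
        return {name: b"".join(self.blocks[i] for i in numbers)[:length]
                for name, (numbers, length) in self.table.items()}

    def forensic_copy(self):
        """Every block on the medium, whether or not any file claims it."""
        return b"".join(self.blocks)


disk = ToyDisk(n_blocks=4)
disk.write("first.txt", b"FIRST GAME: you are in a kitchen")
disk.delete("first.txt")
disk.write("second.txt", b"SECOND GAME")

if __name__ == "__main__":
    print(disk.logical_copy())
    print(disk.forensic_copy())
```

Here the logical copy contains only second.txt, while the forensic copy still carries part of the deleted first file in a block that the second file never overwrote.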

What is lost in how it was/is rendered?

When files are rendered on a computer screen a user witnesses something akin to the performance of a play. The underlying data in a file is interpreted and rendered through software for a user to interact with in much the same way that the script of a play is interpreted and performed by a cast on a stage. In each case, while the underlying script or file remains the same, a given performance of a file or a play is going to look and sound different. For some kinds of research questions those differences do not matter; however, it is necessary in either case to be aware of them.

Archived websites offer a key case to explore how this plays out in the interpretation of a born digital primary source. At this point, many organizations are using a range of different tools to archive websites. They use a few different kinds of tools to harvest copies of the content that was available at a particular URL at a given moment and then use another set of tools to render that content for you to view. For example, you can go to the Internet Archive and type in the URL for the Library of Congress website, and you will find an interface that lets you see what the homepage of the site looked like at the different points in time when the Internet Archive saved a copy of it. With that said, it is important to realize that when you look at a copy of the site in the Internet Archive’s Wayback Machine you are not really seeing what the site looked like at that point in time, because a range of characteristics of the way the site looked then are not being replicated.

One views a website through a web browser, and any given browser will render things slightly differently. This is particularly true for older sites. Similarly, websites from ten or twenty years ago were designed for computers that had lower screen resolutions, different processors and different operating systems. Each intermediary layer of software (the browser, the operating system, etc.) and the implied assumptions about computer hardware baked into that software (screen resolution, processor speed, etc.) function as part of the sequence of interpreters that perform a webpage.

When asking questions about what is lost in how a digital object or set of digital objects is rendered it is important to recognize that some elements are more susceptible to problems than others. The distinctions between the informational and artifactual elements of sources previously discussed are similarly relevant in this context. For example, if all one is focused on is how something was written in text on a page, in most cases how it is rendered isn’t likely to be too much of a problem. However, in cases like the presentation of digital art created for the web, or in situations where the aesthetics, design and user experience of a web page matter, it is very likely that issues in how something is rendered will play a significant role in one’s ability to interpret it[14].

How was this created, managed and used and how does that impact what one can say about it?

To be able to accurately interpret a source it is essential to understand the context in which it was created, managed and used. This is particularly challenging in the context of born digital source materials, as there is a rapid and continual churn in the underlying technology and formats that interact with shifting behaviors and social contexts for interpreting the meaning of those behaviors.

As an example, consider what the email signature “Sent from my iPhone” at the bottom of a message communicates[15]. First off, it tells us that the sender sent the email from a mobile device, which likely explains why there might be typos or why the message might be brief, given the limits of a smaller interface. At the same time, it tells us that the user didn’t care to change the default signature that Apple added to their messages. So emails aren’t just emails. The conventions and forms of the medium have developed and changed over time, and what it means to send and receive an email has changed too. Part of understanding and interpreting a particular email is going to involve understanding the context through which it was created and the social conventions around email at a given point in time.

Continuing in the case of email, the way that individuals manage their email and how that email is acquired and processed is going to be an important part of interpreting archives of email. Some email users keep complex folder structures for managing email. In some cases organizations restrict the total size of storage space for users to keep email, so individuals end up managing their email by deleting messages to make space for new ones. At the same time, the development of services like Gmail has encouraged a different set of behaviors where individuals are increasingly keeping all of their email and simply using search to work their way through their messages[16]. To this end, developing an understanding of what an individual’s or an organization’s practices were around email will be a key part of making sense of any given set of emails.

To illustrate another area of born digital content that has these issues, consider the way that people take, manage and work with digital photographs. One of the primary characteristics of digital objects is that it is generally trivial to make exact, or seemingly exact, copies of them. As a result, when it comes to digital photographs, people will often have an assortment of copies of an image with varying amounts of metadata associated with them[17]. There is the original file from a camera or a phone, a copy downloaded to a hard drive that might be edited, and a range of derivative copies created for sharing on Facebook or a series of photos using different filters. While the original might be the highest resolution, the derivative files are likely seen more, and the metadata and descriptive information about each copy can differ. As a result, there isn’t really a master file or copy so much as there is a constellation of different versions of the photo, each of which can be studied to understand the personal digital media ecology of an individual or organization.

It is also worth underscoring that what a photo means in a given moment is itself historically contingent[18]. In the last few years more photographs have been taken than in the rest of the two hundred or so years since the camera was invented. At this point, there are more than 6 billion photos on Flickr, and hundreds of millions of photos on Facebook and Instagram[19]. The combination of camera phones and sites like Flickr, Instagram and Facebook has created a set of practices and social norms where all kinds of people take sequences of photos throughout their day and share them. Similarly, the fact that camera phones quickly began to have two cameras, one in the front and one in the back, illustrates the emergence of the selfie as a key use of photographs. In this vein, photos increasingly play a role in the presentation of self in everyday life.

With this noted, digital photos increasingly come with a considerable amount of technical metadata embedded inside them that will be increasingly useful for historians studying these objects. Again, what is shown on the screen is only part of the story with digital objects. With a range of simple tools, it is possible to read the text information encoded through standards like Exif, which can document when a photo was originally taken, what software has been used to edit it, and the kind of camera that was used to take it. The result is that there exist inside many digital photographs records of the provenance of their creation and management that can be used to help contextualize and understand how they were in fact created.
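As a rough illustration of how such embedded metadata can be read, the sketch below uses only Python’s standard library to locate the Exif segment in a JPEG file and pull out a handful of text fields (camera make and model, editing software, and a date and time). It is a simplification under stated assumptions: it reads only the first directory of the embedded TIFF structure and only text-valued tags, and it ignores the sub-directory where fields such as the original capture time usually live. The function names are hypothetical, and a real project would more likely rely on an established metadata tool.

```python
import struct

# A few text-valued tags from the first TIFF directory (IFD0) of Exif data.
ASCII_TAGS = {0x010F: "Make", 0x0110: "Model",
              0x0131: "Software", 0x0132: "DateTime"}


def find_exif_payload(jpeg: bytes):
    """Return the TIFF-structured Exif data from a JPEG, or None if absent."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]
        if marker == 0xDA:  # image data starts here; stop looking
            break
        pos += 2 + length
    return None


def read_ascii_tags(tiff: bytes) -> dict:
    """Read the text tags listed in ASCII_TAGS from the first directory."""
    endian = {b"II": "<", b"MM": ">"}[tiff[:2]]
    magic, ifd_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("not a TIFF structure")
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    found = {}
    for index in range(count):
        start = ifd_offset + 2 + 12 * index
        tag, kind, n = struct.unpack(endian + "HHI", tiff[start:start + 8])
        if kind != 2 or tag not in ASCII_TAGS:  # type 2 means ASCII text
            continue
        if n <= 4:  # short values are stored inline in the entry itself
            raw = tiff[start + 8:start + 8 + n]
        else:       # longer values are stored at an offset
            (offset,) = struct.unpack(endian + "I", tiff[start + 8:start + 12])
            raw = tiff[offset:offset + n]
        found[ASCII_TAGS[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return found
```

Given the bytes of a photo, `read_ascii_tags(find_exif_payload(data))` would return whatever of those four fields the file happens to carry; many derivative copies shared online will have had some or all of them stripped.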

What role did search play in the original experience of content?

The idea of original order, that the order in which materials are organized by their creators and managers holds important value for contextualizing records, is somewhat at odds with the basic nature of digital media[20]. From the perspective of an end user, there really isn’t a first row in a database[21]. Instead, a user enters a query and the results of the query come in their own order. As a result, when content is preserved without preserving the interfaces to that content, historians are going to be left needing to do a lot of reasoning and theorizing based on how they think those interfaces worked. This poses a key question to ask of born digital primary sources: what role did search interfaces and algorithms play in how users interacted with and made sense of content, and what limitations on interpretation does the likely absence of that information impose? A few examples will illustrate this issue.
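As a small technical illustration of the point about databases, the sketch below loads three invented photo records into an in-memory SQLite database and asks for them in three different orders. The table, its columns, the records and the named orderings are all hypothetical; the point is only that the “first” item depends entirely on the query the interface chose to run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (title TEXT, taken TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?)",
    [
        ("Harbor at dusk", "2015-03-01", 40),   # invented records
        ("Street fair", "2014-07-04", 950),
        ("Zoo trip", "2016-01-23", 12),
    ],
)

# Hypothetical orderings an interface might offer its users.
ORDERINGS = {
    "newest": "taken DESC",
    "popular": "views DESC",
    "alphabetical": "title ASC",
}


def titles(ordering: str) -> list:
    """Return photo titles in the order a given interface choice produces."""
    sql = f"SELECT title FROM photos ORDER BY {ORDERINGS[ordering]}"
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    for name in ORDERINGS:
        print(name, titles(name))
```

An archive that kept only the three records would preserve the content but not which of these orders, if any, a user at the time actually encountered.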

One of the biggest challenges facing web archives is that it is very unlikely that anyone is going to be able to recreate the central mode through which web content is accessed and understood. It is unlikely that there will be a historical Google search. While it is possible to find archived copies of many webpages at particular moments in time, there won’t be a way to figure out what someone in Washington D.C. who googled “Benghazi” in March of 2015 would have seen in the search results. Given that search is the primary mode through which web content is found and accessed, it won’t be easy to figure out what people are likely to have come across.

As a related example, consider someone who wants to study visual representations of any given topic in the 6 billion photos on Flickr. Even if there is an archived copy of all those photos, it would be challenging to figure out what photos someone might have seen if they searched the site at a given point in time. From that archived copy of the photos and their metadata it would be possible to study what kinds of photos people created and shared and, through the metadata, the relative popularity of given images. However, if one wanted to know what someone would have found when they visited Flickr and searched for something, one would also need a copy of Flickr’s proprietary “interestingness” algorithm, which is used to sort out what photos are shown based on a series of weights assigned to different characteristics of photos[22].

Examples of the role of search in the use of digital media are everywhere. The capability of search is itself increasingly shifting how people manage their information, from a “filing” mentality to “piling,” and the result is that knowing how search worked in Gmail, or in the Mac operating system, is going to be increasingly important for making sense of born digital primary sources.

These various questions asked of digitized and born digital sources connect directly to a broader set of issues in how aggregations and collections of these materials are established and described. In this area many different kinds of projects have started to be described as digital archives. In what follows I will briefly explore some of the ways the term is used and discuss the issues that arise in terms of interpreting the various kinds of sources in these different kinds of digital archives.

What are Digital Archives?

When archivists, historians and digital humanists use the term “digital archive” they often mean different and overlapping things. I’m not so much interested in trying to decide whose use of the term is right or wrong, but in clarifying what the term means in different contexts.  In each case below, I have provided an example or two of this type of usage and worked to connect the kind of usage back to the questions one needs to ask of the digital primary sources contained in them.

Collections of Aggregated Digitized Primary Sources

When digital humanities scholars use the term digital archive, they are often describing aggregated collections of digitized primary sources. For example, the Shelley-Godwin Archive brings together digitized copies of primary source manuscript collections from a range of different archives around the world to create a single place to access the papers of a particular family.

Historian Joshua Sternfeld has suggested considering these kinds of projects a genre of “digital historical representations”[23]. Sternfeld uses that term to talk more broadly about the diverse range of products historians are now creating from digitized sources, including visualizations and databases, but includes these kinds of digital archives under this umbrella. He includes them in this category because they tend to be more expansive in what they bring together than what archives have generally focused on.

The origin of this usage is anchored in Jerome McGann’s work on the Rossetti Archive[24]. The Rossetti Archive presents a dizzying array of sources related to 19th century poet, illustrator and painter Dante Gabriel Rossetti. It contains much of what one might find in an archive, like copies of manuscripts and correspondence. However, it also includes copies of published works like books and poems as well as a range of visual works by other artists, contemporary periodicals and other related texts. The site provides a wealth of resources and a mixture of interpretation and exhibition of those sources. However, it is often challenging to parse exactly what the scope is of what one is looking at in the site.

The idea behind the Rossetti Archive, and a related idea in the William Blake Archive, was to develop a sort of ever growing hypertext aggregation of related digital copies of sources anchored around an individual[25]. In this vein, it is much more of a hybrid, combining a critical edition with the breadth of resources one might find in a literary archive.

When working with sources in this kind of digital archive it is essential to understand the context from which the original source materials were taken. In this case, the site is likely presenting materials from a range of different provenances, and as such it is important to identify where something is coming from and then think through the kinds of questions one considers about the history of a given source, such as why a particular object persists and others don’t.

Digitized Copies of Entire Archival Collections

In some cases, the term digital archive is used to refer to a digitized copy of the entire contents of an archival collection. For example, the Clara Barton Papers at the Library of Congress are available in full online. It’s not just the contents of the collection that were digitized but the folders they are contained in as well.

Presented online according to the boxes and folders they can be found in at the physical collection in Washington D.C., this kind of presentation of sources provides transparent access to the collection as it was arranged and described by archivists. In this vein, the scope and content note in the same finding aid that one would use to contextualize sources and understand how selection and arrangement decisions were made is useful for working with the digitized collection. To this end, something like the Clara Barton Papers is functionally a digital surrogate of an entire manuscript collection.

In a case like the Barton papers, the provenance of a given collection is much clearer and easier to parse than in the case of the previously discussed aggregations of digitized sources. With that noted, it is worth considering why a particular archive is digitized and not another, as that itself represents its own selection or appraisal-like decision. In the case of collections at most archives it will be a mixture of legal issues (generally focusing on digitizing older collections that are much less likely to involve a range of copyright and other rights issues), issues of what is thought to be most popular, and what is easiest to digitize.

As another example of where this kind of selection issue is raised, many state archives and historical societies are entering into contracts with companies to digitize large parts of their collections. In these cases, companies are generally deciding what collections to digitize based on what they deem to be the most useful to the genealogists who are their customers[26]. To this end, it is worth considering why a particular collection is available and the extent to which the selection of that collection over another for digitization might change the direction of your research and writing. With that said, this is a much less significant issue than in other cases where individual documents have been cherry picked from an archival collection and digitized, in that you have a sense of the structure and content of a whole coherent archival collection.

Aside from issues of selection, it is also important to think through whether the quality of a given set of reproductions of sources is adequate for your purpose. In the case of the Clara Barton papers, part of why they were digitized in full is that the entire collection was already microfilmed. So instead of doing high quality digital captures of the original documents it was much less expensive to simply digitize the black and white microfilm. For most purposes those digitized copies of the microfilm are perfectly serviceable. However, as the earlier example of the copies of Hamlet in EEBO illustrated, higher quality color images of the documents would likely enable access to a much broader range of the potentially significant properties of those documents. So it’s still important to consider if the quality of a digital reproduction of an object is good enough for the purpose one intends to use it for.

Born Digital Archival Collections

When archives acquire born digital materials and process those collections the results are often called digital archives, or born digital archives, as well. For example, Emory University acquired Salman Rushdie’s papers, which came with a series of his laptops[27]. Disk images were created of those laptops and at this point it is possible for researchers to log in and study the contents and the environment he worked in. In this case, researchers can engage directly with an emulated version of his whole computer.

In this case, the digital archive is generally a subset or a hybrid component of an analog archival collection. Often these kinds of materials are described as part of a finding aid and as such it is relatively easy to ascertain their provenance and understand why a particular set of digital objects exists and how decisions have been made in terms of their processing, arrangement and description. With that noted, the standards and practices for collecting, processing and preserving born digital archival material are still developing and evolving. So the quality and consistency of how born digital materials are described and made available varies widely across different repositories.

All of the questions and issues raised earlier about born digital primary sources are important to consider when working with these kinds of collections. In much the same way that a historian who studies 18th century documents needs to learn to read various kinds of handwriting scripts to develop an ability to read and decipher those texts, historians are going to need to develop sophisticated understandings of how digital media systems functioned at particular points in time and how different kinds of users used them. For example, understanding how different people organize their desktops, or how they name their files, and how conventions around those sorts of things have changed over time will be an important part of interpreting born digital archives.

Web Archives

Web Archives represent another genre of born digital archives that are both significant and different enough to warrant their own consideration. The Internet Archive, a range of national libraries, and a host of smaller archives and libraries are engaged in work to collect and preserve websites and webpages, and these collections are going to be of critical importance for future research. With that said, Web Archives represent a rather different approach to collecting and organizing sources.

The various organizations that archive the web use tools like Heritrix, an open source web crawler, which are sent out to grab all of the rendered content of a webpage they can get ahold of and, within defined parameters, the other pages it links to and all their associated files. As part of this collection process, the tools log information about the date and time that the data was collected. At this point, tools store that content in WARC files, or Web Archive files, which can then be played back via tools like the Wayback Machine. So there is a lot of information in these files that can be used to assert the authenticity of the data: how a particular URL presented itself to Heritrix and how Heritrix interpreted it at a particular moment in time.
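To make that logged information concrete, here is a minimal sketch that reads the header block of a single, uncompressed WARC record. The record itself is invented (the URL, date and record ID are made up), and the parser is a simplification: real WARC files are usually compressed, contain many records of several types, and are better handled with a dedicated library. The function name is hypothetical.

```python
def parse_warc_record(raw: bytes):
    """Split one uncompressed WARC record into version, headers and content."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


# An invented record: what a crawler might store for one captured response.
CAPTURED = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
            b"<html>hello</html>")
RECORD = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"WARC-Date: 2015-03-03T14:15:02Z\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n"
    b"Content-Length: " + str(len(CAPTURED)).encode("ascii") + b"\r\n"
    b"\r\n" + CAPTURED + b"\r\n\r\n"
)

if __name__ == "__main__":
    version, headers, content = parse_warc_record(RECORD)
    print(version, headers["WARC-Target-URI"], headers["WARC-Date"])
    print(content.decode("utf-8"))
```

Even in this stripped-down form, the record ties a specific URL to a specific capture time and keeps the server’s own response headers alongside the page content.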

There are a few key points for interpreting and studying web archives. First, web archives are consciously created. That is, an organization has a selection policy and works to collect sites that fit with that policy. So understanding those policies and the scope of a given collection is a key part of interpreting it. In that vein, it is also important to understand how a given repository works. Many organizations require permission from content creators to collect particular kinds of sites, so in those cases a given collection is only going to contain content from site owners that were OK with having their content collected and preserved.

Along with that, a given archived website is actually a copy of how the content of a given URL presented itself to the web crawler at a given moment in time. So, for example, if a site reconfigures how it displays itself based on the IP address of a site visitor then that will be reflected in the archived copy. There are various ways that web crawling technologies can miss some of the content provided as well. So it is important to remember that web archives are not exact and pristine copies of the content of a particular URL at a moment in time but instead copies of how that content appeared to the crawler at that point in time.

Collections of User Generated Born Digital Primary Sources

One of the biggest affordances of the World Wide Web is the ability for users to respond: to comment, to upload and to “share”. This has not been lost on historians and archivists. Projects like the September 11 Digital Archive illustrate the possibility to “crowdsource” an archive and create a collection of born digital materials around a particular issue or topic.

Shortly after the September 11th attacks, the American Social History Project at the City University of New York Graduate Center and the Roy Rosenzweig Center for History and New Media launched a site that allowed anyone to upload records and reflections related to the attacks[28]. It contains copies of email messages, digital photographs, and a range of first hand accounts that site visitors have provided over time. This sort of archive has been similarly developed around other incidents, like the Hurricane Digital Memory Bank, created to build a digital record of Hurricanes Katrina and Rita[29].

Where an archival collection, like the papers of an individual or the records of an organization, accrues over time and has a clear and central connection to the individual or organization as the basis of its provenance, these crowdsourced collections have a different kind of cohesion. Something like the September 11 Digital Archive can’t be understood as a representative sample of individuals’ reactions. It is a partial collection made up of contributions from whoever decided to participate at any given time. To that end, the individual reflections and objects in the collection are invaluable as records of individual experience, but making sense of them as a whole is going to be challenging. Ideally, as researchers work with these kinds of collections in the future they will focus on understanding the kinds of voices that are represented in the collections as much as they work to interpret those voices. To that end, records of how these sites prompted users to participate, how those prompts developed and changed over time, and how decisions were made about how to set up a site are going to be invaluable for helping researchers understand the scope and content of these collections.

Going Forward

Sources don’t speak for themselves. To that end, historians have developed and deployed techniques for interrogating and understanding sources based on their properties and the context of their creation, use and management. In this essay I’ve tried to explicate some of the work necessary for historians to remain as rigorous in working with digital sources and archives as they have been with their analog counterparts.

The key questions of source criticism are the same whether or not a source is digital. However, given the rapid pace of change in digital technology, historians will likely need to focus increasingly on establishing and sharing techniques for working with different kinds of digital sources. As information ecologies continually shift, it is going to be critical for historians to show their work in making sense of the stratigraphy of digital sources.


[1] For examples of tree rings, see Cronon, William. Changes in the Land: Indians, Colonists, and the Ecology of New England. New York: Hill and Wang, 1983. For examples of the perpetual value of probate records see Bushman, Richard L. The Refinement of America: Persons, Houses, Cities. New York: Knopf, 1992. For examples of using court proceedings see Pagan, John Ruston. Anne Orthwood’s Bastard: Sex and Law in Early Virginia. New York: Oxford University Press, 2003.

[2] Howell, Martha C., and Walter Prevenier. From Reliable Sources: An Introduction to Historical Methods. Ithaca, N.Y: Cornell University Press, 2001, p. 28.

[3] For further discussion of digital source criticism see Hering, Katharina. “Provenance Meets Source Criticism.” Journal of Digital Humanities, August 4, 2014.

[4] Droysen, Johann Gustav Bernhard. Outline of the Principles of History: (Grundriss Der Historik). Translated by Elisha Benjamin Andrews. Boston: Ginn & company, 1897.

[5] Tyrrell, Ian R. Historians in Public: The Practice of American History, 1890-1970. Chicago: University of Chicago Press, 2005, p. 38.

[6] For further discussion of the informational versus artifactual qualities of digitized sources see Fleischhauer, Carl. “Information or Artifact: Digitizing a Book, Part 1.” The Signal: Digital Preservation, October 17, 2011.

[7] For a more extensive exploration of this example, see Werner, Sarah. “Where Material Book Culture Meets Digital Humanities.” Journal of Digital Humanities 1, no. 3 (Summer 2012).

[8] For an excellent example of the way that searches for obscure terms have made it possible for historians to discover things that would have been nearly impossible to find in the past see Leary, Patrick. “Googling the Victorians.” Journal of Victorian Culture 10, no. 1 (Spring 2005): 72–86.

[9] For an exploration of how searching through millions of books is changing research processes in the humanities see Ramsay, Stephen. “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” In Pastplay: Teaching and Learning History with Technology, edited by Kevin Kee. University of Michigan Press, 2014. For further exploration on the way that searching through massive amounts of sources suggests the need for changes in how historical writing is framed see Gibbs, Fred, and Trevor Owens. “The Hermeneutics of Data and Historical Writing.” In Writing History in the Digital Age, edited by Kristen Nawrotzki and Jack Dougherty. University of Michigan Press, 2013.

[10] For further exploration on the theme of digital objects as performance in the context of a digital art manuscript collection see Arcangel, Cory. “The Warhol Files: Andy Warhol’s Long-Lost Computer Graphics.” Artforum, Summer (2014).

[11] For more on screen essentialism see Montfort, Nick. “Continuous Paper: The Early Materiality and Workings of Electronic Literature.” Philadelphia, 2004.

[12] For further detail on Reside’s work with these files see Reside, Doug. “‘No Day But Today’: A Look at Jonathan Larson’s Word Files,” April 22, 2011.

[13] Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. Cambridge, Mass: MIT Press, 2008, pp. 111-159.

[14] For a series of examples of how differences in browser rendering can dramatically affect the appearance of a born-digital work of art see Fino-Radin, Ben. “Rhizome Artbase: Preserving Born Digital Works of Art.” Washington, D.C, 2012.

[15] For discussion of how email signatures like “sent from my iPhone” affect how messages are interpreted see Carr, Caleb T., and Chad Stefaniak. “Sent from My iPhone: The Medium and Message as Cues of Sender Professionalism in Mobile Telephony.” Journal of Applied Communication Research 40, no. 4 (November 1, 2012): 403–24. doi:10.1080/00909882.2012.712707.

[16] A growing body of research on how people manage digital information will likely be invaluable for future historians in contextualizing the strategies that individuals used to organize and manage their digital information. For example, see Henderson, Sarah, and Ananth Srinivasan. “Filing, Piling & Structuring: Strategies for Personal Document Management.” In System Sciences (HICSS), 2011 44th Hawaii International Conference on, 1–10. IEEE, 2011.

[17] For an exploration of the various reasons individuals copy, edit and describe a range of derivative copies of digital photos see Marshall, Catherine C. “Digital Copies and a Distributed Notion of Reference in Personal Archives.” In Digital Media: Technological and Social Challenges of the Interactive World, edited by Megan Alicia Winget and William Aspray, 89–115. Lanham, Md: Scarecrow Press, 2011.

[18] For documentation of the historically contingent nature of photographs and an exploration of issues in interpreting photos from different historical contexts see Trachtenberg, Alan. Reading American Photographs: Images As History, Mathew Brady to Walker Evans. 1st ed. New York, N.Y.: Hill and Wang, 1989.

[19] For an exploration of some trends in the history of numbers of photographs taken see Good, Jonathan. “How Many Photos Have Ever Been Taken?” 1000memories, September 15, 2011.

[20] Bailey, Jefferson. “Disrespect Des Fonds: Rethinking Arrangement and Description in Born-Digital Archives.” Archive Journal, no. 3 (2013).

[21] For an exploration of the logic, structure and assumptions of databases see Manovich, Lev. The Language of New Media. Cambridge, Mass: MIT Press, 2002, pp. 212-236.

[22] For an example of working through a set of search results on Flickr as a primary source see Owens, Trevor. “Lego, Handcraft, and Costumed Zombies: What Zombies Do on Flickr.” New Directions in Folklore 12, no. 2 (2015): 3–25.

[23] Sternfeld, Joshua. “Archival Theory and Digital Historiography: Selection, Search, and Metadata as Archival Processes for Assessing Historical Contextualization.” The American Archivist 74, no. 2 (October 1, 2011): 544–75.

[24] McGann, Jerome J., ed. The Complete Writings and Pictures of Dante Gabriel Rossetti. Accessed August 8, 2015.

[25] McGann, Jerome J. “The Rationale of Hyper Text.” Text 9 (January 1, 1996): 11–32.

[26] For a discussion of how digitization selections are made in public-private partnerships see Kriesberg, Adam M. The Changing Landscape of Digital Access: Public-Private Partnerships in US State and Territorial Archives, 2015, pp. 122-125.

[27] For further background on the Salman Rushdie digital archive see Emory University. Rushdie Researcher Workstation Tutorial, 2011.

[28] For further exploration of the September 11 Digital Archive see Rosenzweig, Roy. “Scarcity or Abundance? Preserving the Past in a Digital Era.” American Historical Review 108, no. 3 (June 2003): 735-762, as well as Haskins, E. “Between Archive and Participation: Public Memory in a Digital Age.” 37, no. 4 (Fall 2007).

[29] For more background on this see Brennan, Sheila A., and T. Mills Kelly. “Why Collecting History Online Is Web 1.5.” Case study, Center for History and New Media.

