Full Draft of Theory & Craft of Digital Preservation

Here it is, the book printed out for the first time. Or I suppose more accurately, a digital photo of the book printed out for the first time.

This weekend I’m submitting the full draft of the manuscript for my book The Theory and Craft of Digital Preservation to the publisher, Johns Hopkins University Press.

Update: to make it easier to read, I’ve shared a PDF preprint of the whole draft.

I’ve had a lot of fun working on this on nights and weekends over the last year. I have also learned a ton from everyone who has read drafts of the work in progress.

I’ve had a few folks reach out to me after reading parts of drafts and say things like “I’d love to read more of this. When will it be out?” I’m not sure exactly how long it will take for the next round of review and all the improvements that will come from working with a great press. With that said, drafts of the entire book are now online. Instead of having folks pick through my previous blog posts with the links, I figured I would put them all together in order in this post.

So to that end, below you can find an index to the eight chapters and the intro and conclusion. I’m going to leave these up with all the comments in them. I went through and resolved comments offline in my own copies of these but thought it would be fun to leave up the messy original drafts and a record of all the great input and ideas that folks have offered up to improve the text.

Table of Contents

Introduction: Beyond Digital Hype & Digital Anxiety (7 pages)

Section One: Theory of Digital Preservation

Ch 1: Preservation’s Divergent Lineages (14 pages)

Ch 2: Understanding Digital Objects (12 pages)

Ch 3: Challenges & Opportunities for Digital Preservation  (11 pages)

Section Two: The Craft of Digital Preservation

Ch 4: The Craft of Digital Preservation (6 pages)

Ch 5: Preservation Intent & Collection Development (13 pages)

Ch 6: Managing Copies & Formats (15 pages)

Ch 7: Arranging & Describing Digital Objects (19 pages)

Ch 8: Enabling Multimodal Access & Use (18 pages)

Conclusion: Tools for Looking Forward (9 pages)

Advance Twitter Praise for the Book

I pulled out a few tweets from folks responding to the book that I thought were fun to share.


Getting Beyond Digital Hyperbole & Tools for Looking Forward

The book is now whole! I’m going to be spending this weekend working through revisions to the last section based on all of the great comments I’ve been getting, but I’m also now excited to share both the introduction and the conclusion.

If you have any comments or suggestions on these please do go ahead and chime in on them in comments on the docs. In the intro I try to lay out a whole set of axioms for digital preservation, which I’ve gone ahead and reposted below.

Fifteen Guiding Digital Preservation Axioms

As a point of entry to the book I have distilled a set of fifteen guiding axioms. I realize that sounds a little pretentious, but it’s the right word for what these are. These axioms are points that I think should serve as the basis for digital preservation work. They are also a useful way to work out some initial points for defining what exactly digital preservation is and isn’t. Some of them are unstated assumptions that undergird orthodox digital preservation perspectives; some are at odds with that orthodoxy. These axioms are things to take forward as assumptions going into the book. Many of these are also points that I will argue for and demonstrate throughout the book.

  1. A repository is not a piece of software. Software cannot preserve anything. Software cannot be a repository in itself. A repository is the sum of financial resources, hardware, staff time, and ongoing implementation of policies and planning to ensure long-term access to content. Any software system you use to help you preserve and provide access to digital content is by necessity temporary. You need to be able to get your stuff out of it because it likely will not last forever. Similarly, there is no software that “does” digital preservation.
  2. Institutions make preservation possible. Each of us will die. Without care and management, the things that mattered to us will persist for some period of time related to the durability of their mediums. With that noted, the primary enablers of preservation for the long term are our institutions (libraries, archives, museums, families, religious organizations, governments, etc.). As such, the possibility of preservation is enabled through the design and function of those institutions. Their org charts, hiring practices, funding, credibility, etc. are all key parts of the cultural machinery that makes preservation possible.
  3. Tools can get in the way just as much as they can help. Specialized digital preservation tools and software are just as likely to get in the way of solving your digital preservation problems as they are to help. In many cases, it’s much more straightforward to start small and implement simple and discrete tools and practices to keep track of your digital information using nothing more than the file system you happen to be working in. It’s better to start simple and then introduce tools that help you improve your process than to simply buy into some complex system without having gotten your house in order first.
  4. Nothing has been preserved, there are only things being preserved. Preservation is the result of ongoing work of people and commitments of resources. The work is never finished. This is true of all forms of preservation; it’s just that the timescales for digital preservation actions are significantly shorter than they tend to be with the conservation of things like books or oil paintings. Try to avoid talking about what has been preserved; there is only what we are preserving. This has significant ramifications for how we think about staffing and resourcing preservation work. Preservation is ongoing work. It is not something that can be thought of as a one-time cost.
  5. Hoarding is not preservation. It is very easy to start grabbing lots of digital objects and making copies of them. This is not preservation. To really be preserving something you need to be able to make it discoverable and accessible and that is going to require that you have a clear and coherent approach to collection development, arrangement, description and methods and approaches to provide access.
  6. Backing up data is not digital preservation. If you start talking about digital preservation and someone tells you “oh, don’t worry about it, we back everything up nightly” you need to be prepared to explain how and why that does not count as digital preservation. This book can help you to develop your explanation. Many of the aspects that go into backing up data for current use are similar to aspects of digital preservation work but the near term concerns of being able to restore data are significantly different from the long term issues related to ensuring access to content in the future.
  7. The boundaries of digital objects are fuzzy. Individual objects reference, incorporate and use aspects of other objects as part of their everyday function. You might think you have a copy of a piece of software by keeping a copy of its installer, but that installer might call a web service to start downloading files in which case you can’t install and run that software unless you have the files it depends on. You may need a set of fonts, or a particular video codec, or any number of other things to be able to use something in the future and it is challenging to articulate what is actually inside your object and what is external to it.
  8. One person’s digital collection is another’s digital object is another’s dataset. In some cases the contents of a hard drive can be managed as a single item, in others they are a collection of items. In the analog world, the boundaries of objects were a little bit more straightforward or at least taken for granted. The fuzziness of boundaries of digital objects means that the concepts of “item” and “collection” are less clear than with analog items. For example, a website might be an item in a web archive, but it is also functionally a serial publication which changes over time. A collection of web pages is itself made up of pages that are each a collection of files.
  9. Digital preservation is about making the best use of your resources to mitigate the most pressing preservation threats and risks. You are never done with digital preservation. It is not something that can be accomplished or finished. Digital preservation is a continual process of understanding the risks you face for losing content or losing the ability to render and interact with it and making use of whatever resources you have to mitigate those risks.
  10. The answer to nearly all digital preservation questions is “it depends.” In almost every case, the details matter. Deciding what matters about an object or a set of objects is largely contingent on what their future use might be. Similarly, developing a preservation approach to a massive and rapidly growing collection of high-resolution video will end up being fundamentally different to the approach an organization would take to ensuring long-term access to a collection of digitized texts.
  11. It’s long past time to start taking action. You can read and ponder complicated data models, schemas for tracking and logging preservation actions, and a range of other complex and interesting topics for years but it’s not going to help “get the boxes off the floor.” There are practical and pragmatic things everyone can and should do now to mitigate many of the most pressing risks of loss. I tried to highlight those “get the boxes off the floor” points throughout the second half of the book. So be sure to prioritize doing those things first before delving into many of the more open ended areas of digital preservation work and research.
  12. Highly technical definitions of digital preservation are complicit in silencing the past. Much of the language and specifications of digital preservation have developed into complex sets of requirements that obfuscate many of the practical things anyone and any organization can do to increase the likelihood of access to content in the future. As such, a highly technical framing of digital preservation has resulted in many smaller and less resource rich institutions feeling like they just can’t do digital preservation, or that they need to hire consultants to tell them about complex preservation metadata standards when what they need to do first is make a copy of their files.  Along with this, digital media affords significant new opportunities for engaging communities with the development of digital collections. When digital preservationists take for granted that their job is to preserve what they are given, they fail to help an organization rethink what it is possible to collect. Digital preservation policy should be directly connected to and involved in collection development policy. That is, the affordances of what can be easily preserved should inform decisions about what an organization wants to go out and collect and preserve.
  13. Accept and embrace the archival sliver. We’ve never saved everything. We’ve never saved most things. When we start from the understanding that most things are temporary and likely to be lost to history, we can shift to focus our energy on making sure we line up the resources necessary to protect the things that matter the most. Along with that, we need to realize that there are varying levels of effort that should be put toward future proofing different kinds of material.
  14. The scale and inherent structures of digital information suggest working more with a shovel than with tweezers. While we need to embrace the fact that we can’t collect and preserve everything, we also need to realize that in many cases the time and resources it takes to make decisions about individual things could be better used elsewhere. It’s often best to focus digital preservation decision making at scale. This is particularly true in cases where you are dealing with content that isn’t particularly large. Similarly, in many cases it makes sense to normalize content or to process any number of kinds of derivative files from it and keep the originals. In all of these cases, the computability of digital information and the reality that digital files contain significant amounts of contextual metadata mean that we can run these actions in batch and not one at a time.
  15. Doing digital preservation requires thinking like a futurist. We don’t know the tools and systems that people will have and use in the future to access digital content. So if we want to ensure long term access to digital information we need to, at least on some level, be thinking about and aware of trends in the development of digital technologies. This is a key consideration for risk mitigation. Our preservation risks and threats are based on the technology stack we currently have and the stack we will have in the future so we need to look to the future in a way that we didn’t need to with previous media and formats. 
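
Axioms 3 and 14 point toward simple, batch-oriented work done directly on the file system. A minimal Python sketch of that idea, using only the standard library, might look like the following. The function name `survey` and the choice to group files by extension are assumptions made for illustration; they are not a method taken from the book.

```python
import os
from collections import defaultdict


def survey(root):
    """Walk a directory tree and total file counts and bytes per extension."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            # Lowercase the extension so ".JPG" and ".jpg" are counted together.
            extension = os.path.splitext(name)[1].lower() or "(none)"
            totals[extension]["files"] += 1
            totals[extension]["bytes"] += os.path.getsize(path)
    return dict(totals)
```

Calling `survey` on the top-level folder of a collection returns one entry per extension, which is often enough to see at a glance what kinds of files you are holding and roughly how much of each, before deciding whether any more specialized tool is needed.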

Theory and Craft of Digital Preservation: Part Two Posted for Comment

Some class notes from Alice Rogers in my digital preservation seminar.

It took me longer than I anticipated, but I am now both excited and rather anxious to share drafts of the rest of my forthcoming book. A while back, I posted drafts of the first section of the book. The comments and responses I received on that have been fantastic. I’m now going to turn to reviewing and revising that section based on the generous wealth of feedback I’ve received.

The Craft Half of the Book

The first half of the book was the theory part, the second is the craft part. In the five chapters in this section I try to offer up a set of interrelated frames for working through the ongoing issues and challenges that make up digital preservation as a craft.

Chapter four is largely an explanation and justification for why and how I’ve set up the following four chapters. So I won’t delve too much into giving any context here as it’s better to just read the context in the chapter. With that noted, I’ve included the diagram I use in that chapter to explain how I see each of the subsequent chapters connecting with each other.




First 3 Chapters of Theory and Craft of Digital Preservation for Comment

As I mentioned in December, I’m working on a book called The Theory and Craft of Digital Preservation for Johns Hopkins University Press. For an overview of the book go read that post.

At this point I have a full working rough draft of the book together and I’m getting to a point where it could really benefit from readers’ input and insights. To that end, I’m posting drafts of the first three chapters up as Google Docs which you should be able to comment on and suggest edits to. When I’ve posted drafts of essays like this in the past I’ve received fantastic comments that have helped me refine both my writing and my thinking. So now we will see if the same kind of thing works for a book.

I’m interested in any and all feedback and input. However, I’m particularly interested in any suggestions for work that I should be citing from women, people of color, and people from the majority world. Much of the digital preservation and digital media studies literature I’m drawing from is (like many fields) very white, very male and U.S./Eurocentric, and I’d like to be working against that, not reinforcing it.

So with that context, I’ve provided links to each chapter below and a bit of context for each chapter from the book proposal. My plan is to work through all the comments I get in early March.

Ch 1: Artifact, Information, or Folklore: Preservation’s Divergent Lineages

Interdisciplinary dialog about digital preservation often breaks down when an individual begins to protest “but that’s not preservation.” Preservation means a lot of different things in different contexts. Each of those contexts has a history. Those histories are tied up in the changing nature of the mediums and objects for which each conception of preservation and conservation was developed. All too often, discussions of digital preservation start by contrasting digital media to analog media. This contrast forces a series of false dichotomies. Understanding a bit about the divergent lineages of preservation helps to establish the range of competing notions at play in defining what is and isn’t preservation.

Building on work in media archeology, this chapter establishes that digital media and digital information should not be understood as a rupture with an analog past. Instead, digital media should be understood as part of a continual process of remediation embedded in the development of a range of new mediums which afford distinct communication and preservation potential. Understanding these contexts and meanings of preservation establishes a vocabulary to articulate what aspects of an object must persist into the future for a given preservation intent.

To this end, this chapter provides an overview of many of these lineages. This includes: the culture of scribes and the manuscript tradition; the bureaucracy and the development of archival theory for arranging archives and publishing records; the differences between taxidermy and insect collecting in natural history collections and living collections like butterfly gardens and zoos; the development of historic preservation of the built environment; the advent of recorded sound technology and the development of oral history; and the development of photography, microfilming and preservation reformatting. Each episode and tradition offers a mental model to consider deploying in different contexts in digital preservation.

The purpose here is not a detailed history of lineages of preservation and the development of media, but instead to illustrate that many different conceptions of preservation exist and how those conceptions are anchored in different objectives. This overview provides readers with a focus on the distinct conceptions of what matters about an object and the innate material properties and affordances of different kinds of media as they relate to preservation.

Ch 2: Understanding Digital Objects

Doing digital preservation requires a foundational understanding of the structure and nature of digital information and media. This chapter works to provide such a background through three related strands of new media studies scholarship. First, all digital information is material. Second, digital information is best understood as existing in and through a nested set of platforms. Third, the database is an essential media form and metaphor for understanding the logic of digital media.

Given that digital information is always physically encoded on digital media, it is critical to recognize that the raw bit stream (the sequence of ones and zeros encoded on the original medium) has a tangible and objective ability to be recorded and copied. This provides an essential first level basis for digital preservation. It is possible to establish what the entire sequence of bits is on a given medium, or in a given file, and use techniques to create a kind of digital fingerprint for it that can then be used to verify and authenticate perfect copies.
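
As a rough illustration of the fingerprinting idea above, here is a minimal Python sketch. It assumes SHA-256 as the hash function (the chapter summary does not name one), and the helper names `fingerprint` and `is_perfect_copy` are hypothetical.

```python
import hashlib


def fingerprint(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file's complete bit stream."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_perfect_copy(original, copy):
    """Return True when both files produce the same fingerprint."""
    return fingerprint(original) == fingerprint(copy)
```

Recording the fingerprint when a file first arrives, and recomputing it later or after each copy, is one simple way to check that the bit stream has not changed.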

With that noted, those bit streams are animated, rendered, and made usable through nested layers of platforms. In interacting with a digital object, computing devices interact with the structures of file systems, file formats and various additional layers of software, protocols and drivers. Drawing on examples from net art, video games, and born digital drafts of literary works, I explore multiple ways to approach them anchored in different layers of their digital platforms. The experience of the performance of an object on a particular screen, like playing a video game or reading a document, can itself obfuscate many of the important aspects of digital objects that are interesting and important but much less readily visible, like how the rules of a video game actually function or deleted text in a document which still exists but isn’t rendered on the screen.

As a result of this nested platform nature, the boundaries of digital objects are often completely dependent on what layer one considers to be the most significant for a given purpose. In this context, digital form and format must be understood as existing as a kind of content. Across these platform layers digital objects are always a multiplicity of things. For example, an Atari video game is a tangible object you can hold, a binary sequence of information encoded on that medium identical to all the other copies of that game, source code authored as a creative work, a packaged commodity sold and marketed to an audience, and a signifier of a particular historical moment. Each of these objects can coexist in the platform layers of a tangible object, but depending on which is significant for a particular purpose one should develop a different preservation approach.

Lastly, where the index or the codex can provide a valuable metaphor for the order and structure of a book, new media studies scholarship has suggested that the database is and should be approached as the foundational metaphor for digital media. From this perspective, there is no “first row” in a database, but instead the presentation and sorting of digital information is based on the query posed to the data. Given that libraries and archives have long based their conceptions of order on properties of books and paper, embracing this database logic will have significant implications for making digital material available for the long term.
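
The point that a database has no “first row” can be made concrete with a small sketch. The Python example below uses the standard library `sqlite3` module with a hypothetical `items` table and invented rows; which record comes first depends entirely on the query.

```python
import sqlite3

# Hypothetical table and invented rows, for illustration only.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (title TEXT, year INTEGER)")
connection.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("web page", 1996), ("photograph", 2005), ("letter", 1999)],
)

# The same records, presented in two different orders by two different queries.
by_title = [row[0] for row in connection.execute("SELECT title FROM items ORDER BY title")]
by_year = [row[0] for row in connection.execute("SELECT title FROM items ORDER BY year")]
```

Here `by_title` starts with the letter while `by_year` starts with the web page. Without an `ORDER BY` clause, SQL does not guarantee any particular row order at all.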

Ch 3: Challenges and Opportunities for Digital Preservation 

With an understanding of digital media and some context on various lineages of preservation, it is now possible to break down what the inherent challenges, opportunities and assumptions of digital preservation are.

We can’t count on long-lived media, interfaces, or formats. Popular digital media of all kinds (disc, disk, and NAND flash wafers) degrade rather quickly, in terms of years, not decades or centuries. Many of these media are relatively complex to read, so the interfaces required to interpret them are likely to not be particularly long lived. The costs of trying to either repair these media or to fix and repair interfaces to read them rapidly become prohibitive. As a result, traditional notions of conservation science are, outside of some niche cases, going to be effectively useless for the long-term preservation of digital objects.

Going back to the discussions of preservation lineages, this means that digital preservation is an enterprise that can only focus on the allographic digital object. While all digital information is material, the conservation of that material over the long haul is not broadly practical. Where conservation science is concerned with the chemical and material properties of mediums and artifacts, the science of digital preservation is and will be computer science. With that said, because bitstreams are always originally encoded on tangible media and then created by, acted on and interpreted by all kinds of human-made layers of software, they end up presenting an extensive range of seemingly artifactual and not simply informational qualities. That is, the physical and material affordances of different digital mediums will continue to shape and structure digital content long after it has been transferred and migrated to new mediums.

First 3 Chapters’ Bibliography

  • Archimedes Palimpsest Project. “About the Archimedes Palimpsest.” Accessed February 3, 2017. http://archimedespalimpsest.org/about/.
  • Association for Documentary Editing. “About Documentary Editing.” The Association for Documentary Editing. http://www.documentaryediting.org/wordpress/?page_id=482.
  • Bearman, David. Archival Methods. Archives and Museum Informatics Technical Report, vol. 3, no. 1. Pittsburgh, Pa: Archives & Museum Informatics, 1989.
  • Bird, Graeme D. Multitextuality in the Homeric Iliad: The Witness of the Ptolemaic Papyri. Hellenic Studies 43. Washington, D.C. : Cambridge, Mass: Center for Hellenic Studies ; Distributed by Harvard University Press, 2010.
  • Bogost, Ian. Alien Phenomenology, Or, What It’s like to Be a Thing. Posthumanities 20. Minneapolis: University of Minnesota Press, 2012.
  • Brylawski, Sam, Maya Lerman, Robin Pike, and Kathlin Smith. “ARSC Guide to Audio Preservation.” CLIR Publication. Washington, D.C, 2015. http://cmsimpact.org/wp-content/uploads/2016/08/ARSC-Audio-Preservation.pdf.
  • Chun, Wendy Hui Kyong. Control and Freedom: Power and Paranoia in the Age of Fiber Optics. The MIT Press, 2005.
  • Fino-Radin, Ben. “Rhizome Artbase: Preserving Born Digital Works of Art.” Washington, D.C, July 24-26. http://digitalpreservation.gov/meetings/documents/ndiipp12/DigitalCulture_fino-radin_DP12.pdf.
  • Galloway, Alexander R. Protocol: How Control Exists after Decentralization. The MIT Press, 2006.
  • Gitelman, Lisa. Always Already New: Media, History, and the Data of Culture. Cambridge, MA: MIT Press, 2006.
  • ———. Paper Knowledge: Toward a Media History of Documents. Durham ; London: Duke University Press Books, 2014.
  • International Council of Museums, Committee for Conservation. “The Conservator-Restorer: A Definition of the Profession,” 1984. http://www.icom-cc.org/47/history-of-icom-cc/definition-of-profession-1984.
  • Kirschenbaum, Matthew. “Software, It’s a Thing.” Medium, July 25, 2014. https://medium.com/@mkirschenbaum/software-its-a-thing-a550448d0ed3.
  • Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. Cambridge, Mass: MIT Press, 2008.
  • Kittler, Friedrich A. Gramophone, Film, Typewriter. Translated by Michael Wutz and Geoffrey Winthrop-Young. Stanford, Calif: Stanford University Press, 1999.
  • Krajewski, Markus. Paper Machines: About Cards & Catalogs, 1548-1929. History and Foundations of Information Science. Cambridge, Mass: MIT Press, 2011.
  • Lee, Christopher A. “Digital Curation as Communication Mediation.” In Handbook of Technical Communication, edited by Alexander Mehler and Laurent Romary, 507–530. Boston, MA: Walter de Gruyter, 2012.
  • Manovich, Lev. “Database as a Genre of New Media,” 1997. http://vv.arts.ucla.edu/AI_Society/manovich.html.
  • ———. Software Takes Command: Extending the Language of New Media. International Texts in Critical Media Aesthetics. New York ; London: Bloomsbury, 2013.
  • ———. The Language of New Media. Cambridge, Mass: MIT Press, 2002.
  • McNeill, Lynne S. Folklore Rules: A Fun, Quick, and Useful Introduction to the Field of Academic Folklore Studies. University Press of Colorado, 2013.
  • Mir, Rebecca, and Trevor Owens. “Modeling Indigenous Peoples: Unpacking Ideology in Sid Meier’s Colonization.” In Playing with the Past: Digital Games and the Simulation of History, 91–106, 2013.
  • Montfort, Nick. “Continuous Paper: MLA,” 2004. http://nickm.com/writing/essays/continuous_paper_mla.html.
  • Montfort, Nick, and Ian Bogost. Racing the Beam: The Atari Video Computer System. Platform Studies. Cambridge, Mass: MIT Press, 2009.
  • Nakamura, Lisa. Digitizing Race: Visual Cultures of the Internet. Electronic Mediations 23. Minneapolis: University of Minnesota Press, 2008.
  • Library of Congress Office of Communications. “Hyperspectral Imaging by Library of Congress Reveals Change Made by Thomas Jefferson in Original Declaration of Independence Draft.” Press Release. Washington, D.C, July 2, 2010. https://www.loc.gov/item/prn-10-161/analysis-reveals-changes-in-declaration-of-independence/2010-07-02/.
  • Owens, Trevor. “Pixelated Commemorations: 4 In Game Monuments and Memorials.” Play the Past, June 18, 2014. http://www.playthepast.org/?p=4811.
  • Reside, Doug. “‘No Day But Today’: A Look at Jonathan Larson’s Word Files.” New York Public Library Blog, April 22, 2011. http://www.nypl.org/blog/2011/04/22/no-day-today-look-jonathan-larsons-word-files.
  • Rinehart, Richard, and Jon Ippolito, eds. Re-Collection: Art, New Media, and Social Memory. Leonardo. Cambridge, Massachusetts: The MIT Press, 2014.
  • Saylor, Nicole. “Computing Culture in the AFC Archive.” Folklife Today, January 8, 2014. https://blogs.loc.gov/folklife/2014/01/computing-culture-in-the-afc-archive/.
  • Sharpless, Rebecca. “The History of Oral History.” In History of Oral History: Foundations and Methodology, edited by Lois E. Myers and Rebecca Sharpless, 9–32. Lanham, MD: AltaMira Press, 2007.
  • Smigel, Libby, Martha Goldstein, and Elizabeth Aldrich. Documenting Dance: A Practical Guide. Dance Heritage Coalition, 2006. http://www.danceheritage.org/DocumentingDance.pdf.
  • Sterne, Jonathan. MP3: The Meaning of a Format. Sign, Storage, Transmission. Durham: Duke University Press, 2012.
  • Thesaurus Linguae Graecae Project. “Thesaurus Linguae Graecae – History.” Accessed February 3, 2017. https://www.tlg.uci.edu/about/history.php.
  • Tomasello, Michael. The Cultural Origins of Human Cognition. Harvard University Press, 2009.
  • Tyrrell, Ian R. Historians in Public: The Practice of American History, 1890-1970. Chicago: University of Chicago Press, 2005. http://www.loc.gov/catdir/toc/ecip058/2005003459.html.
  • Werner, Sarah. “Where Material Book Culture Meets Digital Humanities.” Journal of Digital Humanities 1, no. 3 (2012). http://journalofdigitalhumanities.org/1-3/where-material-book-culture-meets-digital-humanities-by-sarah-werner/.