Full Draft of Theory & Craft of Digital Preservation

Here it is, the book printed out for the first time. Or I suppose more accurately, a digital photo of the book printed out for the first time.

This weekend I’m submitting the full draft of the manuscript for my book The Theory and Craft of Digital Preservation to the publisher, Johns Hopkins University Press.

Update: to make it easier to read, I’ve shared a PDF preprint of the whole draft.

I’ve had a lot of fun working on this on nights and weekends over the last year. I have also learned a ton from everyone who has read drafts of the work in progress.

I’ve had a few folks reach out to me after reading parts of drafts and say things like “I’d love to read more of this. When will it be out?” I’m not sure exactly how long it will take for the next round of review and all the improvements that will come from working with a great press. With that said, drafts of the entire book are now online. Instead of having folks pick through my previous blog posts with the links, I figured I would put them all together in order in this post.

So to that end, below you can find an index to the eight chapters and the intro and conclusion. I’m going to leave these up with all the comments in them. I went through and resolved comments offline in my own copies of these but thought it would be fun to leave up the messy original drafts and a record of all the great input and ideas that folks have offered up to improve the text.

Table of Contents

Introduction: Beyond Digital Hype & Digital Anxiety (7 pages)

Section One: Theory of Digital Preservation

Ch 1: Preservation’s Divergent Lineages (14 pages)

Ch 2: Understanding Digital Objects (12 pages)

Ch 3: Challenges & Opportunities for Digital Preservation (11 pages)

Section Two: The Craft of Digital Preservation

Ch 4: The Craft of Digital Preservation (6 pages)

Ch 5: Preservation Intent & Collection Development (13 pages)

Ch 6: Managing Copies & Formats (15 pages)

Ch 7: Arranging & Describing Digital Objects (19 pages)

Ch 8: Enabling Multimodal Access & Use (18 pages)

Conclusion: Tools for Looking Forward (9 pages)

Advance Twitter Praise for the Book

I pulled out a few tweets from folks responding to the book that I thought were fun to share.

https://twitter.com/save4use/status/877128435696619520

Getting Beyond Digital Hyperbole & Tools for Looking Forward

The book is now whole! I’m going to be spending this weekend working through revisions to the last section based on all of the great comments I’ve been getting, but I’m also now excited to share both the introduction and the conclusion.

If you have any comments or suggestions on these please do go ahead and chime in on them in comments on the docs. In the intro I try to lay out a whole set of axioms for digital preservation, which I’ve gone ahead and reposted below.

Fifteen Guiding Digital Preservation Axioms

As a point of entry to the book I have distilled a set of fifteen guiding axioms. I realize that sounds a little pretentious, but it’s the right word for what these are. These axioms are points that I think should serve as the basis for digital preservation work. They are also a useful way to work out some initial points for defining what exactly digital preservation is and isn’t. Some of them are unstated assumptions that undergird orthodox digital preservation perspectives; some are at odds with that orthodoxy. These axioms are things to take forward as assumptions going into the book. Many of these are also points that I will argue for and demonstrate throughout the book.

  1. A repository is not a piece of software. Software cannot preserve anything. Software cannot be a repository in itself. A repository is the sum of financial resources, hardware, staff time, and ongoing implementation of policies and planning to ensure long-term access to content. Any software system you use to help you preserve and provide access to digital content is by necessity temporary. You need to be able to get your stuff out of it because it likely will not last forever. Similarly, there is no software that “does” digital preservation.
  2. Institutions make preservation possible. Each of us will die. Without care and management, the things that mattered to us will persist for some period of time related to the durability of their mediums. With that noted, the primary enablers of preservation for the long term are our institutions (libraries, archives, museums, families, religious organizations, governments, etc.). As such, the possibility of preservation is enabled through the design and function of those institutions. Their org charts, hiring practices, funding, credibility, etc. are all key parts of the cultural machinery that makes preservation possible.
  3. Tools can get in the way just as much as they can help. Specialized digital preservation tools and software are just as likely to get in the way of solving your digital preservation problems as they are to help. In many cases, it’s much more straightforward to start small and implement simple and discrete tools and practices to keep track of your digital information using nothing more than the file system you happen to be working in. It’s better to start simple and then introduce tools that help you improve your process than to simply buy into some complex system without having gotten your house in order first.
  4. Nothing has been preserved, there are only things being preserved. Preservation is the result of ongoing work of people and commitments of resources. The work is never finished. This is true of all forms of preservation; it’s just that the timescales for digital preservation actions are significantly shorter than they tend to be with the conservation of things like books or oil paintings. Try to avoid talking about what has been preserved; there is only what we are preserving. This has significant ramifications for how we think about staffing and resourcing preservation work. Preservation is ongoing work. It is not something that can be thought of as a one-time cost.
  5. Hoarding is not preservation. It is very easy to start grabbing lots of digital objects and making copies of them. This is not preservation. To really be preserving something you need to be able to make it discoverable and accessible and that is going to require that you have a clear and coherent approach to collection development, arrangement, description and methods and approaches to provide access.
  6. Backing up data is not digital preservation. If you start talking about digital preservation and someone tells you “oh, don’t worry about it, we back everything up nightly” you need to be prepared to explain how and why that does not count as digital preservation. This book can help you to develop your explanation. Many of the aspects that go into backing up data for current use are similar to aspects of digital preservation work, but the near-term concerns of being able to restore data are significantly different from the long-term issues related to ensuring access to content in the future.
  7. The boundaries of digital objects are fuzzy. Individual objects reference, incorporate and use aspects of other objects as part of their everyday function. You might think you have a copy of a piece of software by keeping a copy of its installer, but that installer might call a web service to start downloading files in which case you can’t install and run that software unless you have the files it depends on. You may need a set of fonts, or a particular video codec, or any number of other things to be able to use something in the future and it is challenging to articulate what is actually inside your object and what is external to it.
  8. One person’s digital collection is another’s digital object is another’s dataset. In some cases the contents of a hard drive can be managed as a single item, in others they are a collection of items. In the analog world, the boundaries of objects were a little bit more straightforward or at least taken for granted. The fuzziness of boundaries of digital objects means that the concepts of “item” and “collection” are less clear than with analog items. For example, a website might be an item in a web archive, but it is also functionally a serial publication which changes over time. A collection of web pages is itself a collection of files.
  9. Digital preservation is about making the best use of your resources to mitigate the most pressing preservation threats and risks. You are never done with digital preservation. It is not something that can be accomplished or finished. Digital preservation is a continual process of understanding the risks you face for losing content or losing the ability to render and interact with it and making use of whatever resources you have to mitigate those risks.
  10. The answer to nearly all digital preservation questions is “it depends.” In almost every case, the details matter. Deciding what matters about an object or a set of objects is largely contingent on what their future use might be. Similarly, developing a preservation approach to a massive and rapidly growing collection of high-resolution video will end up being fundamentally different from the approach an organization would take to ensuring long-term access to a collection of digitized texts.
  11. It’s long past time to start taking action. You can read and ponder complicated data models, schemas for tracking and logging preservation actions, and a range of other complex and interesting topics for years but it’s not going to help “get the boxes off the floor.” There are practical and pragmatic things everyone can and should do now to mitigate many of the most pressing risks of loss. I tried to highlight those “get the boxes off the floor” points throughout the second half of the book. So be sure to prioritize doing those things first before delving into many of the more open-ended areas of digital preservation work and research.
  12. Highly technical definitions of digital preservation are complicit in silencing the past. Much of the language and specifications of digital preservation have developed into complex sets of requirements that obfuscate many of the practical things anyone and any organization can do to increase the likelihood of access to content in the future. As such, a highly technical framing of digital preservation has resulted in many smaller and less resource-rich institutions feeling like they just can’t do digital preservation, or that they need to hire consultants to tell them about complex preservation metadata standards when what they need to do first is make a copy of their files. Along with this, digital media affords significant new opportunities for engaging communities with the development of digital collections. When digital preservationists take for granted that their job is to preserve what they are given, they fail to help an organization rethink what it is possible to collect. Digital preservation policy should be directly connected to and involved in collection development policy. That is, the affordances of what can be easily preserved should inform decisions about what an organization wants to go out and collect and preserve.
  13. Accept and embrace the archival sliver. We’ve never saved everything. We’ve never saved most things. When we start from the understanding that most things are temporary and likely to be lost to history, we can shift to focus our energy on making sure we line up the resources necessary to protect the things that matter the most. Along with that, we need to realize that there are varying levels of effort that should be put toward future-proofing different kinds of material.
  14. The scale and inherent structures of digital information suggest working more with a shovel than with tweezers. While we need to embrace the fact that we can’t collect and preserve everything, we also need to realize that in many cases the time and resources it takes to make decisions about individual things could be better used elsewhere. It’s often best to focus digital preservation decision making at scale. This is particularly true in cases where you are dealing with content that isn’t particularly large. Similarly, in many cases it makes sense to normalize content or to process any number of kinds of derivative files from it and keep the originals. In all of these cases, the computability of digital information and the realities of digital files containing significant amounts of contextual metadata mean that we can run these actions in batch and not one at a time.
  15. Doing digital preservation requires thinking like a futurist. We don’t know the tools and systems that people will have and use in the future to access digital content. So if we want to ensure long-term access to digital information we need to, at least on some level, be thinking about and aware of trends in the development of digital technologies. This is a key consideration for risk mitigation. Our preservation risks and threats are based on the technology stack we currently have and the stack we will have in the future, so we need to look to the future in a way that we didn’t need to with previous media and formats.
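
To make axioms 3 and 14 a little more concrete, here is a minimal sketch of what "starting simple with nothing more than the file system" and "working in batch" could look like. It walks a folder, records a checksum for every file, and later reports which files have gone missing or changed. The use of SHA-256 checksums and the names `sha256_of`, `build_manifest`, and `changed_files` are assumptions made for illustration; the axioms themselves do not prescribe any particular tool or technique.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum, in one batch pass."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files from the old manifest that are now missing or have a different checksum."""
    return sorted(name for name, checksum in old.items() if new.get(name) != checksum)
```

One possible use would be to call `build_manifest` on a collection folder today, save the result somewhere safe, and compare it against a fresh manifest later with `changed_files`. This is only one small piece of the ongoing work described above, not a substitute for it.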