Book.Files and the Inversion of Born Digital

Below are three images of my book, The Theory and Craft of Digital Preservation. The first is a picture I took of a print copy of the book. The second is the book cover on Amazon. The third is from the Johns Hopkins University Press page for the book.

One of them is not like the others. Can you spot the difference?

The first one, the print copy, is the outlier. Its picture is different from the one on the latter two: almost the same, but not quite. The picture on the cover of the print book was taken by Jermaine Taylor and posted to Instagram shortly after I got my floppy disk tattoo in 2017. The picture on the other two is a photo I took of my arm when I couldn’t source a higher-resolution file of Jermaine’s Instagram photo. I emailed both of them to the press as an idea for the cover of the book. I took my own photo because I was concerned that the Instagram photo might not be at a high enough resolution to use in print.

The photo I took is, in all the places where book covers appear online, the cover of the book. It is also the cover of the eBook version. With that said, all the print copies of the book actually have Jermaine’s photo on their cover, which in all honesty I think is an objectively better picture. As far as I can ascertain, at some point whoever was doing the layout for the print book decided to use the photo from Instagram instead of the photo I took. I think it was a good call.

I proposed that picture of my tattoo as a cover for the book because I thought it spoke to some of the themes in the book. The floppy disk is a medium on which we write digital content. Beyond that, it’s now the save icon. Jermaine encoded an instance of that icon with ink in my skin. Much of the book is about how messy and complicated the world of digital content is, in large part because it’s the result of the accrual of the work of people kludging together things on top of the work of other people. The fact that these variances in the book cover are now out in the world helps to further demonstrate that point. Online and in its eBook form, there exists one cover based on a photo I took. But based on decisions made in the workflow and process that created the physical copies of my book, the print copies all have Jermaine’s Instagram photo on them. The messiness of the digital plays out through the workflows and processes that create digital books. Some of those files get printed out. I interacted with a ton of digital files in getting the book to the publisher, and then a range of digital files had lives I don’t know about that resulted in the production of the tangible book.

I offer this anecdote as my own personal point of entry and connection to Matthew Kirschenbaum’s recently published report, Books.Files: Preservation of Digital Assets in the Contemporary Publishing Industry.

If your work has any connection to book publishing and production, or to collections in libraries and archives related to creative production, you are going to want to make time to download and read Books.Files: Preservation of Digital Assets in the Contemporary Publishing Industry. The report does an excellent job of providing an overview of the shift to digital workflows in the publishing industry. In this respect, it makes for a great companion to the 2011 report from CRL on digital workflows in the news industry, Preserving News in the Digital Environment: Mapping the Newspaper Industry in Transition. I’m always interested to see work like this that involves in-depth engagement with partners in the creative industries.

From the Books.Files report you get a great sense of the handoffs that occur in the production, transmission, tracking, and management of books as digital files. These run from the handoffs of Word files with tracked changes into processes with Adobe InDesign, through to the XML files and/or PDFs that become the basis for printing books or creating eBook files.

Below are some quotes I pulled out that I found particularly striking and relevant for thinking about collections for libraries and archives. 

  • “As early as 1999, an article in Publishing Research Quarterly observed that publishing “is coming to mean producing digital content which can subsequently be delivered in different media, rather than producing books or journals””
  • “there is at least one simple, uncontestable fact that obtains for any book produced with commercial press processes in the last twenty years, and which will continue to obtain for the foreseeable future. That fact is this: a book is a file, which is to say it is a persistent digital asset stored in a digital repository somewhere.”
  • “A “book” is thus the born-digital potential for a file to become a book first, and a physical, tangible object in our hands only secondarily. Every new book on our shelves has its shadow in a digital file, or more precisely a set of digital files consisting of the various assets needed to bring the book into being. A physical book nowadays is a surrogate for a digital master.”
  • “Increasingly, this means that the EPUB file becomes the version of record for the book. If the publisher wishes to retain a separate format-independent rendition of the book, any changes or updates in the EPUB must then be back-propagated to the original XML in order to keep versions consistent.”

I think these observations, along with the rest of the report, offer an opportunity for folks who work on collecting, preserving, and providing access to books and records of the history of the book. My sense is that this kind of study in nearly any other creative industry would produce similar results. So I think the results here are relevant to anyone interested in the production and circulation of creative works and their histories.

Born Digital is the Norm, Born Analog is the Outlier

The report hits home that cultural heritage institutions interested in collecting and preserving contemporary cultural works need to be centering digital content in their approaches. Increasingly, the physical objects that come into collections are themselves surrogates for digital originals, and it’s worth asking when the print surrogate for the digital asset is good enough, given that the source for that object is increasingly a digital resource.

History of Creative Industries is Increasingly Born Digital 

The report illustrates the ongoing major shifts relevant to the records of cultural production. This has huge implications for special collections work that involves acquiring the archives of creative industries. At this point books are a key case study in this shift, but the same is true for photography, film, the performing arts, music, etc. Creative production has become an almost entirely digital set of workflows and processes, and the future of archives of industries and creators in these media will involve engaging with these increasingly born-digital content streams.

Variance Abounds Across Digital Instantiations of Works

The report includes a series of examples of how and where variances enter into the workflows and processes as various stakeholders “touch” book files over the course of their production, and of the varied output files that are produced. The example of my book cover is in this case not an outlier; it’s another example of the kinds of variances that enter into the management of books as digital assets. In that cloud of files, it becomes increasingly difficult to talk about a definitive or authoritative copy of a work. My sense is that this issue of variance is going to become increasingly important for libraries, archives, and museums to figure out. On that front, I think some of Cathy Marshall’s work and Richard Rinehart and Jon Ippolito’s work is relevant for further exploring this issue.

In Digital Copies and a Distributed Notion of Reference in Personal Archives, Cathy Marshall explores the various kinds of copies and derivatives that people produce in managing and sharing videos and photos. Below is a map of the kinds of variances she observes.

Ultimately, Marshall argues for the need to step back from thinking of there being a canonical instance of a work and to instead embrace that what we are going to end up with is varied copies that instantiate important differences as those copies take on lives of their own. Significantly, this means stepping back from the notion of “derivatives” to instead see each copy of a work as contributing to a distributed notion of it.

I think Marshall’s observations are relevant for thinking forward about how we likely want to approach all kinds of digital creative workflows: “we will not only want to see copies, but we’ll also want to harmonize them, to harvest their metadata, to select among them. Instead of relying on a simple notion—the truth is in the cloud, embodied as a single reference copy—we will want to expand our sense of what is entailed by the notion of a reference copy and turn to a distributed, social model.”

In Re-collection: Art, New Media, and Social Memory, Richard Rinehart and Jon Ippolito also take a run at the idea of a canonical master file for any given work. They suggest that with digital media, in many cases, instead of thinking about a master file it’s more important to be looking for what they call “mother files”: the editable files that enable a wide range of outputs. You can see more on their thinking in the pull quote from the book below.

Altogether, I think the report makes for a great read, and it helps to draw out some major issues facing libraries and archives in the future. More and more of the material of culture is digital from the start. The issues raised by the proliferation of variance and copies are still something that we have a long way to go to fully understand and integrate into how we think about our work.

I’m curious to hear any thoughts you have about the questions the report raises.