Wherein I Answer 13 Questions About Digital Humanities Blogging

Matt Burton, PhD candidate in the University of Michigan’s iSchool, is writing his dissertation on the role that blogs play in scholarly communication, primarily focused on digital humanities blogs. He asked me if I would respond to a set of 13 questions he put together as part of his study. Shawn Graham recently shared his responses, which I enjoyed reading, so I figured I would share mine as well.

In responding to Matt’s questions, I realized that there is likely a lot of tacit knowledge that comes from the practice of blogging in this community, which it would be useful to make explicit for anyone else who wants to join in. So I’d love to see other people respond to Matt’s 13 questions. If you link back to my post and Shawn’s, we can keep track of all of this in trackbacks.

Matt: When did you start your blog (career-wise: as a grad student, undergrad, etc.)?

Trevor: I started keeping an academic blog around the time I started my M.A. program. I had kept a personal blog for a year or so with my wife, but launched two blogs with an academic bent in 2007. The first was a blog for a digital history course I was taking, and the second was a site I was going to run called firstpast.org, about history as represented in children’s books. The children’s books thing didn’t keep my interest long enough, so I eventually rolled them all together.

Matt: Why did you decide to start blogging?

Trevor: The digital history focused blog was the direct result of a course requirement: we had to start a blog and keep notes on it. At the same time, I decided I would stand up that other blog, the one about history through children’s books, because I saw it as an opportunity to

  1. refine some of my tech skills
  2. show folks that I could create and manage a decent looking blog
  3. set myself up with a structure and regular set of deadlines to get myself in a habit of writing for an audience
  4. follow the example of colleagues at CHNM (Dan Cohen, Tom Scheinfeldt, Dave Lester & Jeremy Boggs), whose blogs were earning them real exposure and connections.

So that is the web of reasons I ended up getting into blogging.

Matt: How do you host your blog? That is, do you use a generic web host like Dreamhost with WordPress, or a blogging service like Blogger.com?

Trevor: Currently, I use Dreamhost to run an instance of WordPress. When I started blogging for the course I used a WordPress.com instance, but I had also set up and was running my own installation of the WordPress software for the history-through-children’s-literature site.

Matt: How did you learn to set up your blog?

Trevor: I read the five-minute tutorial for setting up a WordPress instance. It took a lot more than five minutes. I had put up websites before, but had never used anything that involved a database backend, and I remember futzing around with a bunch of configuration issues to get the site up and running. I also modified a theme at that point. I wanted to make my own theme partly to show I could, and partly to figure out more about how the HTML, CSS, and PHP all interacted. Most of that tinkering just involved using Firebug to poke around and see what tweaks to the site would look like, then making those edits in a text editor to files via FTP.

Matt: What are the challenges with maintaining your blog (i.e., spam, approving comments, dealing with trolls, finding time to write, etc.)?

Trevor: The main challenge has been figuring out what role the blog plays in my productivity and work. I struggled a lot in the beginning to figure out what voice to write in, and how much my writing on the blog should be polished final product versus part of a kind of open notebook where I worked things out in a more personal voice. At this point, I feel like I’ve hit that stride, but I also have so many places and commitments for writing that it’s tricky to do all the writing I want to be doing. As a result of my blogging about history in video games, I was invited to be one of the initial bloggers for Play the Past. I also ended up blogging for my job, for the Library of Congress Digital Preservation blog. The result is that the Trevorowens.org blog has locked in as a place where I share the things that don’t easily fall into either of those other two spaces, or that are the most perspectival of my writings. To sum that up, I haven’t had much trouble with technical or social issues around blogging. For me the challenge remains getting things up and out there via the blog, and focusing on how I can make the best use of it as a place to develop and forward my thinking and writing.

Matt: What topics do you normally write about? Do you try and keep it strictly academic, or do you mix in other topics?

Trevor: At this point, I mostly write about interpreting history as represented in new media, methods of research and scholarship in digital history and the digital humanities, and issues around the design, development, and use of digital technologies for collecting, preserving, and providing access to cultural heritage materials. On occasion I will delve into other issues around changes in scholarly communication. Another way to say this is that the thematic unity of the blog is that it covers the things I have an academic/professional interest in. The origins for a lot of posts are discussions with archivists, librarians, curators, artists, humanities scholars, and scientists at conferences, on Twitter, or in the comments on their blogs, as well as reactions to presentations, papers, or books that I’ve read.

At this point, Trevorowens.org is a professional/personal blog. I offer running commentary on issues in the field, but for the most part, it is not a place where I present original research as much as a place where I offer and develop my perspective on issues in this area of professional practice and scholarship. In contrast, when I write for Play The Past I envision my audience as a more general reader interested in issues and stories about history in video games. So the Play the Past posts are a bit more of a mixture between academic research writing and journalistic writing.

Matt: If you allow comments on your blog, do you often get comments? What has been your experience managing comments/commenters on your blog?

Trevor: When I write something really long, like the full write-up of a talk I gave, I will often get nothing in the comments. I might see a lot of people sharing it around on Twitter, or offering a word or two there, but I don’t see much engagement on the post. In contrast, if I write something short in reaction to something a lot of people are engaging with, I can get some real substantive back and forth going. For example, Implications for Digital Collections Given Historian’s Research Practices responded to the ITHAKA report, Supporting the Changing Research Practices of Historians. Similarly, the satirical bent of Notes toward a Bizarro World AHA Dissertation Open Access Statement, responding to the AHA’s dissertation embargo information, kicked up a lot of exchange. Along with that, some of the very technical proposals I’ve written up, like the recent piece Linked Open Crowdsourced Description: A Sketch, have had a tendency to spark a good bit of back and forth.

On the whole, I totally love comments. With the exception of that Bizarro World post, and a post I wrote up about misogyny in tech communities, I haven’t really steered into waters where there is much divisiveness or trolling. Oh, wait, except for that one time when I co-authored a blog post asking if the source code of a video game could be racist, and it got picked up by Rock Paper Shotgun, and we ended up with all kinds of irate, antagonistic, but relatively thoughtful comments. Those aside, I generally feel like the comment section of my blog works like the web once did. I put things up, and the small virtual community of practice I participate in on Twitter and other blogs has a bunch of folks who pop in, read what I write, and post thoughtful reactions that can open up discussions I find myself going back to all the time.

Matt: What kinds of interactions (scholarly or otherwise) emerge out of your blogging practice?

Trevor: A bunch of them. I will try and break these out.

  • Finding and Establishing a Scholarly Community: Early on, I wrote a lot about history in video games on my blog, for example this post on the tech tree in Civilization from 2009, and as a result I ended up getting roped into Play the Past at its launch. Through that, I met a bunch of other bloggers I did not already know. As the blog has continued and I have become a co-editor, I’ve been thrilled to connect with and find people I didn’t know at all. We have had a range of folks move from commenters, to guest bloggers, to regular bloggers, and a lot of their writing gets a ton of traffic and exposure.
  • Refinement of Ideas and Writing and Collaborative Projects: A lot of my work toward publications and research projects occurs through a process of blogging. I blogged through drafts of parts of my dissertation proposal and writing process. A workshop I gave on crowdsourcing became a four-part series of blog posts on the topic, which turned into an invited essay for Curator: The Museum Journal, which I was then invited to republish in an edited volume on the topic.
  • Getting My Name Out There: Every month or two I run into someone at a conference or event who says something like “I hope this isn’t weird but I read your blog.” At this point it’s totally not weird. It is a huge compliment and I feel really lucky about how the whole thing has worked out for me. The blogs are part of how I do public scholarship and it’s a continual part of the professional network and community I participate in.

As an example of how these things all weave together: one of my early pieces for Play the Past (still one of my personal favorites) asked if the game Colonization was offensive enough. Rebecca Mir, then a graduate student, read my post for a course and ended up writing an amazingly cool course paper that opened up a whole bunch of other themes from it. We corresponded a bit about the original blog post over Twitter, and she ended up sending me a copy of the course paper. Rebecca had found a ton of great stuff digging through some of the Civ modder discussion forums, and had some neat ideas about how to take a close look at the ways native peoples are represented in the game. At about the same time, there was a call for proposals for book chapters for what would become Playing with the Past. I was planning on putting in a proposal based on the Colonization things I had written, and encouraged her to do the same. Rebecca smartly suggested that it was unlikely the editors would want to run two Colonization-focused essays, and proposed that we consider co-authoring something, which I thought was a great idea.

So we put in for that and it was accepted, and we ended up deciding to use Play the Past as a place to take turns blocking out and taking the lead on drafting a series of posts exploring the themes and issues I had laid out and she had begun exploring in her course paper. The result was a series of very widely read posts that got a ton of comments and gave us a lot of great critical feedback to incorporate when we stitched them all back together into our essay for the book. What I love about this whole process is that it pulls at the seams of the traditional research and writing process, and in doing so opens up the possibilities for a range of levels of collaboration and exposure to your work.

As is the case when you really spend time working on a piece, there is a bunch of material from the blog posts that we ended up leaving on the cutting room floor. But all of that material is still up and out there in the blog posts. The possibility of that collaboration hinged on her reading the short post I had written on the topic, and the extensive feedback we received in comments helped us refine and polish the essay. Along the way, Rebecca went on to become a regular blogger for Play the Past, and I know her participation in the site has helped get her invited to present at conferences and played a role in building her professional résumé.

Matt: Do you find these interactions informative, useful, enlightening, tedious, frustrating, obligatory, etc? How do they feel?

Trevor: On the whole my interactions around blogging are informative and enlightening. I think a few other words I would use are challenging, rewarding, exhilarating, generous and warm.

  • Challenging in that on several occasions folks have called me out on things, or I have seen others called out, and on the whole I think that process has pushed the broader community of folks in the digital humanities and library and archives tweet/blogosphere to engage with aspects of privilege in ways that help move the fields forward as they continue to grow.
  • Rewarding in that I get feedback and recognition for my work and am in regular and ongoing communication with the folks in a range of different communities of practice that I respect and admire.
  • Exhilarating in that every so often a post I write will blow up on Reddit or something. One time a piece I wrote on Fallout was getting hundreds more views each time I refreshed the stats page. That has the dual experience of “Yay! Look at how many people are reading something I wrote” and “Oh no! I really hope I didn’t mess anything up in there, look at how many people are now scrutinizing something I wrote.”
  • Generous and warm in that I have found myself in a community of peers, mentors, mentees and colleagues who regularly give of their time, and opinions and share in humor and the ups and downs of our careers and professional lives.

Matt: How do you think digital humanities blogging is different from more traditional forms of academic writing and reading?

Trevor: One of the essays in the report from a summit I helped plan on collecting and preserving science blogs is relevant here. The author suggested that part of the problem with pinning down what blogging is, and how it is different from other modes of scholarly communication, is that it is defined both by particular technologies (text syndicated via RSS) and by a set of socially defined practices in how people use those technologies at particular times. With that said, I think I can venture to offer two different approaches for going about this. Blogging is at once a much more expansive and diffuse mode of communication than something like a journal article, and simultaneously an emergent genre of writing with a set of conventions.

Diffuseness: If you scroll down a bit further and read through some of my favorite blogs, you will find that I like a lot of things that are totally different. Some of them have short posts that show up on a daily basis; some have long posts and are updated every three to six months. Some have custom material drafted for the blog; some mostly share notes from talks and presentations and working drafts of papers. Some are filled with images and subheads; some are just huge walls of text. To this end, one of the characteristics of blogging in the digital humanities is that it is far more particular to the person and their approach than something like a journal article. That is, I think you get a lot more variety in what people do with their blogs and what is considered acceptable practice.

Coherent Genre-ness: While I realize it might seem contradictory to suggest that blogging represents a coherent genre of writing after just saying how diffuse it is, that isn’t so. While there is a broad diversity in practice, there are also a lot of conventions that bundle up in the middle of that diffuseness. So here are some things that make blog writing, on the whole, different from other genres of scholarly communication like journal articles, book chapters, and conference presentations. Blog writing is more informal. It is often more conversational. It often involves less fancy talk, that is, more straightforward attempts to get points across. Blog writing is generally much shorter than other forms of academic writing. It often has shorter paragraphs, makes use of hyperlinks to point out to ongoing discussion elsewhere instead of recapping that discussion, and includes more subheads so it is easier to skim. Blog writing can often assume and connect with a broader audience than other forms of academic writing. It is also often less finely tuned and honed than other forms of scholarly communication.

Matt: How would you characterize the relationship between blogging and the digital humanities (however broadly conceived)?

Trevor: Oh gosh, it sounds like that involves the infinite regress of attempting to define the digital humanities :) I will lean on a recent review I wrote of the book Pastplay, which I think gets at a fruitful connection between what DH has become and what blogging does. (I’m going to post a pre-print of the review on the blog once I get it back with final edits.)

While I’m challenged as to exactly where I should put Pastplay on my bookshelf (educational psychology? historiography & method?), I’m glad to know it is in my collection. From my perspective, the most valuable contribution of this book isn’t really articulated in the text: the book offers a framework for defining the ever-nebulous digital humanities. Many of the authors of chapters in the book are leading thinkers in the digital humanities, and their ideas about the playful use of technology to experiment, dabble, and explore our ideas about the past offer insight into an epistemology of the digital humanities. The field is often simply described as the application of computing technologies to humanistic inquiry, but the playful hermeneutics described here, and the implication that there is no substantive difference between students learning about the past and historians themselves as perpetual learners, let us pin down what is different and significant about how these digital humanists approach understanding the past.

So I think that’s it. I think it’s about play. Not play in the games sense or the childish sense, but in the sense of individually and collectively learning how to do things. That is, play in terms of how learning happens at the individual and community level as we fumble around and figure out how to do better work and develop better ways to understand our world, our cultures, and their pasts. I think when digital humanities blogging is at its best you have people stepping away from “fancy writing” to play with ideas and play with methods, to be honest, to be generous, but not to shy away from calling each other out on our respective shit. I think this has been a huge asset to the development of the community, but at the same time it’s a real challenge. It is there inside the ups and downs of concepts like “niceness.”

Academia has always had a bit of a rough-and-tumble discourse; go find the forum section of just about any history journal over the last 80 years and you have a very real chance of finding a knock-down, drag-out fight over what counts as good work, or whose work is or isn’t original or groundbreaking. With that said, the personal valence of blogging, and the immediacy of it and of comment threads, makes it all the more critical for the community to continue to figure out and reflect on how we can maintain an open and friendly network that is also ready to have its privilege and its background assumptions checked. Blog writing is also an incredibly immediate form of academic writing. You write it, you hit publish, you tweet it, you start talking about it. If it’s a hot topic, there is a good chance you could be reading someone’s response and reaction in another post within a few hours.

Matt: What DH blogs/bloggers do you read and why do you read them? What do you like about them?

Trevor: There are really too many to name here; I follow hundreds of blogs in my reader, so I will just point to some highlights. Here is a rundown of some of my favorites off the top of my head.

  • I read everything Bethany Nowviskie writes, more or less as soon as I know it is up. She is routinely insightful and reflective, and the fact that she is situated in a library context means her perspectives are particularly relevant to me.
  • Ted Underwood is another favorite. He does a great job at doing number crunching computing sort of DH in a way that opens up and elucidates big questions.
  • Sheila Brennan always has great things to say about work at the intersection of digital history, public history and the digital humanities.
  • Tim Sherratt’s posts often seem to come with some fully formed new project he concocted that is both immediately interesting and useful and simultaneously something that forwards the theoretical potential of building things in the field.
  • Miriam Posner has both a great voice for blog writing and covers a lot of issues thoughtfully and deeply.
  • DH+Lib often surfaces posts and pieces I would not otherwise have come across.
  • Mark Sample is one of the most creative people I follow. I love how he has a focus on issues in born digital media like video games and twitter bots and his writing is really smart.
  • Stephen Ramsay isn’t really a high-volume blogger, but I appreciate his perspective, and I think his work on algorithmic criticism and the hermeneutics of screwing around is some of the best work at bridging the computational and mathematical with the epistemology and values of the humanities.
  • Ernesto Priego has a valuable perspective and I enjoy the intersection of library science and digital humanities in his work.
  • Natalia Cecire is a great writer and scholar and a thoughtful critic.
  • Adam Crymble writes a lot about issues around the practice of digital history and it’s both good stuff and particularly relevant to my interests.
  • Melissa Terras has written a bunch of great stuff and her work is often directly related to issues I am working on related to things like use and reuse of digital content.
  • Shannon Mattern is always writing about the amazing courses she teaches and her visits to galleries in New York, sharing in-depth, thoughtful pieces and talks with a media studies bent. It’s great stuff.
  • Kate Theimer likely does not consider herself to be in the digital humanities tent, but her work on the future of archives is always thoughtful and relevant to folks in DH.
  • Scott Weingart has a bunch of great posts about things like network analysis, and I appreciate how his background in the history of science situates his perspective on tools and methods within an understanding of the sociocultural framework in which those tools operate.
  • Ian Milligan is someone whose posts I’m almost always tweeting out. He is one of a handful of historians doing work with Web Archives and he shares parts of the process of that work that are enlightening.
  • Fred Gibbs is a great historian and a thoughtful commentator on digital history.
  • Ed Summers builds very cool things and always has smart reasons and things to say about the things he builds.
  • Sharon Leon does great work in digital history and public history and I’m always interested in her perspective.

Matt: What was your most popular blog post? Why do you think it was so popular? What is your *favorite* post?

Trevor: Unquestionably, the most read things I’ve ever written are the posts about Colonization and Fallout 3 for Play the Past. Both became and continue to be so popular because they connected with audiences outside the network of academics and cultural heritage professionals I usually write for. Another hit in that vein is a 400-word post I wrote about an amazing Pac-Man t-shirt.

For my personal blog, I’ve included the stats for my top 15 blog posts below (this only goes three or four years back, but it’s illustrative). The top post there is a perennial hit. I think it resonated so well because it’s really in the sweet spot for a blog post: I lay out a point that turns some conventional wisdom about crowdsourcing on its head, and that works in a short post. The second is actually a really long one, the transcript of a talk I gave earlier this year at the University of Pittsburgh, which I think made the rounds because it weaves together a lot of the different things I focus on (digital preservation, born-digital materials, and the digital humanities), so it touches on more or less all the sectors of my professional network. The Bizarro World post was a fast-moving issue on the higher education blogs. From there you see a few more of my posts on crowdsourcing and a range of things I’ve written about research methods that tend to get some traction.


As far as a favorite post of mine, I’m not sure. I think I’d probably go with either the Fallout 3 post or the Is Colonization Offensive Enough post. At the beginning of Play the Past I would spend a lot of time honing and refining pieces like those, and I think it shows. For better or worse, most of the blogging I do these days is much more immediate and responsive, rushed in between a bunch of other things. So I think I’m putting out good stuff that is useful, but I don’t think it’s nearly as refined.


Digital Public History Course for an iSchool

I’m excited to announce that I will be teaching my digital public history graduate seminar again! I am tweaking the course I taught for American University’s Public History Program (in 2011 and 2012) and will be teaching it as a special topics course this spring in the University of Maryland’s iSchool program.

So, if you are a grad student at UMD (or if you have friends who are), it will be Thursday nights, 6:00-8:45, in College Park, Maryland.

Here is the blurb on the course:

Digital Public History, LBSC 708 (Section D), College Park, Maryland, Thursday nights, 6:00-8:45

This course will explore the current and potential impact of digital media on the theory and practice of history. We will focus on how digital tools and resources are enabling new methods for analysis in traditional print scholarship and the possibilities for new forms of scholarship. For the former, we will explore tools for text analysis and visualization as well as work on interpreting new media forms as primary sources for historical research. For the latter, we will explore a range of production of new media history resources, including practical work on project management and design. As part of this process we will read a range of works on designing, interpreting and understanding digital media. Beyond course readings we will also critically engage a range of digital tools and resources.

Below is a bit of a scratch pad for how I am thinking about tweaking things for the course. I am curious for other comments/suggestions for things to consider with these.

Topics/Weeks I am Considering Swapping in

At the moment there are four areas I am considering as potential revisions/additions to the week by week topics of the course.

Books I am Considering Adding or Swapping in

One of the things I need to get done sooner rather than later is decide on what books I’m going to keep and or swap out. Here are a few I am considering. I am curious to hear if there are any other books folks think I should be considering.

Reviewing Some Syllabi for Related Courses 

I’ve been trying to keep track of some great-looking relevant/related courses to review. This is the list I have so far. I’d love to know of other courses folks think I should take a look at.

So, what do you think?




Personal Digital Archeology Illustrated

Bundled up inside the sectors of many of our hard disks you can find the traces of our digital past, recursively tucked away in hastily named directories. Our Old Files form layers of digital sediment, ripe for personal digital archeology.

I love how this XKCD illustrates the way personal computing becomes inherently archeological. Until recently, the cost of storage space kept plummeting. Along with that, the nature of search in file systems enabled many of us to move from filing to piling. The result is something like the stratigraphy in the comic.

It was easy to just stick “Old Desktop” inside the new documents folder, which itself had the stack of files you recovered from an earlier hard drive crash. Nested deeper and deeper down you’ve got your high school zip disk.

If you tunnel down in there, you can even find out things about yourself you had forgotten. In this case, an 850k text file with forgotten poetry is uncovered.

As scholars in the future work with logical or forensic disk images of personal computers, their work will likely look much the same, except they won’t have the benefit of memory to fill in the blanks about how this order came to be.

The comic is chaotic: haphazardly named files and folders created on the fly become the long-term structure of the data. Still, we get the joke because we can understand what these layers and files mean without knowing anything about their contents. We see the high school love note, the pile of files shared over Kazaa, the collection of pictures from Facebook. The directory names, file names, and file extensions tell us a great deal about what we are looking at. Even in the chaos there is a lot of context and description in the arrangement of the files.

Interestingly, as we increasingly move to cloud storage for more and more of our computing, and as the days of a really great Kryder rate continue to level off, this is likely only going to be the case for a particular period in the history of personal computing. In any event, when we write up the digital historiography and source criticism textbooks for historians of the near and distant future who want to make sense of our old hard drives, we should print up and explicate this XKCD and feature it on the cover.


Linked Open Crowdsourced Description: A Sketch

Systems and tools for crowdsourcing transcription and description are proliferating, and libraries and archives are getting increasingly serious about collectively figuring out how to let others describe and transcribe their stuff. At the same time, there continues to be a lot of interest in the potential of linked open data for libraries, archives, and museums. I thought I would take a few minutes to try to sketch out a way that I think these things could fit together.

I’ve been increasingly thinking it would be really neat if we could come up with some lightweight conventions for anyone, anywhere, to describe an object that lives somewhere else. At this point, efforts like the Open Annotation Collaboration presumably provide a robust grammar for actually getting into markup if folks wanted to really blow it out, but I think there are some very basic things we could just do to kick off an ecosystem that lets anyone mint URLs carrying descriptive metadata about objects that live at other URLs.

My hope is that instead of everyone building or standing up their own systems, we could have a few different hubs and places across the web where people describe, transcribe, and annotate, and that work could then be woven back into the metadata records associated with digital objects at their home institutions. In some ways this is really the basic set of promises and aspirations that linked open data is intended to deliver on. Here I am just trying to think through how this might fit together in a potential use case.

A Linked Open Crowdsourcing Description Thought Experiment

With a few tweaks, we are actually very close to having the ability to connect the dots between one situation in which people further describe archival materials (in this case to create bibliographies) that could provide enhanced metadata back to a repository. I’ll talk through how a connection might be forged between Zotero and one online collection, but I think the principles here are generic enough that if folks just agreed on some conventions we could do some really cool stuff.

The Clara Barton papers are digitized in full, but in keeping with archival practice, they are not described at the item level. In this case, the collection has folder level metadata. So since it’s items all the way down in a sense, the folders are the items.

As a result, you get things that look like this, Clara Barton Papers: Miscellany, 1856-1957; Barton (Clara) Memorial Association; Resolutions and statements, 1916, undated. This is great. I am always thrilled to see folks step back from feeling like they need item level description to make materials available on the web. Describe to whatever level you can and make it accessible.

Clara Barton Papers Folder Level Item



With that said, I’m sure there are people who would be willing to pitch in and make some item level metadata for the stuff in that folder. Beyond that, if a scholar is ever going to actually use something in that folder and cite it in a book or a paper, they are going to have to create item level description anyway. Wouldn’t it be great if the item level description that happens as a matter of course in putting a footnote in an article or a book could be leveraged and reused?

Scholars DIY Item Level Description in Zotero

Every day, a bunch of scholars key in item level description for materials in reference managers like Zotero. To that end, I’ll briefly talk through what would happen if someone wanted to capture and cite something from the Clara Barton Papers in Zotero. Because there is some basic embedded metadata in that page, if you click the little icon by the URL you get that initial data, which you can then edit. You can also directly save the page images into your personal Zotero library.

You can see what that would look like below. I started out by saving the metadata that was there, logged the URL where the actual item starts inside the folder, changed the item type from a web page to a document, and keyed in the title and the author of the document. I also saved the 2 page images (out of the 19 images in the folder) that are actually part of the item I am working with as attachments to my Zotero item.


Creating an item level record for materials in the Clara Barton papers folder in Zotero for the purpose of citing it.

So, now I can go ahead and drag and drop myself a citation. Here is what that looks like. This is what I could put in my paper or wherever.

Logan, Mrs. John. A. “Affidavit of Mrs. John A. Logan,” 1916. Miscellany, 1856-1957; Barton (Clara) Memorial Association; Resolutions and statements, 1916. Clara Barton Papers. http://www.loc.gov/resource/mss11973.116_0449_0467/#seq-3.

Now, wouldn’t it be great if there were a way for Zotero to ping, or do some kind of trackback to, the repository to notify folks that a description of this resource now exists in Zotero? That is, what if I could ask Zotero’s API for every public item associated with a loc.gov URL? In particular, every item that someone actually went through the trouble to tweak and revise, as opposed to items that just carry the default information they started with.
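As a sketch of what that kind of lookup could look like: the Zotero web API serves items from public libraries as JSON without an API key, and since filtering on the URL field isn’t (as far as I know) a built-in query parameter, the domain check here happens client-side. The fetch function and the canned sample records are illustrative assumptions, not a finished integration:

```python
import json
from urllib.request import urlopen

def fetch_public_items(user_id, api="https://api.zotero.org"):
    """Fetch one page of items from a public Zotero library (no key needed)."""
    with urlopen(f"{api}/users/{user_id}/items?format=json&limit=100") as resp:
        return json.load(resp)

def items_describing(items, domain="loc.gov"):
    """Keep only item records whose URL field points into the given domain."""
    return [it for it in items
            if domain in (it.get("data", {}).get("url") or "")]

# Demonstrated on canned records rather than a live call:
sample = [
    {"data": {"title": "Affidavit of Mrs. John A. Logan",
              "url": "http://www.loc.gov/resource/mss11973.116_0449_0467/#seq-3"}},
    {"data": {"title": "Some unrelated page", "url": "http://example.com/"}},
]
print([it["data"]["title"] for it in items_describing(sample)])
```

A real version would also need to page through results and, as discussed below, distinguish hand-edited records from ones that just carry the scraped defaults.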

Connecting Back from the Zotero instance of the Item

At this point, I added in descriptive information, and because I have the two actual image files, I also know that the information I have refers directly to mss/mss11973/116/0400/0451.jp2 and mss/mss11973/116/0400/0452.jp2. So, from this data we have enough information to actually create a sub-record for 2 of the 19 images in that folder.

Because I have a public Zotero library, anyone can go and see the item level record I created for those 2 images from the Clara Barton Papers. You can find it here: https://www.zotero.org/tjowens/items/itemKey/IHKBH5WQ/. In this case, the URL tells you a lot about what this is right off the bat. It’s an item record from Zotero.org user tjowens, and it has a persistent arbitrary item ID in tjowens’ library (IHKBH5WQ). Right now, that page could track back to the URL it is associated with, or do something even simpler: just put a token in the link that a repository owner could look for in their HTTP referrer logs as an indicator that there is data at some URL that describes data at a URL the repository has minted. For instance, just stick ?DescribesThis or something on the URL, like http://www.loc.gov/resource/mss11973.116_0449_0467/#seq-3?DescribesThis. Then tell folks who run online collections to check their referrer traffic for any incoming links that have ?DescribesThis in them. From there, it would be relatively trivial to review the incoming links from the logs and decide if any of them were worth pulling over as added-value descriptive metadata.
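On the repository side, checking for that convention could be as simple as scanning the access logs. Here is a rough sketch: the log lines are made up to match the convention above, and the combined-log-format parsing is deliberately simplified, so treat it as an illustration rather than a hardened log parser:

```python
import re

TOKEN = "DescribesThis"
# Simplified combined-log-format fields: "REQUEST" ... "REFERRER" "USER-AGENT"
LINE = re.compile(r'"(?:GET|HEAD) (?P<path>[^"]*) HTTP/[^"]*".*"(?P<referrer>[^"]*)" "[^"]*"$')

def describing_links(log_lines):
    """For requests carrying the ?DescribesThis token, yield
    (requested path, referring page that likely holds the description)."""
    for line in log_lines:
        m = LINE.search(line.rstrip())
        if m and TOKEN in m.group("path"):
            yield m.group("path"), m.group("referrer")

# Made-up log lines matching the convention sketched above:
log = [
    '1.2.3.4 - - [12/Jun/2014:11:44:35 -0400] '
    '"GET /resource/mss11973.116_0449_0467/?DescribesThis HTTP/1.1" 200 512 '
    '"https://www.zotero.org/tjowens/items/itemKey/IHKBH5WQ/" "Mozilla/5.0"',
    '5.6.7.8 - - [12/Jun/2014:11:45:00 -0400] "GET / HTTP/1.1" 200 100 "-" "curl/7.30"',
]
hits = list(describing_links(log))
print(hits)
```

Only the first line matches, and its referrer points straight at the Zotero page where the candidate description lives.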


Here is an image of the Item page created for the record I made in Zotero

Aside from just having this nice looking page about my item, the Zotero API makes it trivial to get this data marked up in a number of different formats. For instance, you can find the JSON of this metadata at https://api.zotero.org/users/358/items/IHKBH5WQ?format=json


The JSON from the Zotero API for the item I created there. It’s easy enough to parse that you can pick out the added info I have in there, like the title and author.
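To give a sense of how little code that parsing takes: the trimmed record below is a stand-in that mimics the shape of a Zotero API item (a data object holding a title and a creators list; the field values are taken from the example above), and a couple of lines pull out the added info:

```python
import json

# A trimmed stand-in with the same shape as a Zotero API v3 item record.
record = json.loads("""
{
  "key": "IHKBH5WQ",
  "data": {
    "itemType": "document",
    "title": "Affidavit of Mrs. John A. Logan",
    "creators": [
      {"creatorType": "author", "firstName": "Mrs. John. A.", "lastName": "Logan"}
    ],
    "date": "1916",
    "url": "http://www.loc.gov/resource/mss11973.116_0449_0467/#seq-3"
  }
}
""")

data = record["data"]
authors = [f'{c["lastName"]}, {c["firstName"]}'
           for c in data.get("creators", [])
           if c.get("creatorType") == "author"]
print(data["title"], "|", "; ".join(authors))
```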

So, if someone back at the repository liked what they saw here, they could just decide to save a copy of this record and then integrate it with the existing records in their index through an ETL process.

What I find particularly cool about this on a technical level is that it becomes trivial to retain the provenance of the record. That is, an organization could say “description according to Zotero user tjowens” and link out to where it shows up in my Zotero library. This has the triple value of 1) giving credit where credit is due, 2) offering a statement of caveat emptor regarding the accuracy of the record (that is, it’s not minted on the authority of the institution but is instead the description of a particular individual), and 3) providing a link out to someone’s Zotero library that could likely enable discovery of related materials from other institutions.

Linked Open Crowdsourced Description

The point of that story isn’t so much about Zotero and the Clara Barton Papers as about how, with a little bit of work, those two platforms could better link to each other in a way that lets the repository benefit from the description of its materials that happens elsewhere. If a repo could just get a sense of who is describing its materials, and where, it could start playing around with ways to link to, harvest, and integrate that metadata. From there, organizations could likely move away from building their own platforms to enable users to describe or transcribe materials and instead start promoting a range of third party platforms that simply enable users to create and mint descriptions of materials.


Posted in Uncategorized | 5 Comments

Where to Start? On Research Questions in The Digital Humanities

How should digital humanities scholars develop research questions? Spurred on by this recent conversation on twitter, I figured I would lay out a few different ways to go about answering this question about questions. The gist of the dialog is that Jason Heppler suggested that one should “Fit the tool to the question, not the other way around” in terms of working with various kinds of new digital humanities tools. I take tools here to mean any computational instrument employed to understand the world: for example, GIS, topic modeling, creating simulations using cellular automata or agent based models, analyzing frequencies in audio files, or visualizing trends in images. I get where Jason was going, but at least as it was formulated I don’t think it is the right advice.

The conversation prompted me to try and clarify a bit of how I see the relationship between research questions, primary sources, and tools and methods.

Start with the Question, the Archive or the Tool?

Some historians start with their question; some start with a familiarity with a period that suggests questions that exploration of a particular archive or collection of primary sources could answer. Here are two examples I can recall from colleagues I worked with doing research in the history of science.

One colleague was aware of the shift that had occurred between classical and modern physics in one astronomer’s work, documented in a recent essay. So he went to look at the papers of another astronomer, which had not yet been particularly well explored, to see if similar or different responses to the notion of a distinction between classical and modern physics had emerged in that astronomer’s work. In short, it was largely about abstracting the results of one exploration and applying them to the information available in another individual’s archive.

In either case, it’s a bit of a dance between formation of questions and the ways that those questions open up or shift and change as one gets into the complicated, rich and vast space of the possibilities of primary sources.

The Function of Research Questions in History/the Humanities

Back up a bit. What is the purpose of research questions in the humanities? I would posit that their purpose is to clarify what is in and out of scope in a project and to define where a project should start and end. Research questions also provide a constant point of reference to check back on while working on a project: you write down your questions as you go, and you can always pull them out again and check whether, in fact, you are actually working to answer them or have drifted off to some other problem. Research questions are useful structures for organizing your work and inquiry, and they are valuable tools for signaling to others what to expect from a piece of scholarship. Research questions are functionally an attempt to establish the set of criteria by which a piece of scholarship should be evaluated.

The Problem of Research Proposals and Fancy Writing

One of the big problems in talking about research questions is that one often describes research questions and methods in research proposals (for grants or dissertations etc.), and those proposals are often really a form of what Joe Maxwell calls “fancy writing.” That is, those kinds of research proposals are more about the performance of demonstrating how smart you are and why you should be given permission to do work than they are about actually trying to get research done. If you haven’t read it, I can’t recommend Joe’s Qualitative Research Design: An Interactive Approach strongly enough. In focusing on the actual purpose of research design and not the performance of proposal writing he cuts through a bunch of the fancy stuff to get to the way that research questions actually develop and evolve. He calls it an interactive approach, but I think iterative would be just as descriptive.

In Maxwell’s approach, there are five components of research design as it is actually practiced.

  1. Your goals (the reason you are doing the research),
  2. Your conceptual framework (the literature you are working in, your field, your experience that you draw from),
  3. Your research questions (a set of clear statements of exactly what you are studying)
  4. Your methods (broadly conceived as the way you are going to answer the question, so for historians both the archives/sources you will work from and their perspectives are relevant as well as the way you will sample/explore them, and the actual techniques you will use to analyze and interpret them)
  5. The validity concerns and threats (literally, answers to the question “how might you be wrong” where you work through inherent limitations and biases in your methods, sources, perspective, etc.)

The diagram below illustrates how the five components of design interact.


Illustration of how research questions should be iteratively defined and developed in relation to goals, conceptual framework, methods, and validity threats. From Maxwell 2014

The main point of the diagram is that your research questions should be iteratively revised and refined throughout the work, based on the four other components you are working on.

So… research questions aren’t something you state and then follow through on; they are best thought of as statements about your inquiry that are iteratively refined through the process of defining what you are working on.

Generally, the way that research questions are stated in quantitative research is bogus, or at least bogus in terms of the way that people who do more qualitative research think of research questions. That is, you do a lot of work and scholarship before you can ever formulate a hypothesis that you can test. In that case, you end up with a research question at the end of an exploration, not at the front of it.

Tools, Archives, & Research Questions are Inherently Theory Laden

Getting back to the issue of questions, tools, and sources: being good humanists, it is worth leaning back to grok that all method is theory laden. That is, every attempt to answer a question comes with inherent theoretical assumptions about the problem and limitations in what that method can provide in terms of answers. This is true of method broadly conceived: every method for collecting sources and evidence, the original intents by which records and sources are collected (which create silences), the identification of a problem, the interpretation of sources, and the composing and reporting of results all come with some inherent biases.

That is, all tools, all archives, and all research questions are in and of themselves instrumental. We use them in an attempt to understand the world; they all serve as lenses, reflecting and refracting information back to us. I’ve always liked the way that Umberto Eco explains this in Kant and the Platypus as a core concept in hermeneutics: we make interpretations, but the underlying reality of existence exerts the force to resist some of those interpretations, simply saying “No” by making it clear that an interpretation can be refuted. A hermeneutics of data emerges through the use of tools.

So where to start? Start wherever, as long as where you start is anchored in your goals. The hermeneutics of screwing around is itself invaluable. A technique of messing with the tools and datasets at hand may well surface interesting patterns that no one would have found if they were working at the sources in another fashion. Pick an archive and find the questions. Or just start with your questions and work it that way. Whatever you do, realize that it’s an exploratory process.

What matters most in where you start is your actual goals in doing the research. That is, why is it that you are actually doing your work? What is it that you hope your work will potentially do? Don’t confuse your goals with what you are interested in; realize and recognize that your goals are about the purpose of your work. If you want to do work that ultimately helps to understand and give voice to the voiceless, then you likely don’t want to start messing around with the text of inaugural presidential speeches. If you want to figure out new kinds of things that can be done with topic modeling, then you would presumably want to start with some sources that are in a form, or close to a form, that you can topic model.

Thanks to Thomas Padilla and Zach Coble who reviewed and provided input on a draft of this post.


Posted in Uncategorized | 3 Comments

Digital Archivists: Doing or Leading the Digital?

I’ve been enjoying Jackie Dooley’s recent series of posts looking at the skills and duties that are showing up in job postings for digital archivists. I’m excited to see archives listing these positions. Staffing up illustrates how electronic records have risen to a significant issue in the minds of the deciders.

Like many who share this particular job title, I have some complicated feelings about the idea of “The Digital Archivist.” While my official job title is Digital Archivist, I’ve generally added a caveat. When I encounter someone else with that title, I often go on to explain that I’m more of a meta-digital archivist. That is, most of what I do is about policy, strategy, and standards: establishing and documenting practices, and collaborating to codify emerging ones. However, I’m becoming increasingly convinced that most of what I do is actually largely what digital archivist jobs should be doing.

I think the confusion about what a digital archivist should do is mostly summed up as follows:

Digital archivists should not be the people who do the digital stuff. Everybody (including the digital archivists) needs to pick up the skills necessary to work with digital records. Instead, digital archivists should be the people who are hired to lead the digital stuff.

I will elaborate on what I mean by this a bit more. I think my main issue with the idea of the digital archivist role is that I want to answer yes to two questions that some folks might imagine to be directly opposed to each other.

Should all archivists be able to work with digital materials? Yes. In this sense, all archivists must become digital archivists; it’s just a part of ongoing professional development. Digital records are not a niche area of material. They are increasingly just a part of the materials archivists need to be able to process. I think some of Rebecca Goldman’s tweets on this subject illustrate the point. Other fields haven’t hired digital waitstaff, digital nurses, digital journalists, or digital lawyers to deal with the challenges of professional development around technology in their fields.

Screenshot of Rebecca Goldman’s tweets

Then, does it make sense to have digital archivists as digital specialists? Yes. While everybody needs to have a basic capability, it does make sense to cultivate leaders and specialists. In this sense, digital archivist jobs are best thought of as positions devoted to continually 1) figuring out and refining digital processes, workflows, and tools, and 2) teaching the rest of the staff the techniques and processes they are developing. This means digital archivists ideally straddle a leadership and practice role.

Ongoing Leadership in Digital Work: Ideally, we all become educators in this future, because the only thing likely to stay constant is change. We aren’t going to just establish the new “digital” practices and be done with it. The nature of digital technologies is continually shifting, often dramatically. The shift from storing information on devices to thin-client cloud setups is frankly as big as the shift from paper to hard drives. The first sixty years of digital technologies have illustrated that there is every reason to believe the technological mediums and the nature of records will continue to evolve frequently, and we are going to need responsive practices that continually evolve with them.

An example from a different field: I think we can look to the “School Based Technology Specialist” (SBTS) role as a way to think about this. Instead of hiring someone to be the “computer person” for each of the schools in the Fairfax County school district, the district created the SBTS role. The idea is that teachers across the schools need to be making better use of computing technology. So it’s not about hiring someone to be the computer person, but about hiring someone who is functionally an administrator to build capacity for teachers to incorporate digital technology into their practice.

In this vein, SBTSs are described as trainers, liaisons, managers, troubleshooters, consultants, and collaborators. I think the parallels to the digital archivist role are rather clear. Now, schools and archives are still rather different, so it doesn’t necessarily map over straight away. But still, I think the parallels are meaningful. The digital archivist role can be thought of as a leadership role for establishing practice. I think organizations would do best to think about how digital archivists can be empowered and given the authority to lead work on digital materials.

Curious for others’ thoughts on this.

Posted in Uncategorized | 3 Comments

Mecha-Archivists: Envisioning the Role of Software in the Future of Archives

The Cybermen exemplify our worst fears about the future of technology: people literally turned into machines, replaced and ruled by machines. I think this is the face of a fear of a technological future for archives.


I had the privilege of participating in The Radcliffe Workshop on Technology and Archival Processing a few weeks back. I was thrilled to be on a great panel with some early career historians and Maureen Callahan.

Maureen posted her talk The Value of Archival Description Considered online. I encourage you to read it. It’s super good. I was thrilled to find that, I think, we are on nearly the exact same wavelength about the future of the finding aid.

There was a nice write-up about the event in the Harvard Gazette. I won’t deny that I may be “a millennial who displayed affection for the word “awesome” during the panel.” However, there are some clarifications I should make. I did not talk about obeying “cyborg overlords” or a “mechanized shirt of armor.” In sharing some of the points of my talk, I thought it would be good to focus in particular on these clarifications. I think getting the language right about the future of our relationships with software is important, so here goes.

Maureen Welcomed the Robot Overlords, but with good reason!

Maureen had a few great lines in her talk (again, if you haven’t read it go do so now). One of those lines was her take on a Simpsons quote, “I for one welcome our robot overlords.” She went on to explain, in an even better line, “I don’t think that archivists are just secretaries for dead people, and I welcome as much automation as we can get for this kind of direct representation of what the records tell us about themselves.” I love this quote. When I was sitting there listening to her I was nodding so much. This is exactly the sentiment I wanted to get at.

The future of digital tools for archives is not about replacing the work. It is about automating the parts of the work that are not the intellectual labor. Along with that, the future of these tools is largely about taking advantage of the affordances in the nature, structure, and order of digital media, which give us considerable power to scale up our actions and interventions in the record.

I took the key theme from her pitch to be something like, let the algorithms and digital tools do the repetitive and less intellectual labor of the archivist, and get the archivist more involved in the intellectual labor of the archives. Specifically, in better contextualizing, explaining and describing the provenance of collections and making the decisions that require the kind of sophisticated judgment that people have and exercise. Without knowing where she was going, I touched on several similar themes in my talk. Ideas and visions of the labor relationship between the archivist of the future and the algorithms, scripts and tools that work for her and do her bidding.

The welcoming of Robot overlords

We get to wear the robots!

This lego mecha exo-suit is the vision I think we want for the future of digital tools in archives. Here, this mechanized power armor gives the Archivist super powers. Forget lifting a 30 lb box, in this suit you could move whole collections with ease. But that’s aside from the point. This kind of power tool lets you do a lot of the laborious parts of the work and get back more quickly to the intellectual labors.


So we don’t want the dark vision of the robot master. We certainly don’t want the machines to turn us into the Borg or the Cybermen, who lose their souls as they are taken over by emotionless machines.

My vision for the future of the archivist using digital tools is less Borg and more Exo-suit.

The idea of mecha or exo-suits illustrates a vision of technology that extends the capabilities of its user. That is, the kinds of tools I think we need going forward are exactly the sort of thing that Maureen was talking about: things that let us automate a range of processes and actions.

We need tools that let us quickly work across massive numbers of items and objects by extending and amplifying the seasoned judgment, ethics, wisdom, and expertise of the archivist-in-the-machine.

Fondz as a Tool Thought Experiment for Automation

I was recently working with some archivists on a project involving nearly 400 floppy disks containing drafts of letters, books, essays, etc. In short, digital copies of all the kinds of things you find in a collection of someone’s personal papers. I hope to write about that project in more detail in the future, but for now I just wanted to talk a little about a tool that got cooked up in the process. So, what can you do with some 19,000 documents like this? Now, you can learn a ton about a set of digital files by extracting and identifying information about them in automated processes: what kinds of files they are, their file names, sizes, etc. It’s really useful data! However, in most cases, this is not at all the data that a researcher or other user who might work with the collection would want. Inevitably, users want to know where information related to x, y, or z is in a collection. That is, users care about topics and subjects, and the kinds of tools most of us have at hand don’t really do much with that.

Here you can see some of the very basic kinds of information that are relatively easy to get at with existing tools: numbers of files, their sizes, and their formats. This image shows the files processed and presented by Fondz for a particular test set that comes from 379 bags (in this case each bag contains a logical disk image). Collectively this includes 18,414 files in 49 formats.

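That kind of summary is easy to approximate with nothing but standard library tools. The sketch below tallies counts, bytes, and formats, using file extensions as a crude stand-in for real format identification (a real workflow would lean on a format identification tool like DROID or FITS instead), and demonstrates it against a throwaway directory:

```python
import os
import pathlib
import tempfile
from collections import Counter

def summarize(root):
    """Tally file count, total bytes, and formats (crudely, by extension) under root."""
    formats, total_bytes, n_files = Counter(), 0, 0
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            n_files += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
            formats[os.path.splitext(name)[1].lower() or "(none)"] += 1
    return n_files, total_bytes, formats

# Tiny demonstration with made-up files standing in for disk image contents:
with tempfile.TemporaryDirectory() as tmp:
    for fname, text in [("draft1.wpd", "a" * 10),
                        ("draft2.wpd", "b" * 20),
                        ("notes.txt", "c" * 5)]:
        pathlib.Path(tmp, fname).write_text(text)
    n, size, fmts = summarize(tmp)

print(n, size, dict(fmts))
```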

To this end, I asked my colleague Ed Summers a while back if it would be possible to  strip out all the text from these documents, topic model it, and then use the topic models as an interface to the documents. In response, he cooked up a tool called Fondz.

For those unfamiliar, the MAchine Learning for LanguagE Toolkit (MALLET) describes topic modeling as follows: “Topic models provide a simple way to analyze large volumes of unlabeled text. A “topic” consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings.” In this case a tool like MALLET can quickly look across a large collection of texts and identify topical clusters of terms that appear near each other.
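MALLET itself is a Java toolkit, but the core intuition (words that keep turning up in the same documents cluster together) can be illustrated in a few lines of Python. This toy just counts document co-occurrence of word pairs on a made-up corpus; it is a stand-in for illustration, not MALLET’s actual LDA algorithm:

```python
from collections import Counter
from itertools import combinations

# A made-up four-document corpus with two obvious themes.
docs = [
    "clara barton red cross relief",
    "red cross relief fund appeal",
    "novel draft chapter revision",
    "chapter revision novel typescript",
]

# Count how often each pair of words shares a document.
pairs = Counter()
for doc in docs:
    words = sorted(set(doc.split()))
    pairs.update(combinations(words, 2))

# Pairs seen in more than one document behave like the seeds of a "topic".
clusters = [pair for pair, count in pairs.items() if count > 1]
print(sorted(clusters))
```

Running this surfaces two word clusters, one around red/cross/relief and one around chapter/novel/revision, which is the same kind of signal MALLET extracts at scale with a far more sophisticated statistical model.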

How Edsu describes Fondz on github.


I really like how Ed describes Fondz, so I’ll share it here.

fondz is a command line tool for auto-generating an “archival description” for a set of born digital content found in a bag or series of bags. The name fondz was borrowed from a humorous take on the archival principle of provenance or respect des fonds. fondz works best if you point it at a collection of content that has some thematic unity, such as a collection associated with an individual, family or organization.

Example of the Fondz topic driven interface to documents in an archival collection


Above, you can see an example of Fondz in use. This is a list of the topics that MALLET identified. In each case you see the number of documents associated with the topic on the left, and in the blue box you see the terms MALLET has identified as being associated with that topic. The first one, with 776 documents, ends up being a cluster of file versions of biographical notes and CVs; the third one, with 309 documents, is materials related to a novel and a film adaptation of that novel. MALLET doesn’t know what those topics are. It just sees clusters of terms. Based on my knowledge of the collection, I’m able to identify and name those clusters.

The result of all this is a topical point of entry for exploring 19,000 digital files from hundreds of floppies. It would work just as well for OCR’ed text from recent typed and printed material. I can’t show it to you in action because I don’t have a test collection that I can broadly share. (Note: anyone who has a similar collection they can broadly share, contact me about it.) But take my word for it. You click on one of those topics and you see a list of all the files that are associated with it, and if you click on the name of one of those files you end up seeing an HTML representation of all the text inside that file.

Alongside this, a future idea would be to integrate tools that do things like Named Entity Recognition (NER) to identify strings of text that look like names of people and places. Indeed, there are already attempts to use NER for disambiguation in cultural heritage collections. What is particularly important here is not that we build tools that do this “right” but that we find and use tools that make things that are “good enough,” in that they are useful in helping people explore and find things in collections. This isn’t about robots doing all the work. It’s about extending and amplifying our ability to make materials available to users in ways that help them “get to the stuff.” Aside from that, there is a need to provide users with information on what actions were performed on the collection to make it available. To that end, it’s exciting to realize that we can simply document what tools were used so that anyone can explore the potential biases of those tools in how they create interfaces to collection data.
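Real NER tools rely on trained statistical models, but the shape of the output is easy to sketch. The pattern below is a deliberately naive stand-in that just grabs runs of capitalized words; it is nowhere near production quality, but it shows the kind of candidate-entity index such a pass could hand an archivist to review:

```python
import re
from collections import Counter

# Naive stand-in for NER: runs of two or more capitalized words.
CAPITALIZED_RUN = re.compile(r"\b(?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)+\b")

def naive_entities(text):
    """Return candidate entity strings found in the text."""
    return CAPITALIZED_RUN.findall(text)

text = ("Clara Barton founded the American Red Cross. "
        "Papers about Clara Barton are held in Washington.")
counts = Counter(naive_entities(text))
print(counts.most_common())
```

Even this crude pass produces a ranked list of candidate names an archivist could confirm, reject, or link to authority records, which is exactly the accept/reject workflow discussed below for topics.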

So what does this all have to do with cyborgs and mecha? What is in some ways most interesting to me about topic modeling is that the topics themselves are actually somewhat arbitrary and meaningless. A topic in MALLET isn’t so much a topic in regular parlance as it is just a cluster of words that tend to appear together. It takes someone who knows the texts to make sense of those topics, to fiddle with the dials until they get topics that seem to hang together right (in MALLET you pick how many topics you want it to look for). So Fondz will be far more useful when it integrates processes for archivists to exercise their expertise and judgment and intervene: when they can name the topics and describe them, when they can accept or reject some of the topics, and when they can rerun them.

Since the goal here is to make useful descriptions, there is potential for topic modeling to be used instrumentally: to surface connections that an archivist can judge useful or not, saving and describing the useful ones. Given that good processing is done with a shovel, not with tweezers, it is exciting to think about how tools like Fondz could integrate a range of techniques for computational analysis of the content of files to act as steam shovels: instruments that put the archivist in the driver’s seat to explore and work through relationships in collection materials and expose those to users.

There are a bunch of other clever things that Ed is doing with Fondz that warrant further discussion, but for the purpose of this post that does it. As far as take-away messages go, I’d suggest the following. The future of digital tools for digital archives is not about tools that “just work.” It’s not about replacing the work of archivists with automated processes; it’s about amplifying and extending the capabilities of an archivist to do clever things with somewhat blunt instruments (like topic modeling, NER, etc.) that make it easier for us to make materials accessible. Given that the nature of digital objects is a multiplicity of orders and arrangements, if we can generate a range of relatively quick and dirty points of entry to materials, we can invest more time and energy in making sure that when someone gets down to the item they have breadcrumbs and information that situates and contextualizes the item in its collection and its custodial history. We need archival mecha: tools that give archivists superpowers by amplifying their judgment, wisdom, knowledge, ethics and expertise in working with digital materials. We need to make sure we are getting computers to do what computers do best in supporting the praxis of archival practice.



Einstein as Science Santa: Monumental Meanings & Wil Wheaton

Recently, Wil Wheaton posted a picture and quote on Twitter and his blog (That time I met Albert Einstein) making use of the Albert Einstein Memorial at the National Academy of Sciences. It’s great: he is sitting on Einstein’s lap, making requests to Einstein as a kind of physics Santa. I really love how the post, and all the likes and favorites it has gotten, reinforces a set of points I made about the memorial in my essay Tripadvisor rates Einstein: using the social web to unpack the public meanings of a cultural heritage site.

I love the way this photo fits with the informal and playful ways that other photos of the monument work. Here is a bit of what I wrote in the piece. The images are from Flickr, and the quotes are from Yelp reviews of the memorial.

Most monuments in the area establish a kind of formality between visitors and the monument. Many are constructed to physically remove the subject from the reach of visitors. Others, like the nearby Lincoln Memorial, establish this formality through written rules about respectful behaviour and a request for hushed voices. Nearly all of the reviews (17 of 21) focus on elements of the informality of the monument as a key component of what makes it enjoyable. The reviewers tell us to “climb all over ‘Al’” or, as another suggests, “sit on his lap, or kiss his cheek”. On Flickr, photographers have captured this in images of visitors picking and rubbing his nose, kissing him, or in a few cases arguing with him. While there are no posted notices suggesting that it is ok to climb him, if you stop by the monument on any summer day you will witness a queue of visitors waiting to climb up on him and have their picture taken.

An example of how groups of tourists use the memorial to stage group photos

The pictures are themselves an important element in this experience. The image above provides an example of one of the most popular kinds of images of the memorial posted on Flickr. As one reviewer notes, “everyone needs at least one picture of themselves sitting on “Al’s” lap”. As you can see from the photograph, the scale and size of the monument makes it work as a space for staging photos. The monument is so photogenic that one reviewer suggests that it “just begs you to go sit on Uncle Al’s lap and get our picture taken”. For these reviewers a central part of the experience is the informality that the monument provides. It invites them to climb him, and leave with photographic evidence of them sitting on the world’s most instantly recognisable scientist. While everyone has photos of themselves standing in front of the Lincoln Memorial these reviewers believe “Your tour of the Mall is not complete” without having your picture taken on Einstein’s lap.

Photo: Schmidt, C., 2008. Arguing with Einstein, Available at: http://www.flickr.com/photos/chrisbrenschmidt/2190660089/

It is worth taking a moment to reflect on how some of the previous quotes refer to Einstein. The informality of these experiences is further communicated through the persistent use of his first name or, in some cases, the diminutive form of his name, Al. This is itself a frequent component of these reviews. In using his first name, or calling him ‘Al’, the reviewers are communicating and playing with the informality of the memorial. The pervasiveness of this informality may be best evidenced in the recollections of a college student from a nearby university who ‘spent a lot of time just hanging out with ‘Al’’. The informality of the space and the fact that it is climbable leads many reviewers to discuss how it is a perfect place to bring kids. Many of the photos of the monument on Flickr show young children climbing all over him.

This level of informality is not something that all the reviewers think is necessarily a good thing. One reviewer suggests “most of the neat stuff was totally ignored by all the kids using the statue as a playground”. This reviewer goes on to suggest that the other elements in the composition of the statue, the quotations, and the map of the stars at his feet go unnoticed. From his perspective, visitors were “just jumping around”. He felt that “no one learned or read about the man memorialised”. This reviewer further suggests that it is ‘disrespectful’ to climb all over the monument, particularly when there is no clear indication that touching or climbing the memorial is officially sanctioned by the sculptor or the National Academy of Sciences. There is definitely credence to the questions the reviewer raises. To what extent are these visitors leaving with an understanding of the intentions behind the memorial? Certainly some visitors’ suggestions that “You can climb on the damn thing and stick pennies up his nose” take on a disrespectful tone. However, that is itself an interesting point of tension in the idea of Einstein. The more recently constructed Franklin Delano Roosevelt Memorial, which is built on a scale that would allow one to climb on him, does not invite the same kind of interaction. Popular notions of Einstein as an informal figure have translated into how people interact with the memorial. The relaxed experience Berks found in sculpting the memorial from life is very directly translated into visitors’ comments about the informality and relaxing nature of the experience of the monument.

This is just a way of saying that those of us interested in public memory and the role of memorials really need to be watching the ways that people make use and sense of them on social media. At this point, our experiences of these spaces are increasingly going to be seen through the lens of the tweets, reviews, and photos that others have shared and commented on.


Digital Preservation’s Place in the Future of the Digital Humanities

The following are the rough notes for a talk I gave at the University of Pittsburgh’s iSchool. I’ll likely come back later to iron out any kinks, but figured I would get them up sooner rather than later, so here they are. Thanks to Alison Langmead for the invitation. You can review all the slides here.

Ensuring long-term access to digital information sounds like a technical problem, like something for computer scientists to solve: if we could only set up the right system, we could “just solve it”. Far from it.

Digital Preservation is not primarily a technical problem

I’ve become increasingly convinced that digital preservation is in fact a core issue at the heart of the future of the digital humanities.

In this talk, I will suggest how some issues and themes from the history of technology, new media studies, and archival theory gesture toward the critical role that humanities scholars and practitioners should play in framing and shaping the collection, organization, description, and modes of access to the historically contingent digital material records of contemporary society. That’s a mouthful. In short, I think there is a critical need for dialog between work in the digital humanities and the work of building the collections of sources it is going to draw from.

This is a broad topic, and I am trying to pull a lot of different strands from different fields together here. So this is going to be less a comprehensive argument and more of a survey, glancing off a range of projects and ideas that point toward the important interconnections that already exist between the digital humanities and digital preservation.

What is a Digital Historian Doing with Digital Preservation?

When I tell people I am a historian and I work on digital preservation I get a lot of confused looks. What on earth is a digital historian and what does it have to do with digital preservation? I’m not entirely sure what being a digital historian entails, but as far as Google image search is concerned, I’m part of the definition. (It’s my picture there in the green.)


What google image search thinks digital historian looks like. I’m on the grass.

But back to the point: when I mention that I do digital history and I work on digital preservation I’m often asked questions like “Isn’t that IT? Isn’t that technical? Is that like computer science? Or library science or something?” Initially I was a bit timid in responding to these queries. I was still finding my way through a highly technical field myself. I’d assert that understanding the born-digital records of our society is in fact very important to historians. But I’ve been becoming bolder in this regard.

Trying not to Define the Digital Humanities

Yes, digital preservation is a technical field, one that requires technical skills. But then, it also requires extensive technical skill in, say, German to be a good art historian studying modern German art. An understanding of digital artifacts should be a central part of the emergent digital humanities.


What Google Image Search’s Hive Mind thinks the Digital Humanities is/are.

This brings us to the second part of the title: what does digital preservation have to do with the emergent field of the digital humanities? The digital humanities are different things to different people and I don’t want to spend too much time trying to define it/them. Again, in Google image search’s hive mind the digital humanities have something to do with word clouds, projects, debates and logos.

Working Definitions of the Digital Humanities

In any event, I see three primary areas of activity in DH.

  1. Computational Analytic Methods: Here I’m thinking about computational approaches to studying primary sources (think here of Google’s n-gram viewer, of corpus analysis, of various and sundry ways of using computers to count things and conduct distant reading),
  2. Experimentations in the Format of Scholarship: Here I’m thinking about work on the future of digital scholarly communication and publication (new kinds of journals, digital scholarship, projects like Ed Ayers’ Valley of the Shadow, and various kinds of online exhibitions and presentations of primary sources using platforms like Omeka),
  3. Interpreting the digital record: interpreting born digital primary sources. This last area is essential to the future of the first two.

If the digital humanities are ever to study the 21st century, that study is going to be based on born-digital primary sources. We need forms of digital hermeneutics, the reflexive process of interpretation at the heart of humanities scholarship, that fit with digital texts and artifacts.

Selection and Definition: Points of Contact Between Humanists and Preservers

Importantly, there are two primary issues where humanists have a lot to offer in shaping the digital historical record: selection and definition.

  1. Selection: What is collected and preserved
  2. Definition: What features of digital objects are significant to preserve


We can’t count on benign neglect as a process of waiting to figure out what might matter in the future. The lifespan of most consumer-grade digital media is much, much shorter than that of analog media. Further, when digital media fail, the failure is often complete, as opposed to partially recoverable. To that end, there is a need for many to follow in the footsteps of projects like the Center for History and New Media’s September 11 Digital Archive, where a group of historians intervened and launched a site to crowdsource the collection of everything from text messages to emails and other digital traces of the attacks for future historians to make sense of. Learning lessons from areas like oral history collection, it is essential for historians to wade in and actively work to ensure that the digital ephemera of society will be available to historians of the future.

The point about selection is important, but it’s largely contiguous with current practices. Decisions about selection for collections are always fraught and contingent on the values and perspective of the collecting institution. Far more problematic is the fact that the very essence of what a digital object is remains contentious and dependent on the kinds of questions one is interested in.


What is Pitfall? It depends on what your research questions are.

For instance, what is Pitfall!? Is it the binary source code? Is it the assembly code written on the wafer inside the cartridge? Is it the cartridge and the packaging? Is it what the game looks like on the screen? Any screen? Or is it what the game looked like on a cathode ray tube screen? What about an arcade cabinet that plays the game? The answer is that these are all Pitfall!. However, for different people (individual scholars, patrons, users, etc.) what Pitfall! is differs. If humanists want to have the right kind of thing around to work from, they need to be involved in pinning down what features of different types of objects matter for what circumstances.

This point is expansive, so I’ll briefly gloss it before going into depth on each of these topics. In keeping with much of the discourse of computing in contemporary society, there is a push toward technological solutionism that seeks to “solve” a problem like digital preservation. I suggest that there isn’t one problem so much as myriad local problems contingent on what different communities value. With that said, this is not a situation of “anything goes.” Digital media are material and based on inscription, a set of insights from new media studies that offers a new basis for developing an approach to source analysis and criticism with a long-standing history in fields like textual scholarship.


One of the biggest problems in digital preservation is the persistent belief that the problem at hand is technical, or that digital preservation is a problem that can be solved. I’m borrowing this term from Evgeny Morozov, who himself borrowed the term solutionism from architecture. Design theorist Michael Dobbins explains, “Solutionism presumes rather than investigates the problem it is trying to solve, reaching for the answer before the questions have been fully asked.” Stated otherwise, digital preservation, ensuring long-term access to digital information, is not so much a straightforward problem of keeping digital stuff around as a complex and multifaceted problem about what matters about all this digital stuff in different current and future contexts.

The technological solutionism of computing in contemporary society can easily seduce and delude us into thinking that there could be some kind of “preserve button,” or that we could right-click on the folder of American Culture on the metaphorical desktop of the world and click “Preserve as…” In fact, as noted in the case of Pitfall!, defining what it is that one wants to keep around is itself a vexing issue. In digital preservation this problem is often smuggled into the notion of “significant properties.”


Chimerical Significance

The problem that is all too often swept away in technical discussions of preservation is what is to be preserved. In established practices for digital preservation, like web archiving, attempting to preserve rendered content is the assumed solution: just grab the HTML and files displayed when an HTTP request is made and then play them back in a tool like the Wayback Machine. With that noted, it’s critical to realize that making sense of and interpreting, performing if you will, that content is itself a complex dance involving differing ideas about authenticity.
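That grab-and-play-back assumption can be sketched in a few lines. This is a toy, not how real crawlers work; actual web archives record full HTTP headers and store captures in WARC files. The data: URL below stands in for a live page so the sketch is self-contained:

```python
import datetime
import urllib.request

def capture(url):
    """Minimal sketch of a web-archive capture: fetch a URL and
    record the payload alongside the moment of capture."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
    timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    return {"url": url, "captured_at": timestamp, "body": body}

# A data: URL stands in for a live page here.
snapshot = capture("data:text/html,<h1>hello</h1>")
print(snapshot["captured_at"], len(snapshot["body"]))
```

Note everything this throws away: the browser that rendered the page, the fonts, the timing, the plugins. The capture records one reading of the page, and which reading matters is exactly the question at hand.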

In the case of a web page, is it its source code, or what it looks like rendered? Is it what it looks like rendered on the particular version of the particular browser it was composed to be viewed on? Is it what it looks like when it runs on a computer with a particular vintage of internal memory clock that produces part of how visual elements flicker? If you are only interested in the textual record of the site, then the text is all you need. But if you are a conservator of net art and this happens to be an important work, you may need to spend considerable time doing ticky-tacky work to ensure that the work retains its fidelity to its creator’s intent.

To make this a bit more concrete, we can turn to a small corner of a now-extinct neighborhood in Geocities. For those unfamiliar, Geocities was an early online community which Yahoo! turned off in 2009. Due largely to the work of ArchiveTeam, a self-described group of rogue archivists, much of Geocities was collected and distributed. Looking at a small sliver of that archive can underscore some of the issues at the heart of the problem of preserving and accessing this kind of material.


Geocities page viewed through the Internet Archive’s Wayback Machine


Same Geocities site as presented in One Terabyte of the Kilobyte Age.

Here are two images of archived copies of a spot in the Capitol Hill neighborhood of Geocities. The first is what it looks like rendered in my browser at work. The second is what it looks like as presented in One Terabyte of Kilobyte Age, created by Olia Lialina & Dragan Espenschied. One Terabyte of Kilobyte Age is in effect a designed reenactment of Geocities, grounded in an articulated approach to accessibility and authenticity, which plays out in an ongoing stream of posts to a Tumblr account. Back to the two images: note that the header image is missing in the first one, as displayed in my modern browser. The image is still there, but my browser isn’t doing a good job of creating a high-fidelity presentation of what the site should look like.

The point is that you can’t just “preserve it,” because the essence of what matters about “it” is contextually dependent on the way of being and seeing in the world that you have decided to privilege. In the case of something like Geocities, it turns out that there are a bunch of different decisions one can make about fidelity and authenticity, and different collections are taking different approaches.


Dragan’s take on the trade offs inherent in different approaches to authenticity and accessibility for preserving webpages.

Dragan’s vision for the presentation is anchored in this continuum of authenticity and accessibility across the entire stack of technologies at play in the presentation of a web page. That is, One Terabyte of Kilobyte Age is a kind of critical edition (a mainstay as a scholarly product) of Geocities. Unlike many other web archiving projects, Dragan is very upfront about what he has decided to privilege and focus on in this special collection or critical edition of Geocities. The resource he has created is both an interpretation and a point of access into some of the most significant properties of Geocities that might otherwise be lost.

In short, deciding what it is that one wants to keep is vexing and problematic. With that said, it is critical to note that we do actually have something to hang on to here. There is in fact a there there when it comes to digital objects. Further, the work of humanities scholars to understand the fundamental forensic and textual traces of digital objects points the way toward a hermeneutics, an interpretive approach to understanding and studying digital primary sources. The most essential work in this area is Matthew Kirschenbaum’s Mechanisms: New Media and the Forensic Imagination.

Materiality & Inscription

We all know that digital media are binary, that somewhere there are screens of ones and zeros doing something like in The Matrix.


The binary essence of digital media, the ones and the zeros of it all, is in fact textual. Inscribed at the limits of augmented human perception, the sequences of bits on a hard drive are still very much material. Inscribed in the sectors of a disk are files in formats intended to be read and interpreted by different pieces of software, software which is itself inscribed on different pieces of storage media. The point here is that the longstanding traditions of studying texts, of interpreting them, have a home at the basic root level of digital objects, which are both sequences of textual information and material culture visible in magnetic flux transitions on disk or the pits on optical media.


The structures of this media share an affinity with a strand of archival theory too.

Media and Data Structures as Fonds

Whatever your feelings about the archivist’s imperative of respect des fonds (the injunction to maintain original order and to pay attention to the provenance of materials), it remains a cornerstone of the identity and professional practice of archives. Attempting to maintain the original order in which materials were managed before being accessioned, and making decisions when processing an archive with respect to the whole, both suggest a kind of archeological or paleontological understanding of documents, records and objects. An object’s meaning is always to be understood in the context of the objects near it and the structure it is organized in.

In the analog world, it’s often difficult to infer what that order is. For instance, the Herbert A. Philbrick papers came to the Library of Congress in a mixture of boxes and trash cans.


Contrast that with the order of a floppy disk from playwright Jonathan Larson’s papers. Regardless of his own strategies for organizing his data, and his .trashes, the computer saves and stores information like the time he last opened the files. (For more on this example, see the work of Doug Reside, Digital Curator for the Performing Arts at the New York Public Library.)


The logic of digital media, of data structures, is one of order. Even if a user tries to eschew that order, the machine insists on creating, storing and retaining all manner of technical metadata and time stamps.
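A quick illustration of the kind of technical metadata the machine insists on keeping, whether or not the author cares: ask the file system about any file and it hands back timestamps. (A throwaway temp file stands in here for one of those floppy-disk documents.)

```python
import datetime
import os
import tempfile

# Create a throwaway file; the file system records timestamps for it
# whether or not the author ever thinks about them.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"draft of act two")
    path = f.name

stat = os.stat(path)
for label, ts in [("last modified", stat.st_mtime),
                  ("last accessed", stat.st_atime),
                  ("metadata changed", stat.st_ctime)]:
    print(label, datetime.datetime.fromtimestamp(ts).isoformat())

os.unlink(path)
```

None of this was deliberately recorded by the writer; it accrues according to the logic of the medium, which is exactly what makes it fonds-like evidence.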

The order of bits on a disk, the structure of files in a file system, and the organization and structure of data available from an API are each fonds-like. Data and records accrue according to the process and logic of digital media. Just as the structure and organization of records and knowledge in the analog world say as much about the materials as what is inside them, so is the same true in the digital. The layers of sediment in which something is found enable you to understand its relationship to other things. Context is itself a text to be read.

With this noted, other humanities scholars have clarified that all too often we privilege one mode of reading that underlying data structure. Our knee-jerk reaction is that what is significant about a digital object is what it looks like or does on the screen.

Screen Essentialism

Digital objects are encoded information. They are bits encoded on some sort of medium. We use various kinds of software to interact with and understand those bits. In the simplest terms software reads those bits and renders them. However, the default application for opening a file isn’t the only way to go about it. You can get a sense of how different software reads different objects by changing their file extensions and opening them with the wrong application.

For example, if you just change the file extension of an .mp3 to .txt and then open the file up in your text editor of choice, you can see what happens when your computer attempts to read an audio file as text.

While this is a big mess, notice that you can read some text in there. Notice where it says “ID3” at the top, and where you can see some text about the object and information about the collection. What you are reading is embedded metadata, a bit of text that is written into the file. The text editor can make sense of those particular arrangements of information as text.
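The reason “ID3” shows up at the top is that MP3 files carrying ID3v2 tags begin with a ten-byte header: the letters “ID3,” two version bytes, a flags byte, and a four-byte “syncsafe” tag size (seven bits per byte). Here is a sketch of reading that header in Python, using synthetic bytes since I can’t embed the actual file:

```python
def parse_id3v2_header(data):
    """Parse the 10-byte header that begins an ID3v2-tagged MP3.
    The tag size is stored in four 'syncsafe' bytes (7 bits each)."""
    if data[:3] != b"ID3":
        return None
    major, revision = data[3], data[4]
    size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
    return {"version": f"2.{major}.{revision}", "tag_size": size}

# A synthetic header, since the actual file isn't included here:
# "ID3", version 2.3.0, no flags, syncsafe size bytes 0x00 0x00 0x02 0x01 = 257.
fake = b"ID3" + bytes([3, 0, 0, 0x00, 0x00, 0x02, 0x01])
print(parse_id3v2_header(fake))  # → {'version': '2.3.0', 'tag_size': 257}
```

So even the text editor’s “wrong” reading is structured: the legible fragments sit exactly where the format specification says they should.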


Here is an .mp3 and a .wav file of the same original recording, each changed to a .raw file and opened in Photoshop. Look at the difference between the .mp3 on the left and the .wav on the right. What I like about this comparison is that you can see the massive difference in the size of the files visualized in how they are read as images. Notice how much smaller the black and white squares are. It’s also neat to see a visual representation of the different structures of these two kinds of files. You get a feel for the patterns in their data.
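You don’t need Photoshop to perform this kind of reading. A few lines of Python can render any run of bytes as a grayscale image, one byte per pixel, written out as a PGM file that most image viewers will open. (The ramp pattern below stands in for real audio data; in practice you would read the bytes of an actual file.)

```python
import math

def bytes_to_pgm(data, width=64):
    """Render raw bytes as a grayscale image, one pixel per byte,
    roughly what a raw-image import does. Output is binary PGM (P5)."""
    height = math.ceil(len(data) / width)
    padded = data + bytes(width * height - len(data))  # pad last row
    header = f"P5 {width} {height} 255\n".encode()
    return header + padded

# Any run of bytes works; a ramp pattern stands in for audio data here.
fake_audio = bytes(range(256)) * 16
with open("bytes.pgm", "wb") as f:
    f.write(bytes_to_pgm(fake_audio))
```

Feeding it an .mp3 and a .wav of the same recording would show the same contrast as the Photoshop experiment: the compressed file reads as dense noise, the uncompressed one as smoother waveform-like texture.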

These different readings, or performances, of a file aren’t particularly revelatory, except to underscore that the very act of opening a file, of seeing its contents, is a process of interpreting a text. The sequence of 1s and 0s is enacted in front of us by software. Formats and software are themselves essential actants in this performance, which other humanities scholars have done great work to help us understand.

Format and Medium in Platform Study

In a detailed study of the Atari 2600, Nick Montfort and Ian Bogost suggest that the study of software inevitably involves the study of layers of software on top of software, intertwined with particular pieces of hardware. For example, the tiny amount of RAM in the 2600 made displaying graphics a complicated problem for programmers. They extensively discuss the game Pitfall!, so we can return again to that example.


Illustration from Montfort and Bogost’s Racing the Beam

This illustration shows what the game screen looks like from inside the system. Note that what we see on the screen, the area with the fellow swinging there, is really just a small portion of how the game thinks of its screen. The three large areas (vertical blank, horizontal blank, and overscan) are actually where the computations necessary for keeping score and working through the game are done. In this case, being able to understand how a game like Pitfall! was innovative is intimately connected to understanding the relationship between the game’s functionality and the underlying constraints of the Atari platform. For those interested in preservation, it further complicates the idea of collecting and preserving such an artifact, as a more nuanced understanding of the platform continues to reveal important, seemingly hidden, characteristics of its nature.

Going forward, Bogost and Montfort’s notion of “platform studies” should become increasingly important to those working to preserve digital artifacts.

From their perspective, the layers in these platforms provide particular affordances and constraints but are generally taken for granted by users as a part of the platform. In this case, a platform could be anything from a piece of hardware, like the 2600; a programming language, like C++, Java, or Python; a format, like MP3 or GIF; a set of protocols, like HTTP and DNS; or something like Adobe Flash that provides a language and runtime environment for works.

I’ll quote Montfort and Bogost’s explanation of platforms here at length as it is particularly pertinent.

By choosing a platform, new media creators simplify development and delivery in many ways. Their work is supported and constrained by what this platform can do. Sometimes the influence is obvious: A monochrome platform can’t display color, a video game console without a keyboard can’t accept typed input. But there are more subtle ways that platforms interact with creative production, due to the idioms of programming that a language supports or due to transistor-level decisions made in video and audio hardware. In addition to allowing certain developments and precluding others, platforms also encourage and discourage different sorts of expressive new media work. In drawing raster graphics, the difference between setting up one scan line at a time, having video RAM with support for tiles and sprites, or having a native 3D model can end up being much more important than resolution or color depth.

The point is as follows: the nested nature of platforms, their ties in and out of software and hardware and culture, are the essential problem of digital preservation and a key question for anyone interested in long-term access to our digital records to grapple with. Our world increasingly runs on software and hardware platforms. From operating streetlights and financial markets, to producing music and film, to conducting research and scholarship in the sciences and the humanities, software platforms shape and structure our lives. A software platform is simultaneously a baseline infrastructure and a mode of creative expression. It is both the key to accessing and making sense of digital objects and an increasingly important historical artifact in its own right. When historians write the social, political, economic and cultural history of the 21st century they will need to consult the platforms of our times. As underscored already, even defining the boundaries of such works is itself a fraught and interpretive project. For this reason alone I firmly believe that digital preservation is a primary challenge which should pique the interest of digital humanists.

To recap, in work on the materiality of digital objects, in conceptions like screen essentialism, humanists are already providing critical information for those interested in collecting and preserving the digital record.

Examples like Dragan’s work with Geocities illustrate the considerable value in closer collaboration here, where scholars actually dig in and create special collections or critical editions of digital records that clarify the perspective taken in their collection.

Aside from this, I think there is one other key reason that digital primary sources should cry out for the attention of digital humanists.

The Born-Digital Record Is Already Computable

When I opened my talk, I noted that to many, the digital humanities is synonymous with computational approaches to studying texts. Importantly, coming at this from the other side, through the consideration of digital primary sources for digital preservation, we end up with far more computable data than the digitized corpora of historical texts that many of those interested in doing computational research in the humanities are working from.

Where historical works must be digitized, born-digital media are by definition already computable. That is, when we gather together aggregations of data, be they web archives, aggregates of selfies from Instagram, or corpora of files from software packages, they are already computable.

In a talk about working with web archives, historian Ian Milligan stated the problem concisely:

If history is to continue as the leading discipline in understanding the social and cultural past, decisive movement towards the digital is necessary. Every day most people generate born-digital information that if held in a traditional archive would form a sea of boxes, folders, and unstructured data. We need to be ready.

In short, the future of the computational humanities will itself turn to increasingly heterogeneous digital fonds, data sets, data dumps, and corpora of software, images, and logs of transactional data.


The Praxis of Digital Preservation

Dialog with these areas of work in the humanities is essential to the future of digital preservation.

What we need is a generation of conservators, archivists, and historians with extensive technical chops who realize just how contingent and complex deciding what bits to keep and how to go about keeping them is.

Digital objects, artifacts, texts, and data are something more than “content”: they are the material anchors, the primary sources, through which we can interpret, critique, and understand our society.

I firmly believe that ours should be a golden age for born-digital special collections, archives, troves, and critical editions. The future of digital preservation is less about defining a hegemonic set of best practices than about scholars, curators, conservators, and archivists working together to define what it is that they value about some kind of digital content and then going out to collect it and make it available for use by their constituencies. It is about setting definitions that are often at odds with each other but that are coherent toward their own ends.



A Draft Style Guide for Digital Collection Hypertexts


The cover of A Signal from Mars: March and Two Step shows the rather civilized Martians relaying a piece of music to earthlings with the use of a spotlight. As featured in Messages to and From Outerspace. A Signal from Mars, 1901. Music Division, The Library of Congress.

I spent about 60% of my work hours last year selecting a thematic collection of 330 cultural heritage objects and interpreting and explicating facets of those objects in a set of 18 linked essays. I had a style guide for questions of grammar, and the HTML structure of the layouts was rather straightforward. However, I realized rather quickly that if I was going to do this consistently I should put together my own set of guidelines for the actual structure, function, and style I would use in approaching this writing project. Nothing about this is formal or official or anything like that. These are just my own personal notes, thoughts, and reflections that informed how I approached framing the work.

What follows is the short list of guidelines/rules for composing online exhibition-ish narrative pages for the web which I developed for my own use. Given some recent great discussion of what the ideal for history on the web should be, I figured I would share the rules I set for myself, as they might be of use to others working in this form. Ultimately, in the collection objectives section I decided to call this kind of work a “hypertext,” for reasons the next section explains.

The Chimera of the Digital Collection Hypertext

An online-only interpretive presentation of representations of cultural heritage objects is something of a chimeric creature. It’s the sort of online collection/interpretive material that all kinds of folks develop when they use platforms like Omeka, tacking interpretive, analytical writing and explication alongside a massive pile of related historical primary sources for users to go out and explore on their own.

  • Part Exhibition: Its purpose is similar to that of a physical museum exhibit, except that the constraints and benefits of physical space are absent. For example, an online exhibition can sprawl out forever, but you lose out on the quality of “being there” in the presence of the artifacts.
  • Part Illustrated Publication: As text and images on a web page, they are also like those “illustrated history” books, where one works through a linear narrative but can stop off to read detailed information about an image. In this case, the similarity falls off in that hypertext provides a much more networked and connective potential structure for an online text. Furthermore, while people do skim books, web reading is fundamentally different.
  • Part Expansive Collection of Sources: Where a book only has space to show an image on part of a page, and the physical space of an exhibit limits what you can display, on the web you can provide links out to every page of a draft or to a whole audio recording.
  • All Hypertext: Ultimately, I think the most precise term for what these things are is hypertext. It’s a term that sadly fell out of vogue with “cyberspace” a while back, but one worth going back to, as hypertext is itself the defining logic and form of the web; it is the “HT” in HTTP.

A Ready-to-hand Draft Style Guide

I had some web writing guidance to draw on, but I ended up developing my own style-guide-ish set of rules for putting these pieces together. What follows is my rundown of those rules (most of which I didn’t break much). The intention of this set of guidelines was to take the ideas of exhibitions and of print publications that make extensive use of deep captions and figure out how they fit with the way web writing works and how people engage with the web. I feel like these served me well, and I figured others might be interested in them. I’d similarly be interested in comments on and discussion of them.

  1. Every narrative page stands on its own: The web is not a physical space, and you have no control over which page someone will see first. The result of this fact is that a well-conceived online exhibition narrative page needs to stand on its own. That means it needs to have a compelling title that includes the page’s key terms, and the text of a page cannot assume that a reader has read any other text in the exhibition. Every page is effectively the first page/front door for some set of potential users. It’s critical that the page stand on its own and invite users to further exploration at every turn.
  2. Every caption should explicate/interpret the image/object presented: Images, audio, and moving image content need to be captioned in such a way that the captions explicate and interpret the items. It is not enough to simply say what something is; the caption should scaffold a visitor into seeing what is important about the artifact in this context. Ideally, the way the object is presented/cropped/edited suggests part of this; that is, it helps to actually show and not just tell. Part of the purpose of presenting these objects is to demonstrate reading and interpreting them. As such, captions should not be extraneous. For example, if one wants to include a portrait of an individual, one should not simply say it is a portrait of them. It’s necessary to suggest points in the work to read, like the way they are drawn or the items they are holding, and how those communicate something about how that individual is being represented in this case.
  3. Object captions should always stand on their own: The captions for objects presented should also stand on their own. Web readers skim and make use of images as a form of visual headings. As such, the captions for those images should make enough sense on their own that visitors can use them as a different index to the content of the page.
  4. A new heading should break up text after every few paragraphs: Again, web writing is different from print writing in that web readers are far more likely to skim content. Good and frequent use of headings makes it easy to skim text and further hooks readers to dig into the narrative content. Think more Associated Press style and less Chicago Manual of Style.
  5. An image from an item should always be visible as one scrolls through the page: The goal is showcasing the objects, so there should always be items from the collection visible on the screen at any given moment. This focuses attention on the items while also making the page easier to explore and read. Note: This is a particularly vexing thing to deal with in responsive design for mobile devices. I’d be curious for ideas about how this point should change in a mobile situation.
  6. Each page should be in the long blog post sweet spot, 700-2000 words: This length makes pages substantive enough to tell an interesting story and make a few important points, but keeps them from being so long that they are difficult to briefly explore. If a piece is getting significantly longer than this, it can likely be broken into smaller individual pieces, which has the benefit of creating another page that serves as its own point of entry into the exhibition.
  7. Hyperlink text for connections and emphasis: Every couple of paragraphs should have at least one hyperlink connecting to an important concept in another section of the exhibit. The links underscore what matters in a given paragraph and make it easy for visitors to chart their own path through the exhibition. This is the primary power of hypertext as a medium. Think of how rich a Wikipedia entry is with links. The goal of this, and of many of these guidelines, is to create a fertile network of connections that can let someone get lost in the content much like people do with Wikipedia. Ideally, item pages will also list the essays that link to them, making each item itself a potential point of entry to the presentation.
  8. Links should consistently connect out across subsections: Each page in the exhibit should ideally include at least one hyperlink to a page in a completely different section. Silos are bad, and history is not a straightforward progression of events. If you think different thematic sections of an exhibition are coherent enough to hang together, there should be connections between individual pieces as you go.
  9. Show parts of items, link out to whole items: Unlike in a physical exhibition, you are not limited to the size of a frame, to showing one page of a book, or to putting a video on loop and hoping that people will stick around for it to come back again. Good exhibition narrative pages direct a visitor’s attention to features of items that are particularly interesting in a given context, but ideally that user is just a click away from looking at the whole of a work, or from seeing what sits next to a given letter in a particular folder. There will be cases where this is impossible, either as a strain on digitization resources or for rights reasons. With that noted, the ideal is to put up as whole a copy of any primary source as can be integrated in its own right, and not simply to crop photos to illustrate the narrative.

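To make guideline 5 a bit more concrete, here is one minimal sketch of how an item image might stay visible as a reader scrolls, using CSS sticky positioning, with a fallback to normal document flow on narrow screens. All of the class names, file names, and layout choices here are hypothetical illustrations, not markup from any particular exhibition or platform.

```html
<!-- A hypothetical two-column narrative page. The figure stays on screen
     while the reader scrolls through the adjacent essay text. -->
<style>
  .narrative { display: flex; gap: 2rem; }
  .narrative figure {
    position: sticky;       /* keeps the item image in view while scrolling */
    top: 1rem;
    align-self: flex-start; /* required for sticky inside a flex container */
    max-width: 40%;
  }
  /* On small screens, revert to a normal in-flow figure rather than
     pinning the image; how best to handle mobile is an open question. */
  @media (max-width: 600px) {
    .narrative { display: block; }
    .narrative figure { position: static; max-width: 100%; }
  }
</style>
<div class="narrative">
  <figure>
    <img src="item-detail.jpg" alt="Detail from a collection item">
    <figcaption>A caption that stands on its own and interprets the object.</figcaption>
  </figure>
  <div class="essay"><!-- 700-2000 words of narrative text --></div>
</div>
```

The media query simply gives up on the always-visible ideal below a given width, which is one answer to the responsive-design problem flagged in guideline 5; interleaving images every few paragraphs would be another.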
 What do you think?

Are there things you would add, refine, or take off the list? Do you have any suggestions for other kinds of guidance that is worth integrating with this sort of thing? What thoughts do you have about how this sort of thing would change given different potential audiences? In short, I’m curious to hear what you think of all of this.
