Curating in the Open: Martians, Old News, and the Value of Sharing as You Go

Speculation about the “vast thinking vegetable” on Mars from The Salt Lake Tribune

This is ultimately a story about how doing research for an online exhibition ended up sparking articles on Boing Boing, io9, and The Atlantic that explored a theme from the exhibit eight months before the exhibit launched. I think the story has some lessons for thinking about the future of digital collections and exhibitions.

Finding our place in the cosmos

I spent 60% of my time at work in 2013 curating an online exhibition/collection/hypertext contextualizing the Carl Sagan papers in the history of astronomy and life on other worlds as evident in objects from across the Library of Congress collections. I’ve written before about what I think that project has to say about how to compose such online things, but I haven’t shared much about how I went about identifying and selecting materials for it.

Through the process of working on the collection, I think I stumbled into something that has considerable potential to impact the way we should go about doing the work of creating such thematic narrative explorations of content in digital collections of libraries, archives and museums.


A big part of the interesting story about the idea of life on other worlds is that, for a good while, it was completely reasonable, if not expected, to think that there would be intelligent life on the other planets in our solar system. One great episode in this story is the history of the Martian canals. Knowing how big a topic this would have been for the popular press, I realized I could just turn to Chronicling America, the website for a partnership between the NEH, LC and a network of libraries and archives from around the country to provide access to millions of digitized newspapers. I knew there would be a good bit of material here, and I was thrilled to find that a search for “martians” in the millions of digitized newspaper pages from 1836 to 1922 turned up a trove of pieces to explore. So I noted the pieces in this search that were particularly relevant for the collection. Instead of keeping these in a document on what my institution lovingly calls a “workstation,” I went ahead and just used Pinterest to keep track of them.

Work in Progress on Pinterest Progresses the Work

So I made Pinterest boards for each of the thematic sections of the collection I was working on. Below is an image of the Pinterest board I created on free and publicly available materials from across LC’s digital collections related to ideas of life on Mars. I liked using Pinterest for this as it created a visual way for me to track and organize these things. A big part of the project was to find what I could do with already publicly available digitized content, so it seemed like it would be fine to track these public materials using a personal Pinterest account. It had an interesting side benefit too.


I started using Pinterest for this purpose because it was easy, but its being public had an interesting secondary effect. As you can see from the image below, the board I started on Mapping Mars & Life on Mars ended up with 191 followers. It’s not a part of any official anything, but it turned out that many of the historians of science and the history-of-science curious who follow me on Twitter were interested enough to review and share some of the raw material I was pulling together on Pinterest. I needed to do this kind of aggregation for my own work for the essays and online collection, so it made sense to keep that up and out there for others to benefit from.


What the Vast Thinking Vegetable of Mars Taught Me

Which brings me to the vast thinking vegetable that lives on Mars. One of the newspaper pages I found ended up showing up in my feed reader on Ptak Science Books.


If you don’t read John Ptak’s blog and you are into cool, quirky history of science object stuff, you are missing out. He is always sharing interesting finds. As you can see above, one day he found the article I found. It wasn’t just a coincidence, either. As you can see from the image below, John credited both the Chronicling America site and my Pinterest board in the post.


That alone was a hoot. What a success. I set out to use Pinterest to keep track of and organize materials I might work with, but in the process I found an audience interested in the topics on Pinterest, and that rolled into John getting in there and not only sharing what I had found but digging in and interpreting and explicating what about that article was interesting. While I hadn’t provided any interpretive frame, the things that I found interesting about the article were the same ones that John focused on. But it didn’t stop there. It turned out that Alexis Madrigal also reads John’s blog and that he thought this was interesting enough to take it to an even larger audience. It also hit Boing Boing and io9.


From my Pinterest board to Ptak’s blog, and from there to The Atlantic. At this point, the Atlantic article ended up generating a surge of web traffic to the Chronicling America website. So much so that one of the project leaders noted the spike and went looking to see where it was coming from. The work I was doing to organize my notes, on a project that had yet to be announced, had helped to push a bunch of traffic and eyeballs back onto the content. That is, eight months before the launch, the research process itself was hitting home a core objective of much of our work: spurring engagement with and use of the collections. The traffic was nice, but importantly, it also had the effect of promoting thinking about the exact set of issues that the essays I was working on were focused on.

Both Ptak’s blog post and Alexis Madrigal’s piece on The Atlantic are brief but substantive. They contextualize and explore the issues of what it was and wasn’t reasonable to think about the existence of life on Mars in the early 20th century. To this end, before I had even gotten close to publishing my essays, simply sharing the way I was organizing my resources and tweeting about them had prompted public scholarship exploring the same issues in the same resources.

Succeeding before You’ve Even Launched

So, before anyone had even formally announced this project, I was already meeting many of my objectives to spark conversations about the history of ideas of life on other worlds and generating significant use of the Library of Congress collections. I see a few different implications of this process.

  1. Defaulting to sharing serves the mission: The research that goes into preparing a thematic collection/exhibition is itself something that can be made into a public project that contributes to the objectives of exhibiting materials. Using Pinterest to organize my research made that research into its own resource. While you can’t plan to have this kind of thing happen, you can plan to enable the possibility of it.
  2. There is great stuff on the cutting room floor that can have a life of its own: It ended up that I didn’t even use that giant vegetable eye story for the exhibition. It wasn’t the right fit in the end. The Pinterest boards I made are loaded with items that didn’t make the final cut but they still found their own audiences. This is to say, if I hadn’t shared the process there is little reason to believe this story would have gotten much attention. Just think about all the objects that someone considered featuring. Just the fact that an object was considered is likely an interesting link that someone might be interested in following.
  3. Sharing Objects in the Research Process Encouraged Deeper Use: In the thematic essays, I work out what the objects mean and people scroll through and read that. However, just sharing the items I was working with in progress ended up inviting others to take those materials and interpret and explicate them on their own. Intriguingly, less became more there. It helped encourage others to explicate and contextualize.

PastPlay as the Digital Humanities

I was invited to review Kevin Kee’s new edited volume Pastplay: Teaching and Learning History with Technology for the current issue of The American Historian. The author agreement allows one to post the “manuscript” version of this kind of thing to one’s personal website, so it’s shared here to that end. As I note, I think the concept of play at the heart of the volume is of potential interest for defining a perspective on play as something that defines the ever-nebulous digital humanities.

Play can and should be a core part of both historical research and the teaching of history. This is the central argument the historian Kevin Kee frames around the fifteen essays gathered together in Pastplay: Teaching and Learning History with Technology.

The thesis of this collection emerges by stringing together the titles of the four sections of the book. Historians should be 1) teaching and learning history, 2) playfully, 3) with technology, 4) by building. Teaching and Learning History includes four case studies of historical educational games. Playfully focuses on how play, or what author Stephen Ramsay calls the “Hermeneutics of Screwing Around,” can function as part of the practice of research and writing. With Technology explores board games, 3D printing, and simulation computer games as instruments for teaching history and engaging in historical scholarship. Finally, By Building provides four essays that argue that making things, from historical hoaxes to digital models of Victorian homes, can be a powerful tool for historical inquiry. The Playfully section of Pastplay includes three essays that argue that play itself is an instrument for learning about the past. William J. Turkel and Devon Elliott connect work with 3D printing and fabrication with the value that historians of science have found in re-creating historical experiments. Ramsay argues for the value of serendipitous “screwing around” as a response to the massive scale of source material offered by millions of digitized books. Bethany Nowviskie explores a medieval device that served as a “mechanical aid to hermeneutics and interpretive problem solving” as inspiration for how humanists might make use of digital technologies (p. 140).

Pastplay focuses more on teaching and learning than on historical scholarship, and as a result, the book is somewhat thin on addressing how play can and should be a component of historical inquiry. From my perspective, the most valuable contribution of Pastplay isn’t really articulated in the text. The book offers a framework for defining the ever-nebulous digital humanities. Many of the contributors are leading thinkers in the digital humanities, and their ideas about the playful use of technology to experiment, dabble, and explore the past offer insight into digital humanities epistemology. The digital humanities are often described simply as the application of computing technologies to humanistic inquiry. The playful hermeneutics described here, and the implication that there is no substantive difference between student learners and historians as perpetual learners, allow us to pin down what is different and significant about how these digital humanists approach the understanding of the past.

Pastplay is a book about teaching history, but the most intriguing parts of it deal primarily with historiography and method. In this respect, I might have liked to see two separate books: one focused on the educational possibilities of play and the other on how playful approaches to building models and exploring texts can provide value to the practices of historical research. While I’m still not entirely sure where this book belongs on my bookshelf, or what kind of course it is best suited for, I am glad to know it is in my collection.


Wherein I Answer 13 Questions About Digital Humanities Blogging

Matt Burton, PhD candidate in the University of Michigan’s iSchool, is writing his dissertation on the role that blogs play in scholarly communication, primarily focused on digital humanities blogs. He asked me if I would respond to a set of 13 questions he put together as part of his study. Shawn Graham recently shared his responses, which I enjoyed reading, so I figured I would share mine as well.

In responding to Matt’s questions, I realized that there is likely a lot of tacit knowledge that comes from the practice of blogging in this community which it would be useful to make explicit for anyone else who wants to join. So I’d love to see other people respond to Matt’s 13 questions. If you link back to my post and Shawn’s we can keep track of all of this in trackbacks.

Matt: When did you start your blog (career wise: as a grad student,  undergrad, etc)?

Trevor: I started keeping an academic blog around the time I started my M.A. program. I had kept a personal blog for a year or so with my wife, but launched two blogs that had an academic bent in 2007. The first was a blog for a digital history course I was taking and the second was a site I was going to run that was about history as represented in children’s books. The children’s books thing didn’t keep my interest long enough, so I eventually rolled them all together.

Matt: Why did you decide to start blogging?

Trevor: The digital history focused blog was the direct result of a course requirement: we had to start a blog and keep notes on it. At the same time, I decided I would stand up that other blog, the one about history through children’s books, because I saw it as an opportunity to

  1. refine some of my tech skills
  2. show folks that I could create and manage a decent looking blog
  3. set myself up with a structure and regular set of deadlines to get myself in a habit of writing for an audience
  4. get the kind of exposure and connections that other colleagues at CHNM (Dan Cohen, Tom Scheinfeldt, Dave Lester & Jeremy Boggs) were getting from keeping blogs.

So that is the web of reasons I ended up getting into blogging.

Matt: How do you host your blog, i.e. Do you use a generic web-host like Dreamhost with WordPress, do you use a blogging service like

Trevor: Currently, I use dreamhost to run an instance of WordPress. When I started blogging for the course I was using a instance but had set up and was running my own instance of the wordpress software for the history through children’s literature site.

Matt: How did you learn to set up your blog?

Trevor: I read the five minute tutorial for setting up a WordPress instance. It took a lot more than five minutes. I had put up websites before, but had never used anything that involved a database backend. I remember futzing around with a bunch of configuration issues to get the site up and running. I modified a theme at that point too. I wanted to make my own theme partly to show I could and partly to figure out more about how the HTML, CSS and PHP all interacted. Most of that tinkering just involved using Firebug to poke around and see what tweaks to the site would look like and then making those edits in a text editor to files via FTP.

Matt:  What are the challenges with maintaining your blog (i.e. spam, approving comments, dealing with trolls, finding time to write, etc)?

Trevor: At this point, the main challenge has been figuring out what role the blog plays in my productivity and work. I struggled a lot in the beginning to figure out what voice to write in and how much my writing on the blog should be polished final product versus part of a kind of open notebook where I worked out things in a more personal voice. I feel like I’ve hit that stride, but I now have so many places and commitments for writing that it’s tricky to do all the writing that I want to be doing. As a result of my blogging about history in video games, I was invited as one of the initial bloggers for Play the Past. At the same time, I also ended up blogging for my job, for the Library of Congress Digital Preservation blog. The result of this is that the “” blog has locked in as a place where I share more of the things that don’t easily fall into either of those other two spaces or that are the most perspectival of my writings. To sum that up, I haven’t had much trouble with technical or social issues around blogging. For me the challenge remains getting things up and out there via the blog and focusing on how I can make the best use of it as a place to develop and forward my thinking and writing.

Matt: What topics do you normally write about? Do you try and keep it strictly academic, or do you mix in other topics?

Trevor: At this point, I mostly talk about interpreting history as represented in new media, discussion of methods of research and scholarship in digital history and the digital humanities, and issues around the design, development, and process for the use of digital technologies in collecting, preserving, and providing access to cultural heritage materials. On occasion I will delve into other issues in changes in scholarly communication. Another way to say this is that the thematic unity of the blog is that it covers the things I have an academic/professional interest in. The origins for a lot of posts are discussions with archivists, librarians, curators, artists, humanities scholars and scientists at conferences, on Twitter, and in the comments on their blogs, and/or reactions to presentations, papers or books that I’ve read.

At this point, is a professional/personal blog. I offer running commentary on issues in the field, but for the most part, it is not a place where I present original research as much as a place where I offer and develop my perspective on issues in this area of professional practice and scholarship. In contrast, when I write for Play The Past I envision my audience as a more general reader interested in issues and stories about history in video games. So the Play the Past posts are a bit more of a mixture between academic research writing and journalistic writing.

Matt: If you allow comments on your blog, do you often get comments? What has been your experience managing comments/commenters on your blog?

Trevor: When I write something really long, like the full write up of a talk I gave, I will often get nothing in the comments. I might see a lot of people sharing it around on Twitter, or offering a word or two there, but I don’t see much engagement on the post. In contrast, if I write something short as a reaction to something that a lot of people are engaging in I can get some real substantive back and forth going. For example, Implications for Digital Collections Given Historian’s Research Practices responded to the ITHAKA report, Supporting the Changing Research Practices of Historians. Similarly, the satirical bent of Notes toward a Bizarro World AHA Dissertation Open Access Statement, responding to the AHA’s dissertation embargo information, kicked up a lot of exchange. Along with that, some of the very technical proposals I’ve written up, like the recent piece on Linked Open Crowdsourced Description: A Sketch, have had a tendency to spark a good bit of back and forth.

On the whole, I totally love comments. With the exception of that Bizarro World post, and a post I wrote up about misogyny in tech communities, I haven’t really steered into waters where there is much divisiveness or trolls. Oh, wait, except for that one time when I co-authored a blog post on that asked if the source code of a video game could be racist and it got picked up by Rock Paper Shotgun and we ended up with all kinds of irate, relatively thoughtful, but very antagonistic comments. So those aside, I generally feel like the comment section of my blog works like the web once did. I put things up, and the small virtual community of practice I participate in on Twitter and other blogs has a bunch of folks who pop in and read what I write and post thoughtful reactions that can open up discussions that I find myself going back to all the time.

Matt: What kinds of interactions (scholarly or otherwise) emerge out of your blogging practice?

Trevor: A bunch of them. I will try and break these out.

  • Finding and Establishing a Scholarly Community: Early on, I wrote a lot about history in video games on my blog, for example this post on the tech tree in Civilization from 2009. As a result, I ended up getting roped into Play the Past at the launch. Through that, I ended up meeting a bunch of other bloggers I did not already know. As the blog has continued and I became a co-editor, I’ve been thrilled to connect with and find people who I didn’t know at all who have now come to get a ton of traffic for their writing on this topic. We have had a range of folks move from commenters there, to guest bloggers, to regular bloggers, and a lot of their writing gets a ton of traffic and exposure.
  • Refinement of Ideas and Writing and Collaborative Projects: A lot of my work toward publications and research projects occurs through a process of blogging. I blogged through drafts of parts of my dissertation proposal and writing process. A workshop I gave on crowdsourcing became a four part series of blog posts on the topic which turned into an invited essay for Curator: The Museum Journal which I then was invited to republish in an edited volume on the topic.
  • Getting My Name Out There: Every month or two I run into someone at a conference or event who says something like “I hope this isn’t weird but I read your blog.” At this point it’s totally not weird. It is a huge compliment and I feel really lucky about how the whole thing has worked out for me. The blogs are part of how I do public scholarship and it’s a continual part of the professional network and community I participate in.

As an example of how these things all weave together, one of my early pieces for Play the Past (still one of my personal favorites) asked if the game Colonization was offensive enough. Rebecca Mir, then a graduate student, read my post for a course and ended up writing this amazingly cool course paper that opened up a whole bunch of other themes from it. We corresponded a bit about the original blog post over Twitter and she ended up sending me a copy of the course paper. Rebecca had found a ton of great stuff digging through some of the Civ modder discussion forums, and had some neat ideas about how to take a close look at the ways native peoples are represented in the game. At about the same time, there was a call for proposals for book chapters for what would become Playing With the Past. I was planning on putting in a proposal based on the Colonization things I had written, and encouraged her to as well. Rebecca smartly suggested that it was unlikely that the editors would want to run two Colonization focused essays and suggested that we consider co-authoring something, which I thought was a great idea.

So we put in for that and it was accepted, but we ended up deciding to use Play the Past as a place for us to take turns blocking out and taking the lead on drafting a series of posts to explore the themes and issues I had laid out and she had begun exploring in her course paper. The result was a series of very widely read posts that got a ton of comments and ended up giving us a lot of great critical feedback to incorporate when we stitched them all back together into our essay for the book. What I love about this whole process is that it pulls at the seams of the traditional research and writing process and in doing so opens up the possibilities for a range of levels of collaboration and exposure to your work.

As is the case when you really spend time working on a piece, there is a bunch of material from the blog posts that we ended up leaving on the cutting room floor at the end. But, all of that material is still up and out there in the blog posts. The possibility of that collaboration hinged on her reading the short post I had written on the topic and the extensive feedback we received in comments helped us to refine and polish up the essay. Along with it, Rebecca went on to become a regular blogger for Play the Past and I know her participation in the site has helped to get her invited to present at conferences and played a role in her professional resume.

Matt: Do you find these interactions informative, useful, enlightening, tedious, frustrating, obligatory, etc? How do they feel?

Trevor: On the whole my interactions around blogging are informative and enlightening. I think a few other words I would use are challenging, rewarding, exhilarating, generous and warm.

  • Challenging in that on several occasions folks have called me out on things, or I have seen others called out, and on the whole I think that process has worked to make the broader community of folks in the digital humanities and library and archives tweet/blogosphere engage with aspects of privilege that help move the fields forward as they continue to grow.
  • Rewarding in that I get feedback and recognition for my work and am in regular and ongoing communication with the folks in a range of different communities of practice that I respect and admire.
  • Exhilarating in that every so often a post I write will blow up on Reddit or something. One time a piece I wrote on Fallout was getting hundreds more views each time I refreshed the stats page. That has the dual experience of “Yay! Look at how many people are reading something I wrote” and “Oh no! I really hope I didn’t mess anything up in there, look at how many people are now scrutinizing something I wrote.”
  • Generous and warm in that I have found myself in a community of peers, mentors, mentees and colleagues who regularly give of their time, and opinions and share in humor and the ups and downs of our careers and professional lives.

Matt: How do you think digital humanities blogging is different from more traditional forms of academic writing and reading?

Trevor: One of the essays in the report from this summit I helped plan on collecting and preserving science blogs is relevant in this case. The author suggested that part of the problem with pinning down what blogging is and how it is different from other modes of scholarly communication is that it’s something defined by particular technologies (text syndicated via RSS) and a set of practices that is socially defined in how people use those technologies at particular times. With that said, I think I can venture to offer two different approaches for going about this. Blogging is at once both a much more expansive and diffuse mode of communication than something like a journal article and simultaneously something that is an emergent genre of writing with a set of conventions.

Diffuseness: So, if you scroll down a bit further and read through some of my favorite blogs you will find that I like a lot of things that are totally different. Some of them have short posts that show up on a daily basis, some of them have long posts and are posted to every three to six months. Some have custom drafted material created for the blog, some are mostly sharing notes from talks and presentations and working drafts of papers. Some are filled with images and subheads, some are just huge walls of text. To this end, one of the characteristics of blogging in the digital humanities is that it is far more particular to the person and their approach than something like a journal article. That is, I think you get a lot more variety in what people do with their blogs and what is considered acceptable practice.

Coherent Genre-ness: While I realize it might seem contradictory to now go on to suggest how blogging represents a coherent genre of writing after just saying that it’s so diffuse, it isn’t so. While there is a broad diversity in practice, there are also a lot of conventions that bundle up in the middle of that diffuseness. So here are some things that make blog writing, on the whole, different from other genres of scholarly communication like journal articles, book chapters, and conference presentations. On the whole, blog writing is more informal. It often is more conversational. It often involves less fancy talk, that is, more straightforward attempts to get points across. Blog writing is generally much shorter than other forms of academic writing. Blog writing often has shorter paragraphs, makes use of hyperlinks to point out to ongoing discussion elsewhere instead of recapping that discussion, and includes more subheads to be easier to skim. Blog writing can often assume/connect with a broader audience than other forms of academic writing. Blog writing is often less finely tuned and honed than other forms of scholarly communication.

Matt: How would you characterize the relationship between blogging and the digital humanities (however broadly conceived)?

Trevor: Oh gosh, it sounds like that involves the infinite regress of attempting to define the digital humanities :) I will lean on a recent review I wrote about the book Pastplay, which I think gets at a fruitful connection between what DH has become and what blogging does (I’m going to post a pre-print of the review on the blog once I get it back with final edits.)

While I’m challenged at exactly where I should put Pastplay on my bookshelf (educational psychology? historiography & method?) I’m glad to know it is in my collection. From my perspective, the most valuable contribution of this book isn’t really articulated in the text. The book offers a framework for defining the ever-nebulous digital humanities. Many of the authors of chapters in the book are leading thinkers in the digital humanities, and the ideas about the playful use of technology to experiment, dabble, and explore our ideas about the past offers insight into an epistemology of the digital humanities. Often simply described as the application of computing technologies to humanistic inquiry, the playful hermeneutics described here, and the implication that there is no substantive difference between students learning about the past and historians themselves as perpetual learners lets us pin down what is different and significant about how these digital humanists are approaching understanding the past.

So I think that’s it. I think it’s about play. Not play in the games sense or childish sense but in the sense of individually and collectively learning how to do things. That is, play in terms of how learning happens at the individual and community level as we fumble around and figure out how to do better work and develop better ways to understand our world, our cultures and their pasts. I think when digital humanities blogging is at its best you have people stepping away from “fancy writing” to play with ideas and play with methods, to be honest, to be generous but to not shy away from calling each other out on our respective shit. I think this is something that has been a huge asset to the development of the community but at the same time it’s a real challenge. It is there inside the ups and the downs of concepts like “niceness.”

Academia has always had a bit of a rough and tumble discourse: go find the forum section of just about any history journal over the last 80 years and you have a very real chance of finding a real knock down drag out fight over what counts as good work and/or whose work is or isn’t original or groundbreaking. With that said, the personal valence of blogging and the immediacy of it and of comment threads has some of the effect of making it all the more critical for the community to continue to figure out and reflect on how we can maintain an open and friendly network that is also ready to have its privilege checked and its background assumptions checked. Blog writing is also an incredibly immediate form of academic writing. You write it, you hit publish, you tweet it, you start talking about it. If it’s a hot topic, there is a good chance you could be reading someone’s response and reaction in another post in a few hours.

Matt: What DH blogs/bloggers do you read and why do you read them? What do you like about them?

Trevor: There are really too many to name here; I follow hundreds of blogs in my reader, so I will just point to some highlights. Here is a rundown of some of my favorites off the top of my head.

  • I read everything Bethany Nowviskie writes, more or less as soon as I know it is up. She is routinely insightful and reflective and the fact that she is situated in a library context ends up meaning that her perspectives are particularly relevant to me.
  • Ted Underwood is another favorite. He does a great job at doing number crunching computing sort of DH in a way that opens up and elucidates big questions.
  • Sheila Brennan always has great things to say about work at the intersection of digital history, public history and the digital humanities.
  • Tim Sherratt’s posts often seem to come with some fully formed new project he concocted that is both immediately interesting and useful and simultaneously something that forwards the theoretical potential of building things in the field.
  • Miriam Posner has both a great voice for blog writing and covers a lot of issues thoughtfully and deeply.
  • DH+Lib often surfaces posts and pieces I would not otherwise have come across.
  • Mark Sample is one of the most creative people I follow. I love how he has a focus on issues in born digital media like video games and twitter bots and his writing is really smart.
  • Stephen Ramsay isn’t really a high volume blogger but I appreciate his perspective and I think his work on algorithmic criticism and the hermeneutics of screwing around are some of the best pieces of work at bridging the computational and mathematical with the epistemology and values of the humanities.
  • Ernesto Priego has a valuable perspective and I enjoy the intersection of library science and digital humanities in his work.
  • Natalia Cecire is a great writer and scholar and a thoughtful critic.
  • Adam Crymble writes a lot about issues around the practice of digital history and it’s both good stuff and particularly relevant to my interests.
  • Melissa Terras has written a bunch of great stuff and her work is often directly related to issues I am working on related to things like use and reuse of digital content.
  • Shannon Mattern is always writing about these amazing courses she teaches, about visits to galleries in New York and sharing these in depth and thoughtful pieces and talks that have a media studies bent. It’s great stuff.
  • Kate Theimer likely does not consider herself to be in the digital humanities tent, but her work on the future of archives is always thoughtful and relevant to folks in DH.
  • Scott Weingart has a bunch of great posts about things like network analysis and I appreciate his background in the history of science, which situates his perspective on tools and methods in an understanding of the sociocultural framework in which those tools operate.
  • Ian Milligan is someone whose posts I’m almost always tweeting out. He is one of a handful of historians doing work with Web Archives and he shares parts of the process of that work that are enlightening.
  • Fred Gibbs is a great historian and a thoughtful commentator on digital history.
  • Ed Summers builds very cool things and always has smart reasons and things to say about the things he builds.
  • Sharon Leon does great work in digital history and public history and I’m always interested in her perspective.

Matt: What was your most popular blog post? Why do you think it was so popular? What is your *favorite* post?

Trevor: Unquestionably, the most read things I’ve ever written are posts about Colonization and Fallout 3 for Play the Past. Both of those became and continue to be so popular because they have connected with audiences outside the network of academics and cultural heritage professionals I usually write for. Another hit in that vein is a 400 word post I wrote about an amazing Pac-Man t-shirt.

For my personal blog, I’ve included the stats for the top 15 of my blog posts below (this really only goes three or four years back but it’s illustrative). The top post there is a perennial hit. I think that one resonated so well because it’s really in the sweet spot for a blog post: I lay out a point that turns some conventional wisdom about crowdsourcing on its head, and that works in a short post. The second is actually a really long one, the transcript of a talk I gave earlier this year at the University of Pittsburgh that I think made it around a good bit because it weaves together a lot of the different things that I focus on (digital preservation, born digital materials and the digital humanities). So I think that one got around because it touches on and tries to connect more or less all the sectors of my professional network. The Bizarro world post was about a fast moving issue in the higher education blogs. From there you see a few more of my posts on crowdsourcing and a range of things I’ve written about research methods that tend to get some traction.


As far as a favorite post of mine, I’m not sure. I think I’d probably go with either the Fallout 3 post or the Is Colonization Offensive enough post. At the beginning of Play the Past I would spend a lot of time honing and refining pieces like those and I think it shows. For better or worse, most of the blogging I do these days is much more immediate and responsive and rushed between a bunch of other things. So I think I’m putting out good stuff that is useful but I don’t think it’s nearly as refined.


Digital Public History Course for an iSchool

I’m excited to announce that I will be teaching my digital public history graduate seminar again! I am tweaking the course I taught for American University’s Public History Program (in 2011 and 2012) and will be teaching it as a special topics course this spring in the University of Maryland’s iSchool program.

So, if you are a grad student at UMD (or if you have friends that are) it will be Thursday nights, 6:00-8:45, in College Park, Maryland.

Here is the blurb on the course:

Digital Public History, LBSC 708 (Section D), College Park, Maryland, Thursday nights, 6:00-8:45

This course will explore the current and potential impact of digital media on the theory and practice of history. We will focus on how digital tools and resources are enabling new methods for analysis in traditional print scholarship and the possibilities for new forms of scholarship. For the former, we will explore tools for text analysis and visualization as well as work on interpreting new media forms as primary sources for historical research. For the latter, we will explore a range of production of new media history resources, including practical work on project management and design. As part of this process we will read a range of works on designing, interpreting and understanding digital media. Beyond course readings we will also critically engage a range of digital tools and resources.

Below is a bit of a scratch pad for how I am thinking about tweaking things for the course. I am curious for other comments/suggestions for things to consider with these.

Topics/Weeks I am Considering Swapping in

At the moment there are four areas I am considering as potential revisions/additions to the week by week topics of the course.

Books I am Considering Adding or Swapping in

One of the things I need to get done sooner rather than later is decide on what books I’m going to keep and or swap out. Here are a few I am considering. I am curious to hear if there are any other books folks think I should be considering.

Reviewing Some Syllabi for Related Courses 

I’ve been trying to keep track of some great looking relevant/related courses to review. This is the list I have so far. I’d love to know of other courses folks think I should take a look at.

So, what do you think?




Personal Digital Archeology Illustrated

Bundled up inside sectors of many of our hard disks you can find the traces of our digital past recursively tucked away in hastily named directories. Our Old Files form layers of digital sediment ripe for personal digital archeology.

I love how this XKCD illustrates the way that personal computing becomes inherently archeological. Until recently, the cost of storage space kept plummeting.  Along with that, the nature of search in file systems enabled many of us to move from filing to piling. The result is something like the stratigraphy in the comic.

It was easy to just stick “Old Desktop” inside the new documents folder, which itself had the stack of files you recovered from an earlier hard drive crash. Nested deeper and deeper down you’ve got your high school zip disk.

If you tunnel down in there, you can even find out things about yourself you had forgotten. In this case, an 850k text file with forgotten poetry is uncovered.

As scholars in the future work with logical or forensic disk images of personal computers, their work will likely look much the same. Except they won’t have the benefit of memory to fill in the blanks about how this order came to be.

The comic is chaotic; haphazardly named files and folders created on the fly become the long term structure of the data. Still, we get the joke because we can understand what these layers and files mean without knowing anything about their contents. We see the high school love note, the pile of files shared over Kazaa, the collection of pictures from Facebook. The directory names, file names and file extensions tell us a great deal about what we are looking at. Even in the chaos there is a lot of context and description in the arrangement of the files.

Interestingly, as we increasingly move to using cloud storage for more and more computing, and as the once-great Kryder rate continues to level off, this is likely only going to be the case for a particular period in the history of personal computing. In any event, when we write up the digital historiography and source criticism textbooks for historians of the near and distant future who want to make sense of our old hard drives, we should print up and explicate this XKCD and feature it on the cover.


Linked Open Crowdsourced Description: A Sketch

Systems and tools for crowdsourcing transcription and description proliferate, and libraries and archives are getting increasingly serious about collectively figuring out how to let others describe and transcribe their stuff. At the same time, there continues to be a lot of interest in the potential for linked open data in libraries, archives and museums. I thought I would take a few minutes to try and sketch out a way that I think these things could fit together a bit.

I’ve been increasingly thinking it would be really neat if we could come up with some lightweight conventions for anyone anywhere to describe an object that lives somewhere else. At this point, things like the Open Annotation Collaboration presumably provide a robust grammar to actually get into markup and whatnot if folks wanted to really blow it out, but I think there are likely some very basic things we could just do to try and kick off an ecosystem for letting anyone mint URLs that have descriptive metadata that describe objects that live at other URLs.

My hope in this is that instead of everyone building or standing up their own systems, we could have a few different hubs and places across the web where people describe, transcribe and annotate, and that work could then be woven back into the metadata records associated with digital objects at their home institutions. In some ways this is really the basic set of promises and aspirations that Linked Open Data is intended to help with. Here I am just intending to try and think through how this might fit together in a potential use case.

A Linked Open Crowdsourcing Description Thought Experiment

With a few tweaks, we are actually very close to being able to connect the dots so that one situation in which people further describe archival materials (in this case, to create bibliographies) could provide enhanced metadata back to a repository. I’ll talk through how a connection might be forged between Zotero and one online collection, but I think the principles here are generic enough that if folks just agreed on some conventions we could do some really cool stuff.

The Clara Barton papers are digitized in full, but in keeping with archival practice, they are not described at the item level. In this case, the collection has folder level metadata. So since it’s items all the way down in a sense, the folders are the items.

As a result, you get things that look like this, Clara Barton Papers: Miscellany, 1856-1957; Barton (Clara) Memorial Association; Resolutions and statements, 1916, undated. This is great. I am always thrilled to see folks step back from feeling like they need item level description to make materials available on the web. Describe to whatever level you can and make it accessible.

Clara Barton Papers Folder Level Item


With that said, I’m sure there are people who are willing to pitch in and make some item level metadata for the stuff in that folder. Beyond that, if a scholar is ever going to actually use something in that folder and cite it in a book or a paper, they are going to have to create item level description. Wouldn’t it be great if there was a generic way for the item level description that happens as a matter of course, whenever someone puts a footnote in an article or a book, to be leveraged and reused?

Scholars DIY Item Level Description in Zotero

Every day, a bunch of scholars key in item level description for materials in reference managers like Zotero. To that end, I’ll briefly talk through what would happen if someone wants to capture and cite something from the Clara Barton Papers in Zotero. Because there is some basic embedded metadata in that page, if you click the little icon by the URL you get that initial data, which you can then edit. You can also then directly save the page images into your personal Zotero library.

So you can see what that would look like below. I started out by saving the metadata that was there, logged the URL where the actual item starts inside the folder, changed the item type from a web page to a document, and keyed in the title and the author of the document. I also saved, as attachments to my Zotero item, the 2 page images (out of the 19 images in the folder) that actually make up the item I am working with.


Creating an item level record for materials in the Clara Barton papers folder in Zotero for the purpose of citing it.

So, now I can go ahead and drag and drop myself a citation. Here is what that looks like. This is what I could put in my paper or wherever.

Logan, Mrs. John A. “Affidavit of Mrs. John A. Logan,” 1916. Miscellany, 1856-1957; Barton (Clara) Memorial Association; Resolutions and statements, 1916. Clara Barton Papers.

Now, wouldn’t it be great if there was a way for Zotero to ping, or do some kind of trackback to, the repository to notify folks that there is potentially a description of this resource that now exists in Zotero? That is, what if I could ask Zotero’s API to see every public item they have that is associated with a URL? In particular, I would want every item that someone actually went through the trouble to tweak and revise, as opposed to the things that are just the default information that came out to begin with.
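To make the "tweaked versus default" distinction concrete, here is a minimal sketch in Python. It assumes a harvester already has two plain dictionaries in hand: the default metadata embedded in the repository page, and the item as a user saved it. The function name, the field names, and the placeholder URL are all hypothetical and are not part of Zotero or any repository system.

```python
def edited_fields(default_metadata, saved_item):
    """Return the fields a user added or changed relative to the page defaults."""
    return {
        field: value
        for field, value in saved_item.items()
        if value not in ("", None) and default_metadata.get(field) != value
    }


# Hypothetical default record scraped from the folder level page.
default_metadata = {
    "itemType": "webpage",
    "title": "Clara Barton Papers: Miscellany, 1856-1957; Barton (Clara) "
             "Memorial Association; Resolutions and statements, 1916, undated",
    "url": "https://repository.example.org/folder",  # placeholder URL
}

# Hypothetical record after someone keyed in item level description.
saved_item = {
    "itemType": "document",
    "title": "Affidavit of Mrs. John A. Logan",
    "creator": "Logan, Mrs. John A.",
    "date": "1916",
    "url": "https://repository.example.org/folder",  # unchanged
}

changes = edited_fields(default_metadata, saved_item)
worth_a_look = bool(changes)  # True when anything beyond the defaults exists
print(sorted(changes))
```

An item whose record is identical to the defaults yields an empty dictionary, so a repository could skip it and only review records where someone did real descriptive work.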

Connecting Back from the Zotero instance of the Item

At this point, I added in descriptive information, and because I have the two actual image files, I also know that the information I have refers directly to mss/mss11973/116/0400/0451.jp2 and mss/mss11973/116/0400/0452.jp2. So, from this data we have enough information to actually create a sub-record for 2 of the 19 images in that folder.

Because I have a public Zotero library, anyone can actually go and see the item level record I created for those 2 images from the Clara Barton Papers. You can find it here. In this case, the URL tells you a lot about what this is off the bat. It’s an item record from user tjowens and it has a persistent arbitrary item ID in tjowens’ library (IHKBH5WQ). That page could track back to the URL it is associated with, or, even simpler than that, there could just be a token in the link that a repository owner could look for in their HTTP referrer logs as an indicator that there is some data out there at some URL that describes data at a URL that the repository has minted. So for instance, just stick ?DescribesThis or something on the end of the URL. Then tell folks who run online collections to go and check out their referrer traffic for any incoming links that have ?DescribesThis in them. From there, it would be relatively trivial to review the incoming links from logs and decide if any of them were worth pulling over to add in as value-added descriptive metadata.
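As a rough illustration of the log review step, here is a small Python sketch. It assumes the web server writes the common "combined" access log format and that the ?DescribesThis token shows up in the query string of the requested URL, with the describing page arriving in the referrer field. The sample log lines, host names, and paths are made up for the example.

```python
import re
from urllib.parse import urlsplit

# Combined log format: ... "METHOD target PROTOCOL" status size "referrer" "agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

TOKEN = "DescribesThis"  # the informal convention proposed in the post


def find_descriptions(log_lines):
    """Return (described_path, describing_url) pairs for tagged incoming links."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a line in the expected format
        target = urlsplit(match.group("target"))
        referrer = match.group("referrer")
        # Keep only requests that carry the token and name where they came from.
        if TOKEN in target.query.split("&") and referrer not in ("", "-"):
            hits.append((target.path, referrer))
    return hits


# Made-up log lines; hosts and paths are placeholders.
sample_log = [
    '203.0.113.5 - - [12/Jun/2014:11:44:35 -0400] '
    '"GET /item/example-folder/?DescribesThis HTTP/1.1" 200 5120 '
    '"https://describer.example.org/items/ABC123" "Mozilla/5.0"',
    '203.0.113.9 - - [12/Jun/2014:11:45:02 -0400] '
    '"GET /item/example-folder/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for described, describing in find_descriptions(sample_log):
    print(described, "is described at", describing)
```

Only the first sample line is reported, since the second has neither the token nor a referrer. A real deployment would need to account for browsers and sites that suppress the referrer header, which is one reason a proper trackback might be more reliable.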


Here is an image of the Item page created for the record I made in Zotero

Aside from just having this nice looking page about my item, the Zotero API means that it’s trivial to get the data from this marked up in a number of different formats. For instance, you can find the JSON of this metadata at


The JSON from the Zotero API for the item I created there. It’s easy enough to parse that you can pick out the added info I have in there, like the title and author.
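To show how little parsing this takes, here is a minimal Python sketch. The JSON below is a hand-written stand-in, not a copy of the real API response; its shape and field names ("key", "data", "creators" and so on) are assumptions for illustration, and the actual output should be checked against the Zotero API documentation.

```python
import json

# Hypothetical response body standing in for the real API output.
sample_response = """
{
  "key": "IHKBH5WQ",
  "library": {"name": "tjowens"},
  "data": {
    "itemType": "document",
    "title": "Affidavit of Mrs. John A. Logan",
    "creators": [
      {"creatorType": "author", "firstName": "Mrs. John A.", "lastName": "Logan"}
    ],
    "date": "1916"
  }
}
"""


def summarize_item(raw_json):
    """Pick the title, authors and date out of one item's JSON."""
    item = json.loads(raw_json)
    data = item["data"]
    authors = [
        f'{creator["lastName"]}, {creator["firstName"]}'
        for creator in data.get("creators", [])
        if creator.get("creatorType") == "author"
    ]
    return {
        "key": item["key"],
        "title": data["title"],
        "authors": authors,
        "date": data.get("date"),
    }


summary = summarize_item(sample_response)
print(summary["title"], "by", "; ".join(summary["authors"]))
```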

So, if someone back at the repository liked what they saw here, they could just decide to save a copy of this record, and then ingest it or integrate it with the existing records in their index through an ETL process.

What I find particularly cool about this on a technical level is that it becomes trivial to retain the provenance of the record. That is, an organization could say “description according to Zotero user tjowens” and link out to where it shows up in my Zotero library. This has the triple value of 1) giving credit where credit is due, 2) offering a statement of caveat emptor regarding the accuracy of the record (that is, it’s not minted in the authority of the institution but is instead the description of a particular individual), and 3) providing a link out to someone’s Zotero library that likely could enable discovery of related materials from other institutions.
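Here is one way the provenance idea could look during ingest, sketched in Python. Everything about the record structure is an assumption for illustration: the "contributed_items" and "provenance" fields, the function name, and the placeholder source URL are hypothetical, not anything a particular repository or Zotero actually uses.

```python
def attach_description(folder_record, description, describer, source_url):
    """Return a copy of the folder record with one attributed sub-record added."""
    sub_record = {
        "files": list(description["files"]),
        "title": description["title"],
        "creator": description["creator"],
        "date": description["date"],
        "provenance": {
            "statement": f"description according to Zotero user {describer}",
            "source": source_url,
            "authority": "contributed",  # flags it as not institution-minted
        },
    }
    updated = dict(folder_record)  # leave the original record untouched
    updated["contributed_items"] = (
        folder_record.get("contributed_items", []) + [sub_record]
    )
    return updated


folder = {
    "title": "Clara Barton Papers: Miscellany, 1856-1957; Barton (Clara) "
             "Memorial Association; Resolutions and statements, 1916, undated",
    "image_count": 19,
}

description = {
    "files": [
        "mss/mss11973/116/0400/0451.jp2",
        "mss/mss11973/116/0400/0452.jp2",
    ],
    "title": "Affidavit of Mrs. John A. Logan",
    "creator": "Logan, Mrs. John A.",
    "date": "1916",
}

updated = attach_description(
    folder,
    description,
    describer="tjowens",
    source_url="https://example.org/tjowens/items/IHKBH5WQ",  # placeholder URL
)
print(updated["contributed_items"][0]["provenance"]["statement"])
```

Keeping the contributed description in its own list, rather than overwriting the folder level fields, means the institution's own description stays intact and the attribution travels with the added record.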

Linked Open Crowdsourced Description

The point of that story isn’t so much about Zotero and the Clara Barton Papers, but more about how, with a little bit of work, those two platforms could better link to each other in a way that the repository could potentially benefit from the description of its materials that happens elsewhere. If a repo could just get a sense of what people are describing of its materials, they could start playing around with ways to link to, harvest, and integrate that metadata. From there, organizations could likely move away from building their own platforms to enable users to describe or transcribe materials and instead start promoting a range of third party platforms that simply enable users to create and mint descriptions of materials.



Where to Start? On Research Questions in The Digital Humanities

How should digital humanities scholars develop research questions? Spurred on by this recent conversation on Twitter, I figured I would lay out a few different ways to go about answering this question about questions. The gist of the dialog is that Jason Heppler suggested that one should “Fit the tool to the question, not the other way around” in terms of working with various kinds of new digital humanities tools. I take tools here to mean any computational instrument employed to understand the world; for example, GIS, topic modeling, creating simulations using cellular automata or agent based models, analyzing frequencies of audio files, or visualizing trends in images. I get where Jason was going, but at least as it was formulated I don’t think it is the right advice.

The conversation prompted me to try and clarify a bit of how I see the relationship between research questions, primary sources, and tools and methods.

Start with the Question, the Archive or the Tool?

Some historians start with their question; some start with a familiarity with a period that suggests questions that exploring a particular archive or collection of primary sources could answer. Here are two examples I can recall from colleagues who I worked with doing research in the history of science.

One colleague was aware of the shift that had occurred between classical and modern physics in one astronomer’s work, documented in a recent essay. So he went to look at the papers of another astronomer, which had not yet been particularly well explored, to see if similar or different responses to the notion of a distinction between classical and modern physics had emerged in that astronomer’s work. In short, it was largely about abstracting the results of one exploration into the information available in another individual’s archive.

In either case, it’s a bit of a dance between formation of questions and the ways that those questions open up or shift and change as one gets into the complicated, rich and vast space of the possibilities of primary sources.

The Function of Research Questions in History/the Humanities

Back up a bit. What is the purpose of research questions in the humanities? I would posit that their purpose is to clarify what is in and out of scope in a project, to define where a project should start and end. Lastly, research questions provide a constant point of reference to check back on when working on a project. You write down your questions as you go, and you can always pull them out again and check to see if, in fact, you are actually working to answer them or if you have drifted off to some other problem. Research questions are useful structures to organize your work and inquiry and they are valuable tools for signifying to others what to expect from a piece of scholarship. Research questions are functionally an attempt to establish the set of criteria by which a piece of scholarship should be evaluated.

The Problem of Research Proposals and Fancy Writing

One of the big problems in talking about research questions is that one often describes research questions and methods in research proposals (for grants or dissertations etc.), and those proposals are often really a form of what Joe Maxwell calls “fancy writing.” That is, those kinds of research proposals are more about the performance of demonstrating how smart you are and why you should be given permission to do work than they are about actually trying to get research done. If you haven’t read it, I can’t recommend Joe’s Qualitative Research Design: An Interactive Approach strongly enough. In focusing on the actual purpose of research design and not the performance of proposal writing he cuts through a bunch of the fancy stuff to get to the way that research questions actually develop and evolve. He calls it an interactive approach, but I think iterative would be just as descriptive.

In Maxwell’s approach, there are five components of research design as it is actually practiced.

  1. Your goals (the reason you are doing the research),
  2. Your conceptual framework (the literature you are working in, your field, your experience that you draw from),
  3. Your research questions (a set of clear statements of exactly what you are studying)
  4. Your methods (broadly conceived as the way you are going to answer the question, so for historians both the archives/sources you will work from and their perspectives are relevant as well as the way you will sample/explore them, and the actual techniques you will use to analyze and interpret them)
  5. The validity concerns and threats (literally, answers to the question “how might you be wrong” where you work through inherent limitations and biases in your methods, sources, perspective, etc.)

The diagram below illustrates how the five components of design interact.

Illustration of how research questions should be iteratively defined and developed in relation to goals, conceptual framework, methods, and validity threats. From Maxwell 2014

The main point of the diagram is that your research questions should be iteratively revised and refined throughout the work based on the four other things that you are working on.

So… research questions aren’t something you state and then follow through on, they are best thought of as statements about your inquiry that are iteratively refined through the process of defining what you are working on.

Generally, the way that research questions are stated in quantitative research is bogus, or at least, bogus in terms of the way that people who do more qualitative research think of research questions. That is, you do a lot of work and scholarship before you can ever formulate a hypothesis that you can test. In that case, you end up with a research question at the end of an exploration not at the front of it.

Tools, Archives, & Research Questions are Inherently Theory Laden

Getting back to the issue of questions, tools, and sources: being good humanists, it is worth leaning back to grok that all method is theory laden. That is, every attempt to answer a question comes with inherent theoretical assumptions about the problem and limitations in what that method can provide in terms of answers. This is true of method broadly conceived. Identifying a problem, collecting sources and evidence (where the original intent behind how records were collected creates silences), interpreting sources, and composing and reporting on results all come with some inherent biases.

That is, all tools, all archives and all research questions are in and of themselves instrumental. We use them in an attempt to understand the world. That is, they all serve as lens-like tools, reflecting and refracting information back to us. I’ve always liked the way that Umberto Eco explains this in Kant and the Platypus as a core concept in hermeneutics: we make interpretations, but the underlying reality of existence exerts the force to resist some of those interpretations by simply saying “No,” by making it clear that an interpretation can be refuted. This is a hermeneutics of data that emerges through the use of tools.

So where to start? Start wherever, as long as where you start is anchored in your goals. The hermeneutics of screwing around is itself invaluable. A technique of messing with tools and datasets at hand may well surface interesting patterns that no one would have found if they were working at sources in another fashion. Pick an archive and find the questions. Or, just start with your questions and work it that way. Whatever you do, realize that it’s an exploratory process.

What matters most in where you start is your actual goals in doing the research. That is, why is it that you are actually doing your work? What is it that you hope your work will potentially do? Don’t confuse your goals with what you are interested in; realize and recognize that your goals are about the purpose of your work. If you want to do work that ultimately helps to understand and give voice to the voiceless then you likely don’t want to start messing around with the text of inaugural presidential speeches. If you want to figure out new kinds of things that can be done with topic modeling then you would presumably want to start with some sources that are in a form or close to a form that you can topic model.

Thanks to Thomas Padilla and Zach Coble who reviewed and provided input on a draft of this post.



Digital Archivists: Doing or Leading the Digital?

I’ve been enjoying Jackie Dooley’s recent series of posts looking at the skills and duties that are showing up in job postings for digital archivists. I’m excited to see archives listing these. Staffing up illustrates how electronic records have risen to become a significant issue in the minds of the deciders.

Like many who share this particular job title, I have some complicated feelings about the idea of “The Digital Archivist.” While my official job title is Digital Archivist, I’ve generally added a caveat. When I encounter someone else with that title, I often go on to explain that I’m more of a meta-digital archivist. That is, most of what I do is about policy, strategy, and standards; establishing and documenting practices, and collaborating to document and codify emerging practices. However, I’m becoming increasingly convinced that most of what I do is actually largely what digital archivist jobs should be doing.

I think the confusion about what a digital archivist should do is mostly summed up as follows:

Digital archivists should not be the people who do the digital stuff. Everybody (including the digital archivists) needs to pick up the skills necessary to work with digital records. Instead, digital archivists should be the people who are hired to lead the digital stuff.

I will elaborate on what I mean by this a bit more. I think my main issue with the idea of the digital archivist role is that I want to answer yes to two questions that some folks might imagine to be directly opposed to each other.

Should all archivists be able to work with digital materials? Yes. In this sense, all archivists must become digital archivists. It’s just a part of ongoing professional development. Digital records are not a niche area of material. Digital records are increasingly just a part of the materials archivists need to be able to process. I think some of Rebecca Goldman’s  tweets on this subject illustrate the point. Other fields haven’t hired digital waitstaff, digital nurses, digital journalists, or digital lawyers to deal with the challenges of professional development around technology in their fields.

Then, does it make sense to have digital archivists as digital specialists? Yes. While everybody needs to have a basic capability, it does make sense to be cultivating leaders and specialists. In this sense, I think digital archivist jobs are best thought of as having someone who devotes their time to continually 1) figuring out and refining digital processes, workflows and tools, and 2) teaching the rest of the staff the techniques and processes they are developing. This means that, ideally, digital archivists straddle a leadership and practice role.

Ongoing Leadership in Digital Work: Ideally, we all become educators in this future because the only thing likely to stay constant is going to be change. We aren’t going to just establish the new “digital” practices and be done with it. The nature of digital technologies is continually shifting dramatically. That is, the shift from storing information on devices to thin client cloud setups is frankly as big as the shift from paper to hard drives. The first sixty years of digital technologies have illustrated that there is every reason to believe that the technological mediums and nature of records will continue to evolve frequently, and we are going to need responsive practices to continually evolve with them.

An example from a different field: I think we can look to the idea of the “School Based Technology Specialist” (SBTS) role as a way to think about this. Instead of hiring someone to be the “computer person” for each of the schools in the Fairfax County school district, the district created the SBTS role. The idea is that across the schools teachers need to be making better use of computing technology. So it’s not about hiring someone to be the computer person but hiring someone who is functionally an administrator to build capacity for teachers to incorporate digital technology into their practice.

In this vein, SBTS are described as trainers, liaisons, managers, troubleshooters, consultants and collaborators. I think the parallels to the digital archivist role are rather clear. Now, schools and archives are still rather different, so it doesn’t necessarily map over straight away. But still, I think the parallels are meaningful. The digital archivist role can be thought of as a leadership role for establishing practice. I think organizations would do best to think of how digital archivists can be empowered and given the authority to lead work on digital materials.

Curious for others’ thoughts on this.

Posted in Uncategorized | 3 Comments

Mecha-Archivists: Envisioning the Role of Software in the Future of Archives

The Cybermen exemplify our worst fears about the future of technology: people literally turned into machines, replaced and ruled by machines. I think this is the face of a fear of a technological future of archives.

I had the privilege of participating in The Radcliffe Workshop on Technology and Archival Processing a few weeks back. I was thrilled to be on a great panel with some early career historians and Maureen Callahan.

Maureen posted her talk The Value of Archival Description Considered online. I encourage you to read it. It’s super good. I was thrilled to find that we are, I think, on nearly the exact same wavelength about the future of the finding aid.

There was a nice write-up about the event in the Harvard Gazette. I won’t deny that I may be “a millennial who displayed affection for the word ‘awesome’ during the panel.” However, there are some clarifications I should make. I did not talk about obeying “cyborg overlords” or a “mechanized shirt of armor.” In sharing some of the points of my talk, I thought it would be good to focus in particular on these clarifications. I think getting the language right about the future of our relationships with software is important, so here goes.

Maureen Welcomed the Robot Overlords, but with good reason!

Maureen had a few great lines in her talk (again, if you haven’t read it go do so now). One of those lines was her take on a Simpsons quote, “I for one welcome our robot overlords.” She went on to explain, in an even better line, “I don’t think that archivists are just secretaries for dead people, and I welcome as much automation as we can get for this kind of direct representation of what the records tell us about themselves.” I love this quote. When I was sitting there listening to her I was nodding so much. This is exactly the sentiment I wanted to get at.

The future of digital tools for archives is not replacing the work. It is automating the parts of the work that are not the intellectual labor. Along with that, the future of these tools is largely about taking advantage of the affordances in the nature, structure and order of digital media which give us considerable power to scale up our actions and interventions in the record.

I took the key theme from her pitch to be something like, let the algorithms and digital tools do the repetitive and less intellectual labor of the archivist, and get the archivist more involved in the intellectual labor of the archives. Specifically, in better contextualizing, explaining and describing the provenance of collections and making the decisions that require the kind of sophisticated judgment that people have and exercise. Without knowing where she was going, I touched on several similar themes in my talk. Ideas and visions of the labor relationship between the archivist of the future and the algorithms, scripts and tools that work for her and do her bidding.

The welcoming of Robot overlords

We get to wear the robots!

This LEGO mecha exo-suit is the vision I think we want for the future of digital tools in archives. Here, this mechanized power armor gives the archivist superpowers. Forget lifting a 30 lb box; in this suit you could move whole collections with ease. But that’s beside the point. This kind of power tool lets you do a lot of the laborious parts of the work and get back more quickly to the intellectual labors.

So we don’t want the dark vision of the robot master. We certainly don’t want the machines to turn us into the Borg or Cybermen, who lose their souls as they are taken over by the emotionless machine.

My vision for the future of the archivist using digital tools is less Borg and more Exo-suit.

The idea of mecha or exo-suits illustrates a vision of technology that extends the capabilities of its user. That is, the kinds of tools I think we need going forward are exactly the sort of thing that Maureen was talking about: things that let us automate a range of processes and actions.

We need tools that let us quickly work across massive amounts of items and objects by extending and amplifying the seasoned judgment, ethics, wisdom, and expertise of the archivist-in-the-machine.

Fondz as a Tool Thought Experiment for Automation

I was recently working with some archivists who had a project where they had nearly 400 floppy disks containing drafts of letters, books, essays, etc. In short, digital copies of all the kinds of things you find in a collection of someone’s personal papers. I hope to write about that project in more detail in the future, but for now I just wanted to talk a little about a tool that got cooked up in the process. So, what can you do with some 19,000 documents like this? Now, you can learn a ton about a set of digital files by extracting and identifying them in automated processes. That is, what kinds of files they are, their file names, size, etc. It’s really useful data! However, in most cases, this is not at all the data that a researcher or other user who might work with the collection would want. Inevitably, users want to know where information related to x, y, or z is in a collection. That is, users care about topics and subjects, and the kinds of tools most of us have at hand don’t really do much with that.

Here you can see some of the very basic kinds of information that are relatively easy to get at with existing tools: numbers of files, their sizes and their formats. This image shows that the files processed and presented by Fondz in a particular test set come from 379 bags (in this case each bag contains a logical disk image). Collectively this includes 18,414 files in 49 formats.
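To make that kind of basic inventory concrete, here is a minimal sketch in Python. It is not how Fondz or any format identification tool works; it assumes the files have already been extracted into a directory, and it guesses formats from file extensions only, which is much cruder than tools that inspect file contents. The function name `inventory` is made up for illustration.

```python
import mimetypes
import os
from collections import Counter


def inventory(root):
    """Walk a directory tree and tally file count, total bytes, and guessed formats."""
    formats = Counter()
    total_files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            total_files += 1
            total_bytes += os.path.getsize(path)
            # Assumption: guess from the extension only. Real identification
            # tools look inside the file, which this sketch does not do.
            guessed, _encoding = mimetypes.guess_type(name)
            formats[guessed or "unknown"] += 1
    return {"files": total_files, "bytes": total_bytes, "formats": formats}
```

Calling `inventory` on a folder of extracted disk contents gives you the counts and a tally of formats, which is useful, but it still says nothing about what the documents are about.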

To this end, I asked my colleague Ed Summers a while back if it would be possible to strip out all the text from these documents, topic model it, and then use the topic models as an interface to the documents. In response, he cooked up a tool called Fondz.

For those unfamiliar, the MAchine Learning for LanguagE Toolkit (MALLET) describes topic modeling as follows. “Topic models provide a simple way to analyze large volumes of unlabeled text. A “topic” consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings.” In this case a tool like MALLET can quickly look across a large collection of texts and identify topical clusters of terms that appear near each other.
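If it helps to see the idea in miniature, below is a toy sketch of one common way to fit a topic model (a collapsed Gibbs sampler for LDA) in plain Python. This is not MALLET's code and it leaves out nearly everything that makes MALLET good (stop-word handling, hyperparameter tuning, speed). The function name and the default values for `alpha`, `beta` and `iterations` are my own assumptions for illustration.

```python
import random
from collections import Counter


def toy_topic_model(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA over already-tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total tokens per topic
    assignments = []
    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        row = []
        for word in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(row)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # A topic is likelier if this document already uses it and
                # if this word already shows up in it.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    # Report up to five of the most frequent words in each topic.
    top_terms = [
        [word for word, count in topic_word[t].most_common(5) if count > 0]
        for t in range(num_topics)
    ]
    return top_terms, doc_topic
```

You hand it lists of words, say how many topics you want, and get back a handful of terms per topic plus per-document topic counts. Nothing in there knows what any of the words mean.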

How Edsu describes Fondz on GitHub.

I really like how Ed describes Fondz, so I’ll share it here.

fondz is a command line tool for auto-generating an “archival description” for a set of born digital content found in a bag or series of bags. The name fondz was borrowed from a humorous take on the archival principle of provenance or respect des fonds. fondz works best if you point it at a collection of content that has some thematic unity, such as a collection associated with an individual, family or organization.

Example of the Fondz topic driven interface to documents in an archival collection

Above, you can see an example of Fondz in use. This is a list of the topics that MALLET identified. In each case you see the number of documents associated with the topic on the left, and in the blue box you see the terms which MALLET has identified as being associated with that topic. The first one, with 776 documents, ends up being a cluster of versions of biographical notes and CVs. The third one, with 309 documents, is materials related to a novel and a film adaptation of that novel. MALLET doesn’t know what those topics are. It just sees clusters of terms. Based on my knowledge of the collection, I’m able to identify and name those clusters.
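Here is a rough sketch of how a listing like that could be put together, assuming you already have per-document topic weights from a topic model. I don't know exactly how Fondz assigns documents to topics; filing each document under its single heaviest topic is my simplifying assumption, and the function names, file names and weights are all made up for illustration.

```python
from collections import defaultdict


def build_topic_index(doc_topic_weights):
    """Group file names under the topic that carries the most weight in each file."""
    index = defaultdict(list)
    for filename, weights in doc_topic_weights.items():
        dominant = max(range(len(weights)), key=lambda t: weights[t])
        index[dominant].append(filename)
    return dict(index)


def describe(index, topic_terms, labels=None):
    """Render one line per topic: document count, optional human label, key terms."""
    labels = labels or {}
    lines = []
    for topic, files in sorted(index.items(), key=lambda item: -len(item[1])):
        name = labels.get(topic, "unnamed topic {}".format(topic))
        lines.append("{:>5}  {}: {}".format(len(files), name, " ".join(topic_terms[topic])))
    return lines


# Hypothetical weights for three files across two topics.
weights = {
    "cv_1994.txt": [0.9, 0.1],
    "cv_1996.txt": [0.8, 0.2],
    "novel_ch1.txt": [0.2, 0.8],
}
index = build_topic_index(weights)
# The label is supplied by a person who knows the collection, not by the model.
listing = describe(index, [["curriculum", "vitae"], ["chapter", "film"]],
                   labels={0: "Biographical notes and CVs"})
```

The `labels` argument is the important part: the counts and terms come from the machine, and the name comes from the archivist.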

The result of all this is a topical point of entry to explore 19,000 digital files from hundreds of floppies. It would work just as well for OCR’ed text from recent typed and printed documents. I can’t show it to you in action because I don’t have a test collection that I can broadly share. (Note: anyone who has a similar collection they can broadly share, contact me about it.) But take my word for it. You click on one of those topics and you see a list of all the files that are associated with it, and if you click on the name of one of those files you end up seeing an HTML representation of all the text inside that file.

Alongside this, a future idea would be to integrate tools that do things like Named Entity Recognition (NER) to identify strings of text that look like names of people and places. Indeed, there are already attempts to use NER for disambiguation in cultural heritage collections. What is particularly important here is not that we build tools that do this “right” but that we find and use tools that make things that are “good enough” in that they are useful in helping people explore and find things in collections. This isn’t about robots just doing all the work. It’s about extending and amplifying our ability to make materials available to users in ways that help them “get to the stuff.”

Aside from that, there is a need to provide users with information on what actions were performed on the collection to make it available. To that end, it’s exciting to realize that we can simply document what tools were used so that anyone can explore the potential biases of those tools in how they create interfaces to collection data.
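Documenting which tools were used can be as simple as keeping a small, publishable log. The sketch below is one hypothetical way to do that in Python; it is not something Fondz does as far as I know, and the field names, version strings and the `num_topics` value are placeholders I made up.

```python
import json
from datetime import datetime, timezone


def record_step(log, tool, version, parameters, note=""):
    """Append one processing action to the log and return the log."""
    log.append({
        "tool": tool,
        "version": version,
        "parameters": parameters,
        "note": note,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    })
    return log


# Hypothetical entries; the versions and parameters are placeholders.
processing_log = []
record_step(processing_log, "MALLET", "version not recorded", {"num_topics": 20},
            note="Topic model used to build the topical points of entry.")
record_step(processing_log, "fondz", "version not recorded", {},
            note="Generated the HTML description from the bags.")

# Serialize so the log can be published next to the collection description.
log_json = json.dumps(processing_log, indent=2)
```

A file like this, sitting next to the description, gives a researcher somewhere to start when asking why the interface looks the way it does.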

So what does this all have to do with cyborgs and mecha? What is in some ways most interesting to me about topic modeling is that the topics themselves are actually somewhat arbitrary and meaningless. A topic in MALLET isn’t so much a topic in regular parlance as it is just a cluster of words that tend to appear together. It takes someone who knows the texts to make sense of those topics, to fiddle with the dials till they get topics that seem to hang together right (in MALLET you pick how many topics you want it to look for). So Fondz will be far more useful when it integrates processes for archivists to exercise their expertise and their judgment and intervene: when they can name the topics and describe them, when they can accept or reject some of the topics, when they can rerun them.

Since the goal here is to make useful descriptions, there is a potential for topic modeling to be used instrumentally to surface connections for an archivist to find useful or not useful, and to save the useful ones and describe them. Given that good processing is done with a shovel, not with tweezers, it is exciting to think about how tools like Fondz could integrate a range of techniques for computational analysis of the content of files to act as steam shovels: instruments that put the archivist in the driver’s seat to explore and work through relationships in collection materials and expose those to users.

There are a bunch of other clever things that Ed is doing with Fondz that warrant further discussion, but for the purpose of this post that does it. As far as take-away messages go, I’d suggest the following. The future of digital tools for digital archives is not about tools that “just work.” It’s not about replacing the work of archivists with automated processes; it’s about amplifying and extending the capabilities of an archivist to do clever things with somewhat blunt instruments (like topic modeling, NER, etc.) that make it easier for us to make materials accessible. Given that the nature of digital objects is a multiplicity of orders and arrangements, if we can generate a range of relatively quick and dirty points of entry to materials, we can invest more time and energy in making sure that when someone gets down to the item they have breadcrumbs and information that situates and contextualizes the item in its collection and its custodial history. We need archival-mecha, tools that give archivists superpowers by amplifying their judgment, wisdom, knowledge, ethics and expertise in working with digital materials. We need to make sure we are getting the computers to do what computers do best in supporting the praxis of archival practice.


Posted in Uncategorized | 4 Comments

Einstein as Science Santa: Monumental Meanings & Wil Wheaton

Recently, Wil Wheaton posted a picture and quote on Twitter and his blog (That time I met Albert Einstein) making use of the Albert Einstein Memorial at the National Academy of Sciences. It’s great: he is sitting on Einstein’s lap, making requests to Einstein as a kind of physics Santa. I really love how the post, and all the likes and favorites it has gotten, reinforces a set of points I made about the memorial in my essay Tripadvisor rates Einstein: using the social web to unpack the public meanings of a cultural heritage site.

I love the way this photo fits with the informal and playful ways that other photos of the monument work. Here is a bit of what I wrote in the piece about some photos of the monument on Flickr. The images are from Flickr, and the quotes are from Yelp reviews of the memorial.

Most monuments in the area establish a kind of formality between visitors and the monument. Many are constructed to physically remove the subject from the reach of visitors. Others, like the nearby Lincoln Memorial, establish this formality through written rules about respectful behaviour, and a request for hushed voices. Nearly all of the reviews (17 of 21) focus on elements of the informality of the monument as a key component of what makes it enjoyable. The reviewers tell us to “climb all over ‘Al’” or as another suggests “sit on his lap, or kiss his cheek”. On Flickr, photographers have captured this in images of visitors picking and rubbing his nose, kissing him, or in a few cases arguing with him. While there are no posted notices which suggest that it is ok to climb him, if you stop by the monument on any summer day you will witness a queue of visitors waiting to climb up on him and have their picture taken.

An example of how groups of tourists use the memorial to stage group photos

The pictures are themselves an important element in this experience. The image above provides an example of one of the most popular kinds of images of the memorial posted on Flickr. As one reviewer notes, “everyone needs at least one picture of themselves sitting on “Al’s” lap”. As you can see from the photograph, the scale and size of the monument makes it work as a space for staging photos. The monument is so photogenic that one reviewer suggests that it “just begs you to go sit on Uncle Al’s lap and get our picture taken”. For these reviewers a central part of the experience is the informality that the monument provides. It invites them to climb him, and leave with photographic evidence of them sitting on the world’s most instantly recognisable scientist. While everyone has photos of themselves standing in front of the Lincoln Memorial these reviewers believe “Your tour of the Mall is not complete” without having your picture taken on Einstein’s lap.

Photo: Schmidt, C., 2008. Arguing with Einstein, Available at:

It is worth taking a moment to reflect on how some of the previous quotes refer to Einstein. The informality of these experiences is further communicated through a persistent use of his first name, or in some cases the diminutive form of his name, Al. This is itself a frequent component of these reviews. In using his first name, or calling him ‘Al’, the reviewers are communicating and playing with the informality of the memorial. The pervasiveness of this informality may be best evidenced in the recollections of a college student from a nearby university who ‘spent a lot of time just hanging out with ‘Al’’. The informality of the space and the fact that it is climbable leads many reviewers to discuss how it is a perfect place to bring kids. Many of the photos of the monument on Flickr show young children climbing all over him.

This level of informality is not something that all the reviewers think is necessarily a good thing. One reviewer suggests “most of the neat stuff was totally ignored by all the kids using the statue as a playground”. This reviewer goes on to suggest that the other elements in the composition of the statue, the quotations, and the map of the stars at his feet go unnoticed. From his perspective, visitors were “just jumping around”. He felt that “no one learned or read about the man memorialised”. This reviewer further suggests that it is ‘disrespectful’ to climb all over the monument, particularly when there is no clear indication that touching or climbing the memorial is officially sanctioned by the sculptor or the National Academy of Sciences. There is definitely credence to the questions the reviewer raises. To what extent are these visitors leaving with an understanding of the intentions behind the memorial? Certainly some visitors’ suggestions that “You can climb on the damn thing and stick pennies up his nose” take on a disrespectful tone. However, that is itself an interesting point of tension in the idea of Einstein. The more recently constructed Franklin Delano Roosevelt Memorial, which is built on a scale that would allow one to climb on him, does not invite the same kind of interaction. Popular notions of Einstein as an informal figure have translated into how people interact with the memorial. The relaxed experience Berks found in sculpting the memorial from life is very directly translated into visitors’ comments about the informality and relaxing nature of the experience of the monument.

This is just a way of saying that those of us interested in public memory and the role of memorials really need to be watching the ways that people make use and sense of them on social media. At this point, our experiences of these spaces are increasingly going to be seen through the lens of the tweets, reviews, and photos that others have taken, shared and commented on.

Posted in Uncategorized | Leave a comment