Mining Old News For Fresh Historical Insight

This week I had the honor of participating in the Library of Congress’ national strategy for digital news summit. The Library gathered together a diverse mix of corporate and public archivists, representatives from public and private foundations, and librarians to discuss the digital future of news. The conversations focused on both how to preserve born-digital news and how to archive old news migrating into digital forms. I was glad to have a chance to bring in my perspective as a consumer of that archived news.

I gave a short presentation about some of the ways digitized historical news enables historians to ask different kinds of questions. I think the talk has some implications for both historians and digital archivists, so I thought I would share the gist of the talk here to continue the conversation we started at the meeting.

In my mind this contributes to ongoing discussions about the role that digital tools should play in re-framing conversations about historical methodology. Since the structure of an archive shapes the kinds of questions a historian can ask of it, it’s crucial for historians to be involved in helping shape these archives.

A Use Case for Historical News: Marie Curie Visits America

On May 11, 1921, the world’s most famous female scientist disembarked in New York City after a long Atlantic voyage. For the ten weeks Marie Curie toured the United States, she was greeted as an international celebrity; according to the New York Times, she was the “biggest hit of any celebrity who has come to New York” for quite some time. Curie was welcomed with speeches and fanfare in New York, Washington DC, Pittsburgh and Chicago, gracing major newspapers several times a week. Her visit came less than a year after American women won the right to vote through the 19th Amendment, and she arrived as the only Nobel laureate twice over and the world’s most distinguished woman of science. Last year I decided to explore how different periodicals reported on Curie’s visit. Analysis of that coverage exposes divergent ideas about the place of women in American science, society and work emerging in the early twentieth century. For our purposes, this case also exposes some of the transformational power databases and digital tools present for historical inquiry.

Asking A Database Historical Questions

It took me six seconds to find the 1512 references to Marie Curie in the entire history of the New York Times, the Atlanta Constitution, the LA Times, the Boston Globe, the Washington Post, the Chicago Tribune and the Wall Street Journal. This obviously saved me a ton of time, but the implications of the search run much deeper. Reading the entire history of these publications for mentions of Curie would not only be impractical, it would be impossible.

If I had wanted to explore press coverage of Curie in the pre-full-text search world, I would have selected a few key dates when I would expect her to have been mentioned, gone to the library, and rolled out the microfilm. I would have found many of these articles, but the time it takes to find them requires a larger upfront commitment to exactly what I intend to explore, and how I want to explore it. With search I have the ability to quickly get a feel for different questions in different queries while simultaneously uncovering mentions of Curie on editorial pages and in periods I would not have expected to find her mentioned.

Personal Archive Tools Exponentially Increase This Transformative Power

Repositories like ProQuest Historical Newspapers are powerful, and their ability to let users explore connections between items inside their collections shapes the kinds of questions historians can ask about their contents. But that is just the surface of the potential these databases afford. With a tool like Zotero it is possible to aggregate materials from a variety of different sources and mine them in sophisticated ways for historical insights.

After I gathered the relevant items and full-text PDFs from ProQuest, I ran a similar search through Readers’ Guide Retrospective. While Readers’ Guide Retrospective did not offer seamless integration with Zotero, I was able to pull out structured data for hundreds of references, and with a few clicks had submitted inter-library loan requests for full-text scans of the most relevant articles. When I received those PDFs I was able to simply drag and drop them into Zotero to store alongside the data. As I constructed my personal archive I was then able to turn Zotero’s search capabilities on the collection to explore interesting relationships in my data.

Zotero Library

Data fields carry unexpected potential

I created a variety of saved searches from criteria in my research data. Page numbers are included in this data for a specific reason: they are crucial for citation. Beyond that purpose, page numbers also represent an important statement about the objects in my collection. While all of the articles I discovered about Curie are relevant to my analysis, articles on the front page of a newspaper are particularly relevant to questions about how Curie was presented to the public. This field in my database, the page on which each article can be found, was included to help people find the articles in citations, but it, like many other fields in my database, also carries historical significance.
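As a rough illustration of that kind of saved search, the sketch below filters a handful of records by their page field. The records, the field names, and the `is_front_page` helper are hypothetical stand-ins, not Zotero’s actual data model or search interface, and the page check assumes plain numeric page fields.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    publication: str
    pages: str  # page field as it might appear in a citation, e.g. "1" or "1, 4"


def is_front_page(article: Article) -> bool:
    """Treat an article as front-page if its page field begins on page 1."""
    # Normalize ranges like "11-12" to "11,12", then take the first page listed.
    first = article.pages.replace("-", ",").split(",")[0].strip()
    return first == "1"


# Made-up sample records for illustration only (not the actual Curie collection)
articles = [
    Article("Sample headline A", "Paper One", "1"),
    Article("Sample headline B", "Paper Two", "14"),
    Article("Sample headline C", "Paper One", "1, 4"),
    Article("Sample headline D", "Paper Three", "11-12"),
]

front_page = [a for a in articles if is_front_page(a)]
```

Sectioned page labels such as "A1" would need their own rule; this sketch deliberately ignores them.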


Facets of that significance can expose historical insight

Once I had isolated the front-page stories about Curie I had the opportunity to further explore this subset of thirty or so articles. Zotero’s ability to display the collection in a timeline allowed me to quickly visualize the chronology of Curie’s appearances on the front page. From there I could use the “highlight” function to further explore the data. Based on my experience with discussions of Curie’s visit to America, I decided to highlight the mention of cancer in titles. Finding the word in a plurality of the front-page stories leads to a particular historical insight.

Marie Curie’s contributions to science are impressive, but the connection between her work and a cure or treatment for cancer is tenuous. While the word cancer does not appear in any significant fashion across the hundreds of article titles about her visit, it does show up in a significantly larger portion of the front-page story titles. This provides tentative support for the notion that Curie’s work and importance were misrepresented in feminine terms, framing her in the feminine role of healer instead of the masculine role of scientist.
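The comparison behind that observation can be sketched in a few lines: compute the share of titles containing a keyword for the whole collection and for the front-page subset. The titles and page numbers below are invented purely for illustration and are not the Curie data; `keyword_share` is a hypothetical helper, not a Zotero feature.

```python
def keyword_share(titles, keyword):
    """Return the fraction of titles containing keyword (case-insensitive)."""
    if not titles:
        return 0.0
    hits = sum(1 for t in titles if keyword.lower() in t.lower())
    return hits / len(titles)


# Invented (title, page) pairs for illustration only
records = [
    ("Sample story mentioning cancer", 1),
    ("Sample story about a ceremony", 1),
    ("Another sample story on cancer", 1),
    ("Sample story about a degree", 7),
    ("Sample story about a tour", 12),
    ("Sample story about a luncheon", 9),
]

all_titles = [title for title, _ in records]
front_titles = [title for title, page in records if page == 1]

share_all = keyword_share(all_titles, "cancer")
share_front = keyword_share(front_titles, "cancer")
print(f"all titles: {share_all:.2f}, front page: {share_front:.2f}")
```

With a real collection, the interesting number is the gap between the two shares. A sample of thirty or so front-page stories is small, so any such gap is suggestive rather than conclusive.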


Implications for history and digital archives

Implications for historical methods: While it is indeed possible to count these things out without these sorts of tools, the ease with which I was able to mine a large set of documents for relevant information and historical insight has important ramifications. As far as I am concerned, the only way that historians can overcome the problem of abundance of historical materials is to begin using tools for data analysis that allow for “distant readings” of texts. This can only be accomplished if some larger issues are addressed in the creation and digitization of historical records and texts.

Implications for historical archives and databases: Exposing full text and coherent metadata is essential; building fancy repository-specific visualizations and manipulations is extravagant. What is going to matter to historians of the future is the ability to take your data, dump it into a tool like Zotero, and use any number of analytical tools to explore that data in relation to information from other repositories. In that light, any fancy encoding and detailed levels of information you work into your resources are of limited use if they are not carried across into other spaces. We are not going to solve the problem of abundance by digging deeply into small sets of documents encoded in TEI; we’re going to get there with the metadata we have, dirty OCR and the emerging universe of entity extraction.

Sunrise on Methodology and Radical Transparency of Sources in Historical Writing

Earlier this week Tom Scheinfeldt of Found History suggested that the historical profession could well be moving in a new direction. For quite some time historians have been concerned with questions of ideology, arguing about which historical -isms are best for a given task. Tom suggests that new media tools (like text mining) challenge historians to consider methodological questions anew.

I think there is a great example of one of these new methodological conversations emerging in the way we work with source material. Consider historian Jeremi Suri’s article in this month’s Wired magazine, a brief four-page adaptation of a paper he coauthored with political scientist Scott Sagan. Beyond its being a bit pithier and coming with hip two-tone images of Nixon, I would imagine that most historians would suspect that the brief Wired article is simply derivative of the original 33-page article published in International Security. But Suri’s article in Wired gives the historian something very valuable that the original paper does not.

When you read the Wired article online you are only a click away from scans of many of the declassified primary sources Suri used to develop his argument. This gives the reader a radically transparent view into the source material supporting the case Suri argues. Imagine what this kind of source transparency could do if it became standard practice for historical journals.

As a thought experiment, consider the implications of the David Abraham Affair. When several historians rigorously fact-checked Abraham’s footnotes and turned up a host of inconsistencies, he was drummed out of the historical profession. In his analysis of the incident in That Noble Dream, Peter Novick suggested that Abraham’s sloppiness was not an isolated case, but instead one of the only times a historian’s footnotes were so rigorously fact-checked. This kind of double checking doesn’t happen that often, largely because it is so time-consuming. How many people would retrace a historian’s footsteps through archives scattered around the world to double check each citation? But when checking sources becomes as simple as clicking a link, what do we think will turn up in everyone else’s footnotes?

You might think the linked citations I just mentioned are something that will never happen, or that this kind of change is twenty years out. But just last week JSTOR started to implement new features that bring this kind of linked connection to secondary literature, and <shamelessplug> on a very basic level our work on Zotero’s ability to create smart bibliographies allows authors to put their bibliographies up front for others to quickly grab. Beyond these two projects, however, our plan for the Zotero Commons will facilitate exactly this kind of radical transparency for primary source material in historical scholarship. Through a collaboration with the Internet Archive, any author will be able to stick permanent URIs on their cache of scanned source material, allowing anyone to link out to an author’s primary sources.</shamelessplug>

With the Commons, every professional and amateur historian will be able to end their papers with, “You can find the documents cited in this paper @ Zotero Commons.” So the question is, when it takes 15 seconds instead of 15 hours to fact-check a source, do we think historians will start to write differently, or otherwise change how they do their work?

Why we need to Play History

In the last few years there has been a wealth of interest in games for learning. A growing body of research on the educational value of games underlines the ways they can engage students like no previous media. There are now conferences and journals dedicated to games and learning; the MacArthur Foundation last year granted 50 million dollars to different groups to build educational games; and articles in Nature and Science have explored the potential for games to simulate health emergencies and elicit scientific thinking. In short, there is a lot of interest and excitement about the potential for games. Many of these games are under construction, and many are ready for students and teachers to start playing.

With all the interest and infrastructure that has been invested in games for learning, there is still no comprehensive spot for connecting teachers with these resources, which have now cost foundations and universities hundreds of millions of dollars. Many of these games are rapidly built, tested, and promptly shelved, often never having been played by more than a handful of students. It is clear that there is a need to connect these games with teachers, bringing this bleeding-edge technology and learning theory to the fingertips of teachers around the world through a web community.

Aggregating these games is simply not enough. Teachers are overworked, underpaid and often stretched to the limit. This project’s success is contingent on making it as easy as possible for teachers to find high-quality content related to their immediate needs in only a matter of minutes. By enabling teachers to search for games by time period, historical keyword, educational standard and associated lesson ideas, the tool would make it as easy as possible for teachers to integrate high-quality games and simulations into their daily plans.

As more teachers begin to use the tool it will have the potential to engage other audiences. Several communities have emerged in the last few years as places for independent game developers to share their games with the public. Once Playing History reaches a critical mass of teachers and potential classrooms to play these games, it can become a spot for developers to try building games for the classroom with easy distribution across the world. This has the potential to build a community where these developers respond directly to the needs of practicing teachers, improving the quality and quantity of games available for these purposes.

Once this relationship is cemented it will become a rich resource for educational researchers. Through a separate interface, researchers will be able to track which games are successful at what times and in what parts of the world, giving them further information to inform game design.

There is something tragic in the fact that so much money is being spent to develop so many amazing games and simulations, but those resources are often lost and kept out of the hands of the teachers who could put them directly into use. With a small investment in Playing History we can connect the research and development community with the teaching community and in so doing tremendously benefit both groups.