Defining Data for Humanists: Text, Artifact, Information or Evidence?

Fred and I got some fantastic comments on our Hermeneutics of Data and Historical Writing paper through the Writing History in the Digital Age open peer review. We are currently working on revising the manuscript. At this point I have worked on a range of book chapters and articles, and I can say that doing this chapter has been a real pleasure. I thought the open review process went great, and working with a coauthor has also been great. Neither happens that much in the humanities. I think the work is much stronger because Fred and I pooled our forces to put this together. Now, one of the comments we got sent me off on another tangent, one that is too big to shoehorn into the revised paper.

On the Relationship Between Data and Evidence

We were asked to clarify what we saw as the difference between data and evidence. We will help to clarify this in the paper, but it has also sparked a much longer conversation in my mind that I wanted to share here and invite comments on. As I said, this is too big of a can of worms to fit into that paper, but I wanted to take a few moments to sketch this out and see what others think about it.

What Is Data to a Humanist?

I think we have a few different ways to think about what data actually is to a humanist. Thinking about this, and being reflexive about what we do with data, is a really important thing to engage in, so here is my first pass at some tools for thought about data for humanists. First, as constructed things, data are a species of artifact. Second, as authored objects created for particular audiences, data can be interpreted as texts. Third, as computer-processable information, data can be computed in a whole host of ways to generate novel artifacts and texts which are themselves open to interpretation and analysis. This gets us to evidence. Each of these approaches (data as text, as artifact, and as processable information) allows one to produce or uncover evidence that can support particular claims and arguments. I would suggest that data is not a kind of evidence but is a thing in which evidence can be found.

Data are Constructed Artifacts

Data is always manufactured. It is created. More specifically, data sets are always, at least indirectly, created by people. In this sense, the idea of “raw data” is a bit misleading. The production of a data set requires a set of assumptions about what is to be collected, how it is to be collected, how it is to be encoded. Each of those decisions is itself of potential interest for analysis.

In the sciences, there are some agreed-upon stances on what assumptions are OK, and given those assumptions a set of statistical tests exists to help ensure the validity of interpretations. These kinds of statistical instruments are also great tools for humanists to use. However, they are not the only way to look at data. For example, most of the statistics one is likely to learn have to do with attempting to make generalizations from a sample of things to a bigger population. Now, if you don’t want to generalize, if you instead want to get into the gritty details of a particular individual set of data, you probably shouldn’t use statistical tests that are intended to see if trends in a sample are trends in some larger population.

Data are Interpretable Texts

As a species of human-made artifact, we can think of datasets as having the characteristics of texts. Data is created for an audience. Humanists can, and should, interpret data as an authored work, and the intentions of the author are worth consideration and exploration. At the same time, the audience of data is also relevant. It is worth thinking about how a given set of data is actually used and understood, and how it is interpreted by the audiences it makes its way to. That could well include audiences of other scientists, the general public, government officials, etc. In light of this, one can take a reader-response theory approach to data.

Data are Processable Information

Data can be processed by computers. We can visualize it. We can manipulate it. We can pivot and change our perspective on it. Doing so can help us see things differently. You can process data in a stats package like R to run a range of statistical tests, or you can do like Mark Sample and use N+7 on a text, swapping each noun for the seventh noun that follows it in a dictionary. In both cases, you are processing information, whether numerical or textual, to change your frame for understanding a particular set of data.
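To make the N+7 idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration and not Mark Sample's actual code: the function name `n_plus_k`, the tiny `TOY_NOUNS` list, and the choice to wrap around at the end of the list are all my own assumptions. A real run would use a full dictionary of nouns.

```python
import bisect
import re


def n_plus_k(text, nouns, k=7):
    """Replace every word found in `nouns` with the noun k places after it.

    `nouns` stands in for a dictionary's noun list. Wrapping around at the
    end of the list is an assumption made so the toy list below works.
    """
    ordered = sorted(set(n.lower() for n in nouns))

    def shift(match):
        word = match.group(0)
        key = word.lower()
        i = bisect.bisect_left(ordered, key)
        if i == len(ordered) or ordered[i] != key:
            return word  # not a known noun, so leave it untouched
        replacement = ordered[(i + k) % len(ordered)]
        # Keep the capitalization of the word being replaced.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", shift, text)


# Hypothetical stand-in for a real dictionary of nouns.
TOY_NOUNS = [
    "archive", "artifact", "audience", "author", "book", "claim", "data",
    "evidence", "graph", "map", "reader", "text", "tree",
]

print(n_plus_k("Data is a text", TOY_NOUNS))
```

The result depends entirely on which noun list you hand it, which is sort of the point: the choice of dictionary is itself one of those interpretive decisions.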

Data can Hold Evidentiary Value

As a species of human artifact, as a cultural object, as a kind of text, and as processable information, data is open to a range of hermeneutic processes of interpretation. In much the same way that encoding a text is an interpretive act, creating, manipulating, transferring, exploring and otherwise making use of a data set is also an interpretive act. In this case, data as an artifact or a text can be thought of as having the same potential evidentiary value as any other kind of artifact. That is, analysis, interpretation, exploration and engagement with data can allow one to uncover information, facts, figures, perspectives, meanings, and traces which can be deployed as evidence to support all manner of claims and arguments. I would suggest that data is not a kind of evidence; it is a potential source of information which could hold evidentiary value.

Children's Books By The Numbers: Or Two Things I Learned From Franco Moretti

A few weeks ago I had the pleasure of reading Franco Moretti’s Graphs, Maps, Trees. If you haven’t read it, I highly recommend it as a truly compelling exploration of what individuals interested in the history of literature can glean by counting. After a bit of thought I am confident that some of his approaches will be quite useful in framing our understanding of children’s nonfiction.

As previously mentioned, my project began with an anomaly in the numbers. There are more children’s books about Marie Curie than about any other scientist. As a start to quantifying the history of science literature for children, I thought it would be worth sorting out who the popular stars are in comparison to the major players in biographies of scientists written for a more mature audience.

For a rough start, I did some quick searches on WorldCat for juvenile and non-juvenile biographies about a laundry list of popular scientists and inventors and dumped the data at Swivel.

Number of Children's Books About Different Scientists and Inventors

It appears that the same trend for gender in science is mirrored in race in invention. Curie is the most written about scientist for children, and George Washington Carver is the most written about inventor. But when we look at the list of books for an older audience, they fall far out of their top positions. What are we to do with this? The second thing I took away from Moretti is his insistence that we should be actively looking for questions we have no answer for. While this is essentially the same question I started my undergraduate thesis with, I don’t really feel I am any more qualified to answer it.

Number of Biographies of Scientists and Inventors Written For An Adult Audience

I have a few ideas, but I need to spend a bit more time fleshing them out. Stay tuned for more. In the meantime, what do you think could explain this phenomenon? In the next few weeks I will post some of my thoughts on this and hopefully pull together some more robust numbers about these books. I am working on a way to export a CSV file from my Zotero collection that should help me isolate when Curie and Carver became the most written about scientist and inventor for kids.

But in the meantime, why is there such a large market for children’s books about Carver and Curie, and why does that market dry up when those children grow up?
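For the curious, here is a rough sketch of how the CSV tally mentioned above might go once the export exists. It assumes the file has "Title" and "Publication Year" columns and that a book's subject can be guessed from its title, which is crude. The function name `books_per_decade` is my own, and the sample rows are invented placeholders, not real catalog records.

```python
import csv
import io
from collections import Counter, defaultdict


def books_per_decade(csv_file, subjects,
                     year_col="Publication Year", title_col="Title"):
    """Count books per decade for each subject named in a title.

    The column names are assumptions about the export. Rows without a
    plain numeric year are skipped rather than guessed at.
    """
    tallies = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        year = (row.get(year_col) or "").strip()
        if not year.isdigit():
            continue
        decade = int(year) // 10 * 10
        title = (row.get(title_col) or "").lower()
        for subject in subjects:
            if subject.lower() in title:
                tallies[subject][decade] += 1
    return tallies


# Invented placeholder rows, only here to show the shape of the input.
SAMPLE = """Title,Publication Year
Made-up Curie title A,1938
Made-up Curie title B,1931
Made-up Carver title A,1944
Made-up Einstein title A,1952
Made-up title with no year,
"""

tallies = books_per_decade(io.StringIO(SAMPLE), ["Curie", "Carver"])
for subject, by_decade in sorted(tallies.items()):
    print(subject, sorted(by_decade.items()))
```

Comparing the per-decade counts for each person would be one way to see roughly when Curie and Carver pull ahead. With a real export you would pass an opened file in place of the `io.StringIO` sample.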