Defining Data for Humanists: Text, Artifact, Information or Evidence?

Fred and I got some fantastic comments on our Hermeneutics of Data and Historical Writing paper through the Writing History in the Digital Age open peer review. We are currently working on revising the manuscript. At this point I have worked on a range of book chapters and articles, and I can say that doing this chapter has been a real pleasure. I thought the open review process went great, and working with a coauthor has also been great. Both are things that don’t happen that much in the humanities. I think the work is much stronger for Fred and me having pooled our forces to put this together. Now, one of the comments we got sent me off on another tangent, one that is too big a thing to shoehorn into the revised paper.

On the Relationship Between Data and Evidence

We were asked to clarify what we saw as the difference between data and evidence. We will help to clarify this in the paper, but it has also sparked a much longer conversation in my mind that I wanted to share here and invite comments on. As I said, this is too big of a can of worms to fit into that paper, but I wanted to take a few moments to sketch this out and see what others think about it.

What Is Data to a Humanist?

I think we have a few different ways to think about what data actually is to a humanist. I feel like thinking about this, and being reflexive about what we do with data, is a really important thing to engage in. Here is my first pass at some tools for thought about data for humanists. First, as constructed things, data are a species of artifact. Second, as authored objects created for particular audiences, data can be interpreted as texts. Third, as computer-processable information, data can be computed in a whole host of ways to generate novel artifacts and texts which are themselves open to interpretation and analysis. This gets us to evidence. Each of these approaches (data as text, as artifact, and as processable information) allows one to produce or uncover evidence that can support particular claims and arguments. I would suggest that data is not a kind of evidence but is a thing in which evidence can be found.

Data are Constructed Artifacts

Data is always manufactured. It is created. More specifically, data sets are always, at least indirectly, created by people. In this sense, the idea of “raw data” is a bit misleading. The production of a data set requires a set of assumptions about what is to be collected, how it is to be collected, and how it is to be encoded. Each of those decisions is itself of potential interest for analysis.

In the sciences, there are some agreed upon stances on what assumptions are OK and given those assumptions a set of statistical tests exist for helping ensure the validity of interpretations. These kinds of statistical instruments are also great tools for humanists to use. However, they are not the only way to look at data. For example, most of the statistics one is likely to learn have to do with attempting to make generalizations from a sample of things to a bigger population. Now, if you don’t want to generalize, if you want to instead get into the gritty details of a particular individual set of data, you probably shouldn’t use statistical tests that are intended to see if trends in a sample are trends in some larger population.

Data are Interpretable Texts

As a species of human-made artifact, we can think of datasets as having the characteristics of texts. Data is created for an audience. Humanists can, and should, interpret data as an authored work, and the intentions of the author are worth consideration and exploration. At the same time, the audience of data is also relevant. It is worth thinking about how a given set of data is actually used and understood, and how it is interpreted by the audiences it makes its way to. That could well include audiences of other scientists, the general public, government officials, etc. In light of this, one can take a reader-response theory approach to data.

Data are Processable Information

Data can be processed by computers. We can visualize it. We can manipulate it. We can pivot and change our perspective on it. Doing so can help us see things differently. You can process data in a stats package like R to run a range of statistical tests, or you can do as Mark Sample did and use N+7 on a text. In both cases, you process information, whether numerical or textual, to change your frame for understanding a particular set of data.
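To make the N+7 idea a bit more concrete, here is a minimal sketch in Python. N+7 swaps each noun in a text for the noun seven entries later in a dictionary. Everything named here is an assumption for illustration: the tiny NOUNS list stands in for a real dictionary, the n_plus function is hypothetical, and a word counts as a noun only if it appears in that list. This is not Mark Sample's own tool.

```python
import re

# Hypothetical, tiny noun list standing in for a real dictionary.
# Sorting it gives the alphabetical order that N+7 counts through.
NOUNS = sorted([
    "archive", "artifact", "audience", "author", "claim", "computer",
    "data", "evidence", "historian", "humanist", "information", "text",
])


def n_plus(text, nouns=NOUNS, offset=7):
    """Replace each listed noun with the noun `offset` entries later.

    The count wraps around at the end of the list, which is an
    assumption made here so that a short list still works.
    """
    def shift(match):
        word = match.group(0)
        key = word.lower()
        if key not in nouns:
            return word  # leave anything that is not a listed noun alone
        replacement = nouns[(nouns.index(key) + offset) % len(nouns)]
        # Keep the capitalization of the word being replaced.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", shift, text)


print(n_plus("Data is a text"))
```

With this particular list, "Data is a text" comes out as "Artifact is a data". The output depends entirely on which dictionary you pick, which is itself a reminder that even a playful transformation like this rests on choices someone made.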

Data can Hold Evidentiary Value

As a species of human artifact, as a cultural object, as a kind of text, and as processable information, data is open to a range of hermeneutic processes of interpretation. In much the same way that encoding a text is an interpretive act, creating, manipulating, transferring, exploring and otherwise making use of a data set is also an interpretive act. In this case, data as an artifact or a text can be thought of as having the same potential evidentiary value as any other kind of artifact. That is, analysis, interpretation, exploration and engagement with data can allow one to uncover information, facts, figures, perspectives, meanings, and traces which can be deployed as evidence to support all manner of claims and arguments. I would suggest that data is not a kind of evidence; it is a potential source of information which could hold evidentiary value.

Techies You Decide! You’re either a Feminist or a Misogynist

I got caught up reading Margaret Robertson’s great post today, In Which I don’t try to write like a man. She describes how she has censored herself, and how she has frequently gone out of her way to de-gender herself in her writing on games.

Here is a particularly good quote:

It’s taken me a while to recognise that a big part of why I don’t post things like this is because I’m *scared*. Actually scared. Actually worried that I’ll terminally undermine my credibility. And that’s because the degree of abuse you can attract is of a different order from the generality of internet rough-and-tumble

This depressed me. This feeling of depression took me back to reading Skud’s post, On being Harassed. (Seriously, if you haven’t read Skud’s post go read it now, and some of the links.)

See, I work on open source, but I work on it in libraries and the digital humanities. I also do things with games, but it’s humanities research. In both cases, I end up spending my time on the web hanging with feminists like myself. In general, I think folks in the digital humanities respond rather well to issues around gender and technology. For example, I think the What Do Girls Dig conversation that Bethany kicked off was really productive. Heck, it became a book chapter. With that said, we are working on it. I think DH folks do a rather good job of realizing that conversations about technology come pre-loaded with gender problems.

If you read Robertson’s post, and the comments, and Skud’s post, I think this becomes rather self-evident. You are either an avowed feminist or you are a witting or unwitting misogynist. I just wanted to make clear where I stand, and invite anyone else who wants to make this clear to say so as well.

Misogyny or Feminism: The Choice is Yours

But I’m an equalist!!!!111!! No, you’re not. If you are an equalist you are a feminist. The situation is as follows. Society is normative. Society is anti-feminist. That is just how power works. You can choose to recognize this. If you do, the result is that you need to think very carefully about what you are going to do to make sure that your actions don’t further exacerbate the problem. Otherwise you can accept that you are an unwitting accomplice in perpetuating the status quo. Seriously, go read about some of the psychological research on stereotype threat. (For those unaware of stereotype threat research, the gist is that you can quantify the effects of gender and race stereotypes on academic achievement on tests.)

This is Not Novel But It Needs to be Restated

The purpose of this post is not to make a new or novel point. I make no claim to be breaking new ground. I just think we need more people in tech, more men in particular, who will explicitly and unambiguously state that they are feminists. There are plenty of people out there waiting to shout women down, and the more people willing to clearly state that this is a problem, the better off we will all be.

This is not a women’s issue. I want to live in a more just society. That is why I am a feminist. If you want to live in a more just society then you’re a feminist too. It upsets me when I am reminded of just how unkind and abrasive the web, technology and gaming communities are to women. I feel rather strongly that the world needs more people in technology, men in particular, who are willing to clearly state that they are feminists. To me that means being someone who is willing to think through and second guess my own actions. It also means that I consciously try to advocate on behalf of women in technology.

So, which side are you on? Remember, you get to choose, but choosing not to choose is also choosing a side.

Ancient Wisdom from the Forums: Failures of Collective Intelligence

A while back, I wrote about how the shame you are supposed to feel when someone uses Let Me Google That For You illustrates how finding answers to your questions on the knowledge base that is the internet has become a distinct literacy. That sort of thing is really an example of how making use of collective intelligence for work and life is becoming something we expect people to be able to do.

I thought this XKCD from a few days back gets at the same idea.

The collective intelligence point is also evident in what you see when you mouse over the comic on XKCD. “All long help threads should have a sticky globally-editable post at the top saying ‘DEAR PEOPLE FROM THE FUTURE: Here’s what we’ve figured out so far …’”

Like the answer is on the tip of our collective tongue

Discussion threads are not simply records of conversations; they are part of the global knowledge base. When we get so close, finding the thread and finding the same question, but can’t find the answer, we experience something a bit like the feeling of having a word on the tip of your tongue. At some other moment in time someone else had this problem, and if someone had just answered it for them it would be answered for me too.