Computer Scientist: “You can’t do that with Topic Modeling.”
Humanist: “No, I can because I’m not a scientist. We have this thing called Hermeneutics.”
Computer Scientist: “…”
Humanist: “No really, we get to do what we want, we read texts against each other, and then there is this hermeneutic circle grounded in intersubjectivity.”
Computer Scientist: “Ok, but you still can’t make a claim using this as evidence.”
Humanist: “I think we are going to have to agree to disagree here, I think we have different ideas about how evidence works.”
While watching the tweets from the Digital Humanities Topic Modeling meeting a few weeks ago I started to see the above dialog play out. I wasn’t there, and I am not trying to pigeonhole anyone here. I’ve seen this kind of back and forth happen in a range of situations where humanities types start picking up and using algorithmic, computational, and statistical techniques. What does any of this count for? What can you say based on the results of a given technique? One way to resolve this is to say that humanists and scientists should have different rules for what counts as evidence. I am increasingly feeling the need to reject this different-rules approach.
I don’t think the issue here is different ways of knowing, incompatible paradigms, or anything big and lofty like that. I think the issue at the heart of this back and forth dialog is about two different contexts. This is about what you can do in the generative context of discovery vs. what you can do in the context of justifying a set of claims.
Anything goes in the generative world of discovery
If something helps you see something differently then it’s useful. If you stuff a bunch of text into Wordle and see a word so big it catches you by surprise, you can go back to the texts with this different way of thinking and see why that would be the case. If you shove a bunch of text through MALLET and see some strange clumps that make you think differently about the sources and go back to work with them, great. You have used the tool to spark a different way of seeing and thinking.
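To make the Wordle move concrete, here is a minimal sketch of the same exploratory gesture in Python. The sample texts are hypothetical stand-ins for a real corpus; the point is just to count words, glance at the biggest ones, and let anything surprising send you back to the sources.

```python
from collections import Counter

# Hypothetical sample texts standing in for a real corpus
texts = [
    "the whale surfaced near the ship at dawn",
    "the crew watched the whale and the sea",
    "dawn broke over the sea and the quiet ship",
]

# Tokenize crudely and drop a few obvious stopwords
stopwords = {"the", "and", "at", "over", "near"}
words = [w for t in texts for w in t.split() if w not in stopwords]

counts = Counter(words)
# The most frequent words are the ones Wordle would draw biggest
for word, n in counts.most_common(3):
    print(word, n)
```

Nothing here proves anything; it is just a quick way to produce an artifact that might prompt a different way of reading.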
If you aren’t using the results of a digital tool as evidence then anything goes. More specifically, if you aren’t trying to attribute particular inferential value to a particular process, that process is simply producing another artifact which you can then go about considering, exploring, probing, and analyzing. I take this to be one of the key values of the idea of “deformance.” The results of a particular computational or statistical tool don’t need to be treated as facts, but instead can be used as part of an ongoing exploration. With that said, the moment you turn from exploration and theorizing to justifying an interpretation, the whole game changes.
Justification is About Argument and Evidence
If you want to use something as evidence then it is really important that you can back up the quality of that evidence in supporting the specific claims you want to make. In the case of topic modeling, you need to make judgment calls about how many topics to look for, and you make the call about which texts from which sources go into the mix to generate your topics. If you want to talk about these topics as evidence to support particular inferences then you had better be able to justify your reasons for those decisions, or be able to explain what you did with your data to warrant the interpretation you are forwarding. You are also going to need to explain how different decisions about different inputs could have produced different results. (I am mostly going off of the discussion in and around Ben Schmidt’s When you have a MALLET, everything looks like a nail.)
The net result here is that if you want to use the results of something like topic modeling as evidence, you really need a good understanding of exactly what you can and can’t say based on how the tool produced that evidence. Importantly, there are a lot of different roads to go down when you start working with data as evidence, but in any event you need to be able to justify your decisions and defend against alternative explanations. Ultimately, this is where the validity of inferences lives. Validity is always about the quality of the inferences you draw and your ability to defend them against alternative explanations.
It’s the Scientists that Realized they were Humanists
At the heart of this remain some issues around what it means to do the humanities or to do science. (Fred and I got into this a bit in our Hermeneutics of Data essay.) I still hear a persistent fear that people using computational analysis in the humanities will bring about scientism, or positivism. The specter of Cliometrics haunts us. This is completely backwards.
Scientists, at least the sharp ones, have given up on their holy grail. They have given up on the null hypothesis. The sophisticated ones have realized that what they do is really just argument and evidence too. When it comes time for justification, you need to carefully build an argument grounded in evidence and defend it against alternate explanations. If you want a great recent example of this sort of argument and evidence grounded in statistics, I would suggest Nate Silver’s Simple Case for Obama as the Favorite, or, if you want a natural science example, read about this paper on arctic sea ice. Both are great examples of defending against different interpretations of evidence.
What you can get away with depends on what you are doing
When we separate out the context of discovery and exploration from the context of justification, we end up clarifying the terms of our conversation. There is a huge difference between “here is an interesting way of thinking about this” and “this evidence supports this claim.” Both scientists and humanists make both kinds of assertions. In general, I think the fear of the humanities becoming scientific is largely based on an outmoded idea, on the part of humanists, of what actually happens in science. At the end of the day, both are about generating new ideas and then exploring evidence to see to what extent we can justify our interpretations over a range of other potential interpretations.