Discovery and Justification are Different: Notes on Science-ing the Humanities

Computer Scientist: “You can’t do that with Topic Modeling.”

Humanist: “No, I can because I’m not a scientist. We have this thing called Hermeneutics.”

Computer Scientist: “…”

Humanist: “No really, we get to do what we want, we read texts against each other, and then there is this hermeneutic circle grounded in intersubjectivity.”

Computer Scientist: “Ok, but you still can’t make a claim using this as evidence.”

Humanist: “I think we are going to have to agree to disagree here, I think we have different ideas about how evidence works.”


While watching the tweets from the Digital Humanities Topic Modeling meeting a few weeks ago I started to feel the above dialog play out. I wasn’t there, and I am not trying to pigeonhole anyone here. I’ve seen this kind of back and forth happen in a range of different situations where humanities types start picking up and using algorithmic, computational, and statistical techniques. What of all this counts for what? What can you say based on the results of a given technique? One way to resolve this is to say that humanists and scientists should have different rules for what counts as evidence. I am increasingly feeling the need to reject this different-rules approach.

I don’t think the issue here is different ways of knowing, incompatible paradigms, or anything big and lofty like that. I think the issue at the heart of this back and forth dialog is about two different contexts. This is about what you can do in the generative context of discovery vs. what you can do in the context of justifying a set of claims.

Anything goes in the generative world of discovery
If something helps you see something differently then it’s useful. If you stuff a bunch of text into Wordle and see a word really big that catches you by surprise you can go back to the texts with this different way of thinking and see why that would be the case. If you shove a bunch of text through MALLET and see some strange clumps clumping that make you think differently about the sources and go back to work with them, great. You have used the tool to spark a different way of seeing and thinking.

If you aren’t using the results of a digital tool as evidence then anything goes. More specifically, if you aren’t trying to attribute particular inferential value to a particular process, that process is simply producing another artifact which you can then go about considering, exploring, probing and analyzing. I take this to be one of the key values of the idea of “deformance.” The results of a particular computational or statistical tool don’t need to be treated as facts, but instead can be used as part of an ongoing exploration. With this said, the moment you turn from exploration and theorizing to justifying an interpretation the whole game changes.

Justification is About Argument and Evidence
If you want to use something as evidence then it is really important that you can back up the quality of that evidence in supporting the specific claims you want to make. In the case of topic modeling, you need to make judgment calls about how many topics to look for, and you make the call about which texts from which sources go into the mix to generate your topics. If you want to talk about these topics as evidence to support particular inferences then you had better be able to justify your reasons for those decisions, or be able to explain what you did with your data to warrant the interpretation you are forwarding. You are also going to need to explain how different decisions for different inputs could have resulted in different results. (I am mostly going off of the discussion in and around Ben Schmidt’s “When you have a MALLET, everything looks like a nail.”)

The net result here is that if you want to use the results of something like topic modeling as evidence you really need to have a good understanding of exactly what you can and can’t say based on how the tool produced your evidence. Importantly, there are a lot of different roads to go down when you start working with data as evidence, but in any event, you do need to be able to justify your decisions and defend against alternative explanations. Ultimately this is where validity of inferences lives. Validity is always about the quality of the inferences you draw and your ability to defend against alternative explanations.
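To make the "different decisions, different results" point concrete, here is a toy sketch. It is not topic modeling and it is not how MALLET works; it is a plain word count standing in for any tool whose output depends on choices you make up front. The tiny corpus, the two stopword lists, and the function names `top_terms` and `jaccard` are all made up for illustration. The idea is simply to rerun the same process under two defensible settings and measure how much the "findings" overlap.

```python
from collections import Counter


def top_terms(docs, stopwords, n=3):
    """Return the n most frequent non-stopword terms across all docs."""
    counts = Counter(
        word
        for doc in docs
        for word in doc.lower().split()
        if word not in stopwords
    )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def jaccard(a, b):
    """Share of items two lists have in common (1.0 means identical sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


# Hypothetical three-document corpus, invented for this example.
docs = [
    "the whale and the sea",
    "the sea and the ship",
    "the captain and the whale",
]

# Two analyst decisions that could each be defended.
minimal_stopwords = {"the"}
broader_stopwords = {"the", "and"}

terms_minimal = top_terms(docs, minimal_stopwords)
terms_broader = top_terms(docs, broader_stopwords)
overlap = jaccard(terms_minimal, terms_broader)

print(terms_minimal, terms_broader, overlap)
```

If the overlap is low, the result is telling you as much about your settings as about your sources, which is fine for discovery but is exactly what you would have to account for before calling it evidence.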

It’s the Scientists that Realized they were Humanists
At the heart of this remain some issues around what it means to do the humanities or to do science. (Fred and I got into this a bit in our Hermeneutics of Data essay.) I still hear this persistent fear of people using computational analysis in the humanities bringing about scientism, or positivism. The specter of Cliometrics haunts us. This is completely backwards.

Scientists, at least the sharp ones, have given up on their holy grail. They have given up on the null hypothesis. The sophisticated ones have realized that what they do is really just argument and evidence too. When it comes to justification time, you need to carefully build an argument grounded in evidence and defend it against alternate explanations. If you want a great recent example of this sort of argument and evidence grounded in statistics, I would suggest Nate Silver’s “Simple Case for Obama as the Favorite” or, if you want a natural science example, this paper on Arctic sea ice. Both are great examples of defending against different interpretations of evidence.

What you can get away with depends on what you are doing

When we separate out the context of discovery and exploration from the context of justification we end up clarifying the terms of our conversation. There is a huge difference between “here is an interesting way of thinking about this” and “this evidence supports this claim.” Both scientists and humanists make both of these kinds of assertions. In general, I think the fear of the humanities becoming scientific is largely based on an outmoded idea on the part of humanists as to what we have come to understand happens in science. At the end of the day, both are about generating new ideas and then exploring evidence to see to what extent we can justify our interpretations over a range of other potential interpretations.

Apparently When Girls Adopt Technology it Ceases to be Technology

I was excited to read Geek Masculinity and the Myth of the Fake Geek Girl. I saw the image macro at the top, and thought, “neat, another image macro like Successful Black Man that turns stereotypes on their head.” Sadly, this is not the origin of what I came to find is called “Idiot Nerd Girl.”

Reviewing the Idiot Nerd Girl images is a little bit painful. Just another reminder of how far we all have to go. As I’ve suggested before, I think everybody gets to choose if they are a misogynist or a feminist, and clearly these are produced by misogynists.

Setting that aside, I saw this one and just felt compelled to dig into one particular genre of these images. The ones that define what gamers are and are not.

For reference, The Sims is the most successful video game. Ever. Of all time. Do you know why it is successful? There are several reasons. First, it’s amazing. The Sims is, by almost all accounts, an innovative and engaging game. Will Wright has described it as his greatest achievement. The Sims also succeeded where so many games have failed. There are a lot of women who like to play The Sims. Now, there are women who like any and all games. However, in the case of The Sims, there were a lot of women who liked to play it. Importantly, it’s not that men don’t like playing The Sims. There are a lot of men playing The Sims. So it isn’t a game for girls or a game for women; it is more accurately a game that is largely gender neutral in terms of audience.

So a girl likes to play The Sims. This apparently means it’s no longer a game, and she isn’t a gamer. Why? I bet there are a million reasons (it’s not hard enough, or it’s not competitive, etc.) and I know all of them are trash. The Sims doesn’t count as a game (the logic that makes this image work) because a lot of women like it. That’s it. When girls take to technology, in many men’s eyes that technology simply ceases to be technology. That’s the case now at least. It wasn’t always that way.

Science is for Girls and Classics is for Boys

Wait, I got that backwards. Right? We need to get more women involved in science! Yes, we do. But there was a time when this was all reversed. The same arguments folks use to support the idea that girls can’t do science were previously used to argue that they couldn’t cut it in classics.

In The Science Education of American Girls: A Historical Perspective, historian of education Kim Tolley documents “the structural and cultural obstacles that emerged to transform what, in the early nineteenth century, was regarded as a ‘girl’s subject’ into something that became defined as innately masculine.” It is a great book. I highly recommend it. The essential point here is that all the reasons for why something is for girls and something else is for boys are basically meaningless. Science was for girls until it had social capital. At that point, science had always inherently been something that boys were good at. (I’m being a little hyperbolic, but I think the point generally stands.)

Idiot Nerd Girl is an Ideology

She is just the most current in a history. Hegemonic masculinity defines computing, defines science, defines whatever, as the things that women aren’t interested in doing. When women become interested in something, that thing either no longer counts (in a situation like The Sims) or the girls are just “pretending” and don’t actually get it. Blergh…

This kind of thing is often just below the surface. It is just so striking that each Idiot Nerd Girl image is such a clear textbook case of the contradictions on display. The meme makers are so unaware that they wear their contradictions on their sleeves.



Thankfully, at this point, it looks like Idiot Nerd Girl is being widely reclaimed.

Do Less More Often: An Approach to Digital Strategy for Cultural Heritage Orgs

Everybody is trying to do too much at once. Find the low hanging fruit and pick it. Get the boxes off the floor. Release early and release often. Put things out there and find out how you should be doing things. I think this idea cuts across all parts of digital cultural heritage work. Everything from collecting, processing, arranging, and preserving to making available and exhibiting can be reframed in this mindset. This was the primary sentiment I put forward in my keynote talk at the Connecticut Digital Initiatives Forum. At some point I might sit down and write this out, but I figured I would share it here.

Also, here are the slides in case you would prefer to see the presentation instead of sitting through my yammering.

I went up to talk about Viewshare, but was then also delighted/dismayed to be asked to give the keynote. I think it went well, and I was apparently on TV across the great state of Connecticut.

Born Digital Primary Sources for History: A Partial List

Historians refer to records and artifacts that record or register traces of the past as primary sources. For a very long time, those sources have been analog things. Physical objects and artifacts made up of atoms. The artifacts historians tend to work with (letters, photographs, diaries, notebooks, newspapers, blueprints, etc.) are increasingly being replaced in our lives by digital things (bits encoded on various storage media). I often find people working with historical sources lack an expansive imagination of how diverse the universe of born digital primary sources is.

So I thought it might be useful to start enumerating some examples of the broad array of things that fall into the category of born digital primary sources. I’ve really enjoyed Ian Bogost’s lists in Alien Phenomenology. He has this great riff about how important lists are in Bruno Latour’s work. Lists can do a great job at communicating the diversity that exists within a category of objects.

So here we go, born digital primary sources for history include but are not limited to:

  • photos on Flickr
  • presidential emails
  • lolcats
  • the Stuxnet virus
  • COBOL, Java, and Python
  • sensor data
  • a ROM of Super Mario Brothers
  • the source code of Ninja Gaiden 2
  • images collected by the Curiosity rover
  • the software on the Curiosity rover
  • Instagram’s interface
  • a digital image of the Declaration of Independence
  • Yelp reviews of the Statue of Liberty
  • punch cards from the 1890 census
  • the plug board of an Enigma machine
  • Windows 95
  • the Google homepage as it appeared on February 21st, 2002 at noon GMT
  • the Drudge Report
  • Amazon’s recommendation engine
  • Git, GitHub & GitHub’s blog
  • Benoit Mandelbrot’s 8-inch floppy disks
  • MARC records

What would you add?