Trevor Owens

Discovery and Justification are Different: Notes on Science-ing the Humanities

Computer Scientist: “You can’t do that with Topic Modeling.”

Humanist: “No, I can because I’m not a scientist. We have this thing called Hermeneutics.”

Computer Scientist: “…”

Humanist: “No, really. We get to do what we want: we read texts against each other, and then there is this hermeneutic circle grounded in intersubjectivity.”

Computer Scientist: “Ok, but you still can’t make a claim using this as evidence.”

Humanist: “I think we are going to have to agree to disagree here. I think we have different ideas about how evidence works.”

*****

While watching the tweets from the Digital Humanities Topic Modeling meeting a few weeks ago, I started to see the above dialog play out. I wasn’t there, and I am not trying to pigeonhole anyone. I’ve seen this kind of back-and-forth happen in a range of situations where humanities types start picking up and using algorithmic, computational, and statistical techniques. Which of these results count as evidence, and for what? What can you say based on the results of a given technique? One way to resolve this is to say that humanists and scientists should have different rules for what counts as evidence. I increasingly feel the need to reject this different-rules approach.

I don’t think the issue here is different ways of knowing, incompatible paradigms, or anything big and lofty like that. I think the issue at the heart of this back-and-forth is about two different contexts: what you can do in the generative context of discovery vs. what you can do in the context of justifying a set of claims.

Anything goes in the generative world of discovery
If something helps you see something differently, then it’s useful. If you stuff a bunch of text into Wordle and a surprisingly big word catches your eye, you can go back to the texts with this different way of thinking and see why that would be the case. If you shove a bunch of text through MALLET and see some strange clumps clumping that make you think differently about the sources and send you back to work with them, great. You have used the tool to spark a different way of seeing and thinking.

If you aren’t using the results of a digital tool as evidence, then anything goes. More specifically, if you aren’t trying to attribute particular inferential value to a particular process, that process is simply producing another artifact which you can then go about considering, exploring, probing, and analyzing. I take this to be one of the key values of the idea of “deformance.” The results of a particular computational or statistical tool don’t need to be treated as facts, but instead can be used as part of an ongoing exploration. That said, the moment you turn from exploration and theorizing to justifying an interpretation, the whole game changes.

Justification is About Argument and Evidence
If you want to use something as evidence, then it is really important that you can back up the quality of that evidence in supporting the specific claims you want to make. In the case of topic modeling, you need to make judgment calls about how many topics to look for, and you make the call about which texts from which sources go into the mix to generate your topics. If you want to talk about these topics as evidence to support particular inferences, then you had better be able to justify your reasons for those decisions, or be able to explain what you did with your data to warrant the interpretation you are putting forward. You will also need to explain how different decisions about your inputs could have produced different results. (I am mostly going off of the discussion in and around Ben Schmidt’s “When you have a MALLET, everything looks like a nail.”)
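A minimal sketch of what explaining the effect of different inputs might look like in practice: a leave-one-source-out check that reruns a simple analysis without each source and measures how much the results move. This is a hypothetical illustration under stated assumptions. It uses raw word frequencies as a stand-in for topic-model output (it does not run MALLET or any topic model), and the toy corpus, stopword list, and function names are invented for the example.

```python
from collections import Counter
import re

# Hypothetical toy corpus: source name -> documents drawn from that source.
CORPUS = {
    "newspaper": ["the harvest failed and grain prices rose",
                  "grain prices rose again as the harvest failed"],
    "diary": ["walked to church and prayed for rain",
              "prayed for rain after church"],
    "letters": ["grain is dear and the harvest poor",
                "we prayed for rain and a better harvest"],
}

# Assumed stopword list; choosing it is itself one of the input decisions.
STOPWORDS = {"the", "and", "a", "as", "to", "for", "is", "we", "after", "again"}


def top_terms(docs, k=5):
    """Return the set of the k most frequent non-stopword tokens in docs.

    Ties at the cutoff are broken by first appearance (Counter keeps
    insertion order), which is another quiet decision baked into the output.
    """
    counts = Counter(
        tok
        for doc in docs
        for tok in re.findall(r"[a-z]+", doc.lower())
        if tok not in STOPWORDS
    )
    return {word for word, _ in counts.most_common(k)}


def jaccard(a, b):
    """Overlap between two sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leave_one_source_out(corpus, k=5):
    """Compare full-corpus top terms with the top terms when each source is dropped."""
    all_docs = [doc for docs in corpus.values() for doc in docs]
    baseline = top_terms(all_docs, k)
    report = {}
    for held_out in corpus:
        kept = [doc for src, docs in corpus.items() if src != held_out for doc in docs]
        report[held_out] = jaccard(baseline, top_terms(kept, k))
    return baseline, report


baseline, report = leave_one_source_out(CORPUS)
for source, overlap in report.items():
    print(f"without {source}: overlap with full corpus = {overlap:.2f}")
```

If dropping one source changes the top terms substantially, any claim built on those terms leans heavily on that source, and that is a decision you need to be able to explain. The same loop could in principle wrap real topic model runs at several topic counts, though that is not shown here.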

The net result here is that if you want to use the results of something like topic modeling as evidence, you really need a good understanding of exactly what you can and can’t say based on how the tool produced your evidence. There are a lot of different roads to go down when you start working with data as evidence, but whichever you take, you need to be able to justify your decisions and defend against alternative explanations. Ultimately, this is where the validity of inferences lives. Validity is always about the quality of the inferences you draw and your ability to defend them against alternative explanations.

It’s the Scientists that Realized they were Humanists
At the heart of this remain some issues around what it means to do the humanities or to do science. (Fred and I got into this a bit in our Hermeneutics of Data essay.) I still hear a persistent fear that people using computational analysis in the humanities will bring about scientism, or positivism. The specter of Cliometrics haunts us. This is completely backwards.

Scientists, at least the sharp ones, have given up on their holy grail. They have given up on the null hypothesis. The sophisticated ones have realized that what they do is really just argument and evidence too. When it comes time for justification, you need to carefully build an argument grounded in evidence and defend it against alternative explanations. If you want a great recent example of this sort of argument and evidence grounded in statistics, I would suggest Nate Silver’s “Simple Case for Obama as the Favorite.” If you want a natural science example, read about this paper on Arctic sea ice. Both are great examples of defending against different interpretations of evidence.

What you can get away with depends on what you are doing

When we separate the context of discovery and exploration from the context of justification, we clarify the terms of our conversation. There is a huge difference between “here is an interesting way of thinking about this” and “this evidence supports this claim.” Both scientists and humanists make both of these kinds of assertions. In general, I think the fear of the humanities becoming scientific rests largely on an outmoded idea, on the part of humanists, of what actually happens in science. At the end of the day, both are about generating new ideas and then exploring evidence to see to what extent we can justify our interpretations over a range of other potential interpretations.

Digital Tools, History

Published by tjowens

Responses

Thomas Padilla

November 19, 2012 at 6:50 pm

Thanks for this post, Trevor. I was waiting for someone to key in on this topic! I mentioned it briefly in my recap, but this articulation is light years ahead of mine.

I think a significant part of the tension (in these conversations generally) can be located in the desire to move from discovery to justification, and in the anxiety attendant on learning how to make that shift. The black-box quality of some of these approaches can be pretty difficult to dispel. We were lucky at the Topic Modeling Workshop to have really great presentations and conversations with Blei, Mimno, and Boyd-Graber.

It is clear that the humanities can and do benefit from incorporating scientific approaches, yet I’ve struggled to articulate what value the humanities offer to our colleagues in the sciences. I’d be interested to hear your thoughts.

tjowens

November 19, 2012 at 10:35 pm

Glad you liked the post, Thomas. The black box thing is a big issue, and it is only going to get bigger in both the sciences and the humanities as more of our work happens in silico.

As far as what the humanities offer, that is likely a whole different post. I would just briefly mention that there is some great work in this regard under the banner of mixed methods research. For example, Jennifer Greene’s book Mixed Methods in Social Inquiry (http://www.amazon.com/Methods-Social-Inquiry-Research-Sciences/dp/0787983829) does a very nice job of working through how and why one would want to bring together qualitative and quantitative approaches to study the social world.

Ted Underwood

November 20, 2012 at 11:31 am

Great post, Trevor. I thoroughly agree. An enormous amount of confusion could be short-circuited if we skipped “quantitative or qualitative?” (which rarely clarifies much) and instead asked “is this for discovery or justification?”

Part of the problem, though, is that humanists haven’t traditionally distinguished these stages of research. So your apparently modest proposal may not actually be so modest.

Natalia

November 20, 2012 at 8:05 pm

This could just be a feature of whom I happen to follow and the selective (in)attention with which I eyed my tweetstream during the workshop, but I felt as though the dialogue you’ve ventriloquized here was presented in reverse; that is, it was the humanists who were wary of using topic modeling to make an argument, not the computer scientists.

And I agree with Thomas’s sense that this has to do with the black box nature of using a tool like MALLET with only the barest sense of what it’s doing, which is how most humanists (me included) would approach it. I, and some of the presenters that I saw, felt very comfortable using topic modeling for discovery. We might not quite understand why we’re getting the results we’re getting, but they just lead us back to specific texts, and once we’re reading specific texts with our own eyeballs, we pretty much feel like we know what we’re doing again.

I feel like I’d want to understand topic modeling inside and out before I would ever attempt to use the topic model itself to make an argument, though. Not just the math, but a convincing theory of what the math has to do with the structure of (English) language, which I’m not altogether sure is something that has been nailed down by anyone. “It works” is not persuasive to me; I need to understand why; I need to understand what it is, exactly, that’s “working.” And that depth of understanding (that is, the understanding I would need to attempt justification) would involve an enormous learning curve for me, to which I’d be loath to devote my precious research time if topic modeling already does a pretty good job of getting me to discovery without it, and if I already have good humanistic methods for using that discovery phase for something else.

  Ted Underwood
  
  November 27, 2012 at 12:40 pm
  
  I think Natalia’s observations here are shrewd. It’s partly a triage problem. If X works for discovery, but it would take a long time to justify it as evidence, then just use it for discovery! You can find other evidence easily enough.
  
  But for the record, there is a sort of straightforward theory explaining why topic modeling works. Crudely, it’s this:
  
  http://en.wikipedia.org/wiki/Distributional_semantics#Distributional_Hypothesis
  
  There are some problems with that as a theory of semantics, which is why I prefer the looser word “discourse” instead of “topic.” But basically it boils down to the same thing. It’s just … things that occur in the same contexts gotta have something in common.
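As a rough illustration of that hypothesis (not of how MALLET or LDA works internally, since topic models fit a probabilistic model to word co-occurrence within documents), the toy sketch below represents each word by counts of the words near it and compares those counts with cosine similarity. The sentences, the window size, and the function names are assumptions made up for the example.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy sentences.
SENTENCES = [
    "the farmer sold wheat at market",
    "the farmer sold barley at market",
    "the poet wrote verse at night",
]
vecs = context_vectors(SENTENCES)
print(cosine(vecs["wheat"], vecs["barley"]))  # 1.0: identical contexts
print(cosine(vecs["wheat"], vecs["verse"]))   # 0.25: only "at" is shared
```

Wheat and barley never appear together, but because they occur in the same contexts their vectors match exactly, which is the “gotta have something in common” intuition in miniature.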
  
SDL

November 21, 2012 at 6:54 pm

Wow. I’ve been trying to put these thoughts into words for a year now. Thanks for doing it for me! Excellent post, excellent clarification, excellent framework for discussing data and tools.

Deformative Digital Archaeology « Electric Archaeology

November 22, 2012 at 10:42 am

[…] then, digital archaeologists are digital humanists too? Trevor Owens has a recent post that sheds useful light on the matter. Trevor draws attention to the purpose behind one’s use […]

Andrew Piper

November 22, 2012 at 11:48 am

Fantastic and very helpful piece. It seems worth noting that there is a useful bibliographic history to this problem. Textual interpretation could go just as far astray when the interpreter did not understand the conditions under which the bibliographic text was produced.

The most famous case is D.F. McKenzie’s critique in “Making Meaning” of Wimsatt and Beardsley’s famed “Intentional Fallacy” piece. Reading a paperback reprint led them to make claims that did not align with the material record of the text’s past. In my own field, the Goethe scholar Erhard Bahr finally had to write an essay begging people (especially students) not to cite the Hamburger Ausgabe when talking about The Sorrows of Young Werther, because it contained only the second edition, which was significantly different from the first, and claims about one were not necessarily transferable to the other.

The history of literary interpretation is deeply tied to the knowledge of technological reproduction, something that was largely forgotten during the days of new criticism and its various theoretical aftermaths to which McKenzie and his material text followers were responding. The rise of bibliography as a modern discipline seems to have led to the rise of an immaterial literary criticism.

Understanding the digital conditions of making textual artefacts will be no less important, but this is by no means a novel problem.

Lev Manovich

November 26, 2012 at 5:27 pm

Great post!

We need to educate people in the digital humanities about two basic approaches in contemporary statistics: exploratory vs. explanatory:

http://en.wikipedia.org/wiki/Exploratory_data_analysis

I think what you are describing is the same thing as exploratory data analysis, or at least close:

“In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics in easy-to-understand form, often with visual graphs, without using a statistical model or having formulated a hypothesis. Exploratory data analysis was promoted by John Tukey to encourage statisticians visually to examine their data sets, to formulate hypotheses that could be tested on new data-sets.”

This is also very interesting and relevant:

“Tukey’s championing of EDA encouraged the development of statistical computing packages, especially S at Bell Labs. The S programming language inspired the systems ‘S’-PLUS and R. This family of statistical-computing environments featured vastly improved dynamic visualization capabilities, which allowed statisticians to identify outliers, trends and patterns in data that merited further study.”

In my lab we have been working on developing visualization techniques for exploring large image and video collections, and I always try to stress the idea of “exploratory” in my lectures.
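Tukey’s exploratory style can be made concrete with the box-plot rule associated with him: summarize a variable with five numbers, then flag values more than 1.5 times the interquartile range beyond the quartiles as worth a closer look rather than as findings. The sketch below is a hypothetical illustration; the brightness values are invented, and Python’s statistics.quantiles only approximates Tukey’s original hinges.

```python
import statistics


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max).

    Quartiles come from statistics.quantiles(method="inclusive"), which
    approximates but does not exactly reproduce Tukey's hinges.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def flag_outliers(values, k=1.5):
    """Return values outside the fences q1 - k*IQR and q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical values, e.g. mean brightness of ten images in a collection.
brightness = [112, 118, 120, 121, 125, 127, 130, 131, 135, 240]
print(five_number_summary(brightness))  # (112, 120.25, 126.0, 130.75, 240)
print(flag_outliers(brightness))        # [240]
```

The flagged image is a prompt to go and look at it, which keeps this firmly in the discovery context; saying why it is an outlier would be a separate, justificatory step.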

Reading’s Black Boxes

November 27, 2012 at 10:05 am

[…] again the history of books can provide a useful framework. As Trevor Owens writes in a great recent blog post, as humanists gradually begin to adopt more computational analytical tools there are risks in not […]

Evaluating Digital Work in the Humanities « Electric Archaeology

December 2, 2012 at 7:32 pm

[…] though likely more, for creating typologies of DH work. The first – let’s call it the Owens dimension, in honour of Trevor’s post on the matter- extends along a continuum we could call ‘purpose’, from ‘discovery’ through […]

Literary History, the Future: Kemp Malone, Corpus Linguistics, Digital Archaeology, and Cultural Evolution | Replicated Typo

December 17, 2012 at 12:32 pm

[…] Natalia Cecire recently remarked, it’s one thing to use topic modeling as a tool for discovery, where you then read specific texts […]

John Laudun

December 17, 2012 at 1:31 pm

Nice post. Jonathon Goodwin and I are facing this exact same fork in our own attempts to understand a paradigm shift in the small humanities field of folklore studies. We’re working with topic models produced by Mallet and with a limited set of citational information, but it’s very interesting to try to describe/discern the move from discovery to inference.

I have to say I am particularly fond of the phrase “generative world/context of discovery.” That will appear in our work somewhere, of that I am sure.

Topic Modeling: New Software and a Wrap-up of our NEH-Sponsored Workshop | Maryland Institute for Technology in the Humanities

December 18, 2012 at 3:54 pm

[…] Some questions by Trevor Owens (who was not at the workshop, although many of the commenters on this post were) […]

Sapping Attention: Keeping the words in Topic Models

January 10, 2013 at 12:30 pm

[…] than more machinely learned ones. *I spun around a post for a while trying to respond to Trevor Owens' post about the binary of "justification" and "discovery" by saying that really only justification […]

Digital Humanities & Cultural Heritage, or, The Opposite of Argumentation ← dh+lib

January 22, 2013 at 12:31 pm

[…] Sample’s declaration of “an insurgent humanities.” Trevor Owens, in his cogent post about the tweets from #dhtopic, captured something of this seeming divide, noting the assumed disjunction between the exploratory […]

Keeping the words in Topic Models : Global Perspectives on Digital History

January 22, 2013 at 3:18 pm

[…] spun around a post for a while trying to respond to Trevor Owens’ post about the binary of “justification” and “discovery” by saying that really only […]

learning to read. again. | fredgibbs

January 31, 2013 at 4:24 pm

[…] should know by now, looking at visualizations of texts is a form of exploring and should be taken not as analysis, but exploration. One might respond, then, by saying that a document matrix on dendrogram that shows word […]

Learning by Doing: Labs and Pedagogy in the Digital Humanities : Global Perspectives on Digital History

February 13, 2013 at 10:14 am

[…] and the computing process itself. Labs give digital humanists a science-y legitimation that, whether we admit it or not, we find appealing. Labs aren’t necessary for doing digital humanities research, but in terms of […]

Text Analysis of 2012 Digital Humanities Job Adverts part 2 « Electric Archaeology

February 28, 2013 at 2:43 pm

[…] and ‘United States’), but this is only meant to be rough and ready, ‘generative’, as it were (and note also that a network visualization is not necessary for the analysis. […]

Approaching Digital History | Michael J. Kramer

April 21, 2014 at 2:11 pm

[…] Trevor Owens, “Discovery and Justification are Different: Notes on Science-ing the Humanities,” 19 November 2012, http://www.trevorowens.org/2012/11/discovery-and-justification-are-different-notes-on-sciencing-the-…/. […]

Proxem » The October 27 letter: putting an end to email information overload?

October 27, 2014 at 9:03 am

[…] the technical possibilities of the new tools, and the culture of some humanists who would not trouble themselves much with empirical evidence. Finally, Michael Jordan, a machine learning specialist, confided to IEEE […]

Resource: Text Data from the Archive | Digital Humanities Now

April 16, 2015 at 1:00 pm

[…] be highly accurate, but I’d argue that “just good enough” is all you need to begin exploring your […]

Approaching Digital History 2.0 | Michael J. Kramer

September 21, 2015 at 3:48 pm

[…] Trevor Owens, “Discovery and Justification are Different: Notes on Science-ing the Humanities,” 19 November 2012, http://www.trevorowens.org/2012/11/discovery-and-justification-are-different-notes-on-sciencing-the-… […]

Designing Deformance – CONVERGE

May 1, 2017 at 2:14 am

[…] Owens, T. 2012, ‘Discovery and Justification are Different: Notes on Science-ing the Humanities,’ http://www.trevorowens.org/2012/11/discovery-and-justification-are-different-notes-on-sciencing-the-… […]

Syllabus: Approaching Digital Humanities, Winter Quarter 2018 – Michael J. Kramer

June 15, 2018 at 4:16 pm

[…] Trevor Owens, “Discovery and Justification are Different: Notes on Science-ing the Humanities,” … […]

Syllabus: Approaching Digital Humanities, Spring 2019 – Michael J. Kramer

October 8, 2020 at 11:14 pm

[…] Trevor Owens, “Discovery and Justification are Different: Notes on Science-ing the Humanities,”… […]

Syllabus—Approaching Digital History – Michael J. Kramer

March 27, 2023 at 3:40 pm

[…] Trevor Owens, “Discovery and Justification are Different: Notes on Science-ing the Humanities,… […]
