Deforming reality with Word Lens

If you haven’t checked it out already, Word Lens is an amazingly cool iPhone app that automatically translates text on the fly, as you see it.

I’ve had it on my phone for about a month now, but I find that the things it messes up are far more interesting than the things it gets right. “Messes up” is really the wrong term here. The best parts of Word Lens happen when you point it at things you aren’t supposed to point it at, or at things that aren’t in the language you are supposed to be translating.

When you hold it up and pan around, it is as if the software is uncovering hidden meanings in your environment. For example, I pointed it at some of the congressional buildings on my walk home and was told that “NEICAH” was apparently “IN”.
It is a jarring experience to walk around and see these words keep popping up, as if they emerge out of the environment. After using it for a bit you get a feel for what kinds of things you can trick it into thinking are text.

You want some clear horizontal lines, but beyond that you want a visual space with some clear visual breaks in it. A flower bed, for example, worked great. I couldn’t help thinking it would be really neat if the Word Lens developers created some explicit vocabulary packs focused on this off-label use, giving us a few more fun ways to deform reality and uncover its hidden meanings and jokes instead of simply translating text.

Eventually, I went out and bought the Spanish-to-English pack. I wanted to see what kinds of things it would see when it was working off an English vocabulary. That is when I realized that the Word Lens developers had already given us everything we need. Flip it on to turn Spanish into English, refuse to show it any Spanish, and you have yourself something between a decoder ring and a reading machine that you can turn on any text, or potential text, for fun and profit. OK, no profit, but lots of fun. Possibly insight. You can see some of the results in the gallery. I most enjoyed what happened when I turned it on some of my books. The following examples are Word Lens attempting to translate books with English-language titles from Spanish into English.

Word Lens can be Snarky and Potentially Insightful

I thought some of these were rather funny. When exposed to the Spanish-to-English filter, Debates in the Digital Humanities became “DEBATES IN THE OR DIGITAL ROYALTY,” something that is particularly humorous given discussion of the digital humanities cool kids table. It felt a little bit like Mark’s “Hacking the Accident” moment. The machine mangles the text, and that deformed text provokes thought and consideration.

Observing the User Experience became “Observing Was User experience,” which is in fact totally true. Observers are themselves users observing other users.

Word Lens seems to disagree with Latour’s Actor Network Theory, which it calls “THE Actor-N ERRORS THEORY.” Or I suppose this might actually be a totally different book, one written by Bruno Brassr called “Reassembling Read Social,” in which we are introduced to the brand new Actor-NERRORS Theory.

I have saved the best for last. In what seemed particularly topical, Stephen Ramsay’s Reading Machines becomes Reading Machetes. Even better, when we flip to the back of the book we learn that it is part of the Mythical Theory Reiterate Studies series, based on his “CREEP” essay “Toward an Algorithmic Criticism.” From there I think I lose it a bit. Something about his “Thai Wrath.” With that said, I love that literary computing becomes Liberary computing, which I assume is a mixture of liberation and library. Importantly, the back of Reading Machetes mentions the GNU operating system: liberary computing at its best. It is also apparently “Trying” to “Shame” other scholars for their “LETHARGY” Criticism. Ha!

Here is a gallery of a few more images:


Studying Discourse Online is Studying Designed Experience

Young people participating in fan fiction forums are learning English as a second language. People arguing about Priest talents in the World of Warcraft forums are participating in informal science learning and reasoning. Hip hop discourse in online forums can help us engineer financial literacy into learning environments. Folks participating in forums for RPG Maker are learning to take and give criticism. Everywhere you look researchers are studying discourse online, but we don’t necessarily know that much about how that discourse is shaped by the people who build and administer the software that enables it. As I’ve mentioned, this is the subject of a research project I am working on, and I wanted to take a moment to share a few early examples and ideas about how this might be working.

Discourse on the Web is a Result of Designed Experience

For starters, discussions on the web are the result of designed experience; you shouldn’t study them without taking into account the functionality of the software that enables them. The designers and administrators of those spaces have set them up to enable particular kinds of communication and to ensure that other kinds of interaction do not occur.

For example, here is how Derek Powazek explained the role of software tools in Design for Community: The Art of Connecting Real People in Virtual Places:

This is all about power. Giving your users tools to communicate is giving them the power. But we’re not talking about all the tools they could possibly want. We’re talking about carefully crafted experiences, conservatively proportioned for maximum impact. (Powazek, xxii)

So How Do Forum Designers and Administrators Shape Discourse?

So, what do the folks who manage, run, and build web forums think about their end users? Further, how do their theories about the goals, motivations, and desires of those users shape the ways they enable them to interact with each other? One of the places I am looking for answers to these questions is in guidebooks for web forum administrators. I should give a fuller rundown of the books I am looking at, but I thought it would be fun to share some examples of how these books talk about users and the implications for design that they suggest. I am still at the beginning of this research project, but I wanted to put these preliminary examples out there for anyone to react to. I will share more as they show up.

Explicit Public Rules

The most obvious way that community managers influence the content people share on these sites is by enforcing explicit rules. Practically all of the books in this genre that I have read so far explain the importance of having and enforcing these kinds of rules. Here Patrick O’Keefe explains why they matter:

Respect is the cornerstone of a good environment. You create a respectful community by requiring that everyone treat everyone else with the respect they deserve. You do this by having written policies and by actively enforcing those policies. (O’Keefe, 219)

Using Design to Filter Who Participates

One of Powazek’s primary lessons for design is to “bury the post button.” He suggests that the more effort it takes to get to the point where someone can post a comment, the higher the quality of the resulting discussion.

Why would this be? Because, in this case, the multiple clicks it takes to read the whole story are actually acting as a great screening mechanism. Users who are looking for trouble or aren’t really engaged in your content will be put off by the distance. They’ll drift away. But the users who are engaged by the content and interested in the results of the conversation will stick with it. (53)

In Community Building on the Web: Secret Strategies for Successful Online Communities, Amy Jo Kim gives very similar advice:

What you want to do is create appropriate hurdles for member contributions, particularly those that extend the public space within your community… It’s up to you to figure out the restrictions that best meet the needs of your members and support the kind of community you are trying to create. (Kim, 71)

Aside from any explicit rules, designers of these community spaces are using design itself as a filter. It is a kind of soft power that shapes the way we interact with each other online, and anyone studying interactions online should think about how the design of the space might be acting as a filter.

Tricking users and distorting reality

Explaining that “Creativity never hurts when you’re trying to get major league idiots off your community,” O’Keefe provides a few creative ideas.

Sometimes referred to as global ignore, you can incorporate a function that lets the banned user log in but then makes this user go unseen to all users of your community. The banned user cannot receive private messages, and if he tries to send them, they don’t reach the intended users. He can still make his posts, but only he (and maybe you and your staff) can see the posts – no one else. Basically, in his eyes, the site works as is intended. He will just think that everyone is ignoring him and go away. (O’Keefe, 215)

In this case, an administrator can let a user think they are participating in the conversation when no one else can see what they are saying. Worse than being silenced, the user still thinks they are part of the conversation.

In short, the designed experience of web community spaces is not something that can be read in any straightforward fashion. At the very least, to say something about a community you need to understand its explicit guidelines and rules. But beyond this, without understanding the intentions and tactics of developers and administrators it is going to be difficult to know how exactly they are implicitly shaping the structure and nature of the discourse. It’s my intention to work through this relationship between designers, administrators, and users in my project.

What are some other examples of ways designers and administrators shape discourse online?

User Stories as a Genre of Digital Humanities Scholarship

There has been a good bit of discussion about how building things can be thought of as a hermeneutic process: building can be the crux of a methodology for at least part of this thing we are calling the digital humanities. The more I have thought about this, the more I have started to wonder whether another piece of the software development process might find an even more natural fit with existing practices in humanities scholarship. Specifically, there might be hybrid forms of writing, something between software documentation and scholarly articles, that could serve as the basis for formalizing building as scholarship. Each of the design deliverables Dan Brown discusses in Communicating Design could serve as the basis for a new mode of scholarly communication. The idea here is that the writing involved in the production and use of software and tools for creating knowledge could serve as the raw material for scholarship.

Personas and User Stories
User stories and personas strike me as forms of software development writing that could be bent into humanities writing. Most user-centered design approaches start with creating personas for users: coming up with what it is that someone wants to do, and the background and experiences that need to be taken into account to design something that will let them do it. User stories are very similar, generally explaining how a particular tool helps a user accomplish their goals. For example, these are some of the Zotero student and faculty user stories I would share with people who were interested in training Zotero users.

How Personas and User Stories Could Become Methodological Scholarship
Most user stories and personas focus on someone who wants to buy a book on Amazon or accomplish some other clearly defined task. I am not saying that we should think of these as scholarship. However, in the case of building software for humanities scholars the goal is generally not a simple discrete task. We are trying to create tools and interfaces that help scholars produce insight and knowledge. That means that a) it is far more difficult to define success and b) the possibilities for deep thought, explorations of context, and considerations of the nature of knowledge production come into the mix. In short, when the goal of a particular software tool is to facilitate the production of knowledge, there is good reason to believe that the kinds of thinking that go into using and designing that tool could be good fodder for scholarly writing and communication. This is partly what Fred Gibbs and I are trying to get at in our feeling that we need to write a lot more about methods in Towards a Hermeneutics of Data.

Extant Software Stories are Just as Useful
This isn’t really just about building software; we also need more of this kind of writing for using off-the-shelf software. User stories have the possibility of becoming the methodological texts of the digital humanities. What did you do, or what do you want to let someone do? What will doing that let you know? This is already happening: Cameron Blevins’s Topic Modeling Martha Ballard’s Diary and Rob Nelson’s Mining the Dispatch are personal narratives of research methods. In a less technically intense example, it is the same kind of thing I tried to do in Mining old News for New Historical Insight. These pieces are necessary for establishing the validity of any claims we ultimately make in our research, but they are also essential as a kind of new research methods literature. This kind of work is particularly important because, as Ian Bogost suggests, “technologies themselves make tacit, low-level assumptions that can’t be seen in the light of day.”

I would love there to be a place to put this stuff and find it
My links to examples go all over the web: software documentation, blogs, etc. These are great places to put these things, but part of me wishes we could pool together a bit and aggregate this kind of writing about use, method, and interpretation with tools. Furthermore, it would be great if there were more review and dialog about this kind of writing. I think we are still in the infancy of what hybrid forms of software development writing and scholarly writing could become.

So what do you think? Should we start thinking about the writing involved in the creation of software as the same kind of hermeneutic process, full of deep thinking about meaning, context, and interpretation, that we put into scholarly writing writ large? If so, how do we get from where we are to what these hybrid forms might look like?

The Digital Humanities Are Already on Kickstarter

I have been talking with a lot of historians, librarians, archivists and curators about the possibility of using Kickstarter to fund digital humanities and digital library, archive, and museum projects. If you are unfamiliar, Kickstarter is a site and tool that anyone can use to fundraise for creative projects.

The Open Utopia project is a great example of a successful DH project on Kickstarter

In several of my conversations with humanists about Kickstarter I have heard back, “But isn’t Kickstarter a place for art projects, not for humanities projects?” The answer to that question is no. Kickstarter is a place for creative projects: specifically, discrete projects in which something is made. For folks on the DIY side of the digital humanities, an attitude frequently on display at events like THATCamp, this is not a problem. If you want to make things, then Kickstarter is a great tool.

Best of all, we don’t even need to imagine what digital humanities projects on Kickstarter would look like. They are already there. I took the liberty of putting together a short list of projects that I think fit squarely in the areas I have seen people at previous THATCamps working in.

7 successful Digital Humanities-ish Kickstarter Projects

  1. The Teaching Teachers to Teach Vonnegut project from the Kurt Vonnegut Memorial Library raised $2,200 to create and host a free workshop for Indiana high school teachers interested in incorporating the writings of Kurt Vonnegut into their curriculum. They even used the money as matching funds for an NEH grant.
  2. The Open Utopia: A New Kind of Old Book raised more than $4,000 to create an open-source, open-access, multi-platform, web-based edition of Thomas More’s Utopia.
  3. </archive> raised more than $900 to create an open archive of urban experience built from the street. Using unique QR code tags, collaborators can make their personal experiences of the city accessible in physical space.
  4. Open Goldberg Variations – Setting Bach Free raised more than $20,000 to create a new score and studio recording of J.S. Bach’s Goldberg Variations placed in the public domain.
  5. The Nature of Code Book Project raised over $31,000 to write and self-publish a book on “the unpredictable evolutionary and emergent properties of nature in software.”
  6. Kevin Ballestrini, a classics professor, raised more than $2,000 to create an educational card game.
  7. Smarthistory raised more than $11,000 to create a slate of educational videos for its art history website.

The moral of the story here is that Kickstarter is not something that could be useful for funding digital humanities projects; Kickstarter is already something that is useful for funding digital humanities projects.

Importantly, Kickstarter is not a magic button that prints Internet money. If you do decide to use it to raise some funds, you should go out and read from the copious amounts of advice on successful Kickstarter campaigns. (See for example this, or this, or this, or this)

If you have project ideas that you want to share and workshop consider posting them in the comments for feedback from other digital humanists.

When did we become users?

We live in an era of user experience and user-centered design. We have a range of usernames for everything from Facebook to our banking websites. We tacitly sign End User License Agreements as we click our way around the web. We know what to do because we read user guides to figure out how to get our software to do what we want.

In short, we are all users.

The user has become such a central way of being that scholars are now reading the idea of the user into the past. In How Users Matter you can read about the users of everything from the Model T, to vaccines, to electric razors, to Minimoogs, to contraceptives.

The idea of the user as a way of being is so omnipresent that it is easy to forget that the idea of us as users has a history.

There must, in fact, be a historical moment at which we became users.

So when did we become users?

I don’t have an answer here. I’ve screwed around (hermeneutically) with a few online historical datasets, and I would like to invite you (the user) to help interpret, consider, and suggest next steps.

Asking a question to a graph

For starters, I figured I would see how our various names have fared in the books of the 20th century. Below you can see a chart of the terms user, producer, consumer, and customer as they appear in the corpus the culturomics folks have given us to play with in Google n-gram. I am not a statistician, and I will be the first to admit that I do not completely grok the details of their FAQ and supplemental documents. With that said, a naive interpretation of this graph shows the term user beating out producer in our lexicon in the late 60s and beating out consumer in the early 80s. Does this tell us anything interesting? Despite all the limitations that come with this sort of data, are there any claims it at least suggests to you? Are there other terms you think should be included? Please link any interesting related n-grams you generate in the comments.
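The n-gram data itself is downloadable, so the question I am putting to the graph can also be put to the numbers directly. Here is a minimal sketch of that check; the frequencies below are invented placeholders for illustration, not real Google n-gram values:

```python
# Sketch: find the first sampled year where one term's relative frequency
# overtakes another's. The numbers below are made up, not real n-gram data.

def crossover_year(rising, falling):
    """Return the first year in which `rising` exceeds `falling`.

    Both arguments are dicts mapping year -> relative frequency.
    Returns None if no crossover occurs at the sampled years.
    """
    for year in sorted(rising):
        if year in falling and rising[year] > falling[year]:
            return year
    return None

user = {1960: 0.0008, 1970: 0.0015, 1980: 0.0031, 1990: 0.0060}
consumer = {1960: 0.0022, 1970: 0.0025, 1980: 0.0028, 1990: 0.0030}

print(crossover_year(user, consumer))  # 1980 with these invented numbers
```

The real version would run over the downloaded yearly counts; the point is just that “when did user beat out consumer?” is a question you can ask the data rather than eyeball on the graph.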

user, producer, consumer, customer in google n-gram

Here is, more or less, the same trending line for user in the Time Magazine corpus.

Chart of "user" in Time Mag

Collocating the User

Oh numbers, how you mislead! I can’t forget the drug users.

Thankfully, the really neat thing about Mark Davies’s corpora is that he lets you dig in and see which words are collocated within a specified number of words of the term you are searching for.

For example, when I search for user in the Time Magazine corpus I can find that “Drug” appears within 4 words of user 32 times. Beyond that, we can see which decades those collocates occur in. Below are the collocates for nouns within 4 words of the word user, along with a bunch of other cool stuff. As I am far from confident in making assertions about the implications of this kind of data, I thought I would share it here, offer my naive read of it, and invite you (the user) to tell me what you think the data suggests.

Here is the sheet of data I’ve lightly coded as either drug or technology uses of the term user. If you want to recreate this, just do a search for collocates of nouns either four words before or four words after the word user; you can see what that search looks like in the image at the bottom of this post. To talk about these results I have coded them into my own categories: those that have to do with drugs and those that have to do with technology. There are a few at the bottom that I haven’t categorized but which I would most likely call technology uses of the term. I have sorted them first by my categories and second by their frequency. As a last step I have flagged the cells in the sheet with two hits in dark green and those with more than that in light green to draw attention to the patterns in the data.
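For anyone curious about the mechanics behind this kind of collocate count, it reduces to a sliding window over a stream of tokens. This sketch leaves out everything Davies’s interface does for you (part-of-speech tagging, lemmatization, decade breakdowns) and just counts raw neighbors; the sentence is an invented toy example:

```python
from collections import Counter

def collocates(tokens, node="user", window=4):
    """Count words appearing within `window` tokens of each occurrence
    of `node`, excluding the node word itself."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] != node:
                counts[tokens[j]] += 1
    return counts

# An invented toy sentence, not corpus data.
text = "the drug user and the telephone user met another drug user".split()
print(collocates(text)["drug"])  # 3: "drug" falls inside the window three times
```

Restricting to nouns, as in my actual search, would mean filtering the window by part-of-speech tags before counting.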

What are users using?

The rise of user is also the rise of drug user

Throughout the chart, users are associated with the general idea of drugs and with specific terms for a range of individual drugs. This would be the user in the “Users are losers” construction. In any event, at least in the case of Time Magazine, the growth around the term user happened for both drugs and tech at the same time.

The first technology-related term that shows up is telephone

The first tech term to show up in this data is telephone. This suggests that the user may have less to do with the rise of computing and more to do with the rise of networks. It may well be that we need the concept of the user to describe technology-based networks.

Some open questions

  1. How should we periodize the history of the user? I have provided a few pieces of evidence. It would seem that this evidence suggests….. If you have other examples of what this evidence might look like I would be thrilled to hear about them. Are there other places one would look? Are there other explanations for this evidence?
  2. Was our relationship to technology different before we became users? Or is the word the only thing that is new here? This is really the crux of the issue. Is this change in language simply an arbitrary neologism? Does the idea of us as users of technology shape our way of thinking about tools and technology? Has it changed how we think about technology? Lastly, what would the evidence look like that would help us answer this question, and where would we find it?

Aside: if you want to recreate the search I did for collocates of nouns within four words of the term user, it would look like this.

What my search looked like: Click image for bigger pic


On Writing, Making and Mining: Digital History Class Projects

This is the fourth post in a multi-post series reflecting on the digital history course I taught last semester at American University. For more, you can read my initial post about the course, the course syllabus, and my posts on the value of a group public blog, on how technical to get in a digital history course, and on how the students’ content will continue to be a part of future versions of the course.

I am a big fan of the idea that building and making is a hermeneutic. Part of what makes the idea of the digital humanities particularly nifty is that we can embrace building tools, creating software, designing websites, and a range of maker activities as an explicit process of understanding. Because of this, and in light of my feelings about the necessity for students to develop technical competency, I knew I wanted students in my class to work on a digital project.

With that said I gave my students a choice.

Everyone had to write proposals for both a digital and a print project. For print projects they proposed papers that either used digital tools to make sense of a set of texts or interrogated something that was itself “born digital.” For digital projects students were required to create some kind of digital resource: a blog, a wiki, a podcast, an interactive map, a curated web exhibit, a piece of software, etc.

When I mentioned the structure of this assignment to Tom Scheinfeldt he suggested that I would be receiving 20 papers: one paper from every student. We’ll get back to what I actually got once I explain my justification for including writing as an option.

Three reasons writing in Digital History is new

Here are three reasons to justify using the limited time in a digital history course to work on writing projects.

The case for writing about mediums

Historians are trained to work with particular kinds of materials and to ask questions which are (to some extent) based on the nature of those materials. Historical understanding fundamentally requires us to understand how the nature of a given medium shapes and affects the traces of the past it carries. This requires us to know how to think about communication in a letter as a different voice from a speech, and further to recognize that the transcript of a speech is not necessarily what was said, and does not include information about how it was said. It also requires us to approach different media on the terms on which they were used and the terms on which they function. For an example of this kind of work in photography I would strongly suggest Trachtenberg’s Reading American Photographs. Similarly, there is an extensive tradition of “reading” and interpreting everything from tree rings in environmental history, to Long Island parkway bridges in the history of technology, to forks and spoons in Bancroft-award-winning works of American history. All of this is to highlight that there is a long tradition in history of understanding objects in context. I really want my students to become, to borrow from Matthew Kirschenbaum borrowing from William Gibson, “aware of the mechanisms” they are interpreting. In this capacity I want my students to do extensive research using and interpreting born-digital materials.

The case for writing about data

While history has a long tradition of deeply understanding the media on which traces of the past are recorded, in my experience much of that work tends to be focused on close reading: taking a few examples and digging deeply into them. In the sciences the question is what you do with a million galaxies; in the humanities it is what you do with a million books. In both cases the answer is that we need to conceptualize and refine ways to do distant reading, or at least a hermeneutics of screwing around. In class we looked at a range of examples in this space: Google n-gram, COHA, tools like Voyant, and even things as simple as Wordle.

The case for writing as part of building

Many of my students wanted to go into public history. I want them to take the opportunity to deeply explore and reflect on how systems can be created to support their work. Here I am very much in the build-things camp, but a big part of building is critically reflecting on what is built. For example, writing about the web presence of a war memorial on Flickr, Yelp, and TripAdvisor can offer substantive insights into what tools and platforms we should make to support public history, and how. I feel quite strongly that we need a body of design and development literature that deeply engages with analyzing and evaluating digital humanities projects.

So did I get 20 papers?

I am thrilled to report that many of the students jumped at the opportunity to develop digital skills and build out web projects. In the end I received ten papers and ten digital projects. Several students who built digital projects made comments like “I decided to step outside my comfort zone,” and I was thrilled to see them do exactly that. I think the fact that we worked with so many relatively easy-to-use platforms (i.e. WordPress.com, Omeka.net, Google My Maps, etc.) played a role in getting these projects up and out there. You can browse them on the projects page of the site. Both the papers and the digital projects turned out great. From the proposals to the final projects I think you can really see development toward some of the core ideas. With that said, there was one interesting trend that I am curious to get other people’s thoughts on.

No one touched text mining/text analysis:

I thought that some of the students would take the opportunity to use tools like Voyant, or even something as simple as Wordle, to work with some of the texts they were already using in their research. Or, similarly, that some students would take some of the online corpora we looked at and use them to explore their research interests and do some historical research. We talked about this a fair bit, but no one took these up as a project idea. Instead, all of the papers students worked on explore born-digital issues. Don’t get me wrong, students wrote very cool papers, for example looking at the web presence of different war memorials and examining Fallout’s idea of the wasteland in the context of the history of apocalyptic writing. Further, the web projects turned out great too.

For whatever reason, no one wanted to work with tools like Voyant or Wordle, and no one took up the opportunity to write something using Google n-gram, Mark Davies’s Corpus of American English, or his Time Magazine corpus. In future iterations of this course I imagine I might require everyone to write a short post using at least one of these tools with a set of texts. Thinking about primary source material as data sets is one of the most important things for historians to wrap their heads around.
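Part of what makes the “sources as data sets” mindset teachable is how little machinery it takes to start. The Wordle-style entry point is nothing more than a term-frequency count; here is a minimal sketch, with an invented line standing in for an entry from a digitized diary:

```python
import re
from collections import Counter

# A tiny stopword list for illustration; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "was"}

def term_frequencies(text, stopwords=STOPWORDS):
    """Lowercase, tokenize crudely, drop stopwords, and count what remains."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# An invented sentence standing in for a real primary source.
diary = "Rained all day. Crossed the river to deliver the child. The river was high."
print(term_frequencies(diary)["river"])  # 2
```

From here the conceptual leap to distant reading is running the same count over thousands of documents and asking what the aggregate patterns mean, which is exactly the shift in mindset students found hardest.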

Is text mining more radical than building for historians?

Students were excited to create digital projects. Students were excited to write about born-digital source material. However, no one touched text mining or anything remotely related to distant reading. Now, it is possible that I just didn’t make this sound interesting enough. With that said, we did have a great conversation about distant reading, we did cover some of the very easy-to-use tools and corpora early in the semester, and everyone clearly got it. It makes me think that while the idea of building as a hermeneutic is a hot topic in digital humanities conversations, distant reading may well be even more radical, at least in the case of digital history. On reflection, the kind of data mindset that one needs to develop and deploy in this sort of research feels more distant than the idea that we learn through building.

LMGTFY, Shame, and Collective Intelligence

Let Me Google That For You (lmgtfy) is a snarky way to respond to someone asking an obvious question. It was created “for all those people that find it more convenient to bother you with their question rather than google it for themselves.” Lmgtfy has become a relatively popular way to respond in any number of web forums, but more broadly, I think it speaks to the kind of literacy that search is beginning to represent.

To break this down a little bit: when someone responds to your web forum post asking how to pivot tables in Excel, or how to tie a bow hitch, by posting a link to lmgtfy, you are being told that the question you asked does not require a human to answer it. It has already been answered on the Internet, and with a very simple search query, as demonstrated here, you could have found that answer. At the core of lmgtfy is the notion that a savvy digital citizen should be able to make specific assumptions about the kind of knowledge the web puts at their fingertips. Lmgtfy is supposed to be a shaming experience, and the possibility of that shame is predicated on a kind of literacy of collective intelligence.

Collective What?
Collective intelligence is a mushy term; in this case I am referring to Pierre Lévy’s notion. In Collective Intelligence (1997) Lévy proposed a vision for the kinds of changes the internet could generate in culture. Lévy suggested that in online culture “The distinctions between authors and readers, producers and spectators, creators and interpreters will blend to form a reading-writing continuum, which will extend from machine and network designers to the ultimate recipient each helping to sustain the activity of others” (p. 121). I think the shame lmgtfy is intended to evoke demonstrates a limited form of this collective intelligence.

Now let’s be clear: while proponents of the idea of collective or distributed intelligence and cognition are often accused of proposing some magic brain in the sky, that’s not what I’m referring to. Instead, the idea is that parts of the thinking process are always mediated by tools, whether pen and paper, print media, a computer, or a mobile device, each embedded in the cognitive processes of individual agents.

On one level, this is rather obvious. Many are advocating that search and Google mean that trivia and facts are less important than the ability to find and interpret information. The point I am focusing on here is that there are a few key elements involved in thinking like a search engine and generalizing from your experience to judge whether a specific question is something the Internet should know.

Thinking like Google and Thinking like the Crowd
Who was president in 1832? What’s the best way to steam carrots? How does !important work in CSS? Which iPhone 4 case is best? Where can I find some good Indian food in Fairfax, VA? All of these questions have relatively straightforward online answers available. In each case, we have developed a sense of specific, limited notions of collective intelligence and an internal representation of the kind of information that should be out there to help make a given decision. The successful individual searcher has internalized a representation, a map, of both the way a database organizes information (search terms, where Google Maps data comes from, etc.) as well as what kind of people would share that information (the kinds of folks who review restaurants on Yelp, the extent to which a given problem would be shared, the biases of reviewers or bargain hunters on a given do-it-yourself home improvement forum). Effectively, information literacy is developing this model. In essence this is about knowing three things.

  1. Knowing what kinds of knowledge should be out there on the web. (This is an assumption about the generality of your problem and the nature of the information that is put online)
  2. Knowing what kind of search query will get you there. (This is about understanding a bit about how search works, knowing what kind of keywords will get you where you need to go)
  3. Knowing what the limitations of that kind of information are both in terms of kinds of questions one can ask and the biases of the sources one encounters. (This is the interpretive part, and it is once again about your theory for why someone would post this information online)

At the core, each of these are about developing 1) a sense of how computers, and more specifically databases and search engines, structure and organize information and 2) a sense for the kind of people that share specific information in a given context.

Knowing online is internalizing the machine that is us/ing us
These two points, internalizing a sense of how a computer searches and internalizing a sense of what things people should have shared online to be searchable, effectively amount to internalizing a working model of the internet and its users in your mind. It is not that the internet is itself an intelligence, but rather that we are constantly updating our mental model of the web and its users through our own search experiences.

The following example of interpreting ratings on Yelp offers a further demonstration of how I am thinking about this, and also offers a place to consider general notions of competence and their relationship to individual sites.

Site Specificity and Domain Generality of Collective Intelligence Heuristics
Like all knowledge domains, there are idiosyncrasies of competence that are narrow and specific, nested within broader notions of competence. For example, try this word problem on any Yelper. You want a sandwich, and your Yelp search pulls up a restaurant with 4 stars and a restaurant with 5 stars. Which is the better restaurant? Answer: insufficient information; I need to know how many total reviews there are for each establishment. In short, if the 5-star restaurant has that rating as the result of 3 reviewers and the 4-star restaurant has its score as the result of 124 reviewers, it is likely that the 4-star restaurant is well established, and hey, you’re a Yelper, you know that for every 10 reviewers who give a great restaurant 5 stars there will always be a few snarks who feel like they can only give a 5-star review once every six months. Now, even if you are not a Yelper but are familiar with how reviews work on Amazon, you might have come to the same conclusions. In all likelihood the Yelper would have a better sense of how to read individual reviews, and reviewer profiles, in the process of making restaurant decisions. However, the individual with experience of Amazon’s similar review system can transfer and translate that experience into a more general competence at interpreting online ratings and reviews.
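The intuition in that word problem, that a 4-star rating backed by 124 reviews can be more trustworthy than a 5-star rating backed by 3, can be captured with a Bayesian average, which pulls ratings with few reviews toward a prior mean. This is a sketch of the general technique, not Yelp’s actual ranking method; the prior values here are illustrative assumptions.

```python
# Sketch of a Bayesian average: ratings backed by few reviews are pulled
# toward a prior mean, so a 5-star rating from 3 reviews can rank below
# a 4-star rating from 124 reviews. The prior values are illustrative
# assumptions, not anything Yelp actually uses.

def bayesian_average(mean_rating, num_reviews, prior_mean=3.5, prior_weight=10):
    """Blend an establishment's mean rating with a prior, weighted by review count."""
    return (prior_mean * prior_weight + mean_rating * num_reviews) / (prior_weight + num_reviews)

five_star_newcomer = bayesian_average(5.0, 3)    # few reviews: pulled toward 3.5
four_star_veteran = bayesian_average(4.0, 124)   # many reviews: stays near 4.0

print(round(five_star_newcomer, 2))  # 3.85
print(round(four_star_veteran, 2))   # 3.96
```

The experienced Yelper is, in effect, running something like this adjustment in their head: discounting small samples and trusting scores that many independent reviewers have converged on.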

Going to the Library of Congress

For just about the last four years I have had the distinct pleasure to work on Zotero and a range of other projects at the Center for History and New Media. It has been an amazing experience and opportunity, and I am grateful to CHNM’s senior staff for all the opportunities they have provided me to hone my skills related to this thing we now call the digital humanities. My time at the Center has shaped the way I think about software and scholarship.

I am very excited to bring this experience into my new position as an information technology specialist with the National Digital Information Infrastructure and Preservation Program (NDIIPP) in the Office of Strategic Initiatives at the Library of Congress. I will be working specifically with the technical architecture team. I have been following NDIIPP for a while, and not only are they working on an array of important and fascinating projects, but everyone I have met who is associated with the program is fantastic.

I am still going to be around George Mason University. Over the years at CHNM I have been thrilled to have the opportunity to collaborate with so many of the folks in the History and Art History program, both through projects at CHNM and through my coursework in the MA program. While I won’t be on campus every day, I will still be around once a week for courses as I continue to work on my doctoral studies in the College of Education and Human Development.

I have a few weeks before I start my new position, and I find I have to pinch myself every once in a while. Growing up just outside of Milwaukee, I never imagined that I could end up working at a place like the Library of Congress. I couldn’t be more excited about the future.

New Omeka Zotero Plugin, or “peanut butter in my chocolate”

You know those Reese’s commercials where two people crash into each other on a street corner, one eating a chocolate bar and the other gulping down handfuls of peanut butter right out of the jar? They collide and mix the peanut butter and chocolate together, and then realize how fantastic the combination is. Well, the open source scholarly software equivalent of that happened today. Thanks to Jim Safley for the launch of the new Zotero Import Plugin for Omeka. He did a great job of explaining it on the Omeka blog, but I wanted to take a few moments to explain why getting some Omeka on your Zotero and some Zotero in your Omeka is such a neat thing.

Zotero Just Became a Publishing Platform
There are a lot of scholars with tons of interesting materials inside their Zotero libraries. For example, I have 120 TIFFs of postcards from my book on Fairfax County inside my Zotero library. Zotero’s website has become a great platform for sharing and collaborating with folks to build out those collections, but it’s not really a platform for publishing them. Further, it is definitely not a platform for showcasing the often fascinating image, audio, and video files associated with those items. By installing this plugin on an Omeka site and pointing it at the collection you want to publish, you can quickly migrate the content. You can then play with and customize an Omeka theme and push out a great-looking, extensible online exhibit.

Omeka Just Got A Tool For Restricting And Structuring Data Entry
On the other side, folks interested in building an Omeka archive just got a very potent way to manage building their collections. One of Omeka’s strengths is its highly flexible data model. Its ability to let you create item types and manage data schemas is fantastic. With that said, there are times when you actually don’t want all of that flexibility. It can be a bit overwhelming, particularly when you have a large group of people trying to do data entry and add files. Now, if Zotero’s default item types work for your archive, you can simply have anyone who is going to add to the archive install Zotero and join your group. In this capacity, Zotero becomes a drag-and-drop UI for adding items and files to an Omeka exhibit. Once everything is in, you can simply import all the info into your Omeka exhibit.

Outreach and Scholarly Software

A few months ago I had the distinct pleasure of sharing some of my experiences and thoughts on outreach and community building for scholarly software projects with the One Week One Tool team as part of the first two days of the summer institute. I was excited at the prospect of sharing my experience, but intimidated by the fact that so many of the participants had already done a considerable amount of work in this field. I like to think our outreach conversation went well!

With a little bit of time distancing myself from the actual event, I thought I would work through some of the ideas I put forward to the group. My goal here is twofold: first, I would like to share some of our discussion of software and community with folks who were not a part of the event, and second, I would be interested to hear from the participants about how my ideas did or didn’t resonate with the work they engaged in on Anthologize. So below you will find 5 principles and 5 roles I see as critical to scholarly software outreach. I don’t by any means claim to have invented anything here; I am just trying to share my thoughts on practices. If you’re interested in a more extensive rundown of the project I would suggest Tom’s interim report.

Trevor’s 5 Outreach Principles

  1. Outreach sounds like it starts at the end, but it should be a bit more ever-present. It starts with a conception of audiences. Who are the end users? Who can I get to promote this to those end users? Make this part of your upfront planning process.
  2. Understand outreach as a value proposition for your end users. The more time and energy it takes to get your tool to do whatever it is supposed to do, the better its payoff should be. You’re competing for end users’ attention and time. Those are scarce resources. Your tool’s site should establish the problem the tool solves and why this is a great way to solve that problem, as concisely as possible.
  3. If possible, leverage existing communities. At this point, whatever you are trying to work on has already been worked on. Are there some interesting open standards that have some solid work behind them? Are there some abandoned tools whose users you could pick up? If you can offer a clear value proposition to an existing group, you don’t need to start from scratch.
  4. Spending time convincing people who convince people to use your tool can be far more effective than spending time convincing people to use your tool. This has always been the strategy on Zotero. The most effective parts of our outreach have been developing workshops to train folks to use the tool. There is a value proposition here too. If a tool really makes it, being able to teach someone how to use useful software is a credential.
  5. Look more reputable. People are scared that software will eat their data. Aside from making sure that doesn’t happen, you should try your hardest to make people feel like you are going to stick around for a while. This means that the design of your site matters. Things that look slick, that have active news feeds, that identify who funded them and what the plan for the future is are going to make folks more comfortable. Having a solid reputation is a great goal, but you need to start somewhere, you might even consider connecting the project with things people trust.

Trevor’s Five Components of Outreach

I like to think about outreach as building and engaging with existing communities of software users, evangelists, and potential code contributors. In this view, outreach involves at least five very different roles/tasks. Other folks might cut these up and organize them differently; in a big project there could be a range of folks taking on these roles, while in a small project they might all fall to the same person.

  1. Usability: If people are going to use the software you need to get a sense of how folks understand it. In my experience, starting with something like user personas in the initial design phases and pushing out functional software for public use and testing is great. In other cases it might make sense to do much more extensive testing.
  2. Marketing and publicity: This part is what a lot of folks think of as the core outreach activity. Spreading the word about your tool, trying to get it mentioned in the right places, giving talks at the kinds of meetings and conferences your potential users go to, getting coverage of your tool in the kinds of publications your potential users read. This is great stuff; you can’t ignore it. On the most basic level it’s about crafting a message and getting it out there through a blog or news section. If you pitch your story right you could get picked up by some real big-deal blogs and generate some traffic to your tool.
  3. User Guides and Documentation: After taking a quick look at your homepage, the next stop for a lot of users is going to be your documentation. I think one of the biggest problems I see in different projects is that folks think of documentation as technical information and not as part of their outreach efforts. No matter how much you invest in getting people to visit your site, or how much you put into an attractive homepage, it can all be for naught if you don’t make it as easy as possible for people to figure out how to do what they want to do with your tool. The key point here is that documentation is not about describing what the software does; it is about telling people how they can do what they want to do with your tool. While you might think the homepage of your site is the front door, remember that web search means any page could be the first page someone sees.
  4. User Advocacy: Once a project has users, it also has people outside of the development team who want the tool to do something. Now, a lot of tools don’t ever get here, but when they do it is critical to try to roll their interests and voice into the project. If you want users to stay around and become advocates, you need to have someone advocating for their needs.
  5. Sustainability: It’s a great word, and in the world of grant-funded work it is, for good reason, THE big idea. If anyone is going to fund your project, they want to know that they are not signing up to provide you with funding till the end of time. It is easy to gesture toward a user community as part of a sustainability plan, but it is much more difficult to turn people into users, users into promoters and coders, and those folks into a network that ultimately ensures a sustainable future for a tool.

So what do you think? Is this a reasonable way to think about outreach and scholarly software? Are there things missing from the picture? Are there things in here that you think shouldn’t be viewed as part of Outreach? For folks who participated in the One Week event, how did or didn’t these ideas come into play with Anthologize?

Works that shaped my ideas on users, software and community
  1. Brown, D. (2006). Communicating Design: Developing Web Site Documentation for Design and Planning. New Riders Press.
  2. Garrett, J. J. (2002). The Elements of User Experience: User-Centered Design for the Web. Peachpit Press.
  3. Jin, L., Robey, D., & Boudreau, M. C. (2007). Beyond Development: A Research Agenda for Investigating Open Source Software User Communities. Information Resources Management Journal, 20(1), 68-80.
  4. Krishnamurthy, S. (2005). Launching of Mozilla Firefox – A Case Study in Community-Led Marketing. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.687