Reading the debate over whether literature is data, I was confused, possibly because the answer seems so obvious to me.  If we think of a book, for example, as a work of literature, then we have to acknowledge that literature is at least sometimes data, since we can store all the relevant information about the book on a computer*.  In fact, any piece of literature that can be kept on a hard drive is data.  The real question is whether we can write computer programs that give meaningful and relevant insights into the works we feed them.

Distant reading has already shown that algorithmic analysis of literature can be fruitful.  It has revealed, for example, that the frequency of the word “the” changes across literary periods[1], and that in the 19th century the Irish were four times more likely than the English to appear in court on trial for their lives[2].  Insights such as these are clearly relevant to their respective fields, so they augment research in the humanities and, by extension, augment reality.
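To make the first kind of finding concrete, here is a minimal Python sketch of the sort of counting distant reading relies on. It is not the method used in [1]; it assumes a hypothetical `corpus` dictionary that maps a period label to a list of plain-text works, uses a deliberately crude regex tokenizer, and reports what share of all words in each period are "the". The example sentences are invented toy strings, not real data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lowercase, then keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(word: str, texts: list[str]) -> float:
    # Share of all tokens across `texts` that are exactly `word`.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def frequency_by_period(word: str, corpus: dict[str, list[str]]) -> dict[str, float]:
    # Hypothetical corpus layout: {period label: [text, text, ...]}.
    return {period: relative_frequency(word, texts) for period, texts in corpus.items()}


if __name__ == "__main__":
    # Toy strings for illustration only; a real study would load full texts per period.
    toy_corpus = {
        "period_a": ["The cat sat on the mat."],
        "period_b": ["A dog ran home."],
    }
    print(frequency_by_period("the", toy_corpus))
```

On a real corpus, the interesting part is comparing these ratios across periods; the tokenizer and the choice of what counts as one "period" would need far more care than this sketch gives them.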

 

*OK, we can usually store all the relevant information about a book.  If the book's physical structure matters, things become more difficult, but I would argue that even in these special cases we either can already, or soon will be able to, fully digitize the book.