Is Literature Data?

Generally, whether literature is data depends on how one defines data. If data is simply information that can be quantified or analyzed in some way, then literature absolutely fits that definition. Data is not limited to scientific observations, mathematical figures, or graphs; media can be considered data as well. Music, literature, even paintings can be subjected to all sorts of analyses that generate data, both quantitative and qualitative.

Marche’s article calls the analysis of literature as data “distant reading.” While he argues that this approach ruins the reading experience as we know it, I believe it is instead a different and valuable sub-discipline of literary study. Distant reading, or macroanalysis, gives one a multidimensional understanding of a work: by treating the book at a more holistic level, one can understand its context in a larger literary ecosystem (its period, its cultural significance, and so on). Looking at literature objectively also reveals writing styles, forms, and conventions. Temporarily setting aside subjective plot or thematic analysis and examining the mechanical details of a text opens it up to an entirely different type of scholarship, namely the digital humanities. This additional perspective on the same work should be welcomed and valued.

The projects studied in the course improve the quality of literary scholarship; they are tools that give us a perspective beyond the reach of unassisted brainpower. Especially with large volumes of text, distant-reading tools can compile word patterns, trends, and more almost instantly, and present them in a form that is easy to digest. In this sense, these projects augment reality. They give us “superpowers” of analysis, allowing us to access an entire history of literature and academia instantly, which would otherwise be impossible.
The most obvious value in using digital tools to analyze literature as data is that it allows us to handle large volumes of information much more easily and efficiently.
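To make “compiling word patterns” concrete, here is a minimal sketch of the most basic distant-reading operation: counting how often each word appears in a text. It is only an illustration under stated assumptions. The function name `word_frequencies` and the simple regular-expression tokenizer are hypothetical choices of mine, not the method of any tool studied in the course.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Return the top_n most common words in text (hypothetical helper).

    Assumes a naive tokenizer: lowercase the text, then treat every run of
    letters and apostrophes as one word. Real tools handle far more cases.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Counter.most_common lists ties in the order words were first seen.
    return Counter(words).most_common(top_n)

sample = "It was the best of times, it was the worst of times."
print(word_frequencies(sample, top_n=3))  # [('it', 2), ('was', 2), ('the', 2)]
```

Run over an entire novel or a shelf of novels instead of one sentence, the same few lines produce in seconds a tally that would take a human reader days.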

Literature as data?

Whether literature is data is a controversial question, and the answer depends on whom you ask. To answer it, one must first define data. Data has no fixed definition; what counts as data varies with the kind of data in question. If data is a set of quantitative points that can be analyzed, then I argue that everything is data, and therefore literature is data. But I don’t believe that treating literature as data is ‘the end of the book as we know it,’ as Stephen Marche believes. Treating literature as data, or distant reading literature and other forms of writing, adds to the experience one can gain from reading. Moreover, distant reading is entirely optional, a practice one can abstain from if one chooses. In that way, books take on a multidimensional character: a literary scholar can simply analyze the text of a novel, while a digital scholar analyzes the word count and word frequencies of that same novel. The two could reach significantly different conclusions about the novel’s meaning, but this adds to, rather than detracts from, the creativity the author put into it. Distant reading and treating literature as data can only add to the experience of reading, and can give us a grasp of ideas that could not have been discovered with human brainpower alone.

Tools like Google N-Gram, or text analyses of J.K. Rowling’s novels, apply distant reading to the data in literature to reveal complex patterns that carry real meaning. Projects like these, especially Google N-Gram, augment scholarship by analyzing datasets far too large for one person, or even whole universities of people, to analyze on their own. By digitizing and searching over 6% of all literature ever published, N-Gram gives us insight into periods of history when record-keeping took place only in literature.
It gives us a holistic view of historical periods that, in the past, could only be achieved by reading as many books from that time as possible. Now we have libraries upon libraries at our fingertips.
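Google’s Ngram Viewer charts how often a phrase appears in each year’s books, expressed as a share of all phrases of the same length published that year. The sketch below imitates that idea on a tiny, made-up corpus so the calculation is visible. Everything here is an assumption for illustration: the function names, the two-year toy corpus, and the plain whitespace tokenization are hypothetical, and this is not Google’s actual pipeline.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return an iterator of consecutive n-word tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def relative_frequency_by_year(corpus, phrase):
    """Share of all n-grams in each year's texts that equal `phrase`.

    corpus: dict mapping year -> list of document strings (hypothetical data).
    Tokenization is a plain lowercase whitespace split, so punctuation is
    not handled.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, docs in sorted(corpus.items()):
        counts = Counter()
        for doc in docs:
            counts.update(ngrams(doc.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result

# Toy, invented corpus: two "books" per year.
corpus = {
    1850: ["the steam engine changed the world", "the railway reached the town"],
    1950: ["the television changed the world", "the radio and the television"],
}
print(relative_frequency_by_year(corpus, "the world"))
```

Dividing by the year’s total, rather than reporting raw counts, keeps years with many more surviving books from dominating the comparison.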