In my opinion, data encompasses everything that can be considered information. Information varies in type and scope, and by looking at it from different angles we can reach different conclusions about its nature. Distant reading, or “macroanalysis,” looks past the minutiae of individual works toward a more general understanding of a larger class, such as a genre or time period [1]. Just as close, detailed reading has its merits in understanding the implications of a particular work, macroanalysis can help us understand that work’s context. Take, for example, the use of macroanalysis to identify J.K. Rowling as the author of the crime novel “The Cuckoo’s Calling” [2]. The book was released under a pseudonym, but by comparing it to Rowling’s other books using macroanalysis techniques, such as comparing word length and word adjacency, analysts were able to identify her as the author. Projects like Google N-Grams and roadtrip maps are useful because they provide visual context for a large amount of data. As a result, we can see relationships that would not be so easily spotted in close reading. In the n-gram project, we can see how words are used, and how they relate to one another, across time periods in literature. We can draw conclusions from the use and disuse of a word over time, like the rise in the use of the word “cities” during the industrial period. Projects like these augment scholarship in terms of scope. They allow us to step further back and approach genres rather than particular pieces of literature. I don’t think they necessarily augment reality, but they do provide a new way of visualizing it.
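To make the idea of comparing word length concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method actually used in the Rowling case: the function names `word_length_profile` and `profile_distance` are hypothetical, the sample sentences are invented placeholders, and real authorship studies combine many more features and far longer texts.

```python
import re
from collections import Counter


def word_length_profile(text, max_len=12):
    """Return the share of words at each length 1..max_len.

    Words longer than max_len are counted in the last bucket.
    """
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * max_len
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def profile_distance(text_a, text_b):
    """Sum of absolute differences between two word-length profiles.

    0 means identical profiles; 2 means the profiles share no word lengths.
    """
    a = word_length_profile(text_a)
    b = word_length_profile(text_b)
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Invented placeholder sentences, not quotations from any real book.
    known_author = "The rain fell softly on the old stone bridge that night."
    disputed_text = "A cold wind blew across the quiet street before dawn."
    print(profile_distance(known_author, disputed_text))
```

A smaller distance suggests more similar word-length habits, but on its own this is weak evidence. It is best read as one signal among many.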
[1] http://www.matthewjockers.net/2011/07/01/on-distant-reading-and-macroanalysis/
[2] http://www.telegraph.co.uk/culture/books/10178344/JK-Rowling-unmasked-as-author-of-acclaimed-detective-novel.html
Is literature data? Yes and no.
From a layman’s perspective, no. The fact is that nobody except critics of digital humanities has ever seriously considered this question. We were all brought up believing that every piece of literature we read and see, whether fiction or non-fiction, poetry or prose, is a work of art that reflects the author’s intentions, aspirations and quite possibly hidden philosophical ideas. To consider a literary work just an accumulation of written language symbols, alphabetical letters, individual soundwaves or paint strokes seems absurd to many of us, and fairly so. Literature is in a sense not data because analyzing this art by purely scientific means takes away the most significant aspects of a literary work: the author’s artistic and creative elements. These elements simply cannot be “measured” and summarized by computer algorithms the way data can. In fact, a single literary work might garner thousands upon thousands of different interpretations, each with its own unique analytic aspects, whereas data, with its ultra-clear structure and quantitative properties, might yield only one result. Stephen Marche, in his critique “Literature is not Data,” even goes as far as to claim that “the story of literature [regarded as] data is a series of…failures.” [1] Therefore, from this perspective, literature cannot strictly be considered data.
We also, however, have to accept that literary works, though open to myriad interpretations, are at the most basic level still constructed from numerous individual elements that ultimately come together, and that those elements can be analyzed one by one to find principles and relationships within a work. Franco Moretti advocated the notion of “distant reading”, which means that we should interpret written literature not by studying specific texts, but “by aggregating and analyzing massive amounts of data.” [1] This may seem a radical proposal, but it is meaningful and a valuable exercise to engage in. Readers have long segmented literary works into parts according to their many meanings, such as genres, chapters, settings, plot details, characters (protagonist, antagonist, etc.), archetypes, symbols, and the list goes on. Furthermore, as books and similar works are published ever more prolifically, it becomes increasingly hard to read and analyze all written text. Data accumulation software and websites such as Google Ngram [2] and Understanding Shakespeare [3] have opened up opportunities for people to achieve their research objectives without having to sift through piles of books. With these new abilities, the digital humanities and its many revolutionary aspects (e.g., distant reading) have practically “augmented” scholarship and reality alike by creating new paths for research, interpretation and exploration of both classical and newly emerging literary works.
References:
[1] Marche, Stephen, Literature is not Data: Against Digital Humanities, http://lareviewofbooks.org/essay/literature-is-not-data-against-digital-humanities/. Accessed Oct. 2, 2013.
[2] Google Ngram Viewer, Google Inc., http://books.google.com/ngrams. Accessed Oct. 2, 2013.
[3] Understanding Shakespeare, http://www.understanding-shakespeare.com/. Accessed Oct. 2, 2013.
Generally, whether or not literature is data depends on your definition of data. If one classifies data simply as information that can be quantified or analyzed in some way, then literature absolutely fits that definition. Data is not just scientific observations, mathematical figures, or sets of graphs; media can be considered data as well. Music, literature, even paintings: one can perform all sorts of analyses on these works to generate data, both quantitative and qualitative. Marche’s article refers to the analysis of literature as data as “distant reading.” While he argues that this type of approach to reading ruins the experience as we know it, I believe that it is instead a different, valuable sub-discipline of literature. Distant reading, or macroanalysis, allows one to have a multidimensional understanding of a work. Its context in a larger literary ecosystem (period in time, cultural significance, etc.) can be understood by treating the book on a more holistic level. One can understand writing styles, forms, and conventions by looking at literature objectively. Temporarily stepping away from subjective plot or thematic analyses and looking at the mechanical details of literature opens it up to an entirely different type of scholarship, namely digital humanities. This additional perspective on the same work should be welcomed and valued. The projects studied in the course improve the quality of literature scholarship: they are tools we can use to gain another perspective beyond the scope of unassisted brainpower alone. Especially with larger volumes, distant-reading tools can almost instantly compile word patterns, trends, and more, and present them in a way that makes the information easier to digest. In this sense, these projects augment reality. They give us “superpowers” of analysis. They allow us to access an entire history of literature and academia instantly, which would otherwise be impossible.
The most obvious value in using digital tools to analyze literature as data is that it allows us to handle large volumes of information much more easily and efficiently.
Marche, Stephen. “Literature is not Data: Against Digital Humanities.” Los Angeles Review of Books. 28 Oct 2012: n. page. Web. 2 Oct. 2013.
Literature as Data: Expanding our Literary Experiences
October 2nd, 2013
The argument of whether literature is data depends upon the definition of data. Data can carry a negative connotation, as something that removes the artistic and creative elements and turns a work into a quantitative subject. It can also be viewed simply as a form of information, from which we can establish interpretations and analyses that we can learn from. In his article “Literature is not Data: Against Digital Humanities,” Stephen Marche makes several bold statements, claiming that the introduction of digital books has brought about the “…end of the book as we know it”. However, digitizing literature offers us an additional medium through which literature can be experienced, analyzed, and interpreted in different ways. It is not the end of the book as we know it, but rather the expansion of the book as we know it. Literature has always been taken apart: quotes, syntax, characters, plots, symbols, themes, and more have been discussed, interpreted, debated, and given meaning since its origin. Digitizing books doesn’t put an end to this kind of thinking, but rather provides tools that allow us to go even further in depth. Digital tools, such as n-gram, allow us to compare thousands of works of literature in seconds. In as little as a few seconds, computer algorithms let us delve into details that would otherwise take years of collecting and studying to analyze.
Literature has always been data: we have always learned from it and always used it as a tool to examine different elements of writing, the human psyche, cultural reflections, and more. Just because there are new means of examining this data doesn’t mean that the old form no longer exists. The book as we know it today still exists, and personally I prefer reading a hard copy of a book. I can choose not to engage with the digital tools that are being developed and experience books in the more nostalgic form. However, the fact that those digital tools exist gives me the opportunity to expand my knowledge and understanding of a book, if I choose. Digital tools open the door for books to be examined on a large scale, going in depth on details such as word choice while covering a wide sample, which could range from several works to an entire era’s worth of literature.
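As a rough illustration of examining word choice across a wide sample, the Python sketch below computes how often a word appears per year in a small collection. It is a simplified sketch under stated assumptions, not how Google's n-gram tool is implemented: the function name `yearly_frequency` and the `(year, text)` corpus structure are hypothetical, and the sample texts are invented.

```python
import re
from collections import defaultdict


def yearly_frequency(corpus, word):
    """Map each year to the word's share of all tokens from that year.

    `corpus` is an iterable of (year, text) pairs, a hypothetical
    stand-in for a real digitized collection.
    """
    target = word.lower()
    hits = defaultdict(int)    # occurrences of the target word per year
    totals = defaultdict(int)  # all tokens per year
    for year, text in corpus:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for t in tokens if t == target)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y] > 0}


if __name__ == "__main__":
    # Invented sample texts, used only to show the shape of the input.
    sample = [
        (1800, "the farm and the field"),
        (1850, "the city and the cities"),
        (1850, "city city"),
    ]
    print(yearly_frequency(sample, "city"))
```

Dividing by the total number of tokens per year matters because the number of books published changes over time. Raw counts alone would mostly reflect how much was printed, not how word choice shifted.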
Marche, Stephen. “Literature is not Data: Against Digital Humanities.” Los Angeles Review of Books. 28 Oct 2012: n. page. Web. 2 Oct. 2013.