Lit 80, Fall 2013

Is Literature Data?

October 4th, 2013 | Posted by Zhan Wu

Is literature data? Yes and no.

From a layman’s perspective, no. Few people other than critics of digital humanities have ever seriously considered this question. We were all brought up believing that every work of literature we read, whether fiction or non-fiction, poetry or prose, is an artistic piece that reflects the author’s intentions, aspirations and quite possibly hidden philosophical ideas. To consider a literary work merely an accumulation of written symbols, letters of the alphabet, individual sound waves or paint strokes seems absurd to many of us, and fairly so. Literature is in this sense not data, because analyzing it purely by scientific means strips away the most significant aspects of a work: the author’s artistic and creative elements. These qualities cannot simply be “measured” and summarized by a computer algorithm. In fact, a single literary work might garner thousands upon thousands of different interpretations, each with its own analytic angle, whereas data, with its clear structure and quantitative properties, might yield only one result. Stephen Marche, in his critique Literature is not Data, even goes as far as to claim that “the story of literature [regarded as] data is a series of…failures.” [1] From this perspective, then, literature cannot strictly be considered data.

We also have to accept, however, that literary works, though open to myriad interpretations, are at the most basic level still constructed from numerous individual elements that ultimately come together, and that those elements can be analyzed one by one to find principles and relationships within a work. Franco Moretti advocated the notion of “distant reading”, which means that we should interpret literature not by studying specific texts, but “by aggregating and analyzing massive amounts of data.” [1] This may seem a radical proposal, but it has merit and is a valuable exercise to engage in. People have long segmented literary works into parts according to their many meanings: genres, chapters, settings, plot details, characters (protagonist, antagonist etc.), archetypes, symbols, and the list goes on. Furthermore, as books and other texts proliferate, it becomes increasingly hard to read and analyze everything that is written. Data aggregation tools and websites such as Google Ngram [2] and Understanding Shakespeare [3] have opened up opportunities for people to achieve their research objectives without having to sift through piles of books. With these new abilities, the digital humanities and its many revolutionary aspects (e.g. distant reading) have practically “augmented” scholarship and reality alike by creating new paths for research, interpretation and exploration of both classical and newly emerging literary works.

References:

[1] Stephen Marche, Literature is not Data: Against Digital Humanities, http://lareviewofbooks.org/essay/literature-is-not-data-against-digital-humanities/. Accessed Oct. 2, 2013.

[2] Google Ngram Viewer, Google Inc., http://books.google.com/ngrams. Accessed Oct. 2, 2013.

[3] Understanding Shakespeare, http://www.understanding-shakespeare.com/. Accessed Oct. 2, 2013.

 

Literature as data?

October 3rd, 2013 | Posted by Sheel Patel

Whether literature is data is a controversial question, and the answer depends on whom you ask. To answer it, one must first define what data is. Data has no single set definition; its meaning varies with the kind of data being discussed. If data is taken to be a quantitative set of points that can be analyzed, I argue that everything is data. Therefore literature is data. But I don’t believe that treating literature as data is ‘the end of the book as we know it’ as Stephen Marche believes. Treating literature as data, or distant reading literature and other forms of writing, adds to the experience one can gain from reading. Distant reading is also completely optional, a practice that one can abstain from if one chooses. In that way, books take on a multidimensional character: a literary scholar can simply analyze the text of a novel, while a digital scholar can analyze the word count and word frequencies of that same novel. The two could come to significantly different conclusions about the meaning of the novel, but this just adds to the creativity the author put into it rather than taking anything away. Distant reading and treating literature as data can only add to the experience of reading, and can give us a grasp of ideas that could not have been discovered with human brainpower alone. Tools like Google N-Gram, or text analyses of J.K. Rowling’s novels, use the idea of distant reading and the data in literature to elucidate complex patterns that carry real meaning. Projects like these, especially Google N-Gram, augment scholarship by analyzing sets of data so large that they would be impossible for one person, or even entire universities, to analyze unaided. By digitizing and searching over 6% of all literature ever published, N-Gram gives us insights into times of history when record-keeping only took place in literature. It allows us a holistic insight into periods of history that could previously be achieved only by reading as many books from that time as possible. Now we have libraries upon libraries at our fingertips.
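
To make the idea of analyzing a novel's word count and frequency concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not the method used by Google N-Gram or any project mentioned above; the function name `word_frequencies` and the one-sentence sample are assumptions chosen for the example, and a real study would load the full text of a novel instead.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs in a text, ignoring case."""
    # Lowercase the text, then treat runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample standing in for the full text of a novel.
sample = "It was the best of times, it was the worst of times."

freqs = word_frequencies(sample)
total_words = sum(freqs.values())  # overall word count
top_three = freqs.most_common(3)   # the three most frequent words

print("Total words:", total_words)
print("Most common:", top_three)
```

Even this tiny count shows what a distant reader looks at: not what the sentence means, but which words recur and how often. Run over thousands of books, the same counting is what lets patterns emerge that no single reader could tally by hand.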