Sure. But it isn’t just data. Literature, like anything that can leave an impression on someone, is greater than the sum of its parts. But as powerful as it is, literature is certainly made of information, in the same way that people are made of cells. A novel is just a sequence of words and symbols, information that was printed and distributed so that someone could read it and be moved by it. It wouldn’t even be fair to say a particular book is a novel in the truest sense. When my friend talks with me about The Hobbit, he isn’t referring to the battered tome gathering dust on my bookshelf back home. He is referring to the story those weathered pages conveyed to me, as another set of pages did for him. The Hobbit itself is just a very specific idea, an object without a body. Transmission of information is literature’s modus operandi, its only way to exist.
So of course data is an integral part of literature. We can study literature as information without compromising its deeper meaning, just as we can study biology without compromising our identities as people. New approaches to literature, like distant reading, allow us to view this information in ways that are only just now becoming possible. We can observe trends across entire genres, something that has been attempted before but has never truly been feasible. Even the most well-read scholar could only possibly have personally read a fraction of any major genre. This scholar can only claim to know about the genre in general by learning what other scholars think about its remaining items, filtered through all of their biases and misrecollections. This isn’t true knowledge of the genre, just a copy of a copy of a copy. Scholars make their best inferences based on what they know. When they are drawing from a sufficiently large data pool, we can assume that their conclusions are reasonable enough. But with distant reading we may at last be able to conclusively say what characterizes a gothic novel, or the zeitgeist of the 1920s, or the answers to any number of other questions that we can’t properly conceive of just yet.
We can now identify a pen name as belonging to J.K. Rowling through forensic analysis. This probably seems cold and calculating to some, but it is just a new way of looking at information that people notice all the time. Most people can do an impression of a friend’s style of speaking, or correctly identify a book’s author by reading a few passages. We recognize linguistic trends. We don’t do this in an effort to break the world down into data, but there are patterns in language that we notice without any particular effort. Now that we are turning a more rigorous analytical eye on literature, we may be able to finally put a name to those ineffable qualities that make up style.
The process of distant reading may seem inane now, since it’s producing things like statistics on the usage of the word “the”. But it shows that patterns exist which we have not been able to see before, and that we can develop the tools to make sense of them. As we get more familiar with the process of examining large bodies of literature, we should expect to find fascinating patterns which we may not have thought to look for 15 years ago, or even have the vocabulary to describe today. The conception of literature as data is not a threat to literary tradition. It is a tool for augmenting reality.