As we began our work on the computational side of things, we realized that using quantitative tools for qualitative purposes is not as simple or trustworthy as it might originally seem. What happens when the temporal distance from our period of focus causes a lack of quality data? How can we use modern tools for pre-modern research? What information are we able to garner from computational analysis?
To further ground ourselves in this area, we started exploring some works that concern digital humanities research:
- Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, no. 3, Mar. 2019, pp. 601–39. https://doi.org/10.1086/702594.
- Moretti, Franco. Distant Reading. Verso, 2015.
- Moretti, Franco. “Operationalizing.” Stanford Literary Lab, Pamphlet 6, 2013. https://litlab.stanford.edu/LiteraryLabPamphlet6.pdf.
- Underwood, Ted. Distant Horizons: Digital Evidence and Literary Change. University of Chicago Press, 2019. https://doi.org/10.7208/chicago/9780226612973.001.0001.
Franco Moretti coined the term “distant reading.” He wanted to take a broad approach to literature in order to surface patterns that traditional “close reading” might miss. Because much of his work came at the very beginning of the digital humanities field, not all of it is computationally advanced. Compared to Ted Underwood’s work, Moretti’s analyses lean more toward caution than toward the bleeding edge of data science. He includes several essays in the book responding to critiques of his work, which sheds useful light on the newness and uncertainty of the field and its methodology.
In one of his essays, Moretti creates a character map of Hamlet and notes the difficulty of choosing how to measure the strength of relationships between characters. While a machine might decide based simply on how often characters share the stage, or how much they talk to each other, a literary scholar might also weigh the friendliness of those relations. Moretti then explores the interconnectivity of the network and looks at what happens when you remove someone, such as the titular character or his friend Horatio. His famous maps were created by hand, a far cry from machine learning, but they represent a form of analysis that, while technically simpler, can be more readily backed by evidence.
We tend to see numbers and data and assume they are more trustworthy than what might be construed as opinion. But even when creating numerical or quantitative evaluations of data, we bring in our own biases and dispositions, which makes it difficult to judge one type of research or data analysis as better than another. Moretti himself acknowledges how hard it is, even for humans, to draw distinctions within the ambiguity of literature, such as the connections between characters in a play, as he discusses in a Stanford Literary Lab pamphlet on operationalizing. Regardless, Moretti has done important work for the digital humanities and raises many important questions for literary studies at large. Unlike later researchers such as Underwood, Moretti seems to be asking what distant reading can do and exploring its possibilities, rather than starting from a baseline assumption of the authority of these computational methods.
In another groundbreaking work on the digital humanities, Underwood contends that distant reading techniques, specifically statistical models, can uncover “genuinely new objects of knowledge” for yet-unnamed historical patterns (Underwood 4). He stresses the importance of complementing traditional practices like close reading with newer computational methods instead of fearing that the latter will displace the former. For Underwood, “consensus becomes elusive” when formulating broad historical frameworks in traditional literary scholarship, but distant reading can garner evidence for pinpointing and assessing historical trends because it allows a large number of texts to be analyzed together (Underwood 9). His work focuses particularly on modeling genre: the probability that a given text will be classified as, say, fiction or nonfiction. See the Limitations in Digital Humanities Research page for a brief comment on one of his studies in this book.
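To make the idea of “genre as a probability” concrete, here is a toy sketch of a classifier that scores how likely a passage is to be fiction. This is an illustration of the general technique, not Underwood’s actual pipeline or data: the four training passages and their labels are invented, and a real study would use thousands of labeled texts.

```python
# Toy sketch: modeling genre as a probability rather than a hard label.
# The training passages below are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "she walked slowly through the dream, her heart pounding",  # fiction
    "the dragon turned, and the knight raised his sword",       # fiction
    "the population of the city grew by four percent in 1850",  # nonfiction
    "this report summarizes the annual trade statistics",       # nonfiction
]
train_labels = ["fiction", "fiction", "nonfiction", "nonfiction"]

# Bag-of-words features feeding a logistic regression classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# predict_proba returns a probability for each class, not a verdict.
probs = model.predict_proba(["the knight drew his sword and ran"])[0]
for label, p in zip(model.classes_, probs):
    print(f"{label}: {p:.2f}")
```

The point of the probabilistic output is that a text can sit anywhere on a continuum between genres, which matches Underwood’s framing better than a binary yes/no decision would.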
For a perspective from scholars who oppose aspects of the digital humanities, we read Nan Z. Da’s critique of computational literary studies (CLS), specifically those papers that she thinks lack or misrepresent results. While our investigation does not focus specifically on literature as a genre, we are interested in applying computational tools to examine metaphorical and ethical language. According to Da, current tools for CLS are “just fancier ways of talking about word frequency changes” (Da 607), which leads scholars to be “prone to fallacious overclaims or misinterpretations of statistical results…[by] making claims based purely on word frequencies without regard to position, syntax, context, and semantics” (Da 611). In the context of our work, we attempted to account for semantics and context via word embeddings, and we are careful not to claim any decisive interpretation, only speculative guidance for further exploration.
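A small sketch can show why embeddings answer part of Da’s frequency objection. The three-dimensional vectors below are invented for illustration (real embeddings are learned from a corpus and have hundreds of dimensions): even if all three words appeared at identical rates in a corpus, the geometry of the vectors can still distinguish related senses from unrelated ones.

```python
# Illustration with invented vectors, not real trained embeddings:
# embedding similarity captures relatedness that raw counts cannot.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings: "mercy" and "pity" sit near each other in the
# space, while "ledger" points elsewhere, regardless of how often each
# word occurs.
vectors = {
    "mercy":  [0.9, 0.1, 0.2],
    "pity":   [0.8, 0.2, 0.1],
    "ledger": [0.1, 0.9, 0.7],
}

print(cosine(vectors["mercy"], vectors["pity"]))    # high: related senses
print(cosine(vectors["mercy"], vectors["ledger"]))  # low: unrelated senses
```

In our actual work the vectors come from trained embedding models rather than being hand-set, but the comparison step is the same cosine measure shown here.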
Moreover, Da assumes that computational inquiry is only powerful when it involves hypothesis testing: “The power of a statistical test comes from having meaning and setting up a null/alternative hypothesis that’s informative” (Da 619). Because she sees CLS tools as unable to test a meaningful hypothesis beyond word frequency, she concludes that not all patterns found in unknown data are “automatically worthy of attention” and that “literature—in particular, reading literature well—is that cut-off point” for the effectiveness of computational methods (Da 639). In response, we think that data exploration should examine any and all patterns that emerge, because we should not approach the data with assumptions about what is “worthy” or not. That the tools and approaches are not well developed now does not mean they will not improve with continued research and development. The very complexity of the humanities is precisely why one should seek out and test new methods.
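For readers unfamiliar with the kind of informative test Da has in mind, here is a minimal example with invented counts: a chi-square test of whether a word occurs at a different rate in two corpora, where the null hypothesis (“the rate is the same in both”) is explicit and falsifiable.

```python
# Toy example (all counts invented): does the word "mercy" occur at a
# different rate in corpus A than in corpus B? Null hypothesis: the rate
# is the same in both corpora.
from scipy.stats import chi2_contingency

# Rows: corpus A, corpus B; columns: hits for "mercy", all other tokens.
table = [
    [120, 49880],  # corpus A: 120 hits out of 50,000 tokens
    [60,  49940],  # corpus B:  60 hits out of 50,000 tokens
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value here would license only a narrow claim, that the observed frequency difference is unlikely under the null, which is exactly the kind of bounded, checkable statement Da argues CLS should aim for.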
These works are just a sampling of the field; there are many other pieces of scholarship in this area, some producing more reputable results than others. As such, we have brought a critical lens to what the computational methods we used can do and what they cannot. The digital humanities are a new and emerging field without clear procedural guidelines to follow, so we have had to figure out a lot as we go and take our results with a grain of salt. All in all, we found distant reading techniques more helpful for identifying specific texts and directions for close reading than for making sweeping claims about a time period from which we are far removed and for which we lack complete data. In any case, we are excited to see how this field grows and changes over time.