
Topic Modeling

What is Topic Modeling?

The topic model result provides more information about the themes of the seventeenth century. A topic is a cluster of words that frequently co-occur. Assuming that each document draws on a finite number of topics, we can use algorithms to group the words into topics.

There are many algorithms and tools available to perform topic modeling, including Mallet, LDA, and SeedLDA. For this project, we applied the LDA (Latent Dirichlet Allocation) package in R. The LDA algorithm makes two assumptions:

    1. Distributional hypothesis: similar topics make use of similar words
    2. Statistical mixture hypothesis: documents talk about several topics for which a statistical distribution can be determined (Dirichlet Distribution)
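The two assumptions can be illustrated with a small sketch of the generative story behind LDA. This is not the project's R code. It is a dependency-free Python illustration in which the vocabulary, the two hand-written topics, and the names `sample_dirichlet` and `generate_document` are all hypothetical. Each document first draws its own topic proportions from a Dirichlet distribution, then each word is produced by picking a topic and then a word from that topic.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word_probs, vocab, alpha, length, rng):
    """Generate one toy document following LDA's generative assumptions."""
    num_topics = len(topic_word_probs)
    # Statistical mixture hypothesis: the document gets its own topic proportions.
    theta = sample_dirichlet([alpha] * num_topics, rng)
    words = []
    for _ in range(length):
        # Choose a topic for this word position, then a word from that topic.
        topic = rng.choices(range(num_topics), weights=theta)[0]
        word = rng.choices(vocab, weights=topic_word_probs[topic])[0]
        words.append(word)
    return theta, words

# Hypothetical toy vocabulary and topics (illustrative only, not project data).
VOCAB = ["god", "scripture", "king", "court"]
TOPIC_WORD_PROBS = [
    [0.45, 0.45, 0.05, 0.05],  # a "religion"-like topic
    [0.05, 0.05, 0.45, 0.45],  # a "monarchy"-like topic
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
theta, words = generate_document(TOPIC_WORD_PROBS, VOCAB, alpha=0.5, length=12, rng=rng)
print("topic proportions:", theta)
print("generated words:", words)
```

Fitting LDA to a real corpus runs this story in reverse: given only the words, the algorithm estimates the topic proportions and the word distribution of each topic.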

As a result of those assumptions, LDA also has certain limitations, including:

    1. The Dirichlet topic distribution cannot capture correlations between topics
    2. The number of topics is fixed and must be chosen in advance
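The first limitation can be checked numerically. Under a Dirichlet distribution with parameters a_1, ..., a_K and a_0 = a_1 + ... + a_K, the covariance between any two topic proportions is -a_i a_j / (a_0^2 (a_0 + 1)), which is always negative. The prior therefore cannot express that two topics tend to appear together. The sketch below, in Python with a hypothetical three-topic setup, compares that formula with an empirical estimate; it is an illustration, not part of the project's analysis.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def covariance(xs, ys):
    """Sample covariance of two equally long lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical symmetric prior over three topics (illustrative values).
ALPHAS = [1.0, 1.0, 1.0]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
samples = [sample_dirichlet(ALPHAS, rng) for _ in range(5000)]
share_topic_0 = [s[0] for s in samples]
share_topic_1 = [s[1] for s in samples]

empirical_cov = covariance(share_topic_0, share_topic_1)

# Closed-form covariance of two Dirichlet components.
alpha_sum = sum(ALPHAS)
theoretical_cov = -ALPHAS[0] * ALPHAS[1] / (alpha_sum ** 2 * (alpha_sum + 1))

print("empirical covariance:", empirical_cov)
print("theoretical covariance:", theoretical_cov)
```

Both values are negative: a larger share for one topic only ever leaves less room for the others, whatever the actual relationship between the themes.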

This raised an interesting question about the role of computational tools in digital humanities research. While statistical models capture insights more efficiently than a human reader, they are also prone to bias and error due to model limitations. Therefore, we hope to combine qualitative insight from reading with the quantitative, computational results.

Topic Modeling Result

Figure 1: top tokens, 1660 – 1666
Figure 2: topic model result, 1600 – 1700

Combining the quantitative topic modeling result with qualitative understanding from our reading, we constructed a topic lexicon including:

  • Religion/spirituality: household (of God), temple, destiny, creator, genesis, prophecy, God, Scripture, Holy, sacred, Determinism, Spirit, deity, creation, moral, Bible, Doctrine
  • Monarchy: King, Queen, Loyalty, court
  • Wealth: goodly, wealthy, grandeur, chaste, wigs, rich, ruffles, coaches, luxury, fine, trifles, cakes, nuts
  • Science: nature, materialism, Atheism, science, reason, matter, vitalism, will, secular
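One way a lexicon like this might be put to work is as a lookup table for tagging passages. The sketch below is an assumption about usage, not a description of the project's method: it takes an abbreviated, lower-cased subset of the single-word terms above and counts how many tokens of a passage fall under each category. The sample sentence and the names `TOPIC_LEXICON` and `count_lexicon_hits` are hypothetical.

```python
import re
from collections import Counter

# Abbreviated subset of the lexicon above: lower-cased, single-word terms only.
TOPIC_LEXICON = {
    "religion": {"temple", "prophecy", "god", "scripture", "holy", "sacred", "bible", "doctrine"},
    "monarchy": {"king", "queen", "loyalty", "court"},
    "wealth": {"wealthy", "grandeur", "rich", "coaches", "luxury"},
    "science": {"nature", "materialism", "atheism", "science", "reason", "matter"},
}

def count_lexicon_hits(text, lexicon):
    """Count how many tokens in the text belong to each lexicon category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, terms in lexicon.items():
            if token in terms:
                counts[category] += 1
    return counts

# Invented sample sentence (not taken from the corpus).
sample = "The King and Queen held court, and spoke of God and Scripture."
hits = count_lexicon_hits(sample, TOPIC_LEXICON)
print(hits)
```

Exact string matching of this kind misses spelling variants and multi-word terms such as "household (of God)", so any counts it produces would still need to be read alongside the texts themselves.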