Student Report on Linguistics, Humanities, and Data Sciences: Their Intersections and Implications

Reported by Yongkun Vicky Wu, class of 2026

This talk by Yachao Sun, Xiaofei Pan, and Ge Lan on Linguistics, Humanities, and Data Sciences: Their Intersections and Implications was part of the Third Space Lab (TSL) Brown Bag Lunch Research Talk program. The program is broadly associated with research projects related to languages, cultures, and intercultural communication.

The research talk, given by Prof Yachao Sun, Xiaofei Pan, and Ge Lan, was divided into four parts: an introduction to the project, the Data+X research, the Stanza paper, and the progress of the project together with a call for collaboration.

The goal of the project, according to Prof Pan, is to build an institutional learner corpus of Chinese as a second language (CSL) writing at DKU in order to address empirical questions about Chinese learning and to archive student learning outcomes. To achieve this, the researchers need to collect writing samples from students and conduct related interviews. The project draws on interdisciplinary research (linguistics, humanities, and data science) and on corpus and text analysis using computational techniques. It addresses the under-explored question of whether and how translingual practices can facilitate the teaching and learning of CSL writing. It is worth noting that the research applies mixed methods, combining qualitative and quantitative approaches.

One computational tool, Stanza, was specifically mentioned and applied in this research. Because of its recency, accessibility, ease of use, and reported accuracy, the researchers used it to analyze Chinese writing, an area that few studies have attended to. Stanza's performance was evaluated in terms of precision and recall. The researchers found that, despite inaccuracies in some circumstances, Stanza has great potential to support Chinese language research.
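For readers unfamiliar with the two measures, the short sketch below shows how precision and recall are commonly computed when a tool's output is compared against a human-annotated reference. It is only an illustration: the talk did not describe the exact evaluation procedure, and the function name `precision_recall` and the word-boundary data are hypothetical, not taken from the project.

```python
def precision_recall(predicted, gold):
    """Compare a tool's output with a human-annotated reference.

    precision = correct predictions / all predictions
    recall    = correct predictions / all reference items
    """
    predicted_set = set(predicted)
    gold_set = set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical example: word boundaries in a six-character sentence,
# written as (start, end) character offsets. Not data from the project.
gold_spans = [(0, 1), (1, 3), (3, 5), (5, 6)]               # human annotation
predicted_spans = [(0, 1), (1, 3), (3, 4), (4, 5), (5, 6)]  # tool output

precision, recall = precision_recall(predicted_spans, gold_spans)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this made-up case the tool splits one two-character word into two, so three of its five predicted words are correct (precision) and three of the four reference words are recovered (recall).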

Finally, the professors called for future collaboration from DKU faculty and students. As this is a long-term project, the researchers aim to build a larger corpus of writing by students who learn Chinese as a second language. The audience was encouraged to share the project in their classrooms and invite students to participate. Once enough data has been collected to build a sufficiently large corpus, teachers could use it for teaching and research purposes.

In general, the audience showed great interest in this project, especially the application of the computational tool, Stanza, and the potential corpus focusing on DKU students.