Posted on 25/04/2018 by Human Data Science
Categories: Colloquium.

03/05/2018 – Digital Humanities and Text Mining: Stylistic and Intertextual Analysis of Large Corpora

Paul’s presentation and code can now be found on the MSDSlab GitHub page.

Original post:

Thursday 03/05/2018 at 15:30 in room B1.09

Paul Vierthaler, a university lecturer in Digital Humanities at Leiden University, will discuss the methodological approaches he takes in his research on late Imperial Chinese literature. Paul studies the relationships among historical and fictional documents written in late Ming and early Qing China (1550 to 1700) at the corpus level. To do this, he uses a variety of methods developed by linguists, computer scientists, and biologists. In his talk, Paul will cover stylometric analysis and an intertextuality detection algorithm based on the bioinformatics algorithm BLAST (Basic Local Alignment Search Tool). While the talk will ground the methodology in specific research questions, it will mainly focus on describing his approach to blending information retrieval with literary studies.

This talk will start 30 minutes later than our regular starting time!

Preparation: These are some suggested, but not essential, readings:

A piece on an early iteration of the project Paul will be discussing: http://culturalanalytics.org/2016/05/fiction-and-history-polarity-and-stylistic-gradience-in-late-imperial-chinese-literature/
A discussion of vector space models: Peter Turney and Patrick Pantel, “From Frequency to Meaning: Vector Space Models of Semantics,” Journal of Artificial Intelligence Research 37 (2010): 141-188
A discussion of using PCA to study stylometry: JNG Binongo and MWA Smith, “The Application of Principal Component Analysis to Stylometry,” Literary and Linguistic Computing 14 (1999): 445-466
A blog post on the Programming Historian that introduces stylometry and how it is done in Python: https://programminghistorian.org/lessons/introduction-to-stylometry-with-python
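
For readers new to the BLAST-style approach mentioned in the abstract, here is a rough Python sketch of the general "seed and extend" idea: find short exact character matches shared by two texts, then grow each match for as long as the texts keep agreeing. This is only an illustration and not Paul's actual algorithm. The function names, the seed length `k`, and the `min_len` threshold are assumptions made for this example, and real BLAST-like tools also tolerate mismatches and gaps, which this sketch does not.

```python
from collections import defaultdict


def index_seeds(text, k):
    """Map every k-character substring of `text` to the positions where it starts."""
    seeds = defaultdict(list)
    for i in range(len(text) - k + 1):
        seeds[text[i:i + k]].append(i)
    return seeds


def find_shared_passages(a, b, k=4, min_len=6):
    """Return (start_in_a, start_in_b, length) for exact shared runs of at least min_len characters.

    Hypothetical helper for illustration only: exact matching, no gaps or mismatches.
    """
    seeds_b = index_seeds(b, k)
    covered = set()  # seed positions (i, j) already inside an extended match
    matches = []
    for i in range(len(a) - k + 1):
        for j in seeds_b.get(a[i:i + k], ()):
            if (i, j) in covered:
                continue
            # Extend the seed in both directions while the characters keep matching.
            start_a, start_b = i, j
            while start_a > 0 and start_b > 0 and a[start_a - 1] == b[start_b - 1]:
                start_a -= 1
                start_b -= 1
            end_a, end_b = i + k, j + k
            while end_a < len(a) and end_b < len(b) and a[end_a] == b[end_b]:
                end_a += 1
                end_b += 1
            length = end_a - start_a
            # Mark every seed inside this run so it is not reported again.
            for offset in range(length - k + 1):
                covered.add((start_a + offset, start_b + offset))
            if length >= min_len:
                matches.append((start_a, start_b, length))
    return matches


passages = find_shared_passages("the quick brown fox jumps", "a lazy quick brown fox sleeps")
print(passages)  # [(3, 6, 17)]
```

The single result says that 17 characters starting at position 3 of the first string also appear starting at position 6 of the second. Working on characters rather than whitespace-separated words is a deliberate choice here, since Chinese text is not split by spaces.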
