Gerben Zaagsma (C2DH, University of Luxembourg) invited Ralf Futselaar and me to organize a workshop, ‘Introduction to Text Mining in R and RStudio’, at the University of Luxembourg.
Machine-readable text corpora are increasingly available to humanities researchers, as is a growing variety of digital techniques for extracting information from them. The purpose of this introduction to Text Mining and Word Embedding Models in R was to familiarize workshop participants with these techniques and to help them incorporate the techniques into their own methodological workflow. The workshop outlined the basic steps and methodological considerations the participants needed to start their own Text Mining research project.
We used a digitized version of Arthur Conan Doyle’s Sherlock Holmes novels and stories as our test dataset. We discussed downloading, loading, and pre-processing textual data within the R environment. We also paid attention to different analytical text mining techniques, ranging from relatively simple word counts and weighting schemes to analyzing keywords in context and training and using Word Embedding Models.
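To give a flavour of the simpler techniques mentioned above, here is a minimal sketch in base R of word counts, one possible weighting scheme, and keywords in context. It is not the workshop's actual material. The three sentences are invented stand-ins for real documents, the helper names `tokenize` and `kwic` are hypothetical, and tf-idf is assumed here as an example of a weighting scheme.

```r
# Toy corpus: three invented sentences standing in for real documents.
docs <- c(
  a = "Holmes lit his pipe and looked at Watson",
  b = "Watson opened the door and Holmes smiled",
  c = "The door was locked"
)

# Pre-processing: lower-case the text and split on anything that is not a letter.
# (Hypothetical helper; real projects often use a dedicated tokenizer package.)
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens[tokens != ""]
}
tokens <- lapply(docs, tokenize)

# Simple word counts per document.
counts <- lapply(tokens, table)

# Term-frequency matrix: one row per word, one column per document.
vocab <- sort(unique(unlist(tokens)))
tf <- sapply(tokens, function(t) as.vector(table(factor(t, levels = vocab))))
rownames(tf) <- vocab

# tf-idf weighting (assumed scheme): down-weight words that occur in many documents.
df <- rowSums(tf > 0)           # number of documents containing each word
idf <- log(ncol(tf) / df)       # inverse document frequency
tfidf <- tf * idf               # each row of tf is scaled by that word's idf

# Keywords in context: every hit of `keyword` with `window` words on either side.
kwic <- function(tokens, keyword, window = 2) {
  hits <- which(tokens == keyword)
  vapply(hits, function(i) {
    from <- max(1, i - window)
    to <- min(length(tokens), i + window)
    paste(tokens[from:to], collapse = " ")
  }, character(1))
}

print(counts$a)
print(round(tfidf, 2))
print(kwic(tokens$a, "pipe", window = 2))
```

In this sketch a word such as "locked", which occurs in only one document, receives a higher weight than "holmes", which occurs in two. On a real corpus the same steps apply, but stop-word removal and a more careful tokenizer would usually be needed.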