Lots of my research involves building data-sets from web texts. For my PhD research I built a 57-million-word corpus from NZ’s parliamentary debates. The NZ Parliamentary Language Corpus (Version 2; 2016) is annotated to allow comparisons between speakers, between political parties and over time.
I’ve also built corpora for a number of other research projects. For example, for the Mapping LAWS project I built multiple corpora of military, political, activist, academic, media and other discourse about autonomous weapons.
I also teach web scraping and corpus building, and I have built corpora that give students timely and relevant data-sets to analyse in computer labs.
Skills & Tools: Web scraping; Data wrangling; Corpus construction; Python; MySQL.