they/them (en) hen/hen (sv)
I'm a PhD student at the Department of Computing Science, in association with the Graduate School for Gender Studies, at Umeå University, supervised by Henrik Björklund and Jenny Björklund. My thesis is in the field of computer science, more specifically within NLP.
My thesis work explores gendered biases, trans and nonbinary inclusivity, and queer representation within Natural Language Processing (NLP) through a feminist and intersectional lens. The thesis tackles this in three key areas: the ways in which "gender" is theorized and operationalized by researchers investigating gender bias in NLP; gendered associations within datasets used for training language technologies; and the representation of queer (particularly trans and nonbinary) identities in the output of both low-level NLP models and large language models (LLMs). Throughout, I demonstrate that nonbinary people and genders are erased not only by bias in NLP tools and datasets, but also by the research and researchers attempting to address gender biases. Via a case study, I explore ways to mitigate some of this disparity in one foundational part of a "classic" NLP pipeline: part-of-speech tagging.
Completed Papers (organized by topic)
"Gender" in Gender Bias
- Theories of "Gender" in NLP Bias Research (2022) FAccT'22 Devinney, Hannah; Björklund, Jenny; Björklund, Henrik
- "Although gender is in reality not a binary...": Investigating the limits of nonbinary acknowledgement in Natural Language Processing bias research papers (2023) Algorithms for Her? 2 (talk only) Devinney, Hannah
Gender Biases in Data
Model Output and Representation
Planned Papers for inclusion in the thesis:
- Applying Heterogeneous Supervised Topic Models (HSTMs) to a corpus of news articles (in English, largely but not entirely from US-based outlets) concerning LGBTQ+ topics/persons, training the HSTMs on the expected polarity (pro/anti-queer sentiment) of the news outlet that published the article. Two timespans of articles will be used, to explore shifts in the latent topics underlying these polarities. We also hope to see if older anti-queer sentiment data is sufficient to accurately predict labels for new anti-queer sentiment data, which has implications for how we develop bias and toxicity detection tools.
- Exploring gendered themes and pronoun handling in LLMs (in English). This will be a comparative study of model output across several LLMs, focusing on two key items: their ability to consistently and correctly use pronouns, particularly neopronouns; and the ways in which users of different pronouns are represented. We will prompt the LLMs for narrative output, varying methods of specifying the pronouns that the subject of the story uses, and take a mixed methods approach to analyzing the resulting narratives.
Papers not appearing in the thesis: