134 Automatic identification of intimate partner violence victims from social media
  1. Mohammed Al-Garadi,
  2. Abeed Sarker,
  3. Yuting Guo,
  4. Elise Warren,
  5. Yuan-Chi Yang,
  6. Sangmi Kim
  1. Emory University, Atlanta, USA


Statement of Purpose During the COVID-19 pandemic, public health measures to control the spread of coronavirus (e.g., stay-at-home orders) have led to substantial increases in intimate partner violence (IPV), threatening the safety and health of victims and their children. In response to this and to prepare for future public health crises, this research aims to develop a model to automatically identify IPV victims’ reports on Twitter using artificial intelligence, namely natural language processing (NLP) and machine learning.

Methods/Approach Using a list of IPV-related keywords (e.g., ‘partner abuse’), we collected publicly available tweets. Four annotators manually coded each tweet as either a self-report of IPV or not (Cohen’s kappa = 0.86). We used a total of 6,348 annotated tweets to develop NLP models. We experimented with deep learning algorithms and state-of-the-art transformer-based models (e.g., BERT, RoBERTa) and evaluated the models using the F1-score.
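
The agreement statistic reported above can be illustrated with a minimal sketch. Cohen's kappa is defined for a pair of annotators; the abstract does not say how agreement among the four annotators was aggregated, so the example below shows only the two-annotator case. The function name `cohens_kappa` and the toy labels are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels (1 = self-report of IPV, 0 = not); invented for illustration.
annotator_1 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (here, non-IPV) dominates the data.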

Results The RoBERTa model achieved an overall accuracy of 95% (F1-score 0.76 and 0.97 for IPV and non-IPV, respectively). Word importance analyses showed that the model was not biased towards posters’ gender or ethnicity when making classification decisions. Using the developed model, we identified 1,803 IPV tweets (4.6% of assessed tweets); analysis of these tweets found content related to abusive relationships (9.23%), threats (3.5%), sexual assault (2.77%), and child abuse (2.21%).
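
To show why overall accuracy and the per-class F1-scores can differ as they do above, here is a minimal sketch of both metrics computed on an imbalanced toy sample. The helper names `per_class_f1` and `accuracy` and the labels are assumptions for illustration only; they are not the study's data or code.

```python
def per_class_f1(y_true, y_pred, positive):
    """F1-score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy sample: 4 IPV tweets (1) and 16 non-IPV tweets (0); invented labels.
y_true = [1] * 4 + [0] * 16
# One IPV tweet is missed and one non-IPV tweet is flagged by mistake.
y_pred = [1, 1, 1, 0] + [1] + [0] * 15

f1_ipv = per_class_f1(y_true, y_pred, positive=1)
f1_non_ipv = per_class_f1(y_true, y_pred, positive=0)
acc = accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}  F1(IPV)={f1_ipv:.2f}  F1(non-IPV)={f1_non_ipv:.2f}")
```

With a rare positive class, the same number of errors lowers the minority-class F1 far more than it lowers accuracy, which is why reporting F1 per class is informative for this kind of task.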

Conclusion Our NLP pipeline can automatically collect and categorize tweets to identify IPV victims at scale and with high accuracy.

Significance We showed that a non-conventional source (Twitter) can be used to obtain actionable IPV-related insights during the COVID-19 pandemic, when data collection through surveys or medical or police reports was restricted. Using the developed NLP pipeline, we can potentially reach out to IPV victims and provide them with non-contact interventions, an approach that can be used beyond the COVID-19 pandemic.
