Detecting Adverse Drug Side Effects

In this work, we ask whether social media is a useful supplementary data source for detecting adverse drug side effects when used in addition to the spontaneous reports submitted by, for example, care providers.

Our contributions are:

  • We show that discussion forums are an alternative source of unsolicited user data that is available in large quantities. This data source can help overcome the under-reporting problem of the traditional data pathway, in which time-poor doctors may not lodge reports of adverse drug events with a regulatory agency such as the TGA (Therapeutic Goods Administration).
  • We show that an information extraction pipeline, together with domain ontologies (SNOMED and MedDRA), can be used to mine discussion forum data. We developed and evaluated novel methods for extracting drugs, adverse events, diseases, and symptoms from forum data. We also developed methods to normalise these concepts to their corresponding entries in official ontologies, especially MedDRA (Medical Dictionary for Regulatory Activities), which is the most widely used ontology in pharmacovigilance, by both pharmaceutical companies and regulatory agencies.
  • We developed a system called CADEminer based on the methods above. It allows users to determine whether an adverse event reported in social media posts is novel (not yet known for the drug) or already listed on the drug label. It also provides statistics on the reported adverse events, automatically grouping the various ways of expressing the same adverse event.
  • We make the CADEC data set available through the CSIRO Data Portal for research on extracting adverse drug side effects from medical forums. This data set has already led to further Data61 publications through a collaboration with Lizhen Qu [29].
  • We conclude that discussion forums are a useful source of knowledge. They do, however, pose interesting challenges, including ambiguity in how adverse events are expressed in lay language, the mapping of these expressions to expert terminology, and, most importantly, the difficulty of establishing causality between drugs and adverse events.
  • This work was a collaboration between our team, AeHRC, and the TGA.
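
The normalisation, grouping, and novelty check described in the contributions above can be illustrated with a minimal sketch. This is not the CADEminer implementation: the lexicon, the drug name, the labelled events, and the function names `normalise` and `summarise` are all hypothetical, and a simple dictionary lookup stands in for the extraction and normalisation methods we actually developed. The preferred terms are illustrative stand-ins for ontology entries.

```python
from collections import Counter

# Hypothetical toy lexicon mapping lay expressions to a preferred term.
# A real system would normalise against an ontology such as MedDRA.
LAY_TO_PREFERRED = {
    "headache": "Headache",
    "head ache": "Headache",
    "pounding head": "Headache",
    "nausea": "Nausea",
    "felt sick": "Nausea",
    "queasy": "Nausea",
    "couldn't sleep": "Insomnia",
}

# Hypothetical drug label: adverse events already listed for each drug.
LABELLED_EVENTS = {
    "exampledrug": {"Headache", "Nausea"},
}


def normalise(mention):
    """Map a lay mention to its preferred term, or None if unmapped."""
    return LAY_TO_PREFERRED.get(mention.strip().lower())


def summarise(drug, mentions):
    """Group mentions by preferred term and flag terms not on the label."""
    counts = Counter()
    for mention in mentions:
        term = normalise(mention)
        if term is not None:  # unmapped mentions are skipped in this sketch
            counts[term] += 1
    known = LABELLED_EVENTS.get(drug.lower(), set())
    return {
        term: {"count": n, "novel": term not in known}
        for term, n in counts.items()
    }


# Mentions as they might appear in forum posts about one drug.
report = summarise(
    "ExampleDrug",
    ["Head ache", "queasy", "felt sick", "couldn't sleep", "pounding head", "rash"],
)
for term, info in sorted(report.items()):
    print(term, info)
```

In this sketch, "Head ache" and "pounding head" are counted under one term, and a term absent from the hypothetical label is flagged as novel. A flag of this kind only indicates a candidate signal; it says nothing about causality between the drug and the event.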


  • Karimi, S.; Wang, C.; Metke-Jimenez, A.; Gaire, R.; Paris, C. Text and Data Mining Techniques in Adverse Drug Reaction Detection. ACM Computing Surveys. 47(4), Article 56.
  • Karimi, Sarvnaz; Metke Jimenez, Alejandro; Kemp, Madonna; Wang, Chen. CADEC: A Corpus of Adverse Drug Event Annotations. Journal of Biomedical Informatics. 2015; 55:73–81.
  • Metke Jimenez, Alejandro; Karimi, Sarvnaz. Concept Identification and Normalisation for Adverse Drug Event Discovery in Medical Forums. In: The First International Workshop on Biomedical Data Integration and Discovery; October 17-21, 2016; Kobe, Japan. CEUR; 2016. 18-24.
  • Wang, Chen; Karimi, Sarvnaz. Parallel Duplicate Detection in Adverse Drug Reaction Databases with Spark. In: EDBT 2016; 15-18 March 2016; Bordeaux, France. ACM; 2016. 551-562.
  • Metke Jimenez, Alejandro; Karimi, Sarvnaz; Paris, Cecile. Evaluation of Text-Processing Algorithms for Adverse Drug Event Extraction from Social Media. In: SIGIR International Workshop on Social Media Retrieval and Analysis; July 11, 2014; Gold Coast, Australia. ACM; 2014. 15-20.
  • Wang, Chen; Karimi, Sarvnaz. Causality driven data integration for adverse drug reaction discovery. In: Big Data 2014; 3-4 April, 2014; Melbourne, VIC. Australia. Health Informatics Society of Australia (HISA); 2014. 44-45.
  • Wang, Chen; Karimi, Sarvnaz. Differences between social media and regulatory databases in adverse drug reaction discovery. In: SIGIR International Workshop on Social Media Retrieval and Analysis (SoMeRA 2014); 11 July, 2014; Gold Coast, QLD. Australia. ACM; 2014.