Recent Publication Highlights

Here we highlight some of our recent or significant publications (for a full list, please see the team's records in the CSIRO Research Publications Repository):

2025

Mong Yuan Sim, Wei Emma Zhang, Xiang Dai, and Biaoyan Fang. 2025. Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models. In Findings of the Association for Computational Linguistics: ACL 2025 (CORE A*).
Vincent Nguyen, Sarvnaz Karimi, Willow Hallgren and Mahesh Prakash. 2025. Question answering in Climate Adaptation for Agriculture: Model Development and Evaluation with Expert Feedback. In Findings of the Association for Computational Linguistics: ACL 2025 (CORE A*).
Necva Bolucu, Jordan Pennells, Huichen Yang, Maciej Rybinski, and Stephen Wan. “An Evaluation of Large Language Models for Supplementing a Food Extrusion Dataset”. Foods, 2025.
Anuradha Wickramarachchi, Shakila Tonni, Sonali Majumdar, Sarvnaz Karimi, Sulev Koks, Brendan Hosking, Jordi Rambla, Natalie A Twine, Yatish Jain, Denis C Bauer. “AskBeacon – Performing genomic data exchange and analytics with natural language”. Bioinformatics, 2025.
Shakila Mahjabin Tonni, Pedro Faustini, and Mark Dras. “Graded Suspiciousness of Adversarial Texts to Humans”. Computational Linguistics, 2025.
Chao Wang, Hehe Fan, Huichen Yang, Sarvnaz Karimi, Lina Yao, and Yi Yang. 2025. Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR, CORE A*)

2024

Xiang Dai, Sarvnaz Karimi, Abeed Sarker, Ben Hachey and Cecile Paris. “MultiADE: A Multi-domain Benchmark for Adverse Drug Event Extraction”. Journal of Biomedical Informatics, 2024.
Kateryna Kasianenko, Shima Khanehzar, Stephen Wan, Ehsan Dehghan, Axel Bruns. 2024. Detecting Online Community Practices with Large Language Models: A Case Study of Pro-Ukrainian Publics on Twitter. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP, CORE A*).
Biaoyan Fang, Xiang Dai, Sarvnaz Karimi. 2024. Understanding Faithfulness and Reasoning of Large Language Models on Plain Biomedical Summaries. In Findings of the Association for Computational Linguistics: EMNLP 2024 (CORE A*).
Xiang Dai, Sarvnaz Karimi, Biaoyan Fang. 2024. A Critical Look at Meta-evaluating Summarisation Evaluation Metrics. In Findings of the Association for Computational Linguistics: EMNLP 2024 (CORE A*).
Maciej Rybinski, Wojciech Kusa, Sarvnaz Karimi, and Allan Hanbury. “Learning to Match Patients to Clinical Trials Using Large Language Models”. Journal of Biomedical Informatics, 2024.
Wei Liu, Stephen Wan, Michael Strube. 2024. What Causes the Failure of Explicit to Implicit Discourse Relation Recognition? In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL, CORE A).
Yulia Otmakhova, Shima Khanehzar, Lea Frermann. 2024. Media Framing: A Typology and Survey of Computational Approaches Across Disciplines. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL, CORE A*)
Biaoyan Fang, Ritvik Dinesh, Xiang Dai, Sarvnaz Karimi. 2024. Born Differently Makes a Difference: Counterfactual Study of Bias in Biography Generation from a Data-to-Text Perspective. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL, CORE A*).
Crispin Almodovar, Fariza Sabrina, Sarvnaz Karimi, and Salahuddin Azad. “LogFiT: Log anomaly detection using fine-tuned language models”. IEEE Transactions on Network and Service Management, 2024.
Necva Bolucu, Maciej Rybinski, Xiang Dai, and Stephen Wan. “An adaptive approach to noisy annotations in scientific information extraction”. Information Processing & Management, 2024.

2023

Maciej Rybinski, Vincent Nguyen, and Sarvnaz Karimi. 2023. A Self-Learning Resource-Efficient Re-Ranking Method for Clinical Trials Search. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management.
Maciej Rybinski, Stephen Wan, Sarvnaz Karimi, Cecile Paris, Brian Jin, Neil Huth, Peter Thorburn, and Dean Holzworth. 2023. SciHarvester: Searching Scientific Documents for Numerical Values. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3135-3139.
Vincent Nguyen, Sarvnaz Karimi, Maciej Rybinski and Zhenchang Xing. 2023. MedRedQA for Medical Consumer Question Answering: Dataset, Tasks, and Neural Baselines. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics. Bali, Indonesia.
Vincent Nguyen, Sarvnaz Karimi and Zhenchang Xing. “DeBEIR: A Python Package for Dense Bi-Encoder Information Retrieval”, Journal of Open Source Software 8 (87), 5017. 2023.
Xiang Dai, Sarvnaz Karimi, Stephen Wan. 2023. Rethinking the Role of Entity Type in Relation Classification. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics. Bali, Indonesia.
Hubert D Zajac, Dana Li, Xiang Dai, Jonathan F Carlsen, Finn Kensing, Tariq O Andersen, “Clinician-facing AI in the Wild: Taking Stock of the Sociotechnical Challenges and Opportunities for HCI”, ACM Transactions on Computer-Human Interaction, 2023.
Necva Bölücü, Maciej Rybinski, Stephen Wan. 2023. Impact of Sample Selection on In-Context Learning for Entity Extraction from Scientific Writing. In Findings of the Association for Computational Linguistics: EMNLP 2023.
Biaoyan Fang, Trevor Cohn, Timothy Baldwin, Lea Frermann. 2023. More than Votes? Voting and Language based Partisanship in the US Supreme Court. In Findings of the Association for Computational Linguistics: EMNLP 2023.

2022

Maciej Rybinski, Liam Watts, and Sarvnaz Karimi. 2022. A2A-API: A Prototype for Biomedical Information Retrieval Research and Benchmarking. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3318-3322.
Vincent Nguyen, Maciej Rybinski, Sarvnaz Karimi, and Zhenchang Xing. “Search like an expert: Reducing expertise disparity using a hybrid neural index for COVID-19 queries.” Journal of Biomedical Informatics 127 (2022): 104005.
Xiang Dai, Ilias Chalkidis, Sune Darkner, and Desmond Elliott. 2022. Revisiting Transformer-based Models for Long Document Classification. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 7212–7230, Abu Dhabi, United Arab Emirates.

2021

Maciej Rybinski, and Sarvnaz Karimi. “Will Sorafenib Help? Treatment-aware Reranking in Precision Medicine Search.” In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 3403-3407. 2021.
Maciej Rybinski, Sarvnaz Karimi, and Aleney Khoo. “Science2Cure: A Clinical Trial Search Prototype.” In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2620-2624. 2021.
Rybinski, Maciek; Dai, Xiang; Singh, Sonit; Karimi, Sarvnaz; Nguyen, Anthony. Family History Extraction from Electronic Health Records. Journal of Medical Internet Research. 2021; 9(4):e24020.
https://doi.org/10.2196/24020
Singh, Sonit; Karimi, Sarvnaz; Ho-Shon, Kevin; Hamey, Len. Show, Tell and Summarise: Learning to Generate and Summarise Radiology Findings from Medical Images. Neural Computing and Applications. 2021; 33:7441–7465.
https://doi.org/10.1007/s00521-021-05943-6
Malko, Anton; Paris, Cecile; Duenser, Andreas; Kangas, Maria; Molla-Aliod, Diego; Sparks, Ross; et al. Demonstrating the Reliability of Self-Annotated Emotion Data. In: The Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access (CLPsych 2021); Virtual. 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021); 2021. 10p.
https://doi.org/10.18653/v1/2021.clpsych-1.5
Yufei Wang, Can Xu, Huang Hu, Chongyang Tao, Stephen Wan, Mark Dras, Mark Johnson, and Daxin Jiang. “Neural rule-execution tracking machine for transformer-based text generation.” Advances in Neural Information Processing Systems 34 (2021): 16938-16950.
Yufei Wang, Ian Wood, Stephen Wan, Mark Dras, and Mark Johnson. “Mention flags (MF): Constraining transformer-based text generators.” In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 103-113. 2021.
Ian Wood, Mark Johnson, and Stephen Wan. “Integrating lexical information into entity neighbourhood representations for relation prediction.” In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3429-3436. 2021.

2020

Chen, Caron; Paris, Cecile; Reeson, Andrew. The impact of social ties and SARS memory on the public awareness of 2019 novel coronavirus (SARS‑CoV‑2) outbreak. Scientific Reports. 2020; 10:Article 18241.
https://doi.org/10.1038/s41598-020-75318-9
Ghafari, Seyed Mohssen; Beheshti, Amin; Joshi, Aditya; Paris, Cecile; Mahmood, Adnan; Yakhchi, Shahpar; et al. A Survey on Trust Prediction in Online Social Networks. IEEE Access. 2020; 8:144292-144309.
https://doi.org/10.1109/ACCESS.2020.3009445
Grobler, Marthie; Mahawaga Arachchige, Chamikara; Abbott, Jacob Edward; Jeong, Jongkil Jay; Nepal, Surya; Paris, Cecile. The importance of social identity on password formulations. Personal and Ubiquitous Computing. 2020; 1(1 1):25.
https://doi.org/10.1007/s00779-020-01477-1
Hassanzadeh, Hamed; Karimi, Sarvnaz; Nguyen, Anthony. Matching Patients to Clinical Trials Using Semantically Enriched Document Representation. Journal of Biomedical Informatics (JBI). 2020; 105:103406.
https://doi.org/10.1016/j.jbi.2020.103406
Joshi, Aditya; Sparks, Ross; Karimi, Sarvnaz; Yan, Sheng-lun Jason; Chughtai, Abrar Ahmad; Paris, Cecile; et al. Automated Monitoring of Tweets for Early Detection of the 2014 Ebola Epidemic. PLoS ONE. 2020; 15(3):e0230322.
https://doi.org/10.1371/journal.pone.0230322
Joshi, Aditya; Sparks, Ross; McHugh, James; Karimi, Sarvnaz; Paris, Cecile; MacIntyre, C Raina. Harnessing Tweets for Early Detection of an Acute Disease Event. Epidemiology. 2020; 31(1):90-97.
https://doi.org/10.1097/EDE.0000000000001133
Nugroho, Robertus; Paris, Cecile; Nepal, Surya; Yang, Jian; Zhao, Weiliang. A Survey of Recent Methods on Deriving Topics from Twitter: Algorithm to Evaluation. Knowledge and Information Systems. 2020; 62(7):2485-2519.
https://doi.org/10.1007/s10115-019-01429-z
Rybinski, Maciek; Karimi, Sarvnaz; Nguyen, Vincent; Paris, Cecile. A2A: A Platform for Research in Biomedical Literature Search. BMC Bioinformatics. 2020. 27.
https://doi.org/10.1186/s12859-020-03894-8
Rybinski, Maciek; Xu, Jerry; Karimi, Sarvnaz. Clinical Trial Search: Using Biomedical Language Understanding Models for Re-Ranking. Journal of Biomedical Informatics. 2020. 9.
https://doi.org/10.1016/j.jbi.2020.103530
Sparks, Ross; Jin, Brian; Karimi, Sarvnaz; Paris, Cecile; MacIntyre, Raina. Real-time monitoring of events applied to syndromic surveillance. Quality Engineering. 2020; 1(1 1):1-23.
http://hdl.handle.net/102.100.100/367467?index=1
Tay, Wenyi; Zhang, Xiuzhen; Karimi, Sarvnaz. Beyond Mean Rating: Probabilistic Aggregation of Star Ratings based on Helpfulness. Journal of the Association for Information Science and Technology. 2020; 71(7):784-799.
https://doi.org/10.1002/asi.24297
Biddle, Rhys; Joshi, Aditya; Liu, Shaowu; Paris, Cecile; Xu, Guandong. Leveraging Sentiment Distributions to Distinguish Figurative From Literal Health Reports on Twitter. In: The Web Conference; April 2020; Taipei, Taiwan. ACM; 2020. 1217-1227.
https://doi.org/10.1145/3366423.3380198
Dai, Xiang; Karimi, Sarvnaz; Hachey, Ben; Paris, Cecile. An Effective Transition-based Model for Discontinuous NER. In: Annual Conference of the Association for Computational Linguistics; Seattle, Washington. ACL; 2020. 5860-5870.
http://hdl.handle.net/102.100.100/422575?index=1
Ghafari, Seyed Mohssen; Beheshti, Amin; Joshi, Aditya; Paris, Cecile; Yakhchi, Shahpar; Jolfaei, Alireza; et al. Dynamic Deep Trust Prediction for Social Internet of Things. In: International Conference on Advances in Mobile Computing & Multimedia; Chiang Mai, Thailand. New York: ACM; 2020. 11-19.
https://doi.org/10.1145/3428690.3429167
Jena, Amit; Engelke, Ulrich; Dwyer, Tim; Rajamanickam, Venkatesh; Paris, Cecile. Uncertainty Visualisation: An Interactive Visual Survey. In: Pacific Visualization Symposium (PacificVis); Tianjin, China. IEEE; 2020. 201-205.
http://hdl.handle.net/102.100.100/391816?index=1
Kuo, Nicholas; Harandi, Mehrtash; Fourrier, Nicolas; Walder, Christian; Ferraro, Gabriela; Suominen, Hanna. An Input Residual Connection for Simplifying Gated Recurrent Neural Networks. In: 2020 International Joint Conference on Neural Networks (IJCNN); UK. IEEE; 2020. 1-8.
https://doi.org/10.1109/IJCNN48605.2020.9207238
Kuo, Nicholas; Harandi, Mehrtash; Fourrier, Nicolas; Walder, Christian; Ferraro, Gabriela; Suominen, Hanna. An Input Residual Connection for Simplifying Gated Recurrent Neural Networks. In: International Joint Conference Neural Networks; Glasgow (UK). IEEE; 2020. 1-6.
http://hdl.handle.net/102.100.100/367002?index=1
Kuo, Nicholas; Harandi, Mehrtash; Fourrier, Nicolas; Walder, Christian; Ferraro, Gabriela; Suominen, Hanna. M2SGD: Learning to Learn Important Weights. In: Workshop on Continual Learning in Computer Vision 2020; Seattle, US. Computer Vision Foundation; 2020. 236-237.
http://hdl.handle.net/102.100.100/365619?index=1
Liu, Fanzhen; Xue, Emma; Wu, Jia; Zhou, Chuan; Hu, Wenbin; Paris, Cecile; et al. Deep learning for community detection: Progress, challenges and opportunities. In: International Joint Conference on Artificial Intelligence; Yokohama, Japan. IJCAI Organization; 2020. 7.
http://hdl.handle.net/102.100.100/369079?index=1
Power, Robert; Robinson, Bella; Dennett, Amanda; Jin, Brian; Paris, Cecile. Understanding the Mood of Social Media Messages. In: Hawaii International Conference on System Sciences (HICSS); Hawaii. csiro; 2020. 10.
https://doi.org/10.24251/HICSS.2020.300
Wu, Tina; Zhang, Rongjunchen; Ma, Wanlun; Wen, Sheng; Xia, Xin; Paris, Cecile; et al. What risk? I don’t understand. An Empirical Study on Users’ Understanding of the Terms Used in Security Texts. In: Asia CCS 2020; October 2020; Taipei, Taiwan. ACM; 2020. 248-262.
https://doi.org/10.1145/3320269.3384761
Xu, Chang; Paris, Cecile; Nepal, Surya; Sparks, Ross; Long, Chong; Wang, Yafang. DAN: Dual-View Representation Learning for Adapting Stance Classifiers to New Domains. In: The 24th European Conference on Artificial Intelligence; Spain. IOS; 2020. 2260-2267.
https://doi.org/10.3233/FAIA200353
Xu, Chang; Paris, Cecile; Sparks, Ross; Nepal, Surya; Vander Linden, Keith. Assessing Social License to Operate from the Public Discourse on Social Media. In: Nuria Bel and Chengqing Zong, editor/s. 28th International Conference on Computational Linguistics: Industry Track; Barcelona, Spain. International Committee on Computational Linguistics; 2020. 149-159.
https://doi.org/10.18653/v1/2020.coling-industry.14
Zhang, Shiwei; Zhang, Xiuzhen; Lau, Jey Han; Chan, Jeffrey; Paris, Cecile. Discovering Relevant Reviews for Answering Product-related Queries. In: International Conference on Data Mining; 8-11 November 2019; Beijing, China. IEEE; 2020. 2374-8486.
http://hdl.handle.net/102.100.100/368351?index=1
Zhang, Shiwei; Zhang, Xiuzhen; Lau, Jey Han; Chan, Jeffrey; Paris, Cecile. Less is More: Rejecting Unreliable Reviews for Product Question Answering. In: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases; 14 September 2020; Ghent, Belgium. Springer; 2020. 16.
http://hdl.handle.net/102.100.100/368348?index=1
Zhang, Shiwei; Zhang, Xiuzhen; Lau, Jey Han; Chan, Jeffrey; Paris, Cecile. Less is More: Rejecting Unreliable Reviews for Product Question Answering. In: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML – PKDD); September 14-18; online. Cham: Springer; 2020. 567-583.
https://doi.org/10.1007/978-3-030-67664-3_34

2019

Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cécile Paris, and C. Raina MacIntyre. 2019. Survey of Text-based Epidemic Intelligence: A Computational Linguistics Perspective. ACM Computing Surveys Vol 52, Issue 6, Article 119 (October 2019), 19 pages. DOI: https://doi.org/10.1145/3361141
Aditya Joshi, Ross Sparks, James McHugh, Sarvnaz Karimi, Cecile Paris, C Raina MacIntyre (2019) Harnessing tweets for early detection of an acute disease event, Epidemiology, Wolters Kluwer Health, 2019. In Press.
Adith Iyer, Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, Figurative Usage Detection of Symptom Words to Improve Personal Health Mention Detection. To appear in The Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). Florence, Italy. Association for Computational Linguistics.
Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris and C Raina MacIntyre (2019) A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics. To appear in The Proceedings of the SIGBioMed Workshop on Biomedical Natural Language Processing (BioNLP) at ACL 2019, Florence, Italy, July 2019.
Wang, Y., Johnson, M., Wan, S., Sun, Y., and Wang, W. (2019) How to best use Syntax in Semantic Role Labelling. To appear in The Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). Florence, Italy. Association for Computational Linguistics.
Nezami, O., Dras, M., Wan, S., Paris, C., and Hamey, L. (2019) Automatic Recognition of Student Engagement using Deep Learning and Facial Expression. To appear in The Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019). Würzburg, Germany. September.
Nezami, O., Dras, M., Wan, S., Paris, C., and Hamey, L. (2019) Towards Generating Stylized Image Captions via Adversarial Training. To appear in The Proceedings of the 16th Pacific Rim International Conference on Artificial Intelligence (PRICAI 2019). Nadi, Fiji. August.

2018

Aditya Joshi, Xiang Dai, Sarvnaz Karimi, Ross Sparks, Cecile Paris, C Raina MacIntyre (2018) Shot Or Not: Comparison of NLP Approaches for Vaccination Behaviour Detection, In the Proceedings of the Social Media Mining for Health Applications (SMM4H) Workshop at EMNLP 2018, Brussels, Belgium, November 2018.

2017

Nguyen, V.; Karimi, S.; Falamaki, S.; Molla-Aliod, D.; Paris, C.; Wan, S. (2017) CSIRO at 2017 TREC Precision Medicine Track. In the Proceedings of the Text Retrieval Conference (TREC), Gaithersburg, MD, USA, Nov 2017. NIST.
Kim, M.; Xu, Q.; Qu, L.; Wan, S.; Paris, C. (2017) Demographic Inference on Twitter using Recursive Neural Networks. In the Proceedings of the 55th annual meeting of the Association for Computational Linguistics (ACL); July 31 – August 2, 2017; Vancouver, Canada. Association for Computational Linguistics; 2017. 471–477.
Kim, S., Kageura, K., McHugh, J., Nepal, S., Paris, C., Robinson, B., Sparks, R., and Wan, S. (2017) Twitter Content Eliciting User Engagement: A Case Study on Australian Organisations. In the Proceedings of the 26th International World Wide Web Conference. Perth, Australia, April, Association for Computing Machinery. 2017.
Kim, S., Paris, C., Power, R., and Wan, S. (2017) Distinguishing Individuals from Organisations on Twitter. In the Proceedings of the 26th International World Wide Web Conference. Perth, Australia, April, Association for Computing Machinery. 2017.

2016

Wang, Y.; Wan, S. & Paris, C. (2016) The Role of Features and Context on Suicide Ideation Detection. In the Proceedings of the Australasian Language Technology Association Workshop 2016, 2016, 94-102
Kim, S. M.; Wan, S. & Paris, C. (2016) Detecting Social Roles in Twitter. In the Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media, Association for Computational Linguistics, 2016, 34-40
Kim, S. M.; Wan, S.; Paris, C.; Jin, B. & Robinson, B. (2016) The Effects of Data Collection Methods in Twitter. In the Proceedings of the First Workshop on NLP and Computational Social Science, Association for Computational Linguistics, 2016, 86-91
Jayasinghe, G.; Jin, B.; McHugh, J.; Robinson, B. & Wan, S. (2016) CSIRO Data61 at the WNUT Geo Shared Task. In the Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), The COLING 2016 Organizing Committee, 2016, 218-226
Kim, S. M.; Wang, Y.; Wan, S. & Paris, C. (2016) Data61-CSIRO systems at the CLPsych 2016 Shared Task. In the Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology. Association for Computational Linguistics, 2016, 128-132

2015

B. O’Dea, S. Wan, P.J. Batterham, A.L. Calear, C. Paris, H. Christensen (2015) Detecting suicidality on Twitter. In Internet Interventions 2 (2), 183-188.