An Automated System Based on Cross-Linguistic Natural Language Processing to Analyse and Understand Emerging Trends and Issues on Social Media Platforms

September 1st, 2024

R&D Focus Areas:
Social licence, Communication and engagement

Lead Organisation:
Queensland University of Technology (QUT)

Funding:
Future Energy Exports CRC (PhD Project)

Status:
Active

Start date:
2022

Completion date:
2025

Project summary description:
This project aims to build an automated system to analyse and understand people’s perceptions, sentiments, and behaviour on social media platforms in relation to emerging trends and issues, with a focus on Australia as a hydrogen exporter. It also aims to collect a multilingual dataset that captures global perspectives, so that the resulting system is general-purpose and applicable to various industries and platforms.

To study how people respond to Australia’s hydrogen export industry, the project gathers multilingual data from Twitter (now named X). Using the academic Twitter API, it compiles a dataset that spans a decade (2013-2022) and contains 30 million tweets in English, Japanese, Korean, and Hindi.

The main challenge is the multilingual nature of the corpus: traditional natural language processing techniques such as tokenization, named entity recognition, topic modelling, and sentiment analysis need to be adapted, and new algorithms developed, to work across all four languages.

Ultimately, the project aims to design novel classification techniques to identify relevant data on hydrogen energy in multiple languages and extract key themes.
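
As a rough illustration of the relevance-identification step, a simple keyword baseline might look like the sketch below. It is not the project's method: the term list, the function names, and the substring-matching approach are all assumptions made for illustration. Substring matching is used because whitespace tokenization does not apply to Japanese text.

```python
import unicodedata

# Hypothetical seed terms for "hydrogen" in each language.
# Illustrative only; this is not the project's actual lexicon.
HYDROGEN_TERMS = {
    "en": ["hydrogen"],
    "ja": ["水素"],
    "ko": ["수소"],
    "hi": ["हाइड्रोजन"],
}


def normalise(text: str) -> str:
    """Apply NFKC normalisation and case folding so that width and
    case variants of the same term compare equal."""
    return unicodedata.normalize("NFKC", text).casefold()


def matched_languages(text: str) -> list[str]:
    """Return the language codes whose seed terms occur in the text."""
    norm = normalise(text)
    return [
        lang
        for lang, terms in HYDROGEN_TERMS.items()
        if any(normalise(term) in norm for term in terms)
    ]


def is_relevant(text: str) -> bool:
    """Treat a post as relevant if any seed term matches."""
    return bool(matched_languages(text))
```

A baseline like this can produce false positives, because a short term may occur inside an unrelated word, and it misses posts that discuss hydrogen without naming it. That gap is one reason a learned classifier, as the project proposes, is preferable to keyword matching.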

Further information:
https://www.fenex.org.au/connect/

 
