August 2020

Publications:

Sharif Abuadbba: We are pleased to share our paper accepted at SRDS 2020, ranked “A” in the area of reliable distributed systems. It explores the boundaries of two recent distributed machine learning techniques, Federated Learning (FL) and Split Learning (SL), and reports some interesting findings on their limitations in resource-constrained IoT environments. It is the first study to tackle the challenges of deploying the FL and SL training processes on real IoT devices at scale. We have also made our implementation publicly available for researchers to explore various applications.
Watch the YouTube demo here: https://lnkd.in/gmnviE6
Huy Quoc Le, Dung Hoang Duong, Ha Thanh Nguyen Tran, Viet Cuong Trinh, Thomas Plantard, Willy Susilo, Josef Pieprzyk; ‘Lattice Blind Signature with Forward Secrecy’, ACISP 2020, Best Paper Award
Nazatul H. Sultan, Vijay Varadharajan, Seyit Camtepe, and Surya Nepal, ‘An Accountable Access Control Scheme for Hierarchical Content in Named Data Networks with Revocation’, Conference: 25th European Symposium on Research in Computer Security (ESORICS) 2020, Conference Rank: A
Zhi Zhang, Yueqiang Cheng, Dongxi Liu, Surya Nepal, Zhi Wang, and Yuval Yarom, ‘PThammer: Cross-User-Kernel-Boundary Rowhammer through Implicit Accesses’, 53rd IEEE/ACM International Symposium on Microarchitecture. Accepted.
Yansong Gao, Minki Kim, Sharif Abuadbba, Yeonjae Kim, Chandra Thapa, Kyuyeon Kim, Seyit A. Camtepe, Hyoungshick Kim, and Surya Nepal, ‘End-to-End Evaluation of Federated Learning and Split Learning for Internet of Things’, The 39th International Symposium on Reliable Distributed Systems (SRDS), 2020. Accepted.
Binanda Sengupta, Akanksha Dixit and Sushmita Ruj, ‘Secure Cloud Storage with Data Dynamics Using Secure Network Coding Techniques’, IEEE Transactions on Cloud Computing. Accepted.
Nadeem Ahmed, Regio A. Michelin, Wanli Xue, Sushmita Ruj, Robert Malaney, Salil S. Kanhere, Aruna Seneviratne, Wen Hu, Helge Janicke, Sanjay Jha, ‘A Survey of COVID-19 Contact Tracing Apps’, IEEE Access. Accepted.

Students:

Four students have submitted their theses since the end of June. Congratulations to Peter Eze, Nan Sun, Jishan Giti and Ahmad Salehi Shahraki!
Let’s meet one of our students:

I’m Nan Sun, a Ph.D. student at Deakin University and Data61. My thesis is entitled “Data-driven cybersecurity incident prediction and discovery.” By the time a cybersecurity threat is detected, severe losses such as data leakage, financial loss and reputational damage may already have occurred. Proactively predicting or discovering cyber incidents from perceived indicators of cybersecurity threats can fill this gap, which motivated us to investigate and develop effective methods and systems for discovering and predicting cybersecurity incidents from crowdsourced data. Over the past three years, we began with a literature review in which we summarized the data-driven modeling methodology adopted in this fast-growing area. Following the phases of that methodology, we first proposed an innovative method that begins with data collection, continues with representing the information in a knowledge graph, and concludes with security recommendations based on systematic data analysis. To further unearth the “buried treasure” hidden in security-related data, we designed a new cybersecurity information retrieval schema that supports automatic indexing and searching based on the pragmatics of the context and the corresponding hidden metadata. Based on this schema, we implemented an interactive search engine for the cybersecurity domain. It presents the fetched results to users together with interpretations and security expert knowledge, drawing on supplementary data analytics and visualization techniques.

Researching at Data61 has been an excellent opportunity for me. Before starting my Ph.D., I completed a summer project at Data61, which encouraged me to pursue a Ph.D. Without the resources provided by Data61, I would not have come this far. I am also grateful to Data61 for helping me develop insights into various research fields, especially cybersecurity and data science. Thanks to Dr. Marthie Grobler for her administrative support, and to Dr. Seyit Camtepe for deepening my understanding of cybersecurity through useful discussions and specific comments on my project.

Media Activities

Arindam Pal: “This new phishing detection system can decode fraudsters”, published in Data61’s Algorithm magazine: https://algorithm.data61.csiro.au/this-new-phishing-detection-system-can-keep-up-with-fraudsters/

Events

The 2020 Workshop on Human Centric Software Engineering & Cyber Security (HCSE&CS-2020) will be co-hosted with the 35th IEEE/ACM International Conference on Automated Software Engineering and will now be held virtually from 21 to 25 September 2020.
The Human Centric Security team was awarded a Cutting Edge Science and Engineering Symposium award for 2019/2020. The jointly hosted symposium “Advances in personalised healthcare and wellbeing support technologies (OzDHI2020)” will be organised by the Precision Health Future Science Platform, CSIRO’s Data61 and CSIRO’s Health and Biosciences. Originally scheduled for 20 May 2020, the symposium has been rescheduled to 19 May 2021. For more details, please email ozdhi@csiro.au.
DSS partners annually with DSTG to host a national Cyber Security Summer School, which brings together high-profile local and international speakers on cyber security. After due consideration, and anticipating lower attendance due to the impact of COVID-19, the organising committee and sponsors have agreed that the Cyber Security Summer School (CSSS2020) will not go ahead as planned on 26 and 27 March 2020. We intend to postpone CSSS2020; the new date and relevant details will be confirmed at a later stage. For more information, visit http://research.csiro.au/csss.

For more information on our events, visit: link

Projects

Highlight on one of our projects: Machine Learning Algorithms for Detecting Phishing Websites, by Arindam Pal (Data61) in collaboration with Professors Sanjay Jha and Alan Blair and their PhD student Rizka Purwanto, all from the University of New South Wales, Sydney.

Phishing is the fraudulent attempt to obtain sensitive information such as usernames, passwords, and credit card details by disguising oneself as a trustworthy entity in an electronic communication. Typically carried out by email spoofing and instant messaging, it often directs users to enter personal information at a fake website, which matches the look and feel of the legitimate site. Phishing is an example of social engineering techniques being used to deceive users. Users are often lured by communications purporting to be from trusted parties such as social networking websites, auction sites, banks, online payment processors, and IT administrators.

Because phishing attacks frequently exploit weaknesses in current Internet security, attempts to deal with phishing incidents include legislation, user training, public awareness, and technical security measures. The number of phishing attacks has grown significantly in the past few years. The Anti-Phishing Working Group (APWG) recorded a significant increase in unique phishing attacks from 2017 to 2019, causing financial losses of about US$3 billion per year in the United States alone. The number of attacks is likely to keep increasing as phishing toolkits and algorithms that simplify the process become more widely available. Because phishing behaviour changes constantly, it is challenging to build phishing detection that remains robust and accurate over the long term.

To mitigate the negative impacts of phishing, security software providers, financial institutions, and academic researchers have studied various approaches to building automated phishing website detection systems. These methods include blacklists and detection based on a website’s content, URL, and other web-related features.

Our goal is to design new algorithms and systems to detect and prevent phishing websites before they can do any harm to the users.

What is the crux of the problem?

Phishers and fraudsters send emails to unsuspecting users with a link that appears to lead to a popular website, such as a bank (ANZ) or an online retail store (Amazon). When the user clicks the link, they are taken to a login page that looks almost identical to the original website. When the user enters their username and password, they are shown an error message, while the fraudsters store the credentials in a database and use them to commit identity theft and online fraud. Such sites are hard to detect because fraudsters change the URLs and email addresses very frequently (typically within hours), so it is very difficult to maintain an up-to-date list of phishing sites before the entries become obsolete.

What have we done to solve the problem?

We have designed and implemented novel machine learning algorithms to solve this problem. The first uses compression to distinguish phishing websites from legitimate websites. The second uses computer vision algorithms to classify websites from their screenshots.

What is the science behind it?

Our contributions for the compression-based algorithm, PhishZip, are as follows.

We introduce a systematic process of selecting meaningful words which are associated with phishing and non-phishing websites by analysing the likelihood of word occurrences and calculating the optimal likelihood threshold. These words are used as the predefined compression dictionary for our compression models.
We develop software called PhishZip, which performs phishing website detection using the DEFLATE compression algorithm. To the best of our knowledge, this work is the first to use compression algorithms for phishing website classification. Unlike machine learning based models, classification with compression algorithms requires neither model training nor HTML parsing, which makes it faster and simpler.
We propose the compression ratio as a novel machine learning feature that is robust and easy to extract. The compression ratio measures the distance, or cross-entropy, between the content distribution of the website being classified and that of phishing or non-phishing websites. A high compression ratio corresponds to low cross-entropy, indicating that the website’s content distribution is similar to the common word distribution in phishing or non-phishing websites.
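The two ideas above, a likelihood-based word dictionary and the compression ratio as a similarity signal, can be sketched with Python’s standard `zlib` module, which implements DEFLATE and accepts a preset dictionary through the `zdict` argument of `zlib.compressobj`. This is a minimal sketch under stated assumptions, not the PhishZip implementation: the helper names `doc_frequency`, `select_words`, `compression_ratio` and `classify` are hypothetical, word selection uses a fixed threshold on per-class document frequency rather than the optimal threshold described above, and the decision rule simply picks whichever dictionary compresses the page better.

```python
import zlib
from collections import Counter


def doc_frequency(docs):
    """Fraction of documents in which each lowercase word appears."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return {word: n / len(docs) for word, n in counts.items()}


def select_words(phish_docs, legit_docs, threshold=0.8):
    """Hypothetical word selection with a FIXED likelihood threshold.

    For each word, the likelihood that it signals phishing is estimated as
    pf / (pf + lf), where pf and lf are its document frequencies in the
    phishing and legitimate corpora. Words at or above `threshold` go into
    the phishing dictionary; words at or below 1 - threshold go into the
    legitimate dictionary. (PhishZip instead computes an optimal threshold.)
    """
    pf, lf = doc_frequency(phish_docs), doc_frequency(legit_docs)
    phish_words, legit_words = [], []
    for word in sorted(set(pf) | set(lf)):
        p, l = pf.get(word, 0.0), lf.get(word, 0.0)
        likelihood = p / (p + l)  # p + l > 0: the word occurs in some corpus
        if likelihood >= threshold:
            phish_words.append(word)
        elif likelihood <= 1 - threshold:
            legit_words.append(word)
    return phish_words, legit_words


def compression_ratio(text, dictionary_words):
    """Uncompressed size / DEFLATE-compressed size with a preset dictionary.

    A higher ratio means the text shares more substrings with the
    dictionary, i.e. its content is closer to that word distribution.
    """
    data = text.lower().encode("utf-8")
    zdict = " ".join(dictionary_words).encode("utf-8")
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)  # no dictionary to preload
    compressed = comp.compress(data) + comp.flush()
    return len(data) / len(compressed)


def classify(text, phish_words, legit_words):
    """Hypothetical rule: choose the class whose dictionary compresses better."""
    if compression_ratio(text, phish_words) > compression_ratio(text, legit_words):
        return "phishing"
    return "legitimate"
```

For example, after building `phish_words, legit_words = select_words(phish_pages, legit_pages)` from labelled page text, `classify(new_page_text, phish_words, legit_words)` returns a label without any model training step. Comparing two ratios directly is the simplest possible rule; since the compression ratio is described above as a machine learning feature, the two ratios could equally be passed to a downstream classifier.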

PhishZip: A New Compression-based Algorithm for Detecting Phishing Websites, Rizka Purwanto, Arindam Pal, Alan Blair and Sanjay Jha, IEEE Conference on Communications and Network Security (CNS 2020), Avignon, France.

PhishTank: https://www.phishtank.com

Seminars

In collaboration with the Cyber Security CRC, we organise free monthly seminars on technical cyber security topics, open to all, with leading experts from around the world as guest speakers.

Visit our SAO seminar page for more information. https://research.csiro.au/cybersecurity-quantum-systems/our-sao-seminars/