Privacy-preserving Cloud Data MaaS

Funded by ARC (LP160101766)
Partners: Professor Xun Yi, A/Prof Ibrahim Khalil, Emeritus Prof Jennifer Seberry, Prof Elisa Bertino
Duration: 2017 – 2020

With the advent of cloud computing, there has been increasing interest in the paradigm of data mining-as-a-service, in which a company lacking expertise or computational resources outsources its mining needs to the cloud. The outsourced data and the mined patterns are considered the private property of the company and must be protected against outsiders, including the cloud service provider.

The discovery of association rules, classification rules and clusters in large amounts of data is useful for business intelligence. However, most small companies lack expertise in data mining techniques while remaining concerned about the privacy of their business data. The outcomes of this project will help such companies cut the costs of data mining and privacy protection, and thus focus on their core business.

Research on how to outsource data mining tasks to the cloud while preserving data privacy has emerged only in recent years, and this new research area offers many challenges and opportunities. The innovations of this project, such as new models for data privacy and practical privacy-preserving data mining algorithms for cloud data, have the potential to influence future research in private cloud computing.

Current solutions for outsourcing mining needs to the cloud are mainly based on k-support or k-privacy. The basic idea is to add fake items or transactions to the original data, on the basis of k-anonymity, before the data is uploaded to the cloud. There are four challenging problems with the current solutions. First, the current data privacy model based on k-anonymity offers only a low level of data privacy: making an item indistinguishable from k − 1 other items does not guarantee a high level of data privacy. Second, current approaches for k-support or k-privacy are as complicated as data mining algorithms: a user who can perform k-support or k-privacy is often capable of mining the data locally. Third, current solutions achieve k-support and k-privacy by having the user add fake data to the original data; the cloud then mines the modified data, so the mined results cannot be accurate. Fourth, current solutions mainly consider how to outsource the association rule mining task to the cloud; other tasks, such as classification and clustering, have not yet been taken into account.
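To make the indistinguishability property and the accuracy problem concrete, the following is a minimal Python sketch, not any published algorithm. It assumes a simplified reading of the property in which every item must share its support (the number of transactions containing it) with at least k − 1 other items. The function names, the toy transactions and the fake items "x", "y" and "z" are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def item_supports(transactions):
    """Count, for each item, how many transactions contain it."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))
    return counts


def is_k_support_anonymous(transactions, k):
    """Simplified check (an assumption of this sketch): every item must
    share its support value with at least k - 1 other items."""
    supports = item_supports(transactions)
    group_sizes = Counter(supports.values())
    return all(group_sizes[s] >= k for s in supports.values())


# Toy original data: supports are a=3, b=2, c=1, so each item is unique.
original = [{"a", "b"}, {"a", "c"}, {"a"}, {"b"}]

# The user adds hypothetical fake items x, y, z before uploading, so that
# each real item shares its support with one fake item (k = 2).
padded = [{"a", "b", "x", "y"}, {"a", "c", "x", "y"}, {"a", "x", "z"}, {"b"}]

print(is_k_support_anonymous(original, 2))
print(is_k_support_anonymous(padded, 2))
# The fake item "x" now has a support count like any real item.
print(item_supports(padded)["x"])
```

In this toy case the padded data satisfies the simplified property for k = 2, but the cloud now counts the fake item "x" as if it were real, which illustrates why results mined from the modified data cannot be exact.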

This project aims to solve these problems by using Intel Software Guard Extensions (Intel SGX) technology. Intel SGX is a set of extensions to the Intel architecture that aims to provide integrity and confidentiality guarantees for security-sensitive computation performed on a computer where all the privileged software (kernel, hypervisor, etc.) is potentially malicious. Our basic approach is that the user encrypts their data, stores the encrypted data in the cloud and outsources the mining tasks to multiple servers in the cloud. The servers cooperate to mine the encrypted data with Intel SGX technology without learning the original data, and return the mined patterns to the user in encrypted form, such that only the user can decrypt the data and the mined patterns.