In a growing number of scenarios, data analytics and machine learning handle real-world tasks effectively with little human involvement. Demand for such systems is rising rapidly, but building them takes considerable effort. One reason is that these systems are often data- and compute-intensive, which makes timely decision making a challenge. Another is that real-world applications always present unexpected situations, and it is non-trivial to evaluate data analytics outcomes, or machine learning models learned from known data, under new situations and under various policy frameworks. Our research aims to meet these demands by studying solutions to the following problems:
We investigate how machine learning models are constructed by reviewing existing practices and building concrete data analytics applications. We develop software systems that make these processes efficient and manageable, and we validate our solutions by applying them to concrete applications in our target areas and measuring their effectiveness.
Our team has solid skills and a track record in software engineering, distributed systems, and parallel computing research, which are fundamental to this topic.
 H. Wu, C. Wang, Y. Fu, S. Sakr, L. Zhu, K. Lu, HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud, 33rd International Conference on Massive Storage Systems and Technology (MSST 2017)
 X. Zhou, L. Chen, Y. Zhang, D. Qin, L. Cao, G. Huang, C. Wang, Enhancing online video recommendation using social user interactions, The VLDB Journal, 2017 (in press)
 D. Wu, S. Sakr, L. Zhu, H. Wu, Towards Big Data Analytics across Multiple Clusters. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid ’17), 2017
 D. Wu, L. Zhu, Q. Lu, S. Sakr, HDM: A Composable Framework for Big Data Processing, IEEE Transactions on Big Data, 2017 (in press)
 C. Wang, S. Karimi, Parallel Duplicate Detection in Adverse Drug Reaction Databases with Spark, 19th International Conference on Extending Database Technology (EDBT), 2016.
 C. Wang, S. Karimi, Causality driven data integration for adverse drug reaction discovery, Health Informatics Society Australia (HISA) Big Data Conference 2014.
 K. Ye, Z. Wu, C. Wang, B. B. Zhou, W. Si, X. Jiang, A. Y. Zomaya, “Profiling-Based Workload Consolidation and Migration in Virtualized Data Centers,” in IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 3, pp. 878-890, March 2015.
 X. Liu, C. Wang, B. B. Zhou, J. Chen, T. Yang and A. Y. Zomaya, “Priority-Based Consolidation of Parallel Workloads in the Cloud,” in IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 9, pp. 1874-1883, Sept. 2013.
 J. Chen, C. Wang, B. B. Zhou, L. Sun, Y. C. Lee, and A. Y. Zomaya. 2011. Tradeoffs between Profit and Customer Satisfaction for Service Provisioning in the Cloud. In Proceedings of the 20th international symposium on High performance distributed computing (HPDC ’11). ACM, New York, NY, USA, 229-238.
 C. Wang, Y. Zhou, A Collaborative Monitoring Mechanism for Making a Multitenant Platform Accountable. USENIX HotCloud 2010.
Chen Wang (firstname.lastname@example.org)
Architecture & Analytics Platforms, Data61, CSIRO