Gartner has stated: “Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues.” Our vision is to dramatically reduce these numbers for consumers of cloud services.
This is difficult because the consumer has limited visibility into, and control over, the cloud environment. Predicting and controlling the reliability and performance of applications must therefore rely on whatever visibility and control the cloud providers grant to the consumer.
Our approach is to create a process model for each operations process and to use that model to guide near-real-time detection of, diagnosis of, and recovery from errors in the execution of the process.
Videos about POD-Discovery and POD-Viz can be found below (full-screen viewing recommended):
Please refer to the following linked posters for details.
Two components of the POD framework are publicly available and their source code can be found here.
- POD-CCaaS: The POD Conformance Checking Service checks whether a sequence of activities observed at runtime deviates from the expected behavior. It analyzes whether the observed order of the activities conforms to a predefined model and whether each activity is executed within the expected time frame.
- POD-Assertion Evaluation: The Assertion Evaluation component examines the impact of the process execution on its environment. The process execution is monitored against assertions that are defined offline and that refer to the state of the environment in which the process is executed.
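The two kinds of checks described above can be illustrated with a minimal sketch. This is not the POD-CCaaS or Assertion Evaluation API: the `Step` class, the function names, the linear process model, the activity names, and the time limits are all hypothetical and chosen only to show the idea of checking activity order, activity timing, and environment state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One expected activity in a hypothetical, strictly linear process model."""
    name: str
    max_seconds: float  # assumed upper bound on the time since the previous event


def check_conformance(model, events):
    """Return a list of deviations of `events` from `model`.

    `events` is a list of (activity_name, timestamp_in_seconds) pairs in the
    order they were observed, e.g. extracted from operation logs.
    """
    deviations = []
    prev_time = None
    for index, (activity, timestamp) in enumerate(events):
        if index >= len(model):
            deviations.append(f"unexpected extra activity: {activity}")
            break
        step = model[index]
        if activity != step.name:
            deviations.append(f"expected {step.name}, observed {activity}")
            break  # later events cannot be aligned in this simple sketch
        if prev_time is not None and timestamp - prev_time > step.max_seconds:
            deviations.append(f"{activity} exceeded {step.max_seconds}s")
        prev_time = timestamp
    return deviations


def evaluate_assertions(assertions, environment):
    """Return the names of the assertions that do not hold for `environment`."""
    return [name for name, holds in assertions.items() if not holds(environment)]


# Hypothetical model of replacing one instance during a rolling upgrade.
model = [
    Step("deregister_instance", 30),
    Step("terminate_instance", 60),
    Step("launch_instance", 300),
    Step("register_instance", 30),
]

ok_trace = [("deregister_instance", 0), ("terminate_instance", 20),
            ("launch_instance", 200), ("register_instance", 210)]
slow_trace = [("deregister_instance", 0), ("terminate_instance", 90)]
wrong_order_trace = [("deregister_instance", 0), ("launch_instance", 10)]

# Assertions are defined offline and refer to the state of the environment.
assertions = {
    "four_instances_running": lambda env: env["running_instances"] == 4,
    "new_version_deployed": lambda env: env["version"] == "v2",
}
environment = {"running_instances": 3, "version": "v2"}

print(check_conformance(model, ok_trace))
print(check_conformance(model, slow_trace))
print(check_conformance(model, wrong_order_trace))
print(evaluate_assertions(assertions, environment))
```

A linear model keeps the sketch short. A real operations process may contain branches, loops, and concurrent activities, which a list of steps cannot express.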
- Mostafa Farshchi, Jean-Guy Schneider, Ingo Weber, and John Grundy: R2c: Robust rolling-upgrade in clouds. Accepted by Journal of Systems and Software, April 2017.
- Daniel Sun, Alan Fekete, Vincent Gramoli, Guoqiang Li, Xiwei Xu, and Liming Zhu: Metric selection and anomaly detection for cloud operations using log and metric correlation analysis. IEEE Transactions on Dependable and Secure Computing, (32.3) 2016.
- Nick van Beest, Ingo Weber: Behavioral Classification of Business Process Executions at Runtime. PRAISE 2016, Rio de Janeiro, Brazil, September, 2016.
- [PDF] Min Fu, Liming Zhu, Ingo Weber, Len Bass, Anna Liu and Sherry Xu: Process-oriented non-intrusive recovery for sporadic operations on cloud. DSN 2016, Toulouse, France, July, 2016.
- [PDF] Ingo Weber, Surya Nepal and Liming Zhu: Developing dependable and secure cloud applications. IEEE Internet Computing, Volume 20 Number 3, pp. 74-79, May, 2016. (Re-published in the digest magazine IEEE Computing Edge, July, 2016)
- [PDF] Mostafa Farshchi, Jean-Guy Schneider, Ingo Weber and John Grundy: Experience report: Anomaly detection of cloud application operations using log and cloud metric correlation analysis. ISSRE 2015, pp. 24-34, Washington DC, USA, November, 2015.
- [PDF] Sherry Xu, Liming Zhu, Daniel Sun, An Binh Tran, Ingo Weber, Min Fu and Len Bass: Error diagnosis of cloud application operation using Bayesian networks and online optimisation. EDCC 2015, Paris, France, September, 2015.
- [PDF] Ingo Weber, Andreas Rogge-Solti, Chao Li and Jan Mendling: CCaaS: Online conformance checking as a service. BPM 2015, Demo Track, Innsbruck, Austria, August, 2015.
- [PDF] Ingo Weber, Chao Li, Len Bass, Sherry Xu and Liming Zhu: Discovering and visualizing operations processes with POD-Discovery and POD-Viz. DSN 2015, Rio de Janeiro, Brazil, June, 2015.
- [PDF] Ingo Weber, Mostafa Farshchi, Jan Mendling and Jean-Guy Schneider: Mining processes with multi-instantiation. ACM SAC 2015, Salamanca, Spain, April, 2015.
- [PDF] Min Fu, Liming Zhu, Len Bass and Anna Liu: Recovery for failures in rolling upgrade on clouds. DCDV 2014, Atlanta GA, USA, June, 2014.
- [PDF] Sherry Xu, Liming Zhu, Ingo Weber, Len Bass and Daniel Sun: POD-diagnosis: Error diagnosis of sporadic operations on cloud applications. DSN 2014, Atlanta GA, USA, June, 2014.
- [PDF] Min Fu, Liming Zhu, Len Bass and Sherry Xu: A recoverability-oriented analysis for operations on cloud applications. WICSA 2014, Sydney, Australia, April, 2014.
- [PDF] Sherry Xu, Ingo Weber, Hiroshi Wada, Len Bass, Liming Zhu and Steve Teng: Detecting cloud provisioning errors using an annotated process model. MW4NextGen’13, Beijing, China, December, 2013.