Real-time Pose Estimation

May 30th, 2019

There are many challenges in real-time pose estimation. Below are some of the problems our team works on:

Multi-Person Human Pose Estimation

Problem: Estimating the location and orientation of human body parts (e.g. arms, legs, head) in imagery and identifying which person each part belongs to.
Challenges: Noisy detection bounding boxes, occluded joints, associating each feature with the correct person in crowded scenes, running in real time, fast-moving features, working with temporal data, 3D-to-2D mapping ambiguity, and unusual or rare poses.
Solution: Top-down estimation via object detection and keypoint segmentation. An object detector identifies each person instance visible in the image; a crop of each person is then extracted and fed to a keypoint segmentation network. For each joint (e.g. elbow, shoulder, knee), the segmentation network estimates the probability that the joint lies within each region of the crop, or that it is not visible because of occlusion or cropping. The most probable human pose is then estimated from these joint probabilities.
Impact: A key technology in action recognition and prediction, person tracking, human-computer interfaces, pedestrian avoidance for automated driving, and more.
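
The last step of the top-down pipeline, turning per-joint probabilities for one person crop into keypoints in the full image, can be sketched as below. This is a minimal illustration and not our production code: the function name decode_keypoints, the (x0, y0, width, height) box layout, the simple peak-picking rule and the min_prob threshold are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Heatmap = List[List[float]]  # rows x cols of per-cell joint probabilities


def decode_keypoints(
    heatmaps: List[Heatmap],
    box: Tuple[float, float, float, float],
    min_prob: float = 0.3,  # hypothetical visibility threshold
) -> List[Optional[Tuple[float, float, float]]]:
    """Map each joint heatmap of one person crop to image coordinates.

    box is (x0, y0, width, height) of the person crop in the full image.
    Returns (x, y, prob) per joint, or None when the peak is below min_prob.
    """
    x0, y0, box_w, box_h = box
    keypoints: List[Optional[Tuple[float, float, float]]] = []
    for hm in heatmaps:
        rows, cols = len(hm), len(hm[0])
        # Find the most probable cell for this joint.
        prob, r, c = max(
            (hm[r][c], r, c) for r in range(rows) for c in range(cols)
        )
        if prob < min_prob:
            # Treated as not visible (occluded or outside the crop).
            keypoints.append(None)
            continue
        # Use the cell centre, scaled from the heatmap grid to the crop size,
        # then shifted by the crop origin to get full-image coordinates.
        x = x0 + (c + 0.5) * box_w / cols
        y = y0 + (r + 0.5) * box_h / rows
        keypoints.append((x, y, prob))
    return keypoints
```

Calling decode_keypoints once per detected person keeps each set of joints tied to its own bounding box, which is how a top-down approach answers the question of which person a joint belongs to.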

Human Motion Prediction

Problem: Predicting future human motion given a limited, short observation of past motion.
Challenges: Human motion is a highly stochastic process. A single observation can be followed by many plausible future motions.

Solution: We design a stochastic generative model that learns the variations in human motion, so that it can generate multiple diverse motions that are all plausible continuations of the same observation. Our approach outperforms existing baselines in both the quality and the diversity of the generated motions.
Applications and Impact: Predicting future human motion is key to many computer vision tasks, such as visual surveillance, sports analysis, pedestrian intention forecasting in autonomous cars, and safe human-robot interaction.
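
To make "multiple diverse continuations of the same observation" concrete, the sketch below draws several futures from a toy stochastic predictor and scores how different they are. The predictor here (last observed velocity plus Gaussian noise) is only a stand-in for a learned generative model, and the names sample_future and diversity, the noise_std parameter and the average-pairwise-distance score are assumptions for illustration, not the method or metric used in our work.

```python
import math
import random
from typing import List

Pose = List[float]  # flattened joint coordinates for one frame


def sample_future(observed: List[Pose], horizon: int, rng: random.Random,
                  noise_std: float = 0.05) -> List[Pose]:
    """Toy stochastic predictor: extrapolate the last velocity plus noise."""
    last, prev = observed[-1], observed[-2]
    velocity = [a - b for a, b in zip(last, prev)]
    future: List[Pose] = []
    current = list(last)
    for _ in range(horizon):
        # Each step adds fresh noise, so repeated calls give different futures.
        current = [c + v + rng.gauss(0.0, noise_std)
                   for c, v in zip(current, velocity)]
        future.append(current)
    return future


def diversity(samples: List[List[Pose]]) -> float:
    """Average pairwise Euclidean distance between sampled motions.

    Expects at least two samples of equal length.
    """
    def dist(a: List[Pose], b: List[Pose]) -> float:
        return math.sqrt(sum((x - y) ** 2
                             for fa, fb in zip(a, b)
                             for x, y in zip(fa, fb)))

    pairs = [(i, j) for i in range(len(samples))
             for j in range(i + 1, len(samples))]
    return sum(dist(samples[i], samples[j]) for i, j in pairs) / len(pairs)


# Usage: several futures for one short observation of a two-value pose.
rng = random.Random(0)
observed = [[0.0, 0.0], [1.0, 0.5]]
samples = [sample_future(observed, horizon=5, rng=rng) for _ in range(4)]
score = diversity(samples)
```

A deterministic predictor would give a diversity of zero here; a useful stochastic model has to keep this score high while every sample still looks like a realistic motion.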

Attention for Person Re-Identification

Problem: Given an image of a specific person, the task is to retrieve all images, from a gallery set, that contain a person with the same identity.

Application: Person tracking, person identification and crime prevention.
Challenges: Person misalignment, illumination changes, pose variation, background clutter, and occlusion.
Solution: We propose a Bilinear Attention Network (BAT-net), which has two feature extractors: a person-appearance extractor and a person-part extractor. The key component of BAT-net is the Bilinear Attention Module, which captures second-order statistical features hidden in the feature maps and enhances the discriminative power of the feature embedding. In addition, the proposed Attention in Attention mechanism builds a connection between second-order global features and local features.
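
Once a network has produced a feature embedding per image, the retrieval task described in the problem statement reduces to ranking the gallery by similarity to the query. The sketch below shows that ranking step only; it assumes embeddings are plain lists of floats and uses cosine similarity, which is a common choice but an assumption here, and the names cosine_similarity and rank_gallery are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_gallery(query: Sequence[float],
                 gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery image ids sorted from most to least similar to the query."""
    return sorted(gallery,
                  key=lambda image_id: cosine_similarity(query, gallery[image_id]),
                  reverse=True)


# Usage with made-up two-dimensional embeddings.
gallery = {"img_a": [0.0, 1.0], "img_b": [2.0, 0.1], "img_c": [1.0, 1.0]}
ranking = rank_gallery([1.0, 0.0], gallery)
```

The more discriminative the embedding, the more often images of the same identity end up at the top of this ranking, which is what the attention modules are meant to improve.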

Multi-Spectral Super-Resolution and Colour Matching

Problem: Super-resolve and colour-predict multi-spectral images using registered stereo multi-spectral/RGB image pairs.

Challenges: Modern multi-spectral cameras have a lower spectral and spatial resolution due to limited physical space on the image sensor. These constraints introduce ambiguity to the problem of simultaneous colour prediction and super-resolution.
Solution: We propose a novel CNN-based model that takes a multi-spectral image and produces the RGB equivalent of the same image with improved resolution.
Impact: This work is carried out as part of a collaborative project with the Oceans & Atmosphere business unit (O&A BU), which aims at monitoring the health of coral reefs and developing appropriate responses. A stereo camera system (multi-spectral/RGB), equipped with additional sensors such as spectrometers and an IMU, is used to gather image and spectroscopy data from the coral reef. This will facilitate registration and 3D habitat mapping of the corals, as well as classification of flora, fauna and habitat in the Great Barrier Reef, with the objective of expanding the application internationally.

Zero-shot Learning for 3D Point Cloud Object Recognition

Problem: 3D point cloud object recognition from an arbitrary scene, for which there is no previously seen training data.

Challenge: Only a limited amount of 3D data is available for training, which, in comparison to the 2D case, results in weak pre-trained models.
Solution: We design a novel loss function that reduces the effect of the poor feature quality obtained from the weak pre-trained model. This loss makes effective use of unlabelled data in an unsupervised manner, which improves classification performance. Our method achieves state-of-the-art performance and outperforms existing baselines.
Impact: Recognizing 3D objects that were not seen when training the system is becoming vital to many applications, such as self-driving cars, service robotics, UAVs, and more.

Applications

Some of our work in this space includes:

Monitoring Food Handling & Preparation, Logistics
- CRC-P Nutronomics
- Fast food industry (Competitive Foods Australia)
Visual Perception System for the DARPA SubTerranean Challenge
Automatic surface condition rating (Cygnus)
Joint mobility measurement for rehabilitation
- CRC-P Coviu
Habitat Mapping on the Coral Reef (multi spectral)
- AIM FSP, collaborative project with Oceans & Atmosphere

Our highly skilled team of world-class researchers and engineers is open to partnerships and collaborations for research, development, and commercialisation.

