The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the difficulty of obtaining large-scale labeled datasets from real-world scenarios. To address this limitation, we introduce M3Act, a synthetic data generator for multi-view, multi-group, multi-person human atomic actions and group activities. Powered by the Unity Engine, M3Act features multiple semantic groups, highly diverse and photorealistic images, and a comprehensive set of annotations, facilitating the learning of human-centered tasks across single-person, multi-person, and multi-group conditions. We demonstrate the advantages of M3Act across three core experiments. The results suggest that our synthetic dataset can significantly improve the performance of several downstream methods and replace real-world datasets to reduce cost. Notably, M3Act improves the state-of-the-art MOTRv2 on the DanceTrack dataset, leading to a jump on the leaderboard from 10th to 2nd place. Moreover, M3Act opens up new research for controllable 3D group activity generation; we define multiple metrics and propose a competitive baseline for this novel task.
@inproceedings{chang2024learning,
  title={Learning from Synthetic Human Group Activities},
  author={Chang, Che-Jui and Li, Danrui and Patel, Deep and Goel, Parth and Zhou, Honglu and Moon, Seonghyeon and Sohn, Samuel S and Yoon, Sejong and Pavlovic, Vladimir and Kapadia, Mubbasir},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={21922--21932},
  year={2024}
}
@article{chang2024equivalency,
  title={On the Equivalency, Substitutability, and Flexibility of Synthetic Data},
  author={Chang, Che-Jui and Li, Danrui and Moon, Seonghyeon and Kapadia, Mubbasir},
  journal={arXiv preprint arXiv:2403.16244},
  year={2024}
}