
Name
InterGen
Category
Research
Position
Computer Vision Researcher
Timeline
2023.9 - 2024.4
Overview
A text-guided diffusion model that generates realistic two-person interaction motions from the new InterHuman dataset.
Honors & Awards
Published in the International Journal of Computer Vision, one of the top-tier journals in computer vision and artificial intelligence.
Contribution
Participated in designing a diffusion-based approach that integrates human-to-human interactions into the motion diffusion process, allowing non-expert users to customize high-quality two-person interaction motions with text guidance alone
Captured raw two-person interaction motions with an optical motion capture system and a synchronized array of 76 RGB cameras, then processed the recordings into ground-truth interaction motion data
Created the InterHuman multimodal dataset comprising 107 million frames of diverse two-person interactions with accurate skeletal motions and 16,756 natural language descriptions
Implemented a tailored motion diffusion model featuring two cooperative transformer-based denoisers with a mutual attention mechanism and a novel motion input representation (sketched below)
Details
InterGen is a diffusion-based framework designed to generate realistic and diverse two-person interaction motions directly from text descriptions.
This work introduces InterHuman, a large-scale multimodal dataset containing over 107 million motion frames and 23,000 natural language annotations covering a wide range of human interactions.
To model the spatial and semantic relations between two performers, the method employs cooperative transformer denoisers with shared weights and a relation-aware motion representation.
InterGen produces high-quality, customizable interaction motions, opening up new possibilities for animation, virtual production, and embodied AI applications.
Here are some generated results:
"Oppenheimer and Einstein walked side by side in the garden and had profound discussions."
"With fiery passion two dancers entwine in Latin dance sublime."
For more details about this work, please refer to the paper; if access is restricted, an arXiv version is also available.
The project website includes additional qualitative results and visualizations.
This work has been open-sourced, and the code and dataset are available on GitHub.

