Full-Body Articulated Human-Object Interaction


1School of Intelligence Science and Technology, Peking University  2Beijing Institute for General Artificial Intelligence (BIGAI)  3Department of Automation, Tsinghua University
4Center on Frontiers of Computing Studies, Peking University  5Institute for Artificial Intelligence, Peking University
*Equal contributors  +Work done during internship at BIGAI

Abstract

Fine-grained capture of 3D human-object interaction (HOI) boosts human activity understanding and facilitates downstream visual tasks, including action recognition, holistic scene reconstruction, and human motion synthesis. Despite its significance, existing works mostly assume that humans interact with rigid objects using only a few body parts, limiting their scope. In this paper, we address the challenging problem of full-body articulated human-object interaction (f-AHOI), wherein whole human bodies interact with articulated objects whose parts are connected by movable joints. We present CHAIRS, a large-scale motion-captured f-AHOI dataset consisting of 16.2 hours of versatile interactions between 46 participants and 74 articulated and rigid sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects throughout the entire interactive process, as well as realistic and physically plausible full-body interactions. We demonstrate the value of CHAIRS on object pose estimation. By learning the geometrical relationships in HOI, we devise the first model that leverages human pose estimation to tackle the estimation of articulated object poses and shapes during whole-body interactions. Given an image and an estimated human pose, our model first reconstructs the pose and shape of the object, then optimizes the reconstruction according to a learned interaction prior. Under both evaluation settings (i.e., with or without knowledge of the objects' geometries/structures), our model significantly outperforms the baselines. We hope CHAIRS will push the community toward finer-grained interaction understanding. We will make the data/code publicly available.
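To make the estimate-then-optimize pipeline concrete, the code below shows how the refinement stage could be run. It is a minimal sketch, assuming a feed-forward regressor `pose_net` and a learned interaction prior `prior` that exposes an `energy` score; both names and the `energy` interface are hypothetical placeholders, not the released implementation.

```python
import torch

def estimate_object_pose(image_feat, human_pose, pose_net, prior,
                         steps=100, lr=1e-2):
    """Two-stage estimate-then-optimize loop sketched from the abstract.

    `pose_net` and `prior` are hypothetical stand-ins: a network that
    regresses an initial object pose, and a learned interaction prior
    whose energy is low for physically plausible human-object pairs.
    """
    # Stage 1: feed-forward regression of the object pose, conditioned on
    # the image feature and the estimated human pose.
    obj_pose = pose_net(image_feat, human_pose)

    # Stage 2: treat the regressed pose as an initialization and refine it
    # by gradient descent on the interaction prior's energy.
    obj_pose = obj_pose.detach().requires_grad_(True)
    optimizer = torch.optim.Adam([obj_pose], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = prior.energy(human_pose, obj_pose)
        loss.backward()
        optimizer.step()
    return obj_pose.detach()
```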






Examples from the proposed f-AHOI dataset. CHAIRS contains fine-grained interactions between 46 participants and 74 sittable objects with drastically different kinematic structures, providing multi-view RGB-D sequences as input and ground-truth 3D meshes of humans and articulated objects for over 16.2 hours of recordings.
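As a rough illustration of what one synchronized frame of such a recording could contain, the dataclass below sketches a hypothetical per-frame sample; every field name and shape is an assumption for illustration, not the released file format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CHAIRSFrame:
    """Hypothetical layout of one synchronized CHAIRS frame."""
    rgb: np.ndarray           # (n_views, H, W, 3) multi-view color images
    depth: np.ndarray         # (n_views, H, W) aligned depth maps
    human_verts: np.ndarray   # (V_h, 3) ground-truth human mesh vertices
    object_verts: np.ndarray  # (V_o, 3) posed articulated-object mesh vertices
    joint_states: np.ndarray  # (n_joints,) articulation state of each joint
```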


Method Overview



The overall architecture of our model. The reconstruction model uses the predicted voxelized human to guide the pose estimation of the interacting object. We further regress the object's root 6D pose from the image feature and the SMPL-X parameters. Both predictions, together with a learned interaction prior, are used to optimize the final estimated pose.
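The schematic below sketches how these pieces could fit together in code. It is a minimal sketch, assuming a continuous 6D rotation representation plus translation for the root pose and a scalar articulation state per joint; the module names, layer choices, and dimensions (e.g., `img_dim`, `smplx_dim`) are illustrative assumptions, not the authors' released architecture.

```python
import torch
import torch.nn as nn

class AHOIReconstructor(nn.Module):
    """Schematic of the overview figure; all submodules and dimensions
    are illustrative stand-ins."""

    def __init__(self, img_dim=2048, smplx_dim=188, hidden=512, n_joints=8):
        super().__init__()
        # Encode the predicted voxelized human occupancy grid that guides
        # the part-level pose estimation of the interacting object.
        self.voxel_enc = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(hidden), nn.ReLU(),
        )
        # Regress the object's root pose from the image feature and the
        # SMPL-X parameters: a continuous 6D rotation representation plus
        # a 3D translation (an assumed parameterization).
        self.root_head = nn.Sequential(
            nn.Linear(img_dim + smplx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 6 + 3),
        )
        # Predict the articulation state of each movable joint.
        self.joint_head = nn.Linear(hidden, n_joints)

    def forward(self, img_feat, smplx_params, human_voxels):
        root_pose = self.root_head(torch.cat([img_feat, smplx_params], dim=-1))
        joint_states = self.joint_head(self.voxel_enc(human_voxels))
        return root_pose, joint_states
```

At inference, the outputs of the two heads would serve as the initialization that is then optimized against the learned interaction prior, as in the refinement loop sketched earlier.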



Dataset Clips








Result Examples



The optimization results of our method on the CHAIRS dataset. The first row shows the original RGB input. The second and third rows show the results of our full model: mesh reconstruction and part-level 6D pose estimation. The fourth row shows the mesh reconstruction without knowledge of the object.





The optimization results of our method on in-the-wild images of a person sitting on sittable furniture.





The optimization results of our method on the BEHAVE dataset.