Fine-grained capturing of 3D Human-Object Interactions (HOIs) boosts human activity understanding and facilitates downstream visual tasks, including action recognition, holistic scene reconstruction, and human motion synthesis. Despite its significance, most existing works assume that humans interact with rigid objects, limiting their scope. In this paper, we address the challenging problem of Articulated Human-Object Interaction (A-HOI), wherein whole human bodies interact with articulated objects, whose parts are connected by prismatic or revolute joints. We present Capturing Human and Articulated-object InteRactionS (CHAIRS), a large-scale motion-captured A-HOI dataset, consisting of 16.2 hours of versatile interactions between 46 participants and 74 rigid and articulated sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process, as well as realistic and physically plausible part-level interactions. We show the value of CHAIRS with a new challenging task: Kinematic-Agnostic Human and Object Pose Estimation (KA-HOPE). Leveraging the interactions between human and object parts, we devise the very first model to tackle the joint estimation of human and object poses during interactions, which significantly outperforms the baseline models and shows improved generalizability across kinematic structures. We hope CHAIRS will promote the community toward more fine-grained interaction understanding between humans and 3D scenes.