Reconstructing People, Places, and Cameras

Lea Müller* Hongsuk Choi* Anthony Zhang Brent Yi Jitendra Malik Angjoo Kanazawa

 UC Berkeley
* equal contribution

TL;DR: We propose Humans and Structure from Motion (HSfM), an approach that integrates Human Mesh Recovery and Structure from Motion to jointly estimate 3D human pose and shape, scene point maps, and camera poses in a metric world coordinate frame.


Our approach places people in the scene and improves camera pose and scene reconstruction. Here we show a top view of a gym environment before HSfM optimization (DUSt3R output) and after optimization with the HSfM loss.

Reconstruction Result


Humans and Structure from Motion captures people interacting with their environment as well as the spatial relationships between individuals. Here we show a reconstruction of three people building Lego together.

Overview

Humans and Structure from Motion (HSfM) is a novel method that jointly reconstructs 3D humans, the scene, and cameras from a sparse set of uncalibrated images. To achieve this, HSfM combines Human Mesh Recovery (HMR) methods, which estimate local human pose, with Structure from Motion (SfM) techniques, which reconstruct the scene and cameras and localize people. Specifically, our approach combines camera and scene reconstruction from data-driven SfM methods such as DUSt3R with the bundle-adjustment step of traditional SfM, applied here to 2D human keypoints, where a human body model provides 3D human meshes and constrains human size.
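At its core, the human-placement term is a reprojection objective: the body model's 3D joints, placed in the world, should project onto the detected 2D keypoints in every camera. The toy sketch below illustrates this idea by recovering a single person's world translation from two views; the names (`project`, `reprojection_loss`), the fixed cameras, and the translation-only variable are simplifications for illustration, not the paper's actual formulation, which jointly optimizes cameras, scene scale, and humans.

```python
import numpy as np

def project(points_w, R, t, K):
    """Project world-frame points into a pinhole camera (R, t, K)."""
    p_cam = points_w @ R.T + t            # world -> camera frame
    p_img = p_cam @ K.T                   # apply intrinsics
    return p_img[:, :2] / p_img[:, 2:3]   # perspective divide

def reprojection_loss(trans_w, joints_local, keypoints_2d, cams):
    """Squared 2D keypoint error for one person across all cameras.

    trans_w is the person's world translation (the variable optimized here);
    joints_local are HMR-style joints in a person-centric frame, and cams
    holds (R, t, K) per view. A full system would also optimize camera
    poses, scene scale, and articulated pose; this isolates one term.
    """
    loss = 0.0
    for (R, t, K), kp in zip(cams, keypoints_2d):
        loss += np.sum((project(joints_local + trans_w, R, t, K) - kp) ** 2)
    return loss

# Toy setup: one person seen by two cameras with known intrinsics.
rng = np.random.default_rng(0)
joints = rng.normal(scale=0.3, size=(17, 3))   # stand-in for HMR joints
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
cams = [(np.eye(3), np.array([0., 0., 5.]), K),
        (np.eye(3), np.array([1., 0., 5.]), K)]
true_trans = np.array([0.5, -0.2, 1.0])
kps = [project(joints + true_trans, R, t, K) for R, t, K in cams]  # "detections"

# Recover the translation by gradient descent on the reprojection loss
# (numerical gradients for brevity; a real system would use autodiff).
trans, eps, lr = np.zeros(3), 1e-5, 1e-6
for _ in range(3000):
    f0 = reprojection_loss(trans, joints, kps, cams)
    grad = np.array([
        (reprojection_loss(trans + eps * np.eye(3)[i], joints, kps, cams) - f0) / eps
        for i in range(3)
    ])
    trans -= lr * grad

print(np.round(trans, 3))  # converges to a translation close to true_trans
```

Because the body model fixes the person's metric size, agreement between projected joints and detected keypoints also pins down the otherwise ambiguous scale of the scene and camera translations.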

Problem Statement

Step by Step Method

Evaluation

This joint reasoning not only enables accurate human placement in the scene; notably, it also improves the camera poses and the scene reconstruction itself. Evaluations on public benchmarks show significant improvements. Here we show relative rotation accuracy (RRA) and scaled camera center accuracy (s-CCA), in percent, at a threshold of 10 degrees / meters on the EgoHumans benchmark.
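RRA at a threshold is commonly computed as the percentage of camera pairs whose relative rotation error falls below that threshold, with s-CCA as the analogous measure on scale-aligned camera centers. A minimal sketch of the rotation metric (the exact evaluation protocol may differ from the paper's):

```python
import numpy as np

def rotation_angle_deg(R):
    """Geodesic angle of a rotation matrix, in degrees."""
    cos = (np.trace(R) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def rra(R_pred, R_gt, thresh_deg=10.0):
    """Relative Rotation Accuracy: percent of camera pairs whose relative
    rotation differs from the ground truth by less than thresh_deg."""
    n, hits, total = len(R_pred), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            rel_pred = R_pred[i].T @ R_pred[j]
            rel_gt = R_gt[i].T @ R_gt[j]
            err = rotation_angle_deg(rel_pred.T @ rel_gt)
            hits += err < thresh_deg
            total += 1
    return 100.0 * hits / total

def Rz(deg):
    """Rotation about the z-axis by `deg` degrees (for the toy example)."""
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

# Three cameras: one prediction is off by 5 degrees, one by 20 degrees,
# so exactly one of the three pairs falls under the 10-degree threshold.
gt = [np.eye(3)] * 3
pred = [np.eye(3), Rz(5.0), Rz(20.0)]
print(rra(pred, gt))  # one of three pairs is accurate
```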

Joint Human and Scene Reconstruction Metrics

Method Video

Acknowledgements

The interactive results are powered by Viser.

This project is supported in part by DARPA No. HR001123C0021, IARPA DOI/IBC No. 140D0423C0035, NSF CNS-2235013, ONR MURI N00014-21-1-2801, the Bakar Fellows Program, and BAIR sponsors. The views and conclusions contained herein are those of the authors and do not represent the official policies or endorsements of these institutions. We also thank Chung Min Kim for her critical reviews of this paper and Junyi Zhang for his valuable insights on the method.

Citation

@article{mueller2024hsfm,
  title={Reconstructing People, Places, and Cameras},
  author={Lea M\"uller and Hongsuk Choi and Anthony Zhang and Brent Yi and Jitendra Malik and Angjoo Kanazawa},
  year={2024},
  journal={arXiv:2412.17806},
}