Early this semester, we started working on volumetric capture. The goal of the project is to pre-record a human presenter’s talk and replay it remotely, offline. The project includes: 1) Camera rig. To capture the presenter’s whole body, the initial setup is four walls of cameras. 2) Movable frames. To support the presenter’s movement, we need frames that hold the cameras in fixed positions but can themselves be moved. 3) Calibration and synchronization across multiple cameras. Currently we are using RealSense cameras; high-resolution RGB cameras are a potential plan B. That is the project scope.
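To make the synchronization part of item 3 concrete, here is a minimal sketch of software frame matching by timestamp. It is not our actual pipeline. It assumes every camera reports per-frame timestamps in milliseconds on a shared clock, that each stream is sorted and non-empty, and that the first camera in the dictionary serves as the reference. The names `nearest_index`, `sync_frames`, and `tolerance_ms` are made up for illustration.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the timestamp closest to t.

    Assumes `timestamps` is sorted ascending and non-empty.
    """
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer to t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def sync_frames(streams, tolerance_ms):
    """Group frames across cameras by nearest timestamp.

    streams: dict mapping a (hypothetical) camera id to its sorted timestamps.
    Returns one dict per reference frame, mapping camera id -> frame index,
    keeping only groups where every camera has a frame within tolerance_ms.
    """
    ref_id = next(iter(streams))  # assumption: first camera is the reference
    groups = []
    for ref_idx, t in enumerate(streams[ref_id]):
        group = {ref_id: ref_idx}
        for cam_id, ts in streams.items():
            if cam_id == ref_id:
                continue
            j = nearest_index(ts, t)
            if abs(ts[j] - t) > tolerance_ms:
                break  # this camera has no close enough frame; drop the group
            group[cam_id] = j
        else:
            groups.append(group)
    return groups


# Made-up timestamps for three cameras running at roughly 30 fps.
example_streams = {
    "front": [0, 33, 66, 100],
    "left": [2, 35, 80, 101],
    "back": [1, 34, 67, 99],
}
synced = sync_frames(example_streams, tolerance_ms=5)
```

In this example the third reference frame is dropped because the "left" camera has no frame within 5 ms of it. Hardware triggering, where the cameras support it, would avoid the need for this kind of matching.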
Surprisingly, one of my friends told me that a similar topic was discussed at CVPR yesterday. However, it was just a tutorial, and I could not find more relevant articles online. I did find a paper from the same group involved in the tutorial, so I am going to give a quick review of it. They did a great job.
Before we discuss their approach, let’s go through the kinds of methods previous work has used for volumetric capture. There is a spectrum here. At one end are cartoon-like virtual avatars, which are low cost but also low fidelity. At the other end are high-cost, high-resolution, multi-view systems with offline computation. The goal of the paper is to find a new balance point that affords real-time capture with promising quality.
The paper lists three categories of human body reconstruction: image-based, volumetric reconstruction, and ML-based.
Image-based rendering usually does not form a full 3D model because of the limited viewpoints, so it cannot render a new viewpoint either. 360-degree panoramas and warping techniques have been used to enlarge the range of views. Personally, I am more interested in how to embed 2D images or image-based reconstruction results into AR or VR environments.
Volumetric reconstruction methods can be very costly, for example using 100+ cameras. The current state-of-the-art system is one of them and is widely used in production studios. Various methods have been tried to reduce the number of cameras.
ML-based methods have been widely used for object reconstruction. For the human body, especially in unseen poses, the case is different. Some work tries to disentangle appearance from pose. Some uses UV maps to render new viewpoints.
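To give a rough feel for what rendering with a UV map means, here is a toy sketch. It is not the method of any particular paper. It assumes some model has already predicted, for every output pixel, a (u, v) coordinate into a shared texture, and it simply looks the colour up with nearest-neighbour sampling. The function names and the convention that (0, 0) is the top-left texel are my own assumptions.

```python
def sample_texture(texture, u, v):
    """Nearest-neighbour texture lookup.

    texture: list of rows of pixel values.
    u, v: coordinates in [0, 1]; (0, 0) is assumed to be the top-left texel.
    """
    height = len(texture)
    width = len(texture[0])
    # Clamp so that u == 1.0 or v == 1.0 still hits the last texel.
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return texture[row][col]


def render_from_uv(uv_image, texture, background=None):
    """Render an image from per-pixel UV coordinates.

    uv_image: rows of (u, v) tuples, or None where the pixel is background.
    """
    return [
        [background if uv is None else sample_texture(texture, *uv) for uv in row]
        for row in uv_image
    ]


# Toy 2x2 texture and a 2x2 UV image with one background pixel.
toy_texture = [["red", "green"], ["blue", "white"]]
toy_uv = [[(0.1, 0.1), (0.9, 0.1)], [None, (0.9, 0.9)]]
rendered = render_from_uv(toy_uv, toy_texture)
```

The point of the indirection is that the texture stays fixed for a given person, while a new pose or viewpoint only changes the predicted UV image.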
This paper proposed a semi-parametric approach to render a subject (human) in unseen poses and arbitrary viewpoints.

Here are some of the results.

Feel free to check out the full paper here: https://arxiv.org/abs/1905.12162