Facebook is working with partners like OTOY, Adobe, Framestore, Foundry and others to build out a new kind of camera system, along with tools and workflows, capable of volumetric capture. The system should produce a more realistic representation of reality in captured footage. In the end, it will let viewers in VR move their head around with greater freedom and see the action in a video from different angles. The plan is to release it later this year.
This new soccer ball-sized camera system will be an evolution of the earlier open “Surround 360” camera announced last year, which cost roughly $30,000 in parts. I asked Facebook for details on the pricing of the new six degrees of freedom system, but the company is planning for manufacturing partners to license the new camera designs and turn them into products, so pricing is up to them.
There are two versions. One is slightly smaller than a soccer ball and the other is slightly bigger. The larger one includes 24 cameras while the smaller one has six (called x24 and x6 respectively). Both use a variety of techniques to extract depth data from the images they capture and produce a small bubble of space in which a visitor to VR can move their head around and see the scene from different angles. This is potentially an enormous advancement for reality capture that could have broad implications for everything from Hollywood special effects workflows to recording memories with family.
“It’s legitimately a light field camera,” said Jules Urbach, CEO of OTOY. “This Facebook camera is basically the sweet spot. It’s the workflow of the future.”
A Potentially Breakthrough Moment For Recording Reality
360-degree videos are currently a major category of content you can see in a VR headset, but many projects are substandard because the format is difficult to capture and deliver. In addition, early adopters often complain that 360-degree videos break immersion because you can only turn your head in a scene and cannot move around. That’s because most existing cameras just capture a traditional video essentially wrapped around you like an IMAX dome. An animation made by Vincent McCurley for the National Film Board of Canada explains the difference between “true” VR typically made in a game engine and traditional 360-degree videos captured by an array of cameras.
What Facebook and OTOY discovered earlier this year is that by combining their various technologies, which treat captured images more like a stream of data points, they can extract enough depth information to create a small space centered around the camera in which a person can move their head around with complete freedom to see the scene accurately depicted from different angles.
“Both of us [Facebook and OTOY] were developing technologies independently of each other,” said Facebook Engineering Director Brian Cabral. “When they matured to a certain point it was clear that they were very complementary.”
Facebook is working with post-production and visual effects companies including Adobe, OTOY, Foundry, Mettle, DXO, Here Be Dragons, Framestore, Magnopus, and The Mill to develop tools and workflows for this new system. Facebook is planning to license the new designs to partners with the aim of releasing a camera product later this year.
Diving Deep Into 6 Degrees Of Freedom VR Video
According to Facebook, the 24-camera system captures full RGB and depth at every pixel for all of the cameras. It “oversamples” 4x at every point and uses depth-estimation algorithms to capture both high-resolution views and depth data. The six-camera system oversamples 3x. OTOY’s Urbach offered us some technical details in an email about how the system works.
“The lens arrangement has precise overlaps (which is how depth is optically generated from raw color data), and this overlap of views (~8 on the x24 footage we’ve used) is enough to fill in many occluded holes and minimize depth shadows in the view bubble around a user’s head,” Urbach wrote. “For far field depth shadows, we use heuristics in our player to fill in any gaps not in the camera capture data. A simple workflow approach, which will be discussed [in an] F8 presentation, is to shoot a background plate of the environment before you shoot dynamic elements. Our software will allow you to easily layer these back in through Octane. To that end, OTOY is developing a template for moving the x24 in a path or circle for ~4-6 seconds that will give you a light field set capture.”
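To give a rough sense of why overlapping views carry depth information, here is a minimal sketch of the textbook two-view pinhole stereo relation, in which depth equals focal length times baseline divided by disparity. It is not Facebook's or OTOY's actual pipeline, which has not been published in this detail. The function name `depth_from_disparity` and all of the numbers (focal length, baseline, disparities) are hypothetical values chosen only for illustration.

```python
def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Textbook pinhole stereo triangulation: Z = f * B / d.

    focal_length_px: focal length in pixels (hypothetical value below)
    baseline_m:      distance between the two lens centers, in meters
    disparity_px:    horizontal shift of the same point between the two views
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to yield a finite depth")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Illustrative assumptions only; these are not x24 or x6 specifications.
    focal_length_px = 1000.0
    baseline_m = 0.1

    for disparity_px in (100.0, 50.0, 10.0):
        depth_m = depth_from_disparity(focal_length_px, baseline_m, disparity_px)
        print(f"disparity {disparity_px:6.1f} px -> depth {depth_m:5.2f} m")
```

In this simplified model a nearby point shifts a lot between two overlapping views and a distant point shifts very little, so the size of the shift indicates distance.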
Diving deeper, we asked Urbach how close an object or subject can be to one of these cameras and he offered the following in-depth response:
For normal 6DOF video playback at 1:1 scale, it’s best to have the nearest part of the scene be >= 1 m from camera origin unless you have multiple overlapping view volumes or plates layered over each other to provide extra near field coverage. Interestingly, the 1 m delimiter is the same distance [John] Carmack recommended to artists for the synthetic Render The Metaverse contest scenes…
Having no bounds around the camera position is the default mode in ‘snow globe’ mode (where you can shrink an entire scene into the palm of your hand – and have it placed a cm from view origin like a real snow globe). It looks amazing, and is my favorite way to experience this content. It will be the default mode when we launch a 6DOF video bubble in AR/XR devices like Tango (or ODG R9).