We introduce OLATverse, a large-scale real-world dataset comprising over 9M images of 765 objects, captured from multiple viewpoints under a diverse set of precisely controlled lighting conditions. While recent advances in object-centric inverse rendering, novel view synthesis, and relighting have demonstrated promising results, most techniques still rely heavily on synthetic datasets for training and small-scale real-world datasets for benchmarking, which limits their realism and generalization. To address this gap, OLATverse offers two key advantages over existing datasets: large-scale coverage of real objects and high-fidelity appearance under precisely controlled illuminations. Specifically, our dataset contains 765 common and uncommon real-world objects spanning a wide range of material categories. Each object is captured with 35 DSLR cameras and 331 individually controlled light sources, enabling the simulation of diverse illumination conditions. Additionally, for each object we provide well-calibrated camera parameters, accurate object masks, diffuse albedo, and photometric surface normals as auxiliary resources. We further construct an extensive evaluation set, establishing the first comprehensive real-world object-centric benchmark for inverse rendering and normal estimation. We believe that OLATverse marks a pivotal step toward grounding the next generation of methods in real-world data.
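Because light transport is linear, one-light-at-a-time (OLAT) captures form a basis for relighting: any illumination spanned by the 331 light sources can be simulated as a weighted sum of the OLAT images. Below is a minimal sketch of this idea; the function name and array shapes are our own illustrative assumptions, not an official OLATverse API.

```python
import numpy as np

def relight_from_olat(olat_images: np.ndarray, light_weights: np.ndarray) -> np.ndarray:
    """Composite a relit image as a weighted sum of OLAT captures.

    olat_images:   (L, H, W, 3) linear-space images, one per light source
                   (L = 331 in OLATverse).
    light_weights: (L, 3) per-light RGB weights, e.g. obtained by sampling
                   a target environment map at each light's direction.
    """
    # Light transport is linear, so a lighting condition spanned by the
    # light stage is a per-light weighted sum of the OLAT captures.
    return np.einsum('lhwc,lc->hwc', olat_images, light_weights)

# Toy usage with random arrays standing in for real captures.
olats = np.random.rand(331, 64, 64, 3).astype(np.float32)
weights = np.random.rand(331, 3).astype(np.float32) / 331
relit = relight_from_olat(olats, weights)
```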
(a) We visualize the statistics of OLATverse, including the material distribution and the high-level object category distribution. (b) We compare OLATverse against a representative real OLAT dataset, OpenIllumination, visualizing the number of objects in the six largest material and high-level object categories. While OpenIllumination covers material and object categories similar to OLATverse's, the scale of each category is significantly smaller than in our dataset.
Comparison of object-centric datasets targeting inverse rendering and relighting tasks. We provide a detailed comparison of OLATverse with existing datasets across several key attributes: number of objects (# Objs), whether the data source is real (Real), lighting conditions (IllumCond), number of illuminations (# Illum), number of views (# Views), and capture device (Device). In the IllumCond column, ENV denotes environment illumination and PAT denotes pattern illumination. Unspec. indicates that the corresponding information is not specified in the dataset, and Blue indicates that only a small portion of the dataset satisfies the criterion.
Illustration of the dataset capture setup and processing pipeline. We use wooden stands of varying sizes and (a) a light stage setup to capture raw videos of objects. During the calibration session, we record (b) reference objects to extract accurate camera parameters, which are then used to extract (c) undistorted OLATs and relit images under varying illuminations from the raw videos. Next, we capture (d) a background stand image and perform (e) semi-automatic mask segmentation and normal extraction for each object.
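For the normal extraction step, calibrated OLAT data admits classic Lambertian photometric stereo: each pixel's intensities under known light directions are fit by a scaled normal in the least-squares sense. The sketch below illustrates that textbook baseline under distant-light, Lambertian assumptions; it is not the dataset's actual extraction pipeline, and all names are hypothetical.

```python
import numpy as np

def photometric_normals(intensities: np.ndarray, light_dirs: np.ndarray):
    """Recover per-pixel surface normals from OLAT observations of one view.

    intensities: (L, H, W) grayscale OLAT intensities.
    light_dirs:  (L, 3) unit directions toward each light (assumed distant).
    Returns (H, W, 3) unit normals and an (H, W) pseudo-albedo map.
    """
    L, H, W = intensities.shape
    I = intensities.reshape(L, -1)                        # (L, H*W)
    # Lambertian model: I = light_dirs @ b, where b = albedo * normal.
    b, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)    # (3, H*W)
    albedo = np.linalg.norm(b, axis=0)                    # (H*W,)
    normals = (b / np.maximum(albedo, 1e-8)).T            # (H*W, 3)
    return normals.reshape(H, W, 3), albedo.reshape(H, W)
```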
We visualize one sample of OLATverse, which includes the full-bright image (FB), OLATs, relit images under varying pre-defined environmental illuminations (ENV), the object mask, surface normals, and diffuse albedo.
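The relit ENV images are consistent with the OLAT basis: projecting an environment map onto the light stage's discrete light directions yields per-light weights that, combined with the relighting sketch above, approximately reproduce an environment-lit rendering. A rough nearest-neighbor sampling sketch, with hypothetical names, an assumed lat-long convention, and no solid-angle weighting, might look as follows.

```python
import numpy as np

def env_to_light_weights(env_map: np.ndarray, light_dirs: np.ndarray) -> np.ndarray:
    """Approximate per-light RGB weights from a lat-long environment map.

    env_map:    (He, We, 3) equirectangular HDR environment map.
    light_dirs: (L, 3) unit directions toward each light stage LED.
    """
    He, We, _ = env_map.shape
    x, y, z = light_dirs[:, 0], light_dirs[:, 1], light_dirs[:, 2]
    # Map each direction to lat-long texture coordinates (y-up convention assumed).
    u = (np.arctan2(x, z) / (2 * np.pi) + 0.5) * (We - 1)
    v = (np.arccos(np.clip(y, -1.0, 1.0)) / np.pi) * (He - 1)
    weights = env_map[v.astype(int), u.astype(int)]       # (L, 3)
    # Crude normalization; a real pipeline would weight by per-light solid angle.
    return weights / len(light_dirs)
```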
We randomly selected a subset of the dataset for preview. (OLATverse is a multi-view object dataset; for simplicity, this video shows each object from a single view.)
We show a subset of the validation set, including OLAT data from one view and the extracted surface normals from five views.
There are many excellent dataset papers focusing on real-world object capture.
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset
OpenIllumination: A Multi-Illumination Dataset for Inverse Rendering Evaluation on Real Objects
Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark
There are also several excellent works focusing on OLAT datasets of human avatars.
3DPR: Single Image 3D Portrait Relighting with Generative Priors
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis