We introduce OLATverse, a large-scale real-world dataset comprising over 9M images of 765 objects, captured from multiple viewpoints under a diverse set of precisely controlled lighting conditions. While recent advances in object-centric inverse rendering, novel view synthesis, and relighting have demonstrated promising results, most techniques still rely heavily on synthetic datasets for training and small-scale real-world datasets for benchmarking, which limits their realism and generalization. To address this gap, OLATverse offers two key advantages over existing datasets: large-scale coverage of real objects and high-fidelity appearance under precisely controlled illuminations. Specifically, our dataset contains 765 common and uncommon real-world objects spanning a wide range of material categories. Each object is captured with 35 DSLR cameras and 331 individually controlled light sources, enabling the simulation of diverse illumination conditions. Additionally, for each object we provide well-calibrated camera parameters, accurate object masks, diffuse albedo, and photometric surface normals as auxiliary resources. We further construct an extensive evaluation set, establishing the first comprehensive real-world object-centric benchmark for inverse rendering and normal estimation. We believe that OLATverse marks a pivotal step toward grounding the next generation of methods in real-world data.
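Because light transport is linear, one-light-at-a-time (OLAT) captures form a basis for relighting: any illumination spanned by the 331 light sources can be simulated as a weighted sum of the OLAT images. Below is a minimal sketch of this idea; the function name and array shapes are our own illustrative assumptions, not an official OLATverse API.

```python
import numpy as np

def relight_from_olat(olat_images: np.ndarray, light_weights: np.ndarray) -> np.ndarray:
    """Composite a relit image as a weighted sum of OLAT captures.

    olat_images:   (L, H, W, 3) linear-space images, one per light source
                   (L = 331 in OLATverse).
    light_weights: (L, 3) per-light RGB weights, e.g. obtained by sampling
                   a target environment map at each light's direction.
    """
    # Light transport is linear, so a lighting condition spanned by the
    # light stage is a per-light weighted sum of the OLAT captures.
    return np.einsum('lhwc,lc->hwc', olat_images, light_weights)

# Toy usage with random arrays standing in for real captures.
olats = np.random.rand(331, 64, 64, 3).astype(np.float32)
weights = np.random.rand(331, 3).astype(np.float32) / 331
relit = relight_from_olat(olats, weights)
```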
(a) We visualize the statistics of OLATverse, including the material distribution and the high-level object category distribution. (b) We compare OLATverse against a representative real OLAT dataset, OpenIllumination, visualizing the number of objects in the six largest material and high-level object categories. While OpenIllumination covers material and object categories similar to OLATverse's, the scale of each category is significantly smaller than in our dataset.
Comparison of object-centric datasets targeting inverse rendering and relighting tasks. We provide a detailed comparison of OLATverse with existing datasets across several key attributes: number of objects (# Objs), whether the data source is real (Real), lighting conditions (IllumCond), number of illuminations (# Illum), number of views (# Views), and capture device (Device). In the IllumCond column, ENV denotes environment illumination and PAT denotes pattern illumination. Unspec. indicates that the corresponding information is not specified in the dataset, and Blue indicates that only a small portion of the dataset satisfies the criterion.
Illustration of the dataset capture setup and processing pipeline. We use wooden stands of varying sizes and (a) a light stage setup to capture raw videos of objects. During the calibration session, we record (b) reference objects to extract accurate camera parameters, which are then used to extract (c) undistorted OLATs and relit images under varying illuminations from the raw videos. Next, we capture (d) a background stand image and perform (e) semi-automatic mask segmentation and normal extraction for each object.
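For the normal extraction step, calibrated OLAT data admits classic Lambertian photometric stereo: each pixel's intensities under known light directions are fit by a scaled normal in the least-squares sense. The sketch below illustrates that textbook baseline under distant-light, Lambertian assumptions; it is not the dataset's actual extraction pipeline, and all names are hypothetical.

```python
import numpy as np

def photometric_normals(intensities: np.ndarray, light_dirs: np.ndarray):
    """Recover per-pixel surface normals from OLAT observations of one view.

    intensities: (L, H, W) grayscale OLAT intensities.
    light_dirs:  (L, 3) unit directions toward each light (assumed distant).
    Returns (H, W, 3) unit normals and an (H, W) pseudo-albedo map.
    """
    L, H, W = intensities.shape
    I = intensities.reshape(L, -1)                        # (L, H*W)
    # Lambertian model: I = light_dirs @ b, where b = albedo * normal.
    b, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)    # (3, H*W)
    albedo = np.linalg.norm(b, axis=0)                    # (H*W,)
    normals = (b / np.maximum(albedo, 1e-8)).T            # (H*W, 3)
    return normals.reshape(H, W, 3), albedo.reshape(H, W)
```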
We visualize one sample of OLATverse, which includes the full-bright image (FB), OLATs, relit images under varying pre-defined environmental illuminations (ENV), the object mask, surface normals, and diffuse albedo.
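The relit ENV images are consistent with the OLAT basis: projecting an environment map onto the light stage's discrete light directions yields per-light weights that, combined with the relighting sketch above, approximately reproduce an environment-lit rendering. A rough nearest-neighbor sampling sketch, with hypothetical names, an assumed lat-long convention, and no solid-angle weighting, might look as follows.

```python
import numpy as np

def env_to_light_weights(env_map: np.ndarray, light_dirs: np.ndarray) -> np.ndarray:
    """Approximate per-light RGB weights from a lat-long environment map.

    env_map:    (He, We, 3) equirectangular HDR environment map.
    light_dirs: (L, 3) unit directions toward each light stage LED.
    """
    He, We, _ = env_map.shape
    x, y, z = light_dirs[:, 0], light_dirs[:, 1], light_dirs[:, 2]
    # Map each direction to lat-long texture coordinates (y-up convention assumed).
    u = (np.arctan2(x, z) / (2 * np.pi) + 0.5) * (We - 1)
    v = (np.arccos(np.clip(y, -1.0, 1.0)) / np.pi) * (He - 1)
    weights = env_map[v.astype(int), u.astype(int)]       # (L, 3)
    # Crude normalization; a real pipeline would weight by per-light solid angle.
    return weights / len(light_dirs)
```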
We randomly selected a subset of the dataset for preview. (OLATverse is a multi-view object dataset; for simplicity, this video shows each object from a single view.)
We show a subset of the validation set, including OLAT data from one view and the extracted surface normals from five views.
There are many excellent dataset papers focusing on real-world object capture.
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset
OpenIllumination: A Multi-Illumination Dataset for Inverse Rendering Evaluation on Real Objects
Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark
There are also several excellent works focusing on OLAT datasets of human avatars.
3DPR: Single Image 3D Portrait Relighting with Generative Priors
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis