Rahul Venkatesh*1 · Klemen Kotar*1 · Lilian Naing Chen*1 · Seungwoo Kim1 · Luca Thomas Wheeler1 · Jared Watrous1 · Ashley Xu1 · Gia Ancone1 · Wanhee Lee1 · Honglin Chen2 · Daniel Bear3 · Stefan Stojanov1 · Daniel L. K. Yamins1
1Stanford · 2OpenAI · 3Noetik, Inc.
(* equal contribution)
SpelkeBench is a dataset of 500 images with associated segmentation annotations for evaluating models' ability to extract Spelke segments. Unlike conventional definitions of segmentation, Spelke segments provide a category-agnostic notion of segments based on what moves together in the physical world.
To download the dataset, run the following command from the top level of the repository; it will save the dataset inside the `datasets/` folder:

```bash
sh scripts/download_spelke_bench.sh
```
The dataset is a `.h5` file where each key contains a dictionary with the following entries:

- `image`: the input image
- `segments_gt`: the ground-truth Spelke segment
- `poke_location`: the location of the virtual poke to be used by the model to generate the segment
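For example, a single entry can be read with `h5py`. This is a minimal sketch; it assumes each top-level key is stored as an HDF5 group whose members match the entry names above, so check the file layout if your access pattern differs:

```python
import h5py
import numpy as np

# Minimal sketch of reading one SpelkeBench example with h5py. It assumes each
# top-level key is an HDF5 group whose members match the entry names above.
with h5py.File("datasets/spelke_bench.h5", "r") as f:
    key = sorted(f.keys())[0]                          # e.g. "entityseg_1_image2926"
    image = np.array(f[key]["image"])                  # input image
    segments_gt = np.array(f[key]["segments_gt"])      # ground-truth Spelke segment
    poke_location = np.array(f[key]["poke_location"])  # virtual poke location

print(key, image.shape, segments_gt.shape, poke_location)
```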
Some examples from the dataset, along with comparisons to SAM and Entity Segmentation, are shown below.
To discover Spelke segments, we build SpelkeNet, a model that learns to complete flow fields and implicitly captures how objects move together in the physical world.
```bash
cd SpelkeNet
conda create -n spelkenet python=3.10 -y
conda activate spelkenet
pip install -e .
```
- **Motion Affordance Maps**: estimate the regions likely to respond to an external force, independent of camera motion.
- **Expected Displacement Maps**: flow fields that capture how the scene will respond to an applied virtual poke.
We provide Jupyter notebooks that demonstrate how these maps can be extracted from SpelkeNet:
Using these structure extractions, we first sample a location that is likely to move from the motion affordance map and apply several virtual pokes at that location to identify regions that consistently move together. We then compute the expected motion correlation by averaging, across pokes, the dot product between the poke vector and the expected displacement map. Finally, Otsu thresholding on the averaged dot product yields the desired Spelke segment. A schematic of this procedure is sketched below.
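The sketch below illustrates the loop in NumPy. The calls `model.motion_affordance(...)` and `model.expected_displacement(...)` are hypothetical stand-ins for the SpelkeNet structure extractions (not the repository API), and the thresholding step uses `skimage.filters.threshold_otsu`:

```python
import numpy as np
from skimage.filters import threshold_otsu


def spelke_segment_sketch(image, model, num_pokes=8, seed=0):
    """Schematic of the segment-discovery loop described above.

    `model.motion_affordance(image)` and
    `model.expected_displacement(image, point, poke_vec)` are hypothetical
    stand-ins for the SpelkeNet structure extractions, not the repository API.
    """
    rng = np.random.default_rng(seed)

    # 1. Sample a poke location from the motion affordance map ([H, W], sums to 1).
    affordance = model.motion_affordance(image)
    idx = rng.choice(affordance.size, p=affordance.ravel())
    point = np.unravel_index(idx, affordance.shape)

    # 2. Apply several virtual pokes at that location and average, across pokes,
    #    the dot product between the poke vector and the expected displacement map.
    correlation = np.zeros(affordance.shape)
    for theta in np.linspace(0.0, 2.0 * np.pi, num_pokes, endpoint=False):
        poke_vec = np.array([np.cos(theta), np.sin(theta)])
        flow = model.expected_displacement(image, point, poke_vec)  # [H, W, 2]
        correlation += flow @ poke_vec
    correlation /= num_pokes

    # 3. Otsu thresholding on the averaged dot product yields the Spelke segment.
    return correlation > threshold_otsu(correlation)
```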
We provide a notebook which demonstrates how to extract Spelke segments from SpelkeNet:
We provide scripts to run inference and evaluate segmentation models on the SpelkeBench dataset. To evaluate a model, define a model class that inherits from `spelke_net.inference.spelke_object_discovery.spelke_bench_class.SpelkeBenchModel` and implement the `run_inference` method with the following signature:
```python
class SpelkeBenchModel:
    """
    Base class for SpelkeBench models.
    This class should be inherited by all models that are used in the SpelkeBench framework.
    """

    def __init__(self):
        """args to initialize the model"""
        return

    def run_inference(self, input_image, poke_point):
        '''
        Run inference on the input image and poke point.
        :param input_image: numpy array of shape [H, W, 3] in [0, 1] range
        :param poke_point: (x, y) tuple representing the poke point in the image, x horizontal, y vertical
        :return: [H, W] numpy array representing the segment mask
        '''
```
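For illustration, here is a toy model that satisfies this interface by returning a fixed-radius disk around the poke point. The class name and the 20-pixel radius are purely illustrative, not part of the repository:

```python
import numpy as np

from spelke_net.inference.spelke_object_discovery.spelke_bench_class import SpelkeBenchModel


class DiskBaselineModel(SpelkeBenchModel):
    """Toy example of the interface: predicts a fixed-radius disk around the poke point."""

    def __init__(self, radius=20):  # radius in pixels, purely illustrative
        super().__init__()
        self.radius = radius

    def run_inference(self, input_image, poke_point):
        H, W, _ = input_image.shape
        x, y = poke_point  # x horizontal, y vertical
        yy, xx = np.mgrid[0:H, 0:W]
        return ((xx - x) ** 2 + (yy - y) ** 2 <= self.radius ** 2).astype(np.uint8)
```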
We provide a reference implementation of a SpelkeNet model in `spelke_net.inference.spelke_object_discovery.spelke_bench_class.SpelkeNetModel1B`. This class is initialized with the following default values:
| Parameter | Default Value | Description |
|---|---|---|
| `num_zoom_iters` | `2` | Number of zoom-in refinement iterations. |
| `num_seq_patches` | `256` | Number of sequential flow token generations. |
| `num_seeds` | `3` | Number of rollouts per virtual poke. |
| `num_dirs` | `8` | Number of virtual pokes. |
| `model_name` | `model_1B.pt` | Name of the model checkpoint file to load; refers to the 1-billion-parameter SpelkeNet checkpoint. |
| `num_zoom_dirs` | `5` | Number of virtual pokes applied during the zoom-in stage. |
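For direct use in Python, the model can be instantiated with non-default values. This is a sketch only: it assumes the parameters in the table above are exposed as constructor keyword arguments, so check the class signature before relying on it.

```python
import h5py
import numpy as np

from spelke_net.inference.spelke_object_discovery.spelke_bench_class import SpelkeNetModel1B

# Sketch only: assumes the parameters in the table above are constructor
# keyword arguments; check the class signature before relying on this.
model = SpelkeNetModel1B(num_zoom_iters=1, num_seeds=1, num_dirs=4)

with h5py.File("datasets/spelke_bench.h5", "r") as f:
    example = f["entityseg_1_image2926"]
    image = np.array(example["image"]).astype(np.float32)  # rescale to [0, 1] if stored as uint8
    poke_point = tuple(np.array(example["poke_location"]))

mask = model.run_inference(image, poke_point)  # [H, W] segment mask
```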
Use the following command to run inference on one or more images:

```bash
python spelke_net/inference/spelke_object_discovery/run_inference.py \
--device cuda:<device_id> \
--dataset_path ./datasets/spelke_bench.h5 \
--img_names entityseg_1_image2926 \
--output_dir <out_dir> \
--model_name SpelkeNetModel1B
```
| Flag | Description |
|---|---|
| `--device` | Which GPU to use (e.g., `cuda:0`) |
| `--dataset_path` | Path to the SpelkeBench `.h5` file |
| `--img_names` | Space-separated list of image keys in the `.h5` file |
| `--output_dir` | Directory to save prediction outputs |
| `--model_name` | Model class to use for inference (e.g., `SpelkeNetModel1B`) |
You can pass multiple image keys, for example:

```bash
--img_names entityseg_1_image2926 entityseg_2_image1258 ...
```
To run inference in parallel across multiple nodes and GPUs, we provide a wrapper script. Here's a typical setup assuming:
- 4 nodes are available
- each node has 4 GPUs
- you want to split the workload evenly across nodes
You would launch the following on each node with node-specific values:
```bash
python spelke_net/inference/spelke_object_discovery/run_inference_parallel.py \
--gpus 0 1 2 3 \
--dataset_path ./datasets/spelke_bench.h5 \
--output_dir <out_dir> \
--num_splits 4 \
--split_num <node_id> \
--model_name SpelkeNetModel1B
```
To use the larger 7B model, replace the model name: `--model_name SpelkeNetModel7B`.
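For intuition, the `--num_splits` / `--split_num` flags give each node one roughly equal share of the image keys. The schematic below shows one plausible partitioning; the wrapper's actual scheme may differ:

```python
import numpy as np

# Schematic of what --num_splits / --split_num imply: each node processes one
# roughly equal share of the image keys. The wrapper's actual partitioning
# scheme may differ (e.g. strided rather than contiguous chunks).
def split_keys(all_keys, num_splits, split_num):
    chunks = np.array_split(sorted(all_keys), num_splits)
    return list(chunks[split_num])

print(split_keys([f"img_{i}" for i in range(10)], num_splits=4, split_num=0))
```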
Each prediction is saved as a separate `.h5` file in `<out_dir>`, containing the following keys:

- `segment_pred`: predicted Spelke segment mask
- `segment_gt`: ground-truth Spelke segment mask
- `probe_point`: virtual poke location
- `image`: original input image
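As a quick sanity check, the IoU for a single prediction can be recomputed directly from one of these files. This is a sketch that assumes the masks are stored as binary [H, W] arrays; the evaluation script below remains the authoritative implementation:

```python
import h5py
import numpy as np

# Sketch: recompute IoU for one prediction, assuming segment_pred / segment_gt
# are stored as binary [H, W] arrays.
path = "<out_dir>/<prediction_file>.h5"  # replace with an actual output file
with h5py.File(path, "r") as f:
    pred = np.array(f["segment_pred"]) > 0
    gt = np.array(f["segment_gt"]) > 0

iou = (pred & gt).sum() / max((pred | gt).sum(), 1)
print(f"IoU: {iou:.4f}")
```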
After inference is complete, run the evaluation script:

```bash
python spelke_net/inference/spelke_object_discovery/evaluate_folder.py \
--input_dir <out_dir> \
--output_dir <metrics_out_dir>
```
This will:

- Save visualizations to `<metrics_out_dir>`
- Print the following metrics to the console:
  - Average Recall (AR)
  - Mean Intersection-over-Union (mIoU)
| Metric | SAM2 | DINOv1-B/8 | DINOv2-L/14 | DINOv2-G/14 | CWM | SpelkeNet |
|---|---|---|---|---|---|---|
| AR | 0.4816 | 0.2708 | 0.2524 | 0.2254 | 0.3271 | 0.5411 |
| mIoU | 0.6225 | 0.4990 | 0.4931 | 0.4553 | 0.4807 | 0.6811 |
To evaluate a custom model:

1. Implement a model class following the `SpelkeBenchModel` interface in the file `spelke_net/inference/spelke_object_discovery/spelke_bench_class.py`.
2. Pass its class name to the `--model_name` argument in the above commands.
First, install 3DEditBench in the same `spelkenet` environment:

```bash
git clone https://github.com/neuroailab/3DEditBench.git
cd 3DEditBench
conda activate spelkenet
pip install -e . --no-deps
```

Then, from the SpelkeNet repository, download the 3DEditBench dataset:

```bash
cd SpelkeNet
sh scripts/download_3DEditBench.sh
```

The following command downloads the precomputed segments into `datasets/precomputed_segments/`:

```bash
sh scripts/download_3DEditBench_precomputed_segments.sh
```
To run inference in parallel across multiple nodes and GPUs, 3DEditBench provides a wrapper script similar to the SpelkeBench evaluation. For a cluster of 4 nodes × 4 GPUs, launch the following on each node:
```bash
editbench-launch \
--gpus 0 1 2 3 \
--dataset_path ./datasets/3DEditBench \
--output_dir ./experiments/my_model_run \
--num_splits 4 \
--split_num <node_id> \
--model_class <your_model.YourEditingModel>
```
Replace `<node_id>` with the appropriate node index (`0`, `1`, `2`, or `3`).
You can choose from the following predefined models by setting the `--model_class` flag:

| Model Class | Description |
|---|---|
| `spelke_net.inference.object_manipulation.edit_model.ObjectEditModelSAM` | Uses the Segment Anything Model (SAM) to generate object masks (pre-computed) based on the point prompt. |
| `spelke_net.inference.object_manipulation.edit_model.ObjectEditModelSpelkeNet` | Uses SpelkeNet to infer motion-based object segments (pre-computed) from point prompts. |
| `spelke_net.inference.object_manipulation.edit_model.ObjectEditModelGT` | Uses the ground-truth segmentation masks provided in the 3DEditBench dataset. |
> **Tip:** After all splits finish, you can evaluate the results with the `editbench-evaluate-metrics` utility on the `hdf5_result_files/` directory.
After all splits finish, you can evaluate your model's aggregate performance on 3DEditBench:

```bash
editbench-evaluate-metrics \
--predictions_path ./experiments/my_model_run/hdf5_result_files \
--results_dir <your_results_dir>
```
| Method | Segment | MSE ↓ | PSNR ↑ | LPIPS ↓ | SSIM ↑ | EA ↑ |
|---|---|---|---|---|---|---|
| LRAS-3D | SpelkeNet | 0.009 | 21.64 | 0.213 | 0.698 | 0.776 |
| | SAM | 0.013 | 20.17 | 0.255 | 0.685 | 0.633 |
| LightningDrag | SpelkeNet | 0.017 | 19.16 | 0.195 | 0.672 | 0.679 |
| | SAM | 0.020 | 18.18 | 0.241 | 0.658 | 0.536 |
| Diffusion Handles | SpelkeNet | 0.024 | 17.42 | 0.364 | 0.555 | 0.576 |
| | SAM | 0.031 | 16.15 | 0.419 | 0.526 | 0.495 |
| Diffusion as Shader | SpelkeNet | 0.015 | 19.29 | 0.194 | 0.707 | 0.640 |
| | SAM | 0.019 | 18.20 | 0.253 | 0.682 | 0.503 |
Use the command below to apply custom object edits with either SAM or SpelkeNet segments by setting the `--segment_type` flag. You can control the object's rotation (azimuth, elevation, tilt) and translation (tx, ty, tz) in 3D space. These transformations are specified in the world coordinate system, where azimuth controls rotation around the axis vertical to the ground plane.
```bash
CUDA_VISIBLE_DEVICES=<gpu_id> \
python spelke_net/inference/object_manipulation/custom_object_edits.py \
--hdf5_file ./datasets/3DEditBench/0005.hdf5 \
--segment_hdf5_file ./datasets/precomputed_segments/0005.hdf5 \
--output_dir ./experiments/test_custom \
--segment_type spelkenet \
--azimuth 0.0 --elevation 0.0 --tilt 0.0 \
--tx 0.15 --ty 0.0 --tz 0.0 \
--num_runs 1
```
| Flag | Description |
|---|---|
| `--hdf5_file` | Path to the input scene (`.hdf5`) containing RGB frames and GT 3D edits. |
| `--segment_hdf5_file` | Path to the segmentation masks (`.hdf5`) for the same scene. |
| `--segment_type` | Which masks to use: `sam`, `spelkenet`, or `GT`. |
| `--output_dir` | Root directory for this run's outputs. A `viz/` subfolder is created automatically. |
| `--azimuth`, `--elevation`, `--tilt` | Rotations (degrees): yaw, pitch, and roll, respectively. |
| `--tx`, `--ty`, `--tz` | Translations along the X, Y, Z axes in meters. |
| `--num_runs` | Number of random generations for the same 3D transform. |
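For intuition, the rotation and translation flags together specify a rigid transform in world coordinates. The sketch below composes them into a 4×4 matrix with `scipy.spatial.transform.Rotation`; the axis convention and composition order shown here are assumptions for illustration, not necessarily what `custom_object_edits.py` uses internally.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Sketch of the 4x4 world transform implied by the flags above. The axis
# convention and composition order are assumptions for illustration only.
azimuth, elevation, tilt = 0.0, 0.0, 0.0   # degrees: yaw, pitch, roll
tx, ty, tz = 0.15, 0.0, 0.0                # meters

transform = np.eye(4)
transform[:3, :3] = R.from_euler("ZYX", [azimuth, elevation, tilt], degrees=True).as_matrix()
transform[:3, 3] = [tx, ty, tz]
print(transform)
```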
**Output format:** This command reproduces the result in the teaser image and saves visualizations in `./experiments/test_custom/viz/`.
If you use this code or the dataset in your research, please cite our paper:
```bibtex
@misc{venkatesh2025discoveringusingspelkesegments,
      title={Discovering and using Spelke segments},
      author={Rahul Venkatesh and Klemen Kotar and Lilian Naing Chen and Seungwoo Kim and Luca Thomas Wheeler and Jared Watrous and Ashley Xu and Gia Ancone and Wanhee Lee and Honglin Chen and Daniel Bear and Stefan Stojanov and Daniel Yamins},
      year={2025},
      eprint={2507.16038},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.16038},
}
```