Honghua Chen, Yushi Lan, Yongwei Chen, Yifan Zhou, Xingang Pan
S-Lab, Nanyang Technological University, Singapore

MVDrag3D provides a precise, generative, and flexible solution for 3D drag-based editing, supporting more versatile editing effects across various object categories and 3D representations.
Drag-based editing has become popular in 2D content creation, driven by the capabilities of image generative models. However, extending this technique to 3D remains a challenge. Existing 3D drag-based editing methods, whether employing explicit spatial transformations or relying on implicit latent optimization within limited-capacity 3D generative models, fall short in handling significant topology changes or generating new textures across diverse object categories. To overcome these limitations, we introduce MVDrag3D, a novel framework for more flexible and creative drag-based 3D editing that leverages multi-view generation and reconstruction priors. At the core of our approach is the use of a multi-view diffusion model as a strong generative prior to perform consistent drag editing over multiple rendered views, followed by a reconstruction model that regresses 3D Gaussians of the edited object. Since the initial 3D Gaussians may be misaligned across views, we correct this with view-specific deformation networks that adjust the positions of the Gaussians until the views are well aligned. In addition, we propose a multi-view score function that distills generative priors from multiple views to further enhance view consistency and visual quality. Extensive experiments demonstrate that MVDrag3D provides a precise, generative, and flexible solution for 3D drag-based editing, supporting more versatile editing effects across various object categories and 3D representations.
The overall architecture of MVDrag3D. Given a 3D model and multiple pairs of 3D dragging points, we first render the model into four orthogonal views, each with corresponding projected dragging points. Then, to ensure consistent dragging across these views, we define a multi-view guidance energy within a multi-view diffusion model. The resulting dragged images are used to regress an initial set of 3D Gaussians. Our method further employs a two-stage optimization process: first, a deformation network adjusts the positions of the Gaussians for improved geometric alignment, followed by image-conditioned multi-view score distillation to enhance the visual quality of the final output.
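For orientation, the sketch below restates the pipeline from the figure caption as pseudocode. All function names (`render_orthogonal_views`, `multiview_drag_edit`, and so on) are hypothetical placeholders, not the actual API of this repository.

```python
# Hypothetical, high-level pseudocode of the MVDrag3D pipeline described above.
# None of these functions exist in the repository; they only mirror the stages:
# render -> multi-view drag editing -> LGM reconstruction -> two-stage optimization.

def mvdrag3d(model_3d, drag_pairs_3d):
    # 1. Render four orthogonal views and project the 3D drag points into each.
    views, drag_pairs_2d = render_orthogonal_views(model_3d, drag_pairs_3d, num_views=4)

    # 2. Drag-edit all views consistently with a multi-view diffusion model,
    #    guided by a multi-view guidance energy.
    edited_views = multiview_drag_edit(views, drag_pairs_2d)

    # 3. Regress an initial set of 3D Gaussians from the edited views (LGM).
    gaussians = reconstruct_gaussians(edited_views)

    # 4a. Deformation optimization: view-specific deformation networks move
    #     the Gaussians so the views are geometrically aligned.
    gaussians = optimize_deformation(gaussians, edited_views)

    # 4b. Appearance optimization: image-conditioned multi-view score
    #     distillation improves view consistency and visual quality.
    gaussians = optimize_appearance(gaussians, edited_views)
    return gaussians
```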
Clone the repository (with submodules):

```bash
git clone --recurse-submodules https://github.com/chenhonghua/MvDrag3D.git
cd MvDrag3D
```

Install the required dependencies. For example:

```bash
pip install -r requirements.txt
```

Or, if you use conda:

```bash
conda env create -f environment.yml
conda activate your_env_name
```
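Since the run command later in this README expects a CUDA-capable GPU, a quick sanity check of the environment can save time. The snippet below is not part of the repository; it only assumes that PyTorch is among the installed dependencies.

```python
# Quick environment sanity check (not part of the repository).
# Assumes PyTorch is installed via requirements.txt / environment.yml.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```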
- Place your images, keypoints, and other data in the appropriate directories (e.g., `MvDrag3D/dragonGaussian/viking_axe2/`).
- Prepare the `src_points_path` and `tgt_points_path` files by following the provided example formats.
- Keypoints should be manually selected using tools such as MeshLab or CloudCompare.
- Source 3D keypoints should be saved in `srcs_single_keypoints.txt`.
- Corresponding target keypoints should be saved in `user_single_keypoints.txt`.
- For dragging on a 3D Gaussian Splatting (3DGS) scene:
  - Use LGM to generate the initial 3DGS point cloud: `Initial_3DGS.ply`
  - The associated input image should be: `test_mvdream_0.png`
- The file `mv_drag_points_all.txt` contains all automatically computed projected dragging points from the four viewpoints.
- The file `mv_drag_points_occ.txt` contains only the visible projected dragging points from the four viewpoints. Note: if a point is occluded in a view, its projected coordinates can be set to zero, either automatically based on depth information or manually (see the sketch below).
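The note above describes zeroing out occluded drag points automatically from depth information. The sketch below illustrates one way to do this for a single view; the pinhole camera model (OpenCV-style intrinsics/extrinsics), the depth-test tolerance, and the returned layout are assumptions for illustration, not the exact format used by `mv_drag_points_occ.txt`.

```python
# Illustrative sketch (assumptions, not the repository's code): project 3D drag
# points into one view and zero out points that are out of frame or occluded.
import numpy as np

def project_and_mask(points_3d, K, w2c, depth_map, eps=1e-2):
    """points_3d: (N, 3) drag points in world space.
    K: (3, 3) camera intrinsics; w2c: (4, 4) world-to-camera matrix.
    depth_map: (H, W) depth rendered from the same view."""
    N = points_3d.shape[0]
    pts_h = np.concatenate([points_3d, np.ones((N, 1))], axis=1)  # homogeneous (N, 4)
    cam = (w2c @ pts_h.T).T[:, :3]                                # camera-space coordinates
    z = cam[:, 2]                                                 # depth along the view axis
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                                   # pixel coordinates

    out = np.zeros((N, 2))                                        # occluded points stay (0, 0)
    H, W = depth_map.shape
    for i, (u, v) in enumerate(uv):
        if z[i] <= 0:
            continue                                              # behind the camera
        ui, vi = int(round(u)), int(round(v))
        if 0 <= ui < W and 0 <= vi < H:
            # Visible if the point's depth matches the rendered depth buffer.
            if abs(depth_map[vi, ui] - z[i]) < eps:
                out[i] = (u, v)
    return out
```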
To run the editing pipeline, you can use the provided bash script:

```bash
bash MvDrag3D/bash_test.sh
```

Or run the Python script directly:

```bash
CUDA_VISIBLE_DEVICES=0 python main_me.py --config configs/configs.yaml ...
```

(Refer to `bash_test.sh` for parameter examples.)
- The results will be saved in the directory specified by the `workspace_name` parameter.
- Initial 3DGS: `Initial_3DGS.ply`
- After performing multi-view dragging, the editing result is `mv_drag_best_test.png`.
- Based on the dragged image, a coarse 3DGS is reconstructed using LGM: `Mvdrag_3DGS.ply`
- Finally, a two-stage optimization is applied (deformation optimization followed by appearance optimization), and we obtain `Deformation_3DGS.ply` and `appearance_3DGS.ply`.
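To sanity-check any of the saved Gaussian point clouds (e.g., `Initial_3DGS.ply` or `appearance_3DGS.ply`), a quick inspection like the one below can help. It uses the `plyfile` package and only assumes the standard 3DGS PLY layout with a `vertex` element; it does not assume specific attribute names.

```python
# Inspect a saved 3DGS .ply file (illustrative helper, not part of the repo).
# Requires:  pip install plyfile
from plyfile import PlyData

ply = PlyData.read("Initial_3DGS.ply")
vertex = ply["vertex"]                   # assumes the usual 'vertex' element
print("Number of Gaussians:", vertex.count)
print("Stored attributes:", [p.name for p in vertex.properties])
```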
This section explains the key parameters used during the editing and dragging process:
- `w_edit` and `w_content`: weights that balance the editing energy (illustrated in the sketch after this list).
  - `w_edit` controls how strongly the image is modified.
  - `w_content` controls how much the original content is preserved, especially outside the editing mask.
  - For minimal changes outside the mask, set `w_content` to a higher value (e.g., `10`).
- `guidance_scale`: the classifier-free guidance (CFG) scale used in diffusion-based editing. A larger value results in more significant changes to image content.
- `num_steps`: the number of denoising steps used in the MVDream editing process. More steps generally improve quality but increase runtime.
- `scale`: controls the size of the multi-view editing mask. The default mask is a circle drawn between the dragging start and end points; `scale` adjusts the radius of this circle, so higher values result in larger edit regions.
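The snippet below is a conceptual illustration, not the repository's implementation, of how `w_edit` and `w_content` trade off: an editing term that pulls features at the drag handles toward the drag targets, and a content term that keeps regions outside the editing mask close to the original views. All tensor shapes and the shared per-view indices are simplifying assumptions.

```python
# Conceptual sketch only: how the two weights might combine into a guidance energy.
import torch

def guidance_energy(feat, feat_orig, handle_idx, target_idx, mask,
                    w_edit=1.0, w_content=10.0):
    # feat, feat_orig: (V, C, H, W) features of the current and original views
    # (V = 4); mask: (V, 1, H, W) circular editing mask built from the projected
    # drag points and `scale`; handle_idx / target_idx: flattened pixel indices
    # of the drag start / end points (assumed shared across views for brevity).
    feat_flat = feat.flatten(2)                                   # (V, C, H*W)
    e_edit = (feat_flat[..., handle_idx]
              - feat_flat[..., target_idx]).pow(2).mean()         # move handles to targets
    e_content = ((feat - feat_orig).pow(2) * (1 - mask)).mean()   # preserve outside the mask
    return w_edit * e_edit + w_content * e_content
```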
For other parameters and advanced configuration options, please refer to the settings in `main_me.py`.
For more details on parameters, optional features, or troubleshooting, please refer to other documentation in the repository or open an issue.
This project is built upon the great work of existing code bases such as LGM and MVDream.
We sincerely thank the authors for their contributions to the community.