MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors

Honghua Chen, Yushi Lan, Yongwei Chen, Yifan Zhou, Xingang Pan

S-Lab, Nanyang Technological University, Singapore

Abstract

Drag-based editing has become popular in 2D content creation, driven by the capabilities of image generative models. However, extending this technique to 3D remains a challenge. Existing 3D drag-based editing methods, whether employing explicit spatial transformations or relying on implicit latent optimization within limited-capacity 3D generative models, fall short in handling significant topology changes or generating new textures across diverse object categories. To overcome these limitations, we introduce MVDrag3D, a novel framework for more flexible and creative drag-based 3D editing that leverages multi-view generation and reconstruction priors. At the core of our approach is the usage of a multi-view diffusion model as a strong generative prior to perform consistent drag editing over multiple rendered views, which is followed by a reconstruction model that reconstructs 3D Gaussians of the edited object. While the initial 3D Gaussians may suffer from misalignment between different views, we address this via view-specific deformation networks that adjust the position of Gaussians to be well aligned. In addition, we propose a multi-view score function that distills generative priors from multiple views to further enhance the view consistency and visual quality. Extensive experiments demonstrate that MVDrag3D provides a precise, generative, and flexible solution for 3D drag-based editing, supporting more versatile editing effects across various object categories and 3D representations.

Pipeline

The overall architecture of MVDrag3D. Given a 3D model and multiple pairs of 3D dragging points, we first render the model into four orthogonal views, each with corresponding projected dragging points. Then, to ensure consistent dragging across these views, we define a multi-view guidance energy within a multi-view diffusion model. The resulting dragged images are used to regress an initial set of 3D Gaussians. Our method further employs a two-stage optimization process: first, a deformation network adjusts the positions of the Gaussians for improved geometric alignment, followed by image-conditioned multi-view score distillation to enhance the visual quality of the final output.
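
To make the data flow concrete, the sketch below mirrors these four stages as a small Python driver. It is an illustration only: each stage is injected as a callable, because none of these function names belong to the repository's actual API.

def run_mvdrag3d(render_views, drag_edit, reconstruct, deform, distill,
                 gaussians, drag_points_3d):
    """Hypothetical driver mirroring the pipeline described above."""
    # 1. Render four orthogonal views and project each 3D drag pair into every view.
    views, drag_points_2d = render_views(gaussians, drag_points_3d)
    # 2. Drag-edit all views consistently under a multi-view guidance energy
    #    inside a multi-view diffusion model (MVDream in the paper).
    edited_views = drag_edit(views, drag_points_2d)
    # 3. Regress an initial, possibly misaligned, set of 3D Gaussians
    #    from the edited views (LGM in the paper).
    coarse_gaussians = reconstruct(edited_views)
    # 4a. Deformation optimization: view-specific deformation networks
    #     move the Gaussians into geometric alignment.
    aligned_gaussians = deform(coarse_gaussians, edited_views)
    # 4b. Appearance optimization: image-conditioned multi-view score
    #     distillation improves view consistency and visual quality.
    return distill(aligned_gaussians, edited_views)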

User Instructions

1. Clone the repository

git clone --recurse-submodules https://github.com/chenhonghua/MvDrag3D.git
cd MvDrag3D

2. Install dependencies

Install the required dependencies. For example:

pip install -r requirements.txt

Or, if you use conda:

conda env create -f environment.yml
conda activate your_env_name

3. Prepare your data

  • Place your images, keypoints, and other data in the appropriate directories (e.g., MvDrag3D/dragonGaussian/viking_axe2/).

  • Prepare the src_points_path and tgt_points_path files by following the provided example formats (a loading sketch follows this list).

  • Keypoints should be manually selected using tools like MeshLab or CloudCompare.

  • Source 3D keypoints should be saved in:

    srcs_single_keypoints.txt
    
  • Corresponding target keypoints should be saved in:

    user_single_keypoints.txt
    
  • For dragging on a 3D Gaussian Splatting (3DGS) scene:

    • Use LGM to generate the initial 3DGS point cloud:

      Initial_3DGS.ply

    • The associated input image should be:

      test_mvdream_0.png
    
  • The file mv_drag_points_all.txt contains all automatically computed projected dragging points from four different viewpoints.

  • The file mv_drag_points_occ.txt contains only the visible projected dragging points from four different viewpoints.

    Note: If a point is occluded in any view, its projected coordinates can be set to zero — either automatically based on depth information or manually.
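
A minimal loading sketch for these files, assuming each 3D keypoint file stores one whitespace-separated "x y z" triple per line and the projected-point files store "x y" pairs with occluded points zeroed as in the note above; check the example files shipped with the repository for the authoritative format:

import numpy as np

# Paired 3D keypoints: row i of the source file drags to row i of the target file.
src = np.atleast_2d(np.loadtxt("dragonGaussian/viking_axe2/srcs_single_keypoints.txt"))
tgt = np.atleast_2d(np.loadtxt("dragonGaussian/viking_axe2/user_single_keypoints.txt"))
assert src.shape == tgt.shape, "every source keypoint needs a matching target"

# Projected drag points across the four views; zeroed rows mark occluded points.
pts = np.atleast_2d(np.loadtxt("dragonGaussian/viking_axe2/mv_drag_points_occ.txt"))
visible = ~np.all(pts == 0, axis=1)
print(f"{visible.sum()} of {len(pts)} projected drag points are visible")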

4. Run the main program

You can use the provided bash script:

bash MvDrag3D/bash_test.sh

Or run the Python script directly:

CUDA_VISIBLE_DEVICES=0 python main_me.py --config configs/configs.yaml ...

(Refer to bash_test.sh for parameter examples.)

5. View results

  • The results will be saved in the directory specified by the workspace_name parameter.
  • The initial 3DGS is saved as:
    Initial_3DGS.ply
    
  • After performing multi-view dragging, the editing result is:
    mv_drag_best_test.png
    
  • Based on the dragged image, a coarse 3DGS is reconstructed using LGM:
    Mvdrag_3DGS.ply
    
  • Finally, a two-stage optimization is applied (Deformation Optimization followed by Appearance Optimization), and we obtain:
    Deformation_3DGS.ply and appearance_3DGS.ply
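
A quick way to peek inside any of these exported files uses the third-party plyfile package (pip install plyfile) and assumes only the common 3DGS convention of storing Gaussians as per-vertex attributes:

from plyfile import PlyData

ply = PlyData.read("workspace_name/appearance_3DGS.ply")  # substitute your workspace path
gaussians = ply["vertex"]
print(f"{gaussians.count} Gaussians")
print("attributes:", [p.name for p in gaussians.properties])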
    

6. Parameter Descriptions

This section explains the key parameters used during the editing and dragging process; a short sketch after the list shows how they fit together:

  • w_edit and w_content
    These are weights that balance the editing energy:

    • w_edit: controls how strongly the image is modified.
    • w_content: controls how much the original content is preserved, especially outside the editing mask.

      For minimal changes outside the mask, set w_content to a higher value (e.g., 10).

  • guidance_scale
    The classifier-free guidance (CFG) scale used in diffusion-based editing.
    A larger value results in more significant changes to image content.

  • num_steps
    Number of denoising steps used in the MVDream editing process.
    More steps generally improve quality but increase runtime.

  • scale
    Controls the size of the multi-view editing mask.
    The default mask is a circle drawn between the dragging start and end points.
    scale adjusts the radius of this circle — higher values result in larger edit regions.
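
As a hypothetical illustration of how scale and the two weights interact (the authoritative implementation lives in main_me.py and may differ in detail):

import numpy as np

def drag_mask(h, w, src_xy, tgt_xy, scale=1.0):
    """Circular edit mask between a drag start point and a drag end point.

    The circle is centered on the segment midpoint with a radius of half
    the drag distance, multiplied by `scale`."""
    src = np.asarray(src_xy, dtype=float)
    tgt = np.asarray(tgt_xy, dtype=float)
    center = (src + tgt) / 2.0
    radius = scale * np.linalg.norm(tgt - src) / 2.0
    ys, xs = np.mgrid[:h, :w]
    return (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= radius ** 2

mask = drag_mask(256, 256, src_xy=(100, 120), tgt_xy=(160, 120), scale=1.2)

# The guidance energy balances the two terms described above; a larger
# w_content (e.g., 10) keeps the region outside the mask unchanged:
#   energy = w_edit * E_edit(inside mask) + w_content * E_content(outside mask)
w_edit, w_content = 1.0, 10.0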

For other parameters, advanced configuration options, or troubleshooting, please refer to the settings in main_me.py and the other documentation in the repository, or open an issue.


Acknowledgements

This project is built upon the great work of prior code bases, including the multi-view diffusion and reconstruction models used in our pipeline (MVDream and LGM). We sincerely thank the authors for their contributions to the community.

BibTeX


@article{chen2024mvdrag3d,
  title={MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors},
  author={Chen, Honghua and Lan, Yushi and Chen, Yongwei and Zhou, Yifan and Pan, Xingang},
  journal={arXiv preprint arXiv:2410.16272},
  year={2024}
}
