ImagePulse (图律脉动) Dataset Open-Sourced: Decoding GPT-4o-Level Image Generation with Four Atomic-Capability Datasets and an Open Generation Toolkit

ModelScope Community · Published 2025-04-24 18:12:56

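01. Introduction

GPT-4o's breakthrough image generation capabilities have drawn wide attention, but open-source models still have work to do before they match them. Models trained on current open-source image datasets do not yet rival GPT-4o. However, when GPT-4o's image generation is decomposed into atomic capabilities such as "image style transfer" and "local image editing", open-source models already possess these atomic capabilities. Building on this observation, the ModelScope community's DiffSynth-Studio team has launched the ImagePulse (图律脉动) dataset project, which builds atomic-capability datasets and aims to provide a key data foundation for the next generation of image understanding and generation models.

Open-source project link:

https://github.com/modelscope/ImagePulse

ImagePulse currently open-sources four atomic-capability datasets, together with the scripts used to build them.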

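02. Atomic-Capability Datasets

1. Change, Add, Remove

Objects in specific regions of an image are changed, added, or removed. This dataset is used to train a model's image editing capability.

Dataset link: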

https://www.modelscope.cn/datasets/DiffSynth-Studio/ImagePulse-ChangeAddRemove

ImagePulse dataset: Change, Add, Remove

Sample contents: Image 1, Image 2, edit region (images omitted here)
Editing instruction	Remove the mustache and beard, change the white shirt to a blue turtleneck sweater, and remove the glass of milk.
Reverse editing instruction	Add a mustache and beard, change the blue turtleneck sweater to a white shirt, and add a glass of milk.
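
Because each sample carries both an editing instruction and its reverse, one sample can serve as two training examples, one per direction. The following is a minimal sketch of that idea. The `EditSample` class and its field names are hypothetical illustrations and are not taken from the ImagePulse repository or the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EditSample:
    """One hypothetical record: two images, an optional region mask, two instructions."""
    image_1: str                 # path to the first image (assumed field name)
    image_2: str                 # path to the second image (assumed field name)
    mask: Optional[str]          # path to the edit-region mask, if the dataset has one
    instruction: str             # describes how to turn image_1 into image_2
    reverse_instruction: str     # describes how to turn image_2 back into image_1


def to_training_pairs(sample: EditSample) -> List[Tuple[str, str, str]]:
    """Expand one record into (source, target, instruction) triples, one per direction."""
    return [
        (sample.image_1, sample.image_2, sample.instruction),
        (sample.image_2, sample.image_1, sample.reverse_instruction),
    ]
```

Under these assumptions, a dataset of N samples yields 2N (source, target, instruction) triples.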

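2. Zoom In, Zoom Out

A region of the image is zoomed in on. This dataset is used to train a model's super-resolution and outpainting (image extension) capabilities.

ImagePulse dataset: Zoom In, Zoom Out

Sample contents: Image 1, Image 2, zoom region (images omitted here)
Editing instruction	Zoom in to focus on the headband.
Reverse editing instruction	Zoom out to show the full view of the anime girl.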
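
To make the zoom pairing concrete, here is a rough sketch of how a crop box for a zoom region could be computed so that the cropped view keeps the original image's aspect ratio. This only illustrates the geometry. It is not the ImagePulse generation script, and the function name, the `margin` parameter, and the centering strategy are all assumptions.

```python
def zoom_crop_box(image_size, region, margin=0.1):
    """Return a (left, top, right, bottom) crop box around `region`.

    The box is padded by `margin` on each side, grown to match the image's
    aspect ratio, and shifted so that it stays inside the image.
    """
    width, height = image_size
    left, top, right, bottom = region

    # Pad the region a little so the subject is not cut off at the border.
    box_w = (right - left) * (1 + 2 * margin)
    box_h = (bottom - top) * (1 + 2 * margin)

    # Grow the shorter side until the box has the same aspect ratio as the image.
    aspect = width / height
    if box_w / box_h < aspect:
        box_w = box_h * aspect
    else:
        box_h = box_w / aspect

    # The box can never be larger than the image itself.
    scale = min(1.0, width / box_w, height / box_h)
    box_w *= scale
    box_h *= scale

    # Center the box on the region, then clamp it to the image bounds.
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    x0 = min(max(center_x - box_w / 2, 0), width - box_w)
    y0 = min(max(center_y - box_h / 2, 0), height - box_h)
    return (round(x0), round(y0), round(x0 + box_w), round(y0 + box_h))
```

With Pillow, a box like this could be passed to `image.crop(box)` and the result resized back to `image.size` to obtain the zoomed-in view. The full image and the zoomed view then form the two sides of a zoom-in / zoom-out pair.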

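3. Style Transfer

The style of the image is changed while its structure is preserved. This dataset is used to train a model's style transfer capability.

ImagePulse dataset: Style Transfer

Sample contents: Image 1, Image 2 (images omitted here)
Editing instruction	transform the image into a cartoon style with vibrant colors and a confident expression.
Reverse editing instruction	transform the image into a realistic portrait with a serious expression and subtle lighting.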

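4. Face Preservation

The person's pose, expression, and other attributes are randomly modified while the facial features are kept unchanged. This dataset is used to train a model's face preservation capability.

ImagePulse dataset: Face Preservation

Sample contents: Image 1, Image 2 (images omitted here)
Editing instruction	Add a nighttime street scene with bokeh lights in the background.
Reverse editing instruction	Remove the nighttime street scene and bokeh lights from the background.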

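03. Running Dataset Generation

Users can run the dataset generation scripts themselves to produce more training data. We also look forward to developers in the open-source community joining the effort to build the ImagePulse datasets and, together, the next generation of image generation models.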

git clone https://github.com/modelscope/ImagePulse.git
cd ImagePulse
pip install -r requirements.txt


python change_add_remove.py \
  --target_dir "data/dataset" \
  --cache_dir "data/cache" \
  --dashscope_api_key "sk-xxxxxxxxxxxxxxxx" \
  --qwenvl_model_id "qwen-vl-max" \
  --modelscope_access_token "xxxxxxxxxxxxxxx" \
  --modelscope_dataset_id "DiffSynth-Studio/ImagePulse-ChangeAddRemove" \
  --num_data 1000000 \
  --max_num_files_per_folder 1000

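--target_dir: path where the dataset is stored
--cache_dir: cache path
--dashscope_api_key: Bailian (百炼) API key; required when calling the Bailian API
--qwenvl_model_id: ID of the Qwen-VL model on Bailian; required when calling the Bailian API
--modelscope_access_token: ModelScope access token; required when uploading the dataset to ModelScope
--modelscope_dataset_id: ModelScope dataset ID; required when uploading the dataset to ModelScope
--num_data: total number of data samples
--max_num_files_per_folder: number of files in each packaged file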
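
Since the command takes an API key and an access token as arguments, it can be convenient to keep those secrets out of shell history by reading them from environment variables. The sketch below assembles the same command in Python. The flag names and default values come from the command above, while the environment variable names `DASHSCOPE_API_KEY` and `MODELSCOPE_ACCESS_TOKEN` and the helper functions are assumptions made for illustration.

```python
import os
import subprocess
import sys


def build_command(env, num_data=1000, max_num_files_per_folder=1000):
    """Assemble the argv list for change_add_remove.py, taking secrets from `env`."""
    return [
        sys.executable, "change_add_remove.py",
        "--target_dir", "data/dataset",
        "--cache_dir", "data/cache",
        # Assumed variable names; set them in your shell before running.
        "--dashscope_api_key", env["DASHSCOPE_API_KEY"],
        "--qwenvl_model_id", "qwen-vl-max",
        "--modelscope_access_token", env["MODELSCOPE_ACCESS_TOKEN"],
        "--modelscope_dataset_id", "DiffSynth-Studio/ImagePulse-ChangeAddRemove",
        "--num_data", str(num_data),
        "--max_num_files_per_folder", str(max_num_files_per_folder),
    ]


def run():
    """Run the generation script; call this from the ImagePulse repository root."""
    subprocess.run(build_command(os.environ), check=True)
```

Starting with a small `num_data` value, as in the defaults of this sketch, is a cheap way to confirm that the credentials and paths work before launching a large run.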