AI - Stable Diffusion vs DALLE-3

billatnapier
Published 2025-05-06 09:15

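This article compares two image-generation models, Stable Diffusion and DALLE-3. Stable Diffusion is based on a diffusion model and generates images by adding noise and then denoising, while DALLE-3 performs better on image quality and detail. The article also shows sample images generated with each model and looks at possible gender bias in the Stable Diffusion models.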

![](https://img.learnblockchain.cn/2025/05/06/1WOjJUnMucBurGChBuqutig.png)

## Stable Diffusion v DALLE-3

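In image generation, Goodfellow et al. \[1\] introduced generative adversarial networks (GANs). These use two neural networks (NNs), a generator and a discriminator, which are trained against each other so that each pushes the other to improve. During training, the generator learns to turn an image of pure noise into a recognisable image. To enhance the creativity of the generated images, we can integrate diffusion models \[2\], which add noise to an image over many steps and learn to reverse that process. Quite possibly, we will soon be unable to tell real photographs from images generated by GenAI.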

### Huggingface

The world of GenAI is exploding. You only have to look at the number of models on Huggingface to see that a whole new world of AI is being built:

![](https://img.learnblockchain.cn/2025/05/06/17Az_6YpPh2PjkRNw6bPAuQ.png)

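Closed-source models, such as ChatGPT and Bard, run on hosted systems that typically filter user content. Open-source models have the clear advantage that they can run on local hardware. Llama 3, for example, provides an LLM in versions with 8 billion and 70 billion parameters.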

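So while Llama 3 focuses on text generation, Stable Diffusion can be used to turn text into images. It starts from random noise and uses a deep neural network to denoise it, step by step, into an image that matches the text prompt. This type of approach is useful for producing finer detail than GAN (Generative Adversarial Network) based methods.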
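
To make the "adding noise over many steps" idea concrete, here is a minimal sketch of the forward (noising) half of a diffusion process on a toy list of pixel values. It uses only the Python standard library; the pixel values, the 200-step linear schedule, and the function names are illustrative assumptions rather than anything taken from Stable Diffusion itself. The reverse (denoising) half is what the trained neural network provides, and it is not shown here.

```python
import math
import random

def forward_diffusion(x0, betas, rng):
    """Return the list of progressively noisier versions of x0.

    Each step applies x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps,
    with eps drawn from a standard normal distribution.
    """
    trajectory = [list(x0)]
    x = list(x0)
    for beta in betas:
        keep = math.sqrt(1.0 - beta)
        spread = math.sqrt(beta)
        x = [keep * v + spread * rng.gauss(0.0, 1.0) for v in x]
        trajectory.append(x)
    return trajectory

def signal_fraction(betas):
    """Fraction of the original signal amplitude left after all steps."""
    return math.sqrt(math.prod(1.0 - b for b in betas))

# Hypothetical toy "image": eight pixel values in [-1, 1]
pixels = [1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.5]
# Hypothetical linear noise schedule over 200 steps
betas = [0.0001 + (0.05 - 0.0001) * t / 199 for t in range(200)]

rng = random.Random(0)
steps = forward_diffusion(pixels, betas, rng)
print(len(steps))              # stored states, including the original
print(signal_fraction(betas))  # small value: the result is mostly noise
```

After enough steps, very little of the original signal is left. Generation runs the process the other way: start from noise and remove a little of it at each step.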

### Stable Diffusion

Let's try it out:

```
Illustrate two robots named Bob and Alice. They are drawing a llama on a
blackboard. Make it a manga style
```

For this, we can use the following code:

```
from diffusers import StableDiffusionPipeline
import torch

model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.to("mps")  # "mps" enables Apple Metal support for fast generation

prompt = "Illustrate two robots named Bob and Alice. They are drawing a llama on a blackboard. Make it a manga style"
image = pipe(prompt).images[0]
image.show()
```
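
The `pipe.to("mps")` call above assumes an Apple silicon machine. As a hedged sketch, the device could instead be picked at run time. The helper names `choose_device` and `detect_device` are hypothetical. The sketch relies on `torch.backends.mps.is_available()` and `torch.cuda.is_available()`, which exist in recent PyTorch releases, and it falls back to `"cpu"` if PyTorch is not installed.

```python
def choose_device(mps_available, cuda_available):
    """Prefer Apple Metal, then CUDA, then fall back to the CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"

def detect_device():
    """Query PyTorch for available back ends; report "cpu" if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return choose_device(torch.backends.mps.is_available(),
                         torch.cuda.is_available())

device = detect_device()
print(device)
```

The result can then be passed to `pipe.to(device)`. Half precision is often poorly supported on CPU, so `torch.float32` is likely the safer `torch_dtype` when the fallback is used.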

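The code uses the CompVis/stable-diffusion-v1-4 model ([here](https://huggingface.co/CompVis/stable-diffusion-v1-4)). Because each image starts from random noise, every generation gives us a new image. Over four runs, we see that the generated image is different each time: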

![](https://img.learnblockchain.cn/2025/05/06/1kVtv-QMFRcKT5jFyRwM4Sg.png)
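
The run-to-run variation comes from the random starting noise, so fixing the seed fixes the starting point. The sketch below illustrates this with the standard library only; `initial_noise` is a hypothetical helper, not part of any library. In `diffusers`, the equivalent control is, as far as I am aware, the `generator` argument of the pipeline call, given a `torch.Generator` seeded with `manual_seed`.

```python
import random

def initial_noise(seed, size):
    """Draw the starting noise vector for one generation from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

run_a = initial_noise(seed=42, size=4)
run_b = initial_noise(seed=42, size=4)
run_c = initial_noise(seed=43, size=4)

print(run_a == run_b)  # same seed, same starting noise
print(run_a == run_c)  # different seed, different starting noise
```

Recording the seed alongside each saved image makes it possible to regenerate a result that turned out well.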

The images clearly come from a fairly simple model, and when we compare them with DALLE-3, we can see a huge difference in image quality:

![](https://img.learnblockchain.cn/2025/05/06/1WOjJUnMucBurGChBuqutig.png)

Overall, DALLE-3 has made huge advances over the past year or so, and it produces high-quality images. It also has many new capabilities, including a Studio Ghibli style:

```
Illustrate two robots named Bob and Alice. They are drawing a llama on a
blackboard. Make it a Studio Ghibi style
```

This produced:

![](https://img.learnblockchain.cn/2025/05/06/11JhmVHVlzh-oEEEGFDNvnA.png)

If we ask the diffusion model for a portrait photo, it does better, but the result may also reveal gender bias in the model:

```
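from diffusers import StableDiffusionPipeline
import torch

model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.to("mps")  # "mps" enables Apple Metal support for fast generation

prompt = "portrait photo of a cybersecurity professional"
image = pipe(prompt).images[0]
image.show()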
```

This produced:

![](https://img.learnblockchain.cn/2025/05/06/1is4Miz6G8HK-pM6xUmGsnw.png)

Switching to another model, runwayml/stable-diffusion-v1-5, does not help much as far as the glasses are concerned:

```
from diffusers import StableDiffusionPipeline
import torch

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.to("mps")  # "mps" enables Apple Metal support for fast generation

prompt = "portrait photo of a cybersecurity professional"
image = pipe(prompt).images[0]
image.show()
```

This gives a sample:

![](https://img.learnblockchain.cn/2025/05/06/1UJEZ1NwXQlILG-JeH1dJLQ.png)
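
A single sample cannot establish bias on its own. A rough check is to generate many portraits from the same prompt with different seeds, label each one by hand, and tally the labels. The sketch below shows only the tallying step. `label_shares` is a hypothetical helper, and the annotation list is placeholder data, not a result from this article.

```python
from collections import Counter

def label_shares(labels):
    """Return each label's share of the total, as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}

# Hypothetical placeholder annotations, NOT results from the article:
# one hand-assigned label per generated portrait.
annotations = ["label_a", "label_a", "label_b", "label_a"]

shares = label_shares(annotations)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
```

A share that stays heavily skewed across a large number of samples would be stronger evidence than any one image.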

With DALLE-3, we are offered a choice:

```
Sure! Would you like me to generate a realistic image of a cybersecurity professional, or are you looking for a specific style (e.g., cartoon, futuristic, formal portrait, etc.)? Also, do you have any preferences for:

Gender or age?

Clothing style (e.g., business suit, hoodie, tactical gear)?

Background (e.g., server room, dark digital background, office)?

Let me know so I can tailor the image to your needs.
```

For the prompt:

```
portrait photo of a female cybersecurity professional aged 30-40 and wearing
a business suit and in a server room.
```

We get:

![](https://img.learnblockchain.cn/2025/05/06/11_tyYV0lCSJqLbYmLyL5tg.png)
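
The preferences DALLE-3 asked about (gender, age, clothing, background) map directly onto parts of the final prompt. As a small sketch, a prompt like the one above could be assembled from optional fields. `build_portrait_prompt` and its parameter names are hypothetical and only illustrate the idea.

```python
def build_portrait_prompt(role, gender=None, age_range=None,
                          clothing=None, background=None):
    """Assemble a portrait prompt from optional attributes."""
    subject = f"{gender} {role}" if gender else role
    prompt = f"portrait photo of a {subject}"
    details = []
    if age_range:
        details.append(f"aged {age_range}")
    if clothing:
        details.append(f"wearing a {clothing}")
    if background:
        details.append(f"in a {background}")
    if details:
        prompt += " " + " and ".join(details)
    return prompt + "."

prompt = build_portrait_prompt(
    "cybersecurity professional",
    gender="female",
    age_range="30-40",
    clothing="business suit",
    background="server room",
)
print(prompt)
```

Leaving a field unset drops it from the prompt, which makes it easy to compare outputs with and without a given attribute.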

### Conclusions

The pace of progress in the DALLE-3 engine is astonishing!

### References

\[1\] Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. _Advances in neural information processing systems_, _27_.

\[2\] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In _Proceedings of the IEEE/CVF conference on computer vision and pattern recognition_ (pp. 10684–10695).

>- Original article: [billatnapier.medium.com/...](https://billatnapier.medium.com/stable-diffusion-v-dalle-3-55d61c786b81)
