Pretrained text-to-image diffusion models, while often capable of producing a diverse range of image types, lack the specificity required to generate images of lesser-known subjects, and are limited in their ability to render known subjects in new situations and contexts.[1] DreamBooth fine-tunes the full UNet component of the diffusion model using a few images (usually 3–5) depicting a specific subject. Each image is paired with a text prompt containing a unique identifier and the name of the class the subject belongs to (for example, a photograph of a [Nissan R34 GTR], with "car" as the class). A class-specific prior-preservation loss is applied to encourage the model to continue generating diverse instances of the class, drawing on what it was already trained on for the original class.[1] Pairs of low-resolution and high-resolution images taken from the set of input images are used to fine-tune the super-resolution components, allowing the minute details of the subject to be maintained.[1]
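The combined objective described above can be sketched as a reconstruction term on the subject images plus a prior-preservation term on images the frozen model generates for the bare class prompt. The following is a minimal illustrative sketch only: the function name, the use of plain mean-squared error, and the single `lambda_prior` weight are simplifying assumptions, not the paper's exact noise-weighted diffusion loss.

```python
import numpy as np

def dreambooth_loss(pred_subject, target_subject,
                    pred_prior, target_prior,
                    lambda_prior=1.0):
    """Simplified DreamBooth objective (illustrative, not the paper's
    exact weighted diffusion loss).

    pred_subject / target_subject: model output vs. the few subject
        images, prompted with "a photo of [identifier] car".
    pred_prior / target_prior: model output vs. samples the *frozen*
        pretrained model generated for the class prompt alone
        ("a photo of a car"); this term discourages the fine-tuned
        model from collapsing the whole class onto the new subject.
    lambda_prior: weight of the prior-preservation term.
    """
    reconstruction = np.mean((pred_subject - target_subject) ** 2)
    prior_preservation = np.mean((pred_prior - target_prior) ** 2)
    return reconstruction + lambda_prior * prior_preservation
```

With `lambda_prior=0` this reduces to naive fine-tuning on the subject images, which the paper reports leads to overfitting and loss of class diversity; the prior term trades some fidelity for retained generality.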
Usage
DreamBooth can be used to fine-tune models such as Stable Diffusion, where it may alleviate a common shortcoming of Stable Diffusion: its inability to adequately generate images of specific individual people.[4] Such a use case is quite VRAM-intensive, however, and thus cost-prohibitive for hobbyist users.[4] The Stable Diffusion adaptation of DreamBooth in particular is released as a free and open-source project based on the technology outlined in the original paper published by Ruiz et al. in 2022.[5] Concerns have been raised regarding the ability of bad actors to use DreamBooth to generate misleading images for malicious purposes, and that its open-source nature allows anyone to use or even improve on the technology.[6] In addition, artists have expressed apprehension about the ethics of using DreamBooth to train model checkpoints specifically aimed at imitating the art styles of individual human artists; one such critic is Hollie Mengert, an illustrator for Disney and Penguin Random House, whose art style was trained into a checkpoint model via DreamBooth and shared online without her consent.[7][8]
^ Yuki Yamashita (September 1, 2022). "愛犬の合成画像を生成できるAI 文章で指示するだけでコスプレ 米Googleが開発" [AI that can generate synthetic images of your dog: cosplay from text instructions alone, developed by Google]. ITmedia Inc. (in Japanese). Archived from the original on August 31, 2022. 米Google Researchと米ボストン大学の研究チームが開発した...数枚の被写体画像とテキスト入力を使って、与えられた被写体が溶け込んだ新たな合成画像を作成する被写体駆動型Text-to-Imageモデルだ。 [... developed by a research team from Google Research and Boston University, is a subject-driven text-to-image model that takes several images of a subject and text prompts to create newly generated images featuring the subject.]
^ Ryo Shimizu (October 26, 2022). "まさに「世界変革」──この2カ月で画像生成AIに何が起きたのか?" [Truly "world-changing": what has happened to image-generation AI in the past two months?]. Yahoo! News Japan (in Japanese). Archived from the original on October 26, 2022. Stable Diffusionは、一般に個人の写真や特定の人物を出すのが苦手だが、自分のペットや友人の写真をわずかな枚数から学習させる「Dreambooth」という技術が開発され、これも話題を呼んだ。ただし、Dreamboothでは、巨大なGPUメモリが必要になり、個人ユーザーが趣味の範囲で買えるGPUでは事実上実行不可能なのがネックとされていた。 [Stable Diffusion is generally inadequate at generating personal photographs or specific individuals, however the development of "Dreambooth" allows training from a small number of photos featuring your pets or friends, causing quite a stir. However, the drawback is that Dreambooth requires a large amount of GPU memory, making it practically unfeasible to run on GPUs that individual users can afford within their hobbyist price range.]
^ Kevin Jiang (December 1, 2022). "These AI images look just like me. What does that mean for the future of deepfakes?". Toronto Star. Archived from the original on December 8, 2022. For example, DreamBooth could be used to copy signatures or official signage to fake documents, create misleading photos or videos of politicians, manufacture revenge porn of individuals and more... A specific issue with DreamBooth and Stable Diffusion is that they're open source, Gupta continued. Unlike centralized AI-generation models that can impose regulations and barriers to image creation, the decentralized models like DreamBooth mean anyone can access and improve on the technology.
^ Isabel Berwick; Sophia Smith (December 14, 2022). "Will AI replace human workers?". Financial Times. Illustrator Hollie Mengert, whose artwork was used to train an AI model without her consent, spoke publicly against the practice of training AI models on artists' work without permission.
^ "Генеративные нейросети и этика: появилась модель, копирующая стиль конкретного художника" [Generative neural networks and ethics: a model has appeared that copies the style of a specific artist]. DTF (in Russian). November 9, 2022. Archived from the original on November 9, 2022. Так, совсем недавно известная художница и иллюстратор Холли Менгерт стала своеобразным датасетом для новой нейросети (не давая на то согласия)... «В первую очередь мне показалось бестактным то, что моё имя фигурировало в этом инструменте. Я ничего о нём не знала и меня об этом не спрашивали. А если бы меня спросили, можно ли это сделать, я бы не согласилась». [So, quite recently, the well-known artist and illustrator Hollie Mengert became the data source for a new neural network (without giving her consent)... "My initial reaction was that it felt invasive that my name was on this tool, I didn't know anything about it and wasn't asked about it. If I had been asked if they could do this, I wouldn't have said yes."]