Pretrained text-to-image diffusion models, while often capable of producing a diverse range of image types, lack the specificity required to generate images of lesser-known subjects, and are limited in their ability to render known subjects in new situations and contexts.[1] DreamBooth fine-tunes the full UNet component of the diffusion model using a few images (usually 3–5) depicting a specific subject. Each image is paired with a text prompt containing a unique identifier and the name of the class the subject belongs to (for example, a photograph of a [Nissan R34 GTR], with "car" as the class). A class-specific prior-preservation loss is applied to encourage the model to continue generating diverse instances of the class, drawing on what it was already trained on for the original class.[1] Pairs of low-resolution and high-resolution images taken from the set of input images are used to fine-tune the super-resolution components, allowing the minute details of the subject to be maintained.[1]
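The combined objective described above can be sketched as a reconstruction term on the subject images plus a prior-preservation term on images the frozen model generates for the bare class prompt. The following is a minimal illustrative sketch only: the function name, the use of plain mean-squared error, and the single `lambda_prior` weight are simplifying assumptions, not the paper's exact noise-weighted diffusion loss.

```python
import numpy as np

def dreambooth_loss(pred_subject, target_subject,
                    pred_prior, target_prior,
                    lambda_prior=1.0):
    """Simplified DreamBooth objective (illustrative, not the paper's
    exact weighted diffusion loss).

    pred_subject / target_subject: model output vs. the few subject
        images, prompted with "a photo of [identifier] car".
    pred_prior / target_prior: model output vs. samples the *frozen*
        pretrained model generated for the class prompt alone
        ("a photo of a car"); this term discourages the fine-tuned
        model from collapsing the whole class onto the new subject.
    lambda_prior: weight of the prior-preservation term.
    """
    reconstruction = np.mean((pred_subject - target_subject) ** 2)
    prior_preservation = np.mean((pred_prior - target_prior) ** 2)
    return reconstruction + lambda_prior * prior_preservation
```

With `lambda_prior=0` this reduces to naive fine-tuning on the subject images, which the paper reports leads to overfitting and loss of class diversity; the prior term trades some fidelity for retained generality.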
Usage
DreamBooth can be used to fine-tune models such as Stable Diffusion, where it may alleviate a common shortcoming of Stable Diffusion: its inability to adequately generate images of specific individual people.[4] Such a use case is quite VRAM-intensive, however, and thus cost-prohibitive for hobbyist users.[4] The Stable Diffusion adaptation of DreamBooth in particular is released as a free and open-source project based on the technology outlined in the original paper published by Ruiz et al. in 2022.[5] Concerns have been raised regarding the ability of bad actors to use DreamBooth to generate misleading images for malicious purposes, and that its open-source nature allows anyone to use or even improve on the technology.[6] In addition, artists have expressed apprehension about the ethics of using DreamBooth to train model checkpoints specifically aimed at imitating the art styles of individual human artists; one such critic is Hollie Mengert, an illustrator for Disney and Penguin Random House, whose art style was trained into a checkpoint model via DreamBooth and shared online without her consent.[7][8]
^ Yuki Yamashita (September 1, 2022). "愛犬の合成画像を生成できるAI 文章で指示するだけでコスプレ 米Googleが開発" [AI that can generate synthetic images of your dog: cosplay from text instructions alone, developed by Google]. ITmedia Inc. (in Japanese). Archived from the original on August 31, 2022. 米Google Researchと米ボストン大学の研究チームが開発した...数枚の被写体画像とテキスト入力を使って、与えられた被写体が溶け込んだ新たな合成画像を作成する被写体駆動型Text-to-Imageモデルだ。 [... developed by a research team from Google Research and Boston University, is a subject-driven text-to-image model that takes several images of a subject and text prompts to create newly generated images featuring the subject.]
^ Ryo Shimizu (October 26, 2022). "まさに「世界変革」──この2カ月で画像生成AIに何が起きたのか?" [Truly "world-changing": what has happened to image-generation AI in the past two months?]. Yahoo! News Japan (in Japanese). Archived from the original on October 26, 2022. Stable Diffusionは、一般に個人の写真や特定の人物を出すのが苦手だが、自分のペットや友人の写真をわずかな枚数から学習させる「Dreambooth」という技術が開発され、これも話題を呼んだ。ただし、Dreamboothでは、巨大なGPUメモリが必要になり、個人ユーザーが趣味の範囲で買えるGPUでは事実上実行不可能なのがネックとされていた。 [Stable Diffusion is generally inadequate at generating personal photographs or specific individuals, however the development of "Dreambooth" allows training from a small number of photos featuring your pets or friends, causing quite a stir. However, the drawback is that Dreambooth requires a large amount of GPU memory, making it practically unfeasible to run on GPUs that individual users can afford within their hobbyist price range.]
^ Kevin Jiang (December 1, 2022). "These AI images look just like me. What does that mean for the future of deepfakes?". Toronto Star. Archived from the original on December 8, 2022. For example, DreamBooth could be used to copy signatures or official signage to fake documents, create misleading photos or videos of politicians, manufacture revenge porn of individuals and more... A specific issue with DreamBooth and Stable Diffusion is that they're open source, Gupta continued. Unlike centralized AI-generation models that can impose regulations and barriers to image creation, the decentralized models like DreamBooth mean anyone can access and improve on the technology.
^ Isabel Berwick; Sophia Smith (December 14, 2022). "Will AI replace human workers?". Financial Times. Illustrator Hollie Mengert, whose artwork was used to train an AI model without her consent, spoke publicly against the practice of training AI models on artists' work without permission.
^ "Генеративные нейросети и этика: появилась модель, копирующая стиль конкретного художника" [Generative neural networks and ethics: a model has appeared that copies the style of a specific artist]. DTF (in Russian). November 9, 2022. Archived from the original on November 9, 2022. Так, совсем недавно известная художница и иллюстратор Холли Менгерт стала своеобразным датасетом для новой нейросети (не давая на то согласия)... «В первую очередь мне показалось бестактным то, что моё имя фигурировало в этом инструменте. Я ничего о нём не знала и меня об этом не спрашивали. А если бы меня спросили, можно ли это сделать, я бы не согласилась». [So, quite recently, the well-known artist and illustrator Hollie Mengert became the data source for a new neural network (without giving her consent)... "My initial reaction was that it felt invasive that my name was on this tool, I didn't know anything about it and wasn't asked about it. If I had been asked if they could do this, I wouldn't have said yes."]