Stable Diffusion 3 in R? Why not? Thanks to {reticulate} ❤️
"Fascinating" describes my journey with Stable Diffusion 3. It's deepened my appreciation for original art and masterpieces. Understanding how to generate quality art is just the beginning; it drives me to explore the underlying structure. Join me in exploring SD3 in R!
Objectives
- Motivation
- What is Stable Diffusion?
- Installation
- Reproducible Code & Explanation
- Results
- PNG and Metadata
- Why SD3 in R?
- Opportunities for improvement
- Lessons learnt
Disclaimer
This is for educational purposes only. Please read through the intended uses, safety, risks, and mitigations here. All images in this article were generated by Stable Diffusion 3 (SD3). Your hardware may be different from mine; the following code was written for Mac Metal (MPS).
Motivation
After our fascination with LLMs, our next adventure is generative AI art, which comes in handy when generating images for blog posts. Since our experience with prompt engineering and RAG has been quite informative, why not give genAI art a try as well? Our next adventure is text2img generative AI. Let's take a look at a simple approach, from installation to getting started in R!
Python <- Positron -> R
Alright, I have to admit, I did not start this off in R. I used Python in Positron to figure out how it works before transitioning to R. To be honest, Positron has been one of my favorite IDEs! I wasn't much of a Python person, mainly because of the lack of single line / chunk execution (without being in a notebook, of course); it's hard for me to understand what each chunk of code means or returns without executing it line by line, just like in RStudio! But with Positron, it's like RStudio for Python! It's been a great journey and I'm finding myself liking Python as much as R! It's a great feeling! You can technically use RStudio for Python (which I have), but I found the autocompletion in RStudio did not return as complete a list of Python modules as VSCode or Positron. 🤷 Maybe it's just me. Getting to know both languages, what's there not to ❤️ in these two elegant languages!? I say we should use both! 🤣 Oh, what do I know…
What is Stable Diffusion?
Stable Diffusion is a deep learning, text-to-image model that uses diffusion techniques to generate detailed images based on text descriptions. It's also capable of other tasks like inpainting and outpainting. Released in 2022, it's a product of Stability AI and is considered part of the current AI boom. Unlike previous proprietary models, Stable Diffusion is open source and can run on most consumer GPUs, making it more accessible.
Installation
I assume you already know how to use R and have reticulate installed, with some Python knowledge.
1. Create a python environment
library(reticulate)

# you can change "sd" to whatever you want
virtualenv_create(envname = "sd")
Virtual environments in Python provide isolated spaces for projects, each with its own set of dependencies and potentially different Python versions. This isolation prevents conflicts between projects, ensures reproducibility, and makes dependency management easier.
You will have to restart your IDE in order to use the new environment.
2. Use the virtual environment & Download the necessary python modules
library(reticulate)
use_virtualenv("sd")

# Install the required Python modules into the "sd" environment
py_install(
  c("diffusers", "transformers", "torch", "torchvision",
    "torchaudio", "accelerate", "sentencepiece", "protobuf"),
  envname = "sd",
  pip = TRUE
)
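To sanity-check the setup, a couple of reticulate helpers can confirm which Python you're bound to and what got installed (an optional quick check):

# Confirm which Python interpreter reticulate is bound to
py_config()

# List the packages installed in the "sd" environment
py_list_packages(envname = "sd")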
Reproducible Code & Explanation
1. Load modules
diffusers <- import("diffusers")
torch <- import("torch")
StableDiffusion3Pipeline <- diffusers$StableDiffusion3Pipeline
pil <- import("PIL")
2. Load Model
pipe <- StableDiffusion3Pipeline$from_pretrained(
  "stabilityai/stable-diffusion-3-medium",
  torch_dtype = torch$float16
)

# Assign to Metal; change to "cuda" if you have an NVIDIA GPU, or "cpu" if you have neither
pipe$to("mps")

# Prepare a generator for seeding
generator <- torch$Generator()
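If you're unsure which device your machine supports, a small sketch like this can pick one automatically (assuming the torch module imported above):

# Choose the best available backend: Apple Metal, NVIDIA CUDA, or CPU
device <- if (torch$backends$mps$is_available()) {
  "mps"
} else if (torch$cuda$is_available()) {
  "cuda"
} else {
  "cpu"
}
pipe$to(device)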
Now, if this is your first time running this, it might take some time to download the model. It may ask for a Hugging Face API key; if you have not created an account or obtained a key, please click here to request an access token. You will need to check "read/write" on repo access in order for the token to work. If you have tried and failed, please let me know and I'll see if I can assist you in getting the right one.
Once the model is downloaded, it will be loaded and we're ready to go! If you want to save the model for future local use without re-downloading, save it to your desired directory.
2.5 Optional
pipe$save_pretrained("stable_diffusion_v3_model/")
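Next time, you can load the pipeline straight from that directory instead of re-downloading from Hugging Face; a minimal sketch, assuming you saved to the path above:

# Load the locally saved pipeline (no re-download needed)
pipe <- StableDiffusion3Pipeline$from_pretrained(
  "stable_diffusion_v3_model/",
  torch_dtype = torch$float16
)
pipe$to("mps")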
3. Prompt & Settings
metadata <- list(
  prompt = "paint mona lisa by da vinci in Picasso's cubism style which is represented by fragmented forms, multiple perspectives in a single image, geometric shapes",
  num_inference_steps = 60L,
  height = 512L,
  width = 512L,
  seed = 1000L,
  guidance_scale = 8
)

output <- pipe(
  prompt = metadata$prompt,
  prompt_3 = metadata$prompt,
  num_inference_steps = metadata$num_inference_steps,
  height = metadata$height,
  width = metadata$width,
  generator = generator$manual_seed(metadata$seed),
  guidance_scale = metadata$guidance_scale
)

output$images[[1]]$show()
If you're just trying things out, you can do without the metadata list and pass all those parameters directly to pipe(). But if you're planning to generate multiple images, it's best to keep them in a list for easy metadata insertion when you're saving to PNG.
Depending on the speed of your hardware, generating the image may take some time. With the above prompt and settings, mine took about 2-3 seconds per iteration (per num_inference_steps). You can play with that setting to speed things up before fine-tuning the image for better quality. The same goes for width and height: the smaller they are, the faster the generation.
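For instance, a rough draft pass might look like this (hypothetical draft settings; the values are just an example):

# Quick draft: fewer steps and a smaller canvas for fast feedback
draft <- pipe(
  prompt = metadata$prompt,
  num_inference_steps = 20L,
  height = 256L,
  width = 256L,
  generator = generator$manual_seed(metadata$seed)
)
draft$images[[1]]$show()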
With the above prompt, you should see exactly this.
If you want the model to follow the prompt more consistently, you can increase the guidance_scale parameter. The higher the number, the harder the model tries to follow the prompt, but this also takes a hit in quality.
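One way to see that trade-off for yourself is a small sweep over guidance_scale values with a fixed seed (a sketch; the file names are made up):

# Compare prompt adherence vs. quality across guidance scales
for (gs in c(4, 8, 12)) {
  out <- pipe(
    prompt = metadata$prompt,
    num_inference_steps = metadata$num_inference_steps,
    height = metadata$height,
    width = metadata$width,
    generator = generator$manual_seed(metadata$seed),
    guidance_scale = gs
  )
  out$images[[1]]$save(paste0("guidance_", gs, ".png"))
}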
prompt_3 is needed if your prompt is longer than 77 tokens, as the regular CLIP encoder cuts off anything after that. See this.
Lastly, show() will open the generated PNG in a separate window. You can view it in RStudio instead, but you'll have to use a package such as magick.
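If you'd rather stay inside the RStudio viewer, saving the image and reading it back with {magick} works (assuming magick is installed; the file name is just an example):

# Save the generated image, then display it in the RStudio viewer
output$images[[1]]$save("preview.png", format = "PNG")
magick::image_read("preview.png")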
Results
Paint mona lisa by da vinci in Picasso's cubism style which is represented by fragmented forms, multiple perspectives in a single image, geometric shapes
Interesting: it does have some cubism style, but not distinctive enough to be recognized as Picasso's, in my opinion. I might have to work a bit more on the prompt and play with the seed and guidance scale to get the result I really want.
Traditional Chinese ink wash painting depicting a ballet dancer in motion. Elegant, flowing brushstrokes capture the graceful movements of the dancer. Minimalist style with areas of negative space. Emphasis on dynamic lines and gestural forms. Monochromatic palette with varying shades of black ink on aged, textured paper. Subtle splatter and dry brush techniques add depth. Composition inspired by Song Dynasty landscapes.
LLM-generated prompts appear to give SD3 much more detail to work with.
generate new york central park oil paiting with an ukiyo-e style style, sunset, winter season
Even though this is not an LLM-generated prompt, I am quite content with this one. It captures the essence of Central Park, with the city buildings behind, in a slight ukiyo-e style. I wonder how I can push that style further with the prompt. Perhaps I have to use a LoRA?
generate a painting of an oak tree, summer, highlighting the intricate details of the oak leaf shape and its vein, dandelion seeds floating in the air as the wind blows across the oak tree, with realistic portraiture, sfumato technique, classical themes
I am quite content with this one as well, though I had to increase num_inference_steps significantly. The color, though, is not very summer-like 🤣 It looks more like fall, and I don't think dandelion seeds are seen at that time of year. lol! Oh well.
Photorealistic close-up of a white ceramic mug on golden sand, with "Too blessed to be stressed" written in elegant script. Tranquil beach scene in background with gentle waves rolling onto shore. Soft, warm lighting reminiscent of golden hour. Sharp focus on mug with slight depth of field blurring the ocean. Visible texture of sand grains around the mug. Reflections of sky and water on the mug's glossy surface. Hyper-detailed rendering with vibrant yet natural colors
Too blessed, be wh!?!?! I beg your pardon!?! 🤣 The picture looks great though! Might have to try a different seed and/or increase the guidance. What do you think?
Georgia O' Keeffe Flower Painting of dandelion flower, floral motif
This looks quite pretty. It doesn't look very O'Keeffe though… it's pretty still!
PNG and Metadata
Instead of saving the prompt and config separately, we can save them all inside the PNG file.
# Insert metadata to save as PNG
info <- pil$PngImagePlugin$PngInfo()
info$add_text(key = "prompt", value = "whatever prompt we want")

# Save PNG
output$images[[1]]$save("something.png", format = "PNG", pnginfo = info)
And when you load the file back through pillow, you can read the metadata like so:
test_png <- pil$Image$open("ballet2.png")

# Metadata of the ballet dancer in calligraphic style
test_png$info
## $prompt
## [1] "Traditional Chinese caligraphic painting depicting a ballet dancer in motion. Elegant, flowing brushstrokes capture the graceful movements of the dancer. Minimalist style with areas of negative space. Emphasis on dynamic lines and gestural forms. Monochromatic palette with varying shades of black ink on aged, textured paper. Subtle splatter and dry brush techniques add depth. Composition inspired by Song Dynasty landscapes."
##
## $num_inference_steps
## [1] 200
##
## $height
## [1] 512
##
## $width
## [1] 512
##
## $seed
## [1] 1001
##
## $guidance_scale
## [1] 8
The numbers will come back as strings; you can use the bonus function below to easily convert them into integers or doubles.
Bonus!!!
Let's write a function that adds the metadata above automatically.
# Function to build a PngInfo object from a named list
pnginfo <- function(x) {
  result <- pil$PngImagePlugin$PngInfo()
  for (i in seq_along(x)) {
    result$add_text(key = names(x)[i], value = as.character(x[[i]]))
  }
  return(result)
}

# Build the PngInfo from our metadata list
info <- pnginfo(metadata)

# Then pass it in when saving to PNG
output$images[[1]]$save("code.png", format = "PNG", pnginfo = info)
Well, I found converting the character metadata from the PNG back to integers and doubles cumbersome enough that I've written a function to do the conversion, making it simple to reproduce a piece.
library(stringr)

# Convert numeric strings back to integer or double as appropriate
convert_back <- function(x) {
  int_float <- function(num) {
    if (num %% 1 == 0) {
      return(as.integer(num))
    } else {
      return(as.numeric(num))
    }
  }
  for (i in seq_along(x)) {
    if (str_detect(x[[i]], "^[0-9]")) {
      x[[i]] <- int_float(as.numeric(x[[i]]))
    }
  }
  return(x)
}

metadata <- test_png$info
metadata <- convert_back(metadata)
Now you can easily convert the metadata back to integers or doubles. Why distinguish integer from double? Some diffusers parameters require an integer rather than a double/float.
And if any of the images in this article interest you, you can download the PNG and use the method described above to get the prompt and parameters to reproduce the exact piece. It's deterministic!
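Putting it all together, reproducing an image from one of these PNGs might look like this; a sketch that reuses the convert_back() helper above and assumes the same model, device, and pipe as before (the file name is hypothetical):

# Read the parameters back out of a downloaded PNG
png <- pil$Image$open("code.png")
params <- convert_back(png$info)

# Re-run the pipeline with the recovered settings
out <- pipe(
  prompt = params$prompt,
  prompt_3 = params$prompt,
  num_inference_steps = params$num_inference_steps,
  height = params$height,
  width = params$width,
  generator = generator$manual_seed(params$seed),
  guidance_scale = params$guidance_scale
)
out$images[[1]]$show()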
Why SD3 in R?
Investigate function and its output
Well, R works really well when we're investigating what's in a package and in the objects it generates. I had no idea what the output of pipe() was, but with RStudio we can easily click on the list and see what's inside! You can probably do that in Positron as well; not in VSCode, at least not to my knowledge. Maybe I'm wrong.
Easily generate images
We can easily generate images for our blogs, projects, etc. Here we've written functions to automatically insert metadata into our PNG files and to convert it back! How cool! LOL, using a Python package.
More control over parameters
Why SD3, though? Why not just use Gemini, OpenAI, or Midjourney? Well, yes, you can use those, but with control over the seed and the other parameters, I think we can slowly understand how SD3 generates certain aspects of a style. That's quite important.
Code chunk execution
This, I think, is the winner for me. Both Positron and RStudio let me learn Python and the modules of interest chunk by chunk, without being in a notebook! It flows better for me as a non-power user.
Opportunities for improvement
Wow, there is so much to learn! We didn't even go through the basics and fundamentals of SD3: the tokenizer, text encoders, VAE, denoiser, scheduler, etc.! Maybe next time! The math and algorithms behind it are purely fascinating: latent space, and, most mind-blowing of all, it's deterministic!
Other things I want to learn and apply are:
- upscale: turning a low-res image into a high-res one using img2img. I think this will be quite straightforward
- controlnet: controlling the structure of the output image (e.g., with edges, pose, or depth maps). I think this will be quite challenging
- IP-Adapter: adapting the image to a certain style, such as surrealism or cubism. Think of it as extracting features from an image and using them to guide image generation alongside the prompt. I think this will be quite challenging as well
- applying LoRA for certain styles, such as ukiyo-e woodblock prints or traditional Chinese calligraphic style
- if we truly want to focus on art, we should use a WebUI such as ComfyUI instead of writing code for these, though there are definite benefits in automating certain tasks with code
I highly recommend the book Using Stable Diffusion with Python by Andrew Zhu if you want to get deeper into how the diffusers module and SD3 work.
Lessons learnt
- negative prompts do not have much effect in SD3
- longer prompts (>77 tokens) can be passed in via prompt_3
- it's challenging for the model to maintain fidelity when generating exact words; instead of the word "spell", I thought "say" might be a better approach
- generate a few inference steps (e.g. 20) and check that the initial result looks good before increasing the steps to refine the quality
- the diffusers documentation is not bad! Quite informative
- generating prompts via an LLM creates better quality images
- set the seed and tweak parameters to learn how the model behaves
- learnt how to insert metadata in PNG files; if you're interested in the prompt and params for any PNG here, please feel free to use pillow to extract them!
If you like this article:
- please feel free to send me a comment or visit my other blogs
- please feel free to follow me on twitter, GitHub or Mastodon
- if you would like to collaborate, please feel free to contact me