AI in Flux

Hands-on with “FLUX.1”, the new state of the art
text-to-image generation model from Black Forest Labs

Today we’re going to explore FLUX.1, the new family of state-of-the-art text-to-image models from Black Forest Labs. The bright minds behind Stability AI’s Stable Diffusion models recently struck out on their own with a $31M seed funding round led by Andreessen Horowitz and have delivered exceptional new text-to-image generation capabilities with their first family of FLUX.1 models.

In this post, we’re going to talk about what FLUX.1 is, the quality of images you can create with it, how to test drive it, and how it compares with other text-to-image models. I’ll cover some ways you can dive deeper with the model using both cloud and local tools for image generation and fine-tuning, and then conclude with the implications of FLUX.1’s capabilities for brands and creators.

What is FLUX.1?

FLUX.1 is a suite of text-to-image models designed for image synthesis and generation. FLUX.1 [pro] is Black Forest Labs’ commercial version of the model; it provides top-of-the-line prompt following, visual quality, image detail, and output diversity, and is available via the Black Forest Labs API. FLUX.1 [dev] is an open-weight model for non-commercial applications, and FLUX.1 [schnell] is the fastest model, tailored for local development and personal use and openly available under an Apache 2.0 license. FLUX.1 defines the new state of the art in image synthesis, with each model setting the standard for its respective class. We’ll talk more about those benchmarks in a moment, but first...

Take a test drive with FLUX.1

Before you read any further, I highly recommend trying FLUX.1 out for yourself with this simple image generation demo Black Forest Labs has made available on Hugging Face. The best way to experience the power of this model is to create your own images. Here’s an image I created with the tool using the prompt “a high resolution photograph, vintage style, of a pink classic convertible car”.

Pink Classic Car, created using Flux.1

Here's another image, created in the FLUX demo with the prompt “a high resolution photograph, futuristic, of a colorful cityscape at night”.

Futuristic Cityscape at Night, created with Flux.1

And one more image from the demo for fun. Here’s a high-resolution generative image of a person using the prompt “A high-resolution photograph, vintage style, of a young man on a sailboat, sailing cape cod”. Now go try a few prompts in the demo yourself and see what you can create!

Young Man on a Sailboat, created with Flux.1

The evolution of text-to-image models

I’ve been fine-tuning latent diffusion models since Dreambooth caught fire in early 2023. I started experimenting with Dreambooth to train models for the characters from my Nuggies illustrated children’s book series. I was never really successful in training good models with Dreambooth, but it wasn’t long before Stability AI’s Stable Diffusion models evolved quickly from good (SD 1.5), to curiously not so great (SD 2.0), back to pretty good (SDXL), and now, with SD3 Medium, almost great. While the quality of fine-tuned output in Stability AI’s models has improved significantly over time, limitations remained. Fine-tuned Stable Diffusion models like SD 1.5 and SDXL tended to produce a lot of image artifacts like weird hands and other errant anatomical features. They have also never been great at rendering text. For fine-tuners like me, this meant a lot of negative prompting, throwaway images, and/or post-processing. I’ve gotten great results fine-tuning SDXL over time, but working with the model always felt like a wrestling match. It generally took dozens of image iterations with a lot of post-processing to get one high-quality fine-tuned image. You can see the evolution of Stable Diffusion in the output from my illustrated character models below.

Stable Diffusion 1.5 LORA for ‘Baby Chomper’ Illustrated Character

Output from SD1.5 was good, but not great. You can see discoloration along the edges of the AI-generated Baby Chomper character, and his ear is out of proportion, but overall the essence of the character was well captured by the model.

'Baby Chomper', illustrated character, modeled with Stable Diffusion 1.5

Stable Diffusion SDXL LORA for ‘Chomper’ illustrated character

'Chomper', illustrated character, modeled with Stability AI's SDXL 

SDXL really began to demonstrate the power of this technology to capture objects and styles and then generate them in any context. The representation of the Chomper character in this model was quite good; however, artifacts remained, like the dark black-and-white outline around his body, which was prevalent in most output from this fine-tuned model.

Fine-tuning explained: Text-to-image fine-tuning adapts a base model like FLUX.1 or SDXL to specific subjects (people or products, for example) or styles (an illustration or photographic style) by further training it on a custom set of images to create a LORA, or Low-Rank Adaptation. LORAs allow for customization of base models like SDXL or FLUX.1 to specific styles or subjects, providing the ability to generate custom images of any specific person, product, object, or style in any context using simple text prompts.
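If you’re curious what’s happening under the hood, here’s a minimal, purely illustrative PyTorch sketch of the low-rank update at the heart of a LORA. The layer size and rank are made-up example values; real LORAs attach small updates like this to many layers inside the diffusion model.

```python
import torch

# Illustrative only: a LORA freezes the base weight W and learns a small
# low-rank update B @ A on top of it, so only A and B are trained on your
# custom images. Dimensions and rank here are arbitrary example values.
d_out, d_in, rank = 768, 768, 16   # rank is much smaller than the layer size

W = torch.randn(d_out, d_in)                        # frozen base-model weight
A = torch.randn(rank, d_in, requires_grad=True)     # trainable (small)
B = torch.zeros(d_out, rank, requires_grad=True)    # trainable (starts at zero)

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Original layer output plus the learned low-rank correction
    return x @ W.T + x @ A.T @ B.T

x = torch.randn(1, d_in)
print(lora_forward(x).shape)   # torch.Size([1, 768])
```

Because only the small A and B matrices are trained, a LORA file typically weighs in at megabytes rather than the many gigabytes of the base model, which is why it’s practical to train and share one per character, product, or style.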

Why FLUX is so Fetch

Now we have FLUX.1 from Black Forest Labs and… WOW. FLUX seems to be good at everything right out of the box. Anatomy, hands, high-resolution details. You want text in your images? FLUX renders text… perfectly. FLUX gets all the details right. Photos of human subjects look so real you’d never know they were AI generated. I’m not sure what kind of black magic the folks over at Black Forest Labs are practicing, but they landed this first version of their text-to-image model perfectly.

Don’t take my word for it; check out these benchmarks for FLUX.1 variants vs. the state-of-the-art image models available today. The Pro and Dev versions of FLUX.1 significantly outperform popular commercial models like Midjourney, DALL-E 3 HD, and SD3 Medium on dimensions like visual quality, prompt adherence, typography, size variability, and diversity of output. Even the lightweight Schnell version of FLUX is highly performant relative to other models, and you can run it on your local computer, provided you have a GPU with at least 12GB of VRAM.

Flux.1 Performance – Image Credit: Black Forest Labs

Examples of fine-tuned output from FLUX.1 Dev:

I’ve already migrated two of my SDXL illustrated character models to FLUX.1 [Dev] using AI-Toolkit and the results are incredible. The character details and illustration style have been captured almost perfectly. The spooky thing about FLUX is that the output images are actually BETTER than the input images. That is to say, it has improved on the original illustration quality and style. Check out the examples below and then we’ll walk through how to fine-tune the model using your own images in the next section.

Coco in a Jeep, Created with a custom LORA for FLUX.1 [Dev]

Chomper sleeping, with text rendering, created with a custom LORA for FLUX.1 [Dev]

Getting Started with FLUX.1

If you want to go deeper and start using or fine-tuning FLUX, here are a few ways to get started:

Free FLUX.1 Demo (Easiest!) - You can use the Black Forest Labs FLUX demo on Hugging Face to generate images, though without fine-grained control over the outputs.

Local Inference with ComfyUI (Advanced) - If you want finer-grained control over your text-to-image generation, you can run FLUX.1 locally on your PC using ComfyUI, a node-based graphical interface for text-to-image generation that allows you to experiment with an almost infinite number of inference parameters and settings. You can run ComfyUI on a Windows PC. You’ll need a modern NVIDIA GPU with at least 12GB of VRAM to run the FLUX.1 Schnell model, and 24GB of VRAM to run the FLUX.1 Dev model. Here is a tutorial to help you get started with ComfyUI.

ComfyUI interface and workflow for Flux inference
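If you’d rather drive FLUX from a script than a GUI, you can get a similar result with Hugging Face’s diffusers library. This is a minimal sketch, assuming you’ve installed a recent diffusers release, accepted the model terms on Hugging Face, and have enough VRAM; adjust steps, resolution, and seed to taste.

```python
# Minimal script-based FLUX.1 [schnell] inference with diffusers,
# as an alternative to the ComfyUI workflow above.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps fit the model on GPUs with less VRAM

image = pipe(
    "a high resolution photograph, vintage style, of a pink classic convertible car",
    num_inference_steps=4,   # schnell is distilled for very few steps
    guidance_scale=0.0,      # schnell ignores classifier-free guidance
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("pink_convertible.png")
```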

Fine-tuning FLUX.1 in the cloud (Intermediate) - If you want to fine-tune FLUX with your own images to create custom images based on a specific style or subject, you can do this instantly and with little technical experience using cloud-based services like Fal.ai or Replicate. For a few dollars of GPU time, you can simply upload your training images, set a few parameters, and click train, and these services will create a custom LORA for FLUX based on your style or subject (see the sketch below for roughly what this looks like with Replicate's Python client).

All you need is a credit card and a set of images of the subject or style you want to train. I plan to do a walkthrough of each of these services soon, so sign up for my newsletter if you want to get an alert when those new posts drop. In the meantime, check out Replicate and Fal.ai to learn more.
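To give you a sense of how simple the cloud route is, here’s a rough sketch of kicking off a FLUX LORA training job with Replicate’s Python client. The trainer name, version hash, and input fields below are placeholders I’ve assumed for illustration; check Replicate’s current FLUX LORA trainer listing for the exact values, and note that Fal.ai exposes a similarly simple API.

```python
# Hypothetical example of launching a FLUX LORA fine-tune on Replicate.
# Requires the REPLICATE_API_TOKEN environment variable to be set.
import replicate

training = replicate.trainings.create(
    # Placeholder trainer reference; use the current FLUX LORA trainer
    # and version hash listed on Replicate.
    version="ostris/flux-dev-lora-trainer:<version-hash>",
    destination="your-username/chomper-flux-lora",  # model that receives the weights
    input={
        "input_images": "https://example.com/chomper_training_images.zip",
        "trigger_word": "Chomper555",  # token you'll use in prompts later
        "steps": 1000,
    },
)
print(training.status)
```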


Fine-tuning FLUX.1 on your own Local PC with AI-Toolkit (Hard) - If you’re willing to brave the command line and are able to edit simple JSON configuration files using an IDE like Visual Studio Code, you can fine-tune FLUX using AI-Toolkit. To run FLUX fine-tuning locally on your machine, you will need an NVIDIA GPU with at least 12GB of VRAM to train with Schnell (24GB for the Dev version of the model), and you will need to install AI-Toolkit, a Python library of tools for fine-tuning text-to-image models.

There are many great tutorials available on how to fine-tune FLUX with AI-Toolkit that can help you get started.

Lessons Learned from Fine-Tuning FLUX.1 with AI-Toolkit and running FLUX inference in ComfyUI

The step-by-step tutorials for AI-Toolkit and ComfyUI will help you get off to a fast start; however, I found I still had to endure several days of nitty-gritty troubleshooting to get everything working. A few tips and tricks for you in case you run into trouble:

Managing the AI-Toolkit File Structure on PCs – I like to use a lot of subfolders to manage my different training image data sets. For whatever reason, AI-Toolkit’s configuration file for LORA training chokes on subfolders and will not run training if you use a path to one…so you will need to create separate folders in the parent directory for each training data set. Not ideal, but it works.

Don't forget to set up your trigger word in the AI-Toolkit .config file for training

Trigger Words – Each of your training images needs to be accompanied by a text file with a caption explaining what the subject is in each image. AI-Toolkit training configuration files can be configured to inject a “trigger” word into each of these files by simply adding [trigger] somewhere in the caption. You’ll want to edit your caption files with this tag and then add a unique trigger word to your .config file for training. As an example, I use the trigger words Coco555 and Chomper555 for my illustrated character models. Once I’ve successfully trained the LORA and start running it in ComfyUI to generate new images, I can prompt for these characters in any context simply by adding the character trigger word to the prompt.
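For example, here’s a tiny hypothetical helper (the folder name and caption text are placeholders) that writes a matching caption .txt file containing the [trigger] tag for every image in a training folder; AI-Toolkit then swaps in your configured trigger word during training.

```python
# Hypothetical helper: write a caption .txt next to each training image,
# embedding the [trigger] placeholder that AI-Toolkit replaces with your
# configured trigger word (e.g. Chomper555).
from pathlib import Path

dataset_dir = Path("training_data/chomper")  # folder of training images
caption = "an illustration of [trigger], a cartoon character"

for image_path in sorted(dataset_dir.iterdir()):
    if image_path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
        continue
    image_path.with_suffix(".txt").write_text(caption, encoding="utf-8")
    print(f"wrote caption for {image_path.name}")
```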

Inference Parameters in ComfyUI – After I trained my first illustrated character model with FLUX using AI-Toolkit, I was excited to start generating images. I loaded up the LORA, set my prompt, and clicked generate. To my utter disappointment, all I got was a terrible fuzzy image akin to snow on a TV screen. I spent a day or two pulling my hair out trying to troubleshoot. I retrained the LORA using more training steps and a higher learning rate. After retraining, I got the same terrible output. After more hair-pulling, I learned that FLUX image generation is very sensitive to the combination of scheduler and sampler parameter settings in ComfyUI. It turns out my LORAs were fine; I was just using incompatible parameters, which were throwing off my output.

This handy guide and comparison of samplers and schedulers for Flux by CoffeeVectors over at CivitAI was a lifesaver. I highly recommend testing different combinations to see what works best for your model. In my case the sampler/scheduler combination of LCM + Beta gave me the most accurate image output for my trained models.

Workflows in ComfyUI

If you’re new to ComfyUI, the tool can be a bit overwhelming at first. There are an almost infinite number of ways to configure Comfy for running your models. The good news is that there is a huge community of tinkerers out there sharing their work and, specifically, their workflows for ComfyUI. Workflows are templates that come with all the model parameters you need to start generating images already set up. Here’s the workflow I used to get started with generating images in ComfyUI with my first model:

FLUX + LORA (simple) | ComfyUI Workflow (openart.ai)

Additional FLUX workflows from StableDiffusion Tutorials can also be found here:

stablediffusiontutorials/Flux-workflows at main (huggingface.co)

All you have to do to deploy a workflow template is drag and drop the file onto your ComfyUI window and voila… the new workflow will appear. You’ll need to adapt the settings to choose your preferred model, LORA, and inference parameters, but workflows take 95% of the pain out of setting up a new ComfyUI inference routine.

Real World Use Cases for FLUX.1

FLUX.1’s capabilities have some serious real-world implications for brands and creators that rely on high quality custom image creation and design to do their work.

For Brands - By fine-tuning FLUX.1 with their own creative and product images, brands can build incredible flexibility and speed into their design workflows. A brand like Nike, for example, could fine-tune FLUX and create custom LORAs for each of its shoe products, and then instantly concept out new creative for marketing campaigns that put those products into any context, like on the edge of an aircraft carrier or on top of the Burj Khalifa. For brands with large product portfolios, operating in dozens of geographic regions and hundreds of local markets, being able to tailor promotional product images for each market on the fly and make them relevant for different audiences with magazine-quality photography is a game-changer. It allows designers unlimited degrees of freedom to explore different concepts for campaigns, to quickly test those concepts, and then scale them out without endless revisions and rework. Brands can also use FLUX to train LORAs to create custom stylized images based on their library of branded assets like logos, illustrations, and brand photography. This will allow creative teams to quickly create custom “brand-aware” images for marketing campaigns in minutes instead of days or weeks.

For Creators – For creators with unique intellectual property like illustrations or photography, FLUX can be a powerful tool for augmenting their work. For me, as an author of several illustrated children’s books, being able to model my characters and then quickly create new content for social media and other promotional channels is invaluable. FLUX won’t replace hand-drawn illustrations for my books, but it does provide an easy way to create fresh, engaging content for our readers to keep them interested in between book releases. For photographers, being able to take photo libraries of unique subjects like people or cars and then train custom LORAs to manipulate them with generative AI opens up a whole new world of creative possibilities for showcasing their work.

Logo generated using Flux.1 [Dev]

Social Sharing Image for Derivativ Blog, generated using Flux.1 [Dev]

For Everyone – For anyone who needs a unique image for any use case, FLUX is an incredible tool. I created the Derivativ logo concept in about 15 minutes of experimentation using FLUX. I did some post-processing on the logo in Adobe Photoshop to make the font align to my website font and touched up some minor pixelation in the “D” design element, but it is otherwise as generated from Flux…nice! I also used FLUX to create eye-catching header and social sharing images for my blog and Medium pages. You can create anything your imagination can dream up and FLUX will deliver images with exceptional quality every time.

Conclusion

I'm blown away by the power of FLUX.1. It is a giant leap forward, completely revolutionizing image synthesis and generation. The full power of FLUX reveals itself when you fine-tune the model and start creating your own custom images. With fine-tuning, you can accurately model subjects and styles and then place them in any visual context you can imagine. I’ll be writing more about FLUX and related tools in the months to come. I hope this post inspired you to begin your own exploration of FLUX and that you’ve already got an image or two under your belt!
