Hi
A few months ago, the FLUX.1 model series was released. I gave the models a try and was extremely impressed by the results; for me, it's the best open-source image generation model available right now.
More here: https://flux-1.ai/
The vanilla, off-the-shelf FLUX models are already great and address some of the main issues with existing models like Stable Diffusion (which struggles with rendering hands correctly and writing text).
But the great thing is that FLUX is open source, meaning it's possible to fine-tune the model with a consumer-grade GPU (a 24GB VRAM GPU, which costs around $2000, can handle it). Since I have both the hardware and experience in fine-tuning such large models, I gave it a try, and the results are extremely impressive.
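To give an idea of how this fits on a single card, the usual approach is LoRA fine-tuning: small low-rank adapters are trained on top of the frozen base weights. Below is a minimal sketch with diffusers and peft; the rank and target modules are illustrative values rather than my exact configuration, and in practice a ready-made trainer (for example the diffusers DreamBooth LoRA script or ai-toolkit) runs the actual training loop and adds tricks like cached text embeddings or CPU offloading to keep peak VRAM under 24GB.

```python
import torch
from diffusers import FluxPipeline
from peft import LoraConfig

# Load the base FLUX.1-dev weights in bfloat16 (the pipeline stays on CPU
# until a trainer moves or offloads the components it needs).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Attach a low-rank (LoRA) adapter to the attention projections of the
# transformer. The base weights stay frozen; only the small adapter
# matrices receive gradients, which is what makes a 24GB GPU workable.
lora_config = LoraConfig(
    r=16,                 # adapter rank (illustrative value)
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
pipe.transformer.add_adapter(lora_config)

# From here, the trainer iterates over the captioned photos (10 in my
# case), computes the flow-matching loss, and updates only the adapter.
```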
I succeeded in adding characters to the model with remarkable results.
Example:
I trained the model with just a few pictures of a French actress (10 in total). The real actress looks like this:
The training lasted about 1 hour.
Now the model knows who Laure is, and I can use her in a prompt.
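For illustration, generating with the new character is just a matter of loading the LoRA adapter and using its trigger word in the prompt. Here is a minimal sketch with diffusers, assuming the fine-tuning run produced a file named laure_lora.safetensors (a hypothetical name):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# Load the character LoRA from the fine-tuning run
# ("laure_lora.safetensors" is a hypothetical file name).
pipe.load_lora_weights("laure_lora.safetensors")
pipe.enable_model_cpu_offload()  # offload components to reduce peak VRAM

# The trigger word "Laure" now refers to the fine-tuned character.
image = pipe(
    "Laure, sitting in the grass in Paris in a public park with a yellow "
    "dress and red shoes, Eiffel Tower background, natural lighting of sunset",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=28,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("laure_paris.png")
```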
First prompt: "Laure, sitting in the grass in Paris in a public park with a yellow dress and red shoes, Eiffel Tower background, natural lighting of sunset"
A second, trickier test with text:
Prompt: "Laure floating in the air in the International Space Station with a blue suit, holding with 2 hands a paper with the text 'Hello ACF'"
You will note the quality of the hands.
Prompt: "Laure in manga anime drawing with a short black dress"
And a trickier one, with an indirection in the prompt:
Prompt: "A man is standing, viewed from the back, painting a portrait of Laure on a watercolor paint board. He is holding a paintbrush in one hand and a palette in the other"
Here we are: with 10 photos of someone, you can create either a virtual photoshoot (at best) or a terrible deepfake (at worst).
So, what do you think? Will it be the worst or the best?