TLDR: Got Image Generation and Image Editing ("make this photo anime style") working in Open WebUI today. All entirely self-hosted, using the very powerful Qwen Image Edit model (the 2509 release from September 2025), which is built on the also very powerful 20B Qwen Image model.

In October 2024, I was studying reverse proxies and refreshing my Docker knowledge by setting up Immich and my own mail server on my RPi (both defunct now, but good learning experiences). Sometime in early 2025, I think, was the first time I installed Ollama (the very popular self-hosted LLM server) and the accompanying Open WebUI (the GUI) on my small (and aging, circa 2018) Intel NUC PC. Slow as heck, running tiny models, and more of a novelty back then - being able to run an LLM on your own PC - than anything practical. Back then, I (wrongly) thought that since it was recommended as a "starter" kit for running local LLMs with Ollama, Open WebUI was just a basic chat GUI.

In April 2025, I half-successfully forked a Whisper-based solution, Scriberr, so that it could display live transcription. These were the days before I started using Cursor, so it was basically copy and paste galore with ChatGPT. Again, pretty slow on my Intel PC, but the hope was that all this could scale up when I got a better PC.

In May 2025, I started work on a Python script to find air tickets with the best redemption miles. Currently at a standstill, but perhaps at the rate Cursor is advancing, I can restart it at some point.

Unsatisfied with the apparent simplicity of Open WebUI, in June 2025 I tried building my own Oobabooga (a web UI) Docker image but failed due to some C++ dependencies. I eventually found a working Docker image, but the huge number of configuration options - most of which were irrelevant to me anyway because I didn't even have a GPU then - turned me off. Around this time, I started experimenting with RAGFlow, using, of all things, my phone as an LLM API server (because it was actually faster than my Intel at inference). I ran multiple hour-long sessions embedding ONE case (Denka Advantech) - in comparison, my GPU now crunches it in 30 seconds - just so I could run vector searches. I must have run the same query maybe a hundred times; my test prompt was a simple one: explain why the SGCA declined to follow Cavendish. After something like 20-30 tries with different RAPTOR settings, I finally got the "golden" answer, only to lose it after some minute change in settings.

Eventually I pulled the trigger, got myself one of those Nvidia GPUs, and basically built a new system around my old m-ITX PC case. But due to a very long shipping delay on one of my PC components from Amazon (a 64GB x 2 RAM kit that nobody sold locally), my Beast PC was only finally assembled in late July/early August.

Between August and November, lots and lots of experimenting! I found out Ollama really sucked at async RAG processing, so I moved to vLLM (also a royal pain to configure), which gave me blazing speeds. Then I figured out that I needed to get past RAG, and went into Langflow, structured output, and document processing/contract review flows. I'm currently a bit stuck because I need to make major architectural choices, like: how do I store my contract clauses? I guess I can't be storing all that in JSON... so perhaps just use a regular SQL database (and then RAG it)? But do I even need to store contracts clause-by-clause that way? Probably, for the sake of document automation/assembly software, which I haven't looked into in detail yet. Along the way, I also dabbled in finding a contract redlining/editing tool - I found GPT Local Host as an MS Word add-in, but its redlining was not precise enough (it replaced entire sentences instead of doing word-level diffs). Reddit tech bros then introduced me to a pretty sweet open source docx web editor called Superdoc, which pretty much solved my issue, because I could build a diff/redlining extension for it that could also make LLM calls. Apparently, Superdoc can also run in headless mode, which opens up some exciting options for LLM-powered document assembly/automation. So that is the next major decision to make - how to achieve document assembly and automation.
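
On the clause-storage question above: nothing is decided, but the version rattling around in my head is a boring SQL table with one row per clause, with RAG (and, later, document assembly) layered on top. Here's a minimal sketch in SQLite - the schema, column names, and sample clause are all just placeholders I'm toying with, not an actual design:

```python
import sqlite3

# Placeholder schema: one row per clause, so a clause can later be retrieved,
# diffed, or reassembled into a new document.
conn = sqlite3.connect("contracts.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS clauses (
        id        INTEGER PRIMARY KEY,
        contract  TEXT NOT NULL,   -- e.g. source file name
        clause_no TEXT NOT NULL,   -- e.g. '12.1'
        heading   TEXT,            -- e.g. 'Limitation of Liability'
        body      TEXT NOT NULL    -- full clause text (this is what gets embedded for RAG)
    )
""")
conn.execute(
    "INSERT INTO clauses (contract, clause_no, heading, body) VALUES (?, ?, ?, ?)",
    ("master_services_agreement.docx", "12.1", "Limitation of Liability",
     "Neither party shall be liable for any indirect or consequential loss..."),
)
conn.commit()

# Plain SQL still works for exact lookups; the RAG side would embed `body` row by row.
for row in conn.execute(
    "SELECT clause_no, heading FROM clauses WHERE contract = ?",
    ("master_services_agreement.docx",),
):
    print(row)
```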

Anyway! Back to my current project, which was detailed in my previous post - basically, coming back full circle to the idea that started it all: that I needed my own self-hosted ChatGPT replacement in case OpenAI pulls the plug one day. It's funny how far I've come and yet never touched that original idea, because I thought an LLM web UI was going to be trivial. Well, yes and no. The basic chat functionality is quite easy to set up, but I had grand plans about a routing system or some tool-call flow where the LLM could generate images by hooking up to ComfyUI (the backend AI image generation server; to be precise, I use SwarmUI, which is an excellent frontend that bundles the ComfyUI backend). And, you know, generate those anime-style photos of yourself and so on.
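
To make that tool-call idea concrete: as I understand it, an Open WebUI "Tool" is essentially a Python class named Tools whose methods the model can invoke. Below is a minimal sketch of the kind of hook I had in mind - the POST /prompt endpoint is standard ComfyUI, but the backend URL, the workflow file, and the node ID being patched are illustrative assumptions, not the ready-made tool I ended up using:

```python
import json
import requests

# Assumption: SwarmUI's bundled ComfyUI backend - replace with your own host/IP.
COMFY_URL = "http://192.168.1.50:7801/ComfyBackendDirect"


class Tools:
    def generate_image(self, prompt: str) -> str:
        """Generate an image from a text prompt by queueing a ComfyUI workflow."""
        # Load an API-format workflow previously exported from ComfyUI.
        with open("workflow_api.json") as f:
            workflow = json.load(f)
        # Assumption: node "6" is the positive CLIPTextEncode node in this workflow.
        workflow["6"]["inputs"]["text"] = prompt
        # ComfyUI queues jobs via POST /prompt and returns a prompt_id.
        r = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow}, timeout=30)
        r.raise_for_status()
        return f"Queued ComfyUI job {r.json().get('prompt_id')}"
```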

Yes, I forgot to mention that somewhere along the way, I also started experimenting with AI image generation (a lot of pain doing the configuration there too).

So voila, I found out today that all this is available in Open WebUI almost out of the box through the use of "Tools". Well, to be precise, I had tinkered with the experimental Image Generation function a while back but gave up because I wasn't familiar with ComfyUI back then. So today I familiarized myself MORE by... watching a YouTube tutorial. It's amazing how lazy (or great, depending on how you see it) people have become with Gen AI: there was a part of the video where you had to populate the ComfyUI Workflow "Node Values" (all this was gibberish to me before today, and even now I barely understand it) from a JSON workflow, and the speaker just said - copy and paste the JSON into ChatGPT and it will give you the answer! I thought he was going to tell me how to read the JSON, but OK, I can reverse engineer it later if needed.
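
For the record (and for anyone who would rather not paste JSON into ChatGPT): the API-format export is just a dictionary keyed by node ID, so the IDs Open WebUI asks for can be read straight off it. Here's roughly how I'd eyeball it in Python - this assumes the usual node types these workflows contain (CLIPTextEncode for the prompt, KSampler for seed/steps, EmptyLatentImage for the image size), and "workflow_api.json" is whatever file you exported:

```python
import json

# API-format export: {"<node_id>": {"class_type": ..., "inputs": {...}}, ...}
with open("workflow_api.json") as f:
    workflow = json.load(f)

for node_id, node in workflow.items():
    cls = node.get("class_type", "?")
    inputs = node.get("inputs", {})
    print(node_id, cls)
    # The node IDs Open WebUI wants are usually the ones holding these inputs:
    if "text" in inputs:  # CLIPTextEncode -> the prompt node
        print("    prompt text:", str(inputs["text"])[:60])
    if "seed" in inputs or "steps" in inputs:  # KSampler -> seed/steps node
        print("    sampler:", {k: inputs[k] for k in ("seed", "steps") if k in inputs})
    if "width" in inputs and "height" in inputs:  # EmptyLatentImage -> size node
        print("    image size:", inputs["width"], "x", inputs["height"])
```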

So this is the process of troubleshooting and problem-solving I went through today (boy, there was A LOT of configuration involved).

  1. First, I had to fix a rather pressing issue where my admin user somehow could not log into Open WebUI (even after turning off auth), because the Docker Gods had decreed it so. I eventually traced it down to a rather dumb cause: the default admin user had an email address of "admin@localhost" - which is not a valid email address, obviously. So when I tried to log in, Open WebUI cleverly told me that it was not a valid email address. Thankfully, I had ONE device (my phone) that was still logged into the admin account, so I used it to change the email address to a valid one. Voila! All admin logins worked after that.
  2. Next, I took a closer look at this image generation thing again. SwarmUI, although easy to use by any standard, is still a nice hot mess of config options. First, you gotta find out the API address of the ComfyUI backend, because I wasn't running ComfyUI standalone (surprise, this info is not easily googleable and you gotta dig into the docs). The answer: http://[SwarmUI IP address]:7801/ComfyBackendDirect
  3. Next, you have to get this thing called a "ComfyUI workflow". I don't know why image generation is so complicated, but basically it's a JSON file which defines the various components of the image generation pipeline - things like the "clip" (also known as the "text encoder"; why on earth are there two different names for the same thing?), the steps, and the "VAE". You can either use one of the default workflows, or download one that suits your specific model - doing the latter, however, requires you to pay close attention to exactly which directory your model files and model-related files live in, because otherwise it can't find them!
  4. I started with an easy pick, SDXL v1.0. I exported the SDXL Comfy workflow (choosing the "API" option), copied and pasted the contents of the exported JSON into Open WebUI, then asked ChatGPT to check the JSON and tell me what the "ComfyUI Workflow Nodes" values in it were. Easy enough - boom! Suddenly I had image generation working (you have to toggle it on)!
  5. But we all know that SDXL is kinda outdated, and while I know there are many finetunes out there, I wasn't interested in trawling through the million options. So I stuck with something simple: Stable Diffusion 3.5 Medium (not actually an SDXL model, but its successor). Because I didn't have some of the required files, specifically the text encoders below, I had to download them and THEN move them into a new "sdv3" subfolder in my "clip" folder in SwarmUI (a download sketch follows after the file list):
  "clip_name1": "sdv3/clip_g.safetensors",
  "clip_name2": "sdv3/clip_l.safetensors",
  "clip_name3": "sdv3/t5xxl_fp16.safetensors"

  6. So OK, SD 3.5, a pretty modern model. I then came across this cool collection of Open WebUI tools, one of which was Qwen Image Editing. Upload an image, give it a prompt, and let it do its magic. So I thought, wow, I have to do this - some background removal, maybe some anime-style generations (data privacy being one reason I wanted to self-host in the first place) - and besides, if someone had already made a ready-made tool for Open WebUI, it must be easy to implement, right? Just plug it in and go?

  7. Well, that was correct in the sense that I did not have to do any coding, but wildly incorrect in terms of the amount of configuration effort required. First I had to do the usual config: where's Ollama, where's my ComfyUI API (aha! since I had set up SDXL earlier, I knew what that was). Then OK - I had to go download Qwen Image Edit 2509 - but which quant (if at all)? Had to run some calculations. In the end, I just looked at the bundled JSON workflow, which told me to get the fp8 one. Then I read this cryptic documentation:

Prerequisites: You must have ComfyUI installed and running with the required models and custom nodes:

  • For Flux Kontext: Flux Dev model, Flux Kontext LoRA, and required ComfyUI nodes
  • For Qwen Edit 2509: Qwen Image Edit 2509 model, Qwen CLIP, VAE, and ETN_LoadImageBase64 custom node
  • See the Extras folder for workflow JSON files: flux_context_owui_api_v1.json and image_qwen_image_edit_2509_api_owui.json
  8. OK, so CLIP and VAE I "kind of" know, because the Hugging Face repository does have those files. But what was more cryptic was "custom node". Like, what was that???
  9. The Google told me that in ComfyUI, you can apparently install custom nodes (later I found out it's usually not just ONE custom node - they usually come in packages that may span hundreds of nodes). How do I do that? Install "ComfyUI Manager", they said.
  10. How do I do that? The docs said:

Go to ComfyUI/custom_nodes dir in terminal (cmd)

git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager

  11. So wait - what? Because I'm using dockerized SwarmUI and not even ComfyUI directly, I basically had no locally mapped folder with such a directory name. So fine, after some checks, at least I found out that the directory lives inside a Docker volume. Since it was a Docker volume and not a mapped local folder, I had to enter the SwarmUI Docker container (always a joy to type "docker exec -it..." - no, I'm kidding), locate that directory, and, thanks to the Docker Gods, git was already installed in that environment, so the git clone command went well.
  12. Great! So now that I had the Manager installed, there is apparently a feature that lets you install missing custom nodes automatically... except that it didn't work. And in addition to the missing "ETN_LoadImageBase64" custom node, I also had a missing "easy GPUclean" custom node.
  13. Well OK, no issue, I can install it manually, right? Well, yes - save that in Manager there was no node called "ETN_LoadImageBase64". After some googling I found that it was part of this package called "ComfyUI Tooling Nodes" (ok...), and installed it. Ditto with googling the other missing component - I can't remember what that package was called.
  14. So after that, I thought I was home free! Not yet... because when I tried to generate, Open WebUI threw an error. More googling ensued - I found out that the error was caused by the Adaptive Memory function I had enabled earlier. Disable that, and FINALLY - Image Edit works!!!
  15. I had to detail all this because I may forget it the next time I have to do it again - hopefully never. (A quick sanity check for next time is sketched below.)
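
Since this is partly a note to Future Me, here's the sanity check I'd run before going through any of this again. It's only a sketch: /system_stats and /object_info are standard ComfyUI API routes, but the backend address is a stand-in for my setup, and the node list is just the one that bit me this time:

```python
import requests

# SwarmUI's bundled ComfyUI backend - adjust the address to your own setup.
COMFY = "http://192.168.1.50:7801/ComfyBackendDirect"

# 1. Is the backend reachable at all?
stats = requests.get(f"{COMFY}/system_stats", timeout=10).json()
print("ComfyUI is up; devices:", [d.get("name") for d in stats.get("devices", [])])

# 2. Are the custom nodes the workflow needs actually registered?
#    /object_info lists every node class ComfyUI currently knows about.
object_info = requests.get(f"{COMFY}/object_info", timeout=30).json()
for node in ["ETN_LoadImageBase64"]:  # add whatever else the workflow complains about
    print(node, "->", "installed" if node in object_info else "MISSING")
```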