I'm a huge fan of what Cloudflare can do with its Workers platform. In fact, this site, the editor for this site, and all sorts of personal things run on their infrastructure. Cloudflare's engineers are trailblazers in the network infrastructure and distributed compute sectors. Check out their blog; they lead the industry with near-daily publications about their work to advance the capabilities and technologies we rely on.
When it comes to AI, I don't think anyone there honestly uses their own product, Workers AI. The API is well thought out. It fits into their Workers ecosystem like a glove. Their documentation makes sense. That's not the problem.
Their model selection is the problem.
GPT-OSS (August 2025) is not industry-leading in the least, nor is Llama 4 Scout (released April 2025) by the end of 2025, let alone in 2026.
They do have modern Chinese models like Z.ai's GLM available. Do they work? No.
And when a call does succeed, it's on a model that performs so badly that the only value the API call provided was to shame Cloudflare for thinking this was an acceptable option.
A better alternative: OpenRouter
On the day of my post on Notes from 2026 Open Confidential Computing Conference, I added an AI feature to my custom site editor to generate descriptive alt text with text extraction and semantic descriptions of content. The fact that it could be done brought me joy. The fact that it failed to be useful left me disappointed.
After restoring the rest of my post (which got obliterated because local state was out of sync with the server), I created an OpenRouter account and dropped in $10. The SDK was practically equivalent. The model selection was naturally fresh, and it was easy to identify models that could take image inputs and output text.
It worked. It worked and I didn't feel annoyed and disappointed.
In doing this feature tweak, I learned that OpenRouter returns an error if the model tries to consume more than the max token count. And in that case, I don't get the results at all. Just an error.
The fix: bump max tokens from 1024 to 8192 for image descriptions. That should give it room to spend thinking tokens that are invisible to me, or to handle slides that are sufficiently information dense.
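For the curious, the request shape looks roughly like this. It's a sketch, not the exact code from my editor: the model slug and prompt here are illustrative, but the payload structure follows OpenRouter's OpenAI-compatible chat-completions schema, including the multimodal content parts.

```typescript
interface AltTextRequest {
  model: string;
  max_tokens: number;
  messages: {
    role: "user";
    content: (
      | { type: "text"; text: string }
      | { type: "image_url"; image_url: { url: string } }
    )[];
  }[];
}

function buildAltTextRequest(imageUrl: string, model: string): AltTextRequest {
  return {
    model,
    // 1024 was too small: invisible thinking tokens and information-dense
    // slides blew past it, and the whole call errored instead of truncating.
    max_tokens: 8192,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text:
              "Write concise alt text for this image. Extract any visible " +
              "text and describe the content semantically.",
          },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Usage: POST the payload to OpenRouter's chat-completions endpoint.
// The model slug below is a placeholder, not a recommendation.
//
// const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
//   method: "POST",
//   headers: {
//     Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
//     "Content-Type": "application/json",
//   },
//   body: JSON.stringify(buildAltTextRequest(url, "some-vendor/some-vision-model")),
// });
```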
The cost of labeling 55 images or so with some duplication to compare models and adjust prompts?
Twelve cents. I paid twelve cents to make alt text on that image heavy article accessible. With a frontier lab, it would have been a dollar or two.
What about ...
Vercel's AI SDK is a go-to recommendation on YouTube. But for basic text extraction or semantic descriptions in alt text (the limit of my current ambitions with AI in my blog editor), the prices charged by frontier US labs are too high.
Groq would probably be fine too through Vercel's AI SDK without costing me much. Either way, OpenRouter has a reputation for providing a stable interface to all sorts of models, no matter the input kind.
Local models like Gemma can be quite useful too. My friend Corbin, who writes The Spacebar, shared how Gemma-3n-e4b (an 8 GB model on Ollama) can generate competent alt text despite its local-friendly size.
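If you want to try the local route, a minimal sketch against Ollama's `/api/generate` endpoint looks something like this. The model tag and its vision support are assumptions on my part, not something I've verified against Corbin's setup; Ollama accepts base64-encoded images in the `images` field.

```typescript
function buildOllamaRequest(imageBase64: string, model = "gemma3n:e4b") {
  return {
    model, // assumed tag; check `ollama list` for what you actually pulled
    prompt: "Write concise, descriptive alt text for this image.",
    images: [imageBase64], // Ollama takes base64-encoded image data
    stream: false, // one JSON object back instead of a token stream
  };
}

// Usage (requires a running Ollama daemon on the default port):
//
// const image = await fs.promises.readFile("slide.png");
// const res = await fetch("http://localhost:11434/api/generate", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildOllamaRequest(image.toString("base64"))),
// });
// const { response } = await res.json();
```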
If Cloudflare can't beat Gemma then what?
I don't know. They need to have a serious internal conversation about what they offer. AI tech is changing every week and can be truly exhausting to follow. Their current offerings, and their recommended offerings, are stale and in some cases non-functional, especially for my use case of image descriptions.
Cloudflare, you dropped the ball. Do better. Give your customers products that work with modern choices. I don't need everything new or experimental. Form partnerships with Anthropic, OpenAI, and Google for their models too. Widen your catalog for the customers that will pay you the big money. The Workers platform is absolutely amazing and I want to do the rest of my work here instead of AWS.
Get an advocate. Have someone push the boundaries of what makes or breaks your product for your customers. My first experience with your AI API should not be a dismal disappointing flop of a failure.
Where are you taking AI?
On this blog: I might use LLMs to generate a first-draft Open Graph summary. No plans to use AI-generated imagery unless citing another's as a piece of evidence to prove a point. These paragraphs of text will always be authored by me.
At work: I surgically apply code generation with Cursor. I spend most of my time verifying all the details before I create piece-wise layers of code patches that I can review. If I'm making prototypes, I explore and confirm before anything goes into production. I am not going to be the reason my employer has downtime like Amazon with AI. I have a well-earned reputation for making on-call a non-stressor, and I'm not going to shoot myself in the foot with stress like that. My experience outside of work keeps me very well informed about the boundaries of what I can safely delegate to agents at this time.
At home: Claude 4.6 Opus changed the narrative. That's worth writing about. In the background I'm still working on a video-engine project. It's like OBS, but not. It's like Resolume, but not. It's like MistServer, but not. This project has reawakened my interest in Rust and GL. Projects like these are really fun.
I will also strategically use emojis to poke fun at the people that copy paste from ChatGPT.
And unlike LLMs, I can be trusted with em–dashes. Yes that was an en dash, actually.