Cloudflare's AI offerings are [no longer] a joke

I'm a huge fan of what Cloudflare can do with its Workers platform. In fact, this site, the editor for this site, and all sorts of personal things run on their infrastructure. Cloudflare's engineers are trailblazers in the network infrastructure and distributed compute sectors. Check out their blog, they lead the industry in near daily publications about their work to further the capabilities and technologies we rely on.

When it comes to AI, I don't think anyone there honestly uses their own product Workers AI. The API is well thought out. It really fits into their Workers ecosystem like a glove. Their documentation makes sense. That's not the problem.

A screenshot of a documentation page for Workers AI. The top paragraph explains that Workers AI allows running AI models serverlessly on Cloudflare's GPU network from code like Workers, Pages, or via the Cloudflare API. Below this is a bulleted list titled Workers AI gives you access to, featuring three points: 50+ open-source models, a serverless pay-for-what-you-use pricing model, and inclusion in a fully-featured developer platform with AI Gateway, Vectorize, and Workers. Two buttons appear at the bottom: a blue button labeled Get started and a white button labeled Watch a Workers AI demo with an external link icon.

Their model selection is the problem.

This image displays a user interface grid featuring six large language models available for text generation from providers OpenAI, Meta, and NVIDIA. The top row lists OpenAI's gpt-oss-120b and gpt-oss-20b. The middle row shows Meta's llama-4-scout-17b-16e-instruct and llama-3.3-70b-instruct-fp8-fast. The bottom row features Meta's llama-3.1-8b-instruct-fast and NVIDIA's nemotron-3-120b-a12b. Each card provides the model name, the provider, a brief description of the model's purpose or architecture, and occasionally includes tags such as Batch or Function calling at the bottom.

GPT OSS (August 2025) not industry leading in the least, nor is Llama 4-scout (released April 2025) by the end of 2025, let alone 2026.

They do have modern Chinese models like Z.ai's GLM available. Do they work? No.

Shows an internal error on the alt text field

When they do work, it's on models that perform so badly that the only value the API call provided was to shame Cloudflare for thinking this was an acceptable option.

Note

Cloudflare received my feedback and will be making improvements to the error messaging when the model does not support the given inputs. GLM 4.7 flash is that it is not multi-modal. A human error!

I thought because GLM 4.5 was multimodal, obviously any GLM with a higher release number ought to be too. Not so!

A better alternative: OpenRouter

On the day of my post on Notes from 2026 Open Confidential Computing Conference, I added an AI feature to my custom site editor to generate descriptive alt text with text extraction and semantic descriptions of content. The fact that it could be done brought me joy. The fact that it failed to be useful made me disappointed.

After restoring the rest of my post — which got obliterated due to local state not being in sync with the server — I created an OpenRouter account and dropped in $10. The SDK was practically equivalent. The model selection was naturally fresh and easy to identify models that could take image inputs and output text.

It worked. It worked and I didn't feel annoyed and disappointed.

In doing this feature tweak I learned that the max token count will cause an OpenRouter error if the model tries to consume more. And in that case, I don't get the results at all. Just an error.

The fix: bump max tokens from 1024 to 8192 for image descriptions. That should give it time to spend thinking tokens that are invisible to me, or for slides that are sufficiently information dense.

The cost of labeling 55 images or so with some duplication to compare models and adjust prompts?

OpenRouter fees added up to 12 cents over 61 requests. Total token usage: 167k. Mostly used by Qwen 3.5

Twelve cents. I paid twelve cents to make alt text on that image heavy article accessible. With a frontier lab, it would have been a dollar or two.

What about ...

Vercel's AI SDK is a go-to recommendation on YouTube. For basic text extraction or semantic descriptions in alt text — the limit of my current ambitions with AI in my blog editor — the prices by frontier US labs are too high.

Groq would probably be fine too through Vercel's AI SDK without costing me much. Either way, OpenRouter has the reputation that they provide a stable interface for all sorts of models, no matter the input kind.

Local models like Gemma can be quite useful too. My friend Corbin who writes The Spacebar shared how Gemma-3n-e4b (an 8 GB model on Ollama) can generate competent alt text despite it's local-friendly size.

A screenshot asking Gemma to describe a picture of a laptop. its description is reasonable.

If Cloudflare can't beat Gemma then what?

I don't know. They need to have a serious conversation inside on what they offer. AI tech is changing every week and can be truly exhausting to follow. Their current offerings and their recommended offerings are stale and in some cases non-functional. Especially in my use case with image descriptions.

Cloudflare, you dropped the ball. Do better. Give your customers products that work with modern choices. I don't need everything new or experimental. Form partnerships with Anthropic, OpenAI, and Google for their models too. Widen your catalog for the customers that will pay you the big money. The Workers platform is absolutely amazing and I want to do the rest of my work here instead of AWS.

Note

As of April 26, 2026, Cloudflare now offers access to external inference providers!

We’re also excited to share that you'll now have access to 70+ models across 12+ providers — all through one API, one line of code to switch between them, and one set of credits to pay for them. And we’re quickly expanding this as we go.

Get an advocate. Have someone push the boundaries of what makes or breaks your product for your customers. My first experience with your AI API should not be a dismal disappointing flop of a failure.

Where are you taking AI?

On this blog: I might use LLM models to generate a first draft Open Graph summary. No plans to use AI generated imagery unless citing another's as a piece of evidence to prove a point. These paragraphs of text will always be authored by me.

At work : I surgically apply code generation with Cursor . I spend most of my time verifying all the details before I create piece-wise layers of code patches that I can review. If I'm making prototypes I explore and confirm before it goes into production. I am not going to be the reason my employer has downtime like Amazon with AI. I have a well earned reputation of making on-call a non-stressor. I'm not going to shoot myself in the foot with stress like that. My experience outside of work keeps me very well informed on what the boundaries of safety I can delegate to agents at this time.

At home : Claude 4.6 Opus changed the narrative. That's worth writing about. In the background I'm still working on a video-engine project. It's like OBS but not. It's like Resolume. But not. Its like an MistServer but not. This project has reawakened my interest in Rust and GL. Projects with are really fun.

I will also strategically use emojis to poke fun at the people that copy paste from ChatGPT.

And unlike LLMs, I can be trusted with em–dashes. Yes that was an en dash, actually.

#Cloudflare's AI offerings are [no longer] a joke#

#A better alternative: OpenRouter#

#What about ...#

#If Cloudflare can't beat Gemma then what?#

#Where are you taking AI?#

Cloudflare's AI offerings are [no longer] a joke

A better alternative: OpenRouter

What about ...

If Cloudflare can't beat Gemma then what?

Where are you taking AI?