Architecture · May 2026 · 6 min

LLMs are compilers, not runtimes.

Most teams pay frontier prices to do work that should have been compiled once and cached forever. Here is the pattern that turned a $400-a-month parsing bill into eight cents.

If you reach for an LLM every time you need to parse a thing, you are paying frontier prices to do work that does not need a frontier model. You also pay every single time the user runs that flow. The bill grows linearly with usage even though the underlying problem stops changing after the first call.

There is a better shape. Use the LLM to write a deterministic function exactly once. Cache that function. Run it for free forever. Only invoke the model again when the function breaks.
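As a minimal sketch of that loop, assuming a hypothetical `compile_fn` that stands in for the model call and a hypothetical `ExtractorBroken` signal for "the function broke", the control flow might look like this. The `fake_compile` stub is a toy that only picks a delimiter. It is there to make the example runnable, not to suggest what a real extractor looks like.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rows = List[dict]
Extractor = Callable[[str], Rows]


class ExtractorBroken(Exception):
    """Raised when a cached extractor no longer fits the page (hypothetical signal)."""


@dataclass
class CompiledCache:
    # compile_fn stands in for the expensive model call (an assumption of this sketch).
    compile_fn: Callable[[str], Extractor]
    extractors: Dict[str, Extractor] = field(default_factory=dict)
    compile_calls: int = 0

    def _compile(self, key: str, page: str) -> Extractor:
        # The only place the "model" is invoked; count it to make the cost visible.
        self.compile_calls += 1
        extractor = self.compile_fn(page)
        self.extractors[key] = extractor
        return extractor

    def run(self, key: str, page: str) -> Rows:
        # Reuse the cached function when there is one; compile on first use.
        extractor = self.extractors.get(key) or self._compile(key, page)
        try:
            return extractor(page)
        except ExtractorBroken:
            # The function broke: pay for one recompile, then retry once.
            return self._compile(key, page)(page)


def fake_compile(page: str) -> Extractor:
    # Toy stand-in for the model: "learns" the delimiter from the sample page.
    delimiter = ";" if ";" in page else ","

    def extractor(p: str) -> Rows:
        rows = []
        for line in p.splitlines():
            parts = line.split(delimiter)
            if len(parts) != 2:
                raise ExtractorBroken(line)
            rows.append({"description": parts[0], "amount": float(parts[1])})
        return rows

    return extractor


cache = CompiledCache(fake_compile)
cache.run("bank", "coffee,-3.50\nsalary,2000")  # first call compiles
cache.run("bank", "rent,-900")                  # second call reuses the cache
cache.run("bank", "coffee;-3.50")               # format changed, so it recompiles
```

The counter is the point of the sketch: three runs, two compiles, and the second compile only happens because the page format changed.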

Pay frontier prices at the compile step. Pay nano prices at runtime. Most teams reach for an LLM at every request. That is how budgets disappear.

The shape

Take bank transaction parsing as the canonical example. The first time a user adds a new bank, the app ships a cleaned snapshot of the transaction-page DOM to GPT-5 with a strict JSON schema. The model returns a pure JavaScript extractor: doc in, transactions out, no fetch, no eval, no side effects.

Before the extractor is cached, two checks run. First, the extractor is executed against the same DOM that produced it and the output is validated against the schema. Second, a static-analysis pass rejects any extractor that references a symbol outside the allowlist, such as fetch or eval. If both pass, the extractor is hashed and persisted to the per-user cache.
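A rough sketch of that gate, written in Python for brevity even though the extractor itself is JavaScript. Everything named here is an assumption: the symbol list, the three-field row schema, and the choice of SHA-256 for the hash are placeholders, and `rows` is taken to be the output of running the extractor on the DOM it was generated from. A whole-word text scan is also a simplification of real static analysis, which would walk the parsed syntax tree. The scan fails closed, so a forbidden name inside a string or comment is rejected too.

```python
import hashlib
import re
from typing import Iterable, List, Optional

# Illustrative list only; the essay does not enumerate the forbidden symbols.
FORBIDDEN_SYMBOLS = ("fetch", "eval", "Function", "XMLHttpRequest", "WebSocket", "import")

# Hypothetical row schema: field name -> required Python type.
REQUIRED_FIELDS = {"date": str, "description": str, "amount": float}


def forbidden_symbols_in(source: str) -> List[str]:
    """Return forbidden identifiers that appear as whole words in the extractor source."""
    return [s for s in FORBIDDEN_SYMBOLS if re.search(rf"\b{re.escape(s)}\b", source)]


def schema_errors(rows: Iterable[dict]) -> List[str]:
    """Return one message per field that is missing or has the wrong type."""
    errors = []
    for i, row in enumerate(rows):
        for name, expected in REQUIRED_FIELDS.items():
            if not isinstance(row.get(name), expected):
                errors.append(f"row {i}: {name} missing or not {expected.__name__}")
    return errors


def admit_to_cache(source: str, rows: List[dict]) -> Optional[str]:
    """Return a content hash to use as the cache key, or None if either check fails."""
    if forbidden_symbols_in(source) or schema_errors(rows):
        return None
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


source = "function extract(doc) { return [...doc.querySelectorAll('tr')].map(toRow); }"
rows = [{"date": "2026-05-01", "description": "coffee", "amount": -3.5}]
key = admit_to_cache(source, rows)  # a hex digest, since both checks pass
```

Hashing the source rather than the output means two syncs that load the same extractor can be compared by key alone.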

Every subsequent sync runs the cached extractor against the live DOM. No model call. Deterministic. Free. Offline-capable.

When you recompile

The model only runs again on drift. Drift is detected by a layered check, in cheapest-to-most-expensive order: schema violations, sanity checks against row count and balance deltas, and cross-extraction stability between recent syncs. Anything that trips a check kicks off a regeneration.
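One way the layered check could be wired up, as a sketch rather than a description of the actual implementation. The specific rules are assumptions: the schema is reduced to two fields, the sanity check reads "balance deltas" as "the row amounts should sum to the change in balance", and stability is measured as the share of the previous sync's rows that reappear. The tolerance and overlap thresholds are placeholders.

```python
from typing import List, Optional

Rows = List[dict]


def schema_ok(rows: Rows) -> bool:
    # Cheapest check: every row has the expected fields and types.
    return all(
        isinstance(r.get("date"), str) and isinstance(r.get("amount"), float)
        for r in rows
    )


def sanity_ok(rows: Rows, opening: float, closing: float, tol: float = 0.01) -> bool:
    # Non-empty output whose amounts explain the balance movement on the page.
    delta = sum(r["amount"] for r in rows)
    return len(rows) > 0 and abs(delta - (closing - opening)) <= tol


def stable_ok(rows: Rows, previous: Rows, min_overlap: float = 0.5) -> bool:
    # Consecutive syncs of one account should share most of their older rows.
    if not previous:
        return True
    seen = {(r["date"], r["amount"]) for r in rows}
    shared = sum((r["date"], r["amount"]) in seen for r in previous)
    return shared / len(previous) >= min_overlap


def detect_drift(rows: Rows, previous: Rows, opening: float, closing: float) -> Optional[str]:
    """Return the name of the first failed check, or None when no drift is seen."""
    checks = [
        ("schema", lambda: schema_ok(rows)),
        ("sanity", lambda: sanity_ok(rows, opening, closing)),
        ("stability", lambda: stable_ok(rows, previous)),
    ]
    # Checks run in order and stop at the first failure, so later (costlier)
    # checks can rely on what earlier ones established.
    for name, check in checks:
        if not check():
            return name
    return None
```

Any non-None result would be the trigger for a regeneration. Returning the name of the failed check also gives a cheap log of why each recompile happened.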

Across a year, that is around five to fifteen model calls per bank, not five hundred thousand. The cost asymmetry is roughly a thousand to one.
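The arithmetic can be made explicit. The call counts below come from the text. The price multiple is a placeholder, not a measured figure: it shows how a call-count ratio in the tens of thousands can sit alongside a cost ratio nearer a thousand, if each compile call costs many times what a single per-request call would.

```python
RUNTIME_CALLS_PER_YEAR = 500_000    # from the text: one model call per request
COMPILE_CALLS_PER_YEAR = (5, 15)    # from the text: regenerations per bank


def call_ratio(runtime_calls: int, compile_calls: int) -> float:
    """How many per-request calls are replaced by each compile call."""
    return runtime_calls / compile_calls


def cost_asymmetry(runtime_calls: int, compile_calls: int, price_multiple: float) -> float:
    """Cost ratio when one compile call costs price_multiple per-request calls."""
    return runtime_calls / (compile_calls * price_multiple)


low = call_ratio(RUNTIME_CALLS_PER_YEAR, COMPILE_CALLS_PER_YEAR[1])   # about 33,000 to 1
high = call_ratio(RUNTIME_CALLS_PER_YEAR, COMPILE_CALLS_PER_YEAR[0])  # 100,000 to 1

# Hypothetical: 10 compile calls a year, each priced at 50 per-request calls.
example = cost_asymmetry(RUNTIME_CALLS_PER_YEAR, 10, 50.0)            # 1,000 to 1
```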

Why it works

Bank transaction tables do not change every day. UI changes happen quarterly at the fastest. The DOM is a slow-moving target wrapped in a wildly changing visual layer. A compiled extractor sees through the visual layer and tracks the slow target. That is the property the LLM-as-runtime version cannot exploit.

The same pattern applies to anything where the underlying structure is stable and the surface is messy. Document parsers. CSV importers. Email classifiers. Code-mod tools. Anywhere you would otherwise pay a model on every request to do work that does not change between requests.

The trap to avoid

Resist mixing the navigation and the parsing. Letting the agent “browse the bank” is a fun demo and a terrible production architecture. Keep navigation deterministic and on rails: the user (or a recorded macro) handles login, pagination, and account selection. The LLM only ever sees the final transactions page. Mixing the two is how you end up paying per-step costs to an agent that gets locked out on a 2FA prompt.

The receipt

On OpenTeller this pattern brings the per-bank cost down to roughly one ten-thousandth of a dollar. The pattern is not exotic and it is not new. It is just under-used because most teams reach for the LLM at runtime out of habit. The compile-time version is almost always the right move when the underlying structure is stable enough to cache.
