Why SOM: The Case for a Semantic Web Format for AI Agents

The web was built for humans looking at pixels. AI agents don't need pixels. They need meaning.

Every day, millions of agent API calls send raw HTML to language models, paying for CSS classes, script tags, tracking pixels, and layout divs that carry zero semantic value. SOM fixes this.


The Problem

A typical web page weighs 300-500KB of HTML. Between 80% and 95% of that markup is presentation: class names, inline styles, script blocks, SVG paths, tracking pixels, and deeply nested layout divs. None of it carries meaning.

But when an AI agent reads a web page, all of that noise goes straight into the context window. And context windows cost money.

GPT-4 input tokens: $10/M
HTML that's noise: 80-95%
Wasted at 1K pages/day: $50+/day
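The arithmetic behind figures like these is simple enough to sketch. The function below is illustrative only: the 10,000 tokens-per-page value is a hypothetical assumption, not a measurement, and real pages can be far larger. The price and noise range are the ones quoted above.

```python
def daily_waste_usd(pages_per_day, tokens_per_page, noise_fraction, usd_per_million_tokens):
    """Dollars per day spent on input tokens that carry no meaning."""
    wasted_tokens = pages_per_day * tokens_per_page * noise_fraction
    return wasted_tokens / 1_000_000 * usd_per_million_tokens


# Assumption: 10,000 tokens per page is a hypothetical figure chosen for illustration.
TOKENS_PER_PAGE = 10_000

low = daily_waste_usd(1_000, TOKENS_PER_PAGE, 0.80, 10.0)   # 80% noise
high = daily_waste_usd(1_000, TOKENS_PER_PAGE, 0.95, 10.0)  # 95% noise
print(f"${low:.2f} to ${high:.2f} wasted per day")
```

Swap in your own page sizes and model price to estimate what the noise costs your agent.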

Here's the deeper issue: the DOM is a rendering tree, not a meaning tree. It tells you WHERE things go on screen, not WHAT things are. A <div> with twelve CSS classes might be a navigation link, a button, a heading, or a decorative container. The DOM doesn't know and doesn't care. It was designed to paint pixels, not convey semantics.

AI agents deserve better input than a rendering tree with the renderer removed.


What SOM Is

SOM (Semantic Object Model) is a structured JSON representation of web content designed for machine consumption. It takes the meaningful content of a web page and expresses it in a format that LLMs can process efficiently.

Instead of this:

Raw HTML
<div class="sc-1234 flex items-center gap-2">
  <a href="/about" class="text-blue-500 hover:underline
     font-medium tracking-tight">About</a>
</div>

SOM gives you this:

SOM Output
{
  "role": "link",
  "text": "About",
  "attrs": { "href": "/about" },
  "actions": ["click"]
}

Same information. Fraction of the tokens. And the agent actually knows it's a clickable link.
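To make the mapping concrete, here is a minimal sketch that turns the HTML snippet above into a node shaped like the SOM output above. It uses only Python's standard library and handles only links. It is not how Plasmate generates SOM; the class and function names are made up for this example.

```python
import json
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect each <a> element as a SOM-style node, ignoring presentation attributes."""

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._current = None  # node being built while the parser is inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            self._current = {
                "role": "link",
                "text": "",
                "attrs": {"href": href},
                "actions": ["click"],
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.nodes.append(self._current)
            self._current = None


def html_links_to_nodes(html):
    """Hypothetical helper: return one SOM-style node per link in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.nodes


raw_html = """<div class="sc-1234 flex items-center gap-2">
  <a href="/about" class="text-blue-500 hover:underline
     font-medium tracking-tight">About</a>
</div>"""

nodes = html_links_to_nodes(raw_html)
print(json.dumps(nodes[0], indent=2))
```

The wrapper div and every class name disappear. The role, the text, the href, and the available action survive.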

Key Properties


The Numbers

We benchmarked SOM against raw HTML on 49 real-world websites. Not toy examples. Real production pages from Google Cloud, Reddit, Stripe, The New York Times, and 45 others.

Overall compression: 16.6x
Median compression: 10.5x
Cost savings: 94%
Sites tested: 49

Model                   HTML Cost   SOM Cost   Savings
GPT-4 ($10/M)           $50,397     $3,042     $47,355 (94%)
GPT-4o ($2.50/M)        $12,599     $761       $11,839 (94%)
Claude Sonnet ($3/M)    $15,119     $913       $14,207 (94%)

Best case: cloud.google.com compressed 116.9x, from 464K tokens down to 4K. Even minimal sites like postgresql.org still showed 1.2x compression.
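The compression ratios and the savings percentage are two views of the same number. Assuming cost scales linearly with input tokens, a compression ratio of r saves 1 - 1/r of the bill. A quick check against the figures quoted above:

```python
def savings_fraction(compression_ratio):
    """Fraction of input tokens (and cost, if priced per token) saved at a given ratio."""
    return 1.0 - 1.0 / compression_ratio


figures = [
    ("overall", 16.6),
    ("median", 10.5),
    ("cloud.google.com", 116.9),
    ("postgresql.org", 1.2),
]
for label, ratio in figures:
    print(f"{label}: {ratio}x -> {savings_fraction(ratio):.1%} fewer input tokens")
# overall: 16.6x -> 94.0% fewer input tokens
# median: 10.5x -> 90.5% fewer input tokens
# cloud.google.com: 116.9x -> 99.1% fewer input tokens
# postgresql.org: 1.2x -> 16.7% fewer input tokens

# The GPT-4 row of the table implies the same overall ratio.
implied_ratio = 50_397 / 3_042
print(f"implied ratio from table: {implied_ratio:.1f}x")
# implied ratio from table: 16.6x
```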

See the full benchmark with all 49 sites →


Why Not Just Strip Tags?

Common objection: "Just use BeautifulSoup or Cheerio to strip HTML tags. Problem solved."

Not quite. Tag stripping is the wrong tool for this job: it throws away the structure along with the noise.

SOM is selective. It removes noise but preserves meaning. A stripped page is text. A SOM page is a structured document with roles, regions, and actions.
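Here is what that difference looks like in practice. The sketch below is a bare-bones tag stripper built on Python's standard library, standing in for what a text-extraction call in BeautifulSoup or Cheerio would give you. The sample markup is hypothetical.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Keep only the text between tags; discard every tag and attribute."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_tags(html):
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return " ".join(stripper.chunks)


# Hypothetical sample: one link and one button inside a nav region.
sample = (
    '<nav><a href="/about" class="text-blue-500">About</a> '
    '<button type="submit">Subscribe</button></nav>'
)
print(strip_tags(sample))
# About Subscribe
```

The output is two words. Nothing says which one is a link, where it points, or that the other one is a button an agent could press.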


Why Not the Accessibility Tree?

Accessibility trees are designed for screen readers. They solve a related but fundamentally different problem.

SOM is purpose-built for agent consumption: flat regions, semantic roles, explicit action annotations. It's what you'd design if you started from "what does an LLM need?" instead of "what does a screen reader need?"


Why Not Screenshots + Vision?

Vision models can look at screenshots. So why not just send a screenshot?

Screenshots are appropriate for visual verification: "does this page look right?" They're not appropriate for primary page understanding. SOM gives agents the structured, actionable data they actually need.


SOM as a Standard

SOM isn't locked inside Plasmate. It's an open specification designed to be consumed by any tool, framework, or agent.

You can generate SOM with Plasmate and consume it with anything. Or build your own SOM generator. The format is the standard, not the tool.
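Because SOM is plain JSON, a consumer needs nothing beyond a JSON parser. The sketch below assumes a flat list of nodes shaped like the link example earlier on this page. The list wrapper and the heading node are hypothetical, so check the specification for the real document layout.

```python
import json

# Assumption: a flat JSON array of nodes; only the link node's shape comes from this page.
som_json = """[
  {"role": "link", "text": "About", "attrs": {"href": "/about"}, "actions": ["click"]},
  {"role": "heading", "text": "Welcome", "attrs": {}, "actions": []}
]"""


def clickable(nodes):
    """Return the nodes an agent could click."""
    return [node for node in nodes if "click" in node.get("actions", [])]


nodes = json.loads(som_json)
for node in clickable(nodes):
    print(node["role"], node["text"], node["attrs"].get("href"))
# link About /about
```

No SOM-specific library is involved here, which is the point: any language with a JSON parser can read the format.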


Who Benefits


Get Started

Try SOM in under a minute:

# Install Plasmate
cargo install plasmate

# Fetch any URL as SOM
plasmate fetch https://example.com

Use SOM in your project:

# Node.js
npm install som-parser

# Python
pip install som-parser
