Agentic Browser Engine - Product Specification v1.0
Tagline: "The browser built for machines." Author: David Hurley
Table of Contents
- Executive Summary
- The Problem
- The Vision
- Architecture Overview
- Semantic Object Model (SOM)
- Agent Web Protocol (AWP)
- Stealth Networking Layer
- Sandboxed JS Runtime
- Wasm Skill System
- Session & State Management
- Fleet Orchestration (Commercial)
1. Executive Summary
Every AI agent that interacts with the web today does so through a hack. They control Chrome (or a Chrome derivative) through the Chrome DevTools Protocol (CDP) - a debugging interface designed for human developers inspecting CSS and setting breakpoints. This is the equivalent of driving a car by reaching through the window and turning the steering wheel with a broomstick.
The result: agents that are slow, brittle, expensive (in tokens), easily detected, and fundamentally limited by an architecture that was never designed for them.
Plasmate is a headless browser engine built from scratch in Rust, purpose-designed for AI agents. It introduces three foundational technologies:
The Semantic Object Model (SOM) - A new way to represent web pages that drops visual rendering entirely and outputs a clean, deterministic, token-efficient structure that LLMs can directly reason about.
The Agent Web Protocol (AWP) - A new communication standard between agents and browsers that replaces CDP's coordinate-based commands with intent-based actions.
A WebAssembly Skill System - A community-extensible plugin architecture where developers write site-specific "Skills" (in any language) that teach the browser how to navigate complex web applications.
The business model follows the Netscape playbook: open-source the engine and the protocol aggressively (Apache 2.0), establish AWP as the industry standard, and monetize the commercial fleet - orchestration, persistent sessions, anti-detection proxy network, and analytics.
Target market size: The browser automation market is projected at $15B+ by 2028. The AI agent infrastructure market (broader) is estimated at $50B+ by 2030. Every company building AI agents needs a browser. Today they all rent Chrome. Tomorrow they should run Plasmate.
2. The Problem
2.1 The CDP Bottleneck
The Chrome DevTools Protocol was created in 2011 for one purpose: letting developers debug websites. Fifteen years later, it has become the de facto interface for AI agents interacting with the web - a role it was never designed for.
Problems with CDP for agents:
| Issue | Description | Impact |
|---|---|---|
| Pixel-level commands | CDP operates on screen coordinates (click(x:342, y:891)) |
Agents must understand visual layout to interact with pages |
| DOM verbosity | Returns full DOM trees with styling, attributes, event listeners | A simple page can produce 500KB+ of DOM data, consuming thousands of LLM tokens |
| Visual rendering overhead | Chrome renders CSS, computes layout, paints pixels, composites layers | 80-90% of compute is wasted on visual output no agent needs |
| Memory consumption | Each Chrome tab uses 50-300MB RAM | Fleet of 1,000 agents = 50-300GB RAM |
| Session fragility | CDP WebSocket connections drop, state is lost, recovery is manual | Agents fail silently mid-task |
| Detection surface | Headless Chrome has dozens of detectable fingerprints | Bot detection services block agents within seconds |
| No native concurrency | Chrome was designed for one human user | Running 100+ sessions requires complex orchestration |
| No semantic understanding | DOM is a rendering tree, not a meaning tree | Agents must infer purpose from HTML tag names and CSS classes |
2.2 The Stealth Arms Race
Every headless browser automation tool fights the same battle: appearing human to anti-bot systems. The current approach is bolting stealth patches onto Chrome after the fact:
puppeteer-extra-plugin-stealthpatches ~15 known detection vectors- Browserbase and Steel Browser maintain proprietary stealth layers
- Lightpanda avoids detection by not being Chrome (different TLS fingerprint)
But all of these are reactive. Cloudflare, DataDome, PerimeterX, and Akamai continuously add new detection vectors. The patch-and-pray approach is fundamentally losing.
A native engine can solve this differently: by controlling the entire network stack from TLS handshake to HTTP headers, stealth becomes a first-class architectural feature rather than an afterthought.
2.3 The Token Tax
LLMs process web pages as text. The more tokens a page representation consumes, the more expensive and slow the agent becomes. Current approaches:
| Method | Tokens for a typical e-commerce page | Notes |
|---|---|---|
| Raw HTML | 15,000-50,000 | Includes scripts, styles, metadata |
| Cleaned HTML | 5,000-15,000 | Strip scripts/styles, still verbose |
| Accessibility tree | 2,000-8,000 | Better, but inconsistent across sites |
| Screenshot + vision | 1,000-3,000 (image tokens) | Expensive, can't interact with elements |
| SOM (proposed) | 500-2,000 | Semantic elements with affordances only |
A 10x reduction in tokens per page interaction means:
- 10x cheaper per agent action
- 10x more context available for reasoning
- 10x faster response times
- 10x more pages processable within context windows
2.4 What Exists Today
| Product | Approach | Engine | Protocol | Stealth | Open Source |
|---|---|---|---|---|---|
| Puppeteer/Playwright | Chrome automation | Chromium | CDP | Plugin-based | Yes |
| Browserbase/Stagehand | Cloud browsers | Chromium | CDP + SDK | Managed | SDK open, infra closed |
| Lightpanda | New engine (Zig) | Custom (Zig) | CDP-compatible | Inherent (not Chrome) | Yes (AGPL) |
| Steel Browser | Stealth Chrome | Chromium | CDP | Deep patches | Partial |
| Browser Use | Agent framework | Chromium | CDP + LLM | Basic | Yes |
| Manus Browser Operator | Extension overlay | Chrome | CDP + extension | None | No |
| ChatGPT Atlas | Consumer product | Chromium-based | Proprietary | Unknown | No |
Gap: Nobody has built a native agent protocol. Nobody outputs semantic models instead of DOM. Nobody has a Wasm skill ecosystem. Nobody has stealth at the network stack level.
3. The Vision
3.1 The Netscape Parallel
In 1994, the web existed but was unusable for most people. Netscape didn't just build a better viewer - they invented the infrastructure that made the web work:
| Netscape Innovation | Web Impact | Plasmate Equivalent | Agent Web Impact |
|---|---|---|---|
| SSL/TLS | Secure commerce | AWP | Standardized agent-browser communication |
| JavaScript | Dynamic pages | Wasm Skills | Extensible browser capabilities |
| Cookies | Session persistence | Persistent Agent State | Agents maintain context across sessions |
| Navigator browser | Mass adoption | Plasmate engine | Standard engine for all agent frameworks |
3.2 The End State
In 3 years, the agentic browser landscape should look like this:
- AWP is the standard protocol for agent-to-browser communication, like HTTP is for human-to-server
- Plasmate is the default engine embedded in LangChain, CrewAI, AutoGen, and every major agent framework
- SOM is the standard page representation that LLMs consume, replacing screenshots and DOM dumps
- Wasm Skills are a thriving marketplace, with thousands of community-contributed site navigators
- The commercial fleet (Plasmate Cloud) is a $100M+ ARR business selling orchestration to enterprises
3.3 What Plasmate Is NOT
- Not a consumer browser - no UI, no tabs, no bookmarks. Agents don't need chrome (lowercase c).
- Not a scraping tool - scraping is a use case, not the product. Plasmate is infrastructure.
- Not a Chrome fork - zero Chromium code. Clean-room Rust implementation.
- Not an agent framework - Plasmate doesn't decide what to do. It executes what agents tell it to do via AWP.
- Not a proxy service - though it has proxy capabilities built in. The proxy is a feature, not the product.
4. Architecture Overview
4.1 System Diagram
┌─────────────────────────────────────────────────────────┐
│ AGENT (LLM) │
│ LangChain / CrewAI / AutoGen / Custom │
└────────────────────┬────────────────────────────────────┘
│ AWP (Agent Web Protocol)
│ WebSocket + MessagePack
▼
┌─────────────────────────────────────────────────────────┐
│ PLASMATE ENGINE (Rust) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ AWP │ │ Session │ │ Skill │ │
│ │ Server │ │ Manager │ │ Runtime │ │
│ │ │ │ │ │ (Wasm) │ │
│ └────┬─────┘ └─────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ ┌────▼──────────────▼──────────────▼─────┐ │
│ │ Core Pipeline │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌────────┐ │ │
│ │ │ Network │ │ HTML │ │ SOM │ │ │
│ │ │ Layer │→│ Parser │→│ Compiler│ │ │
│ │ │(rustls) │ │(html5ever│ │ │ │ │
│ │ └─────────┘ └─────────┘ └────────┘ │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ │ │
│ │ │ JS │ │ State │ │ │
│ │ │ Runtime │ │ Store │ │ │
│ │ │(rusty_v8│ │(RocksDB)│ │ │
│ │ └─────────┘ └─────────┘ │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
4.2 Data Flow
- Agent sends AWP intent →
{action: "fill_form", target: "#email", value: "[email protected]"} - AWP Server validates intent, resolves target via SOM
- Session Manager loads page state, cookies, auth context
- Skill Runtime checks if a Wasm skill is registered for the current domain
- If skill exists: Skill handles the interaction directly (fast path)
- If no skill: Core pipeline processes: a. Network Layer fetches the page (with stealth TLS, proxy rotation) b. HTML Parser (html5ever) produces a raw DOM c. SOM Compiler transforms DOM into Semantic Object Model d. JS Runtime (V8) executes scripts that modify the SOM (SPAs, dynamic content)
- SOM returned to agent via AWP with action results
4.3 Language & Dependencies (Rust Crates)
| Component | Crate | Purpose |
|---|---|---|
| HTTP client | reqwest + hyper |
HTTP/1.1 and HTTP/2 with full header control |
| TLS | rustls (custom fork) |
TLS 1.3 with JA3/JA4 fingerprint control |
| HTML parsing | html5ever + markup5ever |
Spec-compliant HTML5 parser (same as Servo) |
| CSS parsing | cssparser (minimal) |
Only parse selectors for SOM mapping, no layout |
| JavaScript | rusty_v8 |
V8 engine with Rust bindings, sandboxed per session |
| WebAssembly | wasmtime |
Wasm runtime for skills |
| Async runtime | tokio |
Async I/O, task scheduling |
| Serialization | serde + rmp-serde |
MessagePack serialization for AWP |
| WebSocket | tokio-tungstenite |
AWP transport |
| State storage | rocksdb (via rust-rocksdb) |
Persistent session/cookie store |
| DNS | trust-dns-resolver |
Custom DNS resolution (DoH support) |
| Proxy | tokio-socks + custom |
SOCKS5/HTTP proxy with rotation |
| Compression | flate2 + brotli |
HTTP content decoding |
| URL parsing | url |
Spec-compliant URL handling |
| Regex | regex |
Pattern matching for SOM rules |
| Logging | tracing + tracing-subscriber |
Structured logging and telemetry |
| CLI | clap |
Command-line interface |
| Config | toml + serde |
Configuration files |
5. Semantic Object Model (SOM)
5.1 What Is SOM
The Semantic Object Model is a new page representation format purpose-built for LLM consumption. It replaces the DOM (Document Object Model) by stripping away everything an agent doesn't need (visual styling, layout information, rendering hints) and adding everything an agent does need (element purpose, affordances, interaction methods, semantic relationships).
5.2 Design Principles
- Token-minimal - Represent a page in the fewest possible tokens
- Deterministic - Same page always produces the same SOM (unlike screenshots which vary by viewport)
- Actionable - Every element includes its available interactions
- Hierarchical - Preserve meaningful page structure (navigation, main content, sidebar, footer)
- Semantic - Elements described by purpose, not by HTML tag name
- Stable references - Element IDs persist across page mutations (SPA navigation)
5.3 SOM Format
{
"url": "https://shop.example.com/products/widget-pro",
"title": "Widget Pro - Example Shop",
"timestamp": "2026-03-15T23:00:00Z",
"som_version": "1.0",
"regions": [
{
"role": "navigation",
"id": "nav-main",
"items": [
{"type": "link", "id": "n1", "text": "Home", "href": "/", "actions": ["click"]},
{"type": "link", "id": "n2", "text": "Products", "href": "/products", "actions": ["click"]},
{"type": "link", "id": "n3", "text": "Cart (3)", "href": "/cart", "actions": ["click"], "badge": "3"},
{"type": "search", "id": "n4", "placeholder": "Search products...", "actions": ["type", "submit"]}
]
},
{
"role": "main",
"id": "content",
"sections": [
{
"type": "product",
"id": "p1",
"name": "Widget Pro",
"price": {"amount": 49.99, "currency": "USD"},
"rating": {"score": 4.7, "count": 342},
"description": "Professional-grade widget with titanium core and lifetime warranty.",
"images": ["https://shop.example.com/img/widget-pro-1.jpg"],
"variants": [
{"id": "v1", "label": "Color", "options": ["Black", "Silver", "Blue"], "selected": "Black"},
{"id": "v2", "label": "Size", "options": ["S", "M", "L", "XL"], "selected": null}
],
"actions": [
{"id": "a1", "type": "button", "text": "Add to Cart", "primary": true, "requires": ["v2"]},
{"id": "a2", "type": "button", "text": "Buy Now", "primary": false},
{"id": "a3", "type": "button", "text": "Add to Wishlist", "primary": false}
]
}
]
},
{
"role": "complementary",
"id": "reviews",
"type": "review_list",
"count": 342,
"visible": 5,
"items": [
{"author": "Jane D.", "rating": 5, "text": "Best widget I've ever used. The titanium core makes all the difference.", "date": "2026-03-10"},
{"author": "Mike R.", "rating": 4, "text": "Great quality, shipping was slow though.", "date": "2026-03-08"}
],
"actions": [
{"id": "r1", "type": "pagination", "text": "Load more reviews", "actions": ["click"]}
]
}
],
"forms": [],
"dialogs": [],
"alerts": []
}
5.4 Token Comparison
For the above product page:
| Representation | Approximate Tokens | Ratio |
|---|---|---|
| Raw HTML | ~18,000 | 1x (baseline) |
| Cleaned HTML | ~6,500 | 2.8x better |
| Playwright accessibility tree | ~3,200 | 5.6x better |
| SOM | ~900 | 20x better |
5.5 SOM Compilation Pipeline
Raw HTML (html5ever)
│
▼
DOM Tree
│
├─ Strip: <script>, <style>, <svg>, <noscript>, <meta>, comments
├─ Strip: CSS classes, inline styles, data-* attributes (unless semantic)
├─ Strip: ARIA attributes (consume for semantics, don't output)
│
▼
Cleaned DOM
│
├─ Identify regions: <nav>, <main>, <aside>, <footer>, <header>
├─ Identify semantic blocks: <form>, <table>, <article>, <section>
├─ Identify interactive elements: <a>, <button>, <input>, <select>, <textarea>
├─ Identify content elements: <h1-6>, <p>, <img>, <video>
├─ Consume ARIA roles/labels for element purpose
│
▼
Semantic Tree
│
├─ Merge adjacent text nodes
├─ Collapse wrapper divs (divs with single child, no semantic meaning)
├─ Extract structured data (JSON-LD, microdata, Open Graph)
├─ Resolve relative URLs
├─ Assign stable IDs to interactive elements
├─ Determine affordances (what actions each element supports)
│
▼
SOM Output (JSON / MessagePack)
5.6 SOM Element Types
| Type | HTML Sources | Properties | Actions |
|---|---|---|---|
link |
<a>, [role=link] |
text, href, target | click |
button |
<button>, <input[type=submit]>, [role=button] |
text, disabled, inert, primary | click |
text_input |
<input[type=text|email|password|...]> |
placeholder, value, required, inert, pattern | type, clear, submit |
textarea |
<textarea> |
placeholder, value, maxlength | type, clear |
select |
<select>, [role=listbox] |
options[], selected, multiple | select |
checkbox |
<input[type=checkbox]>, [role=checkbox] |
label, checked | toggle |
radio |
<input[type=radio]>, [role=radio] |
label, checked, group | select |
search |
<input[type=search]>, [role=search] |
placeholder, value | type, submit |
image |
<img> |
alt, src, dimensions | - |
video |
<video>, <iframe[youtube|vimeo]> |
title, duration, src | play, pause, seek |
table |
<table> |
headers[], rows[][] | sort, paginate |
list |
<ul>, <ol> |
items[] | - |
heading |
<h1>-<h6> |
text, level | - |
paragraph |
<p> |
text | - |
form |
<form> |
fields[], action, method | submit |
dialog |
<dialog>, [role=dialog] |
title, content | accept, dismiss |
tab |
[role=tab] |
label, selected, panel_id | click |
menu |
<nav>, [role=menu] |
items[] | - |
alert |
[role=alert], .toast, .notification |
text, type(info|warning|error) | dismiss |
pagination |
.pagination, [role=navigation][aria-label*=page] |
current, total, links[] | click |
product |
[itemtype=Product], heuristic |
name, price, rating, variants | add_to_cart |
article |
<article> |
title, author, date, content | - |
5.7 SOM Heuristics
For pages without semantic HTML or ARIA, SOM uses heuristics:
- Form detection - Cluster of input elements within a shared ancestor = form
- Navigation detection - Lists of links in header/top of page = navigation
- Product detection - Co-occurrence of price pattern + image + heading = product
- Article detection - Large text block with heading + date/author = article
- Modal detection - Fixed/absolute positioned overlay with close button = dialog
- Pagination detection - Sequential number links or "next/prev" buttons = pagination
- Search detection - Text input with magnifying glass icon or "search" in placeholder = search
5.8 SOM Mutation Tracking
When JS modifies the page (SPA navigation, dynamic content loading), SOM tracks changes:
{
"type": "som_mutation",
"timestamp": "2026-03-15T23:00:01Z",
"changes": [
{"op": "add", "path": "/regions/1/sections/0/reviews/5", "value": {"author": "New Review..."}},
{"op": "remove", "path": "/dialogs/0"},
{"op": "replace", "path": "/regions/0/items/2/badge", "old": "3", "new": "4"}
]
}
This uses JSON Patch (RFC 6902) semantics, so agents can track page evolution without re-reading the entire SOM.
6. Agent Web Protocol (AWP)
Full specification in separate document: AWP-SPEC.md
6.1 Summary
AWP replaces CDP for agent-browser communication. Key differences:
| Feature | CDP | AWP |
|---|---|---|
| Addressing | CSS selectors + coordinates | SOM element IDs + semantic targets |
| Commands | Low-level (click, type, evaluate) | Intent-based (add_to_cart, login, search) |
| Responses | Raw DOM + screenshots | SOM + structured results |
| Transport | WebSocket (JSON) | WebSocket (MessagePack) - 30-50% smaller |
| Sessions | Ephemeral (lost on disconnect) | Persistent (survive reconnection) |
| Concurrency | Single-page focus | Multi-page, multi-tab native |
| Observability | Debug logging | Structured telemetry with cost tracking |
| Extensibility | Limited to Chrome internals | Wasm skills for any site-specific logic |
6.2 Intent Hierarchy
AWP commands form a hierarchy from low-level to high-level:
Level 0 - Primitive Actions (always available)
navigate, click, type, select, scroll, wait, screenshot, evaluate_js
Level 1 - Semantic Actions (SOM-aware)
fill_form, submit_form, select_option, toggle, dismiss_dialog,
paginate, sort_table, expand_section, close_tab
Level 2 - Intent Actions (requires SOM understanding)
search, login, logout, add_to_cart, checkout,
read_content, extract_data, follow_link_by_text
Level 3 - Skill Actions (loaded via Wasm)
stripe_checkout, salesforce_navigate, google_sheets_edit,
linkedin_send_message, github_create_issue
Agents can operate at any level. Most will use Level 1-2 for general browsing and Level 3 for domain-specific tasks.
7. Stealth Networking Layer
7.1 Architecture
The networking layer is the lowest level of the engine, handling all HTTP communication with full fingerprint control.
┌─────────────────────────────────────────────┐
│ Stealth Network Layer │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ TLS │ │ HTTP/2 │ │ Proxy │ │
│ │ Spoofer │ │ Tuner │ │ Rotator │ │
│ │ │ │ │ │ │ │
│ │ JA3/JA4 │ │ SETTINGS │ │ SOCKS5 │ │
│ │ ALPN │ │ WINDOW │ │ HTTP │ │
│ │ Ciphers │ │ PRIORITY │ │ Rotating │ │
│ │ Extensions│ │ HEADERS │ │ Sticky │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Header │ │ Cookie │ │ DNS │ │
│ │ Manager │ │ Jar │ │ Resolver │ │
│ │ │ │ │ │ │ │
│ │ Ordering │ │ Per-site │ │ DoH │ │
│ │ Casing │ │ Profiles │ │ DoT │ │
│ │ Realistic│ │ Rotation │ │ Custom │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────┘
7.2 TLS Fingerprint Control
Bot detection services fingerprint TLS connections via:
- JA3 - hash of TLS Client Hello parameters (cipher suites, extensions, curves)
- JA4 - next-gen fingerprint including ALPN, signature algorithms, SNI
- HTTP/2 fingerprint - SETTINGS frame values, WINDOW_UPDATE, PRIORITY frames
Plasmate's custom rustls fork allows setting every parameter:
let tls_config = StealthTlsConfig::new()
.ja3_profile(BrowserProfile::Chrome128) // Mimic Chrome 128
.cipher_suites(CHROME_128_CIPHERS)
.extensions(CHROME_128_EXTENSIONS)
.alpn(&["h2", "http/1.1"])
.curves(&[X25519, SECP256R1, SECP384R1])
.signature_algorithms(CHROME_128_SIGS);
Built-in profiles:
- Chrome 120-130 (Windows, Mac, Linux variants)
- Firefox 120-130
- Safari 17-18
- Edge 120-130
- Custom (full manual control)
7.3 HTTP/2 Fingerprint Control
Beyond TLS, HTTP/2 connections have their own fingerprint:
let http2_config = Http2Config::new()
.initial_window_size(6291456) // Chrome default
.max_concurrent_streams(1000) // Chrome default
.header_table_size(65536) // Chrome default
.max_header_list_size(262144) // Chrome default
.priority_frames(CHROME_PRIORITY) // Frame ordering
.pseudo_header_order(&[":method", ":authority", ":scheme", ":path"]);
7.4 Header Management
Headers must match the impersonated browser exactly:
- Ordering - Chrome sends headers in a specific order (
:methodbefore:path) - Casing - Some headers use different capitalization per browser
- Content -
Accept,Accept-Language,Accept-Encodingmust match browser version - User-Agent - Automatically matches the selected TLS profile
7.5 Proxy Architecture
Built-in proxy support with intelligent rotation:
let proxy_pool = ProxyPool::new()
.add_residential("provider_a", 1000) // 1000 residential IPs
.add_datacenter("provider_b", 500) // 500 datacenter IPs
.strategy(RotationStrategy::StickyPerDomain) // Same IP per domain
.failover(FailoverPolicy::NextProxy) // Auto-rotate on failure
.health_check_interval(Duration::from_secs(30));
Rotation strategies:
RoundRobin- cycle through proxies sequentiallyRandom- random proxy selectionStickyPerDomain- same proxy for same domain (avoids session issues)StickyPerSession- same proxy for entire agent sessionGeoTargeted- select proxy by geographic regionLeastLatency- pick fastest available proxy
8. Sandboxed JS Runtime
8.1 Why Execute JS at All
Modern web applications (SPAs) render content client-side. Without JS execution:
- React/Vue/Angular/Svelte apps show blank pages
- Dynamic content (lazy loading, infinite scroll) never loads
- Login flows that depend on JS fail
- CSRF tokens generated by JS are unavailable
A headless browser for agents MUST execute JavaScript. But it doesn't need to execute ALL JavaScript.
8.2 Selective Execution
Plasmate's JS runtime is selective. It executes:
- ✅ Scripts that modify DOM (content rendering)
- ✅ XHR/fetch calls (data loading)
- ✅ Event handlers triggered by agent actions
- ✅ Form validation scripts
- ✅ Authentication flows (OAuth, CSRF)
It skips:
- ❌ Analytics scripts (Google Analytics, Mixpanel, Segment)
- ❌ Ad scripts (Google Ads, Facebook Pixel)
- ❌ Tracking pixels
- ❌ Animation/transition scripts
- ❌ Service worker registration
- ❌ WebGL/Canvas rendering
- ❌ Web Workers for non-essential tasks
8.3 Memory Isolation
Each agent session gets an isolated V8 context:
let isolate = v8::Isolate::new(v8::CreateParams::default()
.heap_size_limit(64 * 1024 * 1024) // 64MB max per session
.external_memory_limit(32 * 1024 * 1024));
let context = v8::Context::new(&mut isolate);
// Session-scoped: destroyed when session ends
// No cross-session memory leaks
Memory budget per session: 64MB (vs Chrome's 150-300MB per tab)
At 64MB per session:
- 1 server with 32GB RAM = 500 concurrent agent sessions
- 1 server with 128GB RAM = 2,000 concurrent agent sessions
8.4 Script Classification
Plasmate classifies scripts before execution:
| Category | Strategy | Examples |
|---|---|---|
| Essential | Execute immediately | Framework runtime (React, Vue) |
| Data-loading | Execute, capture responses | API calls, GraphQL queries |
| Interactive | Execute on agent action | Event handlers, form validation |
| Analytics | Block entirely | GA, Mixpanel, Hotjar |
| Advertising | Block entirely | Google Ads, FB Pixel |
| Unknown | Execute with timeout (2s) | Unclassified scripts |
Classification uses:
- URL pattern matching - known analytics/ad domains
- AST analysis - detect tracking patterns (pixel insertion, beacon sending)
- Community blocklists - maintained like ad-blocker lists
- Wasm skills - site-specific script classification
9. Wasm Skill System
9.1 Concept
Skills are WebAssembly modules that teach Plasmate how to interact with specific websites or web applications. They're the equivalent of browser extensions but for agents.
9.2 Skill Interface
Every skill implements a standard interface:
// Skill trait (Rust SDK)
pub trait Skill {
/// Domains this skill handles
fn domains(&self) -> Vec<String>;
/// Actions this skill provides
fn actions(&self) -> Vec<SkillAction>;
/// Handle an AWP intent
fn handle(&self, ctx: &SkillContext, intent: &Intent) -> Result<ActionResult>;
/// Transform SOM for this domain (optional)
fn transform_som(&self, som: &mut SomDocument) -> Result<()>;
/// Classify scripts for this domain (optional)
fn classify_script(&self, url: &str, content: &str) -> ScriptClass;
}
9.3 Example: Stripe Checkout Skill
use plasmate_skill_sdk::prelude::*;
struct StripeCheckoutSkill;
impl Skill for StripeCheckoutSkill {
fn domains(&self) -> Vec<String> {
vec!["checkout.stripe.com".into(), "*.stripe.com".into()]
}
fn actions(&self) -> Vec<SkillAction> {
vec![
SkillAction::new("stripe_pay")
.description("Complete a Stripe checkout")
.params(&[
Param::new("card_number", ParamType::String).required(),
Param::new("expiry", ParamType::String).required(),
Param::new("cvc", ParamType::String).required(),
Param::new("email", ParamType::String).optional(),
Param::new("name", ParamType::String).optional(),
]),
SkillAction::new("stripe_apply_coupon")
.description("Apply a coupon code")
.params(&[Param::new("code", ParamType::String).required()]),
]
}
fn handle(&self, ctx: &SkillContext, intent: &Intent) -> Result<ActionResult> {
match intent.action.as_str() {
"stripe_pay" => {
// Navigate through Stripe's multi-step checkout
let email = intent.param("email")?;
let card = intent.param("card_number")?;
let expiry = intent.param("expiry")?;
let cvc = intent.param("cvc")?;
// Fill email field
ctx.fill_by_label("Email", email)?;
ctx.click_by_text("Pay")?;
ctx.wait_for_navigation()?;
// Fill card details
ctx.fill_by_label("Card number", card)?;
ctx.fill_by_label("Expiration", expiry)?;
ctx.fill_by_label("CVC", cvc)?;
// Submit payment
ctx.click_by_text("Pay")?;
ctx.wait_for_url_contains("/success")?;
Ok(ActionResult::success()
.with_data("status", "payment_completed"))
}
"stripe_apply_coupon" => {
let code = intent.param("code")?;
ctx.click_by_text("Add promotion code")?;
ctx.fill_by_label("Promotion code", code)?;
ctx.click_by_text("Apply")?;
ctx.wait_for_element("[data-testid=discount]")?;
Ok(ActionResult::success())
}
_ => Err(SkillError::UnknownAction)
}
}
}
9.4 Skill Distribution
Skills are distributed as .wasm files via a package registry:
# Install a skill
plasmate skill install stripe-checkout
plasmate skill install salesforce-navigator
plasmate skill install google-sheets
# List installed skills
plasmate skill list
# Publish a skill
plasmate skill publish ./my-skill.wasm --name "shopify-admin" --version "1.0.0"
# Skills auto-activate when visiting matching domains
Registry: skills.plasmate.app (npm-like registry for Wasm skills)
9.5 Skill SDK
SDKs in multiple languages (all compile to Wasm):
| Language | SDK | Status |
|---|---|---|
| Rust | plasmate-skill-sdk (crate) |
Primary, full featured |
| TypeScript | @plasmate/skill-sdk (npm) |
Via AssemblyScript or wasm-bindgen |
| Python | plasmate-skill-sdk (pip) |
Via componentize-py |
| Go | plasmate-skill-sdk (module) |
Via TinyGo |
| C/C++ | plasmate_skill.h (header) |
Via Emscripten |
9.6 Community Marketplace
The skill marketplace is a critical network effect:
- Free skills - community-contributed, open source, anyone can publish
- Verified skills - reviewed by Plasmate team, guaranteed quality
- Premium skills - monetized by authors (revenue share: 70% author / 30% Plasmate)
- Enterprise skills - private, company-specific, deployed to their fleet only
As the skill library grows, Plasmate becomes more capable without core engine changes. This is the Chrome Web Store model applied to agent automation.
10. Session & State Management
10.1 Persistent Sessions
Unlike CDP (where sessions die when the WebSocket disconnects), Plasmate sessions are durable:
let session = plasmate.session_create(SessionConfig {
id: "user-checkout-flow",
ttl: Duration::from_hours(24),
persist: PersistMode::Disk, // Survive engine restart
cookies: CookiePolicy::Inherit, // Carry forward from previous sessions
auth: Some(AuthState::load("user-123")?),
});
Session state includes:
- Cookies (per-domain)
- Authentication tokens
- Local storage / session storage
- Navigation history
- SOM snapshots (for resumption)
- Agent memory context
- Proxy assignment (sticky)
10.2 Session Pools
For fleet operations, sessions can be pooled:
let pool = plasmate.session_pool(PoolConfig {
name: "authenticated-gmail",
min_sessions: 10,
max_sessions: 100,
warmup: WarmupStrategy::PreAuthenticate {
url: "https://mail.google.com",
credentials: vault.get("gmail-creds"),
},
idle_timeout: Duration::from_minutes(30),
recycle_after: 50, // New session after 50 uses
});
// Agents check out pre-authenticated sessions
let session = pool.acquire().await?;
// ... use session ...
pool.release(session).await;
10.3 State Encryption
Sensitive session data (credentials, tokens, cookies) is encrypted at rest:
- AES-256-GCM encryption
- Per-session encryption keys
- Key derivation from master secret via HKDF
- Optional HSM/TPM integration for enterprise
11. Fleet Orchestration (Commercial)
11.1 The Control Plane
The commercial product. While the engine is free, running 10,000 agents requires orchestration:
┌─────────────────────────────────────────────────────┐
│ Plasmate Cloud Control Plane │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Scheduler │ │ Autoscaler│ │ Cost Tracker │ │
│ │ │ │ │ │ │ │
│ │ Priority │ │ Demand- │ │ Per-session │ │
│ │ queues │ │ based │ │ Per-domain │ │
│ │ Rate │ │ scaling │ │ Per-agent │ │
│ │ limiting │ │ │ │ LLM + compute │ │
│ └──────────┘ └──────────┘ └──────────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Session │ │ Proxy │ │ Observability │ │
│ │ Store │ │ Network │ │ │ │
│ │ │ │ │ │ Traces │ │
│ │ Redis + │ │ Residential│ │ Metrics │ │
│ │ S3 │ │ Datacenter│ │ Logs │ │
│ │ Encrypted │ │ Rotating │ │ Dashboards │ │
│ └──────────┘ └──────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────┘
11.2 Fleet API
from plasmate_cloud import Fleet
fleet = Fleet(api_key="plasmate_live_xxx")
# Run 100 agents concurrently
results = await fleet.run_batch(
agents=100,
task="extract_pricing",
urls=["https://competitor1.com", "https://competitor2.com", ...],
skill="pricing-extractor",
proxy_type="residential",
region="us-east",
budget=Budget(max_cost_usd=50.00),
)
# Real-time monitoring
async for event in fleet.stream("batch-123"):
print(f"Agent {event.agent_id}: {event.status} - {event.url}")
11.3 Pricing (Fleet)
| Tier | Price | Sessions | Proxy | Support |
|---|---|---|---|---|
| Starter | $49/mo | 1,000/mo | Datacenter | Community |
| Growth | $199/mo | 10,000/mo | Datacenter + Residential | |
| Scale | $799/mo | 100,000/mo | Premium residential | Priority |
| Enterprise | Custom | Unlimited | Dedicated | 24/7 + SLA |
Usage-based pricing on top:
- Datacenter proxy: $0.001/request
- Residential proxy: $0.01/request
- Session storage: $0.10/GB-month
- Persistent sessions: $0.001/hour