Verdict: ChatGPT, Claude, and other MCP Apps hosts run third-party app UI inside a double iframe because no single iframe configuration can simultaneously (1) isolate the app from the host, (2) let the app keep its own origin-scoped storage, and (3) apply a host-controlled Content Security Policy. The outer iframe supplies a clean, per-app origin. The inner iframe loads the actual app markup via srcdoc. Host-to-app communication happens through JSON-RPC over postMessage — never through shared DOM or cookies.
Last verified: 2026-06-18 · TL;DR:
- Outer iframe = controlled subdomain that gives the app an origin without touching the host.
- Inner iframe = srcdoc-loaded HTML from the MCP server, sandboxed so it cannot reach the host.
- CSP is rewritten by the host using domains the app declares in _meta.ui.csp.
- If you build MCP Apps, list every API, script, image, and frame domain your view touches, or it will silently fail in production.
What problem are MCP Apps actually solving?
The Model Context Protocol started as a way for an AI assistant to call tools and read resources. The MCP Apps extension adds a new surface: interactive HTML interfaces rendered directly inside the chat window. Instead of returning a wall of JSON or markdown, a server can return a live form, chart, dashboard, or checkout flow that the user interacts with without leaving the conversation.
That is powerful because it turns the chat thread into a workspace. It is also dangerous because the host — ChatGPT, Claude, VS Code Copilot, Goose, etc. — is a privileged application. It holds authentication tokens, user conversation history, billing state, and often workspace SSO sessions. Letting a random third-party server paint pixels inside that window is exactly the scenario browsers and security architects have been warning about for years.
The official MCP Apps spec therefore mandates that the HTML runs inside a sandboxed iframe and that all UI-to-host traffic goes through JSON-RPC over postMessage (modelcontextprotocol.io). That much is documented. What is less obvious is why the implementation in ChatGPT and Claude uses two nested iframes instead of one.
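To make "JSON-RPC over postMessage" concrete, here is a minimal sketch of the app side: a client that numbers requests and matches responses by id. The class name RpcClient, the injected send callback, and the method name ui/example are assumptions for illustration, not names taken from the spec.

```typescript
// Minimal JSON-RPC 2.0 request/response correlation over a message channel.
// The transport is injected so the sketch does not depend on browser globals.
type JsonRpcRequest = { jsonrpc: "2.0"; id: number; method: string; params?: unknown };
type JsonRpcResponse = {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
};
type Pending = { resolve: (value: unknown) => void; reject: (reason: Error) => void };

class RpcClient {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private send: (message: JsonRpcRequest) => void;

  constructor(send: (message: JsonRpcRequest) => void) {
    this.send = send;
  }

  // Sends a request and resolves when a response with the same id arrives.
  request(method: string, params?: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send({ jsonrpc: "2.0", id, method, params });
    });
  }

  // Feed every incoming message payload here; unrelated payloads are ignored.
  handleMessage(data: unknown): void {
    if (typeof data !== "object" || data === null) return;
    const msg = data as Partial<JsonRpcResponse>;
    if (msg.jsonrpc !== "2.0" || typeof msg.id !== "number") return;
    const entry = this.pending.get(msg.id);
    if (!entry) return;
    this.pending.delete(msg.id);
    if (msg.error) entry.reject(new Error(msg.error.message));
    else entry.resolve(msg.result);
  }
}
```

In a browser, the send callback would typically wrap window.parent.postMessage, and handleMessage would be called with event.data from a message event listener. Frameworks and SDKs usually hide this plumbing.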
Why not just one iframe with srcdoc?
The simplest mental model is: the host creates an iframe, drops the app’s HTML into the srcdoc attribute, and the app renders. This fails for two independent reasons.
Reason 1: the host’s CSP blocks third-party scripts
A srcdoc iframe inherits the Content Security Policy of its parent document (MDN iframe element reference). ChatGPT serves a strict Content-Security-Policy header whose script-src directive requires a nonce on inline and external scripts. If an app’s HTML were injected via srcdoc directly into the host page, the browser would refuse to run any app JavaScript that does not carry ChatGPT’s per-request nonce.
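The nonce rule can be illustrated with a deliberately simplified model. The sketch below looks only at nonce sources in script-src and ignores hashes, host sources, 'strict-dynamic', and fallback to default-src, so it is not a CSP implementation. The function names and the example policy are assumptions.

```typescript
// Collects the nonce values listed in the script-src directive of a policy string.
function nonceSources(policy: string): string[] {
  for (const raw of policy.split(";")) {
    const [name, ...sources] = raw.trim().split(/\s+/);
    if (name.toLowerCase() !== "script-src") continue;
    return sources
      .filter((s) => s.startsWith("'nonce-") && s.endsWith("'"))
      .map((s) => s.slice("'nonce-".length, -1));
  }
  return [];
}

// Under this nonce-only model, a script runs only if its nonce attribute
// matches one of the nonces in the policy.
function scriptWouldRun(policy: string, scriptNonce: string | null): boolean {
  return scriptNonce !== null && nonceSources(policy).indexOf(scriptNonce) !== -1;
}

// Hypothetical host policy; a real host's header will differ.
const hostPolicy = "default-src 'self'; script-src 'nonce-r4nd0m'";
const appScriptRuns = scriptWouldRun(hostPolicy, null);      // third-party script, no nonce
const hostScriptRuns = scriptWouldRun(hostPolicy, "r4nd0m"); // host script with the nonce
```

Because the app's markup comes from the MCP server, it has no way to know the host's per-request nonce, which is why appScriptRuns is false in this model.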
Relaxing that CSP is not an option. The nonce requirement exists precisely to prevent arbitrary script execution inside the ChatGPT origin.
Reason 2: same-origin access lets apps steal host state
Even if ChatGPT stripped the nonce requirement, a srcdoc iframe shares the host’s origin. That means scripts inside the app could read localStorage, sessionStorage, IndexedDB, and any cookies scoped to chatgpt.com that are not marked HttpOnly. A malicious app could exfiltrate conversation metadata, session identifiers, or cached user preferences by simply reading them and posting them to its own backend.
That is the textbook reason browsers have a same-origin policy: code running under one origin should not be able to read another origin’s state. By using srcdoc at the host origin, we would voluntarily defeat it.
Why not sandbox a single iframe?
The next attempt is to add sandbox="allow-scripts" to the iframe. This moves the iframe into a unique, opaque origin — often described as a null origin — so it can no longer read the parent origin’s storage. The problem now flips: the app has no usable origin of its own.
Without an origin, the app cannot use localStorage, IndexedDB, or cookies. Some apps genuinely do not need those, but many do: they need session continuity, client-side caching, or authenticated requests to their own backend. The obvious fix is to add allow-same-origin to the sandbox, but that restores the parent origin and recreates the exfiltration problem.
So a single sandboxed iframe can either be isolated but origin-less, or origin-carrying but not isolated from the host. Both are wrong for a general app platform.
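The trade-off can be written down as a small model of a single srcdoc iframe. This is a sketch that considers only the allow-scripts and allow-same-origin tokens; the type and function names are made up for illustration.

```typescript
// Simplified model of how the sandbox attribute affects one srcdoc iframe.
type SrcdocFrame = {
  parentOrigin: string;   // origin of the embedding document
  sandbox: string | null; // value of the sandbox attribute, or null if absent
};

type FrameProperties = {
  origin: string;               // "null" stands for an opaque origin
  scriptsRun: boolean;
  sharesParentStorage: boolean; // can read the parent origin's storage
  hasUsableStorage: boolean;    // origin-scoped storage APIs are available
};

function describeSrcdocFrame(frame: SrcdocFrame): FrameProperties {
  const tokens = frame.sandbox === null ? null : frame.sandbox.split(/\s+/).filter(Boolean);
  // No sandbox attribute, or allow-same-origin present: the frame keeps the parent origin.
  const sameOrigin = tokens === null || tokens.indexOf("allow-same-origin") !== -1;
  return {
    origin: sameOrigin ? frame.parentOrigin : "null",
    scriptsRun: tokens === null || tokens.indexOf("allow-scripts") !== -1,
    // In a single srcdoc frame these two are the same flag, which is the dilemma.
    sharesParentStorage: sameOrigin,
    hasUsableStorage: sameOrigin,
  };
}

const isolated = describeSrcdocFrame({ parentOrigin: "https://chatgpt.com", sandbox: "allow-scripts" });
const exposed = describeSrcdocFrame({
  parentOrigin: "https://chatgpt.com",
  sandbox: "allow-scripts allow-same-origin",
});
```

In this model, isolated has no usable storage and exposed shares the host's storage. No combination of the two tokens yields a frame that has storage without sharing the parent's.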
The double-iframe solution
The architecture used by ChatGPT and Claude solves this by nesting two iframes:
- Outer iframe: loaded from a host-controlled, per-app subdomain (for example, an openaiusercontent.com or claudemcpcontent.com subdomain, or a stable domain declared by the server in _meta.ui.domain). It has a real, distinct origin. Because the origin is not chatgpt.com or anthropic.com, it cannot read host storage. Because it is a real origin, it can use localStorage, IndexedDB, and cookies scoped to its own subdomain.
- Inner iframe: created inside the outer iframe, loaded via srcdoc with the app HTML returned by the MCP server. It inherits the outer iframe’s origin, which is the controlled subdomain, not the host origin. The inner iframe is sandboxed so it cannot reach the outer frame or the host, but it still has a usable origin for APIs that require one.
- Host-to-app bridge: the app communicates with the host through JSON-RPC messages sent over window.postMessage. The host inspects each message, validates the origin, and routes it to the appropriate MCP tool or UI notification. No DOM, cookie, or storage is shared.
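The host-side check in the bridge can be sketched as a pure function over the fields of a message event. This is an illustration under assumptions, not how ChatGPT or Claude implement it: the function name, the example subdomain, and the allow-list of method prefixes are all hypothetical.

```typescript
// Only the two MessageEvent fields the check needs, so the sketch runs outside a browser.
type AppMessageEvent = { origin: string; data: unknown };
type AcceptedCall = { id: number | string | null; method: string; params: unknown };

// Hypothetical allow-list; a real host decides which methods it routes.
const ALLOWED_METHOD_PREFIXES = ["ui/", "tools/"];

// Returns the parsed call if the message comes from the expected app origin
// and looks like a JSON-RPC 2.0 request or notification; otherwise null.
function acceptAppMessage(event: AppMessageEvent, expectedOrigin: string): AcceptedCall | null {
  if (event.origin !== expectedOrigin) return null;
  if (typeof event.data !== "object" || event.data === null) return null;
  const msg = event.data as Record<string, unknown>;
  if (msg.jsonrpc !== "2.0" || typeof msg.method !== "string") return null;
  const method = msg.method;
  if (!ALLOWED_METHOD_PREFIXES.some((prefix) => method.startsWith(prefix))) return null;
  // A missing id marks a notification in JSON-RPC 2.0.
  const id = typeof msg.id === "number" || typeof msg.id === "string" ? msg.id : null;
  return { id, method, params: msg.params };
}

const appOrigin = "https://app-123.sandbox.example"; // hypothetical per-app subdomain
const accepted = acceptAppMessage(
  { origin: appOrigin, data: { jsonrpc: "2.0", id: 7, method: "tools/call", params: { name: "lookup" } } },
  appOrigin,
);
const rejected = acceptAppMessage(
  { origin: "https://evil.example", data: { jsonrpc: "2.0", id: 8, method: "tools/call" } },
  appOrigin,
);
```

A production host would commonly also compare event.source against the iframe's contentWindow, since an origin string alone does not identify which frame sent the message.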
This is the same trick Facebook used for its app marketplace years ago: give every third-party app a quarantined origin, then load its UI inside that quarantine. The host only has to serve one tiny loader script for every app; the heavy content still comes from the MCP server.
What builders must declare: the _meta.ui contract
The host cannot guess which external origins an app needs. The MCP Apps spec therefore lets a server declare dependencies inside _meta.ui on its tool or resource metadata. The host uses these values to rewrite the CSP that is applied to the inner iframe.
The key fields are:
- _meta.ui.resourceUri: the ui:// URI of the HTML view to render.
- _meta.ui.csp: an object listing external domains needed for connect-src, script-src, img-src, frame-src, etc.
- _meta.ui.permissions: extra sandbox capabilities such as camera, microphone, geolocation, or clipboard.
- _meta.ui.domain: a stable origin hint some hosts use to keep the app on a consistent subdomain.
For example, if your view fetches weather data from https://api.weather.example, loads charts from https://cdn.chart.example, and displays map tiles from https://tiles.map.example, all three origins must appear in the declared CSP. If one is missing, the browser will block the request and the app will appear broken in production even though it works in local development.
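The weather example can be turned into a sketch of how a host might build a policy from declared domains. The shape of the declaration (keys named after CSP directives) and the default-src 'none' baseline are assumptions for illustration. The exact schema of _meta.ui.csp is defined by the MCP Apps spec and may use different key names.

```typescript
// Directives this sketch understands, in the order they are emitted.
const DIRECTIVES = ["connect-src", "script-src", "img-src", "frame-src"] as const;
type Directive = (typeof DIRECTIVES)[number];
type DeclaredCsp = Partial<Record<Directive, string[]>>; // hypothetical shape

// Accepts only bare https origins such as "https://api.weather.example".
function assertOrigin(value: string): string {
  const url = new URL(value); // throws on malformed input
  if (url.protocol !== "https:" || url.origin !== value) {
    throw new Error(`not a bare https origin: ${value}`);
  }
  return value;
}

// Starts from a deny-all baseline and adds only what the app declared.
function buildCspHeader(declared: DeclaredCsp): string {
  const parts = ["default-src 'none'"];
  for (const directive of DIRECTIVES) {
    const origins = declared[directive] ?? [];
    if (origins.length === 0) continue;
    parts.push(`${directive} ${origins.map((o) => assertOrigin(o)).join(" ")}`);
  }
  return parts.join("; ");
}

const weatherCsp = buildCspHeader({
  "connect-src": ["https://api.weather.example"],
  "script-src": ["https://cdn.chart.example"],
  "img-src": ["https://tiles.map.example"],
});
```

Dropping the img-src entry from the declaration would leave map tiles covered only by default-src 'none', which is the "works locally, broken in production" failure described above.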
OpenAI’s official docs recommend building against the MCP Apps standard fields first and only using ChatGPT-specific extensions such as window.openai when necessary (OpenAI Apps SDK). That keeps the app portable across hosts while still passing the right security metadata to each one.
The hidden production gotcha: CSP works differently in developer mode
Until recently, OpenAI’s developer mode for ChatGPT Apps removed CSP restrictions entirely. That made local testing easy, but it also meant developers only discovered missing CSP domains after submitting to the store. OpenAI has since tightened this, and third-party frameworks such as Skybridge ship a CSP inspector that compares the domains declared in _meta.ui.csp against the domains actually called by the view at runtime.
That workflow matters because CSP failures are close to silent: the network request is blocked, the UI does not render, and the only trace is a console error that is easy to miss during a store review. The fix is to treat CSP declarations as a first-class part of the app, not an afterthought.
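The core of such a declared-versus-observed comparison is small enough to sketch. This is not Skybridge's implementation; the function name and inputs are assumptions, and the observed URLs are assumed to be absolute (for example, collected from the browser's network log).

```typescript
// Returns the origins that the view actually contacted but the app did not declare.
function undeclaredOrigins(declaredOrigins: string[], observedUrls: string[]): string[] {
  const declared = new Set(declaredOrigins);
  const missing = new Set<string>();
  for (const url of observedUrls) {
    const origin = new URL(url).origin; // throws if the URL is not absolute
    if (!declared.has(origin)) missing.add(origin);
  }
  return Array.from(missing).sort();
}

const missingOrigins = undeclaredOrigins(
  ["https://api.weather.example"],
  [
    "https://api.weather.example/v1/now?city=Oslo",
    "https://cdn.chart.example/chart.js",
    "https://cdn.chart.example/theme.css",
  ],
);
```

Here missingOrigins contains only https://cdn.chart.example, the origin that would need to be added to the declaration before submission.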
What this means for you
If you are building an MCP App today, the double-iframe architecture is not something you opt into — it is what the host does on your behalf. Your job is to make sure the host has the information it needs:
- Inventory every external dependency. APIs, CDNs, fonts, images, analytics scripts, and any nested iframe sources all need to be declared.
- Test with CSP enabled. Do not rely on developer mode. Use a local emulator or the Skybridge inspector that applies the same CSP the store will apply.
- Keep data tools and render tools separate. Return data from one MCP tool, then let the model call a second tool that renders the view. This avoids unnecessary iframe remounts and keeps the UI snappy (OpenAI UI best practices).
- Assume no host storage is shared. Anything you put in localStorage is scoped to the host-provided subdomain, not to your own domain. Use server-side state or the host bridge if you need cross-session persistence.
- Design for portability. Use _meta.ui.resourceUri and the standard ui/* JSON-RPC bridge. Layer on window.openai extensions only after feature-detecting them, so the same app runs in Claude, ChatGPT, and other hosts.
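The feature-detection step in the last item can be sketched as follows. The helper names are hypothetical, and the sketch only checks that window.openai exists as an object. It deliberately makes no assumption about which members that object exposes.

```typescript
// The subset of the global object this sketch cares about.
type HostGlobals = { openai?: unknown };

// Returns the extension object if the host provides one, otherwise null.
function getOpenAiExtension(globals: HostGlobals): object | null {
  const ext = globals.openai;
  return typeof ext === "object" && ext !== null ? ext : null;
}

// The standard bridge is always the baseline; the extension is an optional layer.
function chooseBridge(globals: HostGlobals): "standard+openai" | "standard" {
  return getOpenAiExtension(globals) !== null ? "standard+openai" : "standard";
}
```

In a view, the argument would be the window object, for example chooseBridge(window as HostGlobals), and any extension-specific call would sit behind the "standard+openai" branch.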
FAQ
Q: Why do MCP Apps need iframes at all?
A: The host is a privileged web application. Running third-party HTML in the same browsing context would let that HTML read host cookies, localStorage, and conversation state. Iframes provide the isolation boundary the same-origin policy is built on.
Q: What is the difference between the outer and inner iframe?
A: The outer iframe loads a tiny host-controlled script from a per-app subdomain, giving the app a real but isolated origin. The inner iframe uses srcdoc to inject the actual app HTML, inherits the outer iframe’s isolated origin, and is sandboxed so it cannot reach the host.
Q: Can an MCP App read my ChatGPT or Claude cookies?
A: No. The app runs under a different origin — a controlled subdomain, not chatgpt.com or anthropic.com. The browser’s same-origin policy prevents it from reading host storage. All communication goes through postMessage, which the host validates.
Q: Why does my app work locally but fail in the ChatGPT store?
A: Developer mode historically disabled CSP restrictions. In production, the host applies a strict CSP built from the domains you declare in _meta.ui.csp. Any API, script, image, or frame domain you call but do not declare will be blocked, often silently.
Q: What are _meta.ui.resourceUri and _meta.ui.csp?
A: resourceUri tells the host which ui:// HTML resource to render. csp tells the host which external origins the app needs so the host can rewrite the Content Security Policy to allow those origins.
Q: Does the double-iframe pattern work the same in Claude, ChatGPT, and VS Code Copilot?
A: The official MCP Apps spec defines the contract; each host implements its own sandboxing. ChatGPT and Claude use the nested-iframe approach. The spec guarantees the same JSON-RPC bridge and metadata contract, so the app code stays portable even if the exact iframe nesting differs.
Q: Do I need to use a framework like Skybridge?
A: No — you can implement the postMessage JSON-RPC protocol directly. Frameworks such as Skybridge add type safety, local emulators, hot reload, and CSP inspection, but they are optional.