Verdict: For any professional using AI to automate their work, background execution is the only viable path. Unlike traditional "computer use" tools that hijack your cursor and freeze your screen, Hermes Agent (powered by cua-driver) operates silently in the background across macOS, Windows, and Linux. This allows you to deploy agents for research, data entry, or audits while you continue to use your machine for other tasks.
Last verified: 2026-06-25 · Core Tech: cua-driver (formerly Kua) · Best for: Parallel desktop automation · Supported OS: macOS, Windows, Linux
Why "Invisible" Agents Change the Game
Most AI "computer use" demos share a fatal flaw: Foreground Dependency. When you tell an agent to click a button, it takes over your mouse. If you touch your trackpad, the agent crashes or misclicks. You are effectively held hostage by your own automation.
Hermes Agent solves this through a "No-Foreground" contract. By using native process-level event injection instead of raw HID (Human Interface Device) emulation, the agent can:
- Act on occluded windows: It can click buttons in an app that is buried under other windows.
- Synthesize events without cursor warp: It posts clicks directly to the application's event queue. Your real OS cursor stays exactly where you left it.
- Bypass App Nap and Focus: It keeps application accessibility trees alive even when the app is "sleeping" or out of focus.
| Feature | Traditional Computer Use | Hermes Background Mode |
|---|---|---|
| Cursor Control | Hijacks real mouse | Independent virtual cursor |
| Multitasking | Impossible (Screen lock) | Full parallel use |
| Targeting | Visual/Pixel only | Accessibility Tree + Vision |
| Model Lock-in | Usually locked to Claude | Any Vision Model (local/cloud) |
| Security | Minimal/Foreground only | Built-in destructive blocks |
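To make the contrast in the table concrete, here is a toy model in Python. It is only a sketch and does not call any real OS API or cua-driver code: the `Desktop` and `App` classes and the `hid_click` and `post_click` methods are hypothetical names invented for illustration. The point it shows is that a foreground-style click changes shared state (cursor position and focus), while a background-style click only touches the target app's own event queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class App:
    """A toy application with its own event queue (hypothetical model)."""
    name: str
    queue: deque = field(default_factory=deque)


@dataclass
class Desktop:
    """A toy desktop: one shared cursor, one focused app, several apps."""
    apps: dict
    cursor: tuple = (0, 0)
    focused: str = ""

    def hid_click(self, app_name, x, y):
        # Foreground style: warp the shared cursor and take focus first,
        # then deliver the click to whichever app is now focused.
        self.cursor = (x, y)
        self.focused = app_name
        self.apps[self.focused].queue.append(("click", x, y))

    def post_click(self, app_name, x, y):
        # Background style: append the event to the target app's own queue.
        # The shared cursor and the focused app are left untouched.
        self.apps[app_name].queue.append(("click", x, y))


desktop = Desktop(
    apps={"editor": App("editor"), "crm": App("crm")},
    cursor=(400, 300),
    focused="editor",
)

# The user is typing in "editor" while the agent clicks inside "crm".
desktop.post_click("crm", 120, 48)
print(desktop.cursor, desktop.focused, list(desktop.apps["crm"].queue))
```

After `post_click`, the cursor is still at `(400, 300)` and `editor` still has focus. Calling `hid_click` instead would overwrite both, which is the foreground dependency described above.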
How Background Computer Use Works (The Tech)
The secret sauce is the cua-driver (Computer Use Agent Driver). It acts as an abstraction layer over each operating system's input and accessibility APIs, translating high-level AI commands ("Click the Save button") into native OS calls that don't require window focus.
- On macOS: It leverages private SkyLight and Accessibility interfaces to inject events directly into application processes.
- On Windows: It uses UIAutomation combined with SendInput and PostMessage to drive apps without focus-stealing.
- On Linux: It utilizes AT-SPI (Assistive Technology Service Provider Interface) for both X11 and Wayland environments.
This architecture also enables 95% token compression via "Set-of-Mark" (SoM) capture. Instead of sending a massive high-res screenshot for every move, the agent uses a semantic map of the screen, drastically reducing costs and latency [1][2].
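As a rough illustration of the "semantic map" idea, the sketch below flattens a toy accessibility tree into a short numbered list of interactive elements. The model can then answer with a mark number instead of pixel coordinates. Everything here is an assumption made for the example: the tree layout, the `INTERACTIVE_ROLES` set, and the function names are hypothetical and are not taken from cua-driver. It also makes no claim about the actual compression ratio.

```python
# Roles treated as interactive in this sketch (an assumption, not cua-driver's list).
INTERACTIVE_ROLES = {"button", "textfield", "checkbox", "link", "menuitem"}


def collect_marks(node, marks=None):
    """Walk a toy accessibility tree depth-first and number interactive elements."""
    if marks is None:
        marks = []
    if node.get("role") in INTERACTIVE_ROLES:
        marks.append({
            "id": len(marks) + 1,
            "role": node["role"],
            "label": node.get("label", ""),
            "bounds": node.get("bounds"),  # [x, y, width, height]
        })
    for child in node.get("children", []):
        collect_marks(child, marks)
    return marks


def render_marks(marks):
    """Serialize marks as one short line each, suitable for a text prompt."""
    return "\n".join(f'[{m["id"]}] {m["role"]} "{m["label"]}"' for m in marks)


def center_of(marks, mark_id):
    """Resolve a mark id chosen by the model back to a click point."""
    x, y, w, h = next(m["bounds"] for m in marks if m["id"] == mark_id)
    return (x + w // 2, y + h // 2)


tree = {
    "role": "window", "label": "Invoice",
    "children": [
        {"role": "textfield", "label": "Amount", "bounds": [40, 80, 200, 24]},
        {"role": "group", "children": [
            {"role": "button", "label": "Save", "bounds": [40, 120, 80, 28]},
            {"role": "button", "label": "Cancel", "bounds": [130, 120, 80, 28]},
        ]},
    ],
}

marks = collect_marks(tree)
print(render_marks(marks))
print(center_of(marks, 2))
```

The rendered map for this tree is three short lines of text, and mark 2 ("Save") resolves to the point `(80, 134)`. A few lines of text like this stand in for a full-resolution screenshot, which is where the savings would come from.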
5 High-Leverage Background Workflows for 2026
You don't need to watch your agent work. Here are five ways to use Hermes Agent in the background today:
- Continuous Content Audits: While you write in one window, the agent can crawl your Obsidian or Notion vault, checking for broken links or outdated stats.
- Asynchronous CRM Cleanup: Have an agent go through your browser-based CRM in the background, verifying LinkedIn profiles and updating lead statuses while you're on a sales call.
- Automated "File Triage": Set an agent to monitor your Downloads folder. When a file arrives, it opens it, reads the content, and moves it to the correct project folder—all without a single window popping to the front.
- Member Onboarding Checks: For communities, an agent can check new Discord or Teams members against a payment database and assign the correct roles silently.
- Secure Local Automation: Because Hermes Agent is model-agnostic, you can run a local vision model (like Qwen2-VL) to handle sensitive financial or HR data without the screenshots ever leaving your machine.
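To give a feel for the "File Triage" workflow in the list above, here is a minimal sketch of the routing step alone, written in plain Python with no agent involved. The `RULES` table, the folder names, and the `pick_folder` and `triage` functions are hypothetical. A real agent would presumably decide based on the file's contents rather than keywords in its name.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical keyword-to-folder rules; a real agent would read the file
# contents with a model instead of matching on the file name.
RULES = {"invoice": "Finance", "contract": "Legal", "resume": "Hiring"}


def pick_folder(filename, rules=RULES, default="Unsorted"):
    """Return the project folder for a file based on keywords in its name."""
    lowered = filename.lower()
    for keyword, folder in rules.items():
        if keyword in lowered:
            return folder
    return default


def triage(downloads, projects):
    """Move every file in `downloads` into its matching folder under `projects`."""
    moved = {}
    for path in sorted(Path(downloads).iterdir()):
        if not path.is_file():
            continue
        target_dir = Path(projects) / pick_folder(path.name)
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved[path.name] = target_dir.name
    return moved


# Demo on throwaway directories so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    downloads = Path(root) / "Downloads"
    projects = Path(root) / "Projects"
    downloads.mkdir()
    for name in ("Invoice_0425.pdf", "notes.txt"):
        (downloads / name).write_text("placeholder")
    result = triage(downloads, projects)
    print(result)
```

The demo routes `Invoice_0425.pdf` to `Finance` and leaves `notes.txt` in `Unsorted`. Keeping an explicit default folder means an unrecognized file is parked rather than guessed at.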
How to Set Up Hermes Computer Use
If you already have Hermes Agent installed, enabling background computer use takes three short commands: install the driver, verify permissions, and start a session.
# 1. Install the cua-driver
hermes computer-use install
# 2. Verify permissions (macOS/Windows/Linux)
hermes computer-use doctor
# 3. Start a session with computer use enabled
hermes -t computer_use chat
Note: On macOS, you will need to grant "Accessibility" and "Screen Recording" permissions to your terminal or the Hermes binary in System Settings.
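If you want to script the steps above, a small wrapper like the following could chain them. It is a sketch under stated assumptions: it uses only the three commands listed in this section, and it assumes that `hermes computer-use install` and `hermes computer-use doctor` exit with a non-zero status on failure, which this article does not confirm. The demo uses a stand-in `dry_run` runner so nothing is executed. Pass `subprocess.run` (the default) to run the commands for real.

```python
import subprocess
from types import SimpleNamespace

# The install and doctor commands from this section, in order.
SETUP_COMMANDS = [
    ["hermes", "computer-use", "install"],
    ["hermes", "computer-use", "doctor"],
]
# The chat command is interactive, so the sketch returns it instead of running it.
CHAT_COMMAND = ["hermes", "-t", "computer_use", "chat"]


def run_setup(runner=subprocess.run):
    """Run install, then doctor; stop at the first non-zero exit code.

    Assumes both commands signal failure through their exit status.
    """
    for command in SETUP_COMMANDS:
        completed = runner(command)
        if completed.returncode != 0:
            return False, command
    return True, CHAT_COMMAND


def dry_run(command):
    """Stand-in runner: print the command and pretend it succeeded."""
    print("would run:", " ".join(command))
    return SimpleNamespace(returncode=0)


ok, command = run_setup(runner=dry_run)
label = "Next, start a session with" if ok else "Setup stopped at"
print(f"{label}: {' '.join(command)}")
```

Stopping at the first failure keeps a missing permission from being masked by later steps.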
What This Means for You
The shift from foreground to background agents is the transition from "AI as a tool" to "AI as a coworker." You no longer have to choose between doing the work yourself or watching a robot do it for you. By leveraging Hermes Agent's Sidekick features, you can keep a small status sprite on your screen that tells you exactly what your "invisible" agent is accomplishing in the background.
FAQ
Q: Does it move my actual mouse? A: No. Hermes uses a virtual event synthesis engine. You can move your mouse freely while the agent clicks in another app.
Q: Which models support this? A: Any model with vision capabilities. This includes Claude 3.5 Sonnet, GPT-4o, Gemini 1.5/3.1, and open-weight models like Qwen2-VL, run locally via Ollama or hosted via OpenRouter.
Q: Is it safe to let an AI control my computer? A: Hermes includes "destructive action" blocks. Any command involving deleting files, emptying the trash, or typing passwords is hard-blocked at the tool level or requires manual human approval.
Q: Can it be tricked by prompt injection? A: The system prompt is hardened to ignore any instructions found inside the applications it sees. It only follows the instructions you provide in the chat.
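The "destructive action" blocks mentioned in the FAQ can be pictured as a policy check that runs before any tool call. The sketch below is one way such a gate might be structured. It is not Hermes code: the action names, the two policy sets, and the `check_action` and `execute` functions are hypothetical placeholders, since the real blocklist is not shown in this article.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    BLOCK = "block"


# Hypothetical policy table; the real blocklist inside Hermes is not shown
# in this article, so these entries are placeholders.
BLOCKED_ACTIONS = {"empty_trash"}
APPROVAL_ACTIONS = {"delete_file", "type_password"}


def check_action(action):
    """Classify a proposed tool call before it is executed."""
    if action in BLOCKED_ACTIONS:
        return Verdict.BLOCK
    if action in APPROVAL_ACTIONS:
        return Verdict.NEEDS_APPROVAL
    return Verdict.ALLOW


def execute(action, approve=lambda action: False):
    """Run an action only if policy allows it or a human approves it."""
    verdict = check_action(action)
    if verdict is Verdict.BLOCK:
        return "blocked"
    if verdict is Verdict.NEEDS_APPROVAL and not approve(action):
        return "denied"
    return "executed"


for action in ("click_button", "empty_trash", "delete_file"):
    print(action, "->", execute(action))
```

The `approve` callback defaults to refusing, so an action that needs approval is denied unless a human explicitly says yes. Failing closed like this is the safer default for unattended background runs.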