Verdict: Building AI agents that reliably operate in production environments, especially in high-stakes sectors like government software, demands a rigorous engineering approach. OpenGov, through its OG Assist platform, demonstrates that a focus on control, safety, evaluation, and observability is paramount for scaling AI agent solutions. Their adoption of an Effect-native agent loop, the Agent-to-Agent (A2A) protocol, and robust human-in-the-loop systems provides a blueprint for successful production deployments.
- Focus on control over the agent loop
- Prioritize safety with human oversight and sandboxing
- Implement continuous evaluation and feedback mechanisms
- Leverage comprehensive observability for debugging and performance
Why Production AI Agents Demand a New Engineering Mindset
AI agents, autonomous systems capable of reasoning, planning, and acting, hold immense promise. However, moving from experimental prototypes to production-grade solutions, particularly in critical applications like government enterprise resource planning (ERP) software, introduces significant engineering challenges. OpenGov, with its AI assistant OG Assist, has tackled these head-on, deploying agents that handle complex tasks across budgeting, procurement, asset management, and permitting. Their experience shows that a proactive, structured approach is essential.
8 Engineering Principles for Production-Ready AI Agents
OpenGov's success with OG Assist stems from a set of core engineering principles designed to manage the complexity and ensure the reliability of AI agents at scale.
1. Own the Agent Loop: Control is King for Complex Use Cases
While frameworks like LangGraph offer a quick start, OpenGov discovered that for evolving and complex production scenarios, full control over the agent loop was indispensable. By developing an "Effect Native Agent Loop" using the Effect TypeScript library, they gained fine-grained command over how agents reason, act, and integrate within their systems. This allows for deep customization, performance optimization, and the ability to implement advanced features not easily accommodated by off-the-shelf solutions.
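To make "owning the loop" concrete, here is a minimal sketch of a hand-written agent loop in plain TypeScript. It is not OpenGov's implementation and does not use Effect; every type and function name (`runAgentLoop`, `ModelStep`, `stubModel`) is hypothetical, and the model is a synchronous stub for brevity where a real model call would be asynchronous.

```typescript
// Hypothetical types for illustration; none of these names come from OG Assist or Effect.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelStep =
  | { kind: "tool_call"; tool: string; input: string }
  | { kind: "final"; answer: string };

// The model is injected as a plain function so the loop can be exercised with a stub.
type Model = (history: readonly Message[]) => ModelStep;
type Tool = (input: string) => string;

function runAgentLoop(
  model: Model,
  tools: ReadonlyMap<string, Tool>,
  userInput: string,
  maxSteps = 5,
): { answer: string; history: Message[] } {
  const history: Message[] = [{ role: "user", content: userInput }];

  for (let step = 0; step < maxSteps; step++) {
    const next = model(history);

    if (next.kind === "final") {
      history.push({ role: "assistant", content: next.answer });
      return { answer: next.answer, history };
    }

    // Owning the loop means this is the single place to add approvals,
    // retries, tracing, or context management around every tool call.
    const tool = tools.get(next.tool);
    const result =
      tool === undefined ? `error: unknown tool "${next.tool}"` : tool(next.input);
    history.push({ role: "tool", content: result });
  }

  // A hard step limit keeps a confused model from looping forever.
  throw new Error(`agent did not finish within ${maxSteps} steps`);
}

// Stub model: request one tool call, then answer with the tool's result.
const stubModel: Model = (history) => {
  const last = history[history.length - 1];
  return last !== undefined && last.role === "tool"
    ? { kind: "final", answer: `Result: ${last.content}` }
    : { kind: "tool_call", tool: "echo", input: "budget-2025" };
};

const echoTools = new Map<string, Tool>([["echo", (input) => input.toUpperCase()]]);
const run = runAgentLoop(stubModel, echoTools, "Look up the budget");
```

The point of the sketch is the shape, not the details: because the loop is ordinary code, each step between the model's decision and the tool's execution is available for customization.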
2. Standardize Inter-Agent Communication with A2A Protocol
Effective multi-agent systems require clear communication protocols. OpenGov adopted Google's Agent-to-Agent (A2A) protocol to enable intercommunication between different agents and backend services. This open standard facilitated the creation of "agent cards": rigorous specifications that define an agent's capabilities and interaction patterns. Standardizing on agent cards aligns development across teams and keeps front-end and back-end integration robust; it also supports extensions like A2UI for agent-driven user interfaces.
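As a rough illustration of what an agent card can look like, the sketch below declares a small subset of fields modeled on the public A2A specification. The field names reflect that spec as best understood here and should be checked against the current version; the agent itself, its URL, and its skill are invented for the example, as is the `findSkillsByTag` helper.

```typescript
// A subset of agent card fields, modeled on the public A2A specification.
// Verify exact field names against the current spec before relying on them.
interface AgentSkill {
  id: string;
  name: string;
  description: string;
  tags: string[];
}

interface AgentCard {
  name: string;
  description: string;
  url: string;
  version: string;
  capabilities: { streaming?: boolean; pushNotifications?: boolean };
  defaultInputModes: string[];
  defaultOutputModes: string[];
  skills: AgentSkill[];
}

// Hypothetical card: the agent, URL, and skill are invented for illustration.
const procurementAgentCard: AgentCard = {
  name: "Procurement Agent",
  description: "Answers questions about solicitations and vendor records.",
  url: "https://agents.example.com/procurement",
  version: "1.0.0",
  capabilities: { streaming: true },
  defaultInputModes: ["text/plain"],
  defaultOutputModes: ["text/plain"],
  skills: [
    {
      id: "find-solicitation",
      name: "Find solicitation",
      description: "Looks up a solicitation by keyword or identifier.",
      tags: ["procurement", "search"],
    },
  ],
};

// Hypothetical helper: a caller routes work by inspecting advertised skills
// instead of hard-coding knowledge about each agent.
function findSkillsByTag(card: AgentCard, tag: string): AgentSkill[] {
  return card.skills.filter((skill) => skill.tags.includes(tag));
}

const searchSkills = findSkillsByTag(procurementAgentCard, "search");
```

Because the card is plain data, front-end and back-end teams can review and version it like any other interface contract.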
3. Build Trust and Safety with Human-in-the-Loop Interventions
For any AI system in a critical environment, safety and trust are non-negotiable. OpenGov integrates deterministic "human-in-the-loop" mechanisms that interrupt the agent's workflow when tool calls require human approval. This allows users to explicitly accept or reject an agent's proposed action, particularly for mutating operations (those that change data or state). This approach ensures humans remain in the driver's seat, fostering confidence and preventing unintended consequences.
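A minimal sketch of such an approval gate follows, assuming each tool declares whether it mutates state. The names (`executeWithApproval`, `ToolSpec`, `askHuman`) are hypothetical, and the human decision is modeled as a synchronous callback; a real system would pause the workflow and wait for a UI response.

```typescript
type ProposedCall = { tool: string; args: Record<string, unknown> };
type Decision = "approve" | "reject";
type CallOutcome =
  | { status: "executed"; result: string }
  | { status: "rejected"; reason: string };

// Assumption: every tool is registered with an explicit `mutating` flag.
interface ToolSpec {
  mutating: boolean;
  run: (args: Record<string, unknown>) => string;
}

function executeWithApproval(
  call: ProposedCall,
  registry: ReadonlyMap<string, ToolSpec>,
  askHuman: (call: ProposedCall) => Decision,
): CallOutcome {
  const spec = registry.get(call.tool);
  if (spec === undefined) {
    return { status: "rejected", reason: `unknown tool: ${call.tool}` };
  }
  // Deterministic gate: the rule lives in code, not in the model's judgment.
  // Read-only tools run directly; mutating tools always wait for a human.
  if (spec.mutating && askHuman(call) === "reject") {
    return { status: "rejected", reason: "rejected by user" };
  }
  return { status: "executed", result: spec.run(call.args) };
}

// Demo with an in-memory balance standing in for real data.
let ledger = 100;
const registry = new Map<string, ToolSpec>([
  ["getBalance", { mutating: false, run: () => `balance=${ledger}` }],
  [
    "adjustBalance",
    {
      mutating: true,
      run: (args) => {
        ledger += Number(args["delta"]);
        return `balance=${ledger}`;
      },
    },
  ],
]);

const asked: string[] = [];
const alwaysReject = (call: ProposedCall): Decision => {
  asked.push(call.tool);
  return "reject";
};

const readOutcome = executeWithApproval({ tool: "getBalance", args: {} }, registry, alwaysReject);
const writeOutcome = executeWithApproval(
  { tool: "adjustBalance", args: { delta: -40 } },
  registry,
  alwaysReject,
);
```

In this demo the read-only call runs without prompting anyone, while the rejected mutating call never touches the balance.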
4. Isolate Execution with Ephemeral Sandboxing
Letting AI agents execute code or create files in a production system poses security risks. To mitigate this, OpenGov implemented an on-demand sandboxing solution. Agents perform these operations within isolated, ephemeral environments that are automatically torn down after use. This containment strategy keeps malicious or erroneous agent actions from affecting core production systems, protecting both security and stability.
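The sketch below illustrates only the ephemeral lifecycle (create, use, always tear down) using a temporary directory on Node.js. A temporary directory is not a security boundary, so this is not a real sandbox; production isolation would rely on containers, microVMs, or a similar mechanism. The helper name `withEphemeralWorkspace` is hypothetical.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Runs `work` inside a fresh temporary directory and always removes it afterwards.
// NOTE: this shows the create/use/teardown pattern only. A temp directory does
// NOT isolate a process; real sandboxing needs OS- or VM-level isolation.
function withEphemeralWorkspace<T>(work: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "agent-sandbox-"));
  try {
    return work(dir);
  } finally {
    // Teardown runs even if `work` throws, so nothing outlives the task.
    rmSync(dir, { recursive: true, force: true });
  }
}

// Demo: the "agent" writes a file, reads it back, and the workspace disappears.
let usedDir = "";
const reportLength = withEphemeralWorkspace((dir) => {
  usedDir = dir;
  const file = join(dir, "report.txt");
  writeFileSync(file, "draft budget report");
  return readFileSync(file, "utf8").length;
});

const workspaceStillExists = existsSync(usedDir);
```

Putting teardown in a `finally` block is the key design choice: cleanup does not depend on the agent's code succeeding.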
5. Master Long Context: Rolling Summarization and Memory
Managing context windows for LLMs is a persistent challenge, especially in long-running conversations. OpenGov found that simply "stuffing" all prior messages into the prompt quickly became inefficient and ran into token limits. Their solution involves "rolling summarization," where older parts of a conversation are summarized to preserve key information without overloading the context. Coupled with a memory component, this allows agents to recall relevant details from earlier in the interaction, enabling more coherent and effective long-form dialogues than simple sliding window approaches.
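One way to sketch rolling summarization: keep the most recent messages verbatim and fold anything older into a running summary. The names here (`appendWithRollingSummary`, `Summarizer`) are hypothetical, and the summarizer is a stub that concatenates text; in practice that step would typically be an LLM call, and the cutoff would more likely be based on token counts than message counts.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

interface ConversationContext {
  summary: string; // condensed form of everything older than `recent`
  recent: ChatMessage[]; // most recent messages, kept verbatim
}

// Folds evicted messages into the previous summary. Stubbed below;
// a real implementation would usually ask an LLM to write the summary.
type Summarizer = (previousSummary: string, evicted: readonly ChatMessage[]) => string;

function appendWithRollingSummary(
  ctx: ConversationContext,
  message: ChatMessage,
  keepRecent: number,
  summarize: Summarizer,
): ConversationContext {
  const all = [...ctx.recent, message];
  if (all.length <= keepRecent) {
    return { summary: ctx.summary, recent: all };
  }
  // Unlike a plain sliding window, evicted messages are summarized, not dropped.
  const cut = all.length - keepRecent;
  return {
    summary: summarize(ctx.summary, all.slice(0, cut)),
    recent: all.slice(cut),
  };
}

// Stub summarizer: joins text so the example is deterministic.
const concatSummarizer: Summarizer = (previous, evicted) =>
  [previous, ...evicted.map((m) => `${m.role}: ${m.content}`)]
    .filter((part) => part !== "")
    .join(" | ");

const incoming: ChatMessage[] = [
  { role: "user", content: "a" },
  { role: "assistant", content: "b" },
  { role: "user", content: "c" },
  { role: "assistant", content: "d" },
];

const emptyContext: ConversationContext = { summary: "", recent: [] };
const finalContext = incoming.reduce(
  (ctx, message) => appendWithRollingSummary(ctx, message, 2, concatSummarizer),
  emptyContext,
);
```

The prompt sent to the model would then be built from `summary` plus `recent`, so its size stays bounded however long the conversation runs.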
6. Implement Continuous Feedback and Automated Evaluations
"Shipping is the start, not the finish" is a core philosophy for OpenGov's AI team. They employ a multi-pronged approach to continuous improvement:
- User Feedback: A simple "thumbs up/down" mechanism in the UI allows users to quickly rate agent responses, providing direct signals for improvement.
- Automated Evals in CI: Integration of automated evaluations within their Continuous Integration (CI) pipeline ensures that new prompts and agent behaviors are rigorously tested against expected outcomes and tool calls, maintaining high accuracy standards.
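An automated eval of the kind described above can be sketched as a table of prompts with the tool calls each one is expected to trigger. Everything here is hypothetical (`runEvals`, `EvalCase`, the stub agent, and the tool names); a real suite would call the actual agent and would likely also score response content.

```typescript
type RecordedToolCall = { tool: string; args: Record<string, unknown> };

interface EvalCase {
  name: string;
  prompt: string;
  expectedTools: string[]; // tool names, in the order they should be called
}

interface EvalResult {
  name: string;
  passed: boolean;
  actualTools: string[];
}

// The agent under test is injected so CI can run against a recorded or stubbed agent.
type AgentUnderTest = (prompt: string) => RecordedToolCall[];

function runEvals(agent: AgentUnderTest, cases: readonly EvalCase[]): EvalResult[] {
  return cases.map((evalCase) => {
    const actualTools = agent(evalCase.prompt).map((call) => call.tool);
    const passed =
      actualTools.length === evalCase.expectedTools.length &&
      actualTools.every((tool, i) => tool === evalCase.expectedTools[i]);
    return { name: evalCase.name, passed, actualTools };
  });
}

function passRate(results: readonly EvalResult[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => r.passed).length / results.length;
}

// Stub agent: only knows how to look up permits, so the second case fails.
const stubAgent: AgentUnderTest = (prompt) =>
  prompt.toLowerCase().includes("permit")
    ? [{ tool: "lookupPermit", args: { query: prompt } }]
    : [];

const evalResults = runEvals(stubAgent, [
  { name: "permit lookup", prompt: "Status of permit 42?", expectedTools: ["lookupPermit"] },
  { name: "budget search", prompt: "Show the parks budget", expectedTools: ["searchBudget"] },
]);
const rate = passRate(evalResults);
```

A CI job could then fail the build whenever `rate` drops below an agreed threshold, which turns a prompt regression into a red check rather than a production surprise.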
7. Leverage Out-of-the-Box Observability and Tracing
Debugging and understanding complex agentic systems require deep visibility into their operations. OpenGov benefits from Effect's built-in tracing capabilities. Every function call is tagged with "spans," which aggregate into comprehensive traces. This allows engineers to drill down into the execution flow, identify performance bottlenecks, cross-reference data across different services, and quickly diagnose failures. Such detailed observability is crucial for maintaining and evolving production AI agents.
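To show how spans nest into a trace, here is a deliberately small, hand-rolled span recorder. It is not Effect's tracing API, which provides this out of the box; `createTracer`, `withSpan`, and `durationMs` are hypothetical names, and the clock is injected so the example is deterministic.

```typescript
interface Span {
  id: number;
  parentId: number | null; // null marks the root span of a trace
  name: string;
  startMs: number;
  endMs: number | null; // null while the span is still open
}

function createTracer(now: () => number = () => Date.now()) {
  const spans: Span[] = [];
  const stack: number[] = []; // ids of currently open spans, innermost last

  function withSpan<T>(name: string, fn: () => T): T {
    const parent = stack[stack.length - 1];
    const span: Span = {
      id: spans.length,
      parentId: parent === undefined ? null : parent,
      name,
      startMs: now(),
      endMs: null,
    };
    spans.push(span);
    stack.push(span.id);
    try {
      return fn();
    } finally {
      // Close the span even when `fn` throws, so failures still show up in the trace.
      span.endMs = now();
      stack.pop();
    }
  }

  return { spans, withSpan };
}

function durationMs(span: Span): number | null {
  return span.endMs === null ? null : span.endMs - span.startMs;
}

// Demo with a fake clock that advances 10 ms on every reading.
let fakeTime = 0;
const tracer = createTracer(() => (fakeTime += 10));

const toolResult = tracer.withSpan("agent.turn", () =>
  tracer.withSpan("tool.lookupPermit", () => "ok"),
);

const traceSummary = tracer.spans.map(
  (s) => `${s.name} parent=${s.parentId} ms=${durationMs(s)}`,
);
```

The parent links are what let an engineer drill from a slow agent turn down to the specific tool call responsible.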
8. Enhance Developer Velocity with Internal AI Tools
Beyond customer-facing applications, OpenGov uses AI internally to boost developer productivity. By leveraging tools like Claude and Cursor, and building custom internal agents, they accelerate various development workflows, including code reading, writing, and review. This internal application of AI agents reinforces the company's commitment to agent-driven efficiency and helps the team iterate faster.
What this means for you
For businesses and developers looking to deploy AI agents in production, OpenGov's journey offers critical insights. Robust engineering principles are essential at every level: foundational architectural decisions like owning the agent loop and standardizing communication, operational safeguards like human oversight and sandboxing, and continuous improvement through feedback and observability. The future of AI in the enterprise hinges on the ability to build and scale these intelligent systems reliably and responsibly.
FAQ
Q: What is the Agent-to-Agent (A2A) Protocol? A: The A2A protocol is an open standard, initiated by Google, designed to enable seamless communication and interoperability between disparate AI agent systems, regardless of their underlying frameworks or vendors. It helps agents discover each other's capabilities and collaborate on tasks.
Q: Why did OpenGov move from LangGraph to an Effect Native Agent Loop? A: OpenGov found that as their use cases evolved and scaled, they needed more fine-grained control over the agent's behavior. An Effect Native Agent Loop allowed them to leverage Effect's built-in features like tracing, structured concurrency, and logging, providing superior control and flexibility for complex production scenarios.
Q: How does OpenGov ensure safety with AI agents in critical software? A: OpenGov ensures safety through deterministic human-in-the-loop approvals for agent tool calls, especially for mutating operations. Additionally, agents execute code and create files within isolated, ephemeral sandboxes, preventing risks to core production systems.
Q: What is rolling summarization for long context management? A: Rolling summarization is a technique used to manage long conversation contexts for LLMs. Instead of feeding the entire conversation history, older parts of the conversation are summarized to retain key information efficiently, reducing token usage while preserving relevant context for the agent.
Q: How does observability help in managing production AI agents? A: Observability, particularly through tracing, provides deep insights into an agent's execution flow. It allows engineers to monitor function calls, identify bottlenecks, and debug issues across integrated services, making it easier to maintain and improve complex agentic systems in production.
Q: Can AI agents be used for internal development processes? A: Yes. OpenGov uses AI tools such as Claude and Cursor, along with custom internal agents, to accelerate development workflows including code reading, writing, and review, which helps the team iterate faster.