A new generation of artificial intelligence tools is moving beyond the assistant role, taking on autonomous operational responsibilities in some of the most critical digital infrastructure powering modern business. At the forefront of this shift is a San Francisco company building AI agents that independently investigate system failures and learn from every incident they encounter.
Cleric, which has raised $9.8 million in venture funding, has developed self-learning AI agents specifically designed for site reliability engineering. The technology represents a departure from conventional monitoring tools and AI copilots, instead offering systems that autonomously analyze production failures and improve their diagnostic capabilities over time.
The distinction matters in an industry where traditional AI tools require constant human prompting and often fail when underlying systems change. Cleric’s approach centers on building institutional memory that compounds in effectiveness as the system processes more alerts, incidents, and engineer interactions.
When production incidents occur, the AI agent automatically examines logs, metrics, and system configurations, develops hypotheses about root causes, and delivers evidence-backed findings directly into Slack, the workplace communication platform where many engineering teams already coordinate their work. Engineers can then ask follow-up questions or guide investigations through natural conversation.
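The investigation loop described above can be sketched in miniature. Everything below is illustrative: the function names, hypothesis rules, and telemetry shapes are assumptions for the sketch, not Cleric's actual internals, which are not public.

```python
# Toy sketch of an autonomous incident investigation: gather telemetry,
# form candidate hypotheses, rank them by supporting evidence, and report
# the best-supported finding. All rules and names here are hypothetical.

def investigate(alert, logs, metrics, configs):
    """Examine logs, metrics, and configs; return the best-evidenced hypothesis."""
    hypotheses = []

    # Hypothesis 1: the alert followed a recent configuration change.
    if configs.get("last_change_minutes_ago", 9999) < 30:
        minutes = configs["last_change_minutes_ago"]
        hypotheses.append(("recent config change",
                           [f"config changed {minutes} min before alert"]))

    # Hypothesis 2: the logs show a recurring error signature.
    error_lines = [line for line in logs if "ERROR" in line]
    if len(error_lines) >= 3:
        hypotheses.append(("recurring application error", error_lines[:3]))

    # Hypothesis 3: metrics indicate resource exhaustion.
    if metrics.get("memory_pct", 0) > 90:
        hypotheses.append(("memory exhaustion",
                           [f"memory at {metrics['memory_pct']}%"]))

    if not hypotheses:
        return {"alert": alert, "finding": "inconclusive", "evidence": []}

    # Rank by the amount of supporting evidence and report the winner.
    cause, evidence = max(hypotheses, key=lambda h: len(h[1]))
    return {"alert": alert, "finding": cause, "evidence": evidence}


finding = investigate(
    alert="checkout-service p99 latency",
    logs=["ERROR timeout calling payments", "INFO ok",
          "ERROR timeout calling payments", "ERROR timeout calling payments"],
    metrics={"memory_pct": 45},
    configs={"last_change_minutes_ago": 12},
)
```

In a real agent the hypothesis generation would be model-driven rather than rule-based, but the overall shape, evidence gathering followed by ranked, evidence-backed conclusions, matches the workflow the article describes.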
The Slack-native design addresses what many consider the primary obstacle to new DevOps tool adoption: workflow disruption. By operating within existing communication channels and integrating with current observability and incident management systems, the technology eliminates the need for additional dashboards or team retraining.
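Delivering findings into an existing channel is mechanically simple, which is part of the appeal. The sketch below builds a payload in the shape Slack's incoming-webhook API expects; the service name and evidence strings are invented examples, and this is not Cleric's actual integration code.

```python
import json

# Build a Slack incoming-webhook payload summarizing an investigation.
# The {"text": ...} payload shape matches Slack's incoming-webhook API;
# the finding content itself is a made-up example.

def format_finding(service, cause, evidence):
    """Return a JSON payload string for a Slack incoming webhook."""
    bullets = "\n".join(f"- {item}" for item in evidence)
    return json.dumps({"text": f"*{service}* investigation: {cause}\n{bullets}"})


payload = format_finding(
    "checkout-service",
    "recent config change",
    ["deploy 12 min before alert", "error rate rose at deploy time"],
)
# In production this JSON would be POSTed to the team's webhook URL,
# e.g. with urllib.request or the requests library.
```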
Enterprise engineering teams across mobility platforms, cloud infrastructure providers, and AI-driven systems have already deployed the technology. A case study with BlaBlaCar, Europe’s largest mobility platform, documented a more than 90 percent reduction in incident investigation time, with teams shifting from reactive firefighting to proactive identification of systemic issues.

The company addresses a growing challenge in enterprise technology: as software systems become increasingly complex and AI adoption accelerates across industries, reliability has emerged as a constraint on innovation. A global shortage of experienced reliability engineers has intensified pressure on organizations to find alternatives to traditional staffing models.
“Production systems aren’t static, so AI tools can’t be either. Cleric learns from every incident and engineer decision to continuously improve how reliability work gets done,” said Shahram Anver, CEO and co-founder of the company.
Anver brings over a decade of experience building large-scale distributed systems and reliability platforms. Before founding Cleric, he held senior engineering positions at high-growth technology companies, working on observability, incident response, and developer infrastructure supporting millions of users. He has become a recognized voice in discussions about agentic AI systems and autonomous operations in production environments.
The technology’s design reflects particular attention to concerns that have historically slowed AI adoption in mission-critical infrastructure. The system operates in read-only mode, presents evidence-backed conclusions, and never executes destructive changes automatically. This emphasis on safety, explainability, and human oversight directly addresses one of the most significant barriers to enterprise AI deployment in production systems.
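A read-only posture like the one described can be enforced with something as simple as an allow-list over the actions an agent may take. The action names below are hypothetical and chosen only to illustrate the guardrail, not to describe Cleric's implementation.

```python
# Hypothetical guardrail: permit only non-mutating diagnostic actions,
# so the agent can investigate but never change production state.

READ_ONLY_ACTIONS = {"fetch_logs", "query_metrics", "describe_pod", "read_config"}

def execute(action, runner):
    """Run an action only if it appears on the read-only allow-list."""
    if action not in READ_ONLY_ACTIONS:
        raise PermissionError(f"blocked mutating action: {action}")
    return runner(action)


# A diagnostic action passes through...
result = execute("fetch_logs", lambda a: f"ran {a}")

# ...while a mutating one is refused before it can touch anything.
try:
    execute("restart_service", lambda a: f"ran {a}")
    blocked = ""
except PermissionError as exc:
    blocked = str(exc)
```

The design choice is that safety lives in the execution layer, not in the model's judgment: even a confidently wrong hypothesis cannot trigger a destructive change.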
Rather than attempting to address every alert that occurs, the system focuses on what one customer engineering leader described as providing “intelligent coverage that helps teams eliminate systemic issues before they become outages.”
The technology concentrates on high-frequency, routine incidents where engineering teams lose the most time to repetitive operational work. By shouldering this burden, the AI agents free human engineers to focus on higher-value problem solving and innovation rather than constant firefighting.

This progression from AI copilots to autonomous agents operating in production environments represents a broader evolution in enterprise AI deployment. While copilots assist humans who remain responsible for every action, autonomous agents take independent operational steps within defined parameters, learning and adapting as they work alongside human teams.
The shift has significant implications for how organizations approach reliability engineering and operational work. As AI systems demonstrate the ability to handle complex diagnostic tasks and accumulate operational knowledge, they create new models for structuring engineering teams and allocating human expertise.
The technology’s continuous learning capability means diagnostic accuracy improves with usage, building a repository of institutional knowledge that persists beyond individual team members. This addresses a common problem in engineering organizations: the loss of operational expertise when experienced engineers move to new roles or leave companies.
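A repository of past incidents and their resolutions can be sketched as a simple retrieval structure. This toy version matches past cases by keyword overlap; a production system would use far richer representations, and nothing here is drawn from Cleric's actual design.

```python
# Toy "institutional memory": record resolved incidents and suggest the
# resolution of the most similar past case. Purely illustrative.

class IncidentMemory:
    def __init__(self):
        self.cases = []  # list of (symptom_word_set, resolution)

    def record(self, symptom, resolution):
        """Store a resolved incident for later retrieval."""
        self.cases.append((set(symptom.lower().split()), resolution))

    def suggest(self, symptom):
        """Return the resolution of the most similar past incident, if any."""
        words = set(symptom.lower().split())
        best = max(self.cases, key=lambda c: len(c[0] & words), default=None)
        if best and best[0] & words:
            return best[1]
        return None


mem = IncidentMemory()
mem.record("checkout latency spike after deploy", "roll back config change")
mem.record("database connection pool exhausted", "increase pool size")
suggestion = mem.suggest("latency spike on checkout after new deploy")
```

Because the store persists independently of any individual engineer, the knowledge survives team turnover, which is the property the article highlights.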
For enterprise technology buyers evaluating AI agents for production systems, the emergence of self-learning, autonomous capabilities marks a significant departure from previous generations of monitoring and observability tools. The question facing many organizations is no longer whether AI will play a role in reliability engineering, but how quickly autonomous agents will become standard components of production operations.