white paper

When 90% accuracy is not good enough: The need for reliable AI Agents for buildings

AI agents in building energy management must bridge the gap between capability and reliability. With strategies such as zero-shot forecasting, the Model Context Protocol (MCP), and IoT-LLM models, these agents can optimize energy use while ensuring consistent performance.

Shailendra Tomar

06 May 2025 — 5 min read

Suppose you make an online purchase using an AI agent. Behind the scenes, the agent processes the order and coordinates the delivery: it assigns delivery routes, predicts package arrival times, and handles large-scale product distribution. The system is 90% accurate, which might seem like a good starting point, since most orders arrive on time and at the correct address. In fact, this level of accuracy was highlighted in a recent research paper, AI Agents That Matter.

The initial excitement about AI agents' capabilities has sometimes led to overblown promises. Once these automated orders are in place, what happens to the remaining 10% of cases, where errors can add up to thousands of misplaced, delayed, or lost packages? Such errors cause significant financial losses through refunds and reshipments, frustrate customers and erode their loyalty, and create operational inefficiencies by disrupting the supply chain for all orders.

Similarly, even a small failure rate is unthinkable for self-driving cars, where the consequences involve life safety. And the bold claims that AI would swiftly replace highly skilled professionals, such as lawyers, have largely proven to be exaggerations. Even though these AI agents are highly capable, their imperfect reliability makes them unusable at scale.

The same challenges exist in the complex world of building energy management

Building management systems (BMS) were originally intended for installation in large buildings. Because of the significant cost and complexity of a BMS, along with its heavy management structure, small and midsize buildings have often been overlooked. Furthermore, these traditional systems run HVAC on fixed schedules and predefined settings, which makes them highly reliable but not necessarily efficient, and they often fail to incorporate new data sources such as smart meters, Internet of Things (IoT) sensors, and Application Programming Interfaces (APIs). As buildings move toward electrification, renewable energy, battery storage, and EV chargers make the picture even more complicated. Smart building energy management faces several challenges:

Rising energy costs and the waste of finite energy sources lead to high operational costs.
Inefficient building systems often have long payback periods, yet lock out the new data sources needed for energy demand response.
A shortage of professionals with highly technical skills further limits the ability to fix these inefficiencies.
Environmental regulations demand reporting of the emissions generated by high energy use.

AI agents are capable enough to solve these challenges but not if it means sacrificing reliability. We must bridge the gap between capability and reliability to ensure AI agents work correctly every time, not just most of the time.

Strategies to achieve AI agent reliability for buildings

To reduce energy waste, cut costs, and improve efficiency, we need to build specialised, domain-specific AI agents whose system design lets them understand the physical context of the building. These AI-powered systems can adapt to conditions based on predicted data instead of fixed settings, enabling dynamic temperature setpoints for heat pumps, smart load control for battery storage, and maximized use of renewable energy. Such domain-specific AI agents can ensure precise control over energy consumption and system reliability. Furthermore, due to their transparency, AI agents also have the potential to demonstrate which clean energy sources created economic value and how that value was distributed equitably. Here are the key strategies:

1. Zero-shot energy forecasting models: Studies suggest that a foundation model for probabilistic time series forecasting improves accuracy by 20% when forecasting an energy demand curve without prior knowledge of the specific building or its historical data. This ability to generalise in a zero-shot setting is particularly noteworthy because each building behaves differently.

2. MCP for real-time data integration: To enhance reliability, AI agents must access live data from building smart meter APIs, IoT sensors, gateways, weather forecasts, and clean energy sources such as heat pumps, solar, and battery storage. This access can be standardised with the Model Context Protocol (MCP), which gives AI models a consistent way to reach data. Think of MCP as a standard API over the APIs of different systems, one that handles both structured and unstructured data.

MCP Architecture: data sources for building energy management
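The idea of a consistent access layer can be illustrated with a minimal sketch. This is plain Python and not the MCP SDK or its wire format: the `DataSource` and `SourceRegistry` names, the source names, and the sample values are all hypothetical, chosen only to show how an agent could discover and read very different building systems through one interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class DataSource:
    """One building data source behind a uniform interface (hypothetical)."""
    name: str
    description: str
    read: Callable[[], Dict[str, Any]]


class SourceRegistry:
    """Lets an agent discover and query every source the same way."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, source: DataSource) -> None:
        self._sources[source.name] = source

    def list_sources(self) -> List[Dict[str, str]]:
        # The agent can ask what is available before requesting data.
        return [
            {"name": s.name, "description": s.description}
            for s in self._sources.values()
        ]

    def call(self, name: str) -> Dict[str, Any]:
        # Every call returns the same envelope, including on failure,
        # so a broken source never reaches the agent as a raw exception.
        if name not in self._sources:
            return {"ok": False, "error": f"unknown source: {name}"}
        try:
            return {"ok": True, "source": name, "data": self._sources[name].read()}
        except Exception as exc:
            return {"ok": False, "source": name, "error": str(exc)}


# Stand-in readers with made-up values; real ones would call a meter API,
# an IoT gateway, or a weather service.
registry = SourceRegistry()
registry.register(DataSource("smart_meter", "Site power draw", lambda: {"power_kw": 42.5}))
registry.register(DataSource("weather", "Outdoor conditions", lambda: {"outdoor_temp_c": 7.0}))

print(registry.list_sources())
print(registry.call("smart_meter"))
```

Returning the same envelope for success and failure is a design choice in this sketch: the agent can check `ok` before acting on data, which matters when reliability is the goal.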

3. IoT-LLM domain model: Large Language Models (LLMs) are generally great at processing text but struggle with real-world numeric data and lack the physical context of building systems. Their expertise lies in understanding and generating text from learned linguistic patterns, which is a different skill set from the logical, domain-specific, and computational abilities that numerical understanding requires. Research shows that an IoT-LLM architecture delivers a 60% improvement in domain-specific output accuracy compared with standard LLMs that lack IoT context.

Domain Specific LLM: Energy data + clear context = actionable insights
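One way to picture "energy data + clear context" is to attach units and expected ranges to raw readings before they reach the model. The sketch below is an illustration of that idea only, not the IoT-LLM architecture from the research; the `Reading` type, the sensor names, the ranges, and the prompt layout are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    """A raw sensor value plus the physical context an LLM lacks (hypothetical)."""
    sensor: str
    value: float
    unit: str
    expected_low: float
    expected_high: float


def describe(reading: Reading) -> str:
    # Do the numeric comparison in code, so the model receives a stated
    # fact instead of having to reason about bare numbers.
    if reading.value > reading.expected_high:
        status = "above expected range"
    elif reading.value < reading.expected_low:
        status = "below expected range"
    else:
        status = "within expected range"
    return (
        f"- {reading.sensor}: {reading.value:g} {reading.unit} "
        f"({status} {reading.expected_low:g}-{reading.expected_high:g} {reading.unit})"
    )


def build_prompt(building: str, readings: List[Reading], question: str) -> str:
    lines = [f"Building context: {building}", "Current readings:"]
    lines.extend(describe(r) for r in readings)
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Made-up example values.
readings = [
    Reading("zone_temp", 24.5, "degC", 20.0, 23.0),
    Reading("heat_pump_power", 3.2, "kW", 0.0, 5.0),
]
prompt = build_prompt(
    "Mid-size office with a heat pump",
    readings,
    "Why is the zone warm?",
)
print(prompt)
```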

4. Dual-mode AI agent: For smart building energy management, an AI agent needs to operate in two modes, proactive and reactive. Together these modes help bridge the skill gap in energy management, functioning like a virtual colleague that always has your back. In proactive mode, the agent alerts users to inefficiencies, explains the root cause, and suggests tailored fixes with minimal input. In reactive mode, it responds to specific questions about actions, energy management topics, regulations, indicators, and more.

5. Smart automation with fail-safe control: Evaluating AI agent errors is complex because the agents act in the real world, so testing requires simulated environments. Even so, we need to guard against system safety issues, malfunctions, and AI-driven errors by enforcing safety parameters with multi-layered verification before applying efficiency adjustments, such as automated demand control or dynamic heating and cooling setpoints that must avoid extreme temperature fluctuations. Additionally, facility managers always retain fail-safe controls to override AI changes when anomalies occur, so that human domain expertise is built into the validation process.

Smart control automation with fail-safe capabilities
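The layered checks described above can be sketched as a small guard that sits between the agent's proposal and the equipment. This is a minimal sketch under stated assumptions: the `SafetyLimits` values (18 to 26 degrees C, at most 1 degree per adjustment) and the `safe_setpoint` function are hypothetical, and real limits would come from the building's own comfort and equipment constraints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SafetyLimits:
    """Hypothetical safety parameters for a heating/cooling setpoint."""
    min_c: float = 18.0       # absolute lower bound
    max_c: float = 26.0       # absolute upper bound
    max_step_c: float = 1.0   # largest change allowed per adjustment


def safe_setpoint(
    current_c: float,
    proposed_c: float,
    limits: SafetyLimits,
    manual_override_c: Optional[float] = None,
) -> float:
    """Return the setpoint to apply after the safety layers."""
    # Fail-safe: a facility manager's override replaces the AI proposal.
    # (Assumption: the override is trusted and not clamped here.)
    if manual_override_c is not None:
        return manual_override_c

    # Layer 1: clamp the proposal into the absolute bounds.
    bounded = min(max(proposed_c, limits.min_c), limits.max_c)

    # Layer 2: limit the step size to avoid abrupt temperature swings.
    step = bounded - current_c
    step = max(-limits.max_step_c, min(limits.max_step_c, step))
    return current_c + step


limits = SafetyLimits()
print(safe_setpoint(21.0, 30.0, limits))                          # clamped and rate-limited
print(safe_setpoint(21.0, 21.5, limits))                          # small change passes through
print(safe_setpoint(21.0, 30.0, limits, manual_override_c=20.0))  # human override wins
```

Because the rate limit applies relative to the current setpoint, a large proposed change is reached over several adjustments, which leaves time for a person to step in if something looks wrong.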

Balancing the capability and reliability of AI agents

AI agents in building energy management must be both capable and reliable in real-world environments. To close the capability-reliability gap, AI agents need: zero-shot forecasting models for energy demand prediction; MCP for open integration with lean data processing; IoT-LLMs for improved logical reasoning; a dual-mode AI agent for reducing skill gaps; and fail-safe automation with dynamic adjustments.

A truly effective AI agent for buildings should not just be capable and cost-effective but equally reliable, performing complex tasks consistently. Even small failures, such as a 10% mismanagement in HVAC or power load balancing, can lead to massive inefficiencies, discomfort, and financial losses.

