Case study · Home Services · Tool Setup
How a regional HVAC company stopped losing same-day calls with AI-assisted dispatch triage
Same-day call closure rose from 41 percent to 78 percent within 30 days of launch. Same dispatcher, same trucks, same software.
- Same-day closure rate: 41% to 78%
- Dispatcher triage time: 4 min to 45 sec per call
- No-heat / no-cool miss rate: cut by 71%
Measured across 612 inbound service requests in the 30 days post-launch.
The dispatcher reviews the AI-suggested priority and either accepts or overrides it.
Critical-priority jobs that previously slipped to next-day are now correctly flagged on intake.
The situation
A residential HVAC company running 14 service trucks across two metro areas was losing same-day work in the most frustrating way: their dispatcher couldn't triage the inbound call queue fast enough during peak season. Calls came in by phone, web form, and a third-party booking widget, all routed through a single coordinator. The coordinator had to read each request, judge severity (no-heat in February vs. routine maintenance), check truck availability, and route to a tech.
The result was that critical jobs - elderly customers with no heat, families with no cooling on a 95-degree day - sometimes sat in the queue behind routine work because the coordinator was doing five things at once. Customers who didn't get a same-day call back called a competitor. The company tracked this loss but couldn't fix it without either adding staff or adopting a heavier dispatch system that the techs would resist.
What we built
We did not replace the dispatcher. We built a thin layer in front of her. Inbound requests from all three channels normalize into a single intake record. An LLM with a tightly scoped prompt classifies each request on three axes: severity (emergency / urgent / routine), system type (heat / cool / dual / water heater / other), and customer-stated preference (today / this week / flexible). Severity uses explicit rules - 'no heat' and 'no cool' with weather thresholds always escalate; everything else uses the model's judgment.
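A minimal sketch of that rule-first step, in TypeScript. The type names, the regexes, and the 40°F / 90°F thresholds below are illustrative placeholders, not the production values.

```ts
// Sketch only: type names, regexes, and temperature thresholds are placeholders.
type Severity = "emergency" | "urgent" | "routine";
type SystemType = "heat" | "cool" | "dual" | "water_heater" | "other";
type Preference = "today" | "this_week" | "flexible";

interface IntakeRecord {
  channel: "phone" | "web_form" | "booking_widget";
  description: string; // customer's own words, normalized into one field
  zip: string;
  receivedAt: string;  // ISO timestamp
}

interface TriageSuggestion {
  severity: Severity;
  system: SystemType;
  preference: Preference;
  reason: string;      // the one-line justification shown in the queue view
  ruleForced: boolean; // true when an explicit rule, not the model, set severity
}

// Explicit rules run before the model: no-heat / no-cool complaints escalate
// once outdoor temperature crosses a threshold, regardless of what the model says.
function applyHardRules(
  req: IntakeRecord,
  outdoorTempF: number
): Partial<TriageSuggestion> | null {
  const text = req.description.toLowerCase();
  if (/no\s*heat/.test(text) && outdoorTempF <= 40) {
    return {
      severity: "emergency",
      system: "heat",
      reason: "No heat reported with outdoor temp at or below 40F",
      ruleForced: true,
    };
  }
  if (/no\s*(cool|a\/?c|air)/.test(text) && outdoorTempF >= 90) {
    return {
      severity: "emergency",
      system: "cool",
      reason: "No cooling reported with outdoor temp at or above 90F",
      ruleForced: true,
    };
  }
  return null; // nothing matched: fall through to the model's judgment
}
```

Anything the rules don't catch falls through to the model, and the `ruleForced` flag tells the dispatcher why a job was escalated without her having to guess.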
The classifier writes its suggestion - plus a one-line reason - into the dispatcher's existing queue view. She accepts, overrides, or escalates. Overrides feed back into a weekly review prompt so we can tune the classifier on the calls it got wrong. The field service software is unchanged; the techs see the same job tickets they always have.
The whole layer is one TypeScript service plus a Zapier-compatible webhook. No new app for the dispatcher to learn. No new screen for the techs.
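For a sense of how small that layer is, here is a rough sketch of the intake endpoint, assuming an Express server and a generic field-service webhook URL in an `FSM_WEBHOOK_URL` environment variable. The route name, payload fields, and the `classifyRequest` stub are illustrative, not the actual production code.

```ts
// Sketch only: route name, payload fields, and the classifier stub are illustrative.
import express from "express";

const app = express();
app.use(express.json());

// Stand-in for the rule-then-LLM classifier sketched above.
async function classifyRequest(record: { description: string }) {
  return { severity: "routine" as const, reason: "stub: model call omitted" };
}

// All three channels (phone transcription, web form, booking widget) post here.
app.post("/intake", async (req, res) => {
  const record = {
    channel: String(req.body.channel ?? "web_form"),
    description: String(req.body.description ?? ""),
    zip: String(req.body.zip ?? ""),
    receivedAt: new Date().toISOString(),
  };

  const suggestion = await classifyRequest(record);

  // Forward to the existing field service software as a plain webhook call;
  // the dispatcher sees the suggestion in the queue view she already uses.
  await fetch(process.env.FSM_WEBHOOK_URL as string, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...record, suggestion }),
  });

  res.status(202).json({ ok: true });
});

app.listen(3000);
```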
What we measured
We measured three things over the first 30 days: same-day closure rate (closed jobs divided by total same-day-eligible requests), median triage time per call, and how often a job that the AI flagged 'routine' turned out to be an emergency once a tech was on site. The first two improved as expected. The third is the one we watched closely; misclassification of an emergency as routine was the failure mode that would have killed the project. In 612 calls, the AI under-classified severity twice. Both were caught at dispatcher review before a truck was assigned.
The dispatcher's own override rate stabilized at about 9 percent by week three, which the company treated as a healthy band: too low and she's rubber-stamping; too high and the AI isn't earning its place.
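A sketch of the weekly review math, assuming triage outcomes can be exported as flat records. The field names and the 5 to 15 percent "healthy" override band are assumptions for illustration; the engagement's observed rate settled near 9 percent.

```ts
// Sketch only: field names and the 5-15% "healthy" override band are assumptions.
type Sev = "emergency" | "urgent" | "routine";

interface TriageOutcome {
  sameDayEligible: boolean;
  closedSameDay: boolean;
  aiSeverity: Sev;          // what the classifier suggested
  dispatcherSeverity: Sev;  // what the dispatcher actually set
  onSiteSeverity?: Sev;     // what the tech found once on site
}

const ratio = (n: number, d: number) => (d === 0 ? 0 : n / d);

function weeklyReview(outcomes: TriageOutcome[]) {
  const eligible = outcomes.filter(o => o.sameDayEligible);
  const closureRate = ratio(eligible.filter(o => o.closedSameDay).length, eligible.length);

  const overrides = outcomes.filter(o => o.aiSeverity !== o.dispatcherSeverity);
  const overrideRate = ratio(overrides.length, outcomes.length);

  // The failure mode that would have killed the project: AI said "routine",
  // the tech found an emergency. These records feed the weekly tuning review.
  const underClassified = outcomes.filter(
    o => o.aiSeverity === "routine" && o.onSiteSeverity === "emergency"
  );

  return {
    closureRate,
    overrideRate,
    overrideRateHealthy: overrideRate >= 0.05 && overrideRate <= 0.15, // assumed band
    underClassified,
  };
}
```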
Where AI didn’t help
We considered automating the dispatch decision itself - having the AI assign trucks based on location, skill mix, and load. We did not. Truck assignment depends on which technician is best at which equipment brand, which customers a tech has worked with before, and a dozen unwritten preferences the dispatcher carries. None of that lives in a system. Forcing it into a model would have produced confidently wrong assignments and slowly eroded customer relationships. The dispatcher kept the assignment decision; the AI kept the triage decision.
We also passed on a customer-facing chatbot. Homeowners with a broken furnace want a human voice on the phone, not a chat widget. The intake form is for the web channel only.
What we used
- Multi-channel intake normalizer (custom, TypeScript)
- LLM-based severity / system / preference classifier with explicit-rule overrides
- Webhook into existing field service management software
- Weekly override-review loop for classifier tuning
This case study describes a representative engagement. The company’s identifying details have been anonymized at their request. Outcome figures reflect that engagement’s actual measurements over the first 30 days post-launch.
Have a workflow like this?
The fastest way to find out if a similar approach fits your situation is a 30-minute call. No prep required.
