Case study · Home Services · Tool Setup
How a regional HVAC company stopped losing same-day calls with AI-assisted dispatch triage
Same-day call closure rose from 41 percent to 78 percent within 30 days of launch. Same dispatcher, same trucks, same software.
- Same-day closure rate: 41% to 78%
- Dispatcher triage time: 4 min to 45 sec per call
- No-heat / no-cool miss rate: cut by 71%
Measured across 612 inbound service requests in the 30 days post-launch.
The dispatcher reviews the AI-suggested priority and either accepts or overrides it.
Critical-priority jobs that previously slipped to next-day are now correctly flagged on intake.
The situation
A residential HVAC company running 14 service trucks across two metro areas was losing same-day work in the most frustrating way: their dispatcher couldn't triage the inbound call queue fast enough during peak season. Calls came in by phone, web form, and a third-party booking widget, all routed through a single coordinator. The coordinator had to read each request, judge severity (no-heat in February vs. routine maintenance), check truck availability, and route to a tech.
The result was that critical jobs - elderly customers with no heat, families with no cooling on a 95-degree day - sometimes sat in the queue behind routine work because the coordinator was doing five things at once. Customers who didn't get a same-day call back called a competitor. The company tracked this loss but couldn't fix it without either adding staff or adopting a heavier dispatch system that the techs would resist.
What we built
We did not replace the dispatcher. We built a thin layer in front of her. Inbound requests from all three channels normalize into a single intake record. An LLM with a tightly scoped prompt classifies each request on three axes: severity (emergency / urgent / routine), system type (heat / cool / dual / water heater / other), and customer-stated preference (today / this week / flexible). Severity uses explicit rules - 'no heat' and 'no cool' with weather thresholds always escalate; everything else uses the model's judgment.
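A minimal sketch of that rule-first step, in TypeScript. The type names, the regexes, and the 40°F / 90°F thresholds below are illustrative placeholders, not the production values.

```ts
// Sketch only: type names, regexes, and temperature thresholds are placeholders.
type Severity = "emergency" | "urgent" | "routine";
type SystemType = "heat" | "cool" | "dual" | "water_heater" | "other";
type Preference = "today" | "this_week" | "flexible";

interface IntakeRecord {
  channel: "phone" | "web_form" | "booking_widget";
  description: string; // customer's own words, normalized into one field
  zip: string;
  receivedAt: string;  // ISO timestamp
}

interface TriageSuggestion {
  severity: Severity;
  system: SystemType;
  preference: Preference;
  reason: string;      // the one-line justification shown in the queue view
  ruleForced: boolean; // true when an explicit rule, not the model, set severity
}

// Explicit rules run before the model: no-heat / no-cool complaints escalate
// once outdoor temperature crosses a threshold, regardless of what the model says.
function applyHardRules(
  req: IntakeRecord,
  outdoorTempF: number
): Partial<TriageSuggestion> | null {
  const text = req.description.toLowerCase();
  if (/no\s*heat/.test(text) && outdoorTempF <= 40) {
    return {
      severity: "emergency",
      system: "heat",
      reason: "No heat reported with outdoor temp at or below 40F",
      ruleForced: true,
    };
  }
  if (/no\s*(cool|a\/?c|air)/.test(text) && outdoorTempF >= 90) {
    return {
      severity: "emergency",
      system: "cool",
      reason: "No cooling reported with outdoor temp at or above 90F",
      ruleForced: true,
    };
  }
  return null; // nothing matched: fall through to the model's judgment
}
```

Anything the rules don't catch falls through to the model, and the `ruleForced` flag tells the dispatcher why a job was escalated without her having to guess.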
The classifier writes its suggestion - plus a one-line reason - into the dispatcher's existing queue view. She accepts, overrides, or escalates. Overrides feed back into a weekly review prompt so we can tune the classifier on the calls it got wrong. The field service software is unchanged; the techs see the same job tickets they always have.
The whole layer is one TypeScript service plus a Zapier-compatible webhook. No new app for the dispatcher to learn. No new screen for the techs.
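For a sense of how small that layer is, here is a rough sketch of the intake endpoint, assuming an Express server and a generic field-service webhook URL in an `FSM_WEBHOOK_URL` environment variable. The route name, payload fields, and the `classifyRequest` stub are illustrative, not the actual production code.

```ts
// Sketch only: route name, payload fields, and the classifier stub are illustrative.
import express from "express";

const app = express();
app.use(express.json());

// Stand-in for the rule-then-LLM classifier sketched above.
async function classifyRequest(record: { description: string }) {
  return { severity: "routine" as const, reason: "stub: model call omitted" };
}

// All three channels (phone transcription, web form, booking widget) post here.
app.post("/intake", async (req, res) => {
  const record = {
    channel: String(req.body.channel ?? "web_form"),
    description: String(req.body.description ?? ""),
    zip: String(req.body.zip ?? ""),
    receivedAt: new Date().toISOString(),
  };

  const suggestion = await classifyRequest(record);

  // Forward to the existing field service software as a plain webhook call;
  // the dispatcher sees the suggestion in the queue view she already uses.
  await fetch(process.env.FSM_WEBHOOK_URL as string, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...record, suggestion }),
  });

  res.status(202).json({ ok: true });
});

app.listen(3000);
```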
What we measured
We measured three things over the first 30 days: same-day closure rate (closed jobs divided by total same-day-eligible requests), median triage time per call, and how often a job that the AI flagged 'routine' turned out to be an emergency once a tech was on site. The first two improved as expected. The third is the one we watched closely; misclassification of an emergency as routine was the failure mode that would have killed the project. In 612 calls, the AI under-classified severity twice. Both were caught at dispatcher review before a truck was assigned.
The dispatcher's own override rate stabilized at about 9 percent by week three, which the company treated as a healthy band: too low and she's rubber-stamping; too high and the AI isn't earning its place.
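A sketch of the weekly review math, assuming triage outcomes can be exported as flat records. The field names and the 5 to 15 percent "healthy" override band are assumptions for illustration; the engagement's observed rate settled near 9 percent.

```ts
// Sketch only: field names and the 5-15% "healthy" override band are assumptions.
type Sev = "emergency" | "urgent" | "routine";

interface TriageOutcome {
  sameDayEligible: boolean;
  closedSameDay: boolean;
  aiSeverity: Sev;          // what the classifier suggested
  dispatcherSeverity: Sev;  // what the dispatcher actually set
  onSiteSeverity?: Sev;     // what the tech found once on site
}

const ratio = (n: number, d: number) => (d === 0 ? 0 : n / d);

function weeklyReview(outcomes: TriageOutcome[]) {
  const eligible = outcomes.filter(o => o.sameDayEligible);
  const closureRate = ratio(eligible.filter(o => o.closedSameDay).length, eligible.length);

  const overrides = outcomes.filter(o => o.aiSeverity !== o.dispatcherSeverity);
  const overrideRate = ratio(overrides.length, outcomes.length);

  // The failure mode that would have killed the project: AI said "routine",
  // the tech found an emergency. These records feed the weekly tuning review.
  const underClassified = outcomes.filter(
    o => o.aiSeverity === "routine" && o.onSiteSeverity === "emergency"
  );

  return {
    closureRate,
    overrideRate,
    overrideRateHealthy: overrideRate >= 0.05 && overrideRate <= 0.15, // assumed band
    underClassified,
  };
}
```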
Where AI didn’t help
We considered automating the dispatch decision itself - having the AI assign trucks based on location, skill mix, and load. We did not. Truck assignment depends on which technician is best at which equipment brand, which customers a tech has worked with before, and a dozen unwritten preferences the dispatcher carries. None of that lives in a system. Forcing it into a model would have produced confidently wrong assignments and slowly eroded customer relationships. The dispatcher kept the assignment decision; the AI kept the triage decision.
We also passed on a customer-facing chatbot. Homeowners with a broken furnace want a human voice on the phone, not a chat widget. The intake form is for the web channel only.
What we used
- Multi-channel intake normalizer (custom, TypeScript)
- LLM-based severity / system / preference classifier with explicit-rule overrides
- Webhook into existing field service management software
- Weekly override-review loop for classifier tuning
This case study describes a representative engagement. The company’s identifying details have been anonymized at their request. Outcome figures reflect that engagement’s actual measurements over the first 30 days post-launch.
Have a workflow like this?
The fastest way to find out if a similar approach fits your situation is a 30-minute call. No prep required.
