Eliminating $1,250/Hour Losses: Implementing SIP Trunk Failover in VICIdial

These articles are AI-generated summaries. Please check the original sources for full details.

How to Stop Losing $1,250/Hour When Your SIP Trunk Goes Down

By default, VICIdial does not retry a call on another carrier when the primary carrier returns a 503 (Service Unavailable) response, so agents sit idle until the trunk recovers. For a 50-agent operation, that downtime costs $1,250 per hour in payroll alone, before counting lost revenue.
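The payroll figure implies a loaded cost of $25 per agent-hour ($1,250 / 50 agents). That rate is derived from the article's own numbers rather than stated in it. As a rough sketch, the same arithmetic can be scaled to outages of any length; the function name and the 12-minute outage are illustrative only.

```shell
#!/bin/bash
# Idle-payroll cost of an outage, in whole dollars.
# The $25 rate is derived from the article's figures ($1,250/hour over 50 agents);
# it is not quoted in the source.
idle_cost() {
  local agents=$1 rate_per_hour=$2 outage_minutes=$3
  # agents * hourly rate * (minutes / 60), using integer math
  echo $(( agents * rate_per_hour * outage_minutes / 60 ))
}

idle_cost 50 25 60   # a full hour for the 50-agent floor
idle_cost 50 25 12   # a hypothetical 12-minute outage
```

Even short outages add up quickly at this scale, which is why the failover should be automatic rather than manual.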

Why This Matters

Carriers are often assumed to be reliable, but in practice upstream congestion and outages are common, and VICIdial’s admin interface has no native way to route around them. Maintaining high-volume dialing throughput means moving beyond static routing to Asterisk-level cascading dialplans that distinguish trunk-level failures from end-user unavailability.

Key Insights

  • Asterisk probes peer health with the qualify=yes and qualifyfreq=30 settings, which send a SIP OPTIONS request every 30 seconds to detect unreachable peers.
  • Effective failover logic must specifically target CHANUNAVAIL and CONGESTION statuses while ignoring BUSY or NOANSWER to prevent duplicate calls.
  • Proactive monitoring of the vicidial_log can trigger an automatic shift to secondary carriers if failure rates exceed a 30% threshold within five minutes.
  • High-capacity systems with 100+ agents should utilize Kamailio as a SIP proxy for sub-second, transaction-level failover that is transparent to Asterisk.
  • The nat=force_rport,comedia setting is critical in sip.conf to prevent one-way audio and registration failures in NAT-based deployments.
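The 30% threshold check from the list above can be sketched as a small shell function. This is a minimal sketch under stated assumptions: the function name is hypothetical, and it takes plain failed/total call counts for the last five minutes as arguments. How those counts are pulled from vicidial_log is site-specific and deliberately left out.

```shell
#!/bin/bash
# Decide whether the recent carrier failure rate justifies shifting traffic.
# Inputs are counts for the last five minutes; the query that produces them
# is not shown here and depends on the local VICIdial setup.
THRESHOLD_PERCENT=30   # threshold quoted in the article

exceeds_failure_threshold() {
  local failed=$1 total=$2
  # No traffic in the window: nothing to judge, so do not fail over.
  [ "$total" -gt 0 ] || return 1
  # Test failed/total > THRESHOLD/100 without floating point.
  [ $(( failed * 100 )) -gt $(( total * THRESHOLD_PERCENT )) ]
}

# Hypothetical window: 45 failed attempts out of 120.
if exceeds_failure_threshold 45 120; then
  echo "failure rate above ${THRESHOLD_PERCENT}%: switch to secondary"
else
  echo "failure rate within tolerance: stay on primary"
fi
```

The comparison is strict, so a window sitting at exactly 30% does not trigger a switch. Returning "no failover" on an empty window avoids flapping during quiet periods.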

Working Examples

Asterisk cascading dialplan logic for trunk failover.

[outbound-failover]
exten => _1NXXNXXXXXX,1,NoOp(Outbound call to ${EXTEN} - trying primary)
exten => _1NXXNXXXXXX,n,Set(TRUNK_TRIED=primary)
exten => _1NXXNXXXXXX,n,Dial(SIP/${EXTEN}@carrier-primary,60,tT)
; Fail over only on trunk-level failures; BUSY and NOANSWER fall through to done.
exten => _1NXXNXXXXXX,n,GotoIf($["${DIALSTATUS}" = "CHANUNAVAIL"]?try_secondary)
exten => _1NXXNXXXXXX,n,GotoIf($["${DIALSTATUS}" = "CONGESTION"]?try_secondary)
exten => _1NXXNXXXXXX,n,Goto(done)
; Record which trunk carried the retry so TRUNK_TRIED stays accurate.
exten => _1NXXNXXXXXX,n(try_secondary),Set(TRUNK_TRIED=secondary)
exten => _1NXXNXXXXXX,n,Dial(SIP/${EXTEN}@carrier-secondary,60,tT)
exten => _1NXXNXXXXXX,n(done),Hangup()

External health monitor script using Asterisk CLI commands.

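#!/bin/bash
# carrier_health_monitor.sh
# Logs a CRITICAL line for any SIP peer that Asterisk reports as UNREACHABLE.
LOG_FILE=/var/log/carrier_health.log

check_carrier() {
    local name=$1
    local status
    # "sip show peer" prints a Status line driven by the qualify (OPTIONS) probe.
    status=$(asterisk -rx "sip show peer $name" | grep "Status")
    if echo "$status" | grep -q "UNREACHABLE"; then
        echo "$(date) CRITICAL: $name is UNREACHABLE" >> "$LOG_FILE"
        return 2
    fi
    return 0
}

# Check both peers named in the dialplan example.
for carrier in carrier-primary carrier-secondary; do
    check_carrier "$carrier"
done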
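The monitor above only reacts to UNREACHABLE. A slightly finer classification of the Status line is sketched below. It assumes the line shapes chan_sip normally prints for a qualified peer ("OK (23 ms)", "LAGGED (2500 ms)", "UNREACHABLE", "UNKNOWN", "Unmonitored"). The function name and the health labels are hypothetical, and the sample lines are made up for illustration.

```shell
#!/bin/bash
# Map one "Status" line from `sip show peer` to a coarse health label.
# Assumed input shapes (chan_sip): "Status : OK (23 ms)",
# "Status : LAGGED (2500 ms)", "Status : UNREACHABLE".
classify_peer_status() {
  local line=$1
  case "$line" in
    *UNREACHABLE*) echo "down" ;;      # OPTIONS probe got no reply
    *LAGGED*)      echo "degraded" ;;  # reply arrived, but slower than the qualify limit
    *OK*)          echo "up" ;;
    *)             echo "unknown" ;;   # UNKNOWN, Unmonitored, or unexpected text
  esac
}

# Illustrative inputs; real lines come from: asterisk -rx "sip show peer NAME"
classify_peer_status "  Status       : OK (23 ms)"
classify_peer_status "  Status       : LAGGED (2500 ms)"
classify_peer_status "  Status       : UNREACHABLE"
```

Treating a lagged peer as degraded rather than down leaves room to alert early without moving traffic on a single slow probe.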

Practical Applications

  • Use Case: Asterisk-level failover for call centers; Pitfall: Failing over on BUSY status redials the same lead through a second carrier, producing duplicate calls.
  • Use Case: SIP OPTIONS probing via sip.conf; Pitfall: Setting qualifyfreq too low can lead carriers to rate-limit your IP, which causes false-positive outages.
  • Use Case: iptables-based failover testing; Pitfall: Testing failover on production traffic without a dedicated test campaign can disrupt live agent sessions.
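One way to approach the iptables-based test is to drop outbound SIP signalling to the primary carrier and watch whether calls on a test campaign cascade to the secondary. The sketch below only prints the commands so they can be reviewed before anyone runs them as root. The address 203.0.113.10 is a documentation placeholder for the carrier's signalling IP, UDP port 5060 is an assumption, and the function name is hypothetical.

```shell
#!/bin/bash
# Print (do not run) the iptables commands that simulate a dead carrier.
# 203.0.113.10 is a placeholder address; UDP 5060 is assumed for signalling.
simulate_outage_cmds() {
  local carrier_ip=$1 port=${2:-5060}
  # First line blocks outbound SIP to the carrier, second line removes the rule.
  echo "iptables -I OUTPUT -d ${carrier_ip} -p udp --dport ${port} -j DROP"
  echo "iptables -D OUTPUT -d ${carrier_ip} -p udp --dport ${port} -j DROP"
}

simulate_outage_cmds 203.0.113.10
```

Keeping the block and restore commands together makes it harder to leave the rule in place after the test finishes.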
