Building a DTMF Hand-Raise System for Twilio Conference Calls

These articles are AI-generated summaries. Please check the original sources for full details.

DTMF Hand-Raise System

The DTMF Hand-Raise System manages interactive conference calls for 5–7 participants and a host. A critical constraint is that Twilio’s TwiML specification does not allow native DTMF detection while a caller is active inside a Conference block.

Why This Matters

The intuitive model, wrapping the conference in a Gather so keypresses are captured during the call, does not work: nesting a Conference inside a Gather tag is structurally invalid, and the Twilio Node.js SDK throws a TypeError immediately. Inside a conference, keypad digits are transmitted only as audio tones, so interactive features need a workaround. The two options are a REST API redirect, which pulls the caller out of the conference to collect input, and server-side audio processing via Media Streams, which detects the tones without breaking the communication flow.

Key Insights

  • Twilio’s TwiML schema enforces strict parent-child rules where Dial is the only valid parent for a Conference noun, prohibiting direct Gather usage.
  • The statusCallbackEvent attribute has no separate ‘unmute’ value; subscribing to ‘mute’ reports both transitions, so the handler should read the Muted field, which arrives as the string ‘true’ or ‘false’, rather than rely on the event name alone.
  • Real-time DTMF detection can be achieved without audio interruption by piping 8 kHz μ-law audio from Media Streams to a WebSocket server running the Goertzel algorithm.
  • Pattern B (redirecting a live call through the REST API) disconnects the participant’s audio for 3–5 seconds while the call is pulled from the conference to a separate TwiML URL for input collection.
  • The Goertzel-based detector in Node.js requires custom debouncing logic to correctly interpret multi-digit signals like ‘*1’ from raw audio streams.
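
The Pattern B redirect described above can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: it assumes `client` is a Twilio REST client created elsewhere (for example with `require('twilio')(accountSid, authToken)`), and the `/voice/mid-call-gather` route and the `redirectToGather` name are hypothetical.

```javascript
// Sketch: pull one participant out of the conference to a Gather prompt.
// `client` is assumed to be a Twilio REST client built by the caller.
// '/voice/mid-call-gather' is a hypothetical route, not taken from the article.
function redirectToGather(client, callSid, baseUrl) {
  const url = new URL('/voice/mid-call-gather', baseUrl).toString();
  // Updating a live call with a new URL makes Twilio fetch fresh TwiML for it.
  // The caller leaves the conference until that TwiML dials them back in,
  // which is the source of the audio gap.
  return client.calls(callSid).update({ url, method: 'POST' });
}
```

The TwiML served at the redirect target would need to end by dialing the caller back into the same conference, otherwise the participant stays out of the room after the prompt.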

Working Examples

Correct implementation of a status callback handler to track participant mute/unmute states.

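const express = require('express');
const app = express();
// Twilio posts webhooks as application/x-www-form-urlencoded.
app.use(express.urlencoded({ extended: false }));

// updateParticipantState and broadcastToAdmins are application-defined helpers.
app.post('/webhooks/conference', (req, res) => {
  const { StatusCallbackEvent, CallSid, Muted } = req.body;
  // Match both event names defensively; the Muted field carries the actual state.
  if (StatusCallbackEvent === 'participant-mute' || StatusCallbackEvent === 'participant-unmute') {
    const isMuted = Muted === 'true'; // form fields arrive as strings
    updateParticipantState(CallSid, { muted: isMuted });
    broadcastToAdmins({ type: isMuted ? 'participant_muted' : 'participant_unmuted', callSid: CallSid });
  }
  res.sendStatus(200);
});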

Pattern A: Using a Gather-Before-Conference window to collect DTMF input before joining the conference.

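const { VoiceResponse } = require('twilio').twiml;

app.post('/voice/incoming', (req, res) => {
  const twiml = new VoiceResponse();
  // numDigits: 2 allows '*1'; '*' is collected as a digit because the
  // default finishOnKey is '#'.
  const gather = twiml.gather({
    input: 'dtmf',
    action: '/voice/pre-join-dtmf',
    timeout: 4,
    numDigits: 2
  });
  gather.say('Press star 1 now to raise your hand, or stay on the line to join.');
  // Reached only when the Gather times out without input.
  const dial = twiml.dial();
  dial.conference({ muted: true }, 'MainRoom');
  res.type('text/xml').send(twiml.toString());
});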
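
The article does not show what the `/voice/pre-join-dtmf` action does, so the following is only a sketch under assumptions: the `raisedHands` set and the `handlePreJoinDigits` name are hypothetical, and the TwiML is written out as a plain string to keep the example self-contained. The one point it is meant to illustrate is that once a Gather submits to its action URL, the verbs that followed it in the original response are no longer executed, so the action handler has to join the conference itself.

```javascript
// Hypothetical in-memory record of raised hands, keyed by CallSid.
const raisedHands = new Set();

// Builds the TwiML body for the Gather action request.
// `params` is the parsed form body Twilio posts (it includes CallSid and Digits).
function handlePreJoinDigits(params) {
  if (params.Digits === '*1') {
    raisedHands.add(params.CallSid);
  }
  // The Dial that followed the Gather is skipped once the action URL is
  // requested, so this response joins the conference on its own.
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Dial><Conference muted="true">MainRoom</Conference></Dial>',
    '</Response>',
  ].join('\n');
}
```

In an Express app this could be wired up as a POST route on `/voice/pre-join-dtmf` that sends `handlePreJoinDigits(req.body)` with the `text/xml` content type.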

Server-side DTMF detection using Twilio Media Streams and a WebSocket server.

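// mediaWss is assumed to be a WebSocketServer from the 'ws' package;
// detector and handleDTMFDigit are application-defined.
mediaWss.on('connection', (ws) => {
  let callSid = null; // filled in from the stream's 'start' message
  ws.on('message', (raw) => {
    const msg = JSON.parse(raw);
    if (msg.event === 'start') {
      callSid = msg.start.callSid;
    } else if (msg.event === 'media') {
      const audio = Buffer.from(msg.media.payload, 'base64'); // 8 kHz mu-law bytes
      const digit = detector.detect(audio);
      if (digit) handleDTMFDigit(callSid, digit);
    }
  });
});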
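
The `detector.detect(audio)` call above is not defined in the article. One way such a detector might look is sketched below: it expands G.711 μ-law bytes to linear samples, runs the Goertzel recurrence at the eight standard DTMF frequencies, and reports a key only when one row tone and one column tone each carry a large share of the frame's energy. The function names and both thresholds (`minShare`, `minMeanSquare`) are illustrative assumptions, not tuned values.

```javascript
// Sketch of a per-frame DTMF detector for 8 kHz mu-law audio (G.711).
// Thresholds are illustrative guesses, not values from the article.
const SAMPLE_RATE = 8000;
const ROW_FREQS = [697, 770, 852, 941];
const COL_FREQS = [1209, 1336, 1477, 1633];
const KEYPAD = ['123A', '456B', '789C', '*0#D'];

// Expand one G.711 mu-law byte to a signed linear sample.
function muLawToPcm(byte) {
  const u = ~byte & 0xff;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return (u & 0x80) ? -magnitude : magnitude;
}

// Goertzel recurrence: squared magnitude of `samples` at one frequency.
function goertzelPower(samples, freq) {
  const coeff = 2 * Math.cos((2 * Math.PI * freq) / SAMPLE_RATE);
  let s1 = 0;
  let s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Index and power of the strongest frequency within a group.
function strongest(samples, freqs) {
  let best = { index: -1, power: 0 };
  freqs.forEach((freq, index) => {
    const power = goertzelPower(samples, freq);
    if (power > best.power) best = { index, power };
  });
  return best;
}

// Returns the key for one frame of linear samples, or null.
function detectFromPcm(samples, minShare = 0.3, minMeanSquare = 1e4) {
  const n = samples.length;
  let energy = 0;
  for (const x of samples) energy += x * x;
  if (n === 0 || energy / n < minMeanSquare) return null; // empty or near-silent frame
  const row = strongest(samples, ROW_FREQS);
  const col = strongest(samples, COL_FREQS);
  // For a clean, equal-amplitude dual tone each share is close to 1.
  const share = (power) => (4 * power) / (energy * n);
  if (share(row.power) < minShare || share(col.power) < minShare) return null;
  return KEYPAD[row.index][col.index];
}

// Same check for a Buffer of mu-law bytes, as delivered in a media message.
function detectFromMulaw(buffer) {
  return detectFromPcm(Array.from(buffer, (b) => muLawToPcm(b)));
}
```

Requiring both a row and a column tone is what keeps a single sustained pitch in speech or music from registering as a key. A production detector would likely add further checks, such as limits on the level difference between the two tones.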

Practical Applications

  • Host-initiated polling: Using the REST API to redirect specific callers to a Gather prompt mid-call. Pitfall: Disconnecting the user from the conference audio for several seconds, potentially missing context.
  • Seamless Hand-Raising: Implementing Media Streams for real-time server-side audio processing. Pitfall: Increased server resource usage and complexity in handling multi-digit debouncing for symbols like ‘*1’.
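
The multi-digit debouncing pitfall can be illustrated with a small state machine. A held key spans many consecutive audio frames, so a per-frame detector reports it repeatedly; the sketch below collapses each run into one press and then matches the ‘*1’ sequence. The `DtmfSequencer` class, its option names, and the frame counts are all assumptions for illustration (the idle limit of 100 frames corresponds to about 2 seconds if frames are 20 ms long).

```javascript
// Sketch: turn per-frame detections (a key or null) into presses and match '*1'.
class DtmfSequencer {
  constructor({ sequence = '*1', minFrames = 2, gapFrames = 2, maxIdleFrames = 100 } = {}) {
    this.sequence = sequence;
    this.minFrames = minFrames;         // frames a key must persist to count as a press
    this.gapFrames = gapFrames;         // silent frames that mark a key as released
    this.maxIdleFrames = maxIdleFrames; // silent frames after which partial input is dropped
    this.current = null; // key seen in the current run of frames
    this.run = 0;        // consecutive frames showing `current`
    this.silence = 0;    // consecutive frames with no key
    this.held = null;    // key already reported and not yet released
    this.buffer = '';    // presses collected so far
  }

  // Feed one frame result. Returns true on the frame that completes the sequence.
  push(key) {
    if (key === null) {
      this.current = null;
      this.run = 0;
      this.silence += 1;
      if (this.silence >= this.gapFrames) this.held = null;
      if (this.silence >= this.maxIdleFrames) this.buffer = '';
      return false;
    }
    this.silence = 0;
    this.run = key === this.current ? this.run + 1 : 1;
    this.current = key;
    // Ignore keys that are too brief or are still the same held press.
    if (this.run < this.minFrames || key === this.held) return false;
    this.held = key;
    this.buffer = (this.buffer + key).slice(-this.sequence.length);
    if (this.buffer !== this.sequence) return false;
    this.buffer = '';
    return true;
  }
}
```

One instance per call would be fed the result of each frame's detection, for example the value returned by `detector.detect(audio)`, and the hand-raise would fire only when `push` returns true.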
