SIP phones have a number of ways to handle DTMF (i.e. tone dialling).

DTMF was really designed for audio connections, and so does not survive compression that well. Thankfully one can use A-law (G.711) everywhere these days, so DTMF is fine in-band - i.e. just as audio. So that is the simplest way to handle it, in theory.

However, as SIP can use all sorts of compressed codecs, there are different ways to do it - a common one being to use "telephone-events" (RFC 4733, which replaced RFC 2833). This is a separate payload type that says "this is a DTMF digit", but it is still sent in the RTP stream in place of the audio.
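For anyone who has not looked inside one, a telephone-event payload is tiny. The following is a minimal Python sketch of decoding the 4-byte RFC 4733 event layout (event code, end bit, volume, duration); the function and class names are made up for illustration and are not from the FireBrick or any particular library.

```python
import struct
from dataclasses import dataclass

# RFC 4733 event codes 0-15 are the DTMF keys, in this order
DTMF_KEYS = "0123456789*#ABCD"


@dataclass
class TelephoneEvent:
    event: int     # 0-9 digits, 10 '*', 11 '#', 12-15 'A'-'D'
    end: bool      # E bit: set on the final packet(s) of the event
    volume: int    # power level as -dBm0, 0..63
    duration: int  # in RTP timestamp units since the event started

    @property
    def key(self) -> str:
        return DTMF_KEYS[self.event] if self.event < 16 else "?"


def parse_telephone_event(payload: bytes) -> TelephoneEvent:
    """Decode the first 4 bytes of a telephone-event RTP payload (hypothetical helper)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return TelephoneEvent(
        event=event,
        end=bool(flags & 0x80),  # top bit is E
        volume=flags & 0x3F,     # low 6 bits; bit 6 is reserved
        duration=duration,
    )
```

For example, the bytes `05 8A 03 20` decode as key "5", end bit set, volume 10 and a duration of 800 timestamp units (100ms at 8kHz).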

The problem is when you bridge something that does understand telephone-events to something that does not.

The answer, which I coded this morning on the FireBrick, is to take each 20ms DTMF "message" and turn it into 20ms of actual tone. As they say, simples!
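To show roughly what "turn it into 20ms of actual tone" involves, here is a sketch in Python (not the FireBrick code). It assumes 8kHz A-law on the outgoing leg, so 20ms is 160 samples, uses the standard DTMF row/column frequency pairs, and picks an arbitrary amplitude. A running sample counter keeps the sine phase continuous across consecutive 20ms frames; the A-law encoder follows the classic G.711 reference algorithm.

```python
import math

# Standard DTMF (low, high) frequency pairs in Hz
DTMF_FREQS = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477), "A": (697, 1633),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477), "B": (770, 1633),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477), "C": (852, 1633),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477), "D": (941, 1633),
}

SAMPLE_RATE = 8000                         # assumed G.711 sample rate
FRAME_SAMPLES = SAMPLE_RATE * 20 // 1000   # 160 samples per 20ms frame

# Upper bound of each A-law segment for a 13-bit magnitude
SEG_END = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)


def linear_to_alaw(pcm: int) -> int:
    """Encode one signed 16-bit linear sample as an A-law byte (G.711)."""
    pcm >>= 3  # 16-bit to 13-bit
    if pcm >= 0:
        mask = 0xD5
    else:
        mask = 0x55
        pcm = -pcm - 1
    seg = next((i for i, end in enumerate(SEG_END) if pcm <= end), 8)
    if seg >= 8:
        return 0x7F ^ mask  # clip to maximum magnitude
    aval = seg << 4
    aval |= ((pcm >> 1) if seg < 2 else (pcm >> seg)) & 0x0F
    return aval ^ mask


def dtmf_frame(key: str, start_sample: int = 0, amplitude: int = 8000) -> bytes:
    """Return 20ms of A-law DTMF tone for `key` (amplitude is an assumed level)."""
    low, high = DTMF_FREQS[key]
    out = bytearray()
    for n in range(start_sample, start_sample + FRAME_SAMPLES):
        t = n / SAMPLE_RATE
        s = amplitude * (math.sin(2 * math.pi * low * t)
                         + math.sin(2 * math.pi * high * t))
        out.append(linear_to_alaw(int(round(s))))
    return bytes(out)
```

For a key held over several packets, one would call `dtmf_frame(key, start_sample=160 * k)` for the k-th frame so the tone does not click at each 20ms boundary.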

Well, it works! I am quite pleased. I can bridge a device like a SNOM, sending DTMF as telephone-events, to a carrier that does not handle them, call an annoying call gate system, press keys, and it works. Yay!

So, we tried with a Gigaset DECT system. Does not work. Arrrrg. I have a packet dump and I can see why.

It sends the audio (i.e. from the microphone) at the same time as sending the DTMF signalling!

What is worse is that the DTMF is sent 20ms in the past. If the DTMF was sent first and then audio with the same timestamp, I could simply discard duplicates and that would work. But no, there is audio, and then DTMF timestamped 20ms before the audio just sent. Then there is audio and then DTMF again, and so on.
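The "discard duplicates" idea, and why the Gigaset ordering defeats it, can be shown with a toy Python model. This is a simplification made up for illustration: each packet is just tagged with the 20ms slot it covers, and the first packet for a slot wins. Real telephone-event packets carry the event start timestamp plus a growing duration, so working out the slot is more involved than this sketch shows.

```python
from typing import List, Tuple

# Toy packet: (kind, slot), kind is "audio" or "dtmf",
# slot is the RTP timestamp of the 20ms period it covers (hypothetical model)
Packet = Tuple[str, int]


def first_wins(packets: List[Packet]) -> List[Packet]:
    """Forward the first packet seen for each slot, dropping later duplicates."""
    seen = set()
    out: List[Packet] = []
    for kind, slot in packets:
        if slot in seen:
            continue  # this slot has already been forwarded
        seen.add(slot)
        out.append((kind, slot))
    return out
```

If the DTMF arrives first, e.g. `[("dtmf", 0), ("audio", 0), ("dtmf", 160), ("audio", 160)]`, only the DTMF packets survive. With the ordering seen in the packet dump, `[("audio", 0), ("audio", 160), ("dtmf", 0), ("audio", 320), ("dtmf", 160)]`, every DTMF packet refers to a slot whose audio has already gone out, so all of them are discarded.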

Given that the Gigaset can, instead, send SIP INFO messages for key presses, and seems to default to that, I may just make that work rather than trying to bodge something for DTMF on the RTP stream.

Oh what fun.


  1. Sounds like it can be summarised as "legacy systems piled on legacy systems make the engineer's life awkward".

    I wonder if we're ever going to reach a point where we can get rid of legacy stuff? After all, in the TV world, we have extended legacy "drop frame" timings (59.94 Hz, 29.97 Hz and 23.976 Hz refresh) from the SDTV world (where they work around problems with 1940s designs of B&W TVs) to the HD world.

    Our next chance to fix this bit of legacy will be when we replace HD with future formats. I wonder when our chance will come to fix all the weirdness in SIP...

  2. @Simon Farnsworth when Skype takes over the world? :P


