Saturday, 6 July 2013


We are trying to get more and more customers on the new VoIP server at A&A.

Most customers with a SIP phone or SIP client are just working, which is good news.

We are, however, trying to move asterisk customers over as well. This seemed to be working, but we are hitting some issues. We're not sure yet whether we should use its registration feature to act like a SIP phone, configure direct incoming calls from our call servers, or configure incoming authenticated calls. Getting asterisk to handle incoming calls is proving to be a real challenge. One of the issues with SIP as a protocol is that the INVITE has no setting to say "I can authenticate", so you have to work out from the headers whether to challenge the caller or not. Getting that right in asterisk is definitely a trick, especially when the call server has multiple IP addresses in DNS. So we are still working on finding the right config for that. The plan is to set up asterisk on the bench and work it out next week.
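One way to make that challenge decision, sketched here as an assumption rather than A&A's or asterisk's actual logic, is to trust INVITEs whose source IP matches any of the call server's DNS addresses and challenge everything else. The Python below uses hypothetical helper names (resolve_all, should_challenge). The point it illustrates is that every address has to be collected, not just the first DNS answer.

```python
import ipaddress
import socket
from typing import Iterable, Set


def resolve_all(hostname: str, port: int = 5060) -> Set[str]:
    """Return every address the call server's hostname resolves to.

    A server with several A/AAAA records may send calls from any of them,
    so matching against only the first answer is not enough.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_DGRAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def should_challenge(source_ip: str, trusted: Iterable[str]) -> bool:
    """Return True if an incoming INVITE should get a 401 challenge.

    The INVITE carries nothing saying "I can authenticate", so this sketch
    trusts requests from a known call server address and challenges the rest.
    An empty trusted list means everything gets challenged.
    """
    src = ipaddress.ip_address(source_ip)
    # Compare parsed addresses, not strings, so that "2001:db8::1" and
    # "2001:0db8:0:0::1" count as the same host.
    return all(src != ipaddress.ip_address(a) for a in trusted)
```

In use this would be something like `should_challenge(src, resolve_all("voip.example.net"))`, where the hostname is a placeholder.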

However, even when we work around that, we ran into a snag. Using asterisk and SNOM phones, we have had an issue this week where the SNOM phone could not put people on hold or transfer calls.

Mike (my customer) and I have worked on this several times now, so I said we would find it one way or another this morning. After about 3 hours on it, we finally solved it.

The symptoms were complicated...
  1. This did not go wrong on a fresh, clean config on asterisk, but we needed to work with an existing large and complex config.
  2. This did not go wrong with all snom phones that are connected. It seemed it might be s/w version related, as older software seemed to work, but that was not 100% clear.
  3. This did not go wrong with all carriers. Calls from the old A&A call server, and from other carriers worked to the same phones.
  4. This did not go wrong with outgoing calls, even where they go via A&A's new servers.
  5. This did not go wrong on any internal calls on the asterisk box.
  6. It did not even go wrong for all numbers on the A&A call server; some worked, mostly the direct dial-in numbers.
  7. When you press hold, the phone shows something like "failed" on the screen and drops the call, but the caller gets silence for a couple of minutes before the call clears.
We have finally found it. Well, found what causes it. It looks like an asterisk bug which has always been there.

After chasing a lot of wild red herrings, the final clue came when Mike sent me SIP traces from the phone. I wanted a trace of the failure that caused the phone to show an error. Mike sent me traces several times, and I kept berating him, saying that they were just traces of the call set-up, not the hold. Eventually we realised that the snom did not even try to put the call on hold, so there was no trace of that! I was then looking at Supported: headers (not needed for hold), wondering whether OPTIONS could be an issue, and all sorts. Holding a call should just work, as it is simply a re-INVITE with adjusted SDP. It was a while before I spotted the problem on the trace of the call from asterisk to the SNOM, comparing the headers for a call that worked and one that did not. It should have hit me sooner.
(IPs and numbers changed to protect the innocent).

Working:
Contact: <sip:0123456789@192.168.1.1>

Not working:
Contact: <sip:0123456789" <sip:0123456789@192.168.1.1>

How the hell did I not spot that before? I am going SIP blind, I think. If the Contact: is broken, it is no wonder the snom cannot send the INVITE to put the call on hold, or a BYE, or anything. Poor thing. It is also possible that a different version of snom code (an older one, as it happens) might have managed to parse that mangled header.
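Under RFC 3261 the phone takes the Contact: URI from the call set-up as the remote target for later requests in the dialog, such as the hold re-INVITE or the BYE, so a Contact it cannot parse leaves it nowhere to send them. Below is a minimal Python sketch of splitting a From:, To: or Contact: value into display-name and URI. parse_name_addr is a hypothetical helper and a simplification of the RFC grammar, not what the snom or asterisk actually run, but it is enough to tell the working header from the mangled one.

```python
from typing import List, Optional, Tuple


def parse_name_addr(value: str) -> Tuple[Optional[str], str]:
    """Split a From:/To:/Contact: value into (display_name, uri).

    Simplified RFC 3261 name-addr / addr-spec handling. Raises ValueError
    if the value is malformed.
    """
    s = value.strip()
    display: Optional[str] = None
    if s.startswith('"'):
        # Quoted-string display-name: a colon or space inside is fine.
        chars: List[str] = []
        i = 1
        while True:
            if i >= len(s):
                raise ValueError("unterminated quoted display-name")
            c = s[i]
            if c == "\\":
                if i + 1 >= len(s):
                    raise ValueError("dangling backslash")
                chars.append(s[i + 1])  # keep the escaped character itself
                i += 2
            elif c == '"':
                i += 1
                break
            else:
                chars.append(c)
                i += 1
        display = "".join(chars)
        rest = s[i:].lstrip()
    elif "<" in s:
        # Unquoted display-name: plain tokens before the '<' (may be empty).
        lt = s.index("<")
        token_part = s[:lt].strip()
        if '"' in token_part:
            raise ValueError("stray quote in display-name")
        display = token_part or None
        rest = s[lt:]
    else:
        # Bare addr-spec, optionally followed by ;parameters.
        uri = s.split(";", 1)[0].strip()
        if not uri or any(c in uri for c in ' "<>'):
            raise ValueError("malformed addr-spec")
        return None, uri
    if not rest.startswith("<"):
        raise ValueError("expected '<' after display-name")
    end = rest.find(">")
    if end == -1:
        raise ValueError("missing '>'")
    uri = rest[1:end]
    # A quote, space or second '<' inside the brackets means the header is mangled.
    if not uri or any(c in uri for c in ' "<'):
        raise ValueError("malformed URI inside <...>")
    tail = rest[end + 1:].strip()
    if tail and not tail.startswith(";"):
        raise ValueError("unexpected text after '>'")
    return display, uri
```

With the headers above, the working Contact gives `(None, "sip:0123456789@192.168.1.1")`, the quoted From: gives `("DK:0123456789", "sip:0123456789@81.187.30.111")`, and the broken Contact raises ValueError, because a quote and a second `<` end up inside the angle brackets.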

So, I tracked down the original call from A&A to asterisk, and it looks like the display-name part of the From: or Remote-Party-ID: was the issue. This is passed on with the calling number, i.e. in the From: header when the call is sent to the phone, or at least it should be! When there was no display-name part, it worked.

Working:
From: <sip:0123456789@172.16.1.1>

Not working:
From: "DK:0123456789" <sip:0123456789@81.187.30.111>

Some experimentation shows it is the colon in the display-name. The RFC allows this in a quoted display-name, unescaped, so it is 100% valid. We use a prefix tag like this ("DK" in this case) for people wanting to see which hunt group was called (hence DDI calls, which have no tag, working). I can only assume we did not use a colon on the old call server. The fact that the clean SIP config on asterisk works suggests it is the setting of the CLI or CLI name in the config that fails to process a name with a colon in it, generating a display-name which asterisk then fails to escape correctly when sending the call to the snom. Getting rid of the colon fixes it.
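For comparison, emitting a display-name correctly is not hard. Inside an RFC 3261 quoted-string, printable text only needs a backslash before a double quote or a backslash, so "DK:0123456789" passes through untouched. The Python below is a minimal sketch with hypothetical helper names (quote_display_name, name_addr). It rejects CR/LF outright and backslash-escapes other control characters. It is not asterisk's code and makes no claim about where asterisk actually goes wrong.

```python
def quote_display_name(name: str) -> str:
    """Render a display-name as an RFC 3261 quoted-string."""
    if "\r" in name or "\n" in name:
        raise ValueError("CR/LF cannot appear in a display-name")
    out = []
    for ch in name:
        code = ord(ch)
        # '"' and '\' always need a backslash; other control characters
        # (except tab, which counts as whitespace) are only allowed as quoted-pairs.
        # Everything else, including ':' and spaces, goes through as-is.
        if ch in '"\\' or (code < 0x20 and ch != "\t") or code == 0x7F:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def name_addr(display_name: str, uri: str) -> str:
    """Build a From:/Contact: value such as '"DK:0123456789" <sip:...>'."""
    if not display_name:
        return f"<{uri}>"
    return f"{quote_display_name(display_name)} <{uri}>"
```

The workaround of using "DK 0123456789" comes out exactly the same shape, just with a space where the colon was.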

Also working:
From: "DK 0123456789" <sip:0123456789@81.187.30.111>

Causing:
Contact: "DK 0123456789" <sip:0123456789@192.168.1.1>

Arrrrrg!

1 comment:

  1. No complaints for an hour so it must be fixed :)
