Saturday, 26 July 2014

SIP call flow diagrams

We have been working on ways to help us help customers debug SIP issues. SIP (Session Initiation Protocol) is what is commonly used for VoIP (Voice over IP), and all of our VoIP services use SIP.

One of the challenges is training staff to understand SIP properly. Following a SIP trace can be tricky at the best of times. Some tools, like Wireshark, do a good job of showing the call flow for a SIP call from a packet dump. But we needed something more generic that would let us look at the SIP messages we capture for diagnostics, and make it simple for staff to help a customer.

The real trick is presenting an overview of the whole call flow. We finally managed to get something pretty neat this week, thanks to some work by my son on the front end, which looks quite good. You can mouse over any SIP message to see it in full.

Here is a screen shot. You'll need to blow it up to see the detail.

Given that this was just one junk call I got this morning, which I answered only to tell them to go away, it is quite impressive that it involves 8 separate (colour-coded) call legs. It also involves 8 devices that we can see (plus many more before the call reaches us, and more where it goes on to try calling my mobile). In this instance the call tried two SNOM phones and a SIP2SIM connection, and set up call recording in two separate places (one as part of the A&A service, and one on our call server in the office). Just getting the system to work out which packets are related to the overall call flow is a challenge in the first place!
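Working out which packets belong together starts with the Call-ID header, which RFC 3261 requires to be the same on every request and response within one dialog. The Python below is a minimal sketch of that first step only, assuming each captured message is available as a plain string; the function names are hypothetical and this is not our actual code. It accepts the compact form `i:` as well as `Call-ID:`, since SIP allows either. Linking the separate legs into one overall call is the harder part, because a server sitting in the middle of the call typically starts each new leg with its own Call-ID, so that step needs extra information not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Subset of RFC 3261 compact header names, mapped to their long forms.
COMPACT = {"i": "call-id", "f": "from", "t": "to", "v": "via",
           "m": "contact", "l": "content-length"}

def header(msg: str, name: str) -> Optional[str]:
    """Return the first value of header `name`, accepting long or compact form."""
    want = name.lower()
    for line in msg.splitlines()[1:]:      # skip the request/status line
        if not line.strip():
            break                          # blank line ends the headers
        key, sep, value = line.partition(":")
        if not sep:
            continue                       # not a header line
        key = key.strip().lower()
        if COMPACT.get(key, key) == want:
            return value.strip()
    return None

def group_into_legs(messages: List[str]) -> Dict[str, List[str]]:
    """Group captured messages into call legs keyed by Call-ID, in capture order."""
    legs: Dict[str, List[str]] = defaultdict(list)
    for msg in messages:
        call_id = header(msg, "Call-ID")
        if call_id is not None:
            legs[call_id].append(msg)
    return dict(legs)
```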

I think this sort of thing will help staff understand SIP much more clearly and understand what is happening when customers have SIP related issues.


P.S. Before you ask: no, this call flow trace is not available to customers. Managing security when collecting and showing traces is proving to be a bit of a challenge, as a call normally involves at least two separate parties and could involve many more. And no, we have not had a data retention notice, so we won't be storing these traces for any length of time; they are for diagnostics only.

Explaining the diagram:-

The red on the left is the incoming call from an external call server, for a call from the PSTN to b.voiceless. You can see the normal sequence of INVITE, 100 Trying, 180 Ringing, 200 OK and ACK to establish the call, and then BYE and 200 OK when the call was finally hung up.
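The diagram itself is a ladder: one column per device, one arrow per message. Purely as an illustration (our real display is a browser front end, not this), a two-column text ladder for the red leg could be sketched in Python as below. The `ladder` function and the `external` device name are hypothetical. The BYE is shown coming from b.voiceless because the hang-up started at my phone.

```python
from typing import List, Tuple

def ladder(left: str, right: str, flow: List[Tuple[str, str]]) -> str:
    """Render a two-party SIP ladder as text.

    Each entry in `flow` is (sender, label); the sender must be `left` or
    `right`, and the arrow points away from the sender.
    """
    width = max(len(label) for _, label in flow) + 6   # room for dashes either side
    rows = []
    for sender, label in flow:
        if sender not in (left, right):
            raise ValueError(f"unknown sender {sender!r}")
        body = f" {label} ".center(width - 1, "-")
        arrow = body + ">" if sender == left else "<" + body
        rows.append(f"{left} {arrow} {right}")
    return "\n".join(rows)

red_leg = [
    ("external", "INVITE"), ("b.voiceless", "100 Trying"),
    ("b.voiceless", "180 Ringing"), ("b.voiceless", "200 OK"),
    ("external", "ACK"), ("b.voiceless", "BYE"), ("external", "200 OK"),
]
print(ladder("external", "b.voiceless", red_leg))
```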

The next part is b.voiceless passing that call to boxless. Here you can see that it tries, gets a 401, and so tries again with authentication details. The rest follows the same pattern of 180, 200 and ACK, then BYE and 200.
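The 401 carries a WWW-Authenticate challenge containing a nonce, and the retried INVITE answers it with an Authorization header whose `response` value is an MD5 digest (the RFC 2617 scheme that RFC 3261 uses). A minimal Python sketch of that calculation for `qop=auth`, with hypothetical function names and made-up credentials; real clients also handle `auth-int` and other algorithms:

```python
import hashlib
from typing import Optional

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username: str, password: str, realm: str, method: str,
                    uri: str, nonce: str, nc: str = "00000001",
                    cnonce: str = "", qop: Optional[str] = "auth") -> str:
    """Compute the `response` value for MD5 digest authentication (RFC 2617)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                  # identifies the request
    if qop == "auth":
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    if not qop:
        return md5_hex(f"{ha1}:{nonce}:{ha2}")        # older form with no qop
    raise ValueError("only qop=auth or no qop is supported in this sketch")

print(digest_response("alice", "secret", "FireBrick", "INVITE",
                      "sip:alice@example.net", nonce="0123ABCD", cnonce="22ed361b"))
```

The password never goes over the wire; the server repeats the same calculation with its stored credentials and compares the result.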

From boxless it gets interesting. The yellow is a call to a SNOM at the office. It has INVITE, 100 Trying and 180 Ringing (repeatedly), but when the call is finally answered on another phone, the SNOM is sent a CANCEL, which it answers with 200 OK. It then sends 487 Request Terminated for the original INVITE and gets an ACK for that.
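From the SNOM's side the rule (RFC 3261 section 9.2) is that a CANCEL matching a pending INVITE gets its own 200 OK, the INVITE is then ended with 487 Request Terminated, and the caller ACKs that 487. The Python class below is a simplified model of that, assuming a single INVITE server transaction that the CANCEL matches, and ignoring retransmissions, timers and the 481 response for a CANCEL that matches nothing. The class and method names are hypothetical.

```python
from typing import List, Optional, Tuple

Response = Tuple[str, int, str]   # (method being answered, status code, reason)

class InviteServerTransaction:
    """Simplified UAS-side view of one INVITE, e.g. the yellow leg at the SNOM."""

    def __init__(self) -> None:
        self.final: Optional[int] = None   # final status sent for the INVITE, if any
        self.acked = False

    def answer(self) -> Response:
        self.final = 200
        return ("INVITE", 200, "OK")

    def on_cancel(self) -> List[Response]:
        out: List[Response] = [("CANCEL", 200, "OK")]   # a matching CANCEL gets 200
        if self.final is None:                          # INVITE still pending
            self.final = 487
            out.append(("INVITE", 487, "Request Terminated"))
        return out                                      # if already final: no effect

    def on_ack(self) -> None:
        self.acked = True   # caller acknowledges the final response

yellow = InviteServerTransaction()
print(yellow.on_cancel())
```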

The green call from boxless is similar, but to a SNOM phone at home, and this time it goes 180 Ringing, then 200 OK and ACK. That is where I answered the phone. Eventually there is a BYE and 200, showing I hung up the call from that phone. It is the 200 OK on this call that prompts the CANCEL on the two other calls (yellow and blue).

The blue call from boxless is a call out to my mobile, handled by a.voiceless, which passes the call on to an external gateway (the orange call). Both of these only get as far as 100 Trying before there is a CANCEL, because I had already answered the call on another phone. The mobile network takes time to start ringing, and I answered before the call got that far.
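This is a parallel fork: boxless rings several destinations at once, the first one to answer with 200 OK wins, and every other branch still waiting for a final response is sent a CANCEL. RFC 3261 (section 9.1) says a CANCEL should not be sent until the branch has had a provisional response, but 100 Trying is enough, which is why the blue call could be cancelled at that point. A minimal Python sketch of the decision, using hypothetical names and the leg colours as branch labels:

```python
from typing import Dict, List, Tuple

def branches_to_cancel(branches: Dict[str, int],
                       answered: str) -> Tuple[List[str], List[str]]:
    """Decide what to CANCEL once branch `answered` has returned 200 OK.

    `branches` maps each branch to the highest status code received so far
    (0 if nothing yet). Returns (cancel_now, cancel_later): branches with a
    provisional response can be cancelled at once; branches with no response
    yet must wait for one. Branches that are already final are skipped (a
    second 2xx would need ACK then BYE instead, not handled here).
    """
    cancel_now: List[str] = []
    cancel_later: List[str] = []
    for name, code in branches.items():
        if name == answered or code >= 200:
            continue
        (cancel_now if code >= 100 else cancel_later).append(name)
    return cancel_now, cancel_later

print(branches_to_cancel({"yellow": 180, "green": 200, "blue": 100}, answered="green"))
```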

The light grey call from b.voiceless to noiseless is the call recording leg for the incoming call to the original number. Similarly, boxless also does recording, and that causes the purple call to noiseless. Note that these were not the same noiseless: we have labelled the calls with known names where possible to keep things simple, but the call leg summary (not shown) and the mouse-over text make it clear that we actually used two different recording servers in this case, as noiseless is a pool of machines.

It is the BYE from my SNOM at home (the green call) that prompts the BYE back to b.voiceless and then on to the caller, as well as the BYEs to the call recording servers (noiseless).
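Each server in the middle of the call (b.voiceless and boxless here) ties several legs together, so a BYE arriving on one leg makes it send a BYE on each of the other live legs it joins, and the next server along does the same. The Python below sketches that ripple as a breadth-first walk, assuming each server is modelled as the set of live leg names it bridges. The leg names are hypothetical, in particular `b-to-boxless` for the leg from b.voiceless to boxless, and the cancelled yellow and blue legs are left out because they have already ended.

```python
from collections import deque
from typing import List, Set

def propagate_bye(bridges: List[Set[str]], start: str) -> List[str]:
    """Return the legs that get a BYE, in the order they are reached,
    after a BYE arrives on leg `start`. Each set in `bridges` is the group
    of live legs joined together by one server."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        leg = queue.popleft()
        for bridge in bridges:
            if leg in bridge:
                for other in sorted(bridge - seen):
                    seen.add(other)
                    order.append(other)
                    queue.append(other)
    return order

bridges = [
    {"red", "b-to-boxless", "grey"},      # b.voiceless: caller, onward leg, recording
    {"b-to-boxless", "green", "purple"},  # boxless: inbound leg, home SNOM, recording
]
print(propagate_bye(bridges, "green"))
```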

Did you spot the error?

11 comments:

  1. Yes, very useful. Can it be open-sourced?

    Replies
    1. Possibly, but it is probably somewhat FireBrick specific. We are looking at ways to release supporting code for FireBricks. So far, the FireBrick logs the SIP packets, and we have code to put them in a database, code to extract, sort and link them and send them to the browser as JSON, and code to display them in the browser. It would be nice to release that, and maybe other systems could feed in the packets from their own logging.
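The extract, sort, link and JSON steps might look something like the Python sketch below. It assumes each logged packet has already been reduced to a record with a timestamp, source, destination, Call-ID and a one-line summary; those field names and the JSON layout are hypothetical, not the FireBrick log format or our actual schema. Each distinct Call-ID gets a leg number, which a front end could map to a colour.

```python
import json
from typing import Any, Dict, Iterable

def to_browser_json(records: Iterable[Dict[str, Any]]) -> str:
    """Sort records by time, number the legs by Call-ID, and emit JSON."""
    leg_of: Dict[Any, int] = {}
    messages = []
    for rec in sorted(records, key=lambda r: r["ts"]):
        leg = leg_of.setdefault(rec["call_id"], len(leg_of))   # first seen = leg 0
        messages.append({"t": rec["ts"], "from": rec["src"], "to": rec["dst"],
                         "leg": leg, "label": rec["summary"]})
    return json.dumps({"legs": len(leg_of), "messages": messages})
```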

  2. My main headache with examining SIP traces is when the Firebrick (or any registrar) replies with a 403 Forbidden, or maybe Not Allowed. I am then left wondering exactly what the registrar objected to and why; Forbidden is not specific enough. I am currently trying to get Snom phones to register with the Firebrick. The user names and passwords are definitely OK, and both parties agree on the SIP dialog, but the Firebrick just sits there saying "Forbidden" and doesn't tell me why!

    Replies
    1. A FireBrick is usually quite good with its logs if you enable them, and often adds a message on the 503 reply to explain more. Let me know if you need any help with that one.

    2. Here is the log output. What is the problem?

      27 Jul 2014 08:02:21 voip-udp-rx VoIP Rx 17 217.169.16.153:3072->217.169.16.129:5060
      REGISTER sip:217.169.16.129 SIP/2.0
      Via: SIP/2.0/UDP 217.169.16.153:3072;received=217.169.16.153;branch=z9hG4bK-d22x93940ejm;rport=3072
      From: "Firebrick 4000" ;tag=wtbefi4r0b
      To: "Firebrick 4000"
      Call-ID: 07d3cf53d8a5-iys3iqo6spv1
      CSeq: 2109 REGISTER
      Max-Forwards: 70
      Contact: ;reg-id=1;q=1.0;+sip.instance="";audio;mobility="fixed";duplex="full";description="snom821";actor="principal";events="dialog";methods="INVITE,ACK,CANCEL,BYE,REFER,OPTIONS,NOTIFY,SUBSCRIBE,PRACK,MESSAGE,INFO"
      User-Agent: snom821/8.7.3.25
      Allow-Events: dialog
      X-Real-IP: 217.169.16.153
      Supported: path, gruu
      Authorization: Digest username="ext4000",realm="FireBrick",nonce="AAD69B6981924FDB",uri="sip:217.169.16.129",qop=auth,nc=00000001,cnonce="22ed361b",response="4a1dfc0d6acf7f1b99ebdcebf7072e50",opaque="27000142017620140727070221",algorithm=MD5
      Expires: 3600
      Content-Length: 0

      27 Jul 2014 08:02:21 voip-udp-rx VoIP Tx 17 217.169.16.129:5060->217.169.16.153:3072 Try 0 319
      SIP/2.0 403 Forbidden
      v: SIP/2.0/UDP 217.169.16.153:3072;received=217.169.16.153;branch=z9hG4bK-d22x93940ejm;rport=3072
      CSeq: 2109 REGISTER
      i: 07d3cf53d8a5-iys3iqo6spv1
      f: "Firebrick 4000" ;tag=wtbefi4r0b
      t: "Firebrick 4000" ;tag=938109489558573546
      l: 0

    3. There are loads more log options you can set. I'd suggest turning more on. Otherwise email me some login details.

    4. Yes, I turned on two extra logs, and the info they gave allowed me to find the problems. The Snom phone AND the sip2sim are now registered to the Firebrick. Thanks!

  3. I've had a fault logged with support since 15th July. I can receive incoming calls but can't make outgoing calls on a SNOM 300.
    I'm getting really pissed off.

    Replies
    1. That is crazy - I'll get someone on to that.

    2. We can't work out who you are - can you contact support, or advise the ticket number?

    3. Ticket number is [4P14YR]
