Monday, 10 April 2017

Progress on iPhone roaming

For whatever reason, the instances of the roaming issue have massively reduced in my house. The main difference is that all the APs are now on the same PoE switch, but it could be the phase of the moon for all I know at this stage. This one is a bugger to track down.

This means it is taking days to "catch it in the act". The good news is that it happened last week, and I confirmed there was good signal but no connectivity - no IP or anything, even to a device on the same AP. So I changed the phone's config to a fixed IP.

Today it has happened again and we have learned some concrete details of the problem. Also, it has happened in my study, and so I have the phone in that state, captured, and on charge, sat here. It is not between two APs, so it should stay broken.

So what have we learned so far?

The phone was set to a completely static IPv4 config, so no DHCP. This means the problem is not triggered by the way DHCP works, or by the FireBrick or gateway doing DHCP in an odd way - that eliminates a load of possible concerns from previous testing. The fact that many people came forward with the same issue on non-FireBricks was also a relief.

The controller for the APs claims the phone is not attached: it shows that it was, but that it is not now. This is a clue. The phone thinks it is, and shows full signal. So the underlying issue here is a mismatch: the phone thinks it is associated and the APs think it is not. This has to be a big step forward and suggests it is the roaming process itself failing somehow.

In this state (perhaps unsurprisingly), even with the fixed config, we cannot get any packets to flow, even to another device on the same AP (and subnet).

What next?

At this point, I am keeping the phone on charge in here in the broken state as long as possible, and have set up firewall access for Ubiquiti engineers to have full access to the APs and the controller and see what they can find. I hope they find more clues to the problem, but I appreciate it is tricky with some issues like this.

We're doing all we can to get to the bottom of this.

Update...

The phone was in the same state having left it all night. So I started to do monitor-mode wifi dumps on my MacBook as requested (Wireshark is working quite well on macOS now). On the channel used by the AP in here I did not see the MAC of the iPhone at all. I've sent them the dump anyway.
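
For anyone wanting to repeat the "is the phone's MAC in the dump at all?" check without clicking through Wireshark, here is a minimal sketch in Python. It is only an illustration, not what I or Ubiquiti actually ran. It assumes the capture has been saved in classic pcap format (not pcapng) with radiotap headers, which is what a monitor-mode capture normally produces, and the function names are made up for this example.

```python
import struct

RADIOTAP_LINKTYPE = 127  # LINKTYPE_IEEE802_11_RADIOTAP in the pcap header


def mac_to_bytes(mac):
    """Turn 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def count_frames_from(pcap_bytes, mac):
    """Count 802.11 frames whose transmitter address (addr2) is mac."""
    target = mac_to_bytes(mac)
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != RADIOTAP_LINKTYPE:
        raise ValueError("expected a radiotap (monitor-mode) capture")

    offset, seen = 24, 0  # records start after the 24-byte file header
    while offset + 16 <= len(pcap_bytes):
        _sec, _frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16])
        offset += 16
        packet = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        if len(packet) < 4:
            continue
        # Radiotap fields are always little-endian; bytes 2-3 hold its length.
        radiotap_len = struct.unpack("<H", packet[2:4])[0]
        dot11 = packet[radiotap_len:]
        # addr2 sits at bytes 10-15; short control frames (ACK, CTS) lack it.
        if len(dot11) >= 16 and dot11[10:16] == target:
            seen += 1
    return seen
```

Called as count_frames_from(open("dump.pcap", "rb").read(), "aa:bb:cc:dd:ee:ff") (file name and MAC are placeholders), a result of zero matches what I saw. It does not prove the phone is silent, only that nothing from that MAC was heard on the channel being captured.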

Sadly, while trying to get the laptop on to another channel to dump that, I made a config change to the APs, which made the phone spring into life... That has to be a clue for them, I suspect.

So...
  • Not DHCP related
  • Failure mode is phone thinks it is associated and AP thinks not
  • We know wifi off/on on phone fixes it
  • We know a roam to another AP on phone fixes it
  • We now know reconfiguring the AP (even leaving SSID in place) fixes it
Ubiquiti think that any packet from the phone which thinks it is associated should cause a de-auth from the AP which should cause the phone to re-connect. They can't dump that on the AP, hence monitor mode. Sadly I did not capture any packets from the phone on that channel so not conclusive.
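
To make that expectation concrete, here is a toy model in Python of the recovery path Ubiquiti describe. It is purely illustrative: the class and method names are invented, and it is not how the UniFi firmware or iOS actually work. It just shows that if the phone transmits anything and honours the de-auth, the two sides should end up agreeing again, which is exactly what is not happening here.

```python
class AccessPoint:
    """Toy AP that only tracks which client MACs it considers associated."""

    def __init__(self):
        self.associated = set()

    def associate(self, mac):
        self.associated.add(mac)

    def receive(self, mac):
        """Reply to a data frame: forward it, or de-auth an unknown client."""
        return "forward" if mac in self.associated else "deauth"


class Phone:
    """Toy client holding its own (possibly stale) view of the association."""

    def __init__(self, mac):
        self.mac = mac
        self.thinks_associated = False

    def connect(self, ap):
        ap.associate(self.mac)
        self.thinks_associated = True

    def send(self, ap):
        """Send one frame; on a de-auth, drop the stale state and re-connect."""
        if not self.thinks_associated:
            self.connect(ap)
        reply = ap.receive(self.mac)
        if reply == "deauth":
            self.thinks_associated = False
            self.connect(ap)
            reply = ap.receive(self.mac)
        return reply


ap = AccessPoint()
phone = Phone("aa:bb:cc:dd:ee:ff")  # placeholder MAC
phone.connect(ap)
ap.associated.discard(phone.mac)  # the broken state: AP forgets, phone does not
```

In this model a single phone.send(ap) clears the mismatch. In real life the phone stays stuck, so either it is not transmitting at all, or the de-auth is not being sent, or it is being ignored.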

23 comments:

  1. Could it be worth seeing if the iPhone prints anything useful to the console?

    1. That might require a jailbroken device which would require a reboot.

      Or maybe it would work with Xcode from a Mac. I don't know.

  2. > At this point, I am keeping the phone on charge in here in the broken state as long as possible, and have set up firewall access for Ubiquiti engineers to have full access to the APs and the controller and see what they can find. I hope they find more clues to the problem, but I appreciate it is tricky with some issues like this.

    Dedication

  3. Replies
    1. Static in that they don't change, but assigned using DHCP.

    2. What is the DHCP lease time for the AP IP addresses? Could it be that when the APs attempt to renew their IP address it somehow barfs the control connection (CAPWAP?) to the controller, but doesn't drop the session to the client?
      I'll admit that I have no experience of the Unifi APs, so it is just guesswork from experience with enterprise WiFi systems.
      Do you get this problem if *everything* (AP, switches, clients) is static configured (ie NO DHCP at all)?

      Just a thought...

    3. They are 2 hour lease, 1 hour renewal, but I seriously doubt it. I'll see if Ubiquiti think that is worth a try.

    4. If this was happening you'd see it in the Events/Alerts section of the controller. The UAP would show as having disconnected/connected - sometimes you just see the "Connected" message but when the heartbeat from a UAP goes AWOL then you'll see it in controller logs.

      Also it's important to note that the Unifi kit doesn't REQUIRE a controller to be present to function as simple WAPs. Once provisioned they'll generally work forever although roaming will not be as seamless (unless it's Gen1 kit with ZHO, bizarrely that just works regardless) & obviously you lose the site-wide management ability.

      As an aside Adrian, why do you have infrastructure items (UAPs) on such a short lease? Mine are all 7 days, it's not like they're going anywhere :)

    5. Default settings. All the IPs are sticky, so even if off line for days it will get same IP when it comes back, so no real need to change defaults.

    6. DHCP server is in PI space/datacentre then?

      If not then hours seem a bit short for a home DHCP daemon.

      Something goes wrong (power cut/whatever) & you're not home but wife is & APs lose local lan connectivity?

      Don't get me wrong my wife is a s/w engineer but she wouldn't have a clue where to start on that - and we're talking Unifi here, not some generic router :)

      I just pool the infrastructure separately with different lease times.

      NB - this is irrelevant to the Unifi problem for anyone reading this. We digress :)

    7. Sorry - not sure of the case you are covering, the DHCP server is the gateway router (albeit a couple of miles away). It is common in domestic set-up for DHCP server to be the gateway router. If that is AWOL for any reason, then access to APs is not really an issue! Even so the APs having an IP, or not, has no real impact on their function, just management.

    8. (Marginally) off topic - I hadn't realised that having a controller present impacted roaming around a set of Unifi APs. I only ever fire up the controller here just to check all is well and to do upgrades.

      So does the controller notice when a client is moving towards another AP and trigger the current one to actively reject the client somehow?

  4. I don't think making a config change will help find this as any config change results in reprovisioning all the APs. That in turn causes them to be briefly unavailable, hence causing clients to reconnect.

    Is the signal strength normally at "full" when it's in that position in your study? Not an Apple user myself but I assume (like Android) you can get the actual signal level in dBm rather than some vague bars? :)

    I only ask as I've seen phones (mainly droids) which have wifi "issues" show full signal when they're having issues & I know fine well there's no way they've got a full signal. Wife's Motorola was a classic example - signal level in dBm stayed the same everywhere when its wifi went titsup. Wifi on/off was the usual bodge. This had nothing to do with Unifi kit BTW, just wondering...

    1. Yeh, I did not realise it would do that as I did not touch that SSID, but now I know. There is a way to list the SSIDs and signal strength the iPhone can see, which I did not think to check, sorry. Will next time. But yes, it is normally full signal strength and was 2.2m from the AP, which is in the middle of the ceiling.

    2. There is a slight hole in the coverage on most UAPs when mounted horizontally - to the right of the "U" on the front IIRC - but at 2.2m it'll make bugger all difference.

      Sorry if you've said all this before but the iphone is connecting on 5GHz and forcing it to connect at 2.4GHz makes no odds?

      Radiation patterns for the UAPs :

      https://help.ubnt.com/hc/en-us/articles/115005212927-UniFi-UAP-Antenna-Radiation-Patterns

      Looks like the HD is the one to get for max coverage, Lite for multiple APs so they don't overlap too much. Well at 5GHz anyway.

      Do keep us updated on this, there's been a few threads on the Ubiquiti forums over the years regarding iphones & problems...

    3. This SSID is only on 5GHz, but I am pretty sure testing long ago showed it was no different on 2.4GHz. HD looks nice, may do some time if we ever crack this issue.

  5. 4x4 MU-MIMO hence the good coverage.

    At prices around £250-275 (trade, ex-VAT) and no 5-packs available yet then I think I can wait a while :)

  6. You can get some extra info from the iPhone using the console (install Xcode and use the Devices window, or just install Apple Configurator from the App Store and you can access it that way).

    If you have access to an apple developer account you can install the wifi debugging profile (it generates logs and extra console logging) and/or perform on-device tcpdump/wireshark.

    https://developer.apple.com/bug-reporting/profiles-and-logs/ - you can probably find these without a developer account if you hunt around (e.g. https://useyourloaf.com/blog/remote-packet-capture-for-ios-devices/)

    You don't need a rooted phone for any of that.

  7. Since the AP says the phone isn't connected and the phone appears to not be transmitting any packets (it should at least be sending DHCP requests), it stands to reason that the phone's wifi firmware probably also thinks it isn't connected. My money is on the OS getting confused and thinking the wifi is associated when it isn't. The OS would be generating packets to be transmitted, but since the wifi isn't actually associated the firmware would probably just drop them.

  8. Can you take a pcap on a quiet network segment when this happens? And maybe run arp-scan from the wired segment and reconcile the MACs & IPs? I am wondering if the phone has switched to a new MAC due to its crazy privacy feature (anti-MAC tracking). Or is that only used when scanning for networks (802.11 Probe Request broadcasts)? Maybe worth ARP pinging its normal MAC too.

  9. By the way... if the phone's network stack is still alive, then I'd expect to see a bunch of Probe Request frames when you press the Home button, if you watch in monitor mode from a laptop sniffing the air promiscuously. But it might not send the source MAC that you normally see.

  10. "Ubiquiti think that any packet from the phone which thinks it is associated should cause a de-auth from the AP which should cause the phone to re-connect."

    802.11w? If the deauth isn't being accepted for some reason (like the iPhone expecting an authenticated deauth packet rather than an insecure legacy one) that might explain this...

    (Hoping you get to the bottom of this. I'm another of the non-Firebrick users having exactly the same issue with my Unifi...)

  11. I think iPhones do weird ARP tricks (before DHCP!) when they connect to a new network—that couldn't have anything to do with this could it? https://cafbit.com/entry/rapid_dhcp_or_how_do
