2017-03-30

Where are we with Unifi and iPhone roaming?

As you will know I have spent a long time trying to understand the issues we see with the Unifi access points and roaming between them using an iPhone.

A&A sell these, and some of their PoE switches as well. We may start selling more stuff in due course. Overall the Ubiquiti stuff is pretty impressive and there is an increasingly large range of devices. The WiFi is technically very good at the hardware level, and we sell in boxes of three even for businesses.

So it is important to us that they work. I also use them at home, and my family treat me as tech support (obviously) so it is important to me if I want a quiet life. They were all round this Sunday - we had sort of cancelled Mothering Sunday for obvious reasons, but everyone came round and we had pizza and chatted. They all told me in no uncertain terms that the WiFi here is crap and they even turn it off on their phones and use 3G/4G when round the house. They all use iPhones. That really is a bad sign.

I myself spend a lot of my time in my office at home, but whenever I leave for the rest of the house I find I have to turn wifi off and back on. Though, technically, it is far from every time, and there can even be the odd day with no apparent problem, whilst on other days I see it many times. The problem is, as always, you remember the times it breaks.

This also makes testing hard - you change something and watch it, spend all day with no issues and think it fixed, when actually it is still just as intermittent as before.

I have an AC Pro and two AC LR in the house, and they are now on latest firmware. I thought that may have helped, but no. We also tried changing switches, and thought that had helped, but no.

The current state is that I have managed to mess with wiring enough in the house to actually have all three APs off a single Ubiquiti EdgeSwitch8 - one of their switches - so as to eliminate the switches as the cause of the issue.

Tip: Some of the Ubiquiti kit is still passive 24V PoE, and their switches are great as they support that, but you have to configure on the switch! It is not automatic as PoE normally is.

We also did tests with just IPv4 on the LAN, only for a few days, but that seemed to just work. This means the current thinking is that it is the IPv6 being present that is causing the issues. It could be some combination of bugs in iPhone, Ubiquiti, and even FireBrick code, for all we know. Reports from others that use this kit say no problems. We did a lot on FireBrick to try and eliminate that as the cause. However, with IPv6 on the LAN, even with IPv4 being static on the iPhone and no DHCP, it can still fail. Setting up DHCPv6 on the LAN does not seem to change things, we normally use just RA/SLAAC.

The symptoms are a sudden lack of connectivity when it roams. For a few seconds the phone may show the old IP addresses, but quickly switches to showing no IPs and then to showing the 169 auto addresses. Wait as long as you like, it is broken. You need to turn WiFi off and on (on the phone) to fix it.
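The "169 auto addresses" are, as far as I can tell, the IPv4 link-local range 169.254.0.0/16 that a device assigns itself when it has no usable IPv4 configuration. A minimal sketch for spotting them in logs, assuming addresses arrive as plain strings; the function name is made up for illustration.

```python
import ipaddress


def is_self_assigned_ipv4(addr: str) -> bool:
    """Return True if addr is an IPv4 link-local (169.254.0.0/16) address."""
    ip = ipaddress.ip_address(addr)
    # is_link_local is also true for IPv6 fe80::/10, which is normal and
    # always present, so only IPv4 counts as a symptom here.
    return ip.version == 4 and ip.is_link_local


# Documentation addresses, not taken from any real capture.
print(is_self_assigned_ipv4("169.254.17.42"))
print(is_self_assigned_ipv4("192.0.2.10"))
```

An IPv6 fe80:: address is deliberately not flagged, since every IPv6 interface has one even when everything is working.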

Part of the reason for writing this up again is for the engineers at Ubiquiti - they are trying to fix this. Good news (though I seem to have to poke on twitter to get things to progress, sorry guys). They sent me some switches and a router and gateway. Big thank you - nice to eval the kit as some of it we may start selling. We sent them a fully loaded FireBrick FB2700.

At this point the next stage is for me to try and create a setup using their kit as the gateway on the LAN and so doing IPv4 DHCP and IPv6 RA/SLAAC, and see if that breaks still. It is a pain as I cannot exactly replace my router as it is the office router. So I have set up a new IPv4 and IPv6 subnet for WiFi use. Not ideal, but will do for testing.

They, for their part, need to try and set up with a FireBrick to do the same. Can they make it break? Obviously I am on hand to help them set that up.

So setting up the Edge Router. It is a simple set up. No NAT. Fixed IP /24 IPv4 and /64 IPv6 on LAN with DHCP serving IPv4, and RA for SLAAC doing IPv6. On WAN is a simple IPv4 which can be DHCP client or static, and a simple IPv6 which can be SLAAC or static. Obviously need to set IPv6 DNS servers for RA on LAN.
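For anyone less familiar with what "RA for SLAAC" gives a client: the router advertises the /64, and the client picks its own lower 64 bits. A rough sketch of the classic EUI-64 form of that, assuming a documentation prefix and a made-up MAC. Many phones use randomised interface identifiers instead, so this only illustrates the mechanism, not what an iPhone will actually choose.

```python
import ipaddress


def slaac_eui64(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build the modified EUI-64 SLAAC address for a MAC within a /64 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC expects a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 6-octet MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle of the MAC to make a 64-bit interface ID.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(
        int(net.network_address) | int.from_bytes(iid, "big")
    )


# 2001:db8:1:2::/64 and the MAC are placeholders, not real values.
print(slaac_eui64("2001:db8:1:2::/64", "00:11:22:33:44:55"))
```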

So far I have managed to set up:-
  • Firewall off
  • NAT off
  • Static IPv4 on WAN (a /24 for testing)
  • Gateway 0.0.0.0/0 route on WAN, can ping out to internet
  • Static IPv6 on WAN (a /64, obviously, from my PI block)
  • Gateway for IPv6 on WAN
  • Static IPv4 on LAN
  • DHCP IPv4 on LAN
  • Static IPv6 on LAN
  • RA on LAN configured by Ubiquiti for me
And I am stuck. So waiting on Ubiquiti at this stage. Suffice to say I don't think they are a threat to FireBrick as this is all pretty simple on a FireBrick.
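Two of the WAN entries in the list above have a detail that is easy to get wrong: a default route has to be a /0, and the all-zeros address in a /64 is the subnet-router anycast address (RFC 4291), which a router may treat as its own. A quick sketch for checking both, using documentation addresses as stand-ins for the real ranges and hypothetical helper names.

```python
import ipaddress


def route_covers(route: str, destination: str) -> bool:
    """Return True if destination falls inside the routed prefix."""
    return ipaddress.ip_address(destination) in ipaddress.ip_network(route)


def is_subnet_router_anycast(addr: str, prefix: str) -> bool:
    """Return True if addr is the all-zeros interface ID within prefix."""
    net = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(addr) == net.network_address


# A /0 matches any destination; a /24 on 0.0.0.0 matches almost nothing.
print(route_covers("0.0.0.0/0", "198.51.100.7"))
print(route_covers("0.0.0.0/24", "198.51.100.7"))

# The zero address in the /64 is special; ::1 in the same /64 is not.
print(is_subnet_router_anycast("2001:db8:1:2::", "2001:db8:1:2::/64"))
print(is_subnet_router_anycast("2001:db8:1:2::1", "2001:db8:1:2::/64"))
```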

No word on where they are with FireBricks. Obviously keen to help them test the other way around. To be fair, if this is either a bug in FireBrick in some way, or more likely, something we can work around by changing FireBrick in some way, I am more than happy to do the work to make that happen. We have implemented a number of "pragmatic" aspects to the way the FireBrick works (sometimes on a config setting so as to be "standard" by default) and I'd really like this WiFi kit to work...

I think best if I update this post as we make progress for a bit rather than new posts. Let's get to the bottom of this, shall we?

Updates:
  • From comments, it is not just FireBrick, but is some rare combination of things clearly, and seems to be Ubiquiti APs and iPhones and "something" else.
  • IPv4 gateway not working was user error, I mistyped as 0.0.0.0/24 for some reason
  • Someone from Ubiquiti, in Austin, Texas, in the middle of the night, is working with me on this now.
  • IPv6 gateway was not working as I was using the zero address in the /64 which the ER had assumed it can have making it a router on the WAN side, which is unexpected. I changed to the ::1 in the /64.
  • Now wifi all on ER not using FireBrick, thanks to guys from Ubiquiti working in middle of the night. Roaming appears to be working, more testing to do. I am being sent a cap of working roaming as seen by ER, and will get same from FireBrick.
  • We now have two interchangeable set-ups. Both on same sets of IPs as a separate subnet for my WiFi run as a LAN side of a router. I have the Ubiquiti EdgeRouter set up, and the same set up on an FB2700. At present both seem to "just work" but as I say, this can take a while to see the fault. I have lots of logging. One clue is that I am sure I have seen the iPhone re-do DHCP on roam, and the current testing (on both set-ups) does not do that - it just flips over to new AP basically seamlessly. So, just more testing for now. If both these "just work" we have to go back and see what else on the main LAN could be upsetting things in any way.
  • This morning (Saturday), still no apparent roaming issues! This is using a FireBrick but on a separate LAN the same as the ER set-up. Again, if the roaming happens without involving the gateway router, no way the FireBrick can be to blame. If it is OK for a few days I look to swap back to main LAN and see if that shows the problem again.
  • Sunday, still using a separate FireBrick as gateway, and have set up the second VLAN that was being used before on it. Still not failing. This makes no sense at all.
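The reasoning in the updates above, that a roam between APs on the same LAN need not involve the gateway at all, can be sketched with a toy learning switch. This is a simplified model under stated assumptions, not how the EdgeSwitch or the Unifi APs are actually implemented; the class, port names and MAC are all made up for illustration.

```python
class LearningSwitch:
    """Toy layer 2 switch: remembers the port each source MAC was last seen on."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> port name

    def frame_in(self, src_mac, port):
        """Learn (or relearn) src_mac on port; return True if the MAC moved."""
        previous = self.mac_table.get(src_mac)
        self.mac_table[src_mac] = port
        return previous is not None and previous != port

    def port_for(self, dst_mac):
        """Return the port to forward to, or None meaning flood to all ports."""
        return self.mac_table.get(dst_mac)


PHONE_MAC = "02:00:00:00:00:01"  # hypothetical phone

switch = LearningSwitch()
switch.frame_in(PHONE_MAC, "port-ap-office")          # phone on the first AP
moved = switch.frame_in(PHONE_MAC, "port-ap-lounge")  # first frame after roam
print(moved, switch.port_for(PHONE_MAC))
```

In this model the only state that changes on a roam is the switch's MAC table. The phone keeps the same MAC and the same IP addresses, so the gateway's ARP and neighbour entries would still be valid without it seeing anything new.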

63 comments:

  1. Can you temporarily replace your home firebricks with alternative kit in order to see if the problems melt away?

    Replies
    1. If you read ^ you would see that (a) no, as it is actually the main office router, and (b) that is basically what I am doing under my home network to test wifi using an Edge Router.

    2. "If you read..."? Cheeky sod, I read the lot. a) It simply wasn't clear to me that your previous mentions of using a Firebrick for your home installation and "the office router" are the same thing, and b) if you can't replace it because it's the office router then that's NOT what you're doing, is it? I was referring to eliminating your firebrick code involvement entirely, which is quite possible for testing how wifi and iphones behave locally. My suspicion is that either or both your firebrick's and the ub's implementation of IPv6 at some level are making assumptions which are not valid.

    3. Sorry :-) What we are doing is making the APs on a LAN that is *not* using FireBrick. We have that now, and are testing.

    4. No worries. It'll be interesting to see what it ends up being. iOS 10.3 has come out, with 10.3.2 in beta. They often tweak operation in updates without explicitly announcing a change, so that's worth trying if you're not already there.

  2. I have a FireBrick, UniFI APs, a ToughSwitch and an iPhone, and I am using both IPv6 and IPv4 so, if I can help with any defined testing since I have a similar combination of stuff, please do just say.


    (I was using an EdgeRouter before moving to the FireBrick and, yes, IPv6 is much easier on the FireBrick than on the EdgeRouter. But I did like the EdgeRouter.)

  3. I definitely see a similar (but perhaps less severe) problem when roaming between my 2 APs. I have a Mikrotik router, so I don't think this can all be in the Firebrick department. APs are pretty close to being chucked at this point.

    Replies
    1. Brandon, take note, not just FireBrick.

    2. Dan, do let me know the symptoms. Obviously if this is even one case of the same issue with non FireBrick, it shows FireBrick is not the cause, but perhaps some network set-up (perhaps even one that is common and default with FireBricks). If that is the case, then maybe we can compare notes on network set-up that works and does not and find a clue.

  4. I haven't got a firebrick to try this, but I can't say we see this issue at all. Despite having lots of UniFi kit in lots of deployments - such as hotels, where users do roam around a lot - especially if they're staying for functions and the like.

    We run both IPv4 and IPv6 deployment in those environments too - so we'd expect this to cause us issues if so - typically there is a Mikrotik involved somewhere in our deployments too as it happens - we tend to use them for the gateway with the connectivity usually via PPPoE to whatever is appropriate for the connection used.

    If I had a firebrick here, and an A&A connection (or I guess an L2TP from A&A) I'd happily test this out as I'd love to see this issue happen or not - it'd be useful to see what can trigger it given we have so much UBNT kit out there.

  5. If it helps, I have 4 sites using Unifi with pfSense as the gateway, and up to 10 Unifi APs, lots of them with lots of iPads etc, and lots of staff with iPhones, and I have not seen this issue at any site, nor have I had a single reported wifi issue at any of the sites since switching to Unifi.

    Which is more than can be said for the systems they replaced!

  6. I have 4 Unifi AP-LR and 1 Unifi AP-AC-LR connected to a Debian Linux box acting as a router. They are connected together using a Ubiquiti Toughswitch. The network is currently IPv4 only and issues addresses using DHCP. NAT is applied by the Linux box. I have no issues with roaming and we have a lot of Apple devices (iPhones, iPads etc.) which all seem to work fine.

  7. I had a similar issue to this but using some DrayTek APs in my home. Whilst I don't have them set up for true roaming, I do have them on the same VLAN, same SSID, key etc.

    What I realised was that the signal was too strong (believe it or not) and there was not enough of a drop off in strength between the APs to cause the iPhone to start searching for another SSID to connect to.

    To solve my issue I simply reduced the TX power from 100% to 80% on both APs. The switch over between the two WAPs is now much smoother with barely any interruption to service (nothing like what's described above), and I still have full coverage within my humble abode.

    Just a thought, could work for you...

  8. I have been using Ubiquiti's UNiFi equipment, and used their PTP and PTMP equipment when living in the country, for some time.

    It is full of potential, that is never realised. Mostly because their software development process, as far as the UniFi range is concerned, is deficient in all respects.

    So after six years I've switched. I'm currently evaluating Juniper and Meraki for my home office.

    In the real world we have a much more complex set of requirements than UniFi can handle, regardless of promises.

    Most people with a home office have Office365 accounts, which involves Azure AD integration. Which is not a stable feature of UniFi firmware/software. Like it or not, multicast is here to stay, good luck with that on UniFi, same thing goes for pure IPv6.

    And don't get me started on the nonsense produced by the UniFi controller as operational information.

    The iPhone issue continues to rear its head over the years, given that Robert Pera (UBNT's founder) started at Apple, one has to wonder.

    You can fight, or switch. I decided to switch. I'm happy to pay more for stuff that works.

  9. Just a quick note, LR access points are pointless in the UK. At our power limits (certainly for 5GHz) they're limited to 200mW on the indoor channels - the non LR ones can easily do this. I guess there is a slightly better receive gain as well but it's probably not worth the cash. Unless you set them to a different country code, which is illegal and effective.

  10. What do you use for your ADSL modem RevK? I'm one of those who's had IPv6 (not related to wifi) issues with the VMG1312 and periodically checks https://support.aa.net.uk/VMG1312:_Bugs to see if there's been any progress - do you feel like you're getting anywhere with ZyXEL / can you recommend anything better?

    Replies
    1. I don't use ADSL. I have glass, connects me via office to a datacentre which is linked to edge BGP routers on to transit and peering with my own PI space on IPv4 and IPv6. Local router is not even in my house, but at the office, and is a FireBrick.

    2. So just your basic, bog standard, home set-up, really :)

  11. I was warned off Ubiquiti by RevK's earlier posts. I use a Firebrick and some Apple iOS kit and I currently have some ZyXel NWA-3560-N WAPs which are superb. Am about to try out some Cisco 1830 WAPs for something a bit faster.

    Replies
    1. Zyxel are a POS matey. You just try getting them to fix something one of their ISPs hasn't reported & see where you get.

      Zyxel only give a damn about their ISP customers - consumers "shouldn't be buying their products as they're meant for OEMs/ISPs". That's a statement from Zyxel in 2014.

      Cisco? You having a laugh or what? Cisco APs are complete crap AND you get price-gouged on licensing. Better off going with Ruckus - you still get price-gouged on licensing but they do work a hell of a lot better than the borg's shite.

      Disclaimer - I used to contract for Cisco a long time back (Triangle Park when it was a LOT smaller). They used to be a good company back then too. Not so much these days.

      Must be getting old, seem to find myself saying "not so much these days" when people ask me how good companies are :)

    2. I can form my own opinions and don't require advice from 'anonymous'. I don't know anything about ZyXel the company. I was talking about one of their products only. As for Cisco APs, I am about to find out for myself and don't require advice from the nameless and which is devoid of references.

  12. It works fine using a Sky VDSL2 connection - dynamic IPv4 address and a /56 PD setup. SR102 router (locked-down busybox) with wifi off.

    Daughter's housemates (they all use iphones/ipads/ipods) have had zero problems with gen2 Unifi AC Lites.

    You should note that if you had any generation 1 APs with zero-handoff/min RSSI setup then those retain that setup until you re-enable "Advanced settings" in the controller and then turn it off. This is a bit of a "gotcha" and probably needs a bug report because the controller won't reprovision the AP with ZHO/minRSSI unless they're enabled on the controller - which they're not on current versions.

  13. Also avoid the Unifi/Toughswitch PoE switches like the plague.

    There's known issues with failure modes being induced into APs when the switch is powered off/on for the Unifi PoE switches and various revisions of the Toughswitch overheat.

    It's very dependent on hardware revisions but AFAICT you run a decent risk of bricking various APs if you use Unifi switches with PoE. There's even been verified reports of the switches frying Intel NICs on blades.

    Ubiquiti seem very good at dealing with those issues in the USA - they'll RMA it no problem. In Europe, not so much as the distributors are stuck with the dodgy units and are (in general) denying there's any issues.

    When all's said & done though the RF frontend on APs is very good & the controller software is getting there (although for the love of gods Brandon get a fucking move on with IPv6-PD and DHCP options for the USG. The CLI & config json bodge is an accident waiting to happen!)

    For similar remote deployment/config features you'll be paying yearly licensing fees. That's where Ubiquiti win.

  14. The more I read about Ubiquiti the more I am put off ever using them. As it happens I bought Apple Airport Extremes (v4 and v5) before I'd heard of the Ubiquiti, and they work great even for a network with a wifi hop in the middle of it ("Extend a wireless network" option). I just wish they'd let me turn off the unwanted 2.4GHz networks, but the Extreme doing the extending creates one whatever I do.

    I also have broken IPv6 because of the VMG1312 problems. I bought a Firebrick 2700 a year ago to replace the Zyxel router (I have another running as a PPPoE modem), but the FB2700 configuration is so complex and obscure that I've never managed to get my head round it. So there it sits gathering dust.

    Replies
    1. The RF side of Ubiquiti is excellent. The controller side not so much but it's getting better since they poached the pfSense project leader :)

      The switching side of the Unifi range with PoE has had its fair share of problems - I'd suggest that nobody really did any significant failure analysis on any of that stuff, hence the multiple failure modes affecting a lot of people.

      Having said that it's really the only game in town for anyone wanting enterprise-level networking without paying extortionate licensing fees.

      I honestly don't think Ubiquiti kit - even with the (prone to failure on powerloss) CloudKey - is suited to home users. Even ones who believe they do have a clue but in reality don't know how things work :)

    2. If it helps, once I got started with the FireBrick, I've found configuration generally pretty easy, especially when combined with A&A's willingness to help, provide code samples etc. I too was a little daunted at first, but it does make sense once you get over any initial hump.

      Obviously, I can't provide the kind of knowledge that A&A support could offer but, if you wanted a potential steer or two from someone who came from the same starting point, do say.

    3. The problem with the FB2700 is that everything has to be configured before it does anything. The initial mountain to climb is huge. Plus the config is in xml which I detest, so that doesn't help.

      What is needed is a comprehensive beginner's guide, which starts from assuming all you know is how to change the DHCP reservations and map a port through the firewall on something like a VMG1312. That's about as complex as most people get with a home router. The FB2700 manual is a reference manual, it's no help to beginners.

    4. Ok let's be fair - if you plug in to a PPPoE connection and plug a laptop in, you can be on-line with no config at all, even with IPv6! It has some sane starting defaults.

      The XML is there, but there is a web interface to use to edit config with links to reference documentation, pre-defined field names and types and pull downs to make it simple.

      Some guides for people used to other systems is a good idea though.

    5. As I say, I had exactly the same fear as you — mine sat on a shelf for longer than I had wanted!

      I found that simply plugging it in with the default config was sufficient to get me online, and manually setting an IP on my local machine and connecting over ethernet got me onto LAN1, if I remember correctly.

      I do like the idea of a "beginner's manual", and I'm happy to contribute to one, if that helps!

    6. Just to add, the EdgeRouter looks competent, but one issue is you cannot do config in one place. I would have to go to some command line to set IPv6 RA, and routes and stuff. Why not via the web interface. FireBricks have it all in one config with XML or web interface to edit and covering everything. We did try, honest.

      Delete
    7. I *think* you can do everything on the ER via the CLI, so it might be "all in one place" from that perspective. But, yes, one of the things I disliked about it was that the web GUI was not available for everything.

    8. Is there such a thing as a hackathon for documents? If there is enough call for beginner documentation for FireBrick, it might be a reasonably interesting day to get some would-be users/beginners (who know what questions they have) together with some people who know what they are talking about, and some others willing to write things up in a user-friendly fashion, and thrash out a beginner's guide?

      (And I can't decide if it should be a beginner's guide — a guide for a beginner — or a beginners' guide, for all beginners.)

    9. On the ER all of the CLI settings are available in the GUI via the config tree, which sounds like it might be similar to your interface for manipulating the XML on the firebrick (without the links to the documentation). The only bit you can't do in the GUI is upload files such as certificates for OpenVPN.

      In reality once you know your way around the CLI, there's not much you'd want to bother with the GUI for. Also the GUI isn't responsive which is a PITA in the modern world!

      The real problem with them from a business point of view is they are pretty time consuming to set up, especially if you want to use features like IPv6, so unless I have a specific use case such as requiring an OpenVPN server, I tend to use MikroTiks for most clients.

  15. This unifi thread is very similar

    https://community.ubnt.com/t5/UniFi-Wireless/Firmware-bug-3-7-49-6201-on-UAP-AC-HD-iOS-devices-receive/m-p/1878953#U1878953

  16. I replied straight away, asked some questions, once again it is me waiting for answers.

  17. No, I asked how the Internet connection works so that I could advise you, as you wanted specific answers. It is relevant if it works using PPPoE or if you have an Ethernet WAN with DHCP/DHCPv6 or what. The answers are different. It was not a complex question. I also suggested you could email me a config and I would go through it and be even more specific. Ignoring my email and not replying does not really help matters.

  18. Running with Samsung - no issues
    Running with any other Android - no issues.
    Ergo issue is apple.

    We can do this all day.

    I can ask firebrick customers who have CISCO or Ruckus APs to see if they have no problems - we have not had any reports of them.

    This is not helping actually find the cause - this is trying very hard to bury your head in the sand and insist it has to be FireBrick to blame.

    I ask you once again - if, as it seems to be expected, when an iPhone roams from one AP to the next, there is NO INTERACTION whatsoever with the gateway (no re-do of DHCPv4 or router solicitation) - then exactly how can the gateway actually be, in any way, to blame, for the roaming appearing to fail and leave the iPhone on the new AP with no connectivity at all? What exactly could a FireBrick be doing here to cause the roam to fail?

  19. Sorry this is getting silly - You asked those questions. I said that in order to answer them specifically as you asked I need to know what sort of internet connection setup you are using, and sending a config would be helpful. I feel we are going around in circles here. You appear to be refusing to answer me at this point, not the other way around. I do not understand why you think repeating the question rather than answering me so I can answer you, is sensible.

  20. No, once again, you did not "resolve" the issue. Two things were changed. 1 - moving APs to separate subnet, and 2 - using ER. That fixed it. However, putting FireBrick in, on separate subnet *ALSO* fixed it. By your logic we just SOLVED the issue by INSTALLING AN FB2700 FIREBRICK.

    As I have explained several times, there are clearly some specific cases that this happens and most cases it does not. EVERY case involves the unifi APs and apple, and ALMOST every case involves a firebrick. Constantly repeating your specific test cases which are clearly not the whole picture does not change anything.

  21. 1. I will not approve any more posts where you keep repeating your list of test cases. They do not need repeating, OK?

    2. To answer the question tell me how the internet connection is set up please?

  22. I still have no idea if Brandon is using PPPoE or a straight Ethernet with DHCPv4, RA/SLAAC, or if he is using L2TP or what. Shame, as if I knew I could tell him exactly what config he needs and where. Oh well.

  23. I'll continue with testing here for a while.

  24. By all means tell me to keep my nose out, but might a 10 minute call resolve potentially days of going backwards and forwards (and many tens of comments!) over what information is needed to answer which questions to set up which devices?!

  25. Very possibly, though not sure I want a call with Brandon right now. Basically just need to know if PPPoE or what, then I can answer.

  26. We used irc, it worked very well, got the ER set up quickly.

  27. I approve posts, but continually posting the same list of test cases is not helping anyway and I advised you of that in advance.

  28. I did not delete any post of yours which I had approved and published.

  29. I have not deleted any posts of yours that I have approved and published. I see you say native IPv6. We as an ISP do native IPv6 which is of course over PPP. People also do IPv6 via tunnels, and I am glad that is not the case as that is way more complicated.

  30. Third time now - I did not delete any post of yours which I had approved and published. You do like repeating yourself.

  31. So the long post covering two different scenarios (because you refused to actually say how the internet was connected) is not answering. And the email saying the same is not answering. This really is getting silly now.

  32. Native IPv6 over what? Ethernet or PPPoE? That did not say. That was the main thing I was trying to find out, and no, IPv6 over Ethernet as an ISP is not the most common thing in the world. Indeed, we have not encountered it at all before selling to many countries - (almost) everyone uses PPP (which is PPPoE on the Ethernet, obviously). Even FireBricks we have sold in to China use PPPoE.

    Lots of people do native IPv6 over PPPoE.

    Again this issue here only happens with Apple and Unifi.

    This is to help find the issue which has been reported with apple+unifi+firebrick, and been reported with apple+unifi+mikrotik and we are trying to help you (and would be happy to help apple) with that. You now have in your hands a case where you see this happen with a FireBrick. This should be good news.

    You have more tools to see the WiFi side than we do and a much greater understanding of the WiFi side. You are the experts on the WiFi, and we are not. I have no problem conceding that fact. I do not know how AP to AP roaming works at all.

    Now you have this test case in your lab that fails, which you do, as you said you do, you can see on the WiFi side how it is failing, I assume.

    I am especially interested to know that because, my understanding is, that a roam between two APs does not involve any packets going to/from the gateway (the FireBrick in that example), and so I struggle to see how any gateway can cause the roam to fail. Do correct me if I am wrong - but is that the case? - a roam between two APs does not involve packets to/from the gateway, does it?

    If so, what do you see happening in the failure case?

  33. To be clear, I warned you in advance that if you keep repeating your test cases in your posts, I will not approve it. You chose to ignore that warning. Up to you.

    In summary:
    The issue only happens on Unifi APs - do you agree?
    The issue only happens on Apple devices - do you agree?
    The issue has been seen using FireBrick and non FireBrick gateways - do you agree?
    There are many test cases where the issue has not been seen - do you agree?

    See - state some actual facts like that ^ and the "blame" for the issue looks different doesn't it?

    Now I have
    - Declined to approve a post that keeps repeating tests cases over and over again, and insisting, contrary to comments from third parties, that this only happens with FireBrick after I warned you I would decline to approve such a post
    - Cursed you once after you repeated such assertions lots of times on twitter - but hey - freedom of speech. I did apologise.
    - I have repeatedly correctly blogged about the facts of the matter. Interesting that you think it is a configuration issue. This may be progress.
    - Have answered your questions on the blog and by email in spite of the fact you would not answer my questions about the internet connection.

    Now, I am intrigued by the comment of it being a configuration issue. Can you elaborate?

    Given that an AP to AP roam does not involve any packets to or from the gateway, what configuration issue, exactly, do you feel could be the cause. Maybe if you know of such a configuration we can look in to it in more detail.

    Am I right that an AP to AP roam involves no packets to/from the gateway router? Is that correct? You are the experts here...

  34. It gets an IP using RA and SLAAC on the WAN, out of the box - is that not the case here?

    It gets an IPv4 using DHCP on the WAN, out of the box - is that not the case here?

    It does not do DHCPv6 client on the Ethernet directly, only on PPPoE. If that needs adding, we can look in to that. But as I say, we have not encountered anyone that does Internet connections like that before, in many countries! Interesting that it is not the case in US - no it has not been tested in the US, and as it does not even try to do DHCPv6 client on ethernet it has not been tested anywhere else. It is not a feature. But one we may add.

    Saying something that is not a feature "has never been tested before" is not that fair.

    I answered Jeff's email the same as the details I posted on the blog, at 16:17, 30 minutes ago.

  35. Having said that - whilst I am happy to work on getting this going on what you say is a "normal" US internet connection, and that would be good, I am not sure it helps with the WiFi issue. You said you re-created that. That would not actually need external connectivity even - when this fails you cannot talk to devices on the LAN even, where the gateway is not involved.

  36. 1. OK we had a post from someone saying they had seen this on non FireBrick. I prefer not to call him a liar just yet, and hopefully he can clarify the symptoms.

    Even so, for config issues, surely the AP to AP roaming is transparent to the gateway - the MACs do not change? Would not such settings cause issues all the time, not just when roaming AP to AP? Are there config changes that can actually stop an iPhone correctly roaming AP to AP? What are they?

    Yes we have seen silly configs, people get subnet masks wrong, etc. That has permanent impact though, not just on AP roaming.

    2. As I say, you are the expert on the WiFi side, and the fact that there is a complex state machine does sort of fit - the phone gets in a state that is somehow ill - with the AP associated with the iPhone, and staying like that until wifi off/on or a successful roam to something else, but no packets flowing. I am more than happy to believe this is, at the crux of the matter, a bug in apple. I'd love to find a way to stop it happening even if a "work around" for apple.

    3. You list those test cases yet again, but they appear in this one blog post alone six times from you. I am getting fed up - keeping saying that list of test cases is not adding to the diagnosis, and again, if you include that in your post I will not approve it.

    I am glad you confirmed the issue is only with IPv6 enabled, that agrees with my findings.

    You just kept saying native IPv6. We do native IPv6 over PPPoE. Native as opposed to tunnelled. Saying native IPv6 was not really answering the question. I'll put that down to geographic norms for now. You don't expect PPPoE, we don't expect anything other than PPPoE.

    I did go in to some detail on the answers, so not sure why asking again.

    Now, yes, back to the issue, assigning an IP involves the gateway/router, agreed. But once the IPv4 and IPv6, and netmask/length, and DNS, and gateway are all assigned, surely that is it (at least for the lease time). The phone is then working. Only on an AP hand over does it stop (sometimes).

    Do you agree that at that point in time, for an AP to AP roam, there is no involvement of the gateway - no re-assignment of IPv4 or IPv6 addresses or credentials needed - no packets to/from the gateway?

    Now, you say "of course the gateway is involved in roaming". This is where the pcaps that Jeff sent disagree. He was clear that the gateway was not involved. Or so I thought.

    So I am confused. And maybe there is a clue here.

    iPhone to gateway does not need to re do any ND (like ARP but for IPv6) as the MAC of the gateway is actually in the RA packets for SLAAC, not an IP. The phone continues to know the gateway MAC and no packets need be exchanged to find it.

    Finding the MAC of the phone - are you suggesting the phone changes MAC when it roams? If that is the case, then indeed that is an interesting case as there is no way the gateway would know that other than by timing out on existing ARP or ND cache entries. Neither the iPhone nor the APs tell it that is the case, do they? If the MAC is changing we would see this issue on every network with every router as caching IP to MAC mappings is normal, and indeed Cisco tend to cache for really long periods.

    So are you saying the MAC changes? If not, in what way is ARP or ND a factor here?

    If not, then the existing ARP/ND caches remain valid, and would continue to check again on timeout. If that was the issue, after a few minutes all would work again, which is not what we found.

    Are you saying the MAC changes when roaming?

    The failure case is a permanent lock up until wifi off/on or roaming to another AP successfully. It is not simply a brief period of no connectivity. Also the phone loses the IP being unable to communicate and decides to drop the IPv6 address even.

    Also, we have seen this when not even using DHCP but fixed IP on phone.
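
The cache argument above can be sketched in a few lines. This is only a toy model, assuming a fixed entry lifetime: the `NeighbourCache` class, the 60 second lifetime, and the addresses are all made up for illustration and are not how any particular router implements ARP or ND. It shows why a stale IP to MAC mapping would produce a bounded outage rather than a permanent lock up.

```python
class NeighbourCache:
    """Toy IP-to-MAC cache with a fixed entry lifetime (hypothetical model)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # ip -> (mac, time the entry was learned)

    def learn(self, ip, mac, now):
        self.entries[ip] = (mac, now)

    def lookup(self, ip, now):
        """Return the cached MAC, or None once the entry has timed out."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if now - learned_at >= self.ttl:
            del self.entries[ip]
            return None  # caller has to re-resolve with ARP/ND
        return mac


cache = NeighbourCache(ttl_seconds=60)  # lifetime is an assumed value
cache.learn("192.0.2.10", "aa:aa:aa:aa:aa:aa", now=0)

# Case 1: the MAC does not change on a roam, so the entry stays correct.
assert cache.lookup("192.0.2.10", now=30) == "aa:aa:aa:aa:aa:aa"

# Case 2: had the MAC changed at t=30, the cache would be wrong only
# until the entry expires, at which point a fresh lookup is forced.
assert cache.lookup("192.0.2.10", now=59) == "aa:aa:aa:aa:aa:aa"  # stale
assert cache.lookup("192.0.2.10", now=60) is None                 # re-resolve
```

Either way the model recovers by itself once the lifetime passes, which does not match a failure that persists until wifi off/on.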

  37. I don't see an email on DHCP throttling, when did you send it? Can you elaborate? It sounds interesting but as we have seen this roaming issue when DHCP is not in use, and not seen any DHCP issues otherwise, I doubt it is the cause.

    Happy to look in to any possible cause though. Will be interesting if you have found it.

  38. Sorry, still at a loss, we do not know of this feature of "DHCP throttling", and we have not yet seen any email from you explaining it.

    It is not a "feature" of FireBricks and so has no setting to turn it on or off, we have no clue what you are actually talking about here. The email you reference would be helpful. Maybe you can re-send it?

    It is not ISC dhcpd, no. It is FireBrick DHCP server.

    And, as explained, we did tests confirming the roaming issue happened when DHCP not being used by the phone at all - that was some time ago, so we are going to repeat that test to confirm.

    We'll let you know when we have managed to do that test.

  39. No, still using SLAAC on IPv6 in that case. We have not tested static IPv6 address yet (assuming an iPhone can do that), but happy to try that as well.

  40. Ok great - but seeing the email where they say what they are seeing that they are calling "DHCP throttling" would be handy. You said you sent an email and we have not seen it yet.

    And no, this was, as explained, some time ago (years) when first investigating this and eliminating DHCP. We set static IPv4 on the iPhone and left IPv6 using SLAAC so no DHCP involved. The problem persisted.

    However, as I have said, we will be doing that test again.

  41. I had this. Tried pretty much everything mentioned above. Roaming Apple Devices kept disconnecting. Nothing helped. Until I changed the default beacon setting (it’s in the wireless settings somewhere) from default 1 to a setting of 3 (preferred by Apple Devices). And no more disconnects since then.

    Replies
    1. Yup DTIM 1 to 3, and disable fast roaming 802.11r.

  42. Not sure where you got to on this. However I have just been through the mill troubleshooting a seemingly similar issue on my multi AP unifi setup (in fact two in two different countries!).

    For ref I am using Edgerouter X, cloudkey with a mixture of AC lite, AC pro and AC outdoor. 6 APs UK (Cisco 3750 switched) and 4 here in France (off the Edgerouter/a commodity zyxel PoE 8 port).

    Seeing the same symptoms, iOS devices dropping off and rejoining when woken, everything else just fine.

    Two things improved the situation for me:

    1) Change the DTIM under the WiFi settings (per SSID) from 1 to 3. This is something to do with how long broadcast/multicast are held before being dumped (1=100ms) and iOS refusing to wake up to check the WiFi more than 3 times a second.

    2) Disable fast roaming (802.11r) on the unifi (advanced settings under WiFi). This is recommended to be disabled but a recent sw update apparently randomly enables it, in my case it was on purely from a ‘that seems like a good thing to tick’.

    All good now, no random drops, roams perfectly in both locations.
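
For anyone wanting to check the DTIM arithmetic, here is a rough sketch. It assumes a beacon interval of 100 time units (a common default, not something confirmed for these APs), and the function names are made up for illustration. One 802.11 time unit is 1024 microseconds, so the "1=100ms" above is really about 102.4 ms.

```python
# One 802.11 time unit (TU) is 1024 microseconds.
TU_MS = 1.024


def dtim_interval_ms(dtim_period, beacon_interval_tu=100):
    """Time between DTIM beacons in milliseconds.

    beacon_interval_tu=100 is an assumed default, not read from any AP.
    """
    if dtim_period < 1:
        raise ValueError("DTIM period must be at least 1")
    return dtim_period * beacon_interval_tu * TU_MS


def wakeups_per_second(dtim_period, beacon_interval_tu=100):
    """How often a power-saving client must wake to catch every DTIM."""
    return 1000.0 / dtim_interval_ms(dtim_period, beacon_interval_tu)


for period in (1, 3):
    print(
        f"DTIM {period}: every {dtim_interval_ms(period):.1f} ms, "
        f"about {wakeups_per_second(period):.1f} wakeups per second"
    )
```

With those assumptions DTIM 1 works out at nearly ten wakeups a second and DTIM 3 at a little over three, which lines up with the "3 times a second" figure mentioned in the comment.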


