Thursday, 30 March 2017

Where are we with Unifi and iPhone roaming?

As you will know I have spent a long time trying to understand the issues we see with the Unifi access points and roaming between them using an iPhone.

A&A sell these, and some of their PoE switches as well. We may start selling more stuff in due course. Overall the Ubiquiti stuff is pretty impressive and there is an increasingly large range of devices. The WiFi is technically very good at the hardware level, and we sell in boxes of three even for businesses.

So it is important to us that they work. I also use them at home, and my family treat me as tech support (obviously) so it is important to me if I want a quiet life. They were all round this Sunday - we had sort of cancelled Mothering Sunday for obvious reasons, but everyone came round and we had pizza and chatted. They all told me in no uncertain terms that the WiFi here is crap and they even turn WiFi off on their phones and use 3G/4G when round the house. They all use iPhones. That really is a bad sign.

I myself spend a lot of my time in my office at home, but whenever I leave for the rest of the house I find I have to turn wifi off and back on. Though, technically, it is far from every time and there can even be the odd day with no apparent problem, whilst on other days I see it many times. The problem is, as always, you remember the times it breaks.

This also makes testing hard - you change something and watch it, spend all day with no issues and think it is fixed, when actually it is just intermittent, still, just as before.

I have an AC Pro and two AC LR in the house, and they are now on latest firmware. I thought that may have helped, but no. We also tried changing switches, and thought that had helped, but no.

The current state is that I have managed to mess with wiring enough in the house to actually have all three APs off a single Ubiquiti EdgeSwitch8 - one of their switches - so as to eliminate the switches as the cause of the issue.

Tip: Some of the Ubiquiti kit is still passive 24V PoE, and their switches are great as they support that, but you have to configure on the switch! It is not automatic as PoE normally is.

We also did tests with just IPv4 on the LAN, only for a few days, but that seemed to just work. This means the current thinking is that it is the IPv6 being present that is causing the issues. It could be some combination of bugs in iPhone, Ubiquiti, and even FireBrick code, for all we know. Reports from others that use this kit say no problems. We did a lot on FireBrick to try and eliminate that as the cause. However, with IPv6 on the LAN, even with IPv4 being static on the iPhone and no DHCP, it can still fail. Setting up DHCPv6 on the LAN does not seem to change things; we normally use just RA/SLAAC.

The symptoms are a sudden lack of connectivity when it roams. For a few seconds the phone may show the old IP addresses, but quickly switches to showing no IPs and then to showing the 169 auto addresses. Wait as long as you like, it is broken. You need to turn WiFi off and on (on the phone) to fix it.
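
As an illustration only: the "169 auto addresses" are IPv4 link-local addresses in 169.254.0.0/16, which a device gives itself when it has no working IPv4 configuration. A minimal Python sketch for spotting that state in a list of addresses (the sample addresses are made up, and the function name is hypothetical):

```python
import ipaddress

def is_self_assigned(addr: str) -> bool:
    """Return True if addr is an IPv4 link-local (169.254.0.0/16) address."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_link_local

# Made-up sample addresses: one self-assigned, one normally configured.
for addr in ["169.254.17.42", "192.0.2.10"]:
    state = "self-assigned" if is_self_assigned(addr) else "configured"
    print(addr, state)
```

Seeing a self-assigned address after a roam only says the phone gave up on IPv4; it does not say which part of the network is at fault.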

Part of the reason for writing this up again is for the engineers at Ubiquiti - they are trying to fix this. Good news (though I seem to have to poke on twitter to get things to progress, sorry guys). They sent me some switches and a router and gateway. Big thank you - nice to eval the kit as some of it we may start selling. We sent them a fully loaded FireBrick FB2700.

At this point the next stage is for me to try and create a setup using their kit as the gateway on the LAN and so doing IPv4 DHCP and IPv6 RA/SLAAC, and see if that breaks still. It is a pain as I cannot exactly replace my router as it is the office router. So I have set up a new IPv4 and IPv6 subnet for WiFi use. Not ideal, but will do for testing.

They, for their part, need to try and set up with a FireBrick to do the same. Can they make it break? Obviously I am on hand to help them set that up.

So setting up the Edge Router. It is a simple set up. No NAT. Fixed IP /24 IPv4 and /64 IPv6 on LAN with DHCP serving IPv4, and RA for SLAAC doing IPv6. On WAN is a simple IPv4 which can be DHCP client or static, and a simple IPv6 which can be SLAAC or static. Obviously need to set IPv6 DNS servers for RA on LAN.

So far I have managed to set up:-
  • Firewall off
  • NAT off
  • Static IPv4 on WAN (a /24 for testing)
  • Gateway 0.0.0.0/0 route on WAN, can ping out to internet
  • Static IPv6 on WAN (a /64, obviously, from my PI block)
  • Gateway for IPv6 on WAN
  • Static IPv4 on LAN
  • DHCP IPv4 on LAN
  • Static IPv6 on LAN
  • RA on LAN configured by Ubiquiti for me
And I am stuck. So waiting on Ubiquiti at this stage. Suffice to say I don't think they are a threat to FireBrick as this is all pretty simple on a FireBrick.
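
For a routed set-up like this (no NAT), two things are easy to get wrong: the WAN and LAN subnets must not overlap, and the LAN IPv6 prefix needs to be a /64 for SLAAC. A rough Python sketch of such a sanity check, assuming made-up documentation prefixes rather than the real test subnets (the plan keys and function name are hypothetical):

```python
import ipaddress

# Hypothetical addressing plan using documentation ranges;
# the real test subnets are not given here.
PLAN = {
    "wan_v4": "198.51.100.2/24",
    "lan_v4": "192.0.2.1/24",
    "wan_v6": "2001:db8:0:1::1/64",
    "lan_v6": "2001:db8:0:2::1/64",
}

def check_plan(plan):
    """Return a list of problems found in a routed (no NAT) WAN/LAN plan."""
    ifaces = {name: ipaddress.ip_interface(addr) for name, addr in plan.items()}
    problems = []
    for family in ("v4", "v6"):
        wan, lan = ifaces["wan_" + family], ifaces["lan_" + family]
        # A router cannot sensibly route between two overlapping subnets.
        if wan.network.overlaps(lan.network):
            problems.append(f"{family}: WAN and LAN subnets overlap")
    # SLAAC on Ethernet-style links works with a 64-bit prefix.
    if ifaces["lan_v6"].network.prefixlen != 64:
        problems.append("v6: SLAAC on the LAN expects a /64")
    return problems

print(check_plan(PLAN) or "plan looks consistent")
```

This checks only the addressing plan; it says nothing about whether RA or DHCP are actually being served.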

No word on where they are with FireBricks. Obviously keen to help them test the other way around. To be fair, if this is either a bug in FireBrick in some way, or more likely, something we can work around by changing FireBrick in some way, I am more than happy to do the work to make that happen. We have implemented a number of "pragmatic" aspects to the way the FireBrick works (sometimes on a config setting so as to be "standard" by default) and I'd really like this WiFi kit to work...

I think best if I update this post as we make progress for a bit rather than new posts. Let's get to the bottom of this, shall we?

Updates:
  • From comments, it is not just FireBrick, but is some rare combination of things clearly, and seems to be Ubiquiti APs and iPhones and "something" else.
  • IPv4 gateway not working was user error, I mistyped as 0.0.0.0/24 for some reason
  • Someone from Ubiquiti, in Austin, Texas, in the middle of the night, is working with me on this now.
  • IPv6 gateway was not working as I was using the zero address in the /64, which the ER had assumed it could have, making it a router on the WAN side, which is unexpected. I changed to the ::1 in the /64.
  • Now wifi all on ER not using FireBrick, thanks to guys from Ubiquiti working in middle of the night. Roaming appears to be working, more testing to do. I am being sent a cap of working roaming as seen by ER, and will get same from FireBrick.
  • We now have two interchangeable set-ups. Both on same sets of IPs as a separate subnet for my WiFi run as a LAN side of a router. I have the Ubiquiti EdgeRouter set up, and the same set up on an FB2700. At present both seem to "just work" but as I say, this can take a while to see the fault. I have lots of logging. One clue is that I am sure I have seen the iPhone re-do DHCP on roam, and the current testing (on both set-ups) does not do that - it just flips over to new AP basically seamlessly. So, just more testing for now. If both these "just work" we have to go back and see what else on the main LAN could be upsetting things in any way.
  • This morning (Saturday), still no apparent roaming issues! This is using a FireBrick but on a separate LAN the same as the ER set-up. Again, if the roaming happens without involving the gateway router, no way the FireBrick can be to blame. If it is OK for a few days I look to swap back to main LAN and see if that shows the problem again.
  • Sunday, still using a separate FireBrick as gateway, and have set up the second VLAN that was being used before on it. Still not failing. This makes no sense at all.
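
The two gateway slips in the updates above can be shown with a short Python sketch. The prefix and target address are made-up documentation values, and the helper names are hypothetical. The zero address in a /64 is the subnet-router anycast address, which routers on that subnet are expected to answer to, and that is probably why the ER assumed it could have it.

```python
import ipaddress

def route_covers(route: str, target: str) -> bool:
    """Return True if the route prefix contains the target address."""
    return ipaddress.ip_address(target) in ipaddress.ip_network(route)

def zero_and_first(prefix: str):
    """Return the zero address and the ::1 address of an IPv6 prefix."""
    net = ipaddress.ip_network(prefix)
    return net.network_address, net.network_address + 1

# A default route must cover everything; the mistyped /24 covers 256 addresses.
target = "203.0.113.9"  # made-up internet address
print(route_covers("0.0.0.0/0", target))
print(route_covers("0.0.0.0/24", target))

# Hypothetical WAN /64: the zero address versus the ::1 used as the gateway.
zero, first = zero_and_first("2001:db8:0:1::/64")
print(zero, first)
```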

93 comments:

  1. Can you temporarily replace your home firebricks with alternative kit in order to see if the problems melt away?

    Replies
    1. If you read ^ you would see that (a) no, as it is actually the main office router, and (b) that is basically what I am doing under my home network to test wifi using an Edge Router.

    2. "If you read..."? Cheeky sod, I read the lot. a) It simply wasn't clear to me that your previous mentions of using a Firebrick for your home installation and "the office router" are the same thing, and b) if you can't replace it because it's the office router then that's NOT what you're doing, is it? I was referring to eliminating your firebrick code involvement entirely, which is quite possible for testing how wifi and iphones behave locally. My suspicion is that either or both your firebrick's and the ub's implementation of IPv6 at some level are making assumptions which are not valid.

    3. Sorry :-) What we are doing is making the APs on a LAN that is *not* using FireBrick. We have that now, and are testing.

    4. No worries. It'll be interesting to see what it ends up being. iOS 10.3 has come out, with 10.3.2 in beta. They often tweak operation in updates without explicitly announcing a change, so that's worth trying if you're not already there.

  2. I have a FireBrick, UniFi APs, a ToughSwitch and an iPhone, and I am using both IPv6 and IPv4 so, if I can help with any defined testing since I have a similar combination of stuff, please do just say.


    (I was using an EdgeRouter before moving to the FireBrick and, yes, IPv6 is much easier on the FireBrick than on the EdgeRouter. But I did like the EdgeRouter.)

  3. So Brandon here from the UniFi team. We're trying to work with RevK on the issues seen here. Using EdgeRouter, USG, and pfSense the issues don't show up.

    We have one of RevK's routers and we're trying to reproduce the issue. So far as we can tell, the issue only occurs w/ RevK's router and UniFi.

    When using UniFi with any other router the issue doesn't show up.

    We have one of RevK's routers now, and we're trying to get it setup to try to reproduce the issue.

    Thanks,
    Brandon from UniFi.

  4. I definitely see a similar (but perhaps less severe) problem when roaming between my 2 APs. I have a Mikrotik router, so I don't think this can all be in the Firebrick department. APs are pretty close to being chucked at this point.

    Replies
    1. Brandon, take note, not just FireBrick.

    2. Dan Graville - can you shoot me an email at brandon at ubnt dot com with details?

      We are seeing this on RevK's router, and we have your report of Mikrotik (but from our testing, it's also fine) and one current case w/ Windows DHCP server (previous cases, we've found this to be configuration errors).

      Thanks,
      UBNT-Brandon

    3. Dan, do let me know the symptoms. Obviously if this is even one case of the same issue with non FireBrick, it shows FireBrick is not the cause, but perhaps some network set-up (perhaps even one that is common and default with FireBricks). If that is the case, then maybe we can compare notes on network set-up that works and does not and find a clue.

    4. Yes, also could you make a thread on community.ubnt.com and tag me at UBNT-Brandon?

  5. I haven't got a firebrick to try this, but I can't say we see this issue at all. Despite having lots of UniFi kit in lots of deployments - such as hotels, where users do roam around a lot - especially if they're staying for functions and the like.

    We run both IPv4 and IPv6 deployment in those environments too - so we'd expect this to cause us issues if so - typically there is a Mikrotik involved somewhere in our deployments too as it happens - we tend to use them for the gateway with the connectivity usually via PPPoE to whatever is appropriate for the connection used.

    If I had a firebrick here, and an A&A connection (or I guess an L2TP from A&A) I'd happily test this out as I'd love to see this issue happen or not - it'd be useful to see what can trigger it given we have so much UBNT kit out there.

    Replies
    1. Thanks for the post here. It seems this is an issue w/ RevK's router. We have tested his router at our site - and we see the issue immediately.

      Using another brand router (Cisco, pfSense, USG, ER, Linux, etc.) does not show the problem.

      So far we can only get the problem to occur using RevK's router.

      Thanks,
      UBNT-Brandon

  6. If it helps, I have 4 sites using Unifi with pfSense as the gateway, and up to 10 Unifi APs, lots of them with lots of iPads etc, and lots of staff with iPhones, and I have not seen this issue at any site, nor have I had a single reported wifi issue at any of the sites since switching to Unifi.

    Which is more than can be said for the systems they replaced!

  7. I have 4 Unifi AP-LR and 1 Unifi AP-AC-LR connected to a Debian Linux box acting as a router. They are connected together using a Ubiquiti Toughswitch. The network is currently IPv4 only and issues addresses using DHCP. NAT is applied by the Linux box. I have no issues with roaming and we have a lot of Apple devices (iPhones, iPads etc.) which all seem to work fine.

  8. I had a similar issue to this but using some DrayTek APs in my home. Whilst I don't have them set up for true roaming, I do have them on the same VLAN, same SSID, key etc.

    What I realised was that the signal was too strong (believe it or not) and there was not enough of a drop off in strength between the APs to cause the iPhone to start searching for another SSID to connect to.

    To solve my issue I simply reduced the TX power from 100% to 80% on both APs. The switch over between the two WAPs is now much smoother with barely any interruption to service (nothing like what's described above), and I still have full coverage within my humble abode.

    Just a thought, could work for you...

  9. I have been using Ubiquiti's UniFi equipment, and used their PTP and PTMP equipment when living in the country, for some time.

    It is full of potential, that is never realised. Mostly because their software development process, as far as the UniFi range is concerned, is deficient in all respects.

    So after six years I've switched. I'm currently evaluating Juniper and Meraki for my home office.

    In the real world we have a much more complex set of requirements than UniFi can handle, regardless of promises.

    Most people with a home office have Office365 accounts, which involves Azure AD integration. Which is not a stable feature of UniFi firmware/software. Like it or not, multicast is here to stay, good luck with that on UniFi, same thing goes for pure IPv6.

    And don't get me started on the nonsense produced by the UniFi controller as operational information.

    The iPhone issue continues to rear its head over the years, given that Robert Pera (UBNT's founder) started at Apple, one has to wonder.

    You can fight, or switch. I decided to switch. I'm happy to pay more for stuff that works.

    Replies
    1. Hi there,

      Do you have any specific information on the problems you have encountered? We have ~300,000 installations every month and many happy customers.

      Would love to get this sorted for you.

      And as to your complaints on data visibility - could you share data there?

      Thanks in advance,
      UBNT-Brandon

  10. Just a quick note, LR access points are pointless in the UK. At our power limits (certainly for 5GHz) they're limited to 200mW on the indoor channels - the non LR ones can easily do this. I guess there is a slightly better receive gain as well but it's probably not worth the cash. Unless you set them to a different country code, which is illegal but effective.

  11. What do you use for your ADSL modem RevK? I'm one of those who's had IPv6 (not related to wifi) issues with the VMG1312 and periodically checks https://support.aa.net.uk/VMG1312:_Bugs to see if there's been any progress - do you feel like you're getting anywhere with ZyXEL / can you recommend anything better?

    Replies
    1. I don't use ADSL. I have glass, connects me via office to a datacentre which is linked to edge BGP routers on to transit and peering with my own PI space on IPv4 and IPv6. Local router is not even in my house, but at the office, and is a FireBrick.

    2. So just your basic, bog standard, home set-up, really :)

  12. I was warned off Ubiquiti by RevK's earlier posts. I use a Firebrick and some Apple iOS kit and I currently have some ZyXel NWA-3560-N WAPs which are superb. Am about to try out some Cisco 1830 WAPs for something a bit faster.

    Replies
    1. Zyxel are a POS matey. You just try getting them to fix something one of their ISPs hasn't reported & see where you get.

      Zyxel only give a damn about their ISP customers - consumers "shouldn't be buying their products as they're meant for OEMs/ISPs". That's a statement from Zyxel in 2014.

      Cisco? You having a laugh or what? Cisco APs are complete crap AND you get price-gouged on licensing. Better off going with Ruckus - you still get price-gouged on licensing but they do work a hell of a lot better than the borg's shite.

      Disclaimer - I used to contract for Cisco a long time back (Triangle Park when it was a LOT smaller). They used to be a good company back then too. Not so much these days.

      Must be getting old, seem to find myself saying "not so much these days" when people ask me how good companies are :)

    2. I can form my own opinions and don't require advice from 'anonymous'. I don't know anything about ZyXel the company. I was talking about one of their products only. As for Cisco APs, I am about to find out for myself and don't require advice from the nameless and which is devoid of references.

    3. Hi Cecil,

      That's too bad to hear. So as it turns out this seems to only happen with RevK's router.

      Using Cisco, pfSense, Linux, USG, ER, etc. the problem doesn't occur.

      Put RevK's router in the mix, and right away you see the problem.

      We have unconfirmed reports of Mikrotik and Windows DHCP server - but we've had these before - and in those cases we've helped the admins find and correct configuration errors, which resolved their problems.

      Expect the same here.

      Summary: happens w/ RevK's router, not with ER, USG, Cisco ATA, pfSense, Linux, etc.

      Thanks,
      Brandon

  13. It works fine using a Sky VDSL2 connection - dynamic IPv4 address and a /56 PD setup. SR102 router (locked-down busybox) with wifi off.

    Daughter's housemates (they all use iphones/ipads/ipods) have had zero problems with gen2 Unifi AC Lites.

    You should note that if you had any generation 1 APs with zero-handoff/min RSSI setup then those retain that setup until you re-enable "Advanced settings" in the controller and then turn it off. This is a bit of a "gotcha" and probably needs a bug report because the controller won't reprovision the AP with ZHO/minRSSI unless they're enabled on the controller - which they're not on current versions.

  14. Also avoid the Unifi/Toughswitch PoE switches like the plague.

    There's known issues with failure modes being induced into APs when the switch is powered off/on for the Unifi PoE switches and various revisions of the Toughswitch overheat.

    It's very dependent on hardware revisions but AFAICT you run a decent risk of bricking various APs if you use Unifi switches with PoE. There's even been verified reports of the switches frying Intel NICs on blades.

    Ubiquiti seem very good at dealing with those issues in the USA - they'll RMA it no problem. In Europe, not so much as the distributors are stuck with the dodgy units and are (in general) denying there's any issues.

    When all's said & done though the RF frontend on APs is very good & the controller software is getting there (although for the love of gods Brandon get a fucking move on with IPv6-PD and DHCP options for the USG. The CLI & config json bodge is an accident waiting to happen!)

    For similar remote deployment/config features you'll be paying yearly licensing fees. That's where Ubiquiti win.

  15. The more I read about Ubiquiti the more I am put off ever using them. As it happens I bought Apple Airport Extremes (v4 and v5) before I'd heard of Ubiquiti, and they work great even for a network with a wifi hop in the middle of it ("Extend a wireless network" option). I just wish they'd let me turn off the unwanted 2.4GHz networks, but the Extreme doing the extending creates one whatever I do.

    I also have broken IPv6 because of the VMG1312 problems. I bought a Firebrick 2700 a year ago to replace the Zyxel router (I have another running as a PPPoE modem), but the FB2700 configuration is so complex and obscure that I've never managed to get my head round it. So there it sits gathering dust.

    Replies
    1. The RF side of Ubiquiti is excellent. The controller side not so much but it's getting better since they poached the pfSense project leader :)

      The switching side of the Unifi range with PoE has had its fair share of problems - I'd suggest that nobody really did any significant failure analysis on any of that stuff, hence the multiple failure modes affecting a lot of people.

      Having said that, it's really the only game in town for anyone wanting enterprise-level networking without paying extortionate licensing fees.

      I honestly don't think Ubiquiti kit - even with the (prone to failure on powerloss) CloudKey - is suited to home users. Even ones who believe they do have a clue but in reality don't know how things work :)

    2. If it helps, once I got started with the FireBrick, I've found configuration generally pretty easy, especially when combined with A&A's willingness to help, provide code samples etc. I too was a little daunted at first, but it does make sense once you get over any initial hump.

      Obviously, I can't provide the kind of knowledge that A&A support could offer but, if you wanted a potential steer or two from someone who came from the same starting point, do say.

    3. The problem with the FB2700 is that everything has to be configured before it does anything. The initial mountain to climb is huge. Plus the config is in xml which I detest, so that doesn't help.

      What is needed is a comprehensive beginner's guide, which starts from assuming all you know is how to change the DHCP reservations and map a port through the firewall on something like a VMG1312. That's about as complex as most people get with a home router. The FB2700 manual is a reference manual, it's no help to beginners.

    4. Ok let's be fair - if you plug in to a PPPoE connection and plug a laptop in, you can be on-line with no config at all, even with IPv6! It has some sane starting defaults.

      The XML is there, but there is a web interface to use to edit config with links to reference documentation, pre-defined field names and types and pull downs to make it simple.

      Some guides for people used to other systems is a good idea though.

    5. As I say, I had exactly the same fear as you — mine sat on a shelf for longer than I had wanted!

      I found that simply plugging it in with the default config was sufficient to get me online, and manually setting an IP on my local machine and connecting over ethernet got me onto LAN1, if I remember correctly.

      I do like the idea of a "beginner's manual", and I'm happy to contribute to one, if that helps!

    6. Just to add, the EdgeRouter looks competent, but one issue is you cannot do config in one place. I would have to go to some command line to set IPv6 RA, and routes and stuff. Why not via the web interface? FireBricks have it all in one config with XML or web interface to edit and covering everything. We did try, honest.

    7. I *think* you can do everything on the ER via the CLI, so it might be "all in one place" from that perspective. But, yes, one of the things I disliked about it was that the web GUI was not available for everything.

    8. Is there such a thing as a hackathon for documents? If there is enough call for beginner documentation for FireBrick, it might be a reasonably interesting day to get some would-be users / beginners (who know what questions they have) together with some people who know what they are talking about, and some others willing to write things up in a user-friendly fashion, and thrash out a beginner's guide?

      (And I can't decide if it should be a beginner's guide — a guide for a beginner — or a beginners' guide, for all beginners.)

    9. On the ER all of the CLI settings are available in the GUI via the config tree, which sounds like it might be similar to your interface for manipulating the XML on the firebrick (without the links to the documentation). The only bit you can't do in the GUI is upload files such as certificates for OpenVPN.

      In reality once you know your way around the CLI, there's not much you'd want to bother with the GUI for. Also the GUI isn't responsive which is a PITA in the modern world!

      The real problem with them from a business point of view is they are pretty time consuming to set up, especially if you want to use features like IPv6, so unless I have a specific use case such as requiring an OpenVPN server, I tend to use MikroTiks for most clients.

  16. This unifi thread is very similar

    https://community.ubnt.com/t5/UniFi-Wireless/Firmware-bug-3-7-49-6201-on-UAP-AC-HD-iOS-devices-receive/m-p/1878953#U1878953

    Replies
    1. Yes, we are working with that user. We have seen similar and in the past it was a Windows DHCP server configuration issue.

      Many DHCP/router misconfigurations can cause such behavior.

  17. So wanted to circle back here: running the EdgeRouter we sent you, roaming is now OK?

  18. Also, anyone experiencing issues, could you please email me at brandon at ubnt? For whatever reason this page will not let me reply to comments directly.

    We'll get everyone taken care of.

    Thanks,
    The UniFi Team

  19. Also, Dan Granville - what Mikrotik are you using, and could you send over a sanitized config file?

    As others have noted here, the problem does not exist with USG, ER, pfSense, Cisco ATA.

    So far we have seen confirmed issues with RevK's router, and unconfirmed cases (likely other configuration issues) on Windows DHCP server and Mikrotik.

    Thanks,
    Brandon

  20. So to follow up here:

    Also on our site, we tested with the following routers:
    USG - problem does not occur.
    ER-PRO - problem does not occur.
    pfSense (multiple versions) - problem does not occur.
    Linux (various configs) - problem does not occur.
    Adrian's company's router - problem occurs immediately.


    Thanks,
    Brandon

  21. Also, just to follow-up here. One of our engineers tried to help Adrian resolve the issues w/ his Firebrick router.

    Both of the important questions needed to come to a resolution were ignored.

    1. How do I enable the dhcpv6 client (w/PD) on the WAN so my home devices get IPv6 addresses? I’ve read through https://support.aa.net.uk/FireBrick_2700_Configuration_run-through#Native_IPv6 but don’t believe I should have to enter my IPv6 subnet somewhere in order to get this to work.

    2. What kind of a route do I need to associate to one of my shaping rules to just limit my entire home to 10 Mbps up/down, for example? Once again, please be specific, I’ve read through the docs, and they only talk about “customer routes” and I’m just interested in the default route.

    Anyways, in summary:

    With these routers roaming works fine:
    Running with pfSense - no issues.
    Running with Cisco ATA - no issues.
    Running with Linux (Debian/Ubuntu) - no issues.
    Running with Mikrotik - no issues.
    Running with USG - no issues.
    Running with EdgeRouter - no issues.

    We tried to help RevK figure out the bug w/ the Firebrick, but didn't get our questions answered.

    For anyone experiencing issues - please create thread on community.ubnt.com and tag me (UBNT-Brandon) - or just shoot me an email at brandon at ubnt.com.

    Thanks,
    Brandon

    Replies
    1. I replied straight away, asked some questions, once again it is me waiting for answers.

  22. Replies
    1. No, I asked how the Internet connection works so that I could advise you, as you wanted specific answers. It is relevant if it works using PPPoE or if you have an Ethernet WAN with DHCP/DHCPv6 or what. The answers are different. It was not a complex question. I also suggested you could email me a config and I would go through it and be even more specific. Ignoring my email and not replying does not really help matters.

  23. Anyways, in summary:

    With these routers roaming works fine:
    Running with pfSense - no issues.
    Running with Cisco ATA - no issues.
    Running with Linux (Debian/Ubuntu) - no issues.
    Running with Mikrotik - no issues.
    Running with USG - no issues.
    Running with EdgeRouter - no issues.

    Replies
    1. Running with Samsung - no issues
      Running with any other Android - no issues.
      Ergo issue is apple.

      We can do this all day.

      I can ask firebrick customers who have CISCO or Ruckus APs to see if they have no problems - we have not had any reports of them.

      This is not helping actually find the cause - this is trying very hard to bury your head in the sand and insist it has to be FireBrick to blame.

      I ask you once again - if, as it seems to be expected, when an iPhone roams from one AP to the next, there is NO INTERACTION whatsoever with the gateway (no re-do of DHCPv4 or router solicitation) - then exactly how can the gateway actually be, in any way, to blame, for the roaming appearing to fail and leave the iPhone on the new AP with no connectivity at all? What exactly could a FireBrick be doing here to cause the roam to fail?

  24. 1. How do I enable the dhcpv6 client (w/PD) on the WAN so my home devices get IPv6 addresses? I’ve read through https://support.aa.net.uk/FireBrick_2700_Configuration_run-through#Native_IPv6 but don’t believe I should have to enter my IPv6 subnet somewhere in order to get this to work.

    2. What kind of a route do I need to associate to one of my shaping rules to just limit my entire home to 10 Mbps up/down, for example? Once again, please be specific, I’ve read through the docs, and they only talk about “customer routes” and I’m just interested in the default route.

    Neither of these were answered, or are yet answered. We actually went as far as to try to help you fix what is going wrong with the router... but first we need answers to those questions.

    Are you going to answer them?

    Anyways, in summary:

    With these routers roaming works fine:
    Running with pfSense - no issues.
    Running with Cisco ATA - no issues.
    Running with Linux (Debian/Ubuntu) - no issues.
    Running with Mikrotik - no issues.
    Running with USG - no issues.
    Running with EdgeRouter - no issues.

    Thanks,
    Brandon

    Replies
    1. Sorry this is getting silly - You asked those questions. I said that in order to answer them specifically as you asked I need to know what sort of internet connection setup you are using, and sending a config would be helpful. I feel we are going around in circles here. You appear to be refusing to answer me at this point, not the other way around. I do not understand why you think repeating the question rather than answering me so I can answer you, is sensible.

  25. Also - most importantly, we resolved the issue you are seeing by sending you a free EdgeRouter. And also by checking your network config - you seem to have resolved the issue you are seeing with the Firebrick as well, right?

    Anyways, in summary:

    With these routers roaming works fine:
    Running with pfSense - no issues.
    Running with Cisco ATA - no issues.
    Running with Linux (Debian/Ubuntu) - no issues.
    Running with Mikrotik - no issues.
    Running with USG - no issues.
    Running with EdgeRouter - no issues.

    Replies
    1. No, once again, you did not "resolve" the issue. Two things were changed. 1 - moving APs to separate subnet, and 2 - using ER. That fixed it. However, putting FireBrick in, on separate subnet *ALSO* fixed it. By your logic we just SOLVED the issue by INSTALLING AN FB2700 FIREBRICK.

      As I have explained several times, there are clearly some specific cases where this happens, and in most cases it does not. EVERY case involves the unifi APs and apple, and ALMOST every case involves a firebrick. Constantly repeating your specific test cases, which are clearly not the whole picture, does not change anything.

  26. Also, we are fine keeping it as a lab router for a few days, as long as you’ll help me with a few things:
    1) Need native IPv6 working (as roaming only failed in his environment when he had that on, there’s no point in testing w/o IPv6)
    2) Need traffic-shaping working
    3) Need port-reflection (pinning, whatever) working
    4) Need incoming VOIP calls audio working (two-way audio works on outgoing calls, neither direction audio works on incoming calls)
    Until you can clearly explain how to get these things working, it’s staying off the network, but we can wait forever :slightly_smiling_face:

    Summary right now:

    With these routers roaming works fine:
    Running with pfSense - no issues.
    Running with Cisco ATA - no issues.
    Running with Linux (Debian/Ubuntu) - no issues.
    Running with Mikrotik - no issues.
    Running with USG - no issues.
    Running with EdgeRouter - no issues.

    Problems w/ the FireBrick.

    Replies
    1. 1. I will not approve any more posts where you keep repeating your list of test cases. They do not need repeating, OK?

      2. To answer the question tell me how the internet connection is set up please?

    2. By all means tell me to keep my nose out, but might a 10 minute call resolve potentially days of going backwards and forwards (and many tens of comments!) over what information is needed to answer which questions to set up which devices?!

    3. Very possibly, though not sure I want a call with Brandon right now. Basically just need to know if PPPoE or what, then I can answer.

    4. Neil, we had asked to do a Skype session (or similar) with RevK several times. Via twitter, email, and other means we requested this.

      Instead we saw continued tweeting and blogging about the issue, and were forced to interact with RevK on twitter and this forum.

      Often with him bashing us, and sometimes resulting in him using profane language against my team and me, and then threatening censorship of my posts here.

      Anyways, we're going out of our way now to help him try to solve a bug on his product.

      Currently this issue only occurs on the Firebrick, when IPv6 is enabled on it, and does not occur on Cisco, OpenWRT, pfSense, etc.

      Thanks,
      Brandon

    5. We used irc, it worked very well, got the ER set up quickly.

    6. Ah, some of my posts are showing up now. Wonder how many will be blocked by RevK.

      Did you see my post on details of the connection RevK?

    7. I have not deleted any posts of yours that I have approved and published. I see you say native IPv6. We as an ISP do native IPv6, which is of course over PPP. People also do IPv6 via tunnels, and I am glad that is not the case here as that is way more complicated.

  27. I still have no idea if Brandon is using PPPoE or a straight Ethernet with DHCPv4, RA/SLAAC, or if he is using L2TP or what. Shame, as if I knew I could tell him exactly what config he needs and where. Oh well.

  28. I'll continue with testing here for a while.

  29. So are you still deleting my posts?

    I had answered your questions and provided a summary.

    Replies
    1. I approve posts, but continually posting the same list of test cases is not helping anyway and I advised you of that in advance.

  30. And it looks like these posts are gone now.

    Replies
    1. I did not delete any post of yours which I had approved and published.

  31. So are you going to answer the questions? Or will you just keep deleting my posts?

    Replies
    1. Third time now - I did not delete any post of yours which I had approved and published. You do like repeating yourself.

  32. We're asking because it didn’t work out of the box, like he said it “should”. It doesn’t look like he’s going to help us figure it out, so it’s staying off.

    It sounds like you are not interested in our help. And you deleting my comments here about the status of the issue - and what we've found - makes clear you do not have interest in allowing the status here to be shared.

    Thanks,
    Brandon

    Replies
    1. So the long post covering two different scenarios (because you refused to actually say how the internet was connected) is not answering. And the email saying the same is not answering. This really is getting silly now.

  33. We told you it’s just native DHCP6 IPv6, and normal DHCP IPv4. Probably the most common thing in the world. Connected over Ethernet.

    Again - the issue here only happens with Firebrick when IPv6 is enabled - and does not happen on other routers.

    So we need to get firebrick running IPv6 in our lab (which is in the US) if we want to help you fix your issue w/ the Firebrick.

    Thanks,
    Brandon

    Replies
    1. Native IPv6 over what? Ethernet or PPPoE? That did not say. That was the main thing I was trying to find out, and no, IPv6 over Ethernet as an ISP is not the most common thing in the world. Indeed, we have not encountered it at all before, despite selling to many countries - (almost) everyone uses PPP (which is PPPoE on the Ethernet, obviously). Even FireBricks we have sold in to China use PPPoE.

      Lots of people do native IPv6 over PPPoE.

      Again this issue here only happens with Apple and Unifi.

      This is to help find the issue which has been reported with apple+unifi+firebrick, and been reported with apple+unifi+mikrotik, and we are trying to help you (and would be happy to help apple) with that. You now have in your hands a case where you see this happen with a FireBrick. This should be good news.

      You have more tools to see the WiFi side than we do and a much greater understanding of the WiFi side. You are the experts on the WiFi, and we are not. I have no problem conceding that fact. I do not know how AP to AP roaming works at all.

      Now that you have this test case in your lab that fails, as you said you do, you can see on the WiFi side how it is failing, I assume.

      I am especially interested to know that because my understanding is that a roam between two APs does not involve any packets going to/from the gateway (the FireBrick in that example), and so I struggle to see how any gateway can cause the roam to fail. Do correct me if I am wrong - but is that the case? - a roam between two APs does not involve packets to/from the gateway, does it?

      If so, what do you see happening in the failure case?

    2. Like I said, Ethernet.

      Yes, need to be shown how to get native DHCP6 IPv6, and normal DHCP IPv4 working on the firebrick. Out of the box this is not working.

      It seems it's never been tested because (as you, Adrian said) the UK (and EU) uses primarily PPPOE, and the US essentially never uses this (we use native IPv6, and really IPv6 in the US is rarer). So it's likely that native IPv6 is not working on the firebrick (and has likely never been tested, as again you, Adrian, said, the sales of Firebrick are primarily UK, and perhaps EU - and to his knowledge no sales in the US).

      So it's likely this has never been tested before. As there are no internet providers in the US (at least in our area) that use PPPOE, we need native DHCP6 IPv6, and normal DHCP IPv4 working on the firebrick to be able to help.

      Maybe want to answer Jeff's email?

      Thanks,
      Brandon

    3. It gets an IP using RA and SLAAC on the WAN, out of the box - is that not the case here?

      It gets an IPv4 using DHCP on the WAN, out of the box - is that not the case here?

      It does not do DHCPv6 client on the Ethernet directly, only on PPPoE. If that needs adding, we can look in to that. But as I say, we have not encountered anyone that does Internet connections like that before, in many countries! Interesting that it is not the case in US - no it has not been tested in the US, and as it does not even try to do DHCPv6 client on ethernet it has not been tested anywhere else. It is not a feature. But one we may add.

      Saying something that is not a feature "has never been tested before" is not that fair.

      I answered Jeff's email the same as the details I posted on the blog, at 16:17, 30 minutes ago.

    4. Having said that - whilst I am happy to work on getting this going on what you say is a "normal" US internet connection, and that would be good, I am not sure it helps with the WiFi issue. You said you re-created that. That would not actually need external connectivity even - when this fails you cannot talk to devices on the LAN even, where the gateway is not involved.

  34. Also, to be clear 'not approving' - i.e. taking what I've spent time writing, and throwing it away - is the same as deleting it.

    There is useful debug information, and data, in my posts. And many of them are direct offers for help on this thread, to other members (members who have now directly emailed me as a result of your censorship).

    In summary:
    Issue only happens on firebrick router w/ IPv6 configured and w/ perhaps some other option.

    We are even trying to help you fix this on your router, despite all other routers we've tested with not having the problem.

    Now you have:
    - Deleted my posts where I am offering help to firebrick users, to you.
    - Used profanity against my team and me.
    - Repeatedly incorrectly blogged about the source of the configuration issue (which falls on the side of the Firebrick).
    - Failed to answer how to make the firebrick work via native DHCP6 IPv6, and normal DHCP IPv4. It doesn't work out of the box. Maybe never tested?

    Thanks,
    Brandon

    Replies
    1. To be clear, I warned you in advance that if you keep repeating your test cases in your posts, I will not approve it. You chose to ignore that warning. Up to you.

      In summary:
      The issue only happens on Unifi APs - do you agree?
      The issue only happens on Apple devices - do you agree?
      The issue has been seen using FireBrick and non FireBrick gateways - do you agree?
      There are many test cases where the issue has not been seen - do you agree?

      See - state some actual facts like that ^ and the "blame" for the issue looks different doesn't it?

      Now I have:
      - Declined to approve a post that keeps repeating test cases over and over again, and insisting, contrary to comments from third parties, that this only happens with FireBrick, after I warned you I would decline to approve such a post.
      - Cursed you once after you repeated such assertions lots of times on twitter - but hey - freedom of speech. I did apologise.
      - Repeatedly and correctly blogged about the facts of the matter. Interesting that you think it is a configuration issue. This may be progress.
      - Answered your questions on the blog and by email in spite of the fact you would not answer my questions about the internet connection.

      Now, I am intrigued by the comment of it being a configuration issue. Can you elaborate?

      Given that an AP to AP roam does not involve any packets to or from the gateway, what configuration issue, exactly, do you feel could be the cause? Maybe if you know of such a configuration we can look in to it in more detail.

      Am I right that an AP to AP roam involves no packets to/from the gateway router? Is that correct? You are the experts here...

  35. I was summarizing so that users who are new to this, know that this only applies to the Firebrick, not other routers.

    1. Disagree. Have seen this on other APs w/ such misconfiguration. Common actually, if DHCP, DNS, ARP settings are bad - you'll easily configure yourself into this corner.
    2. Apple devices have more intelligent roaming, so they are more sensitive to router misconfigurations - as it throws off their sophisticated roaming state machine. There are ways to get their roaming state machine logs; these help to show why/how the router can cause problems.
    3. Disagree - like I said (and you deleted the posts about) - roaming works fine with the following:
    USG
    ER
    Cisco ATA
    Linux (Ubuntu, Debian, OpenWRT).
    Mikrotik

    We are unable to repeat the issue w/ those... only able to repeat it with the Firebrick -and- only when IPv6 is enabled on the Firebrick.

    Also - we did answer the internet connection question multiple times (several of which were deleted, and now you claim - accurately, in the eyes of the casual observer - that we never responded). We did, and you deleted the post. Here is the response:

    It's just native DHCP6 IPv6, and normal DHCP IPv4. Probably the most common thing in the world. Over Ethernet - also super common.

    1) Need native IPv6 working (as roaming only failed in his environment when he had that on, there’s no point in testing w/o IPv6)
    2) Need traffic-shaping working
    3) Need port-reflection (pinning, whatever) working
    4) Need incoming VOIP calls audio working (two-way audio works on outgoing calls, neither direction audio works on incoming calls)

    So the assignment of IP addresses does indeed involve the router.

    On elaborating on the configuration issue - we are trying to help show you what the issue is, but we can't get your router to work properly w/ IPv6 - and need to do so first, before we can really dig in.

    Of course the gateway is involved in roaming. It has to get information via ARP to know the MAC address that is associated w/ incoming IP(v6) packets. If it doesn't do this properly, then data will not properly flow to the station, all sorts of annoying things will happen - and it will be intermittent (like you've seen) - as it will only happen when roaming and refresh intervals happen at the same time.

    So - like you've seen - you'll probably see some periodicity as to when this happens.

    Hoping this post doesn't get deleted (my other informative ones were deleted).

    Thanks,
    Brandon

    Replies
    1. 1. OK we had a post from someone saying they had seen this on non FireBrick. I prefer not to call him a liar just yet, and hopefully he can clarify the symptoms.

      Even so, for config issues, surely the AP to AP roaming is transparent to the gateway - the MACs do not change? Would not such settings cause issues all the time, not just when roaming AP to AP? Are there config changes that can actually stop an iPhone correctly roaming AP to AP? What are they?

      Yes we have seen silly configs, people get subnet masks wrong, etc. That has permanent impact though, not just on AP roaming.

      2. As I say, you are the expert on the WiFi side, and the fact that there is a complex state machine does sort of fit - the phone gets in a state that is somehow ill - with the AP associated with the iPhone, and staying like that until wifi off/on or a successful roam to something else, but no packets flowing. I am more than happy to believe this is, at the crux of the matter, a bug in apple. I'd love to find a way to stop it happening even if a "work around" for apple.

      3. You list those test cases yet again, but they appear in this one blog post alone six times from you. I am getting fed up - repeating that list of test cases is not adding to the diagnosis, and again, if you include that in your post I will not approve it.

      I am glad you confirmed the issue is only with IPv6 enabled, that agrees with my findings.

      You just kept saying native IPv6. We do native IPv6 over PPPoE. Native as opposed to tunnelled. Saying native IPv6 was not really answering the question. I'll put that down to geographic norms for now. You don't expect PPPoE, we don't expect anything other than PPPoE.

      I did go in to some detail on the answers, so I am not sure why you are asking again.

      Now, yes, back to the issue, assigning an IP involves the gateway/router, agreed. But once the IPv4 and IPv6, and netmask/length, and DNS, and gateway are all assigned, surely that is it (at least for the lease time). The phone is then working. Only on an AP hand over does it stop (sometimes).

      Do you agree that at that point in time, for an AP to AP roam, there is no involvement of the gateway - no re-assignment of IPv4 or IPv6 addresses or credentials needed - no packets to/from the gateway?

      Now, you say "of course the gateway is involved in roaming". This is where the pcaps that Jeff sent disagree. He was clear that the gateway was not involved. Or so I thought.

      So I am confused. And maybe there is a clue here.

      iPhone to gateway does not need to re-do any ND (like ARP but for IPv6) as the MAC of the gateway is actually in the RA packets for SLAAC, not an IP. The phone continues to know the gateway MAC and no packets need be exchanged to find it.

      Finding the MAC of the phone - are you suggesting the phone changes MAC when it roams? If that is the case, then indeed that is an interesting case as there is no way the gateway would know that other than by timing out on existing ARP or ND cache entries. Neither the iPhone nor the APs tell it that is the case, do they? If the MAC is changing we would see this issue on every network with every router, as caching IP to MAC mappings is normal, and indeed Cisco tend to cache for really long periods.

      So are you saying the MAC changes? If not, in what way is ARP or ND a factor here?

      If not, then the existing ARP/ND caches remain valid, and would continue to check again on timeout. If that was the issue, after a few minutes all would work again, which is not what we found.

      Are you saying the MAC changes when roaming?

      The failure case is a permanent lock up until wifi off/on or roaming to another AP successfully. It is not simply a brief period of no connectivity. Also the phone loses the IP being unable to communicate and decides to drop the IPv6 address even.

      Also, we have seen this when not even using DHCP but fixed IP on phone.

  36. OK, here's the latest, likely the underlying issue with the setting on the FireBrick. Shared this via email and twitter as well:

    "From the sound of the DHCP reply throttling he has in the FireBrick, it feels like that is the ultimate cause."

    I would recommend looking into the DHCP throttling setting you have on the Firebrick, and how these scale w/ numbers of users/roaming. This seems like it will lead to the solution to the problem you are seeing with the Firebrick + IPv6 + iOS roaming.

    Thanks,
    Brandon

    Replies
    1. I don't see an email on DHCP throttling, when did you send it? Can you elaborate? It sounds interesting but as we have seen this roaming issue when DHCP is not in use, and not seen any DHCP issues otherwise, I doubt it is the cause.

      Happy to look in to any possible cause though. Will be interesting if you have found it.

      Delete
  37. Yes, it sounds as though there is a default-on DHCP throttling setting on the device.

    Do you see where this is? I don't personally have the device so I don't know where this would be set (or unset) - so I'd need your expertise with the device to figure out how to turn it off.

    Also, not sure what DHCP server that's using, but if it's ISC dhcpd (good chance) and it's not setting `authoritative` in the conf, then this could cause slow DHCP (one of the things we saw) upon initially giving out DHCP.

    Which DHCP server is being used?

    Thanks,
    Brandon

    Replies
    1. Sorry, still at a loss, we do not know of this feature of "DHCP throttling", and we have not yet seen any email from you explaining it.

      It is not a "feature" of FireBricks and so has no setting to turn it on or off; we have no clue what you are actually talking about here. The email you referenced would be helpful. Maybe you can re-send it?

      It is not ISC dhcpd, no. It is FireBrick DHCP server.

      And, as explained, we did tests confirming the roaming issue happened when DHCP not being used by the phone at all - that was some time ago, so we are going to repeat that test to confirm.

      We'll let you know when we have managed to do that test.

  38. So when you are disabling DHCP - are you also setting a static IPv6 address?

    Replies
    1. No, still using SLAAC on IPv6 in that case. We have not tested static IPv6 address yet (assuming an iPhone can do that), but happy to try that as well.

  39. Also, this is a comment from the engineering team digging into the Firebrick... said they saw DHCP throttling occurring. Maybe it is not a setting but happening unintentionally?

    Anyways, need to know if you set static IPv6 address when you did static IP - and how the static address was set.

    Replies
    1. Ok great - but seeing the email where they say what they are seeing that they are calling "DHCP throttling" would be handy. You said you sent an email and we have not seen it yet.

      And no, this was, as explained, some time ago (years) when first investigating this and eliminating DHCP. We set static IPv4 on the iPhone and left IPv6 using SLAAC so no DHCP involved. The problem persisted.

      However, as I have said, we will be doing that test again.
