As you will know I have spent a long time trying to understand the issues we see with the Unifi access points and roaming between them using an iPhone.
A&A sell these, and some of their PoE switches as well. We may start selling more stuff in due course. Overall the Ubiquiti stuff is pretty impressive and there is an increasingly large range of devices. The WiFi is technically very good at the hardware level, and we sell in boxes of three even for businesses.
So it is important to us that they work. I also use them at home, and my family treat me as tech support (obviously) so it is important to me if I want a quite life. They were all round this Sunday - we had sort of cancelled Mothering Sunday for obvious reasons, but everyone came round and we had pizza and chatted. They all told me in no uncertain terms that the
WiFi here is crap and they even turn off on their phones and use 3G/4G when round the house. They all use iPhones. That really is a bad sign.
I myself spend a lot of my time in my office at home, but whenever I leave for the rest of the house I find I have to turn wifi off and back on. Though, technically, it is far from every time and can even be the odd day with no apparent problem, whilst other days I see many times. The problem is, as always, you remember the times it breaks.
This also makes testing hard - something changes and you watch it, and see you spend all day with no issues and think it fixed, when actually it is just intermittent, still, just as before.
I have an AC Pro and two AC LR in the house, and they are now on latest firmware. I thought that may have helped, but no. We also tried changing switches, and thought that had helped, but no.
The current state is that I have managed to mess with wiring enough in the house to actually have all three APs off a single Ubiquiti EdgeSwitch8 - one of their switches - so as to eliminate the switches as the cause of the issue.
Tip: Some of the Ubiquiti kit is still passive 24V PoE, and their switches are great as they support that, but you have to configure on the switch! It is not automatic as PoE normally is.
We also did tests with just IPv4 on the LAN, only for a few days, but that seemed to just work. This means the current thinking is that it is the IPv6 being present that is causing the issues. It could be some combination of bugs in iPhone, Ubiquiti, and even FireBrick code, for all we know. Reports from others that use this kit say no problems. We did a lot on FireBrick to try and eliminate that as the cause. However, with IPv6 on the LAN, even with IPv4 being static on the iPhone and no DHCP, it can still fail. Setting up DHCPv6 on the LAN does not seem to change things, we normally use just RA/SLAAC.
The symptoms are a sudden lack of connectivity when it roams. For a few seconds the phone may show the old IP addresses, but quickly switches to showing no IPs and then to showing the 169 auto addresses. Wait as long as you like, it is broken. You need to turn WiFi off and on (on the phone) to fix it.
Part of the reason for writing this up again is for the engineers at Ubiquiti - they are trying to fix this. Good news (though I seem to have to poke on twitter to get things progress, sorry guys). They sent me some switches and a router and gateway. Big thank you - nice to eval the kit as some of it we may start selling. We sent them a fully loaded FireBrick FB2700.
At this point the next stage is for me to try and create a setup using their kit as the gateway on the LAN and so doing IPv4 DHCP and IPv6 RA/SLAAC, and see if that breaks still. It is a pain as I cannot exactly replace my router as it is the office router. So I have set up a new IPv4 and IPv6 subnet for WiFi use. Not ideal, but will do for testing.
They, for their part, need to try and set up with a FireBrick to do the same. Can they make it break? Obviously I am on hand to help them set that up.
So setting up the Edge Router. It is a simple set up. No NAT. Fixed IP /24 IPv4 and /64 IPv6 on LAN with DHCP serving IPv4,and RA for SLAAC doing IPv6. On WAN is a simple IPv4 which can be DHCP client or static, and a simple IPv6 which can be SLAAC or static. Obviously need to set IPv6 DNS servers for RA on LAN.
So far I have managed to set up:-
- Firewall off
- NAT off
- Static IPv4 on WAN (a /24 for testing)
- Gateway 0.0.0.0/0 route on WAN, can ping out to internet
- Static IPv6 on WAN (a /64, obviously, from my PI block)
- Gateway for IPv6 on WAN
- Static IPv4 on LAN
- DHCP IPv4 on LAN
- Static IPv6 on LAN
- RA on LAN configured by ubiquiti for me
And I am stuck. So waiting on Ubiquiti at this stage. Suffice to say I don't think they are a threat to FireBrick as this is all pretty simple on a FireBrick.
No word on where they are with FireBricks. Obviously keen to help them test the other way around. To be fair, if this is either a bug in FireBrick in some way, or more likely, something we can work around by changing FireBrick in some way, I am more than happy to do the work to make that happen. We have implemented a number of "pragmatic" aspects to the way the FireBrick works (sometimes on a config setting so as to be "standard" by default) and I'd really like this WiFi kit to work...
I think best if I update this post as we make progress for a bit rather than new posts. Let's get to the bottom of this, shall we?
Updates:
- From comments, it is not just FireBrick, but is some rare combination of things clearly, and seems to be Ubiquiti APs and iPhones and "something" else.
- IPv4 gateway not working was user error, I mistyped as 0.0.0.0/24 for some reason
- Someone from Ubiquiti, in Austin, Texas, in the middle of the night, is working with me on this now.
- IPv6 gateway was not working as I was using the zero address in the /64 which the ER had assumed it can have making it a router on the WAN side, which is unexpected. I changed to the ::1 in the /64.
- Now wifi all on ER not using FireBrick, thanks to guys from Ubiquiti working in middle of the night. Roaming appears to be working, more testing to do. I am being sent a cap of working roaming as seen by ER, and will get same from FireBrick.
- We now have two interchangeable set-ups. Both on same sets of IPs as a separate subnet for my WiFi run as a LAN side of a router. I have the ubiquiti EdgeRouter set up, and the same set up on an FB2700. At present both seem to "just work" but as I say, this can take a whole to see the fault. I have lots of logging. One clue is that I am sure I have seen the iPhone re-do DHCP on roam, and the current testing (on both set-ups) does not do that - it just flips over to new AP basically seamlessly. So, just more testing for now. If both these "just work" we have to go back and see what else on the main LAN could be upsetting things in any way.
- This morning (Saturday), still no apparent roaming issues! This is using a FireBrick but on a separate LAN the same as the ER set-up. Again, if the roaming happens without involving the gateway router, no way the FireBrick can be to blame. If it is OK for a few days I look to swap back to main LAN and see if that shows the problem again.
- Sunday, still using a separate FireBrick as gateway, and have set up the second VLAN that was being used before on it. Still not failing. This makes no sense at all.