It has become a bit of a tradition for us now that, on a cruise with my mates, we spend maybe a day on some serious FireBrick development that is relevant to the cruise. In the past we have greatly improved the PPP stack. Last time we greatly improved the SIP handling of retries and duplicate packets and dropped packets (I had a working VoIP phone on my desk, and even got a junk call). Basically, the internet access on a cruise ship is, for want of a better word, "special". It can have horrid packet loss, and typically has 700ms latency but can be way more (as I type this I am seeing 2s). It creates problems for a variety of networking.
So this time, I have coded a special new feature on the FireBrick to make UDP in to fake TCP. The reason was that at one port (Guatemala) even UDP was being a problem.
Let's just be clear here first, this is not hacking in any computer misuse sense - we have paid for the most expensive internet. Not just the premium "unlimited", but the premium "unlimited" with steaming. That's right, "unlimited" somehow has a limitation of not allowing streaming! It specifically says it allows VPN but they are unable to say which VPN, and out-of-the-box IPsec on my phone or laptop simply does not work (even when it uses UDP). Oddly, unlike previous cruises, simple L2TP was blocked, meaning it had to be mapped to another random UDP port to work. But when even UDP struggled somehow, we considered TCP. Let's get that VPN, for which we have paid, working.
One idea was simply to open a TCP connection and push UDP packets through it. Not hard, but has issues. For a start, the entire latency / throughput has to be buffered on the firewall for that to work. But also, one dropped packets causes a backlog of all streams until received, as TCP is always in order. At 700ms+ that matters. So we wanted UDP behaviour but something that the ship would think is TCP.
Dumb idea (thanks Mike) was simply change protocol tag from UDP to TCP. After all the ports are in the same place at the start for both. Unfortunately even the dumbest NAT will look at the TCP flags for SYN, FIN and RST at the least.
So, less dumb, change to a TCP header with sensible flags (SYN on first, SYN+ACK on first reply, and ACK on the rest). But pack the otherwise lost 12 bytes (TCP uses 20, UDP 8), in to the SEQ, ACK, Window, and Urgent fields in TCP. That way, NAT can play with ports and look at the TCP flags but pass through the same data with no extra overhead. I am quite sure that would work on some NAT. I think FireBrick NAT would pass that with no problem.
However, we found that the ships system has a rather heavy handed NAT that not only changes ports but also sequence numbers (why?!). It also expects valid sequence numbers to be used in SEQ and in ACKs. Simply resending a SYN with a different SEQ or sending an ACK that is before the SEQ of the SYN caused a RST and dropped session. So we had to make the TCP look the part.
The answer was that the first packet was changed to just a SYN (and later updated to a SYN with window scale, just in case it cared). At the start we set a random sequence and store it, and for SYN send the SEQ to that minus 1. Only once we had a SYN+ACK did we consider the session properly started and we could send the UDP packets as TCP, moving on the stored SEQ for each to look like a legit TCP stream. On the other side, for a SYN, we generate a SYN+ACK response, and consider the session started. Any protocol over the top loses the first packet to the SYN, but as it is UDP it will resend, which is exactly what L2TP does. Obviously the MTU was dropped by 12, but we were working on 1280 to get through whatever shit the ship was using between us and dry land anyway. The ACK was then based on highest SEQ+length received regardless and so there is no buffering or resending - that is all handled by the VPN and ultimately individual TCP sessions over the VPN.
End result, after more hours than I would have hoped, it works. It is in FireBrick release1.53.025 Flint+ Alpha, as experimental. Part of firewalling rules allowing a protocol 6 to be set to 17, or 17 to be set to 6. Have fun. Likely to change some time, perhaps with option to try using the 12 bytes in TCP header to avoid extra overhead.