Friday, 31 May 2013

Calling SIP URI

I now have a SIP URI on my business card - it is actually shorter than my phone number by quite a bit :-) I am showing the phone number as well.

It is a bit of an experiment to see if people ever use it, and obviously is a bit geeky.

But it does strike me as progress - it is a way of calling me that does not involve a telco or OFCOM.

That means no Data Retention Directive records for the call at all.

I think we may want to do a wiki page on how to set up your domain and FireBrick to accept SIP URI based calls directly.

At the moment it is a tad messy - a real SIP phone tends to want to call numbers, and via a proxy, so I can get calls but can't yet call people back. I suspect we'll refine this a bit over time to make direct SIP URIs easier to manage.

All good fun.

Thursday, 30 May 2013

Watchdog annoying me

I should not watch watchdog!

They had a worrying article of banks freezing accounts for suspected fraud, for days, weeks, months.

If my bank did that I'd be in the branch asking for my money and if not given it, calling the police and demanding they be arrested for theft! No idea if that would work, but ultimately if someone has my money and will not give it to me that has to be a criminal offence one way or another.

Then they have a case of Oyster cards in the same wallet as contactless debit cards - saying you cannot have non-contactless debit cards. Do that to me and they would get a dispute, via the financial ombudsman, for every single charge they make. Simple as that.

Grrr...

Unfiltered or not unfiltered

We do unfiltered Internet access - simple.

But is it?

We do BCP38 on all end user lines, but reasonably sensibly allowing any source IPs from any of a customer's lines including 2002::/16 equivalent IPv4 in IPv6 addresses, and so on. This is part of the definition of "Internet Access" in our terms. We think it is sane and is obviously "Best Current Practice". You get Internet Access for (and from) your IP address(es).
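As a rough illustration of that source check (a sketch only, not how our kit actually does it), the logic is "is the source address within any prefix routed to this customer, or the 2002::/16 equivalent of one of their IPv4 prefixes". The function names and the example prefixes here are made up for illustration.

```python
import ipaddress


def sixtofour_prefix(v4_net):
    """Map an IPv4Network to its 6to4 equivalent inside 2002::/16."""
    # 6to4 puts the 32 bit IPv4 address straight after the 16 bit 2002 prefix.
    base = (0x2002 << 112) | (int(v4_net.network_address) << 80)
    return ipaddress.IPv6Network((base, 16 + v4_net.prefixlen))


def source_allowed(src, customer_prefixes):
    """BCP38 style check: accept only sources the customer legitimately holds."""
    addr = ipaddress.ip_address(src)
    allowed = []
    for prefix in customer_prefixes:
        net = ipaddress.ip_network(prefix)
        allowed.append(net)
        if net.version == 4:
            # Also allow the 6to4 addresses derived from this IPv4 prefix.
            allowed.append(sixtofour_prefix(net))
    return any(net.version == addr.version and addr in net for net in allowed)
```

So a customer holding 192.0.2.0/24 could send from 192.0.2.7 and from 2002:c000:207::1, but not from 198.51.100.1.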

But what of traffic from the Internet to end users? Until now this has been totally unfiltered. This includes spoofed source IP traffic such as from RFC1918 addresses, and unrouted blocks, and traffic used for amplification attacks, TCP SYN flood attacks and so on.

We are, today, putting in place a simple additional source IP check on ingress from transit. It is simply that the source is routable. This covers a lot of duff traffic, e.g. RFC1918 source addresses.

We think this is sensible and a simple step to avoid some of the spoofed attacks. But we offer "unfiltered Internet". Is it right that we filter this? Do customers have a right to spoofed source traffic from the Internet to their line?

I am genuinely interested in feedback on this...

There is then a further, operational aspect. We have seen significant attacks since Saturday with TCP port 80 SYN floods with a variety of valid and invalid TCP options which appear to be designed to crash TCP stacks. They are working on lots of our customers, crashing very old ZyXEL P660 routers where we have external access to administer the router. This is not a password attack, but an attack on the router's TCP stack, and it takes customers off line totally, sometimes for hours.

The source is a fixed single IP - possibly the real source or possibly the target of some reflection attack. So, with our new filtering we could black hole that /32 address, which will now block incoming packets from that address. We would use this specifically to mitigate attacks on customers during the attacks or the overall period that attacks are happening (i.e. may be several days).
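To make the idea concrete, here is a minimal sketch of a "source must be routable" check where black holing an address simply makes it unroutable for the purpose of the check. The class, its method names and the list of routed prefixes are all hypothetical; a real router would use its routing table rather than a Python list.

```python
import ipaddress


def _covers(net, addr):
    """True if addr is inside net (and is the same IP version)."""
    return net.version == addr.version and addr in net


class IngressFilter:
    def __init__(self, routed_prefixes):
        # Stand-in for the routing table: prefixes we consider routable.
        self.routed = [ipaddress.ip_network(p) for p in routed_prefixes]
        # Temporary administrative black holes, with the reason recorded
        # so that it can be published.
        self.black_holed = {}

    def black_hole(self, prefix, reason):
        self.black_holed[ipaddress.ip_network(prefix)] = reason

    def clear(self, prefix):
        self.black_holed.pop(ipaddress.ip_network(prefix), None)

    def accept(self, src):
        addr = ipaddress.ip_address(src)
        if any(_covers(net, addr) for net in self.black_holed):
            return False  # source is black holed for now
        return any(_covers(net, addr) for net in self.routed)
```

Keeping the reason alongside each black hole is there because of the transparency point: if something is blocked, we want to be able to say why.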

But that is not unfiltered Internet is it?

It is a specific administrative and short term filter, but it is a filter?

So is that valid? Should we do it? We are talking about only specific cases for attack conditions. It is, by no means, a general means to filter stuff, and very very much not a URL filter.

Comments please?

P.S. Whatever we do - we aim to be completely transparent - if we do block an IP for an attack we will say why and for how long and so on...

Update: In light of some feedback I have disabled this source check. At least we have the feature now if we need to do something in an emergency to protect the network as a whole, i.e. during an attack.

Saturday, 25 May 2013

BACS/RTI hash

All employers that pay using BACS with their own service user number (SUN) have to include a hash in the data they send to HMRC RTI.

But what does this check? It checks no more than the submitter has access to BACS. The hash covers all sorts of data (sort codes, amount, etc) but all HMRC see is the hash and the amount (as confirmed by an FoI request). They check nothing more. They explicitly do not see any bank details!

To pass the HMRC tests, if you don't have a BACS SUN you do nothing, but if you do, all you have to do is make a payment matching HMRC's expectation and create the hash to submit. The payment can be to anyone (even yourself, which is perfectly valid as a BACS payment). The real payments to staff do not have to flag as payroll payments and so do not get seen by HMRC.

So, if you have any issues managing the hash and the payments you can fudge it making payments to yourself. Quite separately you can pay staff which may or may not match what and when HMRC expect. As long as the hash matches the RTI submission is valid.

The HMRC system for RTI makes it a nuisance to make adjustments at the last minute or retrospectively or even to commit some frauds. Well, it would, if the checking was not so trivially thwarted and pointless.

Anyone wanting to play the system for any reason, fraudulent or not, can do so. The only people actually caught out by this are those poor saps trying and failing to meet their requirements. Real fraudsters have no problem as it is simple for them to pass the tests for the hash.

So, how much has this crazy system cost? How much do HMRC pay VOCA? How much have BACS s/w changes cost? Payroll systems changes? Payroll bureaus? I am submitting another FoI for that.

And what of companies, like us, that are inconvenienced by the changes even if they do not pay staff by BACS (we happen to do so)? Lloydslink pulled their BACS system as not HMRC RTI compliant - costing us a small fortune to re-work it all. That is a cost for this mad system and would apply to us even if we did not pay staff by BACS because we used Lloydslink for DD collections and were forced to change.

Why make a system that is so easily thwarted? Why make a system that is hard to code and comply with? Why make a system that costs lots of people money? Why?

SIP and NAT

NAT is evil. SIP and NAT do not mix.

But we have been testing - we put half a dozen different makes of phone on a simple NAT which just does the port mapping dynamically, nothing else, and which was running a 5 minute timeout of UDP.

First issue is detecting NAT. That is easier than it sounds - as SIP control messages all have a Via header saying what IP they are supposed to be from. If that does not agree with the IP it actually comes from, then some mapping has happened. The reply actually has details of the IP and port from which we received, so the requesting device can also tell if NAT was used if it wants.

So, we detect NAT on a REGISTER, and record that. We log not only the stated Contact to which to send the incoming calls, but also the IP and port from which the REGISTER came which we will use instead. That way incoming calls work.
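A minimal sketch of that detection, assuming the Via header holds a literal IP (it can legitimately be a host name, which this does not handle). The function names and the dictionary layout are made up for illustration and are not the FireBrick code.

```python
def via_sent_by(via_value):
    """Extract (host, port) from a Via value such as
    'SIP/2.0/UDP 192.168.1.10:5060;branch=z9hG4bK776'."""
    sent_by = via_value.split(None, 1)[1].split(";", 1)[0].strip()
    if sent_by.startswith("["):
        # IPv6 literal in square brackets, optionally followed by :port
        host, _, rest = sent_by[1:].partition("]")
        port = rest.lstrip(":")
    else:
        host, _, port = sent_by.partition(":")
    return host, int(port) if port else 5060


def behind_nat(via_value, source_ip, source_port):
    """NAT is assumed if the packet did not come from where Via says."""
    return via_sent_by(via_value) != (source_ip, source_port)


def register_target(contact, via_value, source_ip, source_port):
    """Decide where incoming calls for this registration should be sent."""
    nat = behind_nat(via_value, source_ip, source_port)
    return {
        "nat": nat,
        "contact": contact,
        # With NAT, use the address the REGISTER actually came from.
        "send_to": (source_ip, source_port) if nat else contact,
    }
```

For example, a REGISTER whose Via says 192.168.1.10:5060 but which arrives from 203.0.113.5:40312 is flagged as NAT, and incoming calls go to 203.0.113.5:40312 rather than the stated Contact.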

We can do the same when we get an outgoing call, an INVITE. This means we can tell that the call leg is NAT either way.

The next trick is the RTP - the actual audio stream. If we know the call is using NAT, we wait for incoming packets and send out packets back to the IP and port from which it came. This assumes, as seems to normally be the case, that devices send from the port/IP to which they want audio to be sent. It is an assumption, but for the range of phones we tested it works.
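The RTP side of this can be sketched as follows: for a NAT call leg we ignore the address the phone signalled and instead send to wherever its audio last came from. The class and method names are hypothetical, and a real implementation would want more care over which packets it trusts.

```python
class RtpLeg:
    def __init__(self, signalled, nat):
        self.signalled = signalled  # (ip, port) the device asked for in SDP
        self.nat = nat              # True if the signalling showed NAT
        self.latched = None         # where audio has actually come from

    def packet_received(self, source):
        """Call for each incoming RTP packet with its (ip, port) source."""
        if self.nat:
            self.latched = source

    def destination(self):
        """Where to send audio, or None if we must wait for a packet first."""
        if not self.nat:
            return self.signalled
        return self.latched
```

Until the first packet arrives on a NAT leg there is nowhere to send audio, which is the "wait for incoming packets" part.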

The final trick, knowing a REGISTER was NAT, is to send a dummy packet every minute to keep the NAT alive. The RFC recommendation for UDP timeouts is a minimum of 2 minutes for a high numbered port like 5060. Some phones, on detecting NAT, also do this, or send an OPTIONS message.

Of course, there is a lot that can go wrong. NAT is always a pain in the arse.
  • The phone could detect it is behind NAT and decide to try and work around it in some way, messing things up
  • The phone could send audio from IP/port that is not those to which it expected to get audio, messing up our assumptions
  • The NAT gateway could try to do some ALG rewrite on the SIP, and not get it right, which seems to happen on some devices
  • The NAT gateway could have a low timeout of less than a minute, dropping the registration session
  • The REGISTER Contact could be some other IP and port than those from which the REGISTER is sent, which would be odd, but is valid.
There is another case, which seems to be what the Technicolor router does: a proper ALG that manages to handle the SIP and RTP well enough that neither side actually knows there is any NAT.

So, Home::1 customer with a Technicolor can use a VoIP phone without any real problems, or so it seems. We have one set up and it seems to work rather well! Of course, if you can find an IPv6 SIP phone (and we have a couple) then that can work without NAT.

It is, as ever, a huge load of work around that is not needed when things follow the basic design principles of IP, something that is easy for IPv6 but not so easy with a lack of IPv4 addresses.

And no, I have no idea how well any of this would work via CGN.

Unexpected night shift

Well, I ended up working over night, so it is nice that people on irc are saying "what action?" when someone said they slept through all the action.

In practice customers will have seen no more of a problem than a PPP restart or two during the early hours, but for some of us it was quite a busy night.

We did one thing wrong, over 4 years ago. It seems some code that was initially just a bit of a test, was then incorporated in to our TCP stack. The code did not have the same attention to detail that we normally expect. That was the mistake. Sadly mistakes can happen but we are going through the code again now to see if any other mistakes exist like this. I won't post the exact details here yet as there are a few people that still need to do a planned upgrade. A mistake like this would normally be picked up when it is written and tested, but once an alpha is put on to the live Internet you expect any issues to be picked up pretty quickly. It is one of the reasons we have extensive pre-release testing.

At around 00:36 there was a pretty concerted scan/attack that exploited this specific mistake. We really can't imagine someone targeted FireBrick kit specifically - if they did, then that is a sign we have got big enough to be a target, and I really doubt that. That means there is other kit out there with the same mistake. Interestingly, during the attack, which lasted until around 6am, we saw several unexplained "blips" which appear to be within BT's network, so maybe there is other kit that had issues as well or maybe BT were just doing some work. I'll be interested to hear if other ISPs had problems as the attack did not seem to be just A&A IPs. Even though the bug has been there for over 4 years, this is the first time we have seen the specific (invalid) packets that cause these problems.

So that was the mistake, what did we do right?

The FireBrick code is pretty defensive. For a start, the bug was picked up by the software watchdog. Had it not been, the hardware watchdog would have picked it up.

The watchdog caused the FireBrick to restart. Unlike many devices that take minutes to restart a FireBrick is back as soon as the ports negotiate, which is a couple of seconds.

The watchdog/restart causes the FireBrick to email the support team automatically. This is the default (you can turn it off). We are all used to network appliances freezing, locking up, crashing, and needing rebooting - but not FireBricks. If they crash we get an email and we look in to it as a matter of course. Even with hundreds of FireBricks out there we don't get many emails, and those we do are usually from people choosing to run test/alpha code. But the email includes details of what exactly happened, and normally allows us to pinpoint the problem within seconds.

So, at 00:36 we start getting crash report emails. We ended up with over 200 from FireBricks all over the country on different ISPs and networks. This is where we end up working all night!

We quickly identified that there was clearly a problem, and the crash logs pinpointed the cause. We had just released a beta version that was being tested ready to be made a factory release, but the crash logs confirmed that it was not the cause. All versions of code out there were crashing, and all current makes of FireBrick. We found the cause in the TCP code, made the change, with two people reviewing the change carefully even though it was only one line, and made a new release which was tested and issued by 04:11.

Given the severity of this issue and the fact that the attack was on-going, I made the decision to release this as a factory issue immediately. We had been running the current code without this patch on most A&A routers for a while so this seemed a safe bet. We upgraded all of the A&A LNSs and routers (over 20 FireBricks), and by 5am we had everything stable. Because of the factory release, any FB2500 or FB2700 that crashed would immediately pick up the new software and be fixed. Over the next 24 hours all FB2500 and FB2700's would update themselves (unless specifically disabled in the config).

We spent the next hour checking the operation of all of the upgraded FireBricks in A&A, all seemed well, but by 06:30 we realised that just one of the A&A boxes - the main office firewall, was not quite right. It was working, but it had an issue with its Firewall session table. This was one of the recent changes added before the beta release, and is exactly the sort of thing we expected to pick up by having a beta release. The TCP issue had meant we promoted that to a factory release a bit too quickly. A new release was issued for FB2500 and FB2700 (the firewalling models) by 06:44.

Obviously the FireBrick announcement mailing list has been sent full details.

We are monitoring the situation now, obviously.

Whilst we did make a mistake, 4 years ago, I think we managed to get a lot of other things right over this and show that we can react very quickly when there is an issue, even at silly O'Clock in the morning.

I'll be on irc most of the day if anyone has any questions.

Wednesday, 22 May 2013

Another SIP challenge

I am finally sorting the new A&A voice server based on FireBrick SIP. The FireBrick work has been to create a reliable platform for high capacity call routing and this stage is using that to provide an actual service. It is a bit incestuous as the "real world usage" of the FireBrick provides a lot of feedback in to the FireBrick code design, but the end result is a good A&A service and a good FireBrick VoIP platform.

A bit of history...

First we used asterisk, and we did a lot of custom back end scripts and database work to provide the features of our voice service. This worked well but does not scale well. Asterisk is an open source (ish) PABX, and ideal for a small office, but it does not scale to the general voice service that A&A offer. These days I would, of course, suggest a FireBrick based VoIP PABX instead.

Then we did my own SIP platform from scratch on linux. This was based on the RFCs, but SIP RFCs are hard work, and the code has got a bit worn out now, but it worked well. The nice bit is that SIP allows a pool of registration servers, call routing boxes, and so on - all very scalable. Sadly it turns out that most phones refuse to take calls from a different IP than they registered to. This thwarted that plan and multiple servers collapsed to one. The scaling plan is to put numbers on specific servers. It works well, but does not handle NAT connected endpoints, and there are still issues with firewalls on some kit.

The new FireBrick system is good (it better be, I wrote it). It is a completely new implementation based on a second look at the RFCs. Some key decisions based on the experience of the linux based solution and, importantly, the way devices work, have meant we operate in a specific way. We are no longer working as a relaying proxy but as an endpoint, i.e. devices make calls to the FireBrick and the FireBrick makes calls to devices, and it joins them together at the audio level. This solves many problems, and also means the audio is always to/from the same IP at the FireBrick end which helps firewalls and may allow some degree of NAT handling.

A key feature we are using is that FireBrick SIP works with IPv6 perfectly, as you would expect.

It also avoids all sorts of issues with switching the audio feed. Previously we passed on all SDP renegotiations end to end, but this caused issues as it meant changes at the RTP level in terms of sequence and source ID, and timing, as well as IP and port and some race conditions. This sometimes broke things. The FireBrick generates the audio stream with one source ID and sequence regardless, handling changes, call transfer, re-invites, and even tone generation seamlessly as one output stream. We should not have to go to such extremes but doing so creates the most reliable calls. It also means, importantly, that we can see a loss of media as we are always in the media path allowing calls to close when no BYE is received (kit reboots, stuff disconnects, etc) and so not incorrectly billing calls.

The FireBrick also does recording. It allows a call leg to be tee'd off to a separate SIP connection which can be a simple SIP endpoint (like asterisk), but if the endpoint claims to handle stereo a-law then the tee'd off call puts each side of the call on a separate channel of a stereo call. We have a simple linux endpoint that does that and emails the calls, and we are basing our A&A call recording on that.

Most of the work on the A&A side is RADIUS server based. The FireBrick allows all calls, registrations, and so on, to be validated by RADIUS. This allows a pool of servers to handle call routing based on our database back end, and to handle the call logging and billing.

One of the key things, on re-reading the RFCs, is a new way to handle scaling of the service. The RFCs describe a really useful concept of a redirect server. The idea is that registrations and calls go to that server which does not really do calls at all - it just responds with 302 redirect messages telling the caller where to connect to. This means we can share the customers between servers, and take servers out for maintenance and so on.
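To show what a redirect server has to do, here is a minimal sketch that picks a call server for a user and builds the 302 reply. It is heavily simplified (a real reply needs a To tag and has to cope with multiple Via headers), and the function names, server names and hashing scheme are all made up for illustration.

```python
import hashlib


def pick_server(user, servers, out_of_service=()):
    """Spread users over the live servers, skipping any under maintenance."""
    live = [s for s in servers if s not in out_of_service]
    if not live:
        raise RuntimeError("no call servers available")
    digest = hashlib.sha256(user.encode()).digest()
    return live[int.from_bytes(digest[:4], "big") % len(live)]


def redirect_response(request_headers, target_uri):
    """Build a SIP 302 telling the caller to try target_uri instead."""
    lines = ["SIP/2.0 302 Moved Temporarily"]
    for name in ("Via", "From", "To", "Call-ID", "CSeq"):
        # These are copied from the request so the caller can match the reply.
        lines.append(f"{name}: {request_headers[name]}")
    lines.append(f"Contact: <{target_uri}>")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"
```

Taking a server out for maintenance is then just a matter of adding it to out_of_service, and the next request from each affected user gets redirected elsewhere.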

Sadly we have, again, been thwarted. Whilst the SNOM phones we tested have no issue with this plan it seems a lot of SIP devices get confused and assume the 302 response is an error and give up, or pop up a "retry" box. The carriers that handle our inbound calls also don't like 302's. So yet again a key part of SIP design that would allow elegant scaling and redundancy is screwed. Why do I even bother? FFS.

So now we are testing using DNS changes to manage pools of registration servers. At least we can have pools of RADIUS servers, and call recording servers and so on as they are no longer in the SIP proxy path.

But we are having fun doing SIP/NAT testing. We want to see if the FireBrick code can work where there is NAT. We have done some testing today using the Technicolor routers we ship as standard on Home::1 lines. To our surprise the ALG in the Technicolor is not bad, and means our end does not see anything odd or NATty. It kind of "just works" which was a slight surprise. We have tested against SNOM and Gigaset kit so far.

We'll be testing with non-ALG NAT devices soon to see if we can make them work. When we are happy with all of the testing we can move customers over to the new service which will be more scalable and reliable.

Regression testing

Thursday, 16 May 2013

Monopoly collection

It is slightly odd to have more than one monopoly but I seem to be making quite a collection now, and somewhat out of the blue Mikey has just got me an original John Waddington Ltd (pat. app. for No. 3796-36) 1930's monopoly board, in original box with original tokens and dice and instructions.


I am impressed! Thanks Mikey.

The rules are very slightly different to the modern game, but not fundamentally. They are, if anything, slightly clearer. They don't, of course, have any of the silly variants like "fines collected and paid on landing on free parking", or "must do one turn of the board before buying property" which are all too common.

Interestingly they do have a separate little "Q&A" for the rules which clarifies a couple of points I did not know. You can sell (to the bank) the houses or hotel on one site without selling the other houses on other sites in the group - the "even building" rule only applies to building not selling! You also cannot sell the logical 5th house that is a hotel leaving 4 houses left - you have to sell the hotel and separately buy houses at full price, having only got half price back for the hotel (half of cost of 5 houses).

Oh, and unlike the British Bletchley Park set it does, of course, use POUNDS.

Very cool!

Saturday, 11 May 2013

One day, technology will "just work"

I have to admit it is getting simpler, but sometimes technology does not work so well.

James has been watching the F1 Qualifying on iPlayer in Sweden, as you do, and iMessaged to say it was worth watching. Sandra still has no idea how to get such things on TV, so asks me. How to make me look like an idiot...

So...
  1. Sky box, on demand, catch up, sky sports: "The F1 show". No sign of qualifying. Grr
  2. Sky box, on demand, iplayer, BBC one, today: Some breakfast show. No sign of F1. Grr
  3. TV, iplayer, channels, BBC one, today: F1 Qualifying. Yay! "Sorry there is a problem". WTF?
  4. iPad mini, iplayer, bbc one, today: F1 Qualifying. Yay! Plays! Woot! Small screen though.
  5. Finally: Air play to AppleTV to play on real TV
Sorted. And almost simple... :-)

Just start to watch and get a call, someone needs a lift, so on pause for next half hour. Grrr. If only families were as simple as technology...

Friday, 10 May 2013

New VoIP platform

We have an existing VoIP platform which has various features and which has evolved over a few years. It has registered SIP phones, relaying to configured SIP endpoints, multiple "also ring" targets with delays and all sorts. Works well. I coded it all on linux from scratch.

We are working on moving to a new platform, whilst retaining the features of the existing system. This is going to be fun. I blogged on code wearing out before - well this code is a tad dusty to say the least.

The new system uses the FireBrick VoIP platform, which, from a VoIP point of view, is completely new code. It is a lot better than my previous linux code. It works well and we have been using it for what, a year, in the office.

The newest code works by using RADIUS to make call routing decisions. This is really good from a redundancy and scalability basis as it allows the call server(s) to talk to multiple RADIUS servers to find the answer. If there is a problem with one then within hundreds of milliseconds they are trying another. There is even an RFC covering the SIP based authentication logic (http auth).
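The failover side of that is simple enough to sketch. This assumes a query function that sends the request to one server and raises on timeout; that function, the server names and the 0.3 second timeout are placeholders, not a real RADIUS client or the FireBrick's actual timings.

```python
def ask_with_failover(servers, query, timeout=0.3):
    """Try each RADIUS server in turn, moving on quickly if one is silent.

    query(server, timeout) is a caller supplied function that returns the
    reply, or raises TimeoutError/OSError if the server does not answer.
    """
    last_error = None
    for server in servers:
        try:
            return query(server, timeout)
        except OSError as exc:  # TimeoutError is a subclass of OSError
            last_error = exc
    raise RuntimeError("no RADIUS server answered") from last_error
```

With a sub-second timeout per server, a dead RADIUS server costs a call a fraction of a second rather than failing it.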

The FireBrick code is pretty much ready, but I expect changes when I am using it "in anger" on the new system. The plan is to mimic the current operation of "speechless" by providing suitable RADIUS replies to the new "voiceless" system. As I type I am already pondering a few minor tweaks.

Once I have this code sorted over the next few days, we can get triallists on board. Even so, we are missing a key feature which is call recording, so that is the next job on the FireBrick side. I am very keen to keep something of a unique feature in that we make stereo a-law call recordings.

There are a lot of advantages to moving to a new system. The separation of control and authentication logic from the actual call routing will help reliability and scalability. The new system should easily allow multiple servers at the VoIP level, RADIUS level, and even for call recording and voicemail. The newer FireBrick SIP code is nicer and we are hoping we can even make it work with some styles of NAT (yuck).

The challenge is that the current "speechless" code integrates all of the VoIP/SIP and routing decisions. I have to understand my code, and make new code for my RADIUS servers. It is not too bad, but the code is a tad old and grey, so will be a fun challenge making it new and shiny.

Nose back to the grindstone over the weekend. That is, assuming the whole idea of a VoIP service is not screwed up by OFCOM.

Banks bouncing shit

How hard is it really?

If you are going to bounce something because it is over limit, bounce it. Or don't.

Don't fucking show it over limit on the on-line banking - let me transfer money from another account to cover it - and then, later, still bounce the original damn payment leaving the account in credit but payments bounced.

Arrrrrg!

Back to finding banks with clue...

Bank with clue?

I am looking for a bank that can provide me with any sort of real time electronic notification of incoming "faster payments".

Basically, I'm trying to find a way to make it easy for people to pay us for things using FP without the current manual checking process with our on-line banking.

I am not getting anywhere with google. So anyone got any clues?

I don't mind how it notifies: email, text (to an 01/02 number), SOAP/XML, or what, as long as we can, in some sort of real time, extract the details of incoming payments as they happen.

In an ideal world it would be SOAP/XML such that we can even provide the response message or even reject an invalid payment.

I have various business ideas that could make good use of this, including pre-pay data SIMs and so on.

Thursday, 9 May 2013

Future of voice calls?

Having made some suggestions of how OFCOM could tackle number allocation issues I do wonder where things have to end up on voice calls in the long run.

I am seeing a massive reduction in voice calls myself. People call to chat and gossip, but that is moving to facebook. People talk to businesses, but that is moving to web sites. Businesses talk to businesses, but that is moving to B2B systems and echat and so on. I am hard pushed to find a reason to talk to someone on the phone myself. Sometimes I want a chat with a friend or relative which is more face to face but I cannot be there - so I FaceTime them - we used FaceTime to show my parents their new great grandson. Most voice calls I have these days are junk calls ringing me and getting abuse.

So what has to happen to voice - long term?

In my opinion the long term for voice calls has to be that it becomes just another IP based protocol. We already use complex protocols for web pages, email, and much else. Voice can work over IP (VoIP/SIP). What it means is that a physical phone line becomes just a very restricted type of Internet access that only talks voice. The numbering becomes just a matter of DNS and domain names - something you pay for as a way of indexing your contact endpoint just like getting a co.uk domain for a web site.

It does not mean it becomes "free". Like any protocol over IP there are some costs, but many Internet access packages work with a large amount included for a fixed fee. To be honest "voice" is no longer the bandwidth hog (that's video, iPlayer, and so on). I send larger emails than the data in phone calls I make, without a thought, or a bill. As I record all my calls, I also send emails for each call I make/receive as well - it is good that emails don't have similar interconnect settlement fees!

Eventually, the idea of a physical landline that only talks voice protocol to the Internet, and not a general IP access, will seem strange and unnecessary, like faxes and telex. I wonder how long it will take.

The problem is, of course, a huge industry all over the world with a vested interest in considering "voice" to be a special protocol with special regulation, special number allocation controls, special interconnects and lots of money. To telcos the standard idea of "settlement free interconnect points" that are common in Internet terms (like LINX) is an abomination. They will not want this. But ultimately it will happen without them - we see that with iMessage and FaceTime now, replacing traditional SMS and calls and bypassing the telco monopolies.

They move with the times or they are left behind.

How OFCOM should handle UK number allocation

There is an issue with how they manage numbers, and areas running out. How would I solve it?

I think the only long term answer, which also addresses the "porting" nightmare, is to have a central per-number database. This means when any call is made by any telco they have to check the database to find where they send that call. The destination could still be an old fashioned SS7 "point code" or whatever. The mechanism can be DNS, and have multiple servers and caches and even private intranet as used by mobile operators for things like GTP. DNS is such a well proven system that works for the Internet and works for a lot more DNS lookups per second than telephone calls per second.

A central database has several advantages:-
  • Allocation on per number basis removes any wastage
  • Porting is simply a database update and does not involve original telco in call routing
  • The database can have a few critical extras such as emergency services contact data and TPS/FPS flag
  • Numbers can be charged for per number not per block, which manages number hogging, but scales costs to customers so small telcos are not disadvantaged.
There are, however, problems with this idea - the main one is "how do you get there from here", and also "the existing telcos cannot cope". It may also mean that there are "golden" numbers that cost more, sadly.

So, here is the idea of how you get there from here:-
  1. Define a simple number allocation API and DNS style lookup - try avoiding having a huge committee for this - it can be standard enum DNS, and simple SOAP/XML API.
  2. Get a company to run the database as a pilot - there are a lot of companies that could do this. It is not rocket science.
  3. Allocate a block, as a pilot - perhaps in a conservation area, but maybe in 03 or some such.
  4. Data fill the block conventionally with a point code of a company that can handle the DNS lookups and forward correctly to the real endpoint (maybe same company running database)
  5. Define an extra high interconnect cost for routing the block via that legacy gateway
  6. Make it that legally the numbers are per number designated to the telco even though one telco is handling the legacy routing for the block
  7. Make it that legally the database is run on behalf of OFCOM and owned by OFCOM, just subcontracted
  8. Don't make it expensive for telcos, even small telcos, to access API and DNS - ideally that should be free. Just charge per number allocated.
This means telcos that do nothing will simply data fill and route conventionally, but be paying extra per call. That gives them an incentive to do something longer term. Companies that upgrade to handle the DNS style lookups can route directly and pay normal (low) interconnect rates. Any telco can get numbering that will work, on a per-number basis, and without the usual long delay for data fill (though they will have to data fill per number themselves for incoming calls).
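For the "standard enum DNS" part of step 1, the lookup name is just the digits of the number reversed and dotted under a suffix. A minimal sketch follows, with a dictionary standing in for the central database; the class, its methods, the telco names and the example number are all illustrative, and the real lookup would be a DNS query for NAPTR records rather than a Python dict.

```python
def enum_domain(e164_number, suffix="e164.arpa"):
    """Turn a number like '+44 1632 960123' into its ENUM DNS name."""
    digits = [c for c in e164_number if c.isdigit()]
    return ".".join(reversed(digits)) + "." + suffix


class NumberDatabase:
    """Stand-in for the central per-number database."""

    def __init__(self):
        self.records = {}  # ENUM name -> (telco, destination)

    def allocate(self, number, telco, destination):
        self.records[enum_domain(number)] = (telco, destination)

    def port(self, number, new_telco, new_destination):
        # Porting is just an update; the original telco is not involved.
        self.records[enum_domain(number)] = (new_telco, new_destination)

    def lookup(self, number):
        return self.records.get(enum_domain(number))
```

So +44 1632 960123 becomes 3.2.1.0.6.9.2.3.6.1.4.4.e164.arpa, and porting that number changes one record rather than leaving the original telco in the call path.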

Run pilot in stages:-
  • One block for testing everything
  • All conservation areas for all remaining blocks in those areas
  • All areas for all new blocks
  • Changing some existing blocks over to the new system
  • Changing mobile operators to same system, so same porting rules
There are a number of small telcos with a serious interest in making this happen and who would provide consultancy, specifications and test systems for OFCOM for a very reasonable rate, or even free.

Recruitment agencies

I get loads of email from recruitment companies trying to get me to hire people. One particularly annoying bunch are Huxley Associates.

So I sent them a reply stating that they are not to send any more unsolicited marketing emails, as per section 22(3)(c) of The Privacy and Electronic Communications (EC Directive) Regulations 2003. They even replied to that email asking if I had previously been in touch about this, thus proving they had received it.

A couple of weeks later, and another junk email from them. So I am invoicing them for damages. Only £5 for now. It is more of the principle of the thing in this case. I have made it very clear that we'll pursue this through the county courts.

I am in the middle of invoicing them when I get a junk call from another recruitment company, think it was "ITT" or something, who had no clue they were meant to check with the TPS before making junk calls, and then refused to give me their details so I can sue them...

Ironically, this story does not really fit the definition of the word "irony", even though many would think it should.

I'll let you know if we get the £5 :-)

Wednesday, 8 May 2013

OFCOM destroying competition

At the moment there is a lot of choice for VoIP telephone services in the UK. Many small players adding value to simple telephony. There is competition in the market, which is good for everyone, and (I believe) something OFCOM are meant to encourage.

To give you an idea, here is what you needed to set up as a VoIP operator providing calls and geographic numbering:-
  • Technical and business experience - that is perhaps "the tricky bit"
  • A VoIP server, maybe £1000 for linux based box, and a few hundred a month to have it hosted in a rack on the Internet.
  • Blocks of telephone numbers, £0. Yes that is £0.
  • Hosting blocks of numbers with a larger carrier, £0.
  • Incoming call routing for those numbers to your VoIP server, £0.
  • Outgoing call routing to the PSTN via larger carriers, £0 (plus call costs).
  • Joining an ADR scheme (OFCOM requirement), few hundred a year
As you can see, whilst you do have to know what you are doing, and be able to afford hosting of boxes on the Internet, it is actually very cheap to start up as a VoIP provider. Carriers will usually host number blocks free as they make the interconnect revenue for incoming calls.

Small operators give a lot of choice and add a lot of value. At A&A we have things like Centrex short code dialling, ringing multiple phones at once (e.g. mobile and desk phone), hunt groups, call divert and transfer, live billing details for calls, and even call recording. Until recently we even integrated mobiles in the service. There is a rich variety of choice of different operators and features and prices.

There are two changes OFCOM have instigated. Remember, OFCOM are meant to be acting in your interests and encouraging competition.
  • Charging for numbering. Whilst a pilot this year, if rolled out to just geographic numbering this is looking like £65,000 a year for the smallest allocation in each area (650 areas of 1,000 numbers). A huge entry price to the market.
  • Reducing interconnect fees on geographic numbering calls. This is making the hosting-for-free model less viable for carriers and looks like we may have to start paying for incoming calls.
The numbering thing is huge. It makes any small VoIP provider's business model break badly. It is a charge for something we already have, not just new blocks or new telcos. We are trying to work out what to do. We may have to give the blocks to a larger carrier as the larger carriers with more paying customers may find it viable - but these blocks are no longer an asset we could sell, but a liability we are trying to avoid. So this may not be possible and we may have to hand them back to OFCOM. That will stop service for all customers in those blocks even if ported out to another provider already. Clearly this is not in the consumer's interests.

The interconnect pricing issue is also pretty huge. If we had to pay for incoming calls, we could not really sell a service. With the numbering charges, we'd just need a heck of a lot more paying customers, but charging for incoming calls is not going to be something our customers would tolerate, or something that is easier with more customers.

We have seen a model where customers have to make enough outgoing (chargeable) calls to balance the incoming calls and hence not pay for incoming calls. This may work, but it ruins competition too. Instead of us, or a customer choosing a carrier for outgoing calls (on a per call basis even) based on price, reliability, quality, etc, they will find they are tied in to a carrier that provides the incoming calls to balance the calls out. This breaks competition and ultimately means higher call prices and locked in contracts.

When speaking to OFCOM they seemed to treat numbering like radio spectrum, as a limited resource that they should use charging as a means to control. Of course, with radio spectrum, you don't suddenly get a huge bill for spectrum you already have allocated, but importantly you can't just add an extra digit to radio spectrum and have lots more, like you can with numbers.

I have pondered how I think they should do this. So here are some of my musings on this. It is a bit late, as OFCOM seem uninterested in these ideas and determined to ruin the market...
  • The interconnect rates have to be at a level that a carrier can host numbers and pass on inbound calls for free. Any lower and you have a serious issue. That may be possible at lower rates than previously, but they may have gone too far. Why are they not allowing the free market to set rates anyway?
  • Numbers could be allocated in smaller blocks. Apparently the larger telcos can't handle this. So, simple, make anyone that can't handle smaller blocks pay a penalty until they can. Don't make the small telcos that would be happy to have smaller blocks and not hog lots of numbers pay extra. If we could get numbering in 100 number blocks the bill at current prices would be nearer £6,500 a year which is expensive compared to £0, but more of a viable model.
  • Ideally they should allocate blocks with a DNS style (possibly actually DNS like enum) routing to a point code, and an API to allocate (and port) numbers on a per number basis. We'd be happy to design such a system for them! Charging for numbers would not be ideal, but charging only for actual in-use numbers would be a viable model for most people. This would also solve the serious issue with porting that it relies on the original number block owner - porting would be a change in the OFCOM maintained numbering database, that is all.
  • Make more numbers. Simply do number changes in the congested areas, as was done in London. Even go for longer numbers. It takes time, and is some disruption, but is how it was always done by the post office and BT.
  • Allow variable length numbers as used in many other countries. This solves a hell of a lot of the problems with number allocations.
The variable length numbers thing is quite nice. I have seen cases where a town would have, say, 6 digit residential numbers, but large companies have 4 digit "main number", which is nice, but 7 or 8 digit DDI numbers within the company. Areas with a high demand for numbers would simply have longer numbers. It would allow areas to retain their familiar area codes even, and still allow local dialling.

Anyway, over the next few months we should be able to find out the position taken by each of the carriers with which we work, and understand whether we can make any sort of business model to continue providing VoIP. I suspect that, even in a worst case scenario, we'll still offer some VoIP services - but possibly with incoming charges or "balanced usage" terms, and possibly using carriers' numbering and not our own. I really hope we don't have to hand back our numbering and kill off the numbers in use by our customers. Talking to other small telcos this seems like it may be happening a lot. So, goodbye competition in the UK. Well done OFCOM.

Sunday, 5 May 2013

Terrorism

When will people get the clue that terrorists are just criminals like anyone else? They need action taking against them, but there is no reason for it to be out of proportion to what they do. They are a lower risk of hurting people than car drivers by orders of magnitude.

I have no idea if this story is true (see image). But the concern is that stories like this, and similar stupid stories of abuse of terrorism laws, are serving to change the way people behave in their daily life. People are scared of speaking their mind. The fact that other laws like the Telecoms Act were used to attack a tweeter is all part of this crap.

The fact that there are groups of people scaring the population in to changing the way they live, creating fear and intimidation, means there are terrorists - people creating terror. Sadly it is the governments and law makers and police that are causing this fear. People are afraid they will be a victim of terrorism laws or interpretation of those laws by police. So they (the governments and the police) are, by any definition, the terrorists - creating the terror - no? Or have I misunderstood what terrorism means?

I am not at all afraid of being the victim of some bombing - I know the stats - it is not going to happen. But I worry about taking my big camera to London for fear of an argument. I feel sure I'll win the argument (or get a good blog post out of it), but that feels to me like a real risk of something happening. It is probably also statistically unlikely, but that is far more of a fear for me than being bombed. So who is creating the terror there exactly?

I thought that America was one place you could speak your mind - freedom of speech. Is that no longer true?


Friday, 3 May 2013

OFCOM stitch up

OFCOM have advised us today, for the first time, that, since 1st April, we are running up a bill for some of the telephone number blocks we have in some 30 area codes.

This is a slight shock - whilst I try and keep a rough eye on some of the consultations, it seems a document published on 27th March states a pilot scheme starts 1st April, yet instead of contacting anyone affected in advance they have left it until a month after it started to mention it directly.

It may surprise people to know that telephone number blocks don't have a cost, or didn't until now. Only telcos can get them, and we have a block in all area codes (which is a lot). The blocks are at least 1,000 numbers and some are 10,000.

OFCOM have decided to charge 10p/year/number for some 30 area codes. If this was just for numbers in use for which we had paying customers it might be viable, but it is not. It is for the whole block, and we can't get under 1,000 in each area. The 30 areas are a pilot, and I cannot see anything to stop OFCOM charging for the whole country if they want to screw up small telcos.

As a small telco getting even the smallest block for each area to provide a national service, that is about 650 area codes each costing at least £100/year, so a minimum of £65,000/year for the smallest of telcos offering geographic numbers if OFCOM did all area codes. In practice, for many areas, we have 10,000 number blocks. So this is scary if it grows.
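The sums are simple enough, but as a quick sketch of how the bill scales with block size, using the 10p/year/number price and the rough figure of 650 area codes from above. The function name is just for illustration, and the last two cases are assumptions for comparison: one minimum block in each of the 30 pilot areas, and a 10,000 block in every area as an upper bound.

```python
PENCE_PER_NUMBER_PER_YEAR = 10  # 10p/year/number, as quoted above
AREA_CODES = 650                # approximate number of UK area codes


def annual_cost_pounds(area_codes, block_size):
    """Yearly charge in pounds for one block of block_size numbers per area code."""
    return area_codes * block_size * PENCE_PER_NUMBER_PER_YEAR / 100


# Smallest possible block (1,000 numbers) in every area code.
print(annual_cost_pounds(AREA_CODES, 1_000))    # 65000.0

# Assumption: one minimum block in each of the 30 pilot areas.
print(annual_cost_pounds(30, 1_000))            # 3000.0

# Assumption: a 10,000 number block in every area, as an upper bound.
print(annual_cost_pounds(AREA_CODES, 10_000))   # 650000.0
```

Note that none of these figures depends on how many of the numbers actually have a paying customer on them, which is the problem.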

It is a pilot now, but we have to tell OFCOM what we think. If we allow the pilot to work it will grow to more areas and screw up a lot more telcos.

But this is a fundamental change!

Why? Well, until now, a telco had no reason to hand a block back. Even a telco that was sold or went bust could sell its blocks and paying customers to another telco. A block had some value if only because it was already allocated and saved applying for it.

Once blocks have an ongoing cost then they are a liability. This means, for the first time, it is commercially viable to hand blocks back to OFCOM. This is new. This is serious.

The main reason it is serious is the way numbers work, and especially ported numbers. If you have a number from a small telco, even if ported away, if that block is handed back your number stops working. Simple as that. It is a stupid system, but as long as numbers were never handed back it sort of works. OFCOM are making a system that will mean numbers are handed back.

We are looking in to what to do. We may be giving customers 30 days notice on some numbers, and porting away will not help them. Any of the few customers that have ported away will simply find their numbers stop working and there is nothing they can do.

One option is to stop doing VoIP at all. I hope it does not come to that.

This is somehow in the interests of consumers, is it? OFCOM?

English as a first language

I am crap at English! Yes, I was born here, and grew up here, but did not do well at English at school, failed all my English exams (I think you can fail a CSE), and have only really picked up any sense of grammar, punctuation, and overall "English" by experience through the years. Also, my spelling is not good, and my typing is worse. I am happy to be corrected :-)

However, I find some of the English of my suppliers a tad hard to follow at times.

This is an example that is better than most, and my reply, for amusement. Once again, the supplier had difficulty with replies that are not top-posted. Having used email and usenet for so many decades I have difficulty with replies that are top posted. Perhaps I am turning in to a grumpy old man...
AS suggested earlier your reply mechanism to our Emails is not in
the similar pattern and every time a new Email is sent which will
not give any history of the email Chain.

Also the format replied is very confusing and would suggest to have
a look on this please which will help both of us to be clear.
Thank you for the reply.

Yes, the format of emails is very confusing - you keep putting your
replies at the TOP of the email to which you are replying. This goes
against long established email and post etiquette and is very
confusing. Please stop doing it. Put replies under the text to which
you are replying so that the email can be read as normal, from start
to finish, and that context is clear (without having to search though
the email, or read it backwards).

I have, for example, a book here, and I find that I can read it from
start to finish, from the top of each page to the bottom. The author
does not put later things earlier in the book than previous things. It
is quite a common convention (in the UK at least) to read from top to
bottom and left to right. 
Cue: long debate on top and bottom posting.

Thursday, 2 May 2013

The rules of FTTC

FTTC (Fibre to the Cabinet) is the more technical term for services offered by many ISPs as "super fast". It is usually provided via BT wholesale who buy components from themselves (Openreach) to make a wholesale service they sell to ISPs. ISPs then sell Internet connections and the like. The ISPs are not really "re-selling" BT here, as BT just provide a link (ISP to end user). The ISP provides a lot more on top, not least of which is connecting to the actual Internet!

There is a handbook for the service BT offer ISPs, and it is written by BT. The FTTC handbook.

The handbook defines a few rules. These are rules for BT Wholesale customers not rules the ISP has to pass on. At A&A we'll pass on anything useful like these rules to our customers where we are able to enforce them on BT.

One of the concerns a lot of end users have is what speed they will get. FTTC allows much higher speeds than conventional ADSL services. BT operate a speed forecast (a line and address checker). This allows end users to see what speed they can expect. It is only a forecast, but is based on the line length and other details BT have.

There are some rules in the FTTC handbook:-

1. If the install does not meet the forecast speed the ISP is given option to re-appoint or cancel. There appears to be no option to accept the slower service! I don't think I am misreading this, honest.

2. If an installed service falls to less than 50% of original forecast, or drops 25% in 14 days then BT will investigate, and if they can't fix within 90 days of install the install can be cancelled and refunded. Obviously an ISP may have had lots of other costs in providing a service to an end user during that time and in reinstating an ADSL service. Thankfully we have not had a case of applying this rule, yet.

Now, the first of these rules is odd. It really does seem to lack an option for accepting an install at less than the forecast speed. Not 95%, not 90%, not 50%, but the forecast speed. The line has to do at least that at install or BT are not meant to complete the install. They are meant to ask the ISP if they wish to cancel or re-appoint. However, the first BT error here is that we have never seen them do this, as far as I know. Even where a line is slower, they just close the install.

Right now, we have a customer with a forecast of 29Mb/s and synced at install at 13Mb/s. Not good. BT are giving us the run around. They closed the install. So we are insisting the install is not complete, it can't be. Indeed, we are saying we want a re-appoint of the install, every day if necessary, until they manage the forecast speed, as per the rules BT wrote.
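As a rough sketch of the two handbook rules as I have described them, applied to this line. The function names are made up, and the reading of "drops 25% in 14 days" as "current speed is at most 75% of what it was 14 days earlier" is my assumption, not BT's wording.

```python
def meets_forecast_at_install(forecast_mbps, synced_mbps):
    """Rule 1: the install is only complete if the sync speed reaches the forecast."""
    return synced_mbps >= forecast_mbps


def below_half_of_forecast(forecast_mbps, current_mbps):
    """Rule 2, first trigger: the speed has fallen under 50% of the original forecast."""
    return current_mbps < 0.5 * forecast_mbps


def dropped_quarter_in_14_days(speed_14_days_ago_mbps, current_mbps):
    """Rule 2, second trigger (assumed reading): a 25% drop over 14 days."""
    return current_mbps <= 0.75 * speed_14_days_ago_mbps


forecast, synced = 29, 13  # the customer described above

print(meets_forecast_at_install(forecast, synced))  # False
print(below_half_of_forecast(forecast, synced))     # True, as 13 is under 14.5
```

So this line fails the first rule at install, and would trip the 50% trigger of the second rule as well.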

We have been pushing hard on this to get it fixed. BT have sent lots of engineers, but seem uninterested in actually doing anything, even properly investigating it. The latest is BT are saying these rules, the FTTC handbook, are only a guideline, and, basically, if it suits them, they will ignore them.

How can the industry work when the main carrier won't even follow the rules it wrote?!?!

I think we need to see if ISPA can investigate. Given the pressure OFCOM give ISPs on broadband speeds, I would hope OFCOM can investigate. How can ISPs be expected to offer end users any sensible terms and manage expectations when BT are happy to mess ISPs around like this?

At the end of the day BT did not have to write rules like this - they could have said "you get what you get, tough!" but they chose to define rules, and then conveniently just ignore them!

Oh well, we'll see what happens. Happy to discuss with ISPA or OFCOM if they are interested.

Update: It turns out that BT did not reply to my emails on this sooner as they are only capable of reading email replies if they are top-posted. I have tried to educate them. But they claim they are following the handbook, so I have asked them to confirm when and how they offered the choice of cancel or re-appoint. Should be interesting.

Clue bat!

I'd like to thank Alex for finally getting an actual clue bat for the office. It is in safe hands, and we hope we do not have to apply it literally to anyone.

Wednesday, 1 May 2013

Snowed under

Working through the night to sort the code is tedious, but paid off.

The issues I found in my RADIUS code were subtle, and were down to state machines and some assumptions in the original design (valid at the time). The error needed a complex combination of enough customers on line and the right chances that one of them is a wholesale (L2TP relay) customer at the right point in time. It involved timing of the RADIUS and L2TP and thwarted all my off line testing.

I plugged several loopholes, but this is a fun bit of software design logic. I have made a number of changes that are "defensive coding"...

My older code made assumptions, and was thwarted when the way the new RADIUS code worked undermined them.

The new changes make the code safe without those assumptions: it no longer makes them, and works regardless of any state change logic that I may dream up in the future.

It is like assuming that there are other developers on the project that are "out to get you" and making sure that nothing they can do will break things. The irony is where that other developer is another instance of yourself, a few years later, making new code for good reasons and forgetting the constraints on the old design.
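To illustrate the general idea (this is not my actual RADIUS code, and the states, class and method names are entirely made up for the example), here is a sketch of an event handler that checks the state it is in rather than assuming an event can only ever arrive in one state.

```python
from enum import Enum, auto


class SessionState(Enum):
    # Hypothetical states, purely for illustration.
    IDLE = auto()
    AUTH_PENDING = auto()
    ONLINE = auto()
    CLOSING = auto()


class Session:
    def __init__(self):
        self.state = SessionState.IDLE

    def start_auth(self):
        # Defensive: only start authentication from IDLE.
        if self.state is not SessionState.IDLE:
            return False
        self.state = SessionState.AUTH_PENDING
        return True

    def on_auth_reply(self, accepted):
        # Defensive: do not assume a reply can only arrive while we are
        # waiting for one. A late or duplicate reply is ignored, whatever
        # state change logic gets added elsewhere in future.
        if self.state is not SessionState.AUTH_PENDING:
            return False
        self.state = SessionState.ONLINE if accepted else SessionState.CLOSING
        return True


session = Session()
session.start_auth()
session.on_auth_reply(True)          # normal case: session goes ONLINE
print(session.on_auth_reply(False))  # a stray late reply is ignored: False
print(session.state)                 # SessionState.ONLINE
```

The extra checks cost a comparison per event, which is the "technically less efficient" part, but they mean a new code path cannot knock an existing session in to a state the old code never expected.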

The end result is code that is safer for future developments, which is good. It is technically less efficient, but I still find it hard to get core CPU usage over 1% on the most heavily loaded LNSs, so I really did not need to bother.

Anyway, end of the day, I have been snowed under. I am no longer writing more broken lines of code than working lines of code, and I have a bottle of SoCo that is a lot less full than it was...

Now I have to catch up! We are rolling out the updates. I have a huge list of stuff to do that I have been putting off. Some odd RADIUS handling on the servers (linux) I need to re-check. Summer time upsetting that! Much still to do.

When I am finally caught up I can progress the use of this new, well tested and robust, RADIUS code for VoIP server use, at last. I am several weeks behind my plan on this, but it is all progress.

The new VoIP system will, when finished, be ready to replace our core VoIP services. That will be another challenging time and lots of advance testing. It will create a new platform that can scale massively.

On top of all of that we finally have a new logo :-)

It would help if the cat did not leap over my desk, take out a mug of soup, and make my computer all sticky!

Shame