Friday, 21 July 2017

warning: comparison between signed and unsigned integer expressions

This is one of the stupidities in the C language and it bugs me because it would be so simple for C to just code it correctly. I'd really like a gcc option to do this!

When you store whole numbers in binary you usually have a choice of signed or unsigned. The signed version allows negative numbers but at the cost of the range of positive values possible.

For example a signed char allows values -128 to +127, but an unsigned char allows values 0 to 255.

If you compare them, using ==, !=, >, or < for example, the operation converts the signed value to an unsigned value and then compares.

Example

signed int a = -1;
unsigned int b = 1;  
if (a > b)
   printf ("a>b\n");
if (b > a)
   printf ("b>a\n");

This print a>b even though a is -1 and b is 1!

This is because -1, converted to an unsigned value, is a big number, in fact the biggest an unsigned int can be.

What pisses me off is that, even when C was invented, the code to make the comparison work would have been one check of one bit extra. Basically, whatever the comparison, you just have to check the signed value is negative or not before making the comparison. If it is negative that means it is not equal to the unsigned value, and is smaller than the unsigned value, so whatever comparison you were doing is decided by the signed value being negative before going on to do the comparison as normal.

To me this would have been a far more logical behaviour than changing the value of the signed variable by making it unsigned.

Thursday, 20 July 2017

More on pitfalls of redundancy...

Hindsight is a wonderful thing!

I have been having long discussions this week and today. Many have the benefit of hindsight.

For kit we have in Maidenhead we have two possible ways to connect to the world! One if via the local transit, a single point of failure link. Another if via multiple (well, soon to be) diverse fibre lines to different London data centres where we have multiple transit and peering.

Even before the second leg of our ring, one leg is a single transit and the other is several transit and peering links via multiple (pairs of) routers. And even that allows fallback via the single transit link, just in case.

The problem, as ever, is a partly ill link: one that seems to be valid for traffic but is not. We had that today.

Announcing their routes primarily via local transit could work, but transit back out to the world being local is more complex. We would be offering a less redundant, and somewhat specialised, solution...

So we have the issue of hindsight verses reality. Ongoing, a link with more redundancy is better. The last few days is was not...

So do we offer knee-jerk services that are technically worse looking forward? Or do we say no, this is shit that happened, and was only wrong in hindsight?

It is almost like the good old days, err...

Today we (A&A) had another brief outage impacting broadband, ethernet and hosted customers, and VoIP. It was a bit complicated as it was one side of an LACP and so probably half of things were working and half not, and it looks like pretty much all broadband went down.

It was an error on our part - the ops team have been working hard all week, and working with a consultant, to help us investigate last week's issues with the CISCO switches. They have done a number of changes (adding more logging, etc) and diagnostics during the week. At each stage they have to assess the risk and decide if they can go ahead or wait until evening or even over night. A change today to bring back one of the links between the London data centres (one shut down on Friday) so we can test it independently of the normal operation resulted in breaking the switch links. Even the consultant thought it would be OK.

I think I can elaborate a tad more on things we know. I am sure the ops team will shout if I have misunderstood. At this stage there are aspects of what happened that are still unclear. This means we are adding some "defensive" config to try and address possible causes for the future.

The main issue, it now seems, was that the BGP links to all of our carriers from all of our switches all failed at the same time. Yeh, so much for redundancy! These are private links on a separate VRF and not connected to other BGP. The BGP is with routers on the end of locally connected single fibre links (of which we have many), not LACP or anything complicated. So the failure has to be entirely within the cisco switches. We can almost certainly rule out hardware impacting all at once. Also, being on separate VRF and not seeing Internet traffic at all, it seems unlikely some attack from outside. This leads us with the possibility of some sort of unstable config on the switches, maybe something spanning tree related (I hate spanning tree), or maybe some BGP issue with routes received from carriers, which seems unlikely, but maybe not impossible. So there is a lot of careful review of things like BGP filters from carriers, and spanning tree config, and so on.

The "fix" was rebooting half a dozen cisco switches. On Thursday this worked, but it took some time to conclude that was a sane thing to do, when other options were exhausted.

As I am sure you can appreciate, just "turning it off and back on again", or rebooting the switches, really is a last resort. We have highly skilled engineers who spent some time trying to diagnose the actual issue before taking such a step, and that is one reason these issues can take some time to fix. Sometimes a reboot can fail to solve anything but lose valuable clues.

On Friday that worked too, again we tried to understand the issue first, and got a lot more information. The reboots seems to have triggered a second issue with one of the switches being stupid (as per my other blog post) and coming up in a half broken state. Rebooting that one switch again sorted it. It is almost unheard of to have two different issues like this, one after the other, and that really threw us as well.

A lot of this week has been understanding the way the cisco switches are set up in much more detail, and adding more logging, and updating processes so we have a better idea what to do if it ever happens again - both fixing things more quickly, and finding more clues as to the cause. It may be that we have mitigated the risk of it happening by the changes being done. We hope so.

Obviously this sort of thing is pretty devastating - I am really unhappy about this, and really sorry for the hassle it has caused customers.

As I say, it is not really like the "good old days" when BT would have a BRAS crash pretty much every day. These days we expect more, and our customers expect more.

So, please do accept my apologies for the ongoing issues, and my reassurance that they are being taken very seriously.

Adrian
Director, A&A

Wednesday, 19 July 2017

Github

I have now issued a number of projects on GitHub... All under GPL.

Here.

They include the current build of my alarm panel.

Bricking it!

Well, pictures are out on twitter now - the new FireBrick sort of exists - but don't go trying to order them yet. We have many more steps to take before we will have stock, some months (I'll have a better idea tomorrow). There are little details like EMC testing for CE marking, and so on, some of which could causes delays. And still, it is made in UK!

However, seeing as there are pictures, I think I should say a few words about the new FireBrick model, the FB2900. This should avoid speculation, at least. We are still selling the FB2700, and I am not in a position to say anything about FB2900 pricing yet, this is purely some technical comment.


That is not the whole box, with no SFP screen or light pipes, etc. But you can already see some of the changes.



SFP port

One of the most obvious changes is that we have moved back to a 5 port format, as we had on the older FB105 models. But the extra port is SFP. This means it will be able to take a normal copper Ethernet port, but also various types of direct fibre links. Apart from use in data centres, and one each end of fibre links between buildings, this is thinking ahead to the days of true fibre internet services in the future.

Power supply

Anyone that has looked inside the existing FB2700 model will see we have a completely new PSU design. The change in design has allowed us to make a variety of different PSU options.

We have an option for automotive (12V and 24V). This is far more complex than it sounds - really! Automative supplies allow for something called "alternator load dump", and high voltage spikes, and a range of voltages from the supply. They have a lot of safety aspects to consider as well. However, this allows for FireBricks to run in cars, and trucks, and alarm panels, and all sorts of places where there are DC supplies.

We also have an option for higher DC voltages found in telecoms racks in data centres (-48V), an option we already have on the FB6000 series.

Even the mains voltage option is different, with the main board using 12V, we have a wide choice of suppliers for the PSU components. We have stuck with the "figure eight" power lead though.

Faster, better, stronger, we can rebuild it... etc, etc.

The new design has a faster processor, and removes a key limitation that stops the FB2700 doing much more than 350Mb/s. We have not got to the stage of benchmarking yet but expect it will be a lot faster. We are expecting faster crypto as well - we'll say more on that once we do have it all coded and benchmarked.

Brackets!

We have said this before so many times and not followed through, but this time it is real, honest. We have wall mount brackets and 19" rack mount brackets (for one or two FB2900s in 1U). I know, pics or it did not happen - just watch this space.

It's cool!

The power usage is lower, so the whole FireBrick will be a lot less on fire. The existing models are designed to cope with the heat, but in a confined space can get warm. The new model uses less power in the first place, and so we expect it to be a lot cooler...

Firmware?

Obviously we are always adding more to the firmware and more features will come along for the FB2500, FB2700 and FB2900 models. Software upgrades are still free, as always.

P.S. (and it should not be a P.S.) there are some good people working on making this happen, like Cliff and Kev, and they need some credit for this all coming together.

Porn ID checks set to start in April 2018

As per BBC article. I hate having to repeat myself, so I'll make it quick.
  1. Is there a problem to solve? Show us the evidence please.
  2. Is there a solution already? PCs can be set up with basic parental controls, and ISPs (even A&A) can help with that, so young kids that have no interest in porn can avoid it.
  3. Is this a solution? Older kids that want to see porn will absolutely not be stopped by these measures, so no.
  4. Will this work at all? Foreign web sites that are free are not covered, those paid by advertisers cannot be stopped by blocking card payments. Maybe some that take cards now will comply, but they sort of do as they take cards already.
  5. Kids can get cards! (albeit pre-pay or debit rather than credit cards) which makes one means of age checking somewhat harder
  6. How do you tell my age? It is almost impossible to make an age verification system that works remotely over the internet that cannot be fudged somehow. E.g. use a parents card details, simple as that. Anything an adult can type in can be typed in by a teenager. Even live video chat cannot be trusted - how long before there is an app to make your live face and voice seem a lot older in a convincing way.
Now for where it gets really bad...
  1. The age verification companies appear under no special obligation to secure the data, in spite of calls for this. If any are outside the UK they may not even have the Data Protection Act to consider.
  2. It is going to be nearly impossible to verify age without verifying identity, and almost impossible to come up with a way that cannot then be correlated to the site, and specific pages and sections of sites that identifiable people are accessing. This data will be hacked and leaked.
  3. Free porn sites are simply not covered by the legislation anyway, so what is the point.
  4. Web sites offering free porn (even those linking to or copying other sites) will now start asking for personal details to prove you are 18, including card details. They will be able to link to UK law and UK government information to justify asking. People will expect such sites to want lots of personal information. People will give details and then when scammed they are not that likely to complain or claw back as they are too embarrassed to go to their bank.
  5. The scams and risks target minorities especially - those that, more than most of us, do not want anyone to know their sexual preferences, even though totally legal.
Also, if you do access a porn site, using incognito or privacy mode so not in your browser history, you won't have cookies, etc, and so will have to do age verification every time?!

Remember, we are talking about legal content from legitimate businesses here...

Tuesday, 18 July 2017

To open source, or not to open source? That is the question...

As I have posted, I have had some fun making a new alarm panel. There are higher levels still to do, and I am sure many features will be added as it is used in anger. I have it at home now, and office will follow later. I am even making a system for the climbing frame/swing thing in the garden as a bit of an alpha test platform.

It was started as both a bit of fun and out of necessity. The necessity is that we have a number of small niggles with the Honeywell Galaxy systems we have here and at the office (see below). One way to address some of these would be to get an new alarm installer and maintenance company - but I am confident that it would (a) not solve all the niggles, (b) cost a lot to solve those that can be, and (c) cause more problems as we would mean calling them if we have to reset the alarm for any reason. So, yes, making my own alarm panel is not as daft as it sounds, especially as I have the skills to do that. Reinventing the wheel is one of my specialties!

Having spent a couple of weeks making some code, and making prototypes, and a working system - I now have a nice solution.

Will it monetise?

As a businessman I have to consider if we can now make money out of this. Right now it is A&A copyright, and A&A has paid for my time and the various bits of kit to develop it all. The company deserves something, either money, or customer acquiring kudos. A&A will get a new and better alarm system, which is a start!

It is tricky. I expect that most can be made selling, installing, and maintaining the actual systems for people. Retrofitting where a Galaxy panel exists for example, selling a system review, and selling maintenance and monitoring. Now, A&A have done telephone system installation in the past, on a small scale, but we are really not geared up for "field engineers" for installation or maintenance, to be honest.

So could I just sell the system - well maybe, but ultimately it is just s/w, and that is not that easy to police and sell. We could make physical systems, a panel box with RIO PSU and Pi and the RS485 leads, all neatly put together - maybe. We could resell the alarm system parts even? It also creates some possible liability issues - what if someone is robbed because of my code having a bug? And then we would need to establish dealers and relationships with those companies that do in fact do installation and maintenance and so on. These are all, for now, well out of our comfort zone as a company.

So, to be honest, I am not convinced I can make a lot out of it, yet...

Should I open source?

Well, this is, in part unrelated to the above. It is possible to publish source and have restrictions on use. It is possible to have some controls. It is actually quite hard to avoid and police pirate copies even if not publishing source. It is also quite possible to make and sell systems that are built on free open source software. Publishing the source code is a good thing to do, and if we are unlikely to make no money from it, that helps make it a no brainer. Or does it?

Vulnerabilities?

I may have vulnerabilities in my code that I have not spotted!. If published, someone could find them and exploit them to rob us, or other people. Of course, they don't know if we have any backup systems to alert us anyway - some part of the old alarm panel still running just for alarm/security and not the problematic door control issues... It would be a risk to try and exploit such issues. I also have slightly more faith in humanity and would expect someone to tell us if they saw an issue in the code, or at least not bother to exploit it, or at least not have the contacts to let someone else exploit it. So maybe safe to publish anyway.

Who controls the "official" version?

It always helps if there is an "official" version, and if we ever get it to meet security specs or British Standards that also helps confirm which versions does that. But who controls it? There are many ways to release code from just putting it out there, to putting on a community repository, to accepting code updates, or suggestions, or changes, and simply controlling the master copy ourselves.

If anyone can submit new code, or make changes, one has to be careful. It would be easy to hide a deliberate vulnerability or introduce a accidental one.

I was pondering, for example, I have key fob codes (numbers) and zero is invalid. If someone just removed the if(e->fob) line from the code that does nothing with a zero, then zero would be valid and may match any user without a key fob defined as they have zero stored - thus allowing a crafted key fob with zero ID to work to unlock and enter. But they could be a lot more subtle - e.g. if the key fob codes have a check digit (I assume not, but I don't know) someone could replace with if(keyfob_valid(e->fob)) and code that function to check the check digit but carefully so that an all zero fob will pass. That would look like a good and valid enhancement, and the loophole for zero would be subtly hidden. Just one idea of ways to hack the code with only a few minutes thought.

Go for it!

Let's assume we do want to make it open source, what is the best platform? Ideally where I have control of at least the main parts of the code in the "official" version. I have not done this for a while, so interested to hear.

I released some L2TP code for linux many years ago and apparently now it is some big and sophisticated system that some ISPs actually use! Totally unrecognisable to me, with loads of new features. Still has my name, which is nice.

I issued some SMS code for Asterisk once, and asterisk has changed and so has my code. Asterisk have an interesting community code copyright model. Apparently it has changed to the point it stopped working, and nobody was interested in fixing it (not even me, now). I coded something from scratch for when I needed it recently, outside asterisk framework (SIP, and alaw RTP).

We have published code on the A&A web site, and people use that with rarely a suggestion or bug report, but I know some is used by lots of people just from some of the feedback I have had (the linux drivers for Epilog laser engraver, for example).

To some extent it depends how good, and complete, my code is, which is partly why I have delayed releasing so far (I do not think complete enough, yet). If I have done it right people will use it and have hardly any suggestions or bug reports.

Open source code also benefits from the usual disclaimer of no fitness for any purpose nor liability for bugs. That is a good start. I am happy with our normal "money back guarantee" on stuff we provide for free, and I think it is more than fair :-)

The code is also layered, so a good chance the low level will just work and the higher levers will have lots of ideas and suggestions and changes, so it may even be appropriate to publish different layers using different publication models.

Indeed, one model may be open source the components but make a "system" with the web based config and status that is not open source and looks nice, and we sell that as a solution (with usually open source disclaimers?).

Do say what platform we should use to open source this...



P.S. Some of the niggles with the galaxy system and how I have addressed them - how many of these would be addressed by getting in a "proper installer/maintainer" I wonder?
  • You hold the fob 3 seconds to arm/set the system, but actually it is implemented such that you can use fob, and then use fob again in 3 seconds, so two separate uses can mean arming by mistake. New system expects actual hold for N seconds.
  • Once set you use the fob to unset, but then have to use again to open the door. Be careful not to be 3 seconds apart doing that. New system unsets and opens door in one go but does beeps to confirm unset.
  • The doors seem to have a lag at the office. New system allows 4 buses per Pi so can minimise lag. It also allows a lot of debug to find underlying issues on buses.
  • If you have a door with a mag lock at the top and reed switch at the side (even at the top by the mag lock in one case at the office), pulling the door whilst locked will trip the door forced alarm. New system can reduce lag, and also has configurable tug timer for this - if below say 0.5s then don't trip door force.
  • If the door almost closes and you grab it before it hits the mag lock but after it hits the reed switch, you get door forced. New system has locking stage and locking timer, or monitored mag lock input, which means opening at this point is not a door force.
  • If you have V locks, they motorise to engage, and so are not instant like a mag lock. This allows a door to bounce and trip the door forced alarm, especially if "slammed". New system allows open whilst "locking" without being a door force.
  • The external reporting via ethernet is crap! We have coded it, but it is messy. New system has lots of more direct reporting including HTTP GET/POST, Email and SMS with multiple reporting, and reporting selecting different groups, users, event types, etc.
  • Even the proper programming system (rather than using keypad!) is some nasty windows software. New system is currently XML files, but small and easy to work with, and will be some nice web UI.
  • Max readers scramble the true number they see so you have to buy special Honeywell key fobs with the scrambled number printed on them in order to use with the Galaxy system. The new system reports the ID it saw from the reader so any compatible key fobs or cards will be usable.
  • We'd love to do clever things like expect alarm set by certain time and report, or disable specific PIRs for Roomba at specific times of night. These will all be possible as we add more to the code, including time profiles.
  • The Galaxy system allows you to add a new Max reader - it will scan from 8 down to 0 and show the highest reader ID it can see on the bus, then allow you to set it to a new ID (0-3, or 0-7 depending on panel). A new Max appears as ID 8. It does not even say which IDs are in use already, so even for a new install you have to keep track of which you have added. It does not allow you to reprogram any that are not the highest number on the bus! The new system lets you change any Max to any other and reports what it finds on the bus.
  • The Galaxy has no way to properly integrate door lock engaged inputs that come from monitored mag locks or V locks. New system does and can detect door ajar events neatly.
  • The Galaxy allows door propped timer limits but no way to override on per case basis, hence we don't use them. New system allows key fob to override (if they are allowed to), logging who allowed door prop.
  • The max reader beeps (loudly) when door lock released, but I have V locks that make a nice clunk and do not need a beep. Indeed, every Max reader I have seen has blu tack on the sounder as annoyingly loud. New system allows silent door open and then uses beeps as an actual error case, so can be loud.
  • User names on the Galaxy are stupidly short making for almost unreadable reports from the system for some users. Also makes it hard to use same names on other systems - we have used Galaxy logs for time recording but needed a mapping of the abbreviated names. New system allows any length user names making it much simpler.
  • Of course the Galaxy simply lacks the new features we have now dreamt up, like air-lock door pairs. This is less of a niggle though as we don't need this in the office, yet!
I bet I have missed a few niggles, but you get the idea...