2024-03-04

2038

This may be slightly topical now as we have seen a seriously surprising number of leap year issues this year, or so it seems. See https://codeofmatt.com/list-of-2024-leap-day-bugs/

It is quite amazing how any programmer gets something as simple as leap years wrong. There was some excuse in 2000 maybe as people had the vague idea that every 100 years there is no leap year, but 2000 was an exception to that (every 400 years), so for the last 123 or so years we have had a leap year every 4 years. It is pretty damn simple. Personally, a much bigger issue is coding for clock changes twice a year, but leap years are simple, and there are a lot of libraries for it, so what the heck?!
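Just to show how little code the full rule needs, here is a minimal sketch in C. The function names are my own for illustration, not from any particular library, and in real code you would normally lean on a date library rather than roll your own.

```c
#include <stdbool.h>

/* Gregorian rule: every 4th year is a leap year, except century years,
   unless the century year is also divisible by 400. */
bool is_leap_year(int year)
{
    if (year % 400 == 0)
        return true;    /* 1600, 2000, 2400 */
    if (year % 100 == 0)
        return false;   /* 1900, 2100 */
    return year % 4 == 0;
}

/* Days in a month (1..12) of a given year; 0 for an invalid month. */
int days_in_month(int year, int month)
{
    static const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    if (month < 1 || month > 12)
        return 0;
    if (month == 2 && is_leap_year(year))
        return 29;
    return days[month - 1];
}
```

The cases worth testing are the awkward ones: 1900 and 2100 (not leap years), 2000 (a leap year), and an ordinary year like 2023.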

Y2K was a joke?!

One of the other things I hear is people saying all the concern over Y2K bugs was just a waste of time as basically nothing happened, which is true for most people for most things. To be honest I think I heard of fewer Y2K issues than I did for this leap year, but memory can be like that.

In fact Y2K was only a non event simply because a lot of people (I was one of them) did a lot of work (OK I did not do a lot) to make sure it was a non event. There were a lot of things that would have gone wrong, honest.

Y2K was a huge success for proactive work to avoid a problem.

Y2K was easy?

One view is Y2K was not that hard to sort, the hard work was the sheer scale of it. It was almost entirely down to the way a year was recorded as text or presented or input.

The classic example is your bank card that has the year as two digits. This is basically how many Y2K issues manifested.

Decades ago, checking an expiry year was easy: is the current year (two digits) smaller than or equal to the year on the card (two digits)? If so, OK.

The problem is when the current year is 97 and the year on the card is 02 as 02 is less than 97.

  • The real fix is to make years 4 digits (leaving us with a Y10K bug), but even with sticking to 2 digits, which was simpler for many things, there are issues to address...
  • One trick is if current year is >=50 and card expiry is <=50, add 100 to card expiry and compare. Simples! and solves Y2K. This is simple but broken, as what happens when it is 2051 and a card has an expiry of 49? That assumes a 2149 expiry and so the card is allowed. There are a few variations on this, and not always 50 as the reference.
  • A better way is if it seems expired by more than 50 years then assume not expired, that is a sliding comparison (several ways to do it), and for cards that only last a few years it is by far the best as you expect a card number to be replaced (or the whole concept of cards to have gone) long before a 50 year old card pops up.
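The three approaches to two digit expiry years can be sketched as below. This is an illustration only: the function names are made up, only the year is compared (real cards also have a month), and the sliding version is just one of the several ways to do it, here using modular arithmetic so it also catches a card that expired a few years before a century rollover.

```c
#include <stdbool.h>

/* Naive pre-Y2K check: both years are two digits (0..99). */
bool card_valid_naive(int now_yy, int expiry_yy)
{
    return now_yy <= expiry_yy;
}

/* Fixed pivot at 50: fixes 97 vs 02, but breaks again around 2050. */
bool card_valid_fixed_pivot(int now_yy, int expiry_yy)
{
    if (now_yy >= 50 && expiry_yy <= 50)
        expiry_yy += 100;
    return now_yy <= expiry_yy;
}

/* Sliding window: treat the expiry as the year nearest to now,
   i.e. somewhere from 50 years in the past to 49 years ahead. */
bool card_valid_sliding(int now_yy, int expiry_yy)
{
    int diff = ((expiry_yy - now_yy) % 100 + 100) % 100;  /* 0..99 */
    if (diff >= 50)
        diff -= 100;                                      /* -50..49 */
    return diff >= 0;
}
```

With current year 97 and expiry 02, the naive check rejects a perfectly good card, while the other two accept it. With current year 51 and expiry 49, the fixed pivot accepts a card that expired two years ago, and the sliding window correctly rejects it.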

The fact there is more than one way to do this is an issue, and a time bomb waiting to happen. I bet some people did the simple way and will find issues around 2050 ±5 years for all sorts of things, oops.

2038 is hard

This is where I have to explain time_t to you, sorry.

Unix, and many other systems derived from it, use a type for time that is seconds since the start of 1970. It is a simple system. Those seconds were stored in a signed 32 bit number, which allows -2147483648 to +2147483647 and hence dates from Fri 13 Dec 20:45:52 GMT 1901 to Tue 19 Jan 03:14:07 GMT 2038. This seemed a pretty good range, especially for adult engineers living in the early 70s. But 2038 is getting closer and closer. I may (hopefully) live to see it.
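If you want to check the limits of a signed 32 bit count yourself, something like this sketch will do it. The helper name epoch_to_utc is mine, and it assumes a C library whose gmtime() copes with negative values (most Unix-like ones do, some others do not). Note that gmtime() uses a static buffer, so this is not thread safe as written.

```c
#include <stdint.h>
#include <time.h>

/* Convert seconds since the 1970 epoch to broken-down UTC time.
   Returns 0 on success, -1 if this platform cannot hold or convert it. */
int epoch_to_utc(int64_t seconds, struct tm *out)
{
    time_t t = (time_t)seconds;
    if ((int64_t)t != seconds)
        return -1;          /* does not fit in this platform's time_t */
    struct tm *tm = gmtime(&t);
    if (tm == NULL)
        return -1;          /* library refused the value */
    *out = *tm;
    return 0;
}
```

Feeding it INT32_MAX gives Tue 19 Jan 2038 at 03:14:07 UTC, and INT32_MIN gives Fri 13 Dec 1901 at 20:45:52 UTC. One second more than INT32_MAX only converts if time_t is wider than 32 bits, which is rather the point.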

One of the Y2K issues is that the conversion of time_t to struct tm creates a tm_year that counts years since 1900, so it would often be printed as 19%02d and so you would see 19100 for 2000. The fix is to change it to print %d and tm_year+1900. The whole struct tm thing is messy as hell and should have set tm_year to be the year, but even tm_mon is one out, so always fun!
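The bug and the fix look like this in C. The function names are just for illustration; the format strings are the ones described above.

```c
#include <stdio.h>
#include <time.h>

/* The classic bug: assumes tm_year is a two digit year in the 1900s. */
int format_year_buggy(const struct tm *tm, char *buf, size_t len)
{
    return snprintf(buf, len, "19%02d", tm->tm_year);   /* 2000 -> "19100" */
}

/* The fix: tm_year is years since 1900, so add 1900 and print it all. */
int format_year_fixed(const struct tm *tm, char *buf, size_t len)
{
    return snprintf(buf, len, "%d", tm->tm_year + 1900);
}

/* A full date also has to allow for tm_mon running 0..11. */
int format_date(const struct tm *tm, char *buf, size_t len)
{
    return snprintf(buf, len, "%04d-%02d-%02d",
                    tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday);
}
```

For 1 Jan 2000, tm_year is 100 and tm_mon is 0, so the buggy version prints 19100, the fixed one prints 2000, and format_date gives 2000-01-01.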

So what's the fix?

  • A bodge, used on some systems, is to make time_t an unsigned 32 bit number. This allows time from 1970 to Sun 7 Feb 06:28:15 GMT 2106 but not dates before 1970, which is good enough™.
  • The proper way is to change time_t to a signed 64 bit type. This is what unix/posix has done. This allows times from invalid date to out of range date... OK roughly ±292 billion years from now, which is pretty good given the universe started only around 13.7 billion years ago. But it may well continue more than 292 billion years, so we may have a Y292B bug coming up. Be ready!

So it should be simple, change time_t, right?

What's the problem?

The problem here is not that the fix is harder, it is indeed simple, change time_t to 64 bit, but that it is harder to find the problems. Y2K was a lot about input/output of a year as 2 digits. It was way more visible. 2038 is harder as it is hidden. It also happens in a lot of embedded systems - the code that runs your fridge or your toaster, and much more.

For a start, any code properly using time_t will simply work if the underlying build environment and include files and libraries are all up to date to have a 64 bit time_t, and crucially the code uses all the time/date libraries. This is good, but also bad, as the code you are looking at is unchanged, working and not working code looks exactly the same!

The problem is that you need to know that the built binary image in some system was built in an environment that used 64 bit time_t. Do you know that for sure? How do you tell? How do you test?
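One partial answer, for code you can rebuild, is to make the build itself refuse a narrow time_t. This is a sketch, assuming a C11 compiler; the function name is made up. It tells you nothing about a binary somebody else built years ago, which is the harder part of the question.

```c
#include <limits.h>
#include <time.h>

/* Fail the build if time_t is narrower than 64 bits. */
_Static_assert(sizeof(time_t) * CHAR_BIT >= 64,
               "time_t is narrower than 64 bits: not 2038 safe");

/* Runtime probe: can the C library convert a time one second past the
   signed 32 bit limit? Returns 1 if so, 0 if not. */
int time_t_survives_2038(void)
{
    if (sizeof(time_t) < 8)
        return 0;
    time_t t = (time_t)2147483648LL;   /* Tue 19 Jan 2038 03:14:08 UTC */
    struct tm *tm = gmtime(&t);
    return tm != NULL && tm->tm_year + 1900 == 2038 && tm->tm_sec == 8;
}
```

On 32 bit Linux with a recent glibc, my understanding is that building with -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64 selects a 64 bit time_t, but do check that against your own toolchain rather than take my word for it.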

There are also cases where code may happen to misuse an int or int32_t for something. You may have a proper 64 bit time_t passed to something that does maths using int, and then passes to something using a time_t as an answer. This will work for now, but not after 2038. This is hard to spot, and worse because int is a nightmare type: it could be 16, 32, or 64 bits anyway, so may or may not work. It may work in some tests and not others!
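Here is a contrived example of that sort of hidden narrowing: a helper that rounds a time down to midnight UTC. Both functions are hypothetical, and int64_t stands in for a 64 bit time_t. Both assume a time at or after 1970.

```c
#include <stdint.h>

/* Buggy: the maths is done in a 32 bit type. Converting an out of range
   value to int32_t is implementation-defined; on common compilers it
   simply wraps, so dates after 2038 come out negative. */
int64_t start_of_day_buggy(int64_t when)
{
    int32_t t = (int32_t)when;    /* silently drops the top 32 bits */
    return t - t % 86400;
}

/* Fixed: keep the full width all the way through. */
int64_t start_of_day_fixed(int64_t when)
{
    return when - when % 86400;
}
```

For any time today the two give the same answer, so every test you run now passes. For 2200000000 (some time in 2039) the fixed version gives 2199916800 and the buggy one gives something that is not even in the right century.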

And mostly I am talking of C code or things based on it, but many languages have time logic, and many are based on the time_t 32 bit time. Some will be easier and some harder to find the problem.

A fix to make something 64 bit time_t may also break things if that is communicated in binary in a protocol or a file. Is the other side expecting 4 or 8 bytes? So fixing this can break things in itself. There will be bugs long before 2038 in attempts to get ready for 2038.
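The usual defence is never to write a raw time_t to the wire or disk, but to pick a fixed width and byte order explicitly. A minimal sketch, with made-up helper names, using 8 bytes big-endian:

```c
#include <stdint.h>

/* Write a timestamp as exactly 8 bytes, big-endian, whatever the size
   of time_t happens to be in this build. */
void put_time64(uint8_t out[8], int64_t seconds)
{
    uint64_t u = (uint64_t)seconds;
    for (int i = 7; i >= 0; i--) {
        out[i] = (uint8_t)(u & 0xff);
        u >>= 8;
    }
}

/* Read it back. The final conversion to a signed type assumes the usual
   two's complement wrap, which is what common compilers do. */
int64_t get_time64(const uint8_t in[8])
{
    uint64_t u = 0;
    for (int i = 0; i < 8; i++)
        u = (u << 8) | in[i];
    return (int64_t)u;
}
```

This does not help with an existing peer or file format that expects 4 bytes. That needs a version field or some negotiation, and that is exactly where the pre-2038 bugs will come from.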

For these reasons 2038 may be way worse and way harder to track down and fix.

2036

There is actually another problem with the NTP protocol, which used a 64 bit time with 32 bits for seconds from 1900, and that seconds count runs out in 2036. NTP changed to 128 bits some time ago, but some may get it wrong. It will be a bit of a dry run for 2038 issues. I hope I live to see this too, at the very least.
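A common way to cope with the 32 bit seconds field wrapping is to assume a timestamp is within about 68 years of what the local clock says, and pick the era that makes that true. This is a sketch of that idea with my own function name, not code lifted from any NTP implementation. The offset 2208988800 is the number of seconds from the start of 1900 to the start of 1970.

```c
#include <stdint.h>

#define NTP_UNIX_OFFSET 2208988800LL   /* 1900-01-01 to 1970-01-01 in seconds */

/* Convert a 32 bit NTP seconds field to Unix seconds, choosing the era
   that puts the result within 2^31 seconds of now_unix. */
int64_t ntp_to_unix(uint32_t ntp_seconds, int64_t now_unix)
{
    int64_t now_ntp = now_unix + NTP_UNIX_OFFSET;      /* full width NTP time */
    uint32_t diff = ntp_seconds - (uint32_t)now_ntp;   /* wraps modulo 2^32 */
    int64_t delta = (diff < 0x80000000u)
                        ? (int64_t)diff                /* field is ahead of now */
                        : (int64_t)diff - 4294967296LL; /* field is behind now */
    return now_ntp + delta - NTP_UNIX_OFFSET;
}
```

A field of 0 received just after the wrap then comes out as Unix time 2085978496, which is 7 Feb 2036 at 06:28:16 UTC, not as some time in 1900. It does of course rely on the local clock already being roughly right.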

Anecdotes

As so many people now have no clue what Y2K was like, and may not have even been born, I say as an old fogy, one of the fun things coming up to Y2K was the bullshit.

We had customers and suppliers wanting us to make formal legal statements on "Y2K compliance".

We never signed any formal statement of "Y2K compliance", ever. We said we expect anything we controlled to work with any date, and if it failed we would work to fix it, like any other fault, and we expect the same of suppliers of things we sell.

One special case was Barclays card services, as they wanted us to confirm we were "Y2K compliant". I replied asking "compliant with what?".

I found that British Standards had created a standard for what Y2K compliance meant, and it covered working correctly with all dates before, on, and after 2000, which is way more than most people were doing. It meant that "Y2K compliance" to the BS meant 2036 and 2038 compliant!

We stated we were not as we were sure we had some systems that would fail in 2036, 2038, or 2106, or 10000. We also did not work with dates before 1582. So we could not say we complied with the British Standard on Y2K, and seriously, no company could! But we asked what standard they wanted us to comply with? Is it the BS standard or something else?

Importantly we asked if THEY were compliant with the same standard! We knew they were not.

They did not reply but later came back with new questions clearly to all customers, referencing the BS standard, and some clues on what they would do and comply with (even if not the BS standard). We did not claim to comply, as we could not, obviously, and there was no come back.

Nobody really pushed back on us saying we could, of course, not comply with Y2K compliance, and neither could anyone else, but we would try our best. Asking what the person asking the question was doing to comply was always telling. It was layers of bullshit on bullshit, and in some ways "fun". I wonder how many companies spent a lot of time on the "paperwork" and not just on actually fixing things.

Thankfully all the bullshit had a deadline.

Deadline

At the end of the day, for a lot of things, and I say this as an old fogy, it is "will I be dead before this is an issue". This is, perhaps, the problem with any such code, ever. I'd love to be around for the Y292B bug.

5 comments:

  1. The real horror show is twofold:
    - compatibility, as you mention, with old code taking a time_t that is too small, likely embedding it in public interfaces and data structures in memory, and even doing things like assuming both the 1901 minimum *and* 1970 origin (how do you extend *that*?).
    - compatibility with things *persistently stored on disk*. This includes *filesystems*. If your last-modified time maxes out at 2038, what do you do? Filesystems last for decades, and real code breaks badly if the last-modified time is suddenly out by decades, *and* if it doesn't ascend properly as time passes. The XFS folks had an ingenious solution after they managed to actually have a 64-bit timestamp counter that wouldn't fit, because the top 32 bits were signed seconds: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/xfs/libxfs/xfs_format.h#n710

  2. There's no excuse for getting leap year handling wrong. Either do very little and assume all years divisible by 4 are leap years, this works from 1901 to 2099 and is good enough for most purposes. Or do it properly (multiple of 100 years aren't, multiple of 400 years are). People who only know half the rules should have known to go and CHECK what the rules are. I've written timezone and leap year code twice and debugged a third implementation, and it only goes wrong when people don't check their assumptions and don't test their code properly.

  3. I'm one of the people that did work quite hard to fix Y2K issues. Nothing went wrong on the day because as an industry we pulled our collective finger out. I hope that success isn't going to make people complacent about future date issues.

  4. Your Barclays comment amused me as I have had two separate time issues with Barclays. The first was attempting to make a payment "now" on their app whilst in Hawaii. It errored with 'time in the past'. I had to make the payment two days later before the app would accept it. The second was printed records of transactions. They messed up on GMT/BST, being an hour out from the actual transaction time. Though remote, that could have real legal consequences for some people. In both cases I tried to explain in correspondence but they never 'got it'.

  5. I well remember millennium night as I was in my office at a very well-known UK radio station at twenty minutes after midnight checking our systems had all continued to function. Javascript hadn't though, unlike in testing, as it was now suddenly the year 3000 not 2000. One really shouldn't edit live systems directly after drinking lots of champagne, but I did. Successfully!
