2017-01-06

Barcodes

I have been messing with barcodes most of my life - and I don't say that lightly! My first ever commercial software was when I was 15 or 16 and I did some bar code reading software for an RML 380-Z. It involved reading some simple character barcodes, and also EAN/UPC barcodes. All the timing was done in the processor based on a one bit input from a light pen / reader.

I learned about barcodes back then and have been messing about ever since in various ways.

There are two main types of barcodes, though, to be fair, only one has "bars". The two types are 1D or linear barcodes, and 2D barcodes. It is really misleading to call the 2D codes "barcodes" to be honest.

Linear barcodes

There are many types of linear, or 1D, barcodes. They are designed to be read by a wand or laser or reader which looks at a line across the barcode seeing black and white in specific timing or spacing.

Normally these need a quiet zone (usually white) before and after the code, and then have bars and spaces (bars being black and spaces being white) which are certain sizes. Some standards simply have thick and thin for these, and thick need not be exactly twice the width of thin. In practice, using thick as 2 "units" and thin as 1 "unit" usually works even in such cases. Some systems have several thicknesses of bar and space, each a multiple of a basic unit size. This maps well on to simple pixel graphics images.

One of the least efficient and most annoying of these is "code 39". This uses 5 (black) bars with 4 (white) spaces making a total of 9 elements, of which 3 are thick and the rest are thin. Thick can simply be twice the width of thin. For most characters, 2 of the 5 bars and 1 of the 4 spaces are thick, which allows 40 combinations, coding letters, numbers and a few symbols. The space between each character could be one thin space, or more. There is also a set of special codes that have all thin bars and three of the four spaces thick, giving 4 extra characters.
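To see where the 40 and the 4 come from, here is a minimal sketch that simply enumerates the possible thick/thin layouts. It assumes the usual Code 39 arrangement (regular characters have 2 of 5 bars and 1 of 4 spaces thick, the special ones have no thick bars and 3 of 4 spaces thick) and assumes thick is 2 units and thin is 1. The function names are made up for illustration, and it does not map layouts to actual characters.

```python
from itertools import combinations

THIN, THICK = 1, 2  # assumed widths in units


def code39_layouts():
    """Return (thick_bar_positions, thick_space_positions) for every layout."""
    layouts = []
    # Regular characters: 2 of the 5 bars and 1 of the 4 spaces are thick.
    for bars in combinations(range(5), 2):
        for space in range(4):
            layouts.append((frozenset(bars), frozenset([space])))
    # Special characters: all bars thin, 3 of the 4 spaces thick.
    for spaces in combinations(range(4), 3):
        layouts.append((frozenset(), frozenset(spaces)))
    return layouts


def layout_width(layout):
    """Width of one character in units, not counting the gap after it."""
    thick_bars, thick_spaces = layout
    thick = len(thick_bars) + len(thick_spaces)
    return thick * THICK + (9 - thick) * THIN


layouts = code39_layouts()
regular = [l for l in layouts if l[0]]      # layouts with thick bars
special = [l for l in layouts if not l[0]]  # layouts with only thin bars
print(len(regular), len(special), {layout_width(l) for l in layouts})
```

Under these assumptions every layout has exactly 3 thick elements out of 9, so every character comes out the same width.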

The beauty of such a system is that each character is a self contained sequence, and you can in fact make a font out of it. There are no inherent check digits. Each character is the same size. The codes start and end with a "*" character. So it is very easy to construct, though very inefficient.


Another simple code that only uses thick and thin bars and spaces is ITF (Interleaved 2 of 5), which only codes numbers, and then only an even number of digits. It is much more compact for numeric sequences. A common checksum is the Luhn checksum as used on credit card numbers. Each pair of digits is 5 bars and 5 spaces (interleaved): the bars code one digit and the spaces code the other, and in each case 2 of the 5 are thick. This makes 10 combinations for digits 0-9.
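As a rough sketch of the interleaving, the following turns a digit string into a row of black (1) and white (0) units. It assumes thick is 2 units and thin is 1, uses the usual 2 of 5 digit patterns and the usual ITF start (thin bar, thin space, thin bar, thin space) and stop (thick bar, thin space, thin bar), and leaves out the quiet zones. The function names are hypothetical. A Luhn check digit helper is included since that checksum is mentioned above; whether a given application uses it is up to that application.

```python
# Thick/thin layout per digit: "W" = thick, "n" = thin (2 of the 5 are thick).
ITF_PATTERNS = {
    "0": "nnWWn", "1": "WnnnW", "2": "nWnnW", "3": "WWnnn", "4": "nnWnW",
    "5": "WnWnn", "6": "nWWnn", "7": "nnnWW", "8": "WnnWn", "9": "nWnWn",
}
WIDTH = {"n": 1, "W": 2}  # assumed widths in units


def itf_modules(digits: str) -> str:
    """Return the barcode as a string of units, '1' = black, '0' = white."""
    if not (digits.isascii() and digits.isdigit()) or len(digits) % 2:
        raise ValueError("ITF needs an even number of digits")
    out = ["1010"]  # start pattern
    for i in range(0, len(digits), 2):
        bars = ITF_PATTERNS[digits[i]]        # first digit goes in the bars
        spaces = ITF_PATTERNS[digits[i + 1]]  # second digit goes in the spaces
        for bar, space in zip(bars, spaces):
            out.append("1" * WIDTH[bar])
            out.append("0" * WIDTH[space])
    out.append("1101")  # stop pattern
    return "".join(out)


def luhn_check_digit(payload: str) -> str:
    """Luhn check digit to append to a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # double every second digit, starting at the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


print(itf_modules("12"))
print(luhn_check_digit("7992739871"))
```

Each pair of digits takes 14 units here, so a code of n pairs is 8 + 14n units wide before quiet zones.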


We then get a tad more complex where we do not simply have thick and thin, but 1, 2, 3 or even 4 unit widths. The system used for retail product marking, UPC (Universal Product Code) and EAN (European Article Number), allows coding for products using a numeric value.


Using more distinct widths allows more code density. The format has specific additional control fields, such as the two thin bars with a thin space between them at the start and end, and a similar pattern in the middle. There is a standard checksum coding as well. This only codes specific 13, 12 or 8 digit sequences.
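The standard checksum for these codes is a weighted sum modulo 10. A minimal sketch, assuming the usual weighting where the digit next to the check digit has weight 3 and the weights then alternate 1, 3, 1 and so on moving left (the function name is made up for illustration):

```python
def ean_check_digit(payload: str) -> int:
    """Check digit for the 12, 11 or 7 leading digits of EAN-13, UPC-A or EAN-8."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        weight = 3 if i % 2 == 0 else 1  # rightmost payload digit has weight 3
        total += int(ch) * weight
    return (10 - total % 10) % 10


print(ean_check_digit("400638133393"))  # EAN-13 payload
print(ean_check_digit("03600029145"))   # UPC-A payload
print(ean_check_digit("7351353"))       # EAN-8 payload
```

Working from the right means the same function covers all three lengths.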

Another common linear code is Code 128 - this uses multiple width bars and spaces (up to 4 units wide). It has special coding for pairs of digits to be efficient for numeric sequences, but allows for letters and numbers and symbols. It is probably the most dense and flexible 1D coding that you can use.
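To illustrate why the digit pair coding matters, here is a sketch that works out Code 128 symbol values, the check symbol and the overall width for a piece of text. It assumes the usual Code 128 numbers: start B is 104, start C is 105, the check symbol is a position weighted sum modulo 103, each symbol is 11 units wide and the stop is 13. It only picks one code set for the whole text and does not switch sets part way through, which a real encoder would. The function names are hypothetical.

```python
START_B, START_C, MODULUS = 104, 105, 103


def code128_values(text: str, use_pairs: bool = True) -> list:
    """Start symbol followed by data symbol values, using set C or set B."""
    numeric = text.isascii() and text.isdigit()
    if use_pairs and numeric and len(text) % 2 == 0:
        # Set C: each symbol carries a pair of digits, value 00-99.
        return [START_C] + [int(text[i:i + 2]) for i in range(0, len(text), 2)]
    if not all(32 <= ord(c) <= 126 for c in text):
        raise ValueError("this sketch only handles printable ASCII")
    # Set B: one symbol per character, value is the ASCII code minus 32.
    return [START_B] + [ord(c) - 32 for c in text]


def code128_check(values: list) -> int:
    """Check symbol: start value plus each data value times its position."""
    total = values[0] + sum(i * v for i, v in enumerate(values[1:], start=1))
    return total % MODULUS


def code128_width(values: list) -> int:
    """Width in units: start and data symbols, check symbol, then the stop."""
    return (len(values) + 1) * 11 + 13


for pairs in (True, False):
    vals = code128_values("123456", use_pairs=pairs)
    print(vals, code128_check(vals), code128_width(vals))
```

For a six digit number the pair coding needs 68 units where one symbol per digit needs 101, before quiet zones.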


As with most systems for linear coding, the characters all have a consistent width. This helps allow formatting of a specific number of digits or characters in a specific space.

Two dimensional codes

There are two main standards for 2D codes. These are not "barcodes" as they do not use "bars"; instead they use patterns of pixels which are black or white. Both of these include forward error correction using Reed-Solomon coding. This means that many defects and errors in printing and reading can be corrected. Obviously the technology to read these is different - based on cameras rather than linear pens or laser scanners.

One standard is ISO/IEC 16022 "DataMatrix". It is quite nice technically. It allows a number of different methods for encoding data, optimised for numeric or alphanumeric and so on. It is used on postal systems in the UK quite a lot.


The other common 2D code is the QR code (ISO/IEC 18004). These are, in my opinion, not as nice technically, and not as compact, but look "cooler" so are kind of winning the popular vote on such things. They have target squares within them that sort of look better. They do have different coding formats for numeric, alphanumeric, etc.
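As a rough illustration of why the different coding formats matter, this sketch picks the tightest QR mode for a string and counts the data bits. It assumes the usual QR packing (numeric: 3 digits in 10 bits, alphanumeric: 2 characters in 11 bits, byte: 8 bits each), assumes UTF-8 for byte mode, and ignores the mode indicator, the character count field and the error correction, so it is only the payload size. The function name is made up for illustration.

```python
# The 45 characters allowed in QR alphanumeric mode.
QR_ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def qr_data_bits(text: str):
    """Return (mode, payload bits) for the most compact single mode."""
    if text.isascii() and text.isdigit():
        groups, rest = divmod(len(text), 3)
        # 10 bits per 3 digits, then 4 or 7 bits for 1 or 2 left over.
        return "numeric", groups * 10 + (0, 4, 7)[rest]
    if all(c in QR_ALPHANUMERIC for c in text):
        pairs, rest = divmod(len(text), 2)
        # 11 bits per pair of characters, 6 bits for a single one left over.
        return "alphanumeric", pairs * 11 + rest * 6
    # Assumption: byte mode carrying UTF-8.
    return "byte", 8 * len(text.encode("utf-8"))


for sample in ("01234567", "AC-42", "hello"):
    print(sample, qr_data_bits(sample))
```

Eight digits take 27 bits in numeric mode, where byte mode would need 64.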


Summary

There are many 1D and 2D coding systems, and even some clever new colour systems, and picking the right one is a good idea. You want something compact and with good error correction and detection. It is a shame so many systems opt for the worst of 1D coding using code 39 fonts though, especially when the data is purely numeric and could be much better coded as ITF or Code 128.


P.S. My card ordering system allows you to create cards with any of the above bar coding systems. The Odeon card is an example.

4 comments:

  1. The IEC18004 code on this page could be read at a far greater skew / angle than the IEC16022. I guess this makes it far more usable and reliable for the general public scanning things with their phones on billboards / magazines etc.

    1. I am not sure that is an inherent property though, just different reader code I expect.

  2. You might find what I'm doing with barcodes (QR codes) and card printing of interest: https://ether.cards/. I ended up buying my own card printer so I could keep tight control over the key material (and the used ribbons, which contain negative images of everything). Keypairs are generated using a Python script and a QR code encoding module written in C.

    1. Slick!

      (You might add your URL to your Twitter profile :))

