I have posted a bit on passwords, and I thought it only fair to try and explain some of the issues to my non-technical friends.

I am going to try and explain what a hash is. This is going to be tricky without talking about mathematics, but I'll try.

A hash is a function that... Ah, OK, I may have to explain the idea of a function first.

A function is normally a process, a sequence of steps that you can take, to turn one thing in to another. In maths, you can have a simple function like multiply by two, and you can use that to turn 3 in to 6, and 4 in to 8, and so on.

OK, now, a hash, is a function, but it is a complicated one.

When you look at a function like multiply by two you can work out that there is an inverse function. That is, a function that does the opposite. In that case it is a divide by two, which turns 6 in to 3, and 8 in to 4, and so on, doing the opposite of the first function.

The idea of a hash function, in this case, is it takes an arbitrarily long string, such as a password, and makes it in to a number, a big number. In fact, such a big number that it is usually written as a long string of hexadecimal digits. But this should not really put you off, it is still just a number.

There are lots of different types of hash functions that can exist, and there are different reasons why you might have them, and different algorithms that make sense to meet those reasons. Here are a few examples. However, mostly, a hash function should be one you cannot do backwards (i.e. cannot make the inverse function).

1. A hash for distribution

Let's say you want to put things in different bins, and you want a way to decide which bin to put them in. We'll say we are talking about words here still (like password), but we want to turn the word in to a number of which bin to put the word in. We don't care which bin we use, but we want to always put the same word in the same bin. OK, that sounds convoluted, but trust me, this is sometimes a thing you want to do with computers!

In this case the hash can be very simple - you could turn each letter in to a number and add them all up, for example. That may not be the best way, but it is a way that will work for this type of scenario. We probably want the hash to be quick and easy to work out, and we are not too worried about how easily it is for someone who guess a word from the hash later or to try and make an inverse function that works out what the word was from the answer. There are several good functions for this, and what makes them good is that they are fast and the create a reasonably even distribution of answers for different inputs.

2. A hash for encryption

Some times you want a hash that is turning a long bit of text or data in to a single (long) number in such a way that it is almost impossible to reverse the process. I.e. there is pretty much no way to find the text that makes the answer to the hash function. In fact, you want to make it so that pretty much any text you use will come up with a different answer with very little chance that two bits of text could ever come up with the same answer.

This is quite important for lots of things in computers, as the hash can be treated as a substitute for the original text. If I know the number and I have a way of verifying that number is correct, I can assume that the text that I later check matches that number is correct.

For example, imagine it is a lot of work to send a big file to you, and at the end you want to check you have the right file and it has not been corrupted, either by accident or deliberately. Well, you could make a hash, and so could I, and we could call each other, and I could tell you the hash and you can check it is right. I don't have to read all of the text over the phone to you, just the hash.

The requirements here are that the hash is quick and easy for a computer to compute, but that the chance of two texts ever matching the same hash is very slim, and the chance of a slightly modified version of a text matching the same hash is pretty much zero, and the chance of someone being able to contrive a text that matches is pretty much zero. You need the hash to be reliably a substitute for the original text for checking.

3. A hash for passwords

The requirements for a hash for passwords are very similar to a hash for encryption, and almost all of the issues of being unable to reverse the hash or find a password that makes the same hash, are the same.

There are, however, a couple of key differences.

For a start, passwords are shorter, so a much more complex algorithm could be used which might be totally infeasible for hashing large files.

But also, you want to make the hash slow and complicated (e.g. use lots of computing power and memory). The reason being that passwords are guessable. E.g. using every word in a dictionary and lots of digits and other tricks people use to make likely passwords, and check them.

If the hash is very quick to work out, a computer can work through a lot of guesses of possible passwords and try them out to brute force the answer and find a password to match.

Indeed, as someone reported on my other blog posts, some graphics cards are able to do around 5 billion SHA1 hashes a second! SHA1 is a cryptographic hash, so designed to be quick to work out by a computer.

The trick for a password hash is to be as secure as a cryptographic hash but also hard to work out in a way that means hackers cannot try a lot of hashes per second on simple cheap hardware.

I hope that makes some sense...


  1. "an arbitrarily long string, such as a password"

    That's just a (big) number too, right?

    1. Yes and no

      Yes, in that to a computer everything is just a number (or numbers)

      No, in that programmers typically use "number" for things which are fixed length and "string" for usually textual things which are variable length, and passwords are the latter

      Though these terms are "fuzzy", and "arbitrary precision numbers" are obviously variable in length but they're very definitely numbers


Comments are moderated purely to filter out obvious spam, but it means they may not show immediately.

TOTSCO moving goal posts, again!

One of the big issues I had in initial coding was the use of correlationID on messages. The test cases showed it being used the same on a se...