Wednesday, 9 May 2012
The problem with Communications Data and the Internet
What's wrong with that then?
The history of this is really to do with telephones. As soon as telephone exchanges were digital the telco (post office / BT) were able to get data on calls made. This was used for billing, and disputes, and so on. It was, of course, invaluable for the authorities to be able to interrogate this communications data to find who called who and when. This helps with investigations of all sorts of crimes.
Now the Internet comes along, and suddenly they realise they no longer have this. People are sending emails, and making telephone calls, and all sorts, and they don't have anyone to ask for the communications data.
They have gone some way to try and tackle this with the Data Retention Directive, which means anyone providing telephone and email services has to hold the data they are processing for a year (if asked to). But even that is rather out of date now in concept and only applies to data they already process. It is also (IMHO) very badly drafted.
So why is it so hard exactly?
Is the data there even?: With telephone calls there was a need to collect data in the first place, for billing. With email, and messaging services, and even some VoIP services, there is no need to collect the data for billing. It does not matter how many messages I send on irc for billing.
Who collects the data?: With many services there is a telco providing the service itself. E.g. a traditional voice call goes via the telephone exchange. That means there is someone with the data. However, anyone can run an email server and they don't have to be a telco, or easy to trace, or subject to logging requirements. People can run their own email server at home, and send email to other people running their own email server at home, with no mail server in between. The lack of a central service provider in communications is increasing with more and more means to communicate directly (as Internet Protocol intended). This is even easier with IPv6 removing NAT from the equation. There are peer to peer protocols that are specifically designed around the principle of no central control or authority (for all sorts of good reasons) where there is no service provider at all, and this will happen more and more.
Where is the service provider?: Even when a service has a "provider" who will be able to collect communications data, they could be anywhere in the world. They do not need a presence in the UK or be subject to UK data retention or investigation laws. This is commonly the case even when people are not trying to avoid the legislation.
Broadcast data: One interesting thing about the likes of twitter, or usenet, is that the data is typically sent to everyone. There is no way to identify a recipient. With systems like usenet you can make such messages private using encryption. With no need to reply to the specific sender, the postings can be anonymous. So the communications data becomes "anonymous posted to public forum", but no record of what they posted as that is content.
Can you see the data?: One idea is that ISPs could have to deep packet inspect communications (something not even allowed under EU law AFAIK), to extract communications data that is part of non-UK services. But computers are now fast enough for encryption to be completely standard in many services - so there is no way to actually get at that data in the middle.
Too many protocols: When there were only phone calls, it was easy. Even if it is only calls and email and allowing for VoIP, not too hard. But there are millions of ways to communicate, with and without there being some service provider. In-game chat is a classic, and applies to everything from world of warcraft to wordfeud and there are new apps and games every day, many of which may happen to provide a means to communicate. There is no way any interception system could keep up. Even knowing the system, e.g. facebook, the web interface the DPI is trying to track will change at the whim of the designers. Anyone who has tried to screen scrape such systems will know it is a big job to track this. What is worse is that, in order to try and keep up, the black boxes would have to have remote administration independent of the ISP, and that allows a lot more interception to be done without anyone knowing what is going on.
Micro telcos: As I say, people can run their own mail servers at home, but there is a level between that and traditional telco where someone runs services for other people, for money or not. This is so easy now. I used to run a mail server at my house for my family. If such micro-telcos are to have a burden of collecting, storing and reporting this communications data, that would be horrendous. If running an irc server meant keeping logs, that is a burden. If the tax payer has to pay for the black boxes in every bedroom ISP, that is expensive. If it becomes known that small ISPs are exempt, then where do the bad people get their Internet I wonder?
Fine lines: With traditional telephone calls, as a side effect of why it was collected (for billing), the communications data is simple - date/time, from number, to number, duration. That is about it. But what constitutes communications data for email, or twitter, or MSN, or irc? Is the subject of an email included? Is your friends list on facebook communications data? What about all the other useful headers in email? Maybe the ID of a PGP signature used? Where exactly is the line drawn? Even an email address may include someone's name - which is not something that was in traditional telephone call logs. Does the IP address of every IP packet count as communications data? These are tricky questions.
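To make the "where is the line drawn" question concrete, here is a minimal sketch in Python using the standard library email module. The message, the addresses and the split into "assumed communications data" versus everything else are all invented for illustration; the split is an assumption, not any legal definition.

```python
from email import message_from_string

# Hypothetical message: addresses and header values are made up.
RAW = """\
From: Alice Example <alice@example.org>
To: bob@example.net
Date: Wed, 09 May 2012 10:00:00 +0100
Subject: Meet at the usual place
X-Mailer: ExampleMail 1.0

See you at 8.
"""

# Assumption: one arguable reading of "communications data" for email,
# mirroring the phone-call fields (when, from, to).
ASSUMED_COMMS_HEADERS = {"from", "to", "date"}

def split_headers(raw):
    """Return (comms, disputed) header dicts under the assumed split."""
    msg = message_from_string(raw)
    comms, disputed = {}, {}
    for name, value in msg.items():
        target = comms if name.lower() in ASSUMED_COMMS_HEADERS else disputed
        target[name] = value
    return comms, disputed

comms, disputed = split_headers(RAW)
print("assumed communications data:", comms)
print("which side of the line?:", disputed)
```

Note that even the "safe" From header here carries a person's name, and the Subject is arguably content, which is exactly the problem.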
Signal to noise ratio: With traditional telephone calls, the integrity of the data was good. CLI was only from trusted sources. You were sure the call from and to numbers you had logged were right, with very little chance of error or deception. Now, even on phone calls, you cannot trust the CLI. You certainly cannot trust email addresses. And of course, there is the vast quantity of junk mail out there - when separated from the content that spam filters use to identify it, it will make for huge amounts of noise. You certainly would not be able to use a record of an email captured from such a system in court as the defendant could point to thousands of emails which are bogus. Indeed, it would be in people's interest to have a virus on their machine sending lots of email to random addresses as it gives them plausible deniability. Of course, if people want, they can make apps that generate low levels of traffic that look like communications - e.g. apparently sending 10 small emails a second, 24 hours a day, from a million computers, to and from random addresses. In there somewhere is the email you are looking for.
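As a rough back-of-envelope check on that last example, here is a small Python sketch. The figures are just the ones from the paragraph above (10 emails a second, all day, a million machines), not measurements of anything.

```python
# Figures taken from the example in the text; nothing here is measured data.
EMAILS_PER_SECOND = 10
SECONDS_PER_DAY = 24 * 60 * 60
MACHINES = 1_000_000

def decoy_records_per_day(rate=EMAILS_PER_SECOND, machines=MACHINES):
    """Number of bogus communications-data records generated per day."""
    return rate * SECONDS_PER_DAY * machines

total = decoy_records_per_day()
print(f"{total:,} decoy records per day")
```

That is the size of the haystack, every day, that the one genuine email would be hiding in.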
There will always be ways to hide communications: All of the above is before you consider someone actually trying to hide their communications. When you start using encryption, steganography, vpns, tor, and so on, then you are able to communicate with no trail being left. So, only law abiding innocent people will be affected by this - the criminals don't have to have their privacy invaded. Are the criminals smart enough? Well, there are plenty of web sites explaining to people in China and other oppressive regimes how to bypass monitoring and firewalls - so anyone with access to google is smart enough.
Consequences: It is all very well saying that this is a total waste of time, for all of the above reasons, but is that a reason not to do it? Well, obviously, if it costs public money, then yes. But what else could go wrong? This data is valuable, and a target to be stolen or unscrupulously sold. It is an invasion of privacy. It is technical complication making things break more often. It could allow general purpose unsupervised black boxes into ISPs with no end of possible feature creep. It will cost a fortune and so put up prices or taxes for us all.
Someone please educate the politicians!