Cost of Data Retention

The Draft Investigatory Powers Bill has a requirement for ISPs to retain data, but the wording is so wooly it could literally be any data.

One of the important points to be debated about the bill is the cost impact. Obviously people are asking what the cost of retention will be. Unfortunately I don't know, because unless, and until, we get a secret retention order, we don't know what is expected of us. Even if other ISPs get orders, we will not know as they are secret.

So we need to get a handle on what they intend. Unfortunately it is more important than that though - it is not just what they intend, but that intention has to be then put in the bill. If not, then the second the bill passes the secret orders could be very different and have totally different costs to those debated in parliament before the bill passes. If even the politicians are honest (choke!) a change of government puts someone else in charge and they can use the act based on what it says, not what the intentions were. What is worse, as they are secret, nobody will know that the orders are not as per the intentions explained to parliament.

To try and put this in to some sort of logical order, I have listed below some of the things that could possible be requested and an idea of complexity. What would be useful it to know which of these they are after, and have that writing in to the bill now.

Keeping existing logs for a year

Some things an ISP already logs. Examples are email server logs, or call server logs. If the ISP already logs something to a durable medium such as a hard disk, and keep logs logs for a period (a few days for email logs, for example), then simply asking that they keep the logs for a year, and provide a means to access via RIPA requests, is not too hard. It has some costs (bigger hard disks), but is technically relatively simple. I am not too worried if such orders are made, especially because we could move such services outside the UK if we did not wish to make logs at all.

Making some new logs

In some cases an ISP will have equipment which has some means of creating some logs, but they don't log at present. Assuming the equipment is capable of making logs that can be stored in some durable medium, then it could be possible to turn on that logging and keep those logs and have them for a year. This is slightly more work. If the logs are particularly sensitive data, the ISP may have to have extra security measures that would not be present if simply "not logging" as now. It is a step further than just keeping existing logs, but may be possible.

New equipment to make more logs

There are ways in which some equipment can create additional logging, such as sampled IP headers. This is usually used for network diagnostics, things like working out where a denial of service attack is coming from and going to or planning network upgrades or configuration changes. It may not be enough to be that useful for intelligence services as it is more statistical than a proper connection log, but it may be. Installing new equipment or upgrading existing equipment may be possible to provide this sort of additional logging. This will have some costs for the new equipment, and again for the logging itself, so is a step further. The cost will somewhat depend on the extent of logging required. In the case of A&A, one of the big costs in any new equipment is the fact that the rack in question is full and the data centre in question has no spare racks - that could make installing one cheap piece of kit very very expensive.

Logging TCP sessions, UDP exchanges, etc.

It could be that they would like a log of all "sessions". Note that a "session", or "Internet Connection" is not a hard concept - it exists for TCP, but not for UDP or ICMP. It sort of exists for IPSec with key negotiation. For some protocols like SCTP or MOSH it is somewhat more complex as the single "connection" can change endpoints like Trigger's broom and stay up for years. Even with TCP, a "connection" could last days or months or years - it could be that when the session ends and is logged it is already older than the 12 month period of logging. Just trying to define what a "connection" is will be hard, but some sort of Deep Packet Inspection (DPI) kit could track sessions. This is very expensive on any scale at all - ISPs routers use specialised hardware (ASICs) to keep up with just forwarding packets - to track "connections" is a lot more work and cost.

Logging stuff from TCP sessions, like web or email addresses

Ultimately, what was said in parliament, is that they want web logs - logging the web site names. This is much harder still - you don't just have to track a TCP session over multiple packets but have to track the clean data stream within it, understand higher level protocols like http, and extract information from those protocols like web site host name or email headers. This is another level of expensive and complex over and above session tracking. Note that this level will be increasingly thwarted by the use of encryption.

Logging all content

We don't think they yet want to log all content, but basically that would be impossible. The storage requirements would be vast and impossibly expensive - the data flowing over the Internet is just too vast to log.

In addition to these various levels of logging, there are some other key issues :-

Denial of Service attacks

One small point is that there are denial of service attacks - these will look like millions of separate connections a second. Any system that tries to log "internet connection" records will need to be able to keep up and log these. The issue is, of course, that these are enough to break the network normally - having a logging system that does not break in the face of trying to log this traffic will be even more expensive. Now, you could take the view that we don't need to log a denial of service attack, but (a) surely you do as it is illegal activity and that is the whole reason for making these logs, and (b) the DOS could be targeted at the logging - not enough to damage the ISPs network but enough to look like a shit load of connections and be too much for the logging systems to keep up with - thus losing real connection logs. Being able to cope with such new DoS targets will mean even more complexity and cost for the ISP.


One of the big issues, and costs, with any of the more complex solutions for tracking "connections" and especially tracking data from within those connections, is the changing nature of the Internet.

Already we see more and more systems using encryption - so even something a simple as sending or checking email will now be impossible for the ISP to "see" in to and identify the sender and recipient of the email by email address unless they are themselves providing the email service. https which is used for many web sites now currently allows DPI to "see" the website hostname, but that too is changing and it will soon be encrypted too.

But even without encryption, the protocols change. This is not just because standards change, and they do, but because of the very nature of the Internet. It allows packets to go from one place to another and does not care what protocols are used. As long as both ends understand, it does not have to be any sort of "standard" at all. An application on a phone could talk some completely new IP protocol to its server over the Internet, or even talk something that looks like an existing protocol like TCP but actually in a totally non standard way. That is all valid in the Internet. Web sites generally have to follow some standards but games and apps can do what they like, and often make up their own unique protocols for communication with game servers. One of the key things that may want to be tracked is things like in-game chat - but there is no way an ISP can sensibly do that looking at the packets as they pass, even if not encrypted.

Interestingly Network Address Translation (NAT) is responsible for limiting the protocols commonly in use (typically to ICMP, UDP and TCP) because that is what NAT boxes understand. Even with this limitation, the protocol then used over TCP and UDP can be whatever you like. However, IPv6 is finally taking back the Internet as simply a means to get IP packets end to end (as it was designed) - it now allows new protocols and misuse of existing protocols without the limitations of a NAT box having to understand what you are doing.

So, the equipment that does any sort of session/connection tracking or DPI will have to be constantly updated and maintained to handle the new protocols coming along, and even guess at some protocols it has never seen. If looking in to higher level protocols, that will be a constant battle with innovation on the Internet, and with rebellion at the monitoring that is being done.

However, in summary - we need to know what level of logging is intended by the bill, and we need the bill updated to be clear on that, else the cost estimates are a joke.

No comments:

Post a Comment

Comments are moderated purely to filter out obvious spam, but it means they may not show immediately.