G8 Government Workshop on Safety and Security in Cyberspace, Tokyo, May 2001

17 February 2002. Thanks to D. See a similar data-retention paper, "G8 Government-Private Sector High-Level Meeting on High-Tech Crime, Tokyo, May 22-24, 2001," at the Ministry of Foreign Affairs of Japan:

And a full listing of documents from the high-tech crime conference in May 2001:

A second anonymous source has told Cryptome that covert Internet interception and collection of data far more comprehensive than described in this paper is underway in the United States (and perhaps elsewhere) as a consequence of 9/11. According to the source, government authorities have arranged with ISPs to implement and operate this system without public disclosure of its extent. The operation is claimed to be nationwide (perhaps global) and far more extensive than the individually targeted Carnivore and similar systems. Unusually powerful computer arrays are said to be deployed on ISP backbones and central nodes to sift and search all Internet traffic for patterns and types of communication; one type cited is encryption. It is not known whether the operation is based on court orders, is a patriotic service induced from ISPs, or is even disinformation meant to demonize the Internet and encryption. More information on this operation is welcome (preferably anonymous and encrypted); send to: jya@pipeline.com. Public key on Cryptome home page.

G8 GOVERNMENT-INDUSTRY WORKSHOP ON

SAFETY AND SECURITY IN CYBERSPACE

Potential Consequences for Data Retention of Various Business Models Characterizing Internet Service Providers

1. Rationale for Collection and Retention

Data collection and subsequent retention are charged with the conflicting interests and values of various stakeholders. The current discussion is intended as a step toward finding a balance among the diverse legitimate interests. In some jurisdictions, collection is tightly constrained under fair information practices, sometimes enshrined in data protection or privacy legislation, under which data can be collected only for a limited purpose and used only for a stated purpose, with informed consent, and subject to other safeguards on use (such as checks on the integrity of the information, known destruction schedules, and subject access).

A variety of network management information is typically stored for subsequent analysis to assist in network accounting, service reliability, network element reliability, fault history, performance trends and capacity forecasts. In addition to these purposes, there can also be marketing and consumer profiling uses for such data. Given the range of potential uses, different market niches, and a host of factors including costs, it is probably fair to say that there is no single business or industry position on the collection and retention of traffic data and subscriber data.

Some of the information that is collected or retained may be of assistance for lawfully authorized computer investigations. It is evident that the retention of certain traffic and subscriber data can facilitate the tracing of criminals over the Internet by law enforcement agencies. It is crucial to understand the range of network accounting and network management practices of Internet service providers to determine the degree to which the requirements of law enforcement agencies may be met by routine ISP practices.

The goal of this document is to discuss some of the issues associated with the feasibility of data retention, what can be achieved consistent with the explicitly stated business purposes of collection and retention, and the degree to which these may facilitate the public safety mandate.

2. Privacy Issues

Before looking more closely at privacy issues in the world of Internet service providers, it is worth reminding ourselves that many kinds of information can be, or are being, collected. A wide array of information would be regarded as content – not simply the content of private interpersonal communication, but a spectrum of content from intellectual property (trade secrets, proprietary information, copyrighted information such as articles, books, digitized movies or songs, copyrighted software, etc.) to publicly available information devoid of any specific property rights (such as software in the public domain). There are also vast amounts of consumer profile information, transactional data, network traffic data, subscriber information and billing information.

A plurality of commercial entities (financial institutions, banks, credit card companies, credit bureaus, direct marketing and telemarketing companies, etc.) collect personal and transactional data for diverse purposes. Within the health sector, hospitals, doctors, private clinics, medical service facilities and pharmacies collect and retain personal information. A multiplicity of government agencies at the federal, regional and municipal levels collect data, retain data, or seek to access data held by other governmental, regulatory, and private sector entities. Within the communications sector, broadly speaking, a number of different entities – cable companies, telecommunications companies (long distance carriers, local exchanges, cellular providers, PCS providers, satellite providers), and Internet service providers – are likely to collect, retain or have access to diverse kinds of information (some of which would qualify as personal and would likely enjoy various degrees of protection).

With respect to the privacy issues associated with data retention of Internet communications, it is crucial to distinguish subscriber data from traffic data because a number of jurisdictions within the G-8 treat the two differently in terms of national legislation. In some cases, there are additional legislative considerations besides implemented data protection directives that must be borne in mind.[1]

If we look back over the past few decades, manifold procedures (with multiple threshold triggers) with different degrees of obligation (from voluntary to mandatory) for the collection, retention, access, use and destruction of diverse kinds of information have been introduced. Internationally, for example, one will recall the Council of Europe’s 1981 Convention for the Protection of Individuals with regard to the Automatic Processing of Personal Data or the 1980 OECD Guidelines Governing the Protection of Privacy and Transborder Data Flows of Personal Data. These sought to establish principles according to which personal information must be obtained fairly; used only for the original specified purpose; adequate, relevant and not excessive to that purpose; accurate and up-to-date; accessible to the subject; kept secure; and destroyed after its purpose is completed.

The degree of obligation was further strengthened in some jurisdictions, for example by the European Union Data Protection Directives (1995 – Directive 95/46/EC and 1997 – Directive 97/66/EC). Following from these, numerous European countries have implemented stronger data protection laws to meet their legal obligations under the Directives. Outside Europe, there are also instruments with similar provisions for dealing with personal data, such as the Canadian Personal Information Protection and Electronic Documents Act (PIPEDA) of April 2000. Additionally, because of changes in technology and in approaches to regulation, there are initiatives to update the existing implementations. For example, the EU is updating the 1997 Telecommunications Privacy Directive to make it more applicable to electronic communications (to address new applications in mobile telephony and IP traffic). Whatever their details and differences, these legislative initiatives seek to limit the collection and retention of data to what is necessary for stated business purposes. With respect to ISPs, those purposes are usually limited to billing and network engineering and do not expressly include law enforcement.

Privacy experts and advocates have expressed concerns in relation to the retention of data for law enforcement purposes. The European Union in particular has publicly released its reflections. This issue has received attention most recently in the Commission of the European Communities Communication, Creating a Safer Information Society by Improving the Security of Information Infrastructures and Combating Computer-related Crime (26 January 2001, COM(2000) 890 final) which among other things addresses the issue of data retention. The Communication states:

Within the Commission, the Data Protection Working Party has been considering the issue of data retention for some time now.[3] The outcome of the work on this issue is a recommendation adopted by the Data Protection Working Party:

Similarly, the Spring 2000 Conference of European Data Protection Commissioners issued a declaration on the "Retention of Traffic Data by Internet Service Providers (ISPs)" stating:

The Conference emphasises that such retention would be an improper invasion of the fundamental rights guaranteed to individuals by Article 8 of the European Convention on Human Rights. Where traffic data are to be retained in specific cases, there must be a demonstrable need, the period of retention must be as short as possible and the practice must be clearly regulated by law.

It is evident that the privacy concerns raised by data retention have been engaged in Europe by stakeholders in the data protection and privacy sphere. The European Commission is now calling for “a constructive dialogue between law enforcement, industry, data protection authorities and consumer organizations as well as other parties that might be concerned … with a view to finding appropriate, balanced solutions…” (COM(2000) 890 final: pg.20).

3. Cost Issues

Another factor frequently cited in discussions on data retention is the issue of cost. In the Data Retention Workshop held in the G8 Government-Industry Dialogue on Safety and Confidence in Cyberspace (Berlin, October 2000), the following cost implications for data retention were identified:

Quantifying costs is difficult because of the different business models and profiles on the one hand, and the lack of specificity regarding potential data retention requirements on the other. For example, the cost implications of collecting and retaining all traffic data for all Internet services are different from those of restricting retention to particular logs or to specified fields within specified logs. The retention period (e.g., three weeks, three months, six months, one year) will affect not only storage costs but also the administrative costs of retrieving relevant data (particularly in the absence of indexing). Isolating relevant data (i.e., excluding all traffic data pertaining to subscribers not specified in the judicial authorization) represents one of the costs of conforming to data protection or privacy legislation.
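The isolation and indexing points above can be sketched in Python. This is a minimal illustration under stated assumptions: the `LogRecord` layout, the `build_index` and `isolate` helpers and the subscriber names are hypothetical, and real traffic logs vary by service. Indexing records by subscriber once avoids scanning every record for each request, which is the retrieval cost the paragraph attributes to unindexed storage.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    # Hypothetical traffic-log record; real log formats differ by service.
    subscriber_id: str
    timestamp: datetime
    detail: str


def build_index(records):
    """Group records by subscriber so a request need not scan every record."""
    index = defaultdict(list)
    for rec in records:
        index[rec.subscriber_id].append(rec)
    return index


def isolate(index, subscriber_id, start, end):
    """Return only the named subscriber's records with start <= timestamp < end.

    Records of other subscribers are never examined or returned, which is
    the isolation step a data protection regime would require.
    """
    return [rec for rec in index.get(subscriber_id, [])
            if start <= rec.timestamp < end]
```

A request covering one subscriber and one day would call `isolate(index, "alice", datetime(2001, 5, 1), datetime(2001, 5, 2))` and receive nothing belonging to anyone else.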

There may be other, minor administrative or technological costs that emerge as relationships develop between ISPs and the justice system. For example, one means of ensuring the integrity of the chain of evidence is for the ISP to apply a hash function (such as MD5 or SHA-1) to the data file, so that any tampering with the electronic evidence en route can be detected. The receiving law enforcement agency or the court can then recompute the hash value and confirm the data file’s integrity. Although clearly beneficial to the administration of justice, such refinements may add new costs. Similarly, the ISP may bear costs if its personnel need preparation time to testify or appear as expert witnesses in court.
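A minimal sketch of the hashing step, using Python's standard `hashlib` module. The text names MD5 and SHA-1; `hashlib` supports both (`"md5"`, `"sha1"`), but the sketch defaults to SHA-256 because MD5 and SHA-1 are no longer considered collision-resistant. The helper names `file_digest` and `verify` are hypothetical. A matching digest shows only that the file is unchanged relative to that digest, so the digest itself must reach the recipient through a channel that cannot be tampered with.

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a log file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """Recompute the digest on receipt and compare it with the ISP's value."""
    return hmac.compare_digest(file_digest(path, algorithm), expected_hex)
```

The ISP would compute `file_digest(path)` before handing over the file, and the recipient would call `verify(path, digest)` on receipt.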

Finally, in the hypothetical event that particular systems (hardware or software) that are not amenable to specified collection or retention become subject to data retention requirements, there could be substantial engineering costs for product development or retrofitting.

4. ISP Business Models and Service Profiles

One of the implications to be drawn from the Berlin Workshop (see Appendix 1) was that ‘data retention’ is not simple because there are many variables:

In order to understand this complexity and demonstrate why blanket solutions may not be feasible, why a monolithic scheme could adversely affect business, and why the requirements may need to be refined to meet the public safety mandate, this paper considers the potential consequences for data retention of various business models characterizing Internet Service Providers (ISPs).

Although not an exhaustive list, the following examples of business models or service profiles represent the range of current Internet communications providers offering service to the public:

Not included in the above list are entities such as private corporations or government institutions which make available Internet services to their employees for work-related purposes as well as entities such as public libraries, schools or universities which may provide Internet services to the various groups they serve.

5. Preliminary Analysis: Data Retention Profiles

With respect to each of the business models or service profiles, one may be able to construct ‘Data Retention Profiles’ which take into account various factors including:

For the purposes of discussion, the following represents a preliminary step toward outlining some of these factors in different business models or service profiles.

In many respects this may be the simplest business model. There could be many hundreds or even several thousand ISPs of this sort in any given national jurisdiction of the advanced industrial economies. Conceived as an ideal type, the small ISP could have one or more servers on one or more physical machines linked together on some form of local area network (LAN). The following services would be deployed on these machines and, at the discretion of the system administrator or owner, could have logging turned on:

Small ISPs are highly sensitive to minor cost fluctuations and, in some circumstances, may regard data retention as a cost to be reduced through various strategies (minimal collection, expeditious deletion, prompt aggregation, etc.). Of course, small ISPs may be more likely to regard logs for some services than for others (e.g., RADIUS logs for network dial-up access as opposed to Web logs) as a necessary cost of business, particularly if their primary customer base is dial-up residential service. Some small ISPs may not use RADIUS-style authentication services and would have to fall back on modem pool logging. Data retention periods are likely to be highly variable, ranging from non-existent to long-term (e.g., logs burned onto CD-ROMs). All available logs are likely to be under the control of a sole proprietor, although the manager of the ISP may not be the owner of the company in all cases. In environments characterized by voluntary regimes, given the high probability of individual ownership, data retention may depend on the values or ideological position of the small ISP’s proprietor. A small ISP is likely to operate out of a single location, with its servers and routers in that location. Note, however, that many small ISPs are beginning to migrate toward an out-sourced model (see (E) below) in order to take advantage of economies of scale and the attendant cost savings.
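To show why RADIUS accounting logs matter for tracing dial-up users, the following sketch pairs accounting Start and Stop records into sessions that bind a user name to an assigned IP address for a period of time. The attribute names (`Acct-Status-Type`, `Acct-Session-Id`, `User-Name`, `Framed-IP-Address`, `Event-Timestamp`) are standard RADIUS attributes, but representing each record as a Python dict, using Unix-time timestamps, and assuming session IDs are unique across the whole log are simplifications; in practice a session ID may be unique only per access server. The function name `pair_sessions` is hypothetical.

```python
def pair_sessions(records):
    """Pair RADIUS accounting Start/Stop records into sessions.

    Each record is a dict keyed by RADIUS attribute names; 'Event-Timestamp'
    is assumed here to be a Unix time in seconds.
    Returns (completed_sessions, start_records_without_a_stop).
    """
    open_sessions = {}
    sessions = []
    for rec in sorted(records, key=lambda r: r["Event-Timestamp"]):
        sid = rec["Acct-Session-Id"]
        if rec["Acct-Status-Type"] == "Start":
            open_sessions[sid] = rec
        elif rec["Acct-Status-Type"] == "Stop" and sid in open_sessions:
            start = open_sessions.pop(sid)
            sessions.append({
                "user": start["User-Name"],
                "ip": start.get("Framed-IP-Address"),
                "start": start["Event-Timestamp"],
                "stop": rec["Event-Timestamp"],
            })
    # Starts with no matching Stop (e.g. a lost Stop packet) are kept apart.
    return sessions, list(open_sessions.values())
```

Unmatched Start records are returned separately because they still show who held an address, even when the end of the session is unknown.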

A national ISP with multiple points of presence represents a significant investment and operates with the same degree of sensitivity to community values as other medium- to large-scale corporations. Although there may be some geographic dispersion of the servers on which logs are kept, some of the complications to lawful access are minimized because this type of ISP operates within a single national jurisdiction. It should be acknowledged, however, that some challenges may arise with respect to the retention of traffic data logs when a large ISP begins to number its subscribers in the millions (not simply storage costs but the administrative costs associated with retrieval, for instance).

It has been argued that high-speed Internet access, whether by cable modem or ADSL (Asymmetric Digital Subscriber Line), differs from time-metered dial-up access in a number of respects. According to some arguments, the most obvious difference is that the “unlimited use” typically associated with high-speed access eliminates one of the business reasons for network access logging. However, many high-speed ISPs have data volume caps that require network access logging for billing purposes. Network access logging, or back-ups of the DHCP configuration file, may also occur for high-speed residential connections for a variety of network management reasons associated with managing a shared network.
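On a shared high-speed network, identifying a subscriber from an IP address means knowing who held that address at a particular moment, because DHCP reassigns addresses over time. A minimal sketch, assuming lease records have already been extracted into hypothetical `(ip, client_id, start, end)` tuples with numeric timestamps and non-overlapping leases per address; real DHCP servers store leases in their own formats, and `build_lease_index` and `who_held` are illustrative names.

```python
from collections import defaultdict


def build_lease_index(leases):
    """Index lease records by IP address.

    leases: iterable of (ip, client_id, start, end) tuples, where a lease
    covers start <= t < end and leases for the same IP do not overlap.
    """
    by_ip = defaultdict(list)
    for ip, client_id, start, end in leases:
        by_ip[ip].append((start, end, client_id))
    return by_ip


def who_held(index, ip, t):
    """Return the client that held `ip` at time `t`, or None if no lease covers it."""
    for start, end, client_id in index.get(ip, []):
        if start <= t < end:
            return client_id
    return None
```

Without the lease records, the same address seen at two different times could not be attributed to either client.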

The largest Internet service providers typically operate on a multi-national or multi-jurisdictional basis. Two hypothetical cases come to mind:

There has been a tendency within the ISP industry either to provide or to use out-sourced services such as out-sourced Web-hosting, out-sourced NNTP newsgroup access or out-sourced email. This results in systems that are distributed in two respects: (i) ownership and control are held by distinct entities, possibly in different jurisdictions; and (ii) the servers and network logs may be located not merely apart from the ISP but in different national jurisdictions.

A number of new business models are currently being evaluated within the industry, including “free ISPs” and “virtual ISPs”. Some free Internet service providers offer free access, while others offer free services such as Web-based email. These models raise a number of data retention challenges, for example the nature of the subscriber information collected, if any (in some business models there is no need to collect subscriber information at all). In addition, the accuracy of whatever subscriber information has been collected may be quite variable: it could, for example, be completely pseudonymous, with no evidential link to a “real” person of the kind that might exist in a fee-based service relying predominantly on credit card payment.

In the European context, as already noted, Directive 97/66/EC requires that traffic data be erased or made anonymous as soon as the communication ends, unless they are necessary for billing purposes. “Free ISPs” in Europe are therefore in principle not permitted to retain traffic data even if they collect it. Of course, European Member States may enact legislation that specifically restricts the scope of this obligation to erase data and permits retention under specified circumstances.

The cyber-café is essentially a pay-by-the-hour access provider for occasional or transient users. These users typically have subscriptions with other services, such as a subscription to a local or national ISP or simply subscriptions to Web-based email or remail services.

The cyber-café may have optional logging on some services, but as a cash-based, pay-per-use service it usually does not validate its users’ identities. The accuracy of whatever information is available could therefore be highly variable. If logs are kept, a user can be traced only as far as the transient user’s own services allow. If those services are pseudonymous Web surfing, email, or IRC, there may be little identifying information. On the other hand, cyber-cafés are not particularly amenable to user-installed secure (i.e., encryption-implementing) client software. Even if users can install such a client on a cyber-café machine, they have no assurance that sensitive material (such as passwords, passphrases or plaintext) is not being captured or logged.

The primary difference between an anonymous service and a pseudonymous service is that a pseudonymous service preserves an identity (an alias or ‘nym’) over a certain period of time. By contrast, an anonymous service in its purest form is essentially a one-shot or single transaction-set service. There are various kinds of anonymous and pseudonymous services, most being implementations of typical Internet services such as email remailers, Web surfing, IRC, and Usenet newsgroup access (or all of these). There are also degrees of anonymity and pseudonymity depending not simply on factors such as the underlying encryption and authentication software but on the nature and security of the anonymizing server or network of servers, the nym creation procedures and, in the case of pay services, the billing mechanism.

With respect to subscriber data, anonymous or pseudonymous services vary in the degree of linkage between the pseudonym and the end-user (many so-called anonymous services are misnamed and are actually pseudonymous). The service could have a subscriber database that strongly binds a given pseudonym to a given identity (quite apart from the further question of whether there is any additional binding to a billing identity). Beyond subscriber data as such, there can be a linkage at the level of traffic data between a pseudonym and a particular IP address (this linkage can be more or less attenuated if traffic is routed through a chain of anonymizing servers). Traffic data may exist for different lengths of time at different points in the anonymizing chain, or at the pseudonymous service provider itself, depending on the software used, the network configuration, and the service the end-user has chosen (e.g., email with reply blocks versus “one-way” email with no reply blocks at the service provider; HTML access through an “anonymizer” is designed to prevent the Web server the end-user is trying to reach from knowing the end-user’s IP address, but for at least a certain amount of time the anonymizing server retains the end-user’s true IP address, or at least the IP address of the preceding server in the anonymizing chain).
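The dependence of any trace on every hop of an anonymizing chain can be illustrated with a simple model. In this hypothetical sketch, each relay’s log records only the address of its immediate predecessor, as the paragraph describes; `predecessor_log` stands in for whatever logs the relays still hold, and the relay names are placeholders. Real anonymizing networks may deliberately keep no such logs at all; the model only shows that if any relay kept no log, or has already deleted it, the trace stops at that relay.

```python
def trace_back(exit_relay, predecessor_log, relays):
    """Follow predecessor addresses backwards from the last relay in a chain.

    predecessor_log maps a relay to the address it logged as the previous hop;
    a relay that kept no log (or has deleted it) is simply absent.
    relays is the set of addresses known to be relays rather than end users.
    Returns (path, reached_origin).
    """
    path = [exit_relay]
    current = exit_relay
    while current in relays:
        previous = predecessor_log.get(current)
        if previous is None or previous in path:
            # No surviving log at this hop, or a loop: the trace stops here.
            return path, False
        path.append(previous)
        current = previous
    # The last address is not a relay, so it is taken to be the originating host.
    return path, True
```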

Conclusion

The above considerations of various business models or service profiles characterizing Internet Service Providers suggest a diverse range of potential consequences as to whether and to what degree data retention is being pursued or could be pursued. Further refinement of these models and exploration of the potential consequences could help clarify the discussion.

Different internet services are generally handled by different network devices (such as routers or servers). Traffic data is typically recorded in native audit logs for network accounting and network management purposes. Depending on how the service provider’s site is configured, different logs could be stored on many different machines, controlled by different entities in different jurisdictions. The discussion recognized that not all types of records are kept in all systems or business models. Information that is available in current models may not be available in the future because of changes in technology, services and business models. When relevant information currently exists, in most countries it is normally available to law enforcement under due legal process. It was recognized that the reliability of audit logs and of user authentication/access control mechanisms varies, and that law enforcement practices should take this into account. It was noted that recognized international standards, for example the Common Criteria (ISO/IEC 15408), may improve the quality of available information in the future.

The internet can be both the target for criminal activity and the conduit for the commission of traditional crimes. In both cases, data held by service providers can be highly relevant to law enforcement investigations.

Data may be held by service providers for varying lengths of time, depending on business models, services and technologies. Some data is held for billing purposes, other data for system performance auditing. Time frames range from a few seconds to longer periods that national legislation may require or allow for purposes other than law enforcement. Different types of traffic data are also held for different periods: for example, network access logs (RADIUS or TACACS+) have different business and data storage requirements than NNTP logs and, as a result, may in certain circumstances be available for longer. Content is not typically retained or available.
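Different retention periods for different log types can be expressed as a simple retention schedule. The periods below are hypothetical placeholders, chosen only so that network access (RADIUS) logs outlive NNTP logs as in the example above; actual periods depend on the business model, the service and national legislation. The record layout and the function names are likewise assumptions.

```python
from datetime import timedelta

# Hypothetical retention periods per log type; not drawn from any regime.
RETENTION = {
    "radius": timedelta(days=180),
    "web": timedelta(days=30),
    "nntp": timedelta(days=7),
}


def expired(log_type, recorded_at, now):
    """True if a record of this type is past its retention period."""
    period = RETENTION.get(log_type)
    if period is None:
        return True  # unknown log types are not kept (data-minimization default)
    return now - recorded_at > period


def purge(records, now):
    """Keep only records (dicts with 'type' and 'at') still within their period."""
    return [r for r in records if not expired(r["type"], r["at"], now)]
```

Treating unknown log types as expired errs toward data minimization; a provider operating under a different regime might instead keep them and flag them for review.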

The group discussed in some detail the kinds of relevant data that may be stored under six different headings:

A user typically uses services from many different service providers, and logs may be stored at each service provider.

The existence of prepaid, flat-rate and free services may result in anonymity and may reduce the availability of billing/accounting data and subscriber information for law enforcement investigations. The group also pointed out differences between business models around the world, and recognized that, as business models change, information previously available in traditional business models may no longer be available from a single source, or at all.

· Cost, security and privacy implications of information collection, retention and disclosure

The group discussed at length the issues of cost, security and privacy. The group identified the following cost implications:

In some jurisdictions traffic and content data are subject to the same protection whereas in most countries content data normally requires a higher level of authorization for access.

Privacy concerns include varying data protection standards for commercial reuse, varying legal and security standards for data protection, and government access to data.

It was recognized that law enforcement had general concerns on this matter, such as the availability and integrity of certain types of data, and a specific concern about differences in data retention periods among service providers. Likewise, industry had concerns about issues such as the cost and general implications of lawful interception requirements. It was also noted that all of these issues needed further discussion and clarification, and that all parties should recognize that there would likely be other legal and cost implications for all of them.

It was recognized that the joint dialogue within the workshop was extremely useful in helping all sides gain a better understanding of each other’s concerns, and that it should continue in the same format with as much continuity of attendees as possible.

The following subjects were considered as worthy of discussion at future meetings:

The following is a list of log details related to some services that are available through a typical internet service. It should be noted that the content of these logs might be subject to relevant business, technical and legal conditions; not all of the following data elements will be available in all logs.

¹Reliable time records across different computers and networks are essential for investigation and prosecution. Using the Network Time Protocol (NTP) for clock synchronization should be an ISP best practice.

²CLI (Calling Line Identification) provides the number from which a telephone call is made and may or may not be available to ISPs. Whether CLI can be retrieved depends on the particular combination of software and hardware.
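Footnote 1 recommends NTP for synchronizing clocks. To illustrate why synchronization matters when correlating logs from different machines, the sketch below applies the standard NTP on-wire calculation (as specified in RFC 5905) to the four timestamps of a single client-server exchange, yielding the client clock’s offset and the round-trip delay. The function names are hypothetical, and the offset estimate assumes the outbound and return path delays are roughly equal; in practice an NTP daemon performs this continuously, rather than logs being corrected after the fact.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Clock offset and round-trip delay from one NTP exchange.

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    A positive offset means the client clock is behind the server.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def to_reference_time(local_ts, offset):
    """Correct a locally logged timestamp by the measured offset."""
    return local_ts + offset
```

For example, with timestamps in milliseconds, `ntp_offset_delay(100000, 105100, 105110, 100210)` describes a client clock running 5 seconds behind the server and returns an offset of 5000.0 and a delay of 200.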

[1] In Germany, for example, subscriber-related data which pertains to the contractual relationship between a service provider and the user of the service (and could include user’s name, address, type of contracted service, etc.) is protected by national data protection regulations and can be provided to LEAs by request and without a specific judicial order. On the other hand, traffic data is regarded as telecommunications data and as such is under constitutional protection (Gesetz zur Beschränkung des Brief-, Post-, und Fernmeldegeheimnisses (Gesetz zu Artikel 10 Grundgesetz - G 10) Law on restriction of secrecy of letters, mail and telecommunications – Law applying to article 10 of the Constitution); it can only be provided to LEAs by means of a specific order by a judge or public prosecutor.

[2] Art. 14 of Directive 97/66/EC and Art. 13 of Directive 95/46/EC.

[3] In Recommendation 3/99 On the preservation of traffic data by Internet Service Providers for law enforcement purposes (which was written specifically for review at the G8 summit of that year), the Working Party states:

As a general rule, traffic data must be erased or made anonymous as soon as the communication ends (Article 6 paragraph (1) of Directive 97/66/EC). This is motivated by the sensitivity of traffic data revealing individual communication profiles including information sources and geographical locations of the user of fixed or mobile telephones and the potential risks to privacy resulting from the collection, disclosure or further uses of such data. Exception is made in Article 6 (2) concerning the processing of certain traffic data for the purpose of subscriber billing and interconnection payments, but only up to the end of the period during which the bill may lawfully be challenged or payment may be pursued.

The Working Party concurs with the more recent statement in the Communication (Creating a Safer Information Society) on adopting measures for lawful access and retention, although Recommendation 3/99 perhaps outlines more fully the criteria for such access:

Any violation of these rights and obligations is unacceptable unless it fulfils three fundamental criteria, in accordance with Article 8 (2) of the European Convention for the Protection of Human Rights and Fundamental Freedoms of 4 November 1950, and the European Court of Human Rights’ interpretation of this provision: a legal basis, the need for the measure in a democratic society and conformity with one of the legitimate aims listed in the Convention. The legal basis must precisely define the limits and the means of applying the measure: the purposes for which the data may be processed, the length of time they may be kept (if at all) and access to them must be strictly limited. Large-scale exploratory or general surveillance must be forbidden. It follows that public authorities may be granted access to traffic data only on a case-by-case basis and never proactively and as a general rule.