Hacking the Associated Press Spy Code


20 August 2009. Updated.

18 August 2009. Formerly on the Cryptome home page.

http://www.niemanlab.org/2009/08/heres-the-ap-document-weve-been-writing-about/

Here’s the AP document we’ve been writing about

I’ve been writing this week about The Associated Press’ plans to rethink what it means to be a wire service on the Internet. Much of the reporting began with a document entitled, “Protect, Point, Pay — An Associated Press Plan for Reclaiming News Content Online,” which was distributed to AP executives, board members, and some members late last month. Though I have some more to say tomorrow, this seems like a good time to release the seven-page document in full. Excerpts:

Protect, Point, Pay – An Associated Press Plan for Reclaiming News Content Online: Part I
The evidence is everywhere: original news content is being scraped, syndicated and monetized without fair compensation to those who produce, report and verify it. AP’s legal division continues to document rampant unauthorized use of AP content on literally tens of thousands of Web sites. The problem is quickly spreading to mobile, where new applications are cropping up daily that do little more than repackage the efforts of AP and others, siphoning off consumers and revenue from those whose content is being exploited.
AP News Registry is a way to identify, record and track every piece of content AP makes available to its members and other paying customers. What makes it different from other similar efforts is that it is being designed at the outset with a rights framework that will provide an enforceable way for AP to grant and monitor specific rights in its content.
When AP distributes a news item, that content will be wrapped in a container that will include rights information and a tracking beacon that will send reports back to the core database each time the item is clicked on by an end user. The beacon will identify each piece of content, the IP address of the content viewer, the referring Web server and the time of use. The content will also be wrapped in a simple piece of code that will travel with the content wherever it is posted on the Web and that spells out, in both human and machine-readable forms, what may and may not be done with the content.
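[Illustration, not part of the AP document: the excerpt names four beacon fields (content identifier, viewer IP address, referring Web server, time of use) and a rights statement in human and machine-readable forms. The sketch below shows one way such a wrapper and report could be modeled. Every name in it (Rights, BeaconReport, wrap, build_report, the may_* flags, the JSON encoding) is a hypothetical chosen for this example; AP has published no format, and nothing here should be read as the actual design.]

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Rights:
    """Hypothetical rights flags; the AP document does not list specific rights."""
    may_display: bool
    may_excerpt: bool
    may_republish: bool

    def human_readable(self) -> str:
        # Turn each flag into a "may ..." / "may not ..." phrase.
        parts = []
        for name, allowed in asdict(self).items():
            verb = name.replace("may_", "")
            parts.append(f"{'may' if allowed else 'may not'} {verb}")
        return "You " + ", ".join(parts) + " this content."


@dataclass
class BeaconReport:
    """The four fields the excerpt says the beacon identifies."""
    content_id: str
    viewer_ip: str
    referrer: str
    viewed_at: str  # ISO 8601 timestamp


def wrap(content_id: str, body: str, rights: Rights) -> dict:
    """Bundle content with its rights in human and machine-readable form."""
    return {
        "content_id": content_id,
        "body": body,
        "rights_human": rights.human_readable(),
        "rights_machine": json.dumps(asdict(rights)),  # assumed encoding
    }


def build_report(container: dict, viewer_ip: str, referrer: str,
                 now: Optional[datetime] = None) -> BeaconReport:
    """Assemble what a beacon might send back when the item is viewed."""
    now = now or datetime.now(timezone.utc)
    return BeaconReport(
        content_id=container["content_id"],
        viewer_ip=viewer_ip,
        referrer=referrer,
        viewed_at=now.isoformat(),
    )


if __name__ == "__main__":
    rights = Rights(may_display=True, may_excerpt=True, may_republish=False)
    item = wrap("ap-0001", "Example body text.", rights)
    report = build_report(item, "192.0.2.10", "http://example.com/page")
    print(item["rights_human"])
    print(json.dumps(asdict(report), indent=2))
```

[The sketch only builds the record; it sends nothing over the network. The point it makes is that both the report and the rights statement are ordinary data attached to the content, which is why the messages below focus on what happens once that content is rendered on the reader's own machine.]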

__________

This scheme should be easy to crack, probably first by a Russian wizard at ElcomSoft. When AP implements it, send cracks to cryptome[at]earthlink.net

A sends:

It should be obvious that the "analog hole" exists for AP content. You simply use the image of the rendered text. This is the same hole in, e.g., Adobe PDF limitations on printing a doc.

For extra credit, the images could be OCR'ed. Any such tool would have wide applicability (e.g., recovering text from scanned books) and likely already exists. (Google?)

Since the "player" will be (must be) resident in your computer's memory, it is subject to reverse engineering and bypassing, as you write, though that program would likely be subject to DMCA legal attack within the borders of the US regime. [Recall that ElcomSoft's Dmitry Sklyarov was arrested at a US conference for exactly this attack on Adobe, although later released. ElcomSoft sells a slew of Adobe and other crackers online in the US and around the planet. What is not clear is whether ElcomSoft's crackers have tracking code within them which could be used by AP or officials. No cracker is wholly trustworthy due to the lucrative rewards for cooperating with the protection racketeers of gov, com, edu and the damnable orgs camming your GS(E) phlap.]

A2 sends:

I can't imagine you won't hear from a bunch of other Linux users about how we could do this.

Aside from older console-based tools, the standard PDF reader for the GNOME desktop (Evince) allows copying out the text, at least, without any other content transferring over. If I really need the images, I can always grab a screenshot, then cut and paste it back in. Generating clean PDFs is pretty simple using OpenOffice after that. It's annoying to work that way, but it will get things done if someone has time and a compelling reason.

For those needing more elegant tools, PDFEdit comes to mind. I'm sure there is at least one free tool for Windows doing the same thing.

A3 sends:

Follow up to my previous message.

The Sumatra PDF Reader is a little clunky, but works well enough for reading. It's Open Source, as well as free. I rarely use Windows, but had a chance to test Sumatra a year ago and had no trouble scraping text from a copyrighted PDF file. Get Sumatra PDF here:

http://blog.kowalczyk.info/software/sumatrapdf/index.html

Another older tool was based on the Ghostscript libraries for interpreting the PostScript printer language. PDF is a format derived from PostScript. These older tools are available for Windows, and have been used to open locked PDF files, though I confess I have never done it myself. As I understand it, it's a simple matter: GSView simply does not honor the protocol for locking files. I believe you can scrape text with the mouse, too. You'll need to install both GSView and the Ghostscript libraries.

http://pages.cs.wisc.edu/~ghost/gsview/index.htm