The leak investigation of the draft Supreme Court opinion overturning Roe v. Wade raises important issues for journalists as well as potential sources.
FOLLOWING THE PUBLICATION by Politico of Supreme Court Justice Samuel Alito’s draft majority opinion to overturn Roe v. Wade, Chief Justice Roberts authenticated the leaked document and stated that he had “directed the Marshal of the Court to launch an investigation into the source of the leak.” Whether or not the leak itself was illegal, however, the question of how a technical investigation of this document would proceed raises some interesting issues for journalists as well as potential sources.
Leak investigators have three key areas to analyze for clues: the document itself, the environment the document circulated in, and the potential identity of the leaker. Each area in turn presents lessons and opportunities for would-be leakers to adopt various counter-forensic strategies to subvert future leak investigations.
Since the leaked opinion appears to be a scan or photocopy of a paper document instead of a transcription or recreation, the image can be analyzed for any unique markings that might allow investigators to pinpoint which particular physical copy of the document was leaked.
The first page includes several such potentially unique identifying markers, including a highlighted title, a page bend, and what appear to be staple perforations.
Other pages also reveal subtle markings that could identify the specific paper copy of the leaked document. For instance, the bottom-left region of page 90 has a singular speck; the fact that it is not present on other page images indicates that it is a stray mark present only on that physical page of the document, as opposed to being a dust flake on the scanner bed.
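The reasoning about the speck can be sketched in a few lines of Python. This is only an illustration: the page numbers and mark coordinates are invented, and the sketch assumes some earlier step has already located marks in each page image. A mark that recurs at the same position on most pages suggests dirt on the scanner glass, while a mark seen on a single page suggests a blemish on that sheet of paper.

```python
from collections import Counter

# Hypothetical detected marks: page number -> set of (x, y) positions,
# rounded to a coarse grid. All values are invented for illustration.
marks_by_page = {
    88: {(512, 40)},
    89: {(512, 40)},
    90: {(512, 40), (61, 930)},
    91: {(512, 40)},
}

def classify_marks(marks_by_page, scanner_threshold=0.75):
    """Split marks into likely scanner artifacts and likely paper marks.

    A position seen on at least `scanner_threshold` of the pages is treated
    as dirt on the scanner; a position seen on exactly one page is treated
    as a mark on that physical sheet.
    """
    page_count = len(marks_by_page)
    seen = Counter(pos for marks in marks_by_page.values() for pos in marks)
    scanner = {pos for pos, n in seen.items() if n / page_count >= scanner_threshold}
    paper = {}
    for page, marks in marks_by_page.items():
        unique = {pos for pos in marks if seen[pos] == 1}
        if unique:
            paper[page] = unique
    return scanner, paper

scanner_marks, paper_marks = classify_marks(marks_by_page)
print("Likely scanner artifacts:", scanner_marks)
print("Likely marks on the paper:", paper_marks)
```

With this invented data, the mark shared by every page is attributed to the scanner and the lone mark on page 90 is attributed to the paper.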
If investigators managed to locate a physical copy of the document matching the characteristics found in the leaked file, that would allow them to conclude that it was the physical copy that was leaked. This is significant, because it could establish the provenance of the document, which could in turn identify potential leakers.
For instance, if it were known that this particular physical copy of the document was handled by certain specific persons, those individuals would naturally fall under suspicion. Someone outside the intended chain of custody could still have obtained the physical copy, though, for instance by picking it up from someone else’s desk or by finding it on a photocopier. It is also possible that the original source of the document is digital and that the source printed out a copy prior to leaking it, or that Politico itself printed out the digital copy prior to publishing it.
Investigators could also analyze the metadata of the digital version of the document using software such as ExifTool for any clues about when, where, how, or by whom the digital copy was created. They could also exploit potential information-leaking vulnerabilities in the PDF creation and redaction process, which could inadvertently leave unintended and potentially identifying information in the digital document.
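As a rough idea of what such metadata looks like, the Python sketch below scans raw PDF bytes for a handful of common document information fields. It is a deliberately naive stand-in for a tool like ExifTool: it only finds fields stored as plain, unescaped literal strings, and it ignores compressed objects, encryption, and embedded XMP metadata. The sample bytes and every value in them are invented.

```python
import re

# A tiny, hand-written fragment resembling a PDF document information
# dictionary. All values are invented for illustration.
sample_pdf_bytes = (
    b"%PDF-1.4\n"
    b"1 0 obj\n"
    b"<< /Creator (Example Scanner Model X) /Producer (Example PDF Library 1.0)\n"
    b"   /Author (jdoe) /CreationDate (D:20220101093000-05'00') >>\n"
    b"endobj\n"
)

INFO_KEYS = ("Author", "Creator", "Producer", "CreationDate", "ModDate", "Title")

def scan_info_fields(data):
    """Return metadata fields stored as plain literal strings, e.g. /Author (jdoe).

    Naive by design: no support for escaped parentheses, hex strings,
    compressed object streams, or XMP packets.
    """
    found = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^)]*)\)"
        match = re.search(pattern, data)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found

fields = scan_info_fields(sample_pdf_bytes)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Even this small set of fields can reveal a username, the make of the scanner or software that produced the file, and a timestamp with a time zone offset.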
In addition to the document itself, leak investigators will likely pay attention to the environment in which the leak originated. Modern commercial office printers generally come with a variety of ancillary functions like photocopying and scanning, while also typically keeping a running log of jobs the printer performs, which may include such information as the file name and page count of the document, the date and time the job was performed, as well as the username or IP address that initiated the job. If the printer also offers the capability to email a photocopy or scan of a document, a log may keep track of which jobs were sent to which email addresses and could even store a copy of the digital document in its memory.
Investigators will likely perform an audit of printer and network logs to see which staff members opened or otherwise interacted with the document in question. Investigators could also explore who had occasion to access the document as part of their day-to-day duties, as well as where the particular copy of the leaked document was physically stored, and who had occasion to access that space.
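A printer log audit of this kind might look something like the following Python sketch. The log format, column names, users, and the target file name and page count are all invented; real printers and print servers record jobs in their own vendor-specific formats.

```python
import csv
import io

# Hypothetical print-job log. Column names and rows are invented.
job_log_csv = """timestamp,user,source_ip,job_type,file_name,pages
2022-01-10T09:12:00,alice,10.0.0.12,print,memo.docx,2
2022-01-11T16:40:00,bob,10.0.0.31,copy,,40
2022-01-12T11:05:00,carol,10.0.0.44,print,draft_report.pdf,40
2022-01-12T11:20:00,alice,10.0.0.12,scan_to_email,,3
"""

def jobs_of_interest(log_text, name_fragment, page_count):
    """Return jobs whose file name contains `name_fragment` or whose page
    count equals `page_count`.

    Copy and scan jobs often have no file name, so the page count is the
    only available signal for them.
    """
    hits = []
    for row in csv.DictReader(io.StringIO(log_text)):
        name_match = name_fragment.lower() in row["file_name"].lower()
        pages_match = int(row["pages"]) == page_count
        if name_match or pages_match:
            hits.append(row)
    return hits

hits = jobs_of_interest(job_log_csv, name_fragment="draft_report", page_count=40)
for job in hits:
    print(job["timestamp"], job["user"], job["job_type"], job["pages"])
```

A match on page count alone is weak evidence, which is why investigators would likely combine it with the other signals described here.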
The practice of anomaly-based insider threat detection involves investigating staff who display any kind of irregular behavior or activity. For instance, if a staff member usually swipes into the office on work days at 8 a.m. and swipes out at 5 p.m., but access logs show them coming into the office at 10 p.m. on a Saturday in the days leading up to the leak, this finding would likely subject that staff member to scrutiny, which could include analyzing available surveillance footage.
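The badge-swipe example can be expressed as a simple rule, sketched below in Python. The staff identifiers and timestamps are invented, and the rule itself (flag weekend swipes and swipes more than a few hours from a person's median arrival time) is an assumption chosen for illustration; commercial insider threat tools use their own, more elaborate models.

```python
from datetime import datetime
from statistics import median

# Hypothetical badge log: (staff id, ISO timestamp of swipe-in). Invented data.
swipes = [
    ("staff_a", "2022-01-03T08:02:00"),  # Monday
    ("staff_a", "2022-01-04T07:58:00"),
    ("staff_a", "2022-01-05T08:05:00"),
    ("staff_a", "2022-01-06T08:01:00"),
    ("staff_a", "2022-01-07T07:55:00"),
    ("staff_a", "2022-01-08T22:00:00"),  # Saturday, 10 p.m.
    ("staff_b", "2022-01-03T09:30:00"),
    ("staff_b", "2022-01-04T09:28:00"),
    ("staff_b", "2022-01-05T09:35:00"),
]

def flag_unusual_swipes(swipes, max_hour_gap=3):
    """Flag swipes on weekends or far from each person's median arrival hour."""
    by_staff = {}
    for staff, stamp in swipes:
        by_staff.setdefault(staff, []).append(datetime.fromisoformat(stamp))

    flagged = []
    for staff, times in by_staff.items():
        typical_hour = median(t.hour + t.minute / 60 for t in times)
        for t in times:
            hour = t.hour + t.minute / 60
            is_weekend = t.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend
            if is_weekend or abs(hour - typical_hour) > max_hour_gap:
                flagged.append((staff, t.isoformat()))
    return flagged

flagged = flag_unusual_swipes(swipes)
print(flagged)
```

With this invented log, only the Saturday night swipe is flagged. A flag of this kind is a prompt for further scrutiny, not proof of anything.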
Staff computer and phone usage, particularly web browsing, could also be analyzed to see if anyone previously visited the news site that published the leak, in this case Politico, or visited other webpages of potential interest, such as any that describe whistleblowers or leaking. Rudimentary analysis could include looking through desktop browsing history, while a more thorough and sophisticated investigation would involve analyzing network traffic logs to determine whether Politico was accessed from a mobile device connected to the office Wi-Fi. Because Politico is a news website that covers politics and policy, however, it is likely to show up in quite a lot of staff logs, so a visit on its own would probably not be a particularly fruitful finding for investigators.
“Sentiment analysis” may also be performed as part of an insider threat investigation by analyzing the various thoughts and opinions expressed by staff members in office communications. This kind of analysis could also utilize what’s often called “open source intelligence,” in the form of looking at staff social media postings to see if anyone had expressed interest in Politico, or any thoughts about the Alito opinion, or generally any signs of disgruntlement with their employer. Sentiment analysis may also include a review of staff postings on internal forums, as well as emails and private messages sent via channels controlled by the employer, such as direct messages sent over Slack.
Takeaways for Would-Be Leakers
These potential methods of leak investigation may also be interpreted as lessons for future leakers to evade identification by adopting a number of counter-forensic measures.
To reduce the potential amount of information investigators may glean from a leaked document, leakers could send journalists a transcription or reproduction of the document instead of the original source document itself. A transcription will not defeat a barium meal test, in which each individual is given a uniquely phrased copy of the document (sophisticated forms of which may deploy natural language watermarking, subtly altering the syntactic structure of every version of a document), because the distinctive wording survives retyping. It would nonetheless neutralize all other attempts at source document identification. Transcription would bypass efforts at identifying either errant or intentional markings on a page, as well as attempts at identifying positional watermarks such as subtle shifts in character or line spacing unique to each version of a document. Of course, this would also make it harder for journalists to verify a document’s authenticity, and care would have to be taken to ensure that the source left no identifying metadata in the transcription file.
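A toy version of a barium meal test shows why retyping a document does not help against it. In the Python sketch below, the template sentence, the synonym choices, and the recipient names are all invented, and real natural language watermarking is far subtler. Each recipient receives a unique combination of phrasings, and a leaked copy is matched on wording alone, after spacing and capitalization have been discarded.

```python
from itertools import product

# Invented template: each slot offers two interchangeable phrasings.
TEMPLATE = "The committee {0} the proposal and will {1} its findings {2}."
CHOICES = [
    ("reviewed", "examined"),
    ("publish", "release"),
    ("next week", "in the coming week"),
]
RECIPIENTS = ["recipient_1", "recipient_2", "recipient_3", "recipient_4"]

def make_copies(recipients):
    """Give each recipient a copy with a unique combination of phrasings."""
    combos = product(*CHOICES)  # 2 * 2 * 2 = 8 possible variants
    return {name: TEMPLATE.format(*combo) for name, combo in zip(recipients, combos)}

def normalize(text):
    """Collapse whitespace and case, roughly what retyping might change."""
    return " ".join(text.lower().split())

def identify(leaked_text, copies):
    """Return the recipients whose copy matches the leaked wording."""
    target = normalize(leaked_text)
    return [name for name, text in copies.items() if normalize(text) == target]

copies = make_copies(RECIPIENTS)

# A retyped version of one copy: spacing and case differ, wording is intact.
retyped = "the committee  REVIEWED the proposal and will release its findings next week."
suspects = identify(retyped, copies)
print(suspects)
```

Here the retyped text still points to a single recipient, because the identifying information lives in the choice of words, not in the layout of the page.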
Office equipment would best be avoided when making copies of a document, but using personal equipment can also be fraught with risk. Source camera identification is the forensic process of identifying the camera that took a particular photo. At times, this sort of identification may hinge on obvious features such as visible scratches on a lens or dead pixels on a sensor. In other situations, the unique characteristics of an image might not be visible to the naked eye, but instead might be based on the unique image sensor noise each camera produces, otherwise known as photo response non-uniformity.
In other words, if leaked photographs of a document were to emerge, and leak investigators had particular suspects in mind, they could analyze photos posted to social media by the suspects to see if they provide an algorithmic match to the noise pattern in the leaked photos. When making audio recordings or photographs, therefore, it would be best practice to adopt the principle of one-time use: Use a temporary device like a cheap camera or smartphone that will be used only for the purposes of the leak, and then discard the device.
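The matching step can be illustrated with a small simulation in Python. Everything here is a simplifying assumption: the "sensors" are lists of random per-pixel gain errors, the "photos" are of an evenly lit surface, and the noise residual is computed by subtracting the mean brightness. Real photo response non-uniformity analysis works on natural images, uses proper denoising filters, and has to cope with compression and resizing.

```python
import math
import random

random.seed(7)  # fixed seed so the simulation is repeatable
PIXELS = 4096

def make_sensor():
    """Simulate a sensor: each pixel has a small, fixed gain error."""
    return [random.gauss(0, 0.02) for _ in range(PIXELS)]

def take_flat_photo(sensor, brightness):
    """Photograph an evenly lit surface, adding random per-shot noise."""
    return [brightness * (1 + gain) + random.gauss(0, 2.0) for gain in sensor]

def residual(photo):
    """Crude noise residual: subtract the photo's mean brightness."""
    mean = sum(photo) / len(photo)
    return [p - mean for p in photo]

def fingerprint(photos):
    """Average the residuals of several photos taken with one camera."""
    residuals = [residual(p) for p in photos]
    return [sum(column) / len(column) for column in zip(*residuals)]

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return numerator / denominator

camera_a, camera_b = make_sensor(), make_sensor()

# Reference photos, standing in for images already attributed to each suspect.
reference_a = fingerprint([take_flat_photo(camera_a, 150) for _ in range(5)])
reference_b = fingerprint([take_flat_photo(camera_b, 150) for _ in range(5)])

# The "leaked" photo was taken with camera A.
leaked = residual(take_flat_photo(camera_a, 140))
score_a = correlation(leaked, reference_a)
score_b = correlation(leaked, reference_b)
print(f"match against camera A: {score_a:.2f}")
print(f"match against camera B: {score_b:.2f}")
```

In this simulation the leaked photo correlates strongly with the fingerprint of the camera that took it and hardly at all with the other camera, which is the kind of gap an investigator would look for.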
To avoid running afoul of anomaly-detection triggers, would-be leakers might consider incorporating document acquisition as part of their normal routine instead of engaging in uncharacteristic behavior like clocking in at the office at odd hours or downloading files en masse. Likewise, leakers should avoid browsing news outlets while at work, on both their personal and work devices. Expressing any kind of disagreement or dissatisfaction with employer policies or decisions in any company, public, or personal forum (such as during happy hour drinks) is also best avoided, as rigorous insider threat monitoring may keep tabs on any such behavior.
Leaking and subsequent leak investigations are back-and-forth games of forensics and counter-forensics, of operational security and its failures. While the risk of source identification can never be entirely eliminated, there are nonetheless various practical technical countermeasures which can be adopted to reduce the additional risk to sources who are already risking a great deal.