Controlling the Accidental Release of Digital Information

July 9, 2007

In an age when virtually all documents are created on computers, it has become second nature to electronically share these materials through e-mail, extranets, and USB flash drives. Unfortunately, many people don’t fully understand exactly what information is contained in the files they are distributing. In the legal community, where clients routinely entrust sensitive and privileged information to their legal counsel, this lack of understanding can have significant consequences. Imagine how you would feel if you were counsel in any of these situations:

In a products liability matter, both parties have been ordered by the court to exchange final trial exhibit lists in electronic format, such as an Excel spreadsheet. Rather than e-mail the file, your trusted paralegal copies the file onto a floppy disk that he sends via FedEx. Several days later, your opposing counsel acknowledges receipt of your exhibit list, but he also notes that the floppy disk contained an unrelated database apparently relating to another litigation matter on which you are also working. Have you just waived work product privilege on that work?

You are defense counsel in “bet the company” products liability litigation that is vital to your client’s continued existence. Your document sweep includes large numbers of electronic documents that reflect all aspects of the product’s development and management decisions about how to develop and market the product. Many of these documents must be redacted to eliminate sensitive personal information or privileged material. The litigation support vendor with which you are working recommended these materials be produced in PDF format so that the redactions can be “burned” into the image, permanently obscuring this sensitive information.

However, in preparing the PDFs, the vendor neglects to process the underlying searchable text layer of the PDF files. As a result, while the visual layer of the PDF does not show privileged or confidential text, users can locate and extract all of this information from the PDF’s hidden text using basic features built into Adobe’s free Acrobat Reader software or any number of litigation support software systems.

As in-house counsel, you are working with an outside law firm with whom you have an excellent relationship. You have just received their latest piece of work: a complex contract that you’ve been told reflects hours of high-level partner analysis (and billable time). The law firm sent you the contract in Microsoft Word format so that you can add a final bit of contact information and tweak the language in one particular section.

On a whim, you click on the “properties” menu option, where you discover that the embedded file title shows that it was originally prepared for a different client about a year ago; the work product has apparently been recycled. More disturbing, the document metadata reveals that it has only been opened twice and edited for a total of 35 minutes in the past six months instead of the hours of time described to you. And finally, the document author information indicates that the file was originally drafted, not only by an associate (not a partner), but by another law firm entirely! How do you now feel about your outside counsel?

Each of these fact patterns is an actual historical situation, not a creative hypothetical. And, as is obvious, each of these accidental lapses in good digital information management carried potentially significant consequences. Further, each of these situations could have been avoided.
What, then, should attorneys do to reduce the risks of creating these digital disasters?

1. Understand How Electronic Documents Are Organized
Attorneys need not understand every piece of information that is included in the electronic work product they create. However, it is vital that they understand the significance of the information that could be included.
All digital files share a number of common components. In addition to substantive content, which obviously differs from file to file, digital data files also contain a certain amount of information (aka “metadata”) that is part of the file but not ordinarily visible when the file is displayed. For files created on a personal computer, common metadata includes information about file title, file type (e.g., “.doc,” “.xls”), the date and time that a file was first saved, the date and time that the file was last saved, and the network login ID of the computer that created the file (which may or may not be the same as the substantive author of the document).

This is only some of the metadata that is stored for every file; other file-related information may include text and formatting deleted from the document during editing and other pieces of information about a file’s history. Some of this metadata is generated when the file is created; other information, such as last access and last edit date, updates itself each time the file is opened or edited.
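To make the idea of "information that is part of the file but not ordinarily visible" concrete, here is a minimal Python sketch that lists a few attributes the operating system tracks for any file. The function name `filesystem_metadata` is hypothetical and chosen only for illustration. The sketch covers file-system attributes only; the properties embedded inside a Word or PDF document (author, title, editing history) are stored separately and are not shown here.

```python
import os
from datetime import datetime, timezone


def filesystem_metadata(path):
    """Return a few attributes the operating system records about a file.

    Hypothetical helper for illustration; it does not read properties
    embedded inside the document itself.
    """
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        # File type as suggested by the extension, e.g. ".doc" or ".xls"
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": info.st_size,
        # st_mtime: when the file's content was last written
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # st_atime: when the file was last read; some systems update this
        # lazily or not at all, so treat it as approximate
        "last_accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }
```

Calling `filesystem_metadata("exhibit_list.xls")` on a file of that (assumed) name returns a dictionary of these values, which is roughly what a recipient can see without even opening the document.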

One particularly misunderstood electronic document is the PDF file. Many attorneys have been taught that “native” files (e.g., Word documents, Excel spreadsheets) can contain sensitive metadata, but they also believe that this metadata disappears if they digitally “print” final versions of their work product as PDF files. While rendering an electronic document into PDF does create an entirely new document, some PDF creation software affirmatively transfers select metadata from the source file to the new PDF file it creates.

In addition, even if no metadata is transferred from the source file, a PDF contains its own metadata, which may also contain delicate information. For example, PDF files ordinarily show the name of the original file from which the PDF was created and the network/computer identification of the person who ran the PDF process. If the PDF was edited using Adobe Acrobat or other PDF management software, the PDF will show its editing history, just like a Microsoft Office document.

PDF file properties also show security settings, such as whether the PDF can be printed or have its text extracted. None of this information, with the exception of the source file name, comes from the original document; all of it is generated as part of the PDF creation process. (As an aside, PDF security is overrated; multiple software programs can bypass PDF security settings to manipulate the file.)

2. Use Metadata-Scrubbing Tools
The easiest way to avoid accidentally distributing sensitive information is to make sure that outgoing files don’t contain any. A variety of software developers have long offered reasonably priced “metadata scrubbing” tools that automatically clear metadata in a wide range of data files. Microsoft also offers a metadata scrubbing plug-in for a number of its software programs, most notably Microsoft Word. Most of these tools can be adjusted to remove only selected metadata, so that the receiving party can still receive innocuous information.

If metadata management tools are not available, several simple practices can still reduce the amount of metadata that you release with your files. First, save a fresh copy of the file under a new name just before sending it to an external recipient. The new file that you create will not contain the extensive editing history that may be associated with the working file, though it will show fresh metadata, such as that file’s creation date, the internal title of the file, and the computer login ID of the machine that created the new file. A second approach is to render documents to PDF.

As noted, a PDF contains its own metadata, but it will shed embedded edits and other hidden information that is often part of the source file. A PDF can have its metadata further reduced if it is rendered without embedded searchable text. The resulting PDF acts more like a digital photograph of the document than the editable, original native version, but it can be printed out and will contain the substantive information (the “body text”) that the author intended to communicate.

3. Double-Check Materials Before They Are Distributed
Even if a law firm has deployed metadata management tools, it’s always useful to spot-check documents before they go out. Sometimes, metadata scrubbing tools may be set incorrectly or someone may have forgotten to run them. A brief check of document properties will show enough metadata to determine whether or not additional protective measures are required. In large document productions, sampling a few files should ordinarily suffice for this level of quality control check, especially if all the documents were created at the same time in a single process.
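The spot-check described above can be sketched in Python. This is an illustration under stated assumptions, not a substitute for a scrubbing tool: it assumes the outgoing files use the zip-based Office Open XML formats (.docx, .xlsx, .pptx), which store their document properties in an internal part named `docProps/core.xml`. Legacy binary .doc and .xls files and PDFs are not covered. The function names `core_properties` and `spot_check` are hypothetical.

```python
import random
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the core-properties part of an Office Open XML file
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Properties most likely to embarrass a sender if left in place
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def core_properties(path):
    """Return the non-empty core properties found in one file.

    Raises zipfile.BadZipFile if the file is not a zip-based document.
    """
    with zipfile.ZipFile(path) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[label] = node.text
    return found


def spot_check(paths, sample_size=5, seed=None):
    """Inspect a random sample of files and report what each still reveals."""
    paths = list(paths)
    rng = random.Random(seed)
    chosen = rng.sample(paths, min(sample_size, len(paths)))
    return {path: core_properties(path) for path in chosen}
```

An empty dictionary for a sampled file suggests that these particular properties were cleared; any remaining title, author, or date is a cue to rerun the scrubbing step before the production goes out. Other hidden content, such as tracked changes or comments, is outside the scope of this sketch.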

On a related note, firms should always produce materials on fresh media. Writing materials to CD-R or DVD-R media provides the greatest guarantee that the media contains no residual data from prior use. For voluminous materials, it may make sense to purchase a fresh hard drive instead of re-using an existing drive. Software that can read deleted hard drive information and even un-format hard drives is inexpensive and widely available. With the cost of new hard drives plummeting to new depths almost every day, re-using old hard drives saves little money and may distribute a treasure trove of “extra” data.

Conclusion

As a practical matter, all of us will continue to leak extra digital information in our electronic work product. Often, this metadata is unimportant and creates no risk. However, practitioners who work with sensitive information may want to implement consistent metadata scrubbing procedures to ensure that these measures will be automatically followed when it matters. Further, even without investing in comprehensive metadata scrubbing technology, modest measures can greatly reduce the risk of accidental release of sensitive information.

In an age when “reasonable good faith” remains an important standard at law for measuring behavior, even a limited amount of heightened attention can help persuade a court that a law firm took reasonable measures to protect its sensitive information.

By Fios Inc.:
“Conrad J. Jacoby, Esq. is a member of The Sedona Conference® and a contributing columnist for Fios, Inc. His work focuses on the areas of information management, e-discovery, and litigation support.”