Microsoft Word is currently the word processing software of choice for most individuals and companies. Many users are under the mistaken belief that the final version of the "visible" Word document is the only substantive content contained in the "saved file."
Beyond the visible document, Word files also contain hidden data known as "metadata". Metadata can include things like revision history, author names, and "track changes" data, which reveal the evolution of a document and the various edits that led to the final Word file. According to Microsoft, metadata found in Word files can include:
• Your name
• Your initials
• Your company or organization name
• The name of your computer
• The name of the network server or hard disk where you saved the document
• Other file properties and summary information
• Non-visible portions of embedded OLE objects
• Document revisions
• Document versions
• Template information
• Hidden text
This leads to two important action items. First, if you create and save Word documents, remove the metadata before circulating the files or archiving them. You can use a free tool such as Microsoft's "rhdtool" or a more robust product such as Workshare Protect to remove metadata from Word and other Microsoft Office files. Better yet, convert the Word file to a PDF document (and make sure the metadata is removed from the PDF file as well).
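To make the removal step concrete, here is a minimal sketch of blanking the author fields in a Word file. It rests on several assumptions that are not part of the discussion above: the file is in the newer zip-based .docx format (not the legacy binary .doc format), the author fields live in the package part docProps/core.xml, and the function name scrub_core_properties is hypothetical. It does not touch tracked changes, comments, or hidden text, so it is an illustration of the idea rather than a substitute for a dedicated removal tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
# Keep the conventional prefixes when the XML is written back out.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def scrub_core_properties(src, dst):
    """Copy a .docx from src to dst, blanking the author fields.

    src and dst may be file paths or binary file-like objects.
    Hypothetical helper: only dc:creator and cp:lastModifiedBy are cleared.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for path in ("dc:creator", "cp:lastModifiedBy"):
                    for element in root.findall(path, NS):
                        element.text = ""  # keep the element, drop the name
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Every other part of the package is copied unchanged.
            zout.writestr(item, data)
```

A call such as scrub_core_properties("draft.docx", "draft_clean.docx") (both file names hypothetical) writes a cleaned copy and leaves the original file untouched, which is the safer choice when the original may itself be evidence.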
Second, if you receive Word files in electronic discovery or through investigations, you may want to analyze the Word file metadata for hidden information that may reveal substantive evidence in your case. You can use a software tool such as Workshare Protect to analyze the document files and produce summary reports of the metadata.
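As a rough illustration of what such an analysis looks at, the sketch below summarizes author fields and tracked changes in a Word file. It is not how Workshare Protect works internally. It assumes a zip-based .docx file (it cannot read the legacy binary .doc format), and the function name summarize_docx_metadata and the shape of the returned dictionary are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML document part.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CORE_NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def summarize_docx_metadata(src):
    """Return author fields and tracked-change counts for a .docx file.

    src may be a file path or a binary file-like object.
    """
    summary = {
        "creator": None,
        "last_modified_by": None,
        "revision": None,
        "insertions": 0,
        "deletions": 0,
        "reviewers": [],
    }
    with zipfile.ZipFile(src) as package:
        names = set(package.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            summary["creator"] = core.findtext("dc:creator", namespaces=CORE_NS)
            summary["last_modified_by"] = core.findtext(
                "cp:lastModifiedBy", namespaces=CORE_NS
            )
            summary["revision"] = core.findtext("cp:revision", namespaces=CORE_NS)
        if "word/document.xml" in names:
            body = ET.fromstring(package.read("word/document.xml"))
            # Tracked insertions and deletions are stored as w:ins and w:del.
            inserted = body.findall(f".//{{{W}}}ins")
            deleted = body.findall(f".//{{{W}}}del")
            summary["insertions"] = len(inserted)
            summary["deletions"] = len(deleted)
            authors = {el.get(f"{{{W}}}author") for el in inserted + deleted}
            summary["reviewers"] = sorted(a for a in authors if a)
    return summary
```

Calling summarize_docx_metadata on a received file returns a dictionary that can be logged alongside the file in a review. The tracked-change author names are often the most revealing part, because they identify who edited the document rather than only who created it.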
For example, in one high-profile matter, great care was taken to remove comments and edited-text metadata from a Microsoft Word document, but some metadata remained, with serious consequences.
In 2003 a memo was prepared by British Prime Minister Tony Blair's office to support the notion that UN weapons inspections were not working in Iraq and that military action was justified. The memo was used by US Secretary of State Colin Powell in support of military action in Iraq. The memo was posted on the Prime Minister's web site in Microsoft Word "dot doc" format.
Two investigations of the Iraq memo Word file followed. First, Glen Rangwala of Cambridge University compared the memo to an article published by a US graduate student in 2002 and found that large portions had been copied and pasted, grammatical errors and all, from the graduate student's article into the memo. None of the copied text was credited to the original author.
Second, Richard M. Smith, a privacy and security expert in the US, downloaded the Word document from the Prime Minister's web site and extracted its revision history metadata, ten revisions in all. The revision history supported the view that the Prime Minister's press office was deeply involved in the Iraq memo's preparation. Based on the metadata, those involved in editing the memo likely included Murtaza Khan, who was a press officer; Alison Blackshaw, who was a personal assistant to the Prime Minister; John Pratt, who worked at 10 Downing Street; and Paul Hamill, who was a foreign office official. Unfortunately for the Prime Minister, the metadata seemed to point to his staff as the cut-and-paste technicians who may have copied portions of the US graduate student's article.
I performed a similar analysis on the UK-Iraq memo Word file using Workshare Protect's metadata analysis tool. Here is a partial screenshot of what I found, which is consistent with Richard M. Smith's metadata analysis.
The UK-Iraq memo metadata debacle is a powerful demonstration of why attorneys and investigators must be sensitive to metadata in everyday communications as well as in e-discovery, and why they should acquire the appropriate tools for removing and analyzing metadata in each context.