Welcome to Moredata
This web site focuses on electronic discovery and evidence methods for litigators including software tools, cases, best practices, and rules.
My name is Ira P. Rothken and I am a high technology litigator. My law practice involves a lot of electronic discovery and Internet related litigation. My law firm's web site is located at techfirm.com and here is a sample of the kinds of cases I have handled from a CNET News.com article profiling my legal career defending Internet technology companies in complex litigation and a general profile on the Sedona Conference web site. I was a computer programmer and medical researcher before I became an attorney and I still write software code. I like to use interesting new software tools and customized software like spiders and crawlers in the investigation and litigation of my cases. I also like to demystify all the hoopla over electronic discovery or "e-discovery." I named the site Moredata since it seems in many cases in this Internet connected world, especially class actions and complex business litigation matters, litigators are saying to themselves as the cases evolve - "oh no more data?!"
I hope this site helps you to answer e-discovery and electronic evidence questions and find useful information and software tools.
If you are an attorney and need assistance on electronic evidence or e-discovery issues you may click here to contact us.
The Sedona Conference® Commentary on Preservation, Management and Identification of Sources of Information that are Not Reasonably Accessible
The Sedona Conference® Working Group 1 recently published a Commentary on the issues of preserving, managing, and identifying not reasonably accessible electronically stored information, or “NRA ESI”. The result is a five-step framework for analysis and six Guidelines for making reasonable, good-faith assessments where no “bright line” rules exist. Ira P. Rothken is a co-editor of this Sedona Conference publication, entitled Commentary on Preservation, Management and Identification of Sources of Information that are Not Reasonably Accessible, which can be downloaded here, and here is a web seminar on the topic. You can find a review and summary of the commentary here in which the author states that "...I highly recommend that you read this new publication and, more importantly, consult it when faced with tricky preservation issues."
It used to be that there was an unwritten understanding amongst lawyers and judges (or so it seemed) that, unless it was the absolute core evidence in the case, litigants did not need to go out and extract, analyze for relevance, and preserve cell phone data. Cell phone data sort of fell into the same category as voice mail messages in the world of corporate discovery - decentralized, burdensome, transient, hard to extract, hard to preserve, and in the case of voice mail, who is going to listen to 100,000 voice mail messages to pick out a few that are relevant? Rather than deal with such difficult "new" e-discovery issues, lawyers and judges for the most part turned a blind eye to them unless someone really pushed the data as crucial evidence in a case.
The tide has changed and cell phone data is quickly becoming a part of basic e-discovery.
Why is cell phone data becoming an important part of e-discovery?
One simple reason - thanks to RIM (Blackberry), Apple (iPhone), and Microsoft (Windows CE devices), cell phones with PDA functionality are quickly becoming the main communications device that business persons use for both written (email, text messaging) and phone communications. While such a phone is still a secondary device for web surfing (and that is changing given the increased screen resolution and QWERTY keyboards of such devices), there are many Internet-connected business applications in use for crucial functions such as data analysis, database input and reporting, and interfacing with servers. The email and business applications on such "cell phone" devices make them, in effect, the smallest form factor of laptop computer. In other words, cell phone devices have progressed to the point that they have business communications functionality similar to that of a laptop computer and, given their small size, they are in many instances used more often.
How do you extract cell phone data for e-discovery?
There are many ways to extract cell phone data, some more elegant than others. Indeed, with some cell phone/PDA devices a large part of the data (but not all), including email, can be synchronized to a central server by using, for example, the functionality found in Blackberry Server and Microsoft Exchange.
There are also applications that, one device at a time, use the data port on the cell phone to communicate with the USB port on a laptop running specialized software that runs the extraction. In this category a new product called the Cell Seizure Investigator (CSI) Stick from Paraben shows some promise.
Paraben's CSI Stick is a portable cell phone forensic and data gathering hardware and software tool. According to Paraben, the way it works is that you select the colored cell phone tip (compatible with the communications port) for the cell phone model to be acquired, plug the power adapter in, plug the CSI Stick into the cell phone, and press the acquire button, and the extraction begins.
The CSI Stick contains a switch where you pick the level of data extraction including:
- A logical copy gets all available active data (including text and multi-media files)
- The text filter copies all SMS and text messages, phonebooks, and call logs
- The multi-media filter copies all available pictures and movies
- A physical copy gets all memory on the device
In order to use the data extracted by the CSI Stick you will need to own a copy of Paraben's Device Seizure or Device Seizure Lite. According to Paraben "these advanced forensic analysis tools enable you to view, search, and report on data extracted from handheld devices."
The CSI Stick currently needs to beef up its cell phone and PDA device support as it only handles certain Motorola and Samsung phone models but Paraben indicates that more manufacturer support is coming soon. Frankly, Blackberry and iPhone support are very much needed given the popularity of such devices.
On the flipside, it seems that a large organization faced with the need to preserve a huge installed base of cell phone data will need to look to some other technology - more likely one that is centralized with synchronization. Such a software application will need to run over a large number of devices like the Blackberry Server and provide the same level of extraction as the CSI Stick - so that larger organizations with a large installed base of cell phone and PDA users can have a practical method of preserving cell phone/PDA data on demand - for example in a litigation hold climate.
The CSI Stick makes cell phone evidence gathering and analysis available to the masses including lawyers, investigators, and all others in the e-discovery chain who should consider it when faced with the daunting challenges of the current state of e-discovery and related investigations.
With all the hoopla over the revised Federal Rules dealing with electronic discovery, one must not lose sight of the fact that the vast majority of cases in the United States get tried in state courts. While a number of states have used the Federal Rules as a guide, a number of states have enacted statutes covering e-discovery issues.
The K&L Gates e-discovery site recently published a useful list of current state e-discovery statutes - I found it helpful so I am reproducing the list below - here is a link to the K&L Gates site. I have seen a draft version of e-discovery statutory changes from California so don't be surprised if that state will soon be added to the list.
Amendments to Rules of Civil Procedure 16, 26, 26.1, 33, 34, 37 and 45
Connecticut Practice Book, Superior Court - Procedures in Civil Matters
Sec. 13-9. Requests for Production, Inspection and Examination; In General (see subsection (d), at p. 192 of 259-page .pdf document)
Idaho R. Civ. P. 34
Illinois Supreme Court Rules 201(b)(1) and 214
Amendments to Rules of Trial Procedure 26, 34 and 37
Rule 26. General provisions governing discovery
Rule 34. Production of documents, electronically stored information, etc.
Rule 37. Failure to make or cooperate in discovery; Sanctions
CCP 1424 - Scope of discovery; trial preparation; materials
CCP 1460 - Option to produce business records
CCP 1461 - Production of documents and things; entry upon land; scope
CCP 1462 - Production of documents and things; entry upon land; procedure
Amendments to Rules of Civil Procedure 16, 26, 33, 34, 37, 45
Miss. R. Civ. P. 26(b)(5)
Mont. R. Civ. P. 16(b). Scheduling and planning
Mont. R. Civ. P. 26(b). Discovery scope and limits
Mont. R. Civ. P. 26(f). Discovery conference
Mont. R. Civ. P. 33(c). Option to produce business records
Mont. R. Civ. P. 34(a). Scope
Mont. R. Civ. P. 34(b). Procedure
Mont. R. Civ. P. 37(e). Electronically stored information
Mont. R. Civ. P. 45(a). Form - issuance
Mont. R. Civ. P. 45(c). Protection of persons subject to or affected by subpoenas
Mont. R. Civ. P. 45(d). Duties in responding to subpoena
Superior Court Rule 62. (I) Initial Structuring Conference (see subsection (C)(4))
Part IV - Rules Governing Civil Practice in the Superior Court, Tax Court and Surrogate's Courts
Rule 1:9. Subpoenas
Rule 4:5B. Case Management; Conferences
Rule 4:10. Pretrial Discovery
Rule 4:17. Interrogatories to Parties
Rule 4:18. Discovery and Inspection of Documents and Property; Copies of Documents
Rule 4:23. Failure to Make Discovery; Sanctions
Uniform Civil Rules of the Supreme and County Courts, § 202.70 Commercial Division of the Supreme Court
See Rule 8. Consultation prior to Preliminary and Compliance Conferences
Local Rules of North Carolina Business Court
See Rule 17.1 - Case Management Meeting
See Rule 18.6 - Conference of Attorneys with Respect to Motions and Objections Relating to Discovery
Tex. R. Civ. P. 196.4 Electronic or Magnetic Data
Utah R. Civ. P. 26. General provisions governing discovery
Utah R. Civ. P. 33. Interrogatories to parties
Utah R. Civ. P. 34. Production of documents and things and entry upon land for inspection and other purposes
Utah R. Civ. P. 37. Failure to make or cooperate in discovery; sanctions
Utah R. Civ. P. 45. Subpoena
Effective November 1, 2007
The inevitable e-discovery collision has occurred in the Torrentspy case (entitled Columbia Pictures et al v. Bunnell et al in which the author is defense counsel) between the desire of litigants to procure data that may be available through transient RAM and what in essence constitutes a tangible "document" for purposes of responding to a request for production of documents.
Counsel involved in similar "RAM" e-discovery issues should research and be mindful of the core discovery principles at stake as the e-discovery cases evolve.
It is axiomatic that in response to a request for documents no litigant should be compelled to create, or cause to be created, new documents solely for their production. “Federal Rule of Civil Procedure 34 requires only that a party produce documents that are already in existence. Alexander v. FBI (D.D.C. 2000) 194 F.R.D. 305, 310.” Paramount Pictures Corp. v. ReplayTV (C. D. Cal. 2002) CV 01-9358 FMC (Ex) (filed May 30, 2002) 2002 WL 32151632.
The phrase “Electronically Stored Information” was added to Federal Rule of Civil Procedure 34 in 2006. The Advisory Committee Notes to the 2006 Amendment of Rule 34(a) state that the rule “applies to information that is fixed in a tangible form” and that the definition “is expansive and includes any type of information that is stored electronically.” The Notes are silent as to any compact unity or functional integrity that the information must have in time or place of fixation. In the light of a generally cautious approach, it would appear that the silence is intentional. (“The wide variety of computer systems in use, and the rapidity of technological change, counsel against a limiting or a precise definition.”)
Counsel should be cautious as to any request for documents that seeks "documents" that only come about through the reduction of transient RAM to a different format "fixed in a tangible form" - especially if the production of documents requires the ongoing collection of data from transient RAM, gathering the data together into one or more files and then storage of the files in a tangible medium. Such a request for documents that requires the reduction of transient RAM to another format "fixed in a tangible form" appears to deviate from Federal Rule of Civil Procedure 34 which seems to mandate that such documents "already" be fixed in a tangible form as a condition of the requirement of production.
The Federal District Court gave a practical rule of determination in ReplayTV, supra:
“A party cannot be compelled to create, or cause to be created, new documents solely for their production. Federal Rule of Civil Procedure 34 requires only that a party produce documents that are already in existence. Alexander v. FBI (D.D.C. 2000) 194 F.R.D. 305, 310.” Further:
“It is evident to the court, based on Pignon’s declaration, that the information sought by plaintiffs is not now and never has been in existence. The Order requiring its production is, therefore, contrary to law. See National Union Elect. Corp. v. Matsushita Elec. Indust. Co., 494 F.Supp. 1257, 1261 (E.D. Pa. 1980).” (Footnote omitted.)
The phrase “Electronically Stored Information” was added to Federal Rule of Civil Procedure 34 in 2006, and at the time of this writing there are few if any cases describing the contours of what "electronically stored information" is and what is considered "tangible". Given the lack of detailed legislative and judicial guidance regarding the revised Rule 34 there is a need for courts to provide clarity on how litigants should conduct themselves (especially when it comes to transient RAM data) in order to: create predictability regarding e-discovery compliance, reduce the ambiguity of data preservation letters, avoid the potentially harsh economic and social consequences of e-discovery over-compliance, and minimize the burden on the Judiciary caused by parties rushing to Court in a chaotic manner with motions to compel, spoliation claims, and motions for protective orders.
Is Data in RAM a Document that Must be Preserved and Produced in Response to a Request for Documents?
Torrentspy today appealed the Magistrate Judge's Log File Order. In this case, which I am handling, we asked the court in our brief to reverse or modify an e-discovery ruling that finds data in RAM to be a document that needs to be preserved and produced in response to a request for documents in civil litigation. Here is an article from Law.com and News.com on the matter.
We argued in our appeal, amongst other things, that:
Torrentspy has never had server logging turned on and thus log files were never created. The Magistrate committed error when she ordered the creation of log files, as it is axiomatic that in responding to a request for documents one need not be compelled to create documents. The Magistrate reasoned that such log file data was in existence as it (or the “HTTP header information”) was present in RAM for a transient moment (as it is, frankly, on every web server in the world) and then the Magistrate used copyright law to decide in essence that RAM was sufficiently tangible to constitute a document for purposes of preservation duties and production of documents. The Magistrate committed error in finding that RAM was sufficiently tangible to constitute a document for purposes of document production. Transient RAM or “random access memory” is ipso facto ephemeral and there is no reported case in federal civil discovery to support such a radical view that transient RAM constitutes a document for purposes of production.
The privacy issues from a civil logging Order are real in this context - logging would have an enormous chilling effect on site use and the user experience. The good faith nature of this privacy issue can be summed up in one question: What users are going to want to use the Torrentspy.com search engine when they know that Torrentspy.com has been ordered by a Federal Court to log what is in essence user clickstream tied to their IP address?
Respectfully, the Magistrate’s order, “turn on logging” or “write software” amounts to a de facto injunction, without any finding of a likelihood of success on the merits and without any bond – such an injunction order not only violates due process but it also exceeds the Magistrate’s jurisdiction. There are no copyrighted works on the Torrentspy site or linked to by the site and there had been no finding on the merits of infringement or affirmative defenses such as the DMCA - if Torrentspy succeeds in its defense but the Order is not reversed there would be no bond to compensate it for the burdens and the lost users who were chilled from using the site under fear of being tracked and logged.
Such an order is also vastly overbroad as it is designed to turn on logging for “all” site visitors. The plaintiffs sued regarding roughly fourteen US copyrighted works. Initiating a download of a torrent file from a third party server does not demonstrate that the download was successful. According to the evidence even if the torrent file download was successful it does not demonstrate that the torrent file downloader loaded the torrent file into the specialized software program needed to attempt to fetch an audiovisual file or that such attempt was successful or that the user fetched one of the copyrighted works being sued upon.
Such overbreadth of the Order can also be found in that the un-rebutted evidence demonstrated that the vast majority, over 70% of the site visitors to the Netherlands Torrentspy servers are from outside the United States. Over 90% of the bit torrent trackers are located outside the United States. In essence, the vast majority of site visits to Torrentspy.com cannot lead to primary infringements under US Copyright statutes “as a matter of law” as the site visits and such conduct after the site visits occur wholly outside the United States and do not fall under this Court's subject matter jurisdiction.
Does a defendant in litigation have an obligation to store, preserve, reduce to a more permanent form, and produce in electronic discovery data that is available only in transient RAM?
According to an Order made available today - the answer is at least sometimes yes. The Order is being appealed and is stayed pending appeal.
Indeed, the Federal Court in Los Angeles today granted Torrentspy.com's request for a stay of an e-discovery order pending appeal that in essence found that since user HTTP header information is in transient RAM on Torrentspy's servers that such data was tangible enough to be considered "documents" that needed to be "stored" in log files, preserved, and handed over in civil litigation e-discovery. Torrentspy.com did not have server logging "turned on" prior to litigation and therefore the Court in essence ordered that such server logging of user activity commence as part of its discovery order in response to a motion to compel production of documents.
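For readers less familiar with web servers, the distinction at issue can be illustrated with a short Python sketch. It is a simplified illustration only and makes no claim about how Torrentspy's servers were built: the function `handle_request`, its parameters, and the log line format are all hypothetical. The point it shows is that request data lives in memory while a request is being answered, and is written to storage only if a separate logging step is added.

```python
import io


def handle_request(client_ip, headers, log_stream=None):
    """Answer one request using data that is held only in memory.

    client_ip and headers exist as in-memory values for the length of
    this call. With log_stream left as None, nothing about the request
    is written anywhere and the values are discarded on return.
    """
    user_agent = headers.get("User-Agent", "unknown")
    response_body = f"Results page for {headers.get('Host', 'unknown host')}"

    if log_stream is not None:
        # "Turning logging on": copy selected request data out of memory
        # into a persistent record (hypothetical one-line format).
        log_stream.write(f"{client_ip} {headers.get('Host', '-')} {user_agent}\n")

    return response_body


# Usage: the same request handled with logging off, then on.
request_headers = {"Host": "example.test", "User-Agent": "ExampleBrowser/1.0"}
handle_request("203.0.113.5", request_headers)          # no record is kept
log = io.StringIO()                                      # stands in for a log file
handle_request("203.0.113.5", request_headers, log)      # one line is recorded
```

In the first call no file or record results; in the second, a new record comes into existence that did not exist before the logging step was added.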
You should note that in full disclosure this is a case I am working on and defending. But the "RAM" e-discovery Order is so important as an issue of first impression that I feel compelled to write about it on this site which focuses on e-discovery.
Here is a News.com story on the Order.
We will be appealing the Order as, in my view, it requires defendants to "create" documents, not merely "provide" documents. In responding to requests for documents one does not have an obligation to create documents but merely to provide documents already in existence.
The Order - "turn on logging" - or "write a program to convert the data in RAM into log files" - also seems like a de facto injunction without a bond under the guise of a discovery order.
In my view if this Order as it relates to "RAM" is not quickly overturned or narrowed the worldwide social and economic consequences can be staggering. You can imagine the e-discovery preservation letters that will start being flung around warning opposing parties to preserve "relevant" things found in transient computer RAM or risk spoliation claims - the result will likely be fear and over-compliance (if one had the staff to reduce stuff in RAM to a log file) - it would follow that corporate IT and e-discovery costs would go way up as allegedly relevant data in RAM will need to be reduced to permanent storage based on one merely being a party to a case (or about to be a party).
It would also follow that consumer privacy would take a step backwards since just being a defendant with a privacy centric web server may be enough to be forced to turn server logging on and such logs handed over in discovery to the other side (without any finding that a defendant did anything wrong or is likely to lose the case).
So much for any right to surf anonymously. This may chill users "worldwide" from going to a US site that is involved in litigation and had to turn logging on.
Wait a second - a client called who was in the process of typing up some thoughts using Microsoft Word and erased something in his "open" document - it was a serious thought he had about crucial evidence related to a case - it was in transient RAM for an hour and not saved in a file yet - let me see - was that RAM thought supposed to be preserved before he pressed the delete key and saved the final version? Let us do a balancing test to figure this out.
In the course of analyzing web sites and pages in your electronic investigations you will likely come across important evidence that you will need to preserve - from web site text manifesting trademark infringement to web site photos related to copyright infringement or other illegal conduct to web site files demonstrating security breaches.
You may need to analyze, after the fact, the way certain web pages and sites appeared on a certain date and time. You may also need to use certain web site manifestations in evidence in Court.
Given the dynamic and voluminous nature of the web, it raises the question:
How do you preserve web pages and whole web sites for later use as evidence?
The answer can be found in web spider and crawler software programs you can use to "mirror" web pages and whole sites for a given date and time. The terms web spider and web crawler (and web robot) are used interchangeably and in essence mean the same thing - a program or script that browses the world wide web from link to link in an automated fashion.
Generally speaking most web spider and crawler "capture" programs work the same way - you provide a starting URL, presumably either the "front page" of the site or a deep link for the main offending page and then you "tell" the program how many levels deep of the site or third party sites you want to spider and capture.
You can use web spider software to capture a single page or an entire web site, or even to follow links to third party sites. You need to be careful when you input capture criteria and start the spidering software, as the amount of data captured can grow exponentially with the number of levels you crawl, the linked pages you store, and whether you follow links to third party sites and servers.
You should also be mindful of a side effect of web spidering: in some instances you may place a large load on a target web server, which gets "pounded" with requests from your IP address, and that may prematurely reveal your investigation.
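The depth-limited capture described above can be sketched in a few lines of Python. This is a minimal illustration and not how any of the products named on this site work: the `crawl` function, its `fetch` parameter, and the `delay` setting are hypothetical names introduced here. The page source is passed in as a function so the sketch can be tried against canned HTML before it is pointed at a live server.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=1, follow_external=False, delay=0.0):
    """Breadth-first capture from start_url, at most max_depth links deep.

    fetch is any function that takes a URL and returns HTML text or None.
    Returns a dict mapping each captured URL to its HTML.
    """
    start_host = urlparse(start_url).netloc
    captured = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        time.sleep(delay)  # pause after each request to limit load on the target
        if html is None:
            continue
        captured[url] = html
        if depth >= max_depth:
            continue  # page is stored, but its links are not followed
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href).split("#")[0]
            if target in seen:
                continue
            if not follow_external and urlparse(target).netloc != start_host:
                continue  # skip third party sites unless asked to follow them
            seen.add(target)
            queue.append((target, depth + 1))
    return captured
```

Each extra level multiplies the number of pages by roughly the number of links per page, which is why a small `max_depth` and `follow_external=False` are the conservative defaults in this sketch.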
I have found that in many instances, especially in the civil litigation context, the simplest and narrowest method of spidering is usually sufficient - especially when coupled with a web video produced using a software tool like Camtasia Studio providing a video exemplar of web site browsing clearly showing the offending content.
In other words, in the current civil litigation climate it may be enough to use the built-in "web capture" tool in Internet Explorer or in Adobe Acrobat to capture a small number of pages or a single web page, and such an approach will rarely be seriously disputed from an evidentiary perspective. The added advantage in using Adobe Acrobat for web spidering and capture is that you can bates stamp the resulting PDF pages and have your e-discovery document production of the target web pages ready to go in an easy to use format.
Adobe Acrobat does a conversion of the pages from their native format and thus may not be suitable if anything other than the general manifestation of the visible content on the target web pages is at issue - so choose a more robust solution if, for example, the underlying page formatting, page metadata, site directory structure, link structure, or dynamically generated code for the respective target web pages is at issue.
But, again, rarely does someone waste time challenging the integrity or admissibility of the Internet Explorer or Adobe Acrobat capture of the target web page(s) manifestation. If they do challenge the integrity of such a spidering effort then subpoena or request the production of documents to get a mirror or copy of the legacy web code to see if any of the alleged differences are material to the issues in dispute.
More complex spidering tools allow for much more accurate page and site capture and are particularly useful for copying a large number of web pages and sites over a long period of time and fully automating the process - including periodic programmatic review of the target web sites for changes and subsequent copies. Some of the leading programs in this category include Grab-a-Site, Web Copier, and WebWhacker.
Programs like Grab-a-Site use specialized methods of web spidering and capture that preserve the integrity of the original site and thus may reduce or mitigate attacks on evidence admissibility. Some of these methods include maintaining actual filenames, server directory structure, and Unix compatibility.
WebWhacker may change the filenames in the spidering and capture process but does allow, using a proxy server technology, for a relatively realistic off line simulation of the online browsing experience for the captured site. Thus there is an argument that you should use both Grab-a-Site and WebWhacker to capture a target web site given the pros and cons of each.
If you use Linux you may want to consider using the "wget" command - it is free and there is also a version for Windows. For example, to capture for your own off line review this site moredata.com you can use the command:
% wget -m http://www.moredata.com
If there are problems because internal links on the mirrored site are absolute links pointing back to the web, that can be handled by adding the -k switch to the wget command:
% wget -m -k http://www.moredata.com
The above change in the wget command will likely change the links on the captured site so that they are no longer identical to the original site. Thus, if you want to err on the side of evidentiary conservatism you may want to mirror target sites and pages using both wget methods around the same period of time.
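If you do mirror a site both ways, it can help to document exactly which captured files differ between the two copies. The Python sketch below is one way to do that using the standard library's byte-for-byte file comparison; the function name `compare_mirrors` and the directory names in the usage note are hypothetical.

```python
import filecmp
from pathlib import Path


def compare_mirrors(mirror_a, mirror_b):
    """Compare two mirror directories file by file.

    Returns two sorted lists of relative paths: files present in both
    mirrors whose bytes differ, and files present in only one mirror.
    """
    root_a, root_b = Path(mirror_a), Path(mirror_b)

    def relative_files(root):
        return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}

    names_a, names_b = relative_files(root_a), relative_files(root_b)
    changed = [
        name for name in names_a & names_b
        # shallow=False forces a byte-for-byte comparison of file contents
        if not filecmp.cmp(root_a / name, root_b / name, shallow=False)
    ]
    return sorted(changed), sorted(names_a ^ names_b)
```

For example, `compare_mirrors("mirror_plain", "mirror_converted")` would return the pages whose links were rewritten in the second capture, which you can then note in your capture report.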
It is important to remember that unless you ethically hack a server your web spidering software cannot capture much of the web site source code - like server side scripts and therefore depending on the issues in your case you may need to use litigation methods like subpoenas and document requests to get the original web source code. Indeed, the web spidering software is usually limited to what could be manifested in a user's browser and thus usually static HTML, browser side scripts, or what is dynamically generated by the web site's underlying source code - but often this is enough evidence to support the material allegations of illegal conduct on a given web site.
If you need to get access to historical manifestations of web pages and sites you can try searching the Google cache or use the Wayback machine.
You can search the Google cache by using the method "cache:URL". Here is an example of this site's home page in the Google cache.
Here is an example of my law firm web site's home page in the Wayback machine.
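If you need to look up many pages for a given date, the Wayback machine address can be built programmatically. The sketch below assumes the publicly visible address pattern of a 14-digit timestamp followed by the original URL; the helper name `wayback_url` is hypothetical, and the archive decides which capture, if any, is closest to the requested time.

```python
from datetime import datetime


def wayback_url(page_url, when):
    """Build a Wayback machine address for page_url near the given datetime."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # YYYYMMDDhhmmss
    return f"https://web.archive.org/web/{stamp}/{page_url}"


# Usage: ask for the capture nearest to June 1, 2007.
address = wayback_url("http://www.techfirm.com/", datetime(2007, 6, 1))
```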
Given the likelihood that you will need to find, mirror, and preserve electronic evidence obtained from the web you should seriously consider acquiring as part of your electronic investigation toolkit the proper web spider software to accomplish such a task.
During the course of an electronic investigation you may be called upon to plan the forensic analysis of a target's computer system and hard drive. The target hard drive may contain a range of data relevant to an investigation, from emails making overt admissions to trace evidence of files that someone "attempted" to delete and that, when "undeleted", may prove to be incriminating.
The general flow chart for a computer forensic investigation can be summarized with an acronym ISUPR and is as follows:
Image - where the target hard drives are cloned
Search - where keyword searches for relevant evidence are performed if feasible (i.e., on non-image files)
Undelete - where you restore files that were deleted if enough of the underlying data is still present
Preview - where you use a universal viewer to review potentially relevant files, data, and images
Report - where you report on your methods and results
I will deal with the first stage here - Image the target hard drive(s).
The Image stage is perhaps the most exciting stage, where an investigator needs to get actual access to the target's computer system and in essence "clone" the target's hard drive(s). The access to the target's hard drive must be lawful or the investigator may, ironically, be in violation of applicable law such as the Computer Fraud and Abuse Act. Some examples of lawful access include access via a valid search warrant or subpoena, pursuant to a contract, and, when appropriate, in the case of an employee using an employer's PC.
The imaging process must also be forensically sound, well documented, and comply with applicable rules of evidence, including but not limited to, maintaining the chain of custody.
The National Institute of Standards and Technology (NIST) under an agreement with US Department of Justice has come up with some mandatory requirements for a forensically sound hard drive imaging tool and they are:
- The tool shall make a bit-stream duplicate or an image of an original disk or partition.
- The tool shall not alter the original disk.
- The tool shall be able to verify the integrity of a disk image file.
- The tool shall log I/O errors.
- The tool’s documentation shall be correct.
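The integrity requirement in the list above is commonly met by comparing cryptographic hashes. The following Python sketch assumes the image has been written to an ordinary file and that a reference digest was recorded at acquisition time. The function names and the choice of SHA-256 are illustrative assumptions and are not part of the requirements listed above.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that a large disk image does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, recorded_digest):
    """True if the image file still hashes to the digest recorded earlier."""
    return sha256_of_file(path) == recorded_digest.strip().lower()
```

Recording the digest in your acquisition notes, and re-checking it with `image_matches` before each later stage, gives you a simple documented answer if the integrity of the image is questioned.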
There are a variety of software and hardware tools that are used to Image a target's hard drive. The US Department of Justice has tested various software and hardware hard drive imaging tools to determine if they are forensically sound and has provided such results. I will not endorse any hard drive imaging tool here or pass judgment on whether they comply with the above requirements but rather provide you with a list in no particular order of commonly used computer forensic hard drive imaging tools for your convenience:
You need to take care to handle the computer forensic Image stage in a technically sound and legally compliant manner since there is little chance to fix any mistakes made in this stage or to "unring the bell". If a mistake is made in the hard drive imaging process or an anomaly is found, then a proper cross examination or other legal attack can possibly lead to the exclusion of such evidence. On the positive side, if a technically and legally valid hard drive image or "clone" is made, most mistakes made after the Image stage, such as in "undeleting" erased files, can be fixed or redone on the fly with little or no consequence other than time.
The United States is a great melting pot of a multi-cultural world which is often reflected in the names of people who live and transact business there. Diversity of cultures in the US has manifested in a diversity of names that have not fit squarely in the "western" or "roman" first name, middle name, surname paradigm.
The lack of consistency of people "names" creates challenges in electronic investigations and e-discovery.
For example, there can be multiple variations of the same individual's name across Asia or the Middle East. If you introduce misspellings the global name recognition problem is compounded.
How do you analyze in an automated fashion huge amounts of data to identify an individual if you do not know all the multi-cultural keywords or name variations for a particular individual?
IBM has an answer: it offers a cross platform suite of AI software for Global Name Recognition consisting at its core of Global Name Management, Global Name Scoring, and Global Name Analytics. The IBM Global Name Recognition software uses over 20 years of linguistic research and analysis and provides robust information for given "input" foreign names including cultural classification, gender, and possible alternative westernized spellings.
In addition IBM's Global Name Recognition software helps discern spelling errors, provides for a huge multi-cultural knowledgebase, and ranks search results amongst other things.
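To give a feel for what "name scoring" means, here is a very rough sketch in Python. It is not IBM's algorithm and has none of its linguistic knowledge. It only assumes that a plain character similarity ratio, computed with the standard library's `difflib.SequenceMatcher`, can rank spelling variants of a name. The helper names and the 0.7 threshold are hypothetical.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, drop hyphens and periods, and collapse whitespace."""
    cleaned = name.lower().replace("-", " ").replace(".", " ")
    return " ".join(cleaned.split())

def name_score(a, b):
    """Similarity between two names, from 0.0 (no overlap) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def rank_candidates(query, candidates, threshold=0.7):
    """Return (score, name) pairs at or above the threshold, best match first."""
    scored = [(name_score(query, c), c) for c in candidates]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)

# Hypothetical names pulled from a document set.
for score, name in rank_candidates("Mohammed", ["Muhammad", "Mohamed", "Robert"]):
    print(f"{score:.2f}  {name}")
```

A character ratio like this catches simple misspellings and transliteration variants but knows nothing about culture-specific naming conventions, which is the gap the dedicated software is meant to fill.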
In this connected world it would be wise for the electronic investigator and e-discovery attorney to have access to global name recognition software to save investigation time and money and to increase accuracy.
In the course of an investigation learning the various web sites someone visited along with screenshots and the date and time can be useful evidence.
For example, in a trade secrets case a senior level employee may have attempted to bypass company email to use his private "web email" to communicate regarding stolen trade secrets. Getting screenshots of such web based email pages and related communications would be helpful to prove the theft of trade secrets.
In another example, an employee visited a web site to download illegal content in violation of a company's policies and subsequently erases such content from the "my documents" area and denies the download.
Obviously to "forensically" obtain web browsing history would be helpful in investigating matters involving illegal downloading or to help explain or verify other wrongful conduct.
How do you reconstruct web browsing history? Deep in the caverns of the Windows operating system reside "temp" and "cache" directories and files, some of which are "hidden", that contain, usually by default, the data needed to reconstruct a robust web browsing history. For most users this type of data is relatively inaccessible in that it is hard to find and harder to reassemble into a useful form.
There are software tools available to computer forensic investigators to automatically reassemble web browsing history. One well established tool is NetAnalysis which scans a PC for the hard to find files stored in areas like temp and cache directories and processes the files and attempts to reassemble them into a web browsing log complete with screenshots and time and dates of visits. NetAnalysis goes further and even has functions that will attempt to find data in the file "slack" and other hard to analyze storage areas. NetAnalysis also includes a robust report writer feature that summarizes web browsing history by useful criteria, including date, time, and URL.
On the flipside, there are multiple tools to "wash" a computer of "hidden" web browsing history, most notably Webroot's Window Washer software, which also allows for "bleaching", a fancy word for military standard deletion with multiple overwrites.
From an investigation perspective there is a bit of a race - keep in mind that you had better capture the hard drive to run NetAnalysis before the alleged wrongdoer runs Window Washer.
Getting access to web browsing history can be an important part of an investigation and can make or break a case. NetAnalysis allows you to quickly, efficiently, and comprehensively forensically analyze hard to get at web browsing system files to recreate browsing history and optimize your evidence results.
During the course of analyzing electronic evidence in an investigation or in litigation you will inevitably be faced with password protected files for which the password is unavailable. You may need to ethically hack password protected files using hacker-like software tools.
Given the evolving intrusive nature of e-discovery and electronic investigations it will likely become more commonplace for employees, executives, and investigation targets to attempt to gain some "perceived" preemptive privacy by the use of password protection on electronic files.
For example, in a trade secrets case you may find a suspicious password protected PDF file as an attachment in an email in the Outlook "sent box" of a top research scientist employee and that scientist is no longer around or cooperative. Access to the contents of the “locked” PDF file may be crucial to determine if the employee sent the competitor trade secrets.
In another more benign example, a top executive may have misplaced login information to a key Windows server hosting a document production in a pending litigation. Ethically hacking the “locked” Windows server to gain access to the stored files can save a huge amount of time and money in the document production process.
Needless to say, investigators and attorneys should be careful to fully evaluate lawfulness in the given context before deciding to “crack” the password to a target file or operating system. There are numerous state and federal laws which may prohibit unauthorized access to files and systems including, but not limited to, the Federal Computer Fraud and Abuse Act, the Federal ECPA, and state privacy, anti-spyware, and anti-hacking statutes like California's Consumer Protection Against Computer Spyware Act.
You may need to use brute force, dictionary attack, or “common vulnerability” techniques to crack the passwords of and gain access to the contents of a target file or system.
Here is a shocker for many non-tech lawyers and investigators – most of the popular file format and operating system passwords can be hacked in minutes using techniques like “brute force” or “dictionary attacks” amongst others. Indeed, researcher Philippe Oechslin developed such an optimized brute force cryptanalytic technique that he was able to hack Microsoft Windows password hashes in about 13.6 seconds.
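For readers who want to see why a dictionary attack is so fast, here is a minimal sketch in Python. It assumes, purely for illustration, that the protected item stores an unsalted SHA-256 hash of the password. Real file formats and operating systems use their own schemes, and this is not how any named product works. The function name and the small set of word variations are hypothetical. It should only be run against data you are lawfully authorized to access.

```python
import hashlib

def dictionary_attack(target_hash, wordlist):
    """Try each word plus a few common variations; return the match or None."""
    for word in wordlist:
        # Assumed variations: as-is, capitalized, and with common numeric suffixes.
        for candidate in (word, word.capitalize(), word + "1", word + "123"):
            digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
            if digest == target_hash:
                return candidate
    return None

# Hypothetical target: the stored hash of an unknown password.
stored = hashlib.sha256(b"summer123").hexdigest()
print(dictionary_attack(stored, ["winter", "spring", "summer"]))  # summer123
```

The point of the sketch is that each guess costs one hash computation, so a wordlist of millions of entries can be exhausted very quickly when passwords are drawn from ordinary words.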
Common vulnerability access to password protected files is both a recognized method of ethical hacking as well as a national security risk and thus the Department of Homeland Security National Cyber Security Division has created the National Vulnerability Database where you can query a large number of cyber security vulnerabilities for appropriate purposes.
Unless you have a lot of time on your hands if you need to gain lawful access to a password protected data file or system you will be better served to use an existing “ethical hacking” software tool which usually contains sufficient heuristics, from years of research, to determine an optimized method of cracking the password to a given file format or system.
The Elcomsoft Password Recovery Bundle is a comprehensive software package that allows authorized users and investigators to crack password protection and gain access to a large number of common business software file formats and operating systems including:
- Windows NT/2000/XP/2003 user-level security: advanced audit and recovery
- Windows PWL files, RAS/dial-up/VPN passwords, SYSKEY startup password, cached credentials, shared resources, Windows/Office CD keys, asterisk fields
- Windows 2000/XP/2003/Vista Encrypting File System
- Microsoft software: Word, Excel, Access, Outlook, Outlook Express, Internet Explorer, PowerPoint, OneNote, Project, Visio, VBA, Money, Mail, Schedule+, Microsoft Word and Excel
- Compression utilities (archives): ZIP/PkZip/WinZip, RAR/WinRAR, ACE/WinACE, ARJ/WinArj
- Corel WordPerfect Office: WordPerfect, QuattroPro, Paradox; WordPerfect Lightning
- Adobe Acrobat (PDF)
- ACT! (Symantec / Best Software / Sage)
- Lotus SmartSuite (Organizer, WordPro, 1-2-3 and Approach)
- E-mail clients (Microsoft Internet Mail And News, Eudora, TheBat!, Netscape Navigator/Communicator Mail, Pegasus mail, Calypso mail, Opera and others)
- Instant Messengers (ICQ, ICQLite, Yahoo!, AOL IM, Windows Live Messenger, Google Talk, Excite Messenger, Trillian and many others)
- Intuit Quicken, Quicken Lawyer and QuickBooks
Microsoft Word is currently the word processing software of choice for most individuals and companies. Many users are under the mistaken belief that the final version of the "visible" Word document is the only substantive content contained in the "saved file."
Beyond the visible document and hidden in Word files is data known as "metadata". Metadata can include things like revision history, authors, and "track changes" which reveals the evolution of a document and the various edits that led to the final Word file. According to Microsoft metadata found in Word files can include:
• Your name
• Your initials
• Your company or organization name
• The name of your computer
• The name of the network server or hard disk where you saved the document
• Other file properties and summary information
• Non-visible portions of embedded OLE objects
• Document revisions
• Document versions
• Template information
• Hidden text
This leads to two important action items - First, if you are creating and saving Word documents get rid of the metadata prior to circulating Word files and saving them for posterity - you can use a free software tool like Microsoft's "rhdtool" or a more robust product like Workshare Protect to remove metadata from Word and other Microsoft Office files. Better yet convert the Word file to a PDF document (and make sure the metadata is removed from the PDF file as well).
Second, if you get Word files in electronic discovery or via investigations you may want to analyze the Word file metadata for hidden information which may reveal substantive evidence in your case - you can use a software tool like Workshare Protect to analyze the document files and provide summary reports of the metadata.
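As a rough illustration of how document metadata can be read programmatically, here is a short Python sketch. Note the assumption: it targets the newer ".docx" (Office Open XML) format, which is a ZIP archive holding a `docProps/core.xml` part, and not the older binary ".doc" format discussed in this article. It is not how Workshare Protect or any other named tool works. The function names and sample values are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}

def read_core_properties(docx_file):
    """Return the core metadata fields found in a .docx file or file-like object."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found

def make_sample_docx():
    """Build a tiny in-memory package with made-up metadata for the demo."""
    xml = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s" xmlns:dcterms="%s">'
        "<dc:creator>Sample Author</dc:creator>"
        "<cp:lastModifiedBy>Sample Editor</cp:lastModifiedBy>"
        "<cp:revision>10</cp:revision>"
        "</cp:coreProperties>"
    ) % (NS["cp"], NS["dc"], NS["dcterms"])
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", xml)
    buffer.seek(0)
    return buffer

print(read_core_properties(make_sample_docx()))
```

A real review would look well beyond these few fields (tracked changes, comments, and embedded objects live in other parts of the file), which is where the dedicated tools earn their keep.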
For example, in one high profile matter great care was taken to remove comments and edited text metadata from a Microsoft Word document but some metadata remained with serious consequences.
In 2003 a memo was prepared by British Prime Minister Tony Blair's office to support the notion that UN weapons inspections were not working in Iraq and that military action was justified. The memo was used by US Secretary of State Colin Powell in support of military action in Iraq. The memo was posted on the Prime Minister's web site in Microsoft Word "dot doc" format.
Two investigative events occurred related to the Iraq memo Word file. First, Glen Rangwala of Cambridge University compared the memo to an article published by a US graduate student in 2002 and found that large portions were cut and pasted or copied - grammatical errors and all - from the graduate student's article to the memo. None of the copied text was credited to the original author.
Second, Richard M. Smith a privacy and security expert in the US downloaded the Word document from the Prime Minister's web site and extracted revision history metadata - ten revisions in all. The revision history supported the view that the Prime Minister's press office was deeply involved in the Iraq memo's preparation. For example, those involved in editing the memo likely included, based on the metadata, Murtaza Khan who was a press officer, Alison Blackshaw who was a personal assistant to the Prime Minister, John Pratt who worked at 10 Downing Street, and Paul Hamill who was a foreign office official. Unfortunately for the Prime Minister, the metadata seemed to point to his staff as the cut and paste technicians who may have copied portions of the US graduate student's article.
I went ahead and performed a similar type of analysis on the UK-Iraq memo Word file using Workshare Protect's metadata analysis tool and here is a partial screenshot of what I found which is consistent with Richard M. Smith's metadata analysis.
The UK-Iraq memo metadata debacle is a powerful demonstration why attorneys and investigators must be sensitive to the metadata issue in everyday communications as well as in e-discovery and should acquire the appropriate tools for removing metadata and analyzing metadata in the appropriate context.
Combining the power of satellite global positioning system (GPS) technology and 8 megapixel digital picture quality Ricoh has created a useful camera device for electronic evidence field investigators.
There are many times during the course of gathering electronic evidence that a field investigator needs to solidify the "chain of custody." One of these steps usually includes taking before and after photographs of the "target" computer system where data was gathered - in essence to support, amongst other things, later testimony on the identification and location of the target system.
Now, those involved in gathering electronic evidence and chain of custody data have a new friend, a satellite GPS capable digital camera known as the Ricoh 500SE which will record the latitude and longitude where each photo was snapped, along with the date and time, and log it along with the photo metadata in real time. The Ricoh 500SE also contains a GPSlock feature allowing it to record the location of the target being photographed as opposed to the location of the camera.
The Ricoh 500SE can send digital pictures and related data to other devices using bluetooth technology.
You can certainly see a scenario in Court where the digital photos and related GPS data are superimposed on top of a map providing for an impressive summary display of an electronic evidence investigation.
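Plotting photos on a map generally requires decimal coordinates. Here is a small Python sketch of that conversion. It assumes the camera writes its position into the standard EXIF GPS fields as degrees, minutes, and seconds plus a hemisphere letter. How the Ricoh 500SE actually stores its data is not covered here, and the function name and sample coordinates are hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    South latitudes and west longitudes come out negative, which is the
    convention most mapping software expects.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value

# Hypothetical photo location: 37 deg 30' 0" N, 122 deg 15' 0" W
latitude = dms_to_decimal(37, 30, 0, "N")
longitude = dms_to_decimal(122, 15, 0, "W")
print(latitude, longitude)  # 37.5 -122.25
```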
The use of laser and ink jet printer forensic analysis should be considered as an e-discovery tool to determine the origin of a printed document.
Sooner or later, if you take enough depositions, someone is going to deny they created or printed out a key unsigned document.
You can see the line of deposition questioning regarding a laser printed document:
Q: Mr. CEO did you draft this document (that has all the bad stuff in it)?
Q: Did you print this document on the laser printer attached to your PC at home?
Q: Did you print this document on the laser printer attached to your PC at your office?
Q: Have you even seen this document or copies of it before I just showed it to you?
There have been significant advances made in the past five years in analyzing the subtle variations in printed pages and matching them to a particular printer model and even a specific printer.
In one study forensic scientists used image texture analysis and banding characteristics in the printed page to help determine printer model and even the specific printer. In addition, in another study entitled "Printer Forensics using SVM Techniques" from the same group of scientists at Purdue University, font size, font type, and printer anomalies were used to statistically tie a particular printed page to a specific printer.
To be fair, confirming that a target person's laser printer created a particular document still does not prove conclusively that the same person drafted or printed the document - but it does create important circumstantial evidence. The circumstantial evidence becomes even more important if few if any other persons besides the target would have access to such printer.
So what is the action plan when faced with a witness who denies "printing out" an unsigned laser or ink jet printed document? Besides locking in the target person's testimony:
1. Write a "preservation letter" to demand the "other side" maintain in a secure location the target printers and original printed document at issue (and if feasible meet and confer on such printer issues at the initial "e-discovery" conference of counsel)
2. Retain a printer forensic expert
3. Attempt to get access, for your expert, to the original printed document if still available
4. Have the printer forensic analysis done to analyze and compare the target printer(s) and document(s) at issue
It was just a matter of time before the lowly laser printer would be engulfed in the storm of e-discovery. Using forensic science to tie a laser printer to a printed document may be enough to provide substantial evidence in a case. Printer forensics should be given careful consideration as an additional tool in the e-discovery toolbox.
E-mail communications play a vital role in e-discovery and as core evidence in most complex litigations. In some instances it is important to verify, using e-mail header information and digital tracing techniques, the integrity of email messages prior to their admissibility.
In many cases the parties will, as they should under the revised Federal E-Discovery Rules, stipulate that certain emails were sent or received on certain dates and times and are authentic in nature and the only real issue for evidentiary purposes is relevance.
But what do you do when you have reason to believe that an email is a forgery or a spoof?
How about if the timing of an email is important - what time was it sent?
If the origin or timing of an email message is in issue you need to get a native production of email documents with the header information intact and analyze the header information. The header information of an email is a treasure trove of digital forensic information and it can allow you to learn where and when an email message came from. It also gives you the information you need to subpoena the sender's ISP to get even more information that can nail down the integrity and timing of an email message.
The typical email header will generally look like this:
Received: from purported hostname (hostname (host IP address))
by recipient hostname
with email protocol message ID
for recipient timestamp (GMT).
Care should be taken before "trusting" the "purported hostname" as this information is easily manipulated by the sender to be false.
The key thing is to discern the sender's host IP address. While even IP addresses in the header can be manipulated via the use of intermediate proxy servers, thus making the header more complex, the casual person attempting to forge or spoof the origin of an email may overlook the use of such technically sophisticated methods not thinking that one day such email would be held up to serious forensic scrutiny.
Once the sender's IP address is discerned you can do a reverse DNS lookup and determine the originating domain name and then use a Whois lookup to discern the IP's organization and contact information. The tracing of a header IP address back to an entity can be automated using software tools like NeoTrace Pro. This program allows you to trace an IP address or hostname to its source on top of a world map and the detailed Whois source data is displayed in an adjacent window.
The next step, if necessary and feasible, is to issue a subpoena to the sender's ISP gathered from the Whois lookup and to use the message ID(s) to get logs and other documents related to the email messages at issue.
Regardless of header complexity the main method of tracing an email on the Internet is usually the same:
1. Procure the email header information in native digital format.
2. Identify the IP address of the server used to send the email message(s) at issue.
3. Trace the IP address to discern the sender by a reverse DNS and Whois lookup or via tracing software like NeoTrace Pro.
4. Identify the message ID(s) at issue and if need be issue a subpoena to the ISP asking for data in their log files and on their servers related to such message ID(s).
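The first steps above can be sketched in a few lines of Python using the standard library's `email` package. This is only an illustration, not a forensic tool. The sample message, hostnames, and IP addresses are made up (they use reserved documentation ranges), and the function names are hypothetical. It relies on the fact that each mail server adds its "Received" line at the top, so the last "Received" header is the one closest to the sender.

```python
import re
from email import message_from_string

IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def received_hops(raw_message):
    """Return one dict per Received header, newest first, with any IPv4 addresses found."""
    message = message_from_string(raw_message)
    hops = []
    for header in message.get_all("Received", []):
        flat = " ".join(header.split())      # unfold multi-line headers
        hops.append({"ips": IPV4.findall(flat), "raw": flat})
    return hops

def origin_ip(raw_message):
    """IP address recorded in the oldest Received header, or None if absent."""
    hops = received_hops(raw_message)
    if hops and hops[-1]["ips"]:
        return hops[-1]["ips"][0]
    return None

def message_id(raw_message):
    """The Message-ID header value, or None if the message has none."""
    return message_from_string(raw_message).get("Message-ID")

# Made-up sample message for the demo.
SAMPLE = (
    "Received: from mail.example.net (mail.example.net [198.51.100.7])\n"
    "\tby mx.example.org with ESMTP id abc123\n"
    "\tfor <recipient@example.org>; Mon, 05 Mar 2007 10:15:02 +0000\n"
    "Received: from sender-pc (unknown [203.0.113.45])\n"
    "\tby mail.example.net with SMTP id def456;\n"
    "\tMon, 05 Mar 2007 10:15:00 +0000\n"
    "Message-ID: <20070305101500.1234@example.net>\n"
    "From: sender@example.net\n"
    "To: recipient@example.org\n"
    "Subject: test\n"
    "\n"
    "Body text.\n"
)

print(origin_ip(SAMPLE))    # 203.0.113.45
print(message_id(SAMPLE))   # <20070305101500.1234@example.net>
```

The extracted address would then go to a reverse DNS and Whois lookup. Keep in mind the caution above: anything in a header below the first server you trust could have been forged by the sender.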
Email evidence is now a crucial part of electronic evidence and discovery and care should be taken to procure and use native header information to determine the integrity of emails.
Cell phones have evolved into more than talking devices and are now in substance hand-held computing devices exposing such devices to "litigation holds" and e-discovery requests. More and more models of cell phones are running robust operating systems, like Palm, Symbian, Windows Mobile, and Blackberry OS. Cell phone data extraction kits have been developed to assist investigators and attorneys in obtaining and preserving key cell phone evidence.
Most current models of basic "free with service deal" cell phones act like miniature computers, run software applications, and are connected to the Internet. Current basic cell phone functions include Phonebook, SMS, Calendar, Memos, To Do Lists, Pictures, Video, and Audio, along with thousands of other small format applications like attachment viewers, mini spreadsheets, POP email, and instant messaging.
The issue of when a litigation hold must include the data on cell phone devices is still murky. Call data and text messaging are transient in nature and come and go from the cell phone's memory and cache very quickly. The transient nature of cell phone devices will certainly be a factor in any Court's calculus about the reasonableness of the applicability of a litigation hold to cell phone devices. The more the cell phone acts like a small laptop, like say a Windows Mobile device, with less transient storage the more likely the device needs to be considered for a litigation hold.
The more the cell phone data is important to the case, regardless of the transient nature of the storage, the more likely that such data would need to be mirrored and stored as part of a litigation hold. For example, I recently was involved in a case where a text message that auto deletes in a few days was at issue in a contract dispute - this is the sort of case where arguably the cell phone data needs to be preserved. In other situations knowing when a call was made or received can be important to a case. The list is virtually endless for the many ways cell phone data can be relevant to a dispute.
Given the popularity of cell phone use and the importance of data that can be extracted litigators and investigators should not overlook preserving the evidence as part of a litigation hold as well as requesting such evidence from the other side via e-discovery requests.
The Logicube CellDEK developed in cooperation with the UK's Forensic Science Service appears to be a state of the art "field" device to download and store a copy of the current data state of a cell phone device. The portable CellDEK® acquires data from over 200 of the most popular cell phones and PDA's using numerous different USB adapters specific to each handheld device. Connectivity by infra-red and Bluetooth are also built-in.
The CellDEK software automatically performs forensic extraction of the following data: Handset Time and Date, Serial Numbers (IMEI, IMSI), Dialed Calls, Received Calls, Phonebook (both handset and SIM), SMS (both handset and SIM), Deleted SMS from SIM, Calendar, Memos, To Do Lists, Pictures, Video, and Audio.
Paraben has an impressive competing product known as the Device Seizure Toolbox.
Using the proper method for cell phone data extraction is important for laying the proper foundation for admissibility - both Logicube's and Paraben's offerings take into consideration preserving the integrity of the data and verifying the accuracy of stored files using an MD5 hash paradigm amongst other things.
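To show what an MD5 hash check looks like in practice, here is a minimal Python sketch. It is not Logicube's or Paraben's implementation. It assumes the extracted items are available as named chunks of bytes, records an MD5 value for each at extraction time, and later reports any item whose content no longer matches. The function names and sample data are hypothetical.

```python
import hashlib

def build_manifest(items):
    """Map each item name to the MD5 hex digest of its content."""
    return {name: hashlib.md5(data).hexdigest() for name, data in items.items()}

def changed_items(manifest, items):
    """Names of items that are missing or whose content no longer matches the manifest."""
    return sorted(
        name for name, digest in manifest.items()
        if name not in items or hashlib.md5(items[name]).hexdigest() != digest
    )

# Hypothetical extraction results at the time of acquisition.
extracted = {"sms_0001.txt": b"Running late", "phonebook.csv": b"Alice,555-0100"}
manifest = build_manifest(extracted)

# Later, one item has been altered.
extracted["sms_0001.txt"] = b"Running late!"
print(changed_items(manifest, extracted))  # ['sms_0001.txt']
```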
Review of document productions in native digital format can result in premature expert disclosure due to invisible GIFs, also known as "web bugs", and other server "pings" if precautionary tactics are not taken.
There is a split of authority between Courts on whether or not native digital files complete with metadata need to be provided in response to e-discovery requests for documents. Some Courts have taken the view that "flattened" but searchable versions of digital documents should be produced, such as in TIFF or PDF formats, on the theory that most lawyers can review the documents without the need for experts and such formats are a good metaphor for the traditional 8.5 by 11 page. The resulting digital "pages" can be Bates stamped and printed for great ease of use at deposition and trial. After all, how do you refer to one of a million emails in an Outlook "pst" file at trial unless it is reduced ultimately to a tangible marked page of some sort?
Other Courts have taken the view that such digital documents should be produced as kept in the normal course of business - so for example - rather than converting Outlook "pst" files to PDF the native "pst" files should be produced with metadata intact.
There is a tendency by counsel in some instances to overlawyer by insisting on getting native digital files complete with metadata in document productions even when such "native" productions do not advance the case in any material manner.
However, few courts would likely oppose compelling production of native digital files when issues related to the integrity of the digital files, metadata, foundation, chain of custody, or forensic matters are actually at issue.
But stay alert to the unintended consequences of analyzing a native file document production. For example, if you load native pst email files into the Outlook software on your PC it is quite likely that the other side, if sophisticated, could know who is looking at such native files and when. It all starts with knowing that emails themselves act like mini web pages and image files in such emails, like a clear "one pixel by one pixel" GIF also referred to as an invisible GIF, can link back to servers which capture the IP address of the image manifesting PC.
For example a number of companies periodically send email that contains links to image files located on their servers - the images are in essence fetched when the email manifests in the inbox thus pinging the image host server. In some instances such links to invisible GIF files (or for that matter visible GIF files) let senders know if someone opened the email and will "radar back to the mothership" the viewer's IP address - the date and time are usually logged with it.
Historically such GIF technology allowed email marketers to track the effectiveness of their programs and maintain a database of "good" email addresses.
Others use such GIF technology to know if the recipient opened the email.
Naturally, if the entity that produced the documents in their native format included emails with GIF(s) linked to their servers they may be able to track the IP address of the reviewing PC. This of course creates a reverse privacy trap for the e-discovery analyzer: "the other side's" servers can capture the reviewer's IP address, and an IP lookup could then reveal the expert's organization, resulting in possible premature expert disclosure.
There are a number of safeguards that experts can employ to mitigate the invisible GIF problem. The review can be done behind an anonymous proxy server so the IP address reveals nothing of substance. Another approach is to include a robust firewall that would block the fetching of the GIF(s) from the remote server. The ultimate approach is to make sure there is no Internet connection on the e-discovery reviewing computer - this may prevent some of the "linked to" images from appearing in the email and web related content unless found in the cache. Those reviewing native digital document productions should take care not to inadvertently reveal their IP address and thereby make a premature expert disclosure.
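One more precaution is to scan the HTML body of produced emails for remote image links before opening them in a mail program. Here is a minimal Python sketch of that idea using the standard library's `html.parser`. The class and function names and the sample HTML are hypothetical, and flagging images declared as one pixel by one pixel is only a heuristic, since a tracking image can be any size.

```python
from html.parser import HTMLParser

class RemoteImageFinder(HTMLParser):
    """Collects <img> tags whose source points at a remote web server."""

    def __init__(self):
        super().__init__()
        self.remote_images = []   # list of (url, looks_like_web_bug)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = (attributes.get("src") or "").strip()
        if src.lower().startswith(("http://", "https://")):
            tiny = attributes.get("width") == "1" and attributes.get("height") == "1"
            self.remote_images.append((src, tiny))

def find_remote_images(html_body):
    """Return (url, looks_like_web_bug) for every remotely hosted image in the HTML."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    return finder.remote_images

# Made-up email body: one remote 1x1 image and one image embedded in the message.
body = (
    '<p>Quarterly update</p>'
    '<img src="http://tracker.example.com/open.gif?id=42" width="1" height="1">'
    '<img src="cid:logo" width="100" height="40">'
)
print(find_remote_images(body))
```

Because the scan works on the raw text of the message, it never fetches anything from the remote server, so it reveals nothing about who is doing the review.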
Steganography and E-Discovery: What will Courts and litigators do when spam contains hidden messages?
From time to time issues will arise under the evolving e-discovery rules that frankly seem pretty darn hard to resolve - the use of Steganography is one of them. Steganography is the art and science of hiding or obscuring messages so as to engage in covert communications. In terms of e-discovery and evidence issues steganography involves placing a hidden encrypted message in other data, usually a digital photograph, video file, audio file, or yes even spam. The recipient of the steganographic data would use a steganography key to unlock and decrypt the message.
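To demystify the idea, here is a bare-bones Python sketch of one of the simplest steganographic techniques, hiding a message in the least significant bit of each byte of a carrier. It is an illustration only. It leaves out the encryption step described above, it uses raw bytes as a stand-in for image pixel values, and it is not the method used by any product named on this site. The function names and the two-byte length prefix are assumptions of the sketch.

```python
def embed(carrier, message):
    """Hide message in the lowest bit of each carrier byte; returns new carrier bytes."""
    payload = len(message).to_bytes(2, "big") + message   # 2-byte length prefix
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for this message")
    result = bytearray(carrier)
    for index, bit in enumerate(bits):
        result[index] = (result[index] & 0xFE) | bit      # overwrite lowest bit only
    return bytes(result)

def extract(carrier):
    """Recover a message previously hidden with embed()."""
    def read_bytes(start_bit, count):
        data = bytearray()
        for n in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (carrier[start_bit + n * 8 + i] & 1)
            data.append(value)
        return bytes(data)

    length = int.from_bytes(read_bytes(0, 2), "big")
    return read_bytes(16, length)

# Stand-in "pixel" data; each value changes by at most 1, which the eye cannot see.
pixels = bytes(range(256))
hidden = embed(pixels, b"meet at 9")
print(extract(hidden))  # b'meet at 9'
```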
For example, using an inexpensive steganographic program, ill intentioned parties who do not want their communications handed over in any upcoming case can use a proxy server to send what appears to be spam back and forth, containing images that have embedded encrypted steganographic messages. What a nightmare for litigators on both sides of the case. The nightmare multiplies if the supposed spam e-mails contain links to third party web sites that manifest steganographic images.
The current mainstream thought under the revised e-discovery rules is that requests for e-discovery should be proportional in nature and reasonably tailored to the facts and issues in the case. In essence, litigators currently make widespread use of reasonable keyword and "soundex" searches to distill out responsive electronic documents and emails - emails that look like spam are generally not produced nor are they requested (unless the case is over unsolicited e-mail).
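For those curious how a "soundex" search works, here is a short Python sketch of the classic American Soundex code, which reduces a name to a letter and three digits so that similar sounding names share a code. This is a plain textbook version written for illustration, and commercial review tools may use their own variants. The function name is hypothetical.

```python
# Consonant groups that sound alike share a digit; vowels, H, W, and Y have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(name):
    """Return the four-character Soundex code for a name ('' if it has no letters)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # H and W do not separate equal codes
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit                  # a vowel resets the run of equal codes
    return (code + "000")[:4]             # pad or trim to letter plus three digits

print(soundex("Robert"), soundex("Rupert"))   # R163 R163
print(soundex("Tymczak"))                     # T522
```

A search tool can index every name by its code and then retrieve all names that share a code with the search term, which is how a misspelled name can still turn up responsive documents.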
But no automated keyword search will be able to distill out relevant messages made using steganography. No manual visual inspection will be able to detect messages in steganographic form - the photos look the same.
To be fair there are a lot of socially important uses for steganography. For example, in many contexts the right to privacy is protected and advanced by using steganographic messages and one can certainly appreciate using steganography to protect important personal information like passwords, lock combinations, trade secrets, and financial information. In addition, steganography can be used to protect intellectual property such as embedding a secret message in photos, websites, and videos and thus proving that an alleged defendant copied your works. Steganography can also be used by our intelligence services as a secure method of communication.
There are numerous inexpensive programs that help you to create steganographic messages such as Invisible Secrets. There are also some programs that help you to detect steganographic files such as Stego Suite from Wetstone. I suspect programs like the Stego Suite will become a more important part of the modern civil litigator's e-discovery toolkit - especially if there is some access during the case to hard drives and server drives for automated analysis.
What will Courts and litigators do if steganography becomes more widespread due to both socially acceptable and unacceptable uses? What if steganography starts to make up the majority of sensitive corporate communications? Can corporations communicate the most important sensitive digital messages in the manner and method they choose or do they have an obligation once litigation commences to use a tangible-centralized easily searchable form? How are outside litigators supposed to ensure the integrity and completeness of the e-discovery process in a world of steganography?
What if spam contains hidden messages?
The only way to perform e-discovery in such a steganographic world would be to ask for and get all communications, all server drives, and all hard drives, and hope that there is no off site storage five thousand miles away in a jurisdiction hostile to American law. Such a broad request will not be met with enthusiasm by the other side or most Judges. If a miracle occurs and you get the overbroad discovery and access then you would need to use automated steganography detection software with robust artificial intelligence like Stego Suite to possibly detect steganographic carrier files. You would then need to break the encryption, if you are lucky, and run your keyword search to see if the data is relevant to the case. The e-discovery effort, time, and cost from the possible use of steganography can become mind boggling.
It will be an interesting evolution as the e-discovery cases evolve as to how the steganography legal and technical issues will be handled.
Metamorphose is a "freeware" mass file and folder renamer. This program will allow you to programmatically add characters and numbers in certain positions in the filename. You can set filters for the automated renaming process. For electronic evidence and discovery purposes, when appropriate, this software will help you to include the Bates stamp number or range in the filename along with the date.
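The same renaming idea can be sketched in a few lines of Python. This is not how Metamorphose works internally. The filename pattern (prefix, zero-padded number, date, original name), the "ABC" prefix, and the function names are all assumptions for illustration. The sketch only builds a rename plan and does not touch any files.

```python
import datetime
from pathlib import PurePath

def bates_filename(original, prefix, number, date, width=6):
    """Build a new filename: prefix + zero-padded number, date, then the original name."""
    path = PurePath(original)
    return f"{prefix}{number:0{width}d}_{date.isoformat()}_{path.stem}{path.suffix}"

def plan_renames(filenames, prefix, start, date):
    """Pair each filename (in sorted order) with its proposed Bates-numbered name."""
    return [
        (name, bates_filename(name, prefix, start + offset, date))
        for offset, name in enumerate(sorted(filenames))
    ]

# Hypothetical production set.
production_date = datetime.date(2007, 3, 5)
for old, new in plan_renames(["memo.doc", "contract.pdf"], "ABC", 123, production_date):
    print(old, "->", new)
```

Reviewing the plan before renaming anything is a sensible habit, since the original filenames may themselves be evidence and should be logged before they are changed.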
CT Summation is perhaps the most well rounded electronic discovery and litigation case analysis software currently on the market. Summation imports various file formats like Outlook emails and other "documents" and then allows you to integrate the key documents into a case outline. Summation includes an e-discovery loader that will allow you to directly import PST files which you can then view in native format. Summation also has what they call a “petrification” tool that allows you to turn those Outlook PSTs into graphic files for redaction and Bates stamping. But here is where Summation really shines as it integrates a robust outlining tool called the Case Organizer that allows you to input your claims and defenses and then tag and link different exhibits, documents, or transcript excerpts. You can find this Case Organizer paradigm through competitive products like CaseMap which has more robust features in that one area but no one software product seems to integrate e-discovery, file conversion, tagging, and case analysis like Summation.