January 9, 2020
Digital Forensics: Theory vs. Practice
As an active digital forensic
practitioner for over 10 years, I have attended many training offerings from
many different companies/resources, read many white papers published by any
number of scientific and academic entities and worked hundreds of active cases
for plaintiffs, defendants and in law enforcement covering PC, Mac and mobile
device forensics. One aspect that
crosses all of these areas, and that has waned slightly in the last few years but
still rears its ugly head, is the set of theoretical questions surrounding digital forensics. Among these are three we have all heard at one point or
another: hash collisions, data cross-contamination and reverse-engineering of
hash values into viewable data files. While we can Google these theories and
findings to death, what matters in “everyday forensics” is how they apply in practice,
not in theory.
Hash Collisions
The topic of hash collisions
generally comes up when working independent analysis in criminal defense
cases. This digital version of the “some
other dude did it” (or SODDI) defense is based upon the theory that two digital
files containing completely different data can be run through a hashing
algorithm and produce the same result. Hash
calculation is a big part of forensics and, particularly in cases dealing with
child exploitation images, the hash value is used to locate those sharing illicit
images on peer-to-peer file-sharing networks. However, we also use hash values to validate
evidence files as identical to the original, to filter out irrelevant or known system
files and to validate the authenticity of files across a system or multiple
pieces of evidence. Hashing algorithms
such as MD5 and SHA-1
have been “broken” for years, in the sense that researchers can deliberately construct pairs of files that collide, but they are still in ubiquitous use in digital
forensics. Why? Because the practical application of these
collisions is so minimal that it is not even worth mentioning in a court of law. But
rest assured, it still gets mentioned! The
only real application these collisions have is to attempt to obfuscate
the facts and/or confuse the finder of fact in a legal proceeding. Simply put, there are no documented cases where
someone accused of downloading or sharing illicit images was falsely accused
because the images they downloaded/shared possessed the same hash value as some
innocuous files they were attempting to download/share. Consider the statistical likelihood that
someone downloaded/shared an innocuous file which happened to share the same
hash value as an illicit file and also was on a police watch list where a
search warrant was executed. All of
those factors being in place at once is very unlikely.
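To put a rough number on “very unlikely”, here is a minimal sketch of the standard birthday-bound estimate for an accidental collision. It assumes the hash behaves like a uniformly random function, which says nothing about deliberately crafted collisions, and the function name and the figure of one trillion files are illustrative assumptions of mine, not measurements from any case.

```python
import math


def accidental_collision_probability(num_files: int, digest_bits: int) -> float:
    """Birthday-bound estimate that any two of `num_files` distinct files
    share a digest, assuming the hash acts like a uniform random function."""
    pairs = num_files * (num_files - 1) // 2           # number of file pairs
    expected_collisions = pairs / (2 ** digest_bits)   # expected colliding pairs
    # 1 - exp(-x), computed in a way that stays accurate for tiny x
    return -math.expm1(-expected_collisions)


if __name__ == "__main__":
    # Hypothetical population of one trillion distinct files.
    for name, bits in (("MD5", 128), ("SHA-1", 160)):
        p = accidental_collision_probability(10**12, bits)
        print(f"{name}: about {p:.3e}")
```

Note that this is the chance of any two files in the whole population colliding with each other. The chance of one particular innocuous file matching one particular illicit file is smaller still.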
While we are constantly testing, honing
and refining our knowledge in the field of digital forensics and we may even
work in a “lab”, the fact remains that at a practical level, none of us have
the ability to re-create these collisions, nor have we seen them “in the wild”,
so to speak. They are reserved for a theoretical
lab environment where the sole purpose is to find and publish the collision,
not to find and report the truth in the evidence.
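For anyone who wants to take the collision argument off the table entirely, one common approach is to record more than one digest for each evidence file. The sketch below is only an illustration under my own assumptions: the function names and the choice of MD5, SHA-1 and SHA-256 are mine, and it uses nothing beyond Python's standard hashlib module. It reads a file once and compares all digests between an original and a working copy.

```python
import hashlib
import os
import shutil
import tempfile


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1024 * 1024):
    """Return {algorithm: hex digest} for the file at `path`, reading it once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def same_content(path_a, path_b):
    """True only if every recorded digest matches between the two files."""
    return file_digests(path_a) == file_digests(path_b)


if __name__ == "__main__":
    # Hypothetical stand-in for an evidence file and its working copy.
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "evidence.bin")
        working_copy = os.path.join(folder, "working_copy.bin")
        with open(original, "wb") as handle:
            handle.write(b"sample evidence data")
        shutil.copyfile(original, working_copy)
        print(file_digests(original))
        print("copy verified:", same_content(original, working_copy))
```

For two different files to pass this check, they would have to collide under all three algorithms at once, which is a far harder problem than producing a collision in any one of them.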
Data Cross-Contamination
Before I discuss the practicality of
data cross-contamination, I’ll insert a disclaimer: I understand that using
sterilized media to store forensic data and conduct analysis is mentioned as
a potential best practice, as detailed in the Scientific
Working Group on Digital Evidence (SWGDE) Best Practices for Computer Forensic Acquisitions
(v. 1.0). One of the reasons for
this is to avoid data cross-contamination.
What is that? It is the theory that
if you store data to be analyzed in a forensically sound environment on a piece of media
that you have not sterilized (i.e.,
wiped and validated prior to placing the data to be analyzed on it),
some data from a previous or unrelated case could become part of the current
case analysis data, thus potentially contaminating the results with unrelated
data. This is a viable theory when
dealing with physical evidence such as DNA samples or fingerprints, but it has
very little, if any, practical application in digital forensics. Consider a forensic data
file such as an .e01, raw or .zip file: by what mechanism, and with what likelihood, would
copying that file onto a piece of non-sterilized media cause it to somehow mix or
commingle with pre-existing data? I’ve
heard one claim of data cross-contamination from another examiner, but anecdotes
are not data, nor was the claim ever validated.
We sterilize the media, not because we’ve ever seen it affect any cases,
but to avoid questions about it when testifying.
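The “validate” half of wipe-and-validate is simple enough to sketch. The example below assumes the wipe pattern is all zeros and that the media is readable as a raw device or image file at some path. The function name is hypothetical and this is not a substitute for a validated forensic tool, just an illustration of what the check amounts to.

```python
import os
import tempfile


def first_nonzero_offset(path, chunk_size=1024 * 1024):
    """Return the byte offset of the first non-zero byte in `path`,
    or None if every byte read is zero (i.e. the wipe validated)."""
    offset = 0
    with open(path, "rb") as media:
        while chunk := media.read(chunk_size):
            if chunk.count(0) != len(chunk):
                # Locate the exact position of the first non-zero byte.
                return offset + next(i for i, byte in enumerate(chunk) if byte)
            offset += len(chunk)
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for a wiped drive: a small file full of zeros.
    with tempfile.TemporaryDirectory() as folder:
        wiped = os.path.join(folder, "wiped.img")
        with open(wiped, "wb") as handle:
            handle.write(bytes(4096))
        result = first_nonzero_offset(wiped)
        print("sterile" if result is None else f"non-zero data at offset {result}")
```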
Hash Value Reverse-Engineering
Having obtained much of my initial
training in law enforcement and, as such, working a majority of cases involving
illicit images, I can recall being trained that catalogs of illicit image hash
values are law enforcement sensitive and not to be disseminated to independent
examiners or to the general public.
Why? Because someone could
potentially and theoretically reverse-engineer the hash value to re-create the file,
which would be illegal. This came up again
in a case I worked independently in 2019.
I thought this theory and explanation were long gone, but they are not.
The problem with the theory of reverse-engineering
a hash value is that I’m not sure it’s ever been done, at least not at a practical
level. It is a theory. Scientists, academics and lab-rats may have
done it, but I don’t know anyone who actively practices digital forensics who
has either 1) the knowledge, skills and abilities to do it or 2) the
desire to do it. So why is it still
mentioned as a consideration in cases? (Hint:
see the above note about obfuscation and confusion.)
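A simple counting argument shows why a hash value cannot just be “decoded” back into a picture. A digest has a fixed length no matter how large the input is, so an enormous number of different possible files share every digest value. The sketch below uses Python's standard hashlib module; the helper names and the 1 MiB file size are my own illustrative assumptions.

```python
import hashlib


def digest_bits(algorithm: str) -> int:
    """Length of the digest in bits, which is fixed regardless of input size."""
    return hashlib.new(algorithm).digest_size * 8


def log2_files_per_digest(file_size_bytes: int, algorithm: str = "md5") -> int:
    """log2 of the average number of distinct files of exactly this size
    that map to each possible digest value."""
    return file_size_bytes * 8 - digest_bits(algorithm)


if __name__ == "__main__":
    small = hashlib.md5(b"a").digest()
    large = hashlib.md5(b"a" * 1_000_000).digest()
    print(len(small), len(large))  # both are 16 bytes (128 bits)

    exponent = log2_files_per_digest(1024 * 1024, "md5")
    print(f"about 2^{exponent} possible 1 MiB files per MD5 value")
```

In other words, the digest does not contain the image. The best anyone could do is search for some input that produces the same value, and nothing guarantees that input would be the original file.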
Wrapping It Up
I’m not an academic or a lab-rat. I’m just an old(ish) retired investigator with
some skillsets that can often be of benefit to parties involved in litigation. Because of that, I’m concerned with the
practicality of digital forensics – What is the best way to get the case
analyzed? What evidence is relevant? Where do I need to look for the
evidence? What am I missing that could
potentially answer important questions?
Theoretical considerations like those mentioned here are not worthy of
much calorie-burning when trying to answer these questions. In the pragmatic world of digital forensics,
we have to consider what is, not what could be. The truth lies in the facts of the
case and the data which is part of the case, not in theories about what could or may
have happened… and likely did not!
Author:
Patrick J. Siewert
Principal Consultant
Professional Digital Forensic Consulting, LLC
Virginia DCJS #11-14869
Based in Richmond, Virginia
Available Wherever You Need Us!
We Find the Truth for a Living!
Computer Forensics -- Mobile Forensics -- Specialized Investigation
About the Author:
Patrick Siewert is the Principal Consultant of Pro Digital Forensic Consulting, based
in Richmond, Virginia. In 15 years of
law enforcement, he investigated hundreds of high-tech crimes, incorporating
digital forensics into the investigations, and was responsible for
investigating some of the highest jury and plea bargain child exploitation
investigations in Virginia court history. Patrick is a graduate of SCERS, BCERT, the
Reid School of Interview & Interrogation and multiple online investigation
schools (among others). He is a
Cellebrite Certified Operator and Physical Analyst as well as certified in
cellular call detail analysis and mapping. He continues to hone his digital forensic
expertise in the private sector while growing his consulting &
investigation business marketed toward litigators, professional investigators
and corporations, while keeping in touch with the public safety community as a
Law Enforcement Instructor.
Email: Inquiries@ProDigital4n6.com