dnasec.cs.washington.edu

DOCTORING DIRECT-TO-CONSUMER GENETIC TESTS WITH DNA SPIKE-INS

Direct-to-consumer (DTC) genetic testing companies have provided personal
genotyping services to millions of customers. Customers mail saliva samples to
DTC service providers to have their genotypes analyzed and receive back their
raw genetic data. Both consumers and the DTC companies use the results to
perform ancestry analyses, relative matching, and trait prediction, and to
estimate predisposition to disease, often relying on genetic databases composed of the
data from millions of other DTC-genotyped individuals. While the digital
integrity risks to this type of data have been explored, we considered whether
data integrity issues could manifest upstream of data generation through
physical manipulation of DNA samples themselves, for example by adding synthetic
DNA to a saliva sample ("spiked samples") prior to sample processing by a DTC
company. Here, we investigated the feasibility of this scenario within the
standard DTC genetic testing pipeline. Starting with the purchase of
off-the-shelf DTC genetic testing kits, we found that synthetic DNA can be used
to precisely manipulate the results of saliva samples genotyped by a popular DTC
genetic testing service and that this method can be used to modify arbitrary
single nucleotide polymorphisms (SNPs) in multiplex to create customized
doctored genetic profiles. This capability has implications for the use of
DTC-generated results and the outcomes of their downstream analyses.
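To make the idea of a spike-in concrete, the sketch below models each SNP as a mixture of native and synthetic DNA. It rests on simplifying assumptions, not on the DTC provider's actual pipeline: the genotyping signal is treated as proportional to the fraction of target molecules carrying allele "B", and genotypes are called with hypothetical thresholds of 0.2 and 0.8. The functions `b_allele_fraction`, `call_genotype`, and `doctor_profile` and the SNP names are illustrative only.

```python
from typing import Dict, Tuple


def b_allele_fraction(native_genotype: str, spike_fraction: float, spike_allele: str) -> float:
    """Fraction of target molecules carrying allele "B" after mixing (simplified model).

    native_genotype: "AA", "AB" or "BB" (the person's true genotype).
    spike_fraction: share of all target molecules that are synthetic (0.0 to 1.0).
    spike_allele: allele carried by every synthetic molecule, "A" or "B".
    """
    native_b = native_genotype.count("B") / 2  # 0.0, 0.5 or 1.0
    spike_b = 1.0 if spike_allele == "B" else 0.0
    return (1 - spike_fraction) * native_b + spike_fraction * spike_b


def call_genotype(baf: float, low: float = 0.2, high: float = 0.8) -> str:
    """Hypothetical threshold caller on the B-allele fraction."""
    if baf < low:
        return "AA"
    if baf > high:
        return "BB"
    return "AB"


def doctor_profile(native: Dict[str, str], spikes: Dict[str, Tuple[str, float]]) -> Dict[str, str]:
    """Apply one synthetic spike per targeted SNP (in multiplex) and call every SNP."""
    called = {}
    for snp, genotype in native.items():
        allele, fraction = spikes.get(snp, ("B", 0.0))  # untargeted SNPs get no spike
        called[snp] = call_genotype(b_allele_fraction(genotype, fraction, allele))
    return called


native = {"snp1": "AA", "snp2": "AB", "snp3": "BB"}
spikes = {"snp1": ("B", 0.9), "snp2": ("A", 0.8)}
print(doctor_profile(native, spikes))  # {'snp1': 'BB', 'snp2': 'AA', 'snp3': 'BB'}
```

In this toy model, a spike only needs to dominate the target molecules at a SNP to move the call, and untargeted SNPs keep their true genotypes, which is what makes a customized multiplex profile plausible.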

Paper


DNA SEQUENCING FLOW CELLS AND THE SECURITY OF THE MOLECULAR-DIGITAL INTERFACE

DNA sequencing is the molecular-to-digital conversion of DNA molecules, which are
made up of a linear sequence of bases (A,C,G,T), into digital information.
Central to this conversion are specialized fluidic devices, called sequencing
flow cells, that distribute DNA onto a surface where the molecules can be read.
As more computing becomes integrated with physical systems, we set out to
explore how sequencing flow cell architecture can affect the security and
privacy of the sequencing process and downstream data analysis. In the course of
our investigation, we found that the unusual nature of molecular processing and
flow cell design contributes to two security and privacy issues. First, DNA
molecules are "sticky" and stable for long periods of time. In a manner
analogous to data recovery from discarded hard drives, we hypothesized that
residual DNA attached to used flow cells could be collected and resequenced to
recover a significant portion of the previously sequenced data. In experiments
we were able to recover over 23.4% of a previously sequenced genome sample and
perfectly decode image files encoded in DNA, suggesting that flow cells may be
at risk of data recovery attacks. Second, we hypothesized that multiplex
sequencing, in which separate DNA samples are sequenced together to increase
throughput, could cause data corruption and allow one sample to adversarially
manipulate another's sequencing data, because it incidentally leaks small
amounts of data between samples. We find that a maliciously crafted
synthetic DNA sample can be used to alter targeted genetic variants in other
samples using this vulnerability. Such a sample could be used to corrupt
sequencing data or even be spiked into tissue samples, whenever untrusted
samples are sequenced together. Taken together, these results suggest that, like
many computing boundaries, the molecular-to-digital interface raises potential
issues that should be considered in future sequencing and molecular sensing
systems, especially as they become more ubiquitous.
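The leakage in multiplex sequencing can be reasoned about with simple arithmetic. The sketch below assumes that a fixed fraction `bleed_rate` of an attacker sample's reads at a site is misassigned to a victim sample, and that a hypothetical variant caller reports the alternate allele once its read fraction reaches 0.2. Both the rate and the calling rule are illustrative assumptions; real values depend on the instrument and the analysis pipeline.

```python
def alt_fraction_after_bleed(victim_ref_reads: int, victim_alt_reads: int,
                             attacker_alt_reads: int, bleed_rate: float) -> float:
    """Alternate-allele read fraction at one site in the victim after leakage.

    Assumes a fraction `bleed_rate` of the attacker's reads at this site, all carrying
    the alternate allele, are misassigned to the victim sample.
    """
    leaked = attacker_alt_reads * bleed_rate  # expected number of misassigned reads
    total = victim_ref_reads + victim_alt_reads + leaked
    return (victim_alt_reads + leaked) / total


def is_variant_called(alt_fraction: float, threshold: float = 0.2) -> bool:
    """Hypothetical caller: report the variant once its read fraction reaches the threshold."""
    return alt_fraction >= threshold


print(is_variant_called(alt_fraction_after_bleed(30, 0, 10_000, 0.001)))  # True
print(is_variant_called(alt_fraction_after_bleed(30, 0, 1_000, 0.001)))   # False
```

With 30 reference reads in the victim and a 0.1% bleed rate, 10,000 attacker reads leak about 10 reads and push the alternate fraction to 0.25, flipping the call; 1,000 attacker reads leak about one read (a fraction of roughly 0.03), which is not enough. The model shows why a small leak matters mainly when the attacker's sample is abundant at the targeted sites.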

Paper


GENOTYPE EXTRACTION AND FALSE RELATIVE ATTACKS: SECURITY RISKS TO THIRD-PARTY
GENETIC GENEALOGY SERVICES BEYOND IDENTITY INFERENCE

Customers of direct-to-consumer (DTC) genetic testing services routinely
download their raw genetic data and give it to third-party companies that
support additional features. One type of analysis, called genetic genealogy,
uses genetic data and genealogical methods to find new relatives. While genetic
genealogy is quite popular, it has raised new privacy concerns. Genetic
genealogy services can be leveraged to find the person corresponding to
anonymous genetic data and have been used dozens of times by law enforcement to
solve crimes. We hypothesized that the open design and broad API offered by some
genetic genealogy services raise other significant security and privacy issues.
To test this hypothesis, we analyzed the security practices of GEDmatch, the
largest third-party genetic genealogy service. Here, we experimentally show how
the GEDmatch API is vulnerable to a number of attacks from an adversary that
only uploads normally formatted genetic data files and runs standard queries.
Using a small number of specifically designed files and queries, an attacker can
extract a large percentage of the genetic markers from other users; 92% of
markers can be extracted with 98% accuracy, including hundreds of medically
sensitive markers. We also find that an adversary can construct genetic data
files that falsely appear to be relatives of other samples in the database; in
certain situations, these false relatives can be used to make the
re-identification of genetic data more difficult. These attacks are possible
because of the rich set of features supported by the API, including detailed
visualizations, that are meant to enhance usability. We conclude with security
recommendations for genetic genealogy services.
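The principle behind genotype extraction can be illustrated with an idealized model. The sketch below assumes a hypothetical comparison oracle that, for each marker, reports only whether an uploaded probe file shares at least one allele with a hidden target; it is not GEDmatch's API. Under that assumption, two probe files, each homozygous for one of a marker's two alleles, are enough to recover a biallelic genotype. A real service exposes far less than a per-marker oracle, so this model illustrates the principle rather than reproducing the reported 92% / 98% figures. All names and marker IDs are illustrative.

```python
from typing import Callable, Dict, Tuple

Genotype = Tuple[str, str]
Oracle = Callable[[Dict[str, Genotype]], Dict[str, bool]]


def shares_allele(a: Genotype, b: Genotype) -> bool:
    """True if two genotypes have at least one allele in common (half-identical)."""
    return bool(set(a) & set(b))


def make_oracle(target: Dict[str, Genotype]) -> Oracle:
    """Hypothetical service: compares a probe file against a hidden target, marker by marker."""
    def compare(probe: Dict[str, Genotype]) -> Dict[str, bool]:
        return {marker: shares_allele(probe[marker], target[marker]) for marker in probe}
    return compare


def extract_genotypes(oracle: Oracle, alleles: Dict[str, Tuple[str, str]]) -> Dict[str, Genotype]:
    """Recover each biallelic genotype from two homozygous probe files."""
    has_first = oracle({m: (a, a) for m, (a, _) in alleles.items()})
    has_second = oracle({m: (b, b) for m, (_, b) in alleles.items()})
    recovered: Dict[str, Genotype] = {}
    for marker, (a, b) in alleles.items():
        if has_first[marker] and has_second[marker]:
            recovered[marker] = (a, b)  # heterozygous
        elif has_first[marker]:
            recovered[marker] = (a, a)  # homozygous for the first allele
        elif has_second[marker]:
            recovered[marker] = (b, b)  # homozygous for the second allele
        # a marker matching neither probe is not biallelic as assumed and is skipped
    return recovered


target = {"rs1": ("A", "G"), "rs2": ("C", "C"), "rs3": ("T", "T")}
alleles = {"rs1": ("A", "G"), "rs2": ("C", "T"), "rs3": ("C", "T")}
print(extract_genotypes(make_oracle(target), alleles) == target)  # True
```

The attacker never sees the target file directly; every bit recovered comes from the match/no-match answers to ordinary-looking uploads, which is why the richness of the comparison output matters.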

FAQ and Paper


COMPUTER SECURITY AND PRIVACY IN DNA SEQUENCING

The rapid improvement in DNA sequencing has sparked a big data revolution in
genomic sciences, which has in turn led to a proliferation of bioinformatics
tools. To date, these tools have encountered little adversarial pressure. This
paper evaluates the robustness of such tools if (or when) adversarial attacks
manifest. We demonstrate, for the first time, the synthesis of DNA that, when
sequenced and processed, gives an attacker arbitrary remote code execution. To
study the feasibility of creating and synthesizing a DNA-based exploit, we
performed our attack on a modified downstream sequencing utility with a
deliberately introduced vulnerability. After sequencing, we observed information
leakage in our data due to sample bleeding. While this phenomenon is known to the
sequencing community, we provide the first discussion of how this leakage
channel could be used adversarially to inject data or reveal sensitive
information. We then evaluate the general security hygiene of common DNA
processing programs, and unfortunately, find concrete evidence of poor security
practices used throughout the field. Informed by our experiments and results, we
develop a broad framework and guidelines to safeguard security and privacy in
DNA synthesis, sequencing, and processing.
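Writing any digital payload into synthesized DNA first requires mapping bytes onto the four bases. A common scheme, assumed here for illustration (the paper's exact encoding and its synthesis constraints may differ), packs two bits into each base: 00→A, 01→C, 10→G, 11→T. The sketch shows only this encoding and its inverse, which is the step a sequencing pipeline effectively undoes when it turns reads back into bytes.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def encode(data: bytes) -> str:
    """Encode bytes as DNA, two bits per base, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(seq: str) -> bytes:
    """Invert encode(); raises ValueError on bad length or non-ACGT characters."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    values = [BASES.index(base) for base in seq]  # str.index raises ValueError if missing
    return bytes(
        (values[i] << 6) | (values[i + 1] << 4) | (values[i + 2] << 2) | values[i + 3]
        for i in range(0, len(values), 4)
    )


print(encode(b"A"))              # CAAC
print(decode(encode(b"hello")))  # b'hello'
```

A dense mapping like this ignores practical synthesis constraints such as long single-base repeats and GC content, so real DNA encodings typically trade some density for sequences that are easier to synthesize and sequence accurately.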

FAQ and Paper