Submitted URL: https://blog.oceanprotocol.com/how-ocean-compute-to-data-relates-to-other-privacy-preserving-technology-b4e1c330483
Published in

Ocean Protocol

Trent McConaghy
May 28, 2020

HOW DOES OCEAN COMPUTE-TO-DATA RELATE TO OTHER PRIVACY-PRESERVING APPROACHES?


A SURVEY SPANNING FEDERATED LEARNING, HOMOMORPHIC ENCRYPTION, MULTI-PARTY
COMPUTE, AND MORE


Creatures of the ocean love their privacy! Here’s an octopus hiding in sand. Can
you see it? [Image: CC-BY-SA 4.0]


INTRODUCTION

At Ocean Protocol, we recently released Ocean Compute-to-Data. It helps AI
practitioners access valuable, private data for more accurate AI models. Data
owners get to retain privacy and control over their data.

Compute-to-Data works as follows. First, data owners approve AI algorithms to
run on their data. Then, Compute-to-Data orchestrates remote computation and
execution on data to train AI models. The compute is sufficiently “aggregating”
or “anonymizing” that the privacy risk is minimized. Yet it results in a model
that’s useful for research or business.

This article asks: how does Ocean Compute-to-Data relate to other
privacy-preserving approaches?

Here’s the quick answer: it’s complementary. Each technology has its own usage,
and its own constraints.

We’ll now give a more detailed answer, in a fashion that’s approachable to a
less deeply technical audience. We survey some notable privacy-preserving
technologies. For each, we discuss its challenges, how those challenges are
being addressed, and how the technique relates to Ocean. We do the same for
Ocean Compute-to-Data. We conclude with a broader discussion of Ocean in the
privacy-preserving ecosystem.


SURVEY


ENCRYPTION AND DECRYPTION

Encryption transforms data into a form that can be safely sent across an
insecure channel. When received, the receiver uses a key to transform the data
back into its original plaintext form.

Symmetric encryption is when the same key is used to encrypt and decrypt;
key-exchange schemes like Diffie-Hellman let the two parties agree on that key
safely across an insecure channel.
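
To make the key-agreement idea concrete, here is a minimal sketch of a Diffie-Hellman exchange in plain Python. The modulus `P`, base `G`, and the `keypair` helper are illustrative assumptions, and the numbers are far too small for real use; production systems rely on vetted libraries and much larger parameters.

```python
import secrets

# Toy public parameters, assumed for illustration only (much too small to be secure).
P = 0xFFFFFFFB  # a prime modulus (2**32 - 5)
G = 5           # public base

def keypair():
    """Return (private exponent, public value G**private mod P)."""
    private = secrets.randbelow(P - 3) + 2  # random integer in [2, P - 2]
    return private, pow(G, private, P)

alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

# Only the public values cross the insecure channel.
alice_shared = pow(bob_public, alice_private, P)
bob_shared = pow(alice_public, bob_private, P)

# Both sides now hold the same secret, which could seed a symmetric key.
print(alice_shared == bob_shared)
```

Neither private exponent is ever transmitted; each side combines its own private value with the other side's public value.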

Asymmetric encryption has “public keys and private keys, coming in pairs. What
one does, the other undoes” [Ref]. Alice encrypts a message with Bob’s public
key, then sends the message across an insecure channel. Only Bob can decrypt it,
with his private key.
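
The "what one does, the other undoes" property can be seen in a textbook RSA sketch with tiny primes. This is only an illustration under simplifying assumptions: the primes, the exponent, and the helper names are chosen for readability, and there is no padding, so it must not be used for real messages.

```python
# Textbook RSA with tiny primes; illustrative only, not secure.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

bob_public_key = (e, n)
bob_private_key = (d, n)

def apply_key(value, key):
    """Raise value to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65                                          # must be smaller than n
ciphertext = apply_key(message, bob_public_key)       # Alice encrypts for Bob
recovered = apply_key(ciphertext, bob_private_key)    # only Bob can undo it

print(recovered == message)
```

The same function is used in both directions; only the key differs, which is the sense in which the two keys undo each other.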

Encryption and decryption are widely used, for applications like secure
web-based payments (the “https” you see in your browser) and secure messaging
(end-to-end encryption such as Signal).

Ocean Protocol uses encryption/decryption as part of its access control
infrastructure.


HOMOMORPHIC ENCRYPTION (HE)

In HE, compute is performed on encrypted data. Therefore, non-trusted parties
can perform compute without ever learning the contents of the data.
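
As a small illustration of computing on encrypted data, the sketch below uses a toy version of the Paillier scheme, which supports addition of encrypted numbers. It is not something the article prescribes; the primes, function names, and values are assumptions chosen for readability, and the key size is far too small to be secure.

```python
import math
import secrets

def generate_keys(p=1789, q=1861):
    """Build a toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    # mu is the inverse of L(g^lam mod n^2) modulo n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(public_key, message):
    n, g = public_key
    while True:
        r = secrets.randbelow(n - 1) + 1   # random blinding factor
        if math.gcd(r, n) == 1:
            break
    return (pow(g, message, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(public_key, private_key, ciphertext):
    n, _ = public_key
    lam, mu = private_key
    return ((pow(ciphertext, lam, n * n) - 1) // n * mu) % n

def add_encrypted(public_key, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = public_key
    return (c1 * c2) % (n * n)

public_key, private_key = generate_keys()
c_a = encrypt(public_key, 20)
c_b = encrypt(public_key, 22)
# The untrusted party only ever sees c_a and c_b, never 20 or 22.
c_sum = add_encrypted(public_key, c_a, c_b)
total = decrypt(public_key, private_key, c_sum)
print(total == 20 + 22)
```

Only the key holder can read `total`; the party doing the addition works purely on ciphertexts.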

Challenge: HE is still too computationally intensive to be used in most
applications.

Towards solving: Speed will continue to improve with time due to better
algorithms, faster chips, and dedicated chips.

HE is a remarkable idea, almost like it’s out of science fiction. We look
forward to when it scales enough to work in more applications, as it will be
useful to have as part of the Ocean technology stack. It will combine well with
Ocean’s other features like data asset management and marketplaces.


SECURE ENCLAVES / TRUSTED EXECUTION ENVIRONMENTS (TEE)

In TEE, computation is performed in special chips that can see the private data
but are severely restricted in what information they can share with their host
machine. Intel SGX is the most prominent hardware example.

Challenge: any security flaw found in the chips renders the chip useless, and
there is a history of this happening.

Towards solving: TEE chips have been hardening over time; today we’re
approximately at the threshold of production usage.

TEEs play well with Ocean: Ocean can manage data assets which then have
computation performed in TEEs; and results come back to Ocean.

Related, Oasis Labs leverages blockchain to manage secure enclave-based compute.
There is opportunity for integration of Ocean and Oasis.


Aquariums are a bit like trusted execution environments… for dangerous sea
creatures. [Image: CC0]


MULTI-PARTY COMPUTE (MPC)

In MPC, the compute task is broken into small sub-tasks; a different party
performs each sub-task; and the results are merged.
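
One common building block for this is additive secret sharing. The sketch below is a simplified, single-process illustration: the function names, modulus, and salary figures are assumptions, and a real deployment would run each party on a separate machine. Each input is split into random shares, every party adds up the shares it holds, and only the merged result reveals the total.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value (illustrative choice)

def share(value, n_parties):
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_inputs):
    n = len(private_inputs)
    # Row i holds the shares created by input owner i; column j goes to party j.
    all_shares = [share(value, n) for value in private_inputs]
    # Sub-task for party j: add up the shares it received. A single share
    # reveals nothing about the input it came from.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Merge step: combining the partial sums yields the total.
    return sum(partial_sums) % MODULUS

salaries = [52000, 61000, 47000]
print(secure_sum(salaries) == sum(salaries))
```

The communication cost mentioned below shows up here as the n-by-n exchange of shares between parties.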

Challenge: bandwidth can be a bottleneck because it requires a lot of
communication between the parties.

Towards solving: researchers are working to reduce bandwidth needs.

MPC plays well with Ocean: Ocean for data asset management, MPC for compute. For
example, here’s a prototype integration doing image classification for a
healthcare use case.

The Enigma blockchain project focuses on TEEs and MPC. Therefore there are
future opportunities for integration with Ocean and Enigma.


ZERO-KNOWLEDGE PROOFS (ZKPS)

In ZKPs, Alice asks Bob if Bob knows x, and Bob can provably reply without
leaking information.
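
A classic interactive example is the Schnorr identification protocol, sketched below with toy parameters. The group constants `P`, `Q`, `G` and all function names are assumptions for illustration; the numbers are far too small for real security. Bob proves he knows the secret exponent `x` behind a public value without revealing `x`.

```python
import secrets

# Toy group, assumed for illustration: P = 2 * Q + 1, and G has order Q modulo P.
P, Q, G = 2039, 1019, 4

def commit():
    """Prover picks a one-time nonce and sends its commitment."""
    nonce = secrets.randbelow(Q)
    return nonce, pow(G, nonce, P)

def respond(secret, nonce, challenge):
    """Prover's answer blends the secret with the nonce."""
    return (nonce + challenge * secret) % Q

def verify(public_value, commitment, challenge, response):
    """Verifier checks G^response == commitment * public_value^challenge (mod P)."""
    return pow(G, response, P) == (commitment * pow(public_value, challenge, P)) % P

# Bob's secret x and the public value Alice already knows.
x = secrets.randbelow(Q - 1) + 1
public_value = pow(G, x, P)

# One interactive round: commit, challenge, respond, verify.
nonce, commitment = commit()
challenge = secrets.randbelow(Q)           # chosen by Alice
response = respond(x, nonce, challenge)    # computed by Bob
accepted = verify(public_value, commitment, challenge, response)
print(accepted)
```

The exchange needs a live back-and-forth and yields only an accept or reject answer, which matches the constraints discussed next.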

Constraints: in their classic form, ZKPs require interactive sessions, scale
poorly, and only answer binary questions.

Towards solving: First, some use cases are perfectly ok with the constraints of
ZKPs. Perhaps the most famous example in blockchain is Zcash, which offers
Bitcoin-like functionality (e.g. prevent double spending), but without leaking
Personally Identifiable Information (PII). Second, there is steady progress to
loosen the constraints given above, especially the scaling part.

In requiring interactive sessions and binary outputs, ZKPs are less directly
applicable to Ocean on the AI side. However, we are excited about the future of
ZKPs elsewhere for Ocean. Like in Zcash, they could be helpful to reduce PII
leakage about blockchain transactions themselves. For example, ZoKrates provides
private transactions in Ethereum. Furthermore, ZK Rollups (and their more
lightweight Optimistic cousins) show great promise for blockchain scalability
in addition to privacy.


Here’s a sea moth looking to minimize its information leakage. [Image: Matt
Kieffer CC-BY-SA 2.0]


SYNTHETIC DATA

In synthetic data generation, a probability density function (PDF) is computed
or “learned” from the original dataset, next to the data itself. Then, millions
of datapoints can be drawn from the PDF and shared. Those datapoints are
naturally “anonymized”, which reduces risk of personally-identifiable
information (PII) leaking.
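
As a minimal sketch of the idea, assuming the simplest possible PDF (a single Gaussian over one numeric column), the snippet below fits the distribution next to the data and then draws synthetic datapoints from it. The ages, function names, and the choice of a Gaussian are all illustrative assumptions; real generators use much richer models.

```python
import random
import statistics

def fit_gaussian(values):
    """'Learn' the PDF next to the data: here, just a mean and standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def sample_synthetic(mu, sigma, n, seed=None):
    """Draw n synthetic datapoints from the fitted PDF."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Private column that never leaves the premises (made-up values).
private_ages = [23, 35, 41, 47, 52, 58, 64, 71]

mu, sigma = fit_gaussian(private_ages)
synthetic_ages = sample_synthetic(mu, sigma, n=10_000, seed=1)

# Only the synthetic values are shared; they follow the same overall distribution.
print(round(mu, 2), round(statistics.mean(synthetic_ages), 2))
```

The two challenges below are visible even here: the Gaussian was picked by whoever wrote `fit_gaussian`, and any model trained on `synthetic_ages` inherits its approximation error.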

Challenge 1: not flexible. PDF construction is essentially doing AI-style
modeling, where the choice of the algorithm is made by the provider of the
synthetic data generation technology.

Challenge 2: less accurate. There’s now modeling in two layers — the PDF and the
final AI model built by the AI practitioner. Modeling error compounds.
Furthermore, if the PDF is overfit, PII will leak.

Towards solving: Challenge 1 is addressed by letting the AI practitioner build
the PDF themselves. Challenge 2 is addressed if the AI practitioner simply
builds a single model themselves next to the data. And then, you have Ocean
Compute-to-Data (!). So Synthetic Data is a poor approach to AI modeling.
However, Synthetic Data is still useful for visualization to gain intuition on
the (synthetic) data, such as 2D or 3D scatterplot visualizations on synthetic
data. This is what makes Synthetic Data complementary to Ocean.


FEDERATED LEARNING (FL)

In FL, a neural network is randomly initialized. Then, weight updates are
computed next to the data itself in data silo #1, and sent to the neural
network. This is repeated in data silo #2, #3, and so on. In the end, a neural
network has been trained across many data silos, without data leaving the
premises of each respective silo.
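
The loop described above can be sketched in a few lines. To keep the example self-contained, a two-parameter linear model stands in for the neural network, and the silo contents, learning rate, and function names are illustrative assumptions. The key point is that `local_update` runs inside a silo and returns only a weight update, never the rows themselves.

```python
import random

def local_update(weights, silo_data, lr=0.2):
    """Runs inside one silo: return a weight delta, never the raw rows."""
    w, b = weights
    n = len(silo_data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in silo_data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in silo_data) / n
    return -lr * grad_w, -lr * grad_b

def train(silos, rounds=1000, seed=0):
    rng = random.Random(seed)
    weights = (rng.uniform(-1, 1), rng.uniform(-1, 1))  # random initialization
    for _ in range(rounds):
        for silo in silos:                 # silo #1, then #2, then #3, ...
            dw, db = local_update(weights, silo)
            weights = (weights[0] + dw, weights[1] + db)
    return weights

# Three silos holding (x, y) pairs that all follow y = 2x + 1 (made-up data).
silos = [
    [(0.0, 1.0), (0.2, 1.4)],
    [(0.4, 1.8), (0.6, 2.2)],
    [(0.8, 2.6), (1.0, 3.0)],
]

w, b = train(silos)
print(round(w, 3), round(b, 3))
```

In this sketch the `train` function plays the role of the orchestrator, which is exactly the component whose placement (centralized or decentralized) is discussed below.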

TensorFlow Federated (TFF) and OpenMined are the most prominent FL projects. TFF
does orchestration in a centralized fashion and OpenMined decentralized. Google
Federated Analytics takes a cue from FL and computes simpler aggregate values
such as averages.

Challenge: in TFF-style FL, a centralized entity (e.g. Google) must perform the
orchestration of compute jobs across silos. So, PII can leak to this entity.

Towards solving: OpenMined addresses this via decentralized orchestration. But
its software infrastructure could use improvement to manage computation at each
silo in a more secure fashion; this is where Compute-to-Data can help.


DECOUPLED HASHING (DH)

DH is less well-known than other techniques surveyed but it’s worth
understanding. We first review traditional Feature Hashing (FH). FH trains an AI
model as follows: (1) On training data, create a hash for each {input variable,
input value} combination. (2) Apply a learning algorithm to learn a weight for
each hash. It runs on new / testing inputs as follows: (1) On test data, create
a hash for each {input variable, input value} combination. (2) Run the hashes
through the trained model.

Traditionally, all the steps are done on the same machine. But they don’t need
to be! This is the idea of DH. DH does training step (1) and testing step (1)
next to the data. The result is naturally anonymized. Training step (2) and
testing step (2) can be done anywhere by anyone, without seeing any private
information.
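
The split between the two steps can be sketched as follows. The records, bucket count, and function names are illustrative assumptions, and a simple logistic regression stands in for "a learning algorithm". `hash_features` is the part that would run next to the data; `train` and `predict` only ever see bucket indices.

```python
import hashlib
import math
from collections import defaultdict

N_BUCKETS = 2**20  # illustrative size of the hash space

def hash_features(row):
    """Step (1), next to the data: one hash per {input variable, input value}."""
    indices = []
    for name, value in sorted(row.items()):
        digest = hashlib.sha256(f"{name}={value}".encode()).digest()
        indices.append(int.from_bytes(digest[:4], "big") % N_BUCKETS)
    return indices

def predict(weights, indices):
    """Testing step (2): run the hashes through the trained model."""
    return 1 / (1 + math.exp(-sum(weights[i] for i in indices)))

def train(hashed_rows, labels, epochs=50, lr=0.5):
    """Training step (2), anywhere: learn one weight per hash."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for indices, label in zip(hashed_rows, labels):
            error = label - predict(weights, indices)
            for i in indices:
                weights[i] += lr * error
    return weights

# Made-up private records and labels held by the data owner.
rows = [
    {"smoker": "yes", "age_band": "60+"},
    {"smoker": "yes", "age_band": "30-59"},
    {"smoker": "no", "age_band": "60+"},
    {"smoker": "no", "age_band": "30-59"},
]
labels = [1, 1, 0, 0]

hashed_rows = [hash_features(row) for row in rows]  # only this leaves the silo
model = train(hashed_rows, labels)
print(predict(model, hash_features({"smoker": "yes", "age_band": "60+"})) > 0.5)
```

Nothing in `train` depends on knowing which variable or value a bucket index came from, which is what allows the two steps to live on different machines.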

DH is pragmatic: it has minimal information leakage, scales well, and doesn’t
require new leaps in technology or science. The remaining challenge is how to
set up the infrastructure to separate steps (1) and (2), and to coordinate the
actors on each side. Ocean Compute-to-Data can help with infrastructure and
coordination to lower barriers to using DH.


DIFFERENTIAL PRIVACY (DP)

DP “is a system for publicly sharing information about a dataset by describing
the patterns of groups within the dataset while withholding information about
individuals in the dataset.” The main tactic is to add random noise to each
input datapoint so that any actor reviewing statistics derived from all the
datapoints can’t extract PII.
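
A minimal sketch of this tactic, assuming Laplace noise and a bounded numeric value: each datapoint is clipped to a known range and perturbed before it is shared, yet the average over many datapoints stays close to the truth. The range, the privacy parameter `epsilon`, the data, and the function name are illustrative assumptions.

```python
import random

def privatize(value, lower, upper, epsilon, rng):
    """Clip a datapoint to [lower, upper] and add Laplace noise to it."""
    clipped = min(max(value, lower), upper)
    scale = (upper - lower) / epsilon
    # The difference of two exponential draws is Laplace-distributed with this scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return clipped + noise

rng = random.Random(42)
ages = [rng.uniform(20, 80) for _ in range(20_000)]           # made-up private data
noisy_ages = [privatize(a, 0, 100, epsilon=1.0, rng=rng) for a in ages]

true_mean = sum(ages) / len(ages)
noisy_mean = sum(noisy_ages) / len(noisy_ages)

# Individual noisy values are heavily perturbed; the aggregate is still useful.
print(round(true_mean, 1), round(noisy_mean, 1))
```

A smaller `epsilon` means more noise per datapoint and stronger privacy, at the cost of needing more datapoints for an accurate aggregate.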

DP can enhance the privacy of other techniques. It’s crucial for synthetic
data: DP is the main accepted way of generating it in a provably private way. DP
has been shown to help Federated Learning, for example here. DP holds potential
for Compute-to-Data contexts too.


Stonefish (trying to hide) in coral. [Image: Matt Kieffer CC-BY-SA 2.0]


COMPUTE-TO-DATA

The main idea of Compute-to-Data is to bring computation to the data, where the
data stays on-premise. The compute results returned are sufficiently aggregated
or anonymized that the privacy risk is minimized.

Ocean Compute-to-Data draws on a lineage of related ideas and technologies.
Database researchers have explored the idea of compute next to the data since
the 1970s; the modern incarnation is near-memory computing and near-data
computing. As discussed, FL brings compute next to data for training AI models
across many data silos (albeit with centralized orchestration). FL started to
gain traction in 2015. The Fitchain project also brought compute next to data,
including collaboration with Ocean in 2018. It has a commercial spinoff.
Finally, an academic paper from Algorand recently proposed a technology that
brings compute to data.

Ocean brings the idea of compute-to-data into its ecosystem of blockchain-based
access control (platform level) and data marketplaces to buy and sell private
data while preserving privacy. It’s a long lineage of ideas and tech, all around
a shared movement of regaining control of our data. We’re proud to be part of
that movement.

In Ocean Compute-to-Data, data owners approve AI algorithm scripts to run on
their data, then Compute-to-Data orchestrates remote computation and execution
on data to train AI models.
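
The approve-then-run flow can be illustrated with a small sketch. This is not Ocean's actual API: the `DataOwner` class, its methods, and the convention that a script defines `run(rows)` are all hypothetical, and a real system would also need to sandbox the execution.

```python
import hashlib

def digest(source):
    """Fingerprint an algorithm script by the SHA-256 of its source text."""
    return hashlib.sha256(source.encode()).hexdigest()

class DataOwner:
    """Hypothetical stand-in for the party that holds the private data."""

    def __init__(self, rows):
        self._rows = rows        # private data; never returned directly
        self._approved = set()   # fingerprints of approved scripts

    def approve(self, source):
        self._approved.add(digest(source))

    def run_job(self, source):
        if digest(source) not in self._approved:
            raise PermissionError("algorithm not approved by the data owner")
        namespace = {}
        exec(source, namespace)  # sketch only: no sandboxing or isolation here
        return namespace["run"](self._rows)

# An aggregating script: it returns a single average, not the rows.
MEAN_AGE = "def run(rows):\n    return sum(r['age'] for r in rows) / len(rows)\n"
# A script the owner never approved.
DUMP_ALL = "def run(rows):\n    return rows\n"

owner = DataOwner([{"age": 30}, {"age": 40}, {"age": 50}])
owner.approve(MEAN_AGE)

print(owner.run_job(MEAN_AGE))
try:
    owner.run_job(DUMP_ALL)
except PermissionError as err:
    print("rejected:", err)
```

Approving by fingerprint means that any change to a script, however small, requires a fresh approval.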

Challenge: there’s a risk that the script supplied leaks PII. This has two
variants: (a) malicious, and (b) overfitting.

In (a), the script has special code that sends the data to the script supplier.
The supplier would obfuscate this code via an easy-to-miss special import like
“import sk_learn” (versus the correct version “import sklearn”). The special
library wraps sklearn, but injects copying.

In (b), the model learns too much detail, so that PII can be extracted from it.
An extreme example is: in CART tree training, learning each branch only stops
when the leaf node has a single datapoint. Or, a neural network could overfit
if it has a large number of parameters compared to its datapoints and doesn’t
use regularization in training.

To solve: The Data Provider chooses what algorithms to trust. Therefore it’s the
same entity that risks private data getting exposed and chooses what algorithm
to trust. It is their choice to make, based on their risk-reward preference. For
(a): they simply do inspection. For (b): some algorithms are easy to trust, like
averaging or learning a logistic regression model with linear basis functions.
But for more advanced modeling, it’s a bit more of a burden. To ease that, we
envision a rise of community-curated scripts with skin-in-the-game (staking) to
help “harden” the most useful or promising scripts over time.
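
Part of the inspection for variant (a) can be automated. The sketch below is one possible approach, not a feature the article describes: it parses a submitted script with Python's `ast` module and flags any imported module that is not on an allowlist. The allowlist contents and the function name are assumptions.

```python
import ast

# Hypothetical allowlist maintained by the data provider.
APPROVED_MODULES = {"numpy", "pandas", "sklearn"}

def unapproved_imports(source):
    """Return the sorted top-level modules a script imports that are not approved.

    Limitation: dynamic imports (for example via importlib or __import__)
    are not detected by this static check.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0:
                # Relative imports pull in code shipped alongside the script.
                found.add("." * node.level + (node.module or ""))
            elif node.module:
                found.add(node.module.split(".")[0])
    return sorted(found - APPROVED_MODULES)

script = (
    "import sk_learn\n"
    "from sklearn.linear_model import LogisticRegression\n"
)
print(unapproved_imports(script))
```

A check like this catches the look-alike import described above, but it complements manual review rather than replacing it.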


Bringing the action to where it’s secure: here’s an octopus hiding in a clam
shell. [Image: arhnue CC0]


OCEAN AND THE PRIVACY-PRESERVING ECOSYSTEM

Ocean Compute-to-Data’s properties make it useful for now. It’s less burdened by
some of the issues that have slowed adoption of some privacy-preserving
techniques. This is not by accident: when we first started exploring how to
preserve privacy in Ocean, we reviewed the approaches surveyed above, and
realized that bringing compute to data was the most pragmatic choice for the
near term.

But other approaches are maturing nicely. Ocean Protocol is not constrained to
just compute-to-data as a privacy-preserving technique. As time goes on and
other techniques mature, we envision other techniques being used in conjunction
with Ocean.

Of particular interest is FL, which is closest in spirit to Ocean
Compute-to-Data, since FL also brings compute to data. In fact, FL is
complementary to Ocean: FL does higher-level management across many data silos,
and Ocean securely manages computation at a given silo. We’re especially excited
about integrations with OpenMined FL technology.

OpenMined is interesting to Ocean more generally. It’s evolved from being a pure
FL technology to become a broader toolbox of open “connective tissue” software
for privacy-preserving AI technologies, alongside a large and growing community.
We look forward to further interactions with the OpenMined community.


CONCLUSION

This article asked: how does Ocean Compute-to-Data relate to other
privacy-preserving approaches?

We see that Ocean is complementary. Each technology has its own usage, its
own constraints, and its own complementary relation to Ocean.
Encryption/decryption, HE, TEE, MPC, and ZKPs sit side-by-side with Ocean. DP
can enhance Compute-to-Data further. Synthetic data and FL flows are directly
improved by Compute-to-Data.


ACKNOWLEDGEMENTS

Special thanks to Andrew Trask, David Holtzman, Bruce Pon, Adam Drake, and
Julien Thevenard for providing feedback on this article.


FURTHER READING

OpenMined has an excellent series on privacy-preserving data science, starting
with this article.


MAIN ARTICLE UPDATES

 * May 31, 2020: Added section on Decoupled Hashing.



Follow Ocean Protocol via our Newsletter and Twitter; chat with us on Telegram
or Discord; and build on Ocean starting at our docs.
