SESSION ID: ACB-T10

Protecting Privacy in a Data-Driven World: Privacy-Preserving Machine Learning

Casimir Wierzynski
Senior Director, AI Products, Intel
@casimirw

© 2020 Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Machine learning enables new services using sensitive data

• Thanks to ML/AI, we enjoy innovative products and services.
• But the data that feed them are highly sensitive and personal.
• We must find ways to unlock the power of AI while protecting data privacy.

Current approaches to privacy and ML

• User control: prescribe the user's rights. Know what is collected, by whom, and why; opt out; and so on.
• Data protection:
  - Anonymize: remove "identifiable information". But identity can be inferred in many ways.
  - Encrypt: encrypt data at rest and in transit. But it is decrypted during use.
• We need more protections on both fronts.

Current approaches to AI require complex webs of trust

• With digital assets, "sharing" = "giving" + "trust".
• Machine learning is fundamentally a multi-stakeholder computation, creating a complex web of trust among:
  - Training data owner
  - Inference data owner
  - ML service provider
  - Model owner
  - Supply chain
  - Infrastructure provider

What if untrusted parties could do machine learning together?

• The opportunity spans finance/insurance (8% of US GDP), healthcare (8%), and retail (6%): 22% of US GDP in total.
  (Source: https://www.bea.gov/system/files/2019-04/gdpind418_0.pdf)
• Rival banks could build joint anti-money-laundering models.
• Hospitals could use remote, third-party analytics on patient data.
• Retailers could monetize their purchase data while protecting user privacy.

Introducing privacy-preserving machine learning (PPML)

Using cryptography and statistics, you can do "magic":
• Federated learning (FL) and multi-party computation (MPC): you can pool your data without sharing it.
• Homomorphic encryption (HE): you can do machine learning while the data stays encrypted. (MPC and HE build on basically the same math.)
• Differential privacy (DP): you can collect personal data with quantifiable privacy protections.
We can amplify these building blocks using Trusted Execution Environments (TEEs), e.g. Intel SGX. Minimal code sketches of MPC, HE, and DP follow below.

PPML use case: monetizing private data and insights

• A bank hires an "AI company" to build a fraud model (the bank sends a transaction and asks: is it fraud?).
• Retailers have private data.
• They update the model using that private data.
[Diagram: the bank (insights consumer) queries the model, which the AI company (model owner) serves from inside a TEE; each retailer (data owner) computes Δmodel from the previous model and its retail data, adds noise, and contributes the update. Building blocks: FL, HE, TEE, DP.]

PPML use case: monetizing private data and insights (with MPC)

• As above: the bank hires an "AI company" for a fraud model, and retailers update it using their private data.
• With MPC, the model itself stays private.
[Diagram: the model and the noised updates are split into secret shares among the data producers, so no single party holds the whole model. Building blocks: FL, HE, TEE, DP, MPC.]

Federated learning

• To improve the performance of an ML system: get more data!
[Figure: model accuracy rises with more data; accuracy gains fuel demand for bigger datasets.]
Hestness, Joel, et al. "Deep learning scaling is predictable, empirically." arXiv preprint arXiv:1712.00409 (2017).
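To make the MPC bullet above concrete ("you can pool your data without sharing it"), here is a minimal additive-secret-sharing sketch in Python. It is an illustration, not the protocol from the talk: the parties, values, and function names are hypothetical, and a real deployment would add authenticated channels and protections against malicious participants.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is modulo a public prime

def make_shares(secret, n_parties):
    """Split `secret` into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# Three data holders, each with a private value. Party i sends its j-th
# share to party j, so nobody ever sees anyone else's raw input.
private_inputs = [120, 75, 310]
n = len(private_inputs)
shares_by_party = [make_shares(x, n) for x in private_inputs]

# Each party j sums the shares it received and publishes only that sum.
partial_sums = [sum(shares_by_party[i][j] for i in range(n)) % PRIME
                for j in range(n)]

# Recombining the partial sums reveals the total and nothing else.
total = sum(partial_sums) % PRIME
print(total)  # 505 == 120 + 75 + 310
```

No proper subset of the parties can reconstruct another party's input from the shares alone; each share is uniformly random on its own, and only the final total is revealed.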
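The HE bullet ("machine learning while the data stays encrypted") rests on schemes where operations on ciphertexts mirror operations on plaintexts. As a hedged illustration, here is a textbook Paillier cryptosystem with toy, insecure parameters; it supports addition under encryption, the primitive behind encrypted aggregation. This is not Intel's HE stack, the tiny hardcoded primes are for readability only, and `pow(lam, -1, n)` requires Python 3.8+.

```python
import math, random

def lcm(a, b):
    return a * b // math.gcd(a, b)

# Toy key material: tiny hardcoded primes, for readability only.
# Real Paillier uses large random primes (>1024 bits each).
p, q = 293, 433
n = p * q
n2 = n * n
lam = lcm(p - 1, q - 1)
g = n + 1                  # standard simplification for Paillier
mu = pow(lam, -1, n)       # modular inverse (Python 3.8+)

def encrypt(m):
    """Encrypt 0 <= m < n with fresh randomness r."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Invert encryption: L(c^lam mod n^2) * mu mod n, with L(x) = (x-1)//n."""
    return (pow(c, lam, n2) - 1) // n * mu % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts.
a, b = 17, 25
c_sum = encrypt(a) * encrypt(b) % n2
assert decrypt(c_sum) == a + b
print("sum computed under encryption:", decrypt(c_sum))
```

Because multiplying two ciphertexts yields an encryption of the sum of their plaintexts, an untrusted server can aggregate encrypted values (e.g., model updates) without ever decrypting them.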
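The DP bullet's "quantifiable privacy protections" refers to a parameter epsilon that bounds how much any one person's data can shift the output distribution. Here is a minimal sketch of the Laplace mechanism for a count query; the dataset and epsilon value are made up for illustration.

```python
import random

def laplace_noise(scale):
    """Laplace(0, scale) noise as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon):
    """Release a count under epsilon-differential privacy.

    A count query has sensitivity 1 (one person changes it by at most 1),
    so Laplace noise with scale 1/epsilon suffices for epsilon-DP.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical data: user ages; query: how many are over 40.
ages = [23, 45, 31, 67, 52, 29, 41, 38]
print(dp_count(ages, lambda a: a > 40, epsilon=0.5))
```

Smaller epsilon means more noise and stronger privacy; answering many queries consumes a privacy budget that a real system must track.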
The data silo problem

• Privacy / legality (HIPAA, GDPR)
• Data too valuable (or value unknown)
• Data too large to transmit

Federated learning part 1: train locally and aggregate

[Diagram: data holders a, b, and c each train locally and send their model updates to an aggregator, which combines them (Σ).]

Federated learning part 2: share the aggregate; go to step 1

[Diagram: the aggregator sends the combined model back to a, b, and c, and the loop repeats.]
(A minimal federated-averaging sketch of this two-step loop appears below.)

Federated learning (FL): some care required

• FL solves a lot of data access problems.
• But security / privacy concerns remain:
  - Data holders can see the model.
  - Data holders can tamper with the protocol.
  - Model updates leak information. (A clip-and-noise mitigation sketch appears below.)

Intel SGX: https://software.intel.com/en-us/sgx

Federated learning with Intel SGX

[Diagram: the same a, b, c, and aggregator topology, with training and aggregation running inside SGX enclaves.]

A vision for protecting FL with Intel® SGX

• Confidentiality:
  - Model IP won't be stolen.
  - Attacks can't be computed.
  This stops attackers from using the model.
• Integrity and attestation:
  - Only approved models and training procedures run.
  - All participants know the rules are enforced.
  - Algorithmic defenses can't be bypassed.
  This stops attackers from being adaptive.
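The two-step loop above (local training, then aggregation and redistribution) is federated averaging. Below is a minimal sketch, assuming a linear least-squares model, one local gradient step per round, and dataset-size-weighted averaging; the silo data are synthetic stand-ins, and nothing here is specific to Intel's stack.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_step(w, X, y, lr=0.1):
    """One local gradient step on least-squares loss; rows never leave the silo."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Three silos with synthetic stand-in data (e.g., three hospitals).
silos = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
w_global = np.zeros(3)

for _ in range(20):
    # Part 1: each silo trains locally starting from the current global model.
    local_ws = [local_step(w_global, X, y) for X, y in silos]
    # The aggregator combines the updates, weighted by dataset size.
    sizes = [len(y) for _, y in silos]
    w_global = np.average(local_ws, axis=0, weights=sizes)
    # Part 2: the aggregate is shared back with every silo; repeat.

print("global model after 20 rounds:", w_global)
```

Note that the aggregator only ever sees weight vectors, never rows of data, which addresses the silo problem; the next sketch addresses what those weight vectors can still leak.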
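Of the FL risks listed above, "model updates leak information" has a standard algorithmic mitigation, complementary to running aggregation in a TEE: clip each update's norm and add calibrated noise before it leaves the data holder, in the style of DP-SGD. This is a sketch with illustrative, made-up parameter values; a real system would calibrate the noise with a DP accountant.

```python
import numpy as np

rng = np.random.default_rng(1)

def privatize_update(delta, clip_norm=1.0, noise_mult=0.8):
    """Clip an update's L2 norm, then add Gaussian noise before sending it.

    Clipping bounds any one participant's influence on the aggregate;
    the noise multiplier sets the privacy/utility trade-off (the actual
    epsilon would be tracked by a separate DP accountant).
    """
    norm = np.linalg.norm(delta)
    clipped = delta * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=delta.shape)
    return clipped + noise

# Example: a raw model delta computed from one retailer's private data.
raw_delta = np.array([0.9, -2.4, 0.3])
print(privatize_update(raw_delta))
```

Clipping bounds what any single participant can contribute, the noise masks what remains, and running the aggregation inside SGX then protects the updates from the aggregator itself.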
