44 min

Privacy-aware Data Pipelines with Skyflow’s Piper Keyes
Partially Redacted: Data, AI, Security, and Privacy

    • Technology

A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and increased ROI.
However, despite your best efforts, sensitive customer data tends to find its way into your analytics pipelines, ending up in your data warehouses and metrics dashboards. Replicating customer PII to your downstream services greatly increases your compliance scope and makes maintaining data privacy and security significantly more challenging.
In this episode, Piper Keyes, Engineering Lead at Skyflow, joins the show to discuss what goes into building a privacy-aware data pipeline, which tools and technologies you should be using, and how Skyflow addresses this problem.
Topics:
What is a data analytics pipeline?
What does it mean to build a privacy-aware data pipeline?
Can you give some examples of use cases where privacy-aware data pipelines are particularly important?
What’s it mean to de-identify data and how does that work?
What are some common techniques used to preserve privacy in data pipelines?
How does analytics work for de-identified data?
How do you balance the need for data privacy with the need for actually being able to use the data?
What’s it take to build a privacy-aware pipeline from scratch?
What are some of the biggest challenges in building privacy-aware data pipelines?
How does something like this work with Skyflow?
Let’s say I have a customer’s transactional data from Visa. How could I ingest that data into my data warehouse without having to build PCI compliance infrastructure? Walk me through how that works.
Could you build a machine learning model based on the de-identified data?
Once I have the data in my warehouse, let’s say I need to inform a clinical trial participant about an issue while maintaining their privacy. How could I perform an operation like that?
What other use cases does this product enable?
Resources:
Running Secure Workflows with Sensitive Customer Data
Maximize Privacy while Preserving Utility for Data Analytics


