In a world increasingly dominated by data, privacy has become both a precious commodity and a pressing concern. Enter differential privacy, a technique for protecting individuals’ data in an era where information, like water, is vital but potentially destructive if not properly contained.
“All the other techniques we have for ensuring privacy in data processing have holes in them,” according to Joseph P. Near, an associate professor of computer science at the University of Vermont and data privacy expert. “Differential privacy is the only approach we know of that doesn’t.”
But what exactly is differential privacy, and how does it measure up to the promises it makes?
The mechanism behind differential privacy involves adding a calibrated amount of random “noise” to the data or to the results of queries run against it. This noise is designed to be statistically negligible for the overall dataset but significant enough to obscure individual data points. This way, even if an adversary gets their hands on the data, they cannot extract specific information about any individual.
Using differential privacy allows organizations to collect and analyze data without exposing the details of the original individual entries. Imagine you have a dataset containing the ages of all employees in a company. Differential privacy allows you to determine the average age without revealing any specific employee’s age.
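The average-age example can be sketched with the Laplace mechanism, one standard way of adding calibrated noise. This is an illustration only: the function names, the age bounds of 18 to 90, and the epsilon value are assumptions made for the example, not details of any product mentioned here, and the sketch assumes the number of employees is public.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_mean_age(ages, epsilon, lower=18, upper=90, rng=None):
    """Return a noisy average age (hypothetical helper, not a library API)."""
    if rng is None:
        rng = random.Random()
    n = len(ages)  # assumed to be public knowledge
    # Clip each age so one person's influence on the mean is bounded.
    clipped = [min(max(a, lower), upper) for a in ages]
    # Replacing one record can move the mean by at most this much.
    sensitivity = (upper - lower) / n
    # Smaller epsilon means a larger noise scale and stronger privacy.
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 29, 52, 38, 46, 31, 27, 60]  # made-up data
print(dp_mean_age(ages, epsilon=1.0))
```

Each run prints a slightly different value near the true average. With more employees, the noise needed to hide any one of them shrinks, which is why the result can stay useful for the overall dataset.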
While databases can be protected from unauthorized users through various security measures, such as firewalls, encryption, access controls, and authentication mechanisms, the queries used to learn information about the dataset can be vulnerable to attacks from authorized analysts.
Sharing sensitive data usually requires anonymizing and redacting information, which can be time-consuming and resource intensive. But when accessing data using differential privacy, the query results do not reveal any individual data points, and the system can be built such that analysts do not have direct access to the raw data.
But there are challenges in integrating differential privacy into existing systems. “One of the biggest hurdles for widespread adoption is the difficulty of understanding the technology for non-specialized analysts, namely, choosing parameter values that quantify the privacy protection,” says Gonzalo Munilla Garrido, a digital platforms and privacy technologist with the European Commission.
Dr. Munilla Garrido, who earned his PhD at the Technical University of Munich, has extensive experience working on privacy research. “Without user-friendly tools, organizations need to have a deep understanding of both their data and the nuances of differential privacy to implement it correctly,” he said. (He noted that his comments are his own, and don’t reflect the views of his employer.)
Oasis Labs, co-founded by Dawn Song, a computer science professor at the University of California, Berkeley, wants to make the sophisticated concept accessible and usable for businesses and researchers through SQL (Structured Query Language), the standard language for managing and querying relational databases.
Running SQL queries is the most common way many companies use their data, and Oasis Labs’ product, PrivateSQL, lets them do so in a differentially private way even if they do not have deep technical expertise in the field.
“At Oasis Labs, we believe that privacy should not be an option, but an essential, integral component of how businesses use data,” said Dr. Song. “Oasis PrivateSQL eases the adoption of differential privacy, enabling data analysts and researchers to harness its capabilities through familiar SQL queries without requiring specialized expertise.”
Many other companies use or offer differential privacy tools, including Apple, Google, Microsoft, Amazon, Uber, and LinkedIn. These tech giants have incorporated differential privacy techniques into various products and services to protect user data while still deriving valuable insights. But University of Vermont’s Prof. Near says that “for people who know very little about differential privacy, there’s almost nothing available” beyond PrivateSQL.
PrivateSQL is designed to integrate seamlessly with existing databases, automatically applying differential privacy techniques to ensure that personal data remains protected while still allowing for valuable insights to be gleaned. This makes it particularly appealing to organizations handling sensitive information, such as healthcare providers, financial institutions, and government agencies.
For instance, if you query an employee database to find the average salary, and a new employee then joins the company, querying the average salary again could reveal the new employee’s salary, because the difference between the two results depends only on that one person. Differential privacy prevents this by ensuring that the presence or absence of an individual’s data has no significant impact on the query results. Analysts continue using SQL as usual, but PrivateSQL rewrites the queries to be differentially private, ensuring individual data remains protected.
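The salary scenario is often called a differencing attack, and a small sketch shows why exact averages leak and noisy ones do not. Everything here is hypothetical: the salaries, the 0 to 200,000 clipping range, the epsilon of 1.0, and the helper names are assumptions for illustration, and the code does not represent how PrivateSQL rewrites queries.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_mean(values, epsilon, lower, upper, rng):
    """Clipped mean plus Laplace noise (hypothetical helper)."""
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    return sum(clipped) / n + laplace_noise((upper - lower) / (n * epsilon), rng)


def differencing_attack(mean_before, n_before, mean_after):
    """Total after minus total before equals the newcomer's value."""
    return (n_before + 1) * mean_after - n_before * mean_before


rng = random.Random(0)
salaries = [60_000 + 500 * i for i in range(50)]  # made-up staff salaries
new_hire = 95_000                                  # the value under attack
after = salaries + [new_hire]

# With exact averages, the attacker recovers the new salary.
exact_guess = differencing_attack(sum(salaries) / 50, 50, sum(after) / 51)

# With noisy averages, the same arithmetic amplifies the noise.
noisy_guess = differencing_attack(
    noisy_mean(salaries, 1.0, 0, 200_000, rng),
    50,
    noisy_mean(after, 1.0, 0, 200_000, rng),
)

print(f"guess from exact averages: {exact_guess:,.0f}")
print(f"guess from noisy averages: {noisy_guess:,.0f}")
```

The exact averages give back the new hire's salary of 95,000, up to floating-point rounding. The noisy averages each stay close to the truth, but multiplying them by the headcount scales the noise up to roughly the size of the whole salary range, so the attacker's guess is of little use.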
Oasis Labs’ product offers a user-friendly interface that allows non-experts to apply differential privacy to their datasets with minimal effort. The product includes detailed guides and support to help organizations set appropriate privacy budgets and understand the trade-offs involved.
PrivateSQL supports a wide range of data types and can be integrated with various database systems. It’s compliant with HIPAA, the Health Insurance Portability and Accountability Act that sets a national standard to protect medical records and other personal health information. It allows pharmaceutical companies, for example, to preview datasets before purchasing them. This makes it a versatile tool for organizations with diverse data needs.
Furthermore, the product includes robust auditing capabilities, allowing organizations to monitor and verify that differential privacy is being applied correctly, ensuring compliance with regulatory requirements and internal privacy policies.
The potential applications of differential privacy are vast and varied. In the healthcare sector, for example, it allows researchers to analyze patient data to uncover trends and insights without risking patient confidentiality – a huge blocker in medical research. Financial institutions can use it to detect fraudulent activities while safeguarding customer information. Government agencies can collect and analyze census data to inform policy decisions without compromising citizen privacy.
The U.S. Census Bureau, for example, found that traditional anonymization techniques were insufficient, and implemented differential privacy.
The adoption of differential privacy is a hard sell for internal use cases, particularly for non-tech-savvy companies with legacy pipelines that analyze data as they want, even if their process offers weaker privacy guarantees. “They may say, ‘why would I put additional effort if I never got in trouble with my current process?’” said Dr. Munilla Garrido, who has written extensively on the subject. He said that might ultimately be true for some use cases, but for others, like when using SQL, the added effort offers significant privacy improvements.
Unfortunately, for most companies, implementing better privacy technology is a top-down mandate and motivated by regulation rather than innovation. The friction against improving privacy is even more prevalent in businesses where privacy does not easily align with the core product’s sales. “For such companies, differential privacy might be more compelling when analyzing external data, where privacy constraints are stricter, and the analysis might not have been possible to date for such reason,” Dr. Munilla Garrido said.
And differential privacy methods are not foolproof. Despite adding noise to query results, running the same query multiple times and averaging the results could still reveal true answers. “If you perform the same differential privacy query multiple times, you create a distribution around the original value, which can be used to approximate it,” said Dr. Munilla Garrido.
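The averaging effect Dr. Munilla Garrido describes is easy to reproduce. In this minimal sketch, the true average salary, the noise scale, and the number of repeated queries are all made-up values, and the system is assumed to draw fresh Laplace noise on every call.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_answer(true_value, scale, rng):
    """One noisy query result; fresh noise is drawn on every call."""
    return true_value + laplace_noise(scale, rng)


rng = random.Random(7)
true_average_salary = 72_250  # hypothetical secret the noise should protect
scale = 4_000                 # hypothetical noise scale for a single query

single = noisy_answer(true_average_salary, scale, rng)
repeated = [noisy_answer(true_average_salary, scale, rng) for _ in range(10_000)]
estimate = sum(repeated) / len(repeated)

print(f"one query is off by {abs(single - true_average_salary):,.0f}")
print(f"average of 10,000 queries is off by {abs(estimate - true_average_salary):,.0f}")
```

Because the noise is centered on zero, it cancels out in the average, and the estimate closes in on the true value as the number of repeated queries grows.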
Oasis Labs’ product gets around this by allowing users to set a “privacy budget,” limiting the kind and number of queries for users. Once this budget is exhausted, no further queries are allowed.
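A privacy budget can be pictured as a running total of the epsilon spent by each query. The sketch below uses basic sequential composition, in which the epsilons of successive queries simply add up. The class and method names are hypothetical, and this is not Oasis Labs' implementation.

```python
class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class PrivacyBudget:
    """Tracks epsilon spent under basic sequential composition (illustrative)."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return max(self.total - self.spent, 0.0)

    def charge(self, epsilon):
        """Deduct epsilon for one query, or refuse if the budget is too low."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float rounding does not reject an exact fit.
        if self.spent + epsilon > self.total + 1e-12:
            raise BudgetExhausted(
                f"need {epsilon}, only {self.remaining:.3f} remaining"
            )
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
answered = 0
for _ in range(5):  # an analyst repeats the same query five times
    try:
        budget.charge(0.4)  # each query costs epsilon = 0.4
        answered += 1
    except BudgetExhausted:
        break

print(answered)  # 2: the third query would bring the total to 1.2
```

Capping the total this way limits how many noisy answers an analyst can collect, which in turn limits how far repeated queries can be averaged toward the true value.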
PrivateSQL is designed to be easy to deploy and to work with any SQL database, and it supports multiple interfaces, including a REST API. Oasis Labs said it will soon release a no-code solution to make data access easier for non-technical users.
When a user connects their database, PrivateSQL automatically scans the database and generates reasonable privacy parameters. Administrators can then customize privacy settings according to their specific data and needs.
Oasis Labs is also exploring integrations with other privacy-enhancing technologies, such as federated learning, multiparty computation, and homomorphic encryption. These technologies, when combined, can provide even greater levels of privacy and security, opening new possibilities for data analysis and sharing.