Privacy-Preserving Machine Learning

Last Updated : 28 Jun, 2025

The advancement of machine learning (ML) has transformed industries by enabling the extraction of insights from vast datasets. However, because ML systems rely heavily on sensitive data, ranging from personal health records to financial details, they raise significant privacy concerns. Ensuring data confidentiality while leveraging the power of ML is a core challenge in modern AI. Privacy-preserving machine learning (PPML) tackles this issue by protecting user data while still enabling effective model training and prediction.

Techniques in Privacy-Preserving Machine Learning

1. Differential Privacy (DP)

Differential Privacy is a technique that ensures the output distribution of a randomized algorithm remains almost the same whether or not any one individual's data is included in the dataset. It works by adding carefully calibrated randomness (noise) so that the presence or absence of a single person's information does not significantly affect the result. This makes it very difficult to identify or infer details about any individual, even if their data is part of the dataset.

Figure: Differential Privacy

Mathematical Formulation

A randomized algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private if, for all datasets $D_1$ and $D_2$ differing in a single element, and for all $S \subseteq \text{Range}(\mathcal{A})$,

$$P[\mathcal{A}(D_1) \in S] \leq e^{\epsilon} \cdot P[\mathcal{A}(D_2) \in S] + \delta$$

Here:

  • $\epsilon$: Privacy loss parameter (smaller $\epsilon$ means stronger privacy).
  • $\delta$: Relaxation parameter (a small probability with which the pure $\epsilon$ guarantee is allowed to fail; $\delta = 0$ gives pure $\epsilon$-differential privacy).
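To make the inequality concrete, here is a minimal sketch using randomized response, a classic mechanism in which each person reports their true yes/no answer with probability e^epsilon / (1 + e^epsilon) and the opposite answer otherwise. The function names `randomized_response` and `worst_case_ratio` are illustrative assumptions, not part of any library. The "dataset" here is a single bit, so two neighboring datasets are simply the two possible values of that bit, and delta is 0.

```python
import math
import random

def truth_probability(epsilon):
    """Probability of reporting the true bit under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def randomized_response(true_bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    return true_bit if rng.random() < truth_probability(epsilon) else 1 - true_bit

def worst_case_ratio(epsilon):
    """Largest ratio P[output = s | bit = b] / P[output = s | bit = 1 - b]."""
    p = truth_probability(epsilon)
    return p / (1.0 - p)

epsilon = 0.5
ratio = worst_case_ratio(epsilon)
# The ratio equals e^epsilon, so the DP inequality holds with delta = 0.
print(f"worst-case ratio = {ratio:.4f}, e^epsilon = {math.exp(epsilon):.4f}")

rng = random.Random(0)  # seeded only so the demo is repeatable
reports = [randomized_response(1, epsilon, rng) for _ in range(10)]
print(f"Ten noisy reports of a true answer of 1: {reports}")
```

Because the ratio of output probabilities never exceeds e^epsilon, an observer who sees one report cannot confidently tell which true answer produced it.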

Python Code Example

Python
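import numpy as np

def add_laplace_noise(data, sensitivity, epsilon, rng=None):
    """Laplace mechanism: add noise with scale = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon  # smaller epsilon -> larger noise
    noise = rng.laplace(loc=0.0, scale=scale, size=data.shape)
    return data + noise

# Example: scale = 5 / 0.1 = 50, so the noise is large relative to the values
data = np.array([10, 15, 20])
noisy_data = add_laplace_noise(data, sensitivity=5, epsilon=0.1)
print(f"Noisy Data: {noisy_data}")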

Output (values vary from run to run because the noise is random):

Noisy Data: [ -5.76963332 46.77210639 -41.752765 ]

2. Federated Learning (FL)

Federated Learning is a collaborative machine learning technique in which individual devices or institutions train models locally on their own private data. Instead of sharing the data itself, they send only the resulting model updates to a central server. These updates are then combined to improve a shared global model. This approach keeps sensitive information on the local side, enhancing privacy and security while still enabling powerful, collective learning.

Figure: Federated Learning

Example

Consider training a predictive keyboard model across smartphones. In FL, each device updates the model locally and sends only the updates (e.g., gradients) to a central server, which aggregates them without accessing personal text data.

Challenges

  • Communication overhead between devices and the server.
  • Potential leakage through model updates.

Python Code Example

Python
import numpy as np

def federated_training(local_datasets, global_model, learning_rate, rounds):
    for _ in range(rounds):
        local_updates = []
        for data in local_datasets:
            # Simulate local training: each client computes the direction that
            # moves the shared scalar model toward the mean of its own data.
            update = np.mean(data) - global_model
            local_updates.append(update)

        # The server sees only the updates, never the raw data, and averages them
        global_update = np.mean(local_updates)
        global_model += learning_rate * global_update

    return global_model

# Example: three clients, each holding its own private dataset
local_datasets = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]
global_model = 0.0
updated_model = federated_training(local_datasets, global_model, learning_rate=0.1, rounds=10)
print(f"Updated Global Model: {updated_model}")

Output:

Updated Global Model: 3.2566077995000002
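The scalar example above treats every client equally. A common variant, federated averaging, weights each client's result by its number of local samples. The following is a minimal sketch under stated assumptions: the model is a two-parameter linear regression, the client data is synthetic and noiseless, and the helper names `local_step` and `fedavg_round` are hypothetical rather than taken from any FL framework.

```python
import numpy as np

def local_step(weights, X, y, learning_rate):
    """One gradient-descent step of least-squares regression on a client's private data."""
    residual = X @ weights - y
    gradient = X.T @ residual / len(y)
    return weights - learning_rate * gradient

def fedavg_round(global_weights, clients, learning_rate):
    """Each client trains locally; the server averages the results weighted by sample count."""
    local_weights = [local_step(global_weights.copy(), X, y, learning_rate) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    return np.average(np.stack(local_weights), axis=0, weights=sizes)

# Synthetic clients with different amounts of data (assumed for illustration)
rng = np.random.default_rng(0)
true_weights = np.array([2.0, -1.0])
clients = []
for n_samples in (20, 50, 100):
    X = rng.normal(size=(n_samples, 2))
    y = X @ true_weights
    clients.append((X, y))

weights = np.zeros(2)
for _ in range(200):
    weights = fedavg_round(weights, clients, learning_rate=0.1)
print(f"Learned weights: {weights}")
```

Only the locally updated weights leave each client; the server never receives `X` or `y`.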

3. Homomorphic Encryption (HE)

Homomorphic Encryption is a privacy-preserving method that allows computations to be performed directly on encrypted data. The data stays encrypted and unreadable throughout the process, even while operations are carried out, and the result of the computation is itself encrypted. Only the holder of the secret decryption key can read the final result. This makes it possible to process sensitive information without ever exposing it, ensuring confidentiality even during analysis or search.

Figure: Homomorphic Encryption

Example

In a medical research scenario, encrypted patient records can be analyzed without revealing the raw data to researchers.

Challenges

  • High computational cost.
  • Limited support for complex operations.

Python Code Example

Note: The snippet below only illustrates the idea with plain integers; no encryption takes place. Use libraries like PySEAL or TenSEAL for a real HE implementation.

Python
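def simple_homomorphic_addition(encrypted_a, encrypted_b, encryption_key):
    # Placeholder: a real scheme would combine two ciphertexts so that the
    # result decrypts to the sum of the plaintexts. encryption_key is unused here.
    return encrypted_a + encrypted_b

# Example
# These are plain integers standing in for ciphertexts; nothing is encrypted.
encrypted_a, encrypted_b = 5, 3
result = simple_homomorphic_addition(encrypted_a, encrypted_b, encryption_key="dummy_key")
print(f"Encrypted Result: {result}")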

Output:

Encrypted Result: 8
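For contrast with the simplified example above, the following is a minimal sketch of a scheme that really is additively homomorphic: a toy version of the Paillier cryptosystem in pure Python. The primes 61 and 53 and the helper names `keygen`, `encrypt`, `decrypt`, and `add_encrypted` are illustrative assumptions. Keys this small are trivially breakable, so the sketch shows only the mechanics and is not a secure implementation.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, message):
    """Encrypt 0 <= message < n using generator g = n + 1."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^message mod n^2 simplifies to 1 + message * n
    return ((1 + message * n) * pow(r, n, n_sq)) % n_sq

def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return (((u - 1) // n) * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

n, private_key = keygen(61, 53)
encrypted_a = encrypt(n, 5)
encrypted_b = encrypt(n, 3)
encrypted_sum = add_encrypted(n, encrypted_a, encrypted_b)

print(f"Ciphertext of the sum: {encrypted_sum}")
print(f"Decrypted sum: {decrypt(n, private_key, encrypted_sum)}")  # 8
```

The party performing `add_encrypted` needs only the public value `n`; it never sees 5, 3, or 8 in the clear.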

4. Secure Multi-Party Computation (SMPC)

SMPC enables multiple participants to jointly compute a function of their private inputs while keeping those inputs hidden from each other. Each participant contributes its input in a protected form, typically as secret shares or encrypted values, so no single party ever sees another party's raw data. Only the final output is revealed, and some protocols additionally let the parties verify that each contribution was formed correctly without accessing the underlying data.

Figure: Secure Multi-Party Computation

Example

Banks can jointly calculate the total risk exposure of their clients without sharing individual customer data.

Challenges

  • Complex implementation.
  • Increased communication between parties.
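The bank scenario can be sketched with additive secret sharing, one of the simplest SMPC building blocks. This is a minimal illustration under stated assumptions: the exposure figures are made up, the parties are honest, all communication is simulated inside one process, and the names `share` and `reconstruct` are hypothetical helpers rather than a library API.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus; all arithmetic is done modulo this value

def share(secret, n_parties):
    """Split `secret` into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recover the shared value by summing all shares modulo MODULUS."""
    return sum(shares) % MODULUS

# Hypothetical private risk exposures, one per bank
exposures = [120, 340, 95]
n = len(exposures)

# Bank i splits its exposure and sends share j to bank j
distributed = [share(x, n) for x in exposures]

# Bank j adds up the shares it received (one from every bank)
partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)]

# Publishing only the partial sums reveals the total, not any single input
total = reconstruct(partial_sums)
print(f"Total exposure: {total}")  # 555
```

Each individual share is uniformly random, so a bank learns nothing about another bank's exposure from the share it receives; only the combined total is revealed.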

Applications of PPML

  1. Healthcare: Training models on sensitive medical data without compromising patient confidentiality.
  2. Finance: Building credit scoring models while respecting user privacy.
  3. IoT Devices: Securely analyzing data from personal devices like smartphones or smart home gadgets.
  4. Government: Enabling inter-agency collaborations on sensitive information without compromising citizen privacy.

Challenges in PPML

  1. Performance Overheads: Techniques like HE and SMPC significantly increase computational and memory requirements.
  2. Trade-offs: Balancing privacy, utility, and scalability can be difficult.
  3. Complexity of Implementation: Advanced PPML methods often require specialized knowledge and tools.
