Addressing data scarcity with synthetic data: A secure and GDPR-compliant cloud-based platform

Borovits, Nemania
Bardelloni, Gianluigi
Hashemi, Hossein
Tulsiani, Masoom
Tamburri, Damian Andrew
van den Heuvel, Willem-Jan
Abstract
This study presents a cloud-based platform for synthetic data generation, validation and evaluation, developed to address data scarcity in the telecommunications sector while ensuring compliance with the General Data Protection Regulation (GDPR). In collaboration with a Dutch telecommunications provider facing data scarcity due to low user-consent rates, we developed a platform that allows synthetic data vendors to securely generate synthetic data from a schema input, without accessing sensitive information. Vendors uploaded containerized executables for synthetic data generation, and the platform automated infrastructure provisioning so that vendors never had access to personal data. A validation mechanism minimized the risk of re-identification by ensuring that the synthetic data did not inadvertently replicate real data points. We agreed with the vendors on five evaluation metrics; the platform calculated and logged each vendor's performance on every metric, allowing the vendors to refine their algorithms. To validate the platform's performance, we conducted an offline study with the provider's TV viewership team, using each vendor's synthetic data to generate viewership categories. The vendor with the best evaluation metrics also produced the categories most similar to those derived from the real data, confirming the platform's effectiveness. This study, involving two vendors and a telecommunications company, demonstrated the platform's applicability in addressing business challenges while ensuring privacy compliance.
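The abstract does not say how the validation mechanism decides that a synthetic record "replicates" a real one. As a minimal sketch, assuming records are tuples of categorical values that follow the shared schema, and assuming a record counts as a replica when it differs from its closest real record in fewer than a chosen number of fields, such a check could look like the Python below. The names `hamming`, `flag_too_close` and `min_diff`, and the toy data, are hypothetical illustrations, not the authors' implementation.

```python
from collections.abc import Sequence

# Hypothetical sketch: a record is a sequence of field values in the order
# of the shared schema (e.g. region, age band, viewing genre).
Record = Sequence[object]


def hamming(a: Record, b: Record) -> int:
    """Count the fields in which two records of the same schema differ."""
    if len(a) != len(b):
        raise ValueError("records must follow the same schema")
    return sum(x != y for x, y in zip(a, b))


def flag_too_close(real: list[Record], synthetic: list[Record], min_diff: int = 1) -> list[int]:
    """Return indices of synthetic records whose closest real record differs
    in fewer than `min_diff` fields. With min_diff=1 only exact copies of a
    real record are flagged; larger values also flag near-copies."""
    flagged = []
    for i, s in enumerate(synthetic):
        # Distance to the closest real record; infinity if there is no real data.
        closest = min((hamming(s, r) for r in real), default=float("inf"))
        if closest < min_diff:
            flagged.append(i)
    return flagged


# Illustrative toy data, not from the study.
real = [("NL", "25-34", "sports"), ("NL", "35-44", "news")]
synthetic = [("NL", "25-34", "sports"), ("NL", "25-34", "news"), ("BE", "65+", "kids")]
print(flag_too_close(real, synthetic))              # [0]
print(flag_too_close(real, synthetic, min_diff=2))  # [0, 1]
```

An exact-match test alone would miss near-copies; raising `min_diff` is one simple way to tighten the check, at a cost proportional to the product of the real and synthetic dataset sizes, since every pair is compared.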
Date
2025
Keywords
software engineering, designing software, information systems, information integration, security, privacy-preserving protocols, privacy
Citation
Borovits, N, Bardelloni, G, Hashemi, H, Tulsiani, M, Tamburri, D A & van den Heuvel, W-J 2025, 'Addressing data scarcity with synthetic data: A secure and GDPR-compliant cloud-based platform', ACM Transactions on Software Engineering and Methodology. https://doi.org/10.1145/3732937