DOAJ Open Access 2025

Establishing a Repository of Synthetic Datasets for Researchers: A Scottish Perspective

Sophie McCall

Abstract

Objectives: This presentation will describe how we co-designed the creation and provision of a single low-fidelity synthetic data repository for multiple data controllers, where researchers can access synthetic data for data discovery and code development.

Methods: Through user and public engagement, we identified researcher demand for access to low-fidelity synthetic data before access to real data. Our metadata catalogue collates information on Scottish datasets available for research and is the digital platform for our synthetic data repository, hosting assets generated by partner organisations and ourselves. Embedded in our process are quality checks that assess the labelling, structure, disclosure and documentation of synthetic data, with an ‘End User Licence Agreement’ to guard against synthetic data being used inappropriately. These measures give assurance that privacy is preserved whilst making synthetic data as freely available as possible.

Results: We have completed the pilot phase of this project, establishing a repository of synthetic data in our metadata catalogue, which researchers can apply to access. We will share the rationale for decisions made during the project, together with the challenges faced. Key considerations are the user journey and the promotion of our service within the wider data community. We will establish the success of our project using data analytics on the number of synthetic datasets requested and downloaded, together with case studies from researchers. Finally, we will describe our plans to extend our work, including hosting more datasets and working with partner organisations to support their synthetic data generation and ensure it meets our standards of quality and disclosure.

Conclusion: Developing a synthetic data repository has been a significant milestone for our organisation and for our ambition to improve researcher access to data. By adopting an iterative approach and responding to user and public feedback, our repository has proved to be an exemplar of how to make synthetic data available.

Authors (1)

Sophie McCall

Citation Format

McCall, S. (2025). Establishing a Repository of Synthetic Datasets for Researchers: A Scottish Perspective. https://doi.org/10.23889/ijpds.v10i4.3095

Journal Information
Year of Publication: 2025
Source Database: DOAJ
DOI: 10.23889/ijpds.v10i4.3095
Access: Open Access ✓