An information theoretic limit to data amplification
Abstract
In recent years, generative artificial intelligence has been used to create data to support scientific analysis. For example, generative adversarial networks (GANs) have been trained on Monte Carlo simulated input and then used to generate data for the same problem. This has the advantage that a GAN creates data in significantly reduced computing time. $N$ training events for a GAN can result in $NG$ generated events, with the gain factor $G$ greater than one. This appears to violate the principle that one cannot get information for free. GANs are not the only way to amplify data, so this process will be referred to as data amplification; it is studied here using information-theoretic concepts. It is shown that a gain greater than one is possible whilst keeping the information content of the data unchanged. This leads to a mathematical bound, $2\log(\text{Generated Events}) \geqslant 3\log(\text{Training Events})$, which depends only on the numbers of generated and training events. This study determined the conditions on both the underlying and reconstructed probability distributions that ensure this bound. In particular, the resolution of variables in amplified data is not improved by the process, but the increase in sample size can still improve statistical significance. The bound was confirmed using computer simulation and analysis of GAN-generated data from the literature.
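To make the bound concrete, the sketch below (not from the paper; function and variable names are illustrative) evaluates both sides of the stated inequality. The log base cancels on both sides, and the bound is equivalent to requiring at least $N^{3/2}$ generated events from $N$ training events, i.e. it is saturated at a gain of $G = \sqrt{N}$:

```python
import math

def bound_satisfied(n_train: int, n_gen: int) -> bool:
    """Evaluate the stated bound 2*log(generated) >= 3*log(training).

    Any log base works, since it cancels on both sides. The bound is
    equivalent to n_gen >= n_train**1.5, i.e. a gain factor
    G = n_gen / n_train of at least sqrt(n_train).
    Helper and variable names are illustrative, not from the paper.
    """
    return 2.0 * math.log(n_gen) >= 3.0 * math.log(n_train)

n_train = 10_000                  # N Monte Carlo training events
g_breakeven = math.sqrt(n_train)  # gain saturating the bound: G = 100

print(f"break-even gain for N = {n_train}: G = {g_breakeven:.0f}")
print(bound_satisfied(n_train, 2_000_000))  # G = 200 >= 100 -> True
print(bound_satisfied(n_train, 100_000))    # G = 10  <  100 -> False
```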
Topik & Kata Kunci
Authors (2)
S J Watts
L Crow
Quick Access
- Publication Year: 2025
- Database Source: DOAJ
- DOI: 10.1088/2632-2153/add78d
- Access: Open Access ✓