Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
Abstract
Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily owing to the availability of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, curating and assembling such datasets is highly challenging because of the reliance on clinical expertise and strict ethical and privacy constraints. The result is a scarcity of large-scale unified medical datasets, which hinders the development of powerful medical foundation models. In this work, we present the largest survey to date of medical image datasets, covering over 1,000 open-access datasets with a systematic catalog of their modalities, tasks, anatomies, annotations, limitations, and potential for integration. Our analysis exposes a landscape that is modest in scale, fragmented across narrowly scoped tasks, and unevenly distributed across organs and modalities, which in turn limits the utility of existing medical image datasets for developing versatile and robust medical foundation models. To turn fragmentation into scale, we propose a metadata-driven fusion paradigm (MDFP) that integrates public datasets with shared modalities or tasks, thereby transforming multiple small data silos into larger, more coherent resources. Building on MDFP, we release an interactive discovery portal that enables end-to-end, automated medical image dataset integration, and we compile all surveyed datasets into a unified, structured table that summarizes their key characteristics and provides reference links, offering the community an accessible and comprehensive repository. By charting the current terrain and offering a principled path to dataset consolidation, our survey provides a practical roadmap for scaling medical imaging corpora, supporting faster data discovery, more principled dataset creation, and more capable medical foundation models.
Authors (127)
Zhongying Deng
Cheng Tang
Ziyan Huang
Jiashi Lin
Ying Chen
Junzhi Ning
Chenglong Ma
Jiyao Liu
Wei Li
Yinghao Zhu
Shujian Gao
Yanyan Huang
Sibo Ju
Yanzhou Su
Pengcheng Chen
Wenhao Tang
Tianbin Li
Haoyu Wang
Yuanfeng Ji
Hui Sun
Shaobo Min
Liang Peng
Feilong Tang
Haochen Xue
Rulin Zhou
Chaoyang Zhang
Wenjie Li
Shaohao Rui
Weijie Ma
Xingyue Zhao
Yibin Wang
Kun Yuan
Zhaohui Lu
Shujun Wang
Jinjie Wei
Lihao Liu
Dingkang Yang
Lin Wang
Yulong Li
Haolin Yang
Yiqing Shen
Lequan Yu
Xiaowei Hu
Yun Gu
Yicheng Wu
Benyou Wang
Minghui Zhang
Angelica I. Aviles-Rivero
Qi Gao
Hongming Shan
Xiaoyu Ren
Fang Yan
Hongyu Zhou
Haodong Duan
Maosong Cao
Shanshan Wang
Bin Fu
Xiaomeng Li
Zhi Hou
Chunfeng Song
Lei Bai
Yuan Cheng
Yuandong Pu
Xiang Li
Wenhai Wang
Hao Chen
Jiaxin Zhuang
Songyang Zhang
Huiguang He
Mengzhang Li
Bohan Zhuang
Zhian Bai
Rongshan Yu
Liansheng Wang
Yukun Zhou
Xiaosong Wang
Xin Guo
Guanbin Li
Xiangru Lin
Dakai Jin
Mianxin Liu
Wenlong Zhang
Qi Qin
Conghui He
Yuqiang Li
Ye Luo
Nanqing Dong
Jie Xu
Wenqi Shao
Bo Zhang
Qiujuan Yan
Yihao Liu
Jun Ma
Zhi Lu
Yuewen Cao
Zongwei Zhou
Jianming Liang
Shixiang Tang
Qi Duan
Dongzhan Zhou
Chen Jiang
Yuyin Zhou
Yanwu Xu
Jiancheng Yang
Shaoting Zhang
Xiaohong Liu
Siqi Luo
Yi Xin
Chaoyu Liu
Haochen Wen
Xin Chen
Alejandro Lozano
Min Woo Sun
Yuhui Zhang
Yue Yao
Xiaoxiao Sun
Serena Yeung-Levy
Xia Li
Jing Ke
Chunhui Zhang
Zongyuan Ge
Ming Hu
Jin Ye
Zhifeng Li
Yirong Chen
Yu Qiao
Junjun He
Quick Access
- Year of publication: 2026
- Language: en
- Source database: arXiv
- Access: Open Access