Matthew K. Horton, Patrick Huck, Ruo Xi Yang et al.
Results for "data science"
Showing 20 of ~44,561,522 results · from CrossRef, arXiv, Semantic Scholar
H. Fröhlich, R. Balling, N. Beerenwinkel et al.
Background: Personalized, precision, P4, or stratified medicine is understood as a medical approach in which patients are stratified based on their disease subtype, risk, prognosis, or treatment response using specialized diagnostic tests. The key idea is to base medical decisions on individual patient characteristics, including molecular and behavioral biomarkers, rather than on population averages. Personalized medicine is deeply connected to and dependent on data science, specifically machine learning (often named Artificial Intelligence in the mainstream media). While during recent years there has been a lot of enthusiasm about the potential of ‘big data’ and machine learning-based solutions, there exist only few examples that impact current clinical practice. The lack of impact on clinical practice can largely be attributed to insufficient performance of predictive models, difficulties to interpret complex model predictions, and lack of validation via prospective clinical trials that demonstrate a clear benefit compared to the standard of care. In this paper, we review the potential of state-of-the-art data science approaches for personalized medicine, discuss open challenges, and highlight directions that may help to overcome them in the future. Conclusions: There is a need for an interdisciplinary effort, including data scientists, physicians, patient advocates, regulatory agencies, and health insurance organizations. Partially unrealistic expectations and concerns about data science-based solutions need to be better managed. In parallel, computational methods must advance more to provide direct benefit to clinical practice.
LSST Dark Energy Science Collaboration, Eric Aubourg, Camille Avestruz et al.
The Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) will produce unprecedented volumes of heterogeneous astronomical data (images, catalogs, and alerts) that challenge traditional analysis pipelines. The LSST Dark Energy Science Collaboration (DESC) aims to derive robust constraints on dark energy and dark matter from these data, requiring methods that are statistically powerful, scalable, and operationally reliable. Artificial intelligence and machine learning (AI/ML) are already embedded across DESC science workflows, from photometric redshifts and transient classification to weak lensing inference and cosmological simulations. Yet their utility for precision cosmology hinges on trustworthy uncertainty quantification, robustness to covariate shift and model misspecification, and reproducible integration within scientific pipelines. This white paper surveys the current landscape of AI/ML across DESC's primary cosmological probes and cross-cutting analyses, revealing that the same core methodologies and fundamental challenges recur across disparate science cases. Since progress on these cross-cutting challenges would benefit multiple probes simultaneously, we identify key methodological research priorities, including Bayesian inference at scale, physics-informed methods, validation frameworks, and active learning for discovery. With an eye on emerging techniques, we also explore the potential of the latest foundation model methodologies and LLM-driven agentic AI systems to reshape DESC workflows, provided their deployment is coupled with rigorous evaluation and governance. Finally, we discuss critical software, computing, data infrastructure, and human capital requirements for the successful deployment of these new methodologies, and consider associated risks and opportunities for broader coordination with external actors.
L. Nelson Sanchez-Pinto MD, Yuan Luo PhD, Matthew M. Churpek MD
The digitalization of the health-care system has resulted in a deluge of clinical big data and has prompted the rapid growth of data science in medicine. Data science, which is the field of study dedicated to the principled extraction of knowledge from complex data, is particularly relevant in the critical care setting. The availability of large amounts of data in the ICU, the need for better evidence-based care, and the complexity of critical illness make the use of data science techniques and data-driven research particularly appealing to intensivists. Despite the increasing number of studies and publications in the field, thus far there have been few examples of data science projects that have resulted in successful implementations of data-driven systems in the ICU. However, given the expected growth in the field, intensivists should be familiar with the opportunities and challenges of big data and data science. The present article reviews the definitions, types of algorithms, applications, challenges, and future of big data and data science in critical care.
Qazi Mamunur Rashid, Erin van Liemt, Tiffany Shih et al.
Current AI models often fail to account for local context and language, given the predominance of English and Western internet content in their training data. This hinders the global relevance, usefulness, and safety of these models as they gain more users around the globe. Amplify Initiative, a data platform and methodology, leverages expert communities to collect diverse, high-quality data to address the limitations of these models. The platform is designed to enable co-creation of datasets, provide access to high-quality multilingual datasets, and offer recognition to data authors. This paper presents the approach to co-creating datasets with domain experts (e.g., health workers, teachers) through a pilot conducted in Sub-Saharan Africa (Ghana, Kenya, Malawi, Nigeria, and Uganda). In partnership with local researchers situated in these countries, the pilot demonstrated an end-to-end approach to co-creating data with 155 experts in sensitive domains (e.g., physicians, bankers, anthropologists, human and civil rights advocates). This approach, implemented with an Android app, resulted in an annotated dataset of 8,091 adversarial queries in seven languages (e.g., Luganda, Swahili, Chichewa), capturing nuanced and contextual information related to key themes such as misinformation and public interest topics. This dataset in turn can be used to evaluate models for their safety and cultural relevance within the context of these languages.
Daniel Apai, Rory Barnes, Matthew M. Murphy et al.
The search for extraterrestrial life in the Solar System and beyond is a key science driver in astrobiology, planetary science, and astrophysics. A critical step is the identification and characterization of potential habitats, both to guide the search and to interpret its results. However, a well-accepted, self-consistent, flexible, and quantitative terminology and method of assessment of habitability are lacking. Our paper fills this gap based on a three-year-long study by the NExSS Quantitative Habitability Science Working Group. We reviewed past studies of habitability, but found that the lack of a universally valid definition of life prohibits a universally applicable definition of habitability. A more nuanced approach is needed. We introduce a quantitative habitability assessment framework (QHF) that enables self-consistent, probabilistic assessment of the compatibility of two models: First, a habitat model, which describes the probability distributions of key conditions in the habitat. Second, a viability model, which describes the probability that a metabolism is viable given a set of environmental conditions. We provide an open-source implementation of this framework and four examples as a proof of concept: (a) Comparison of two exoplanets for observational target prioritization; (b) Interpretation of atmospheric O2 detection in two exoplanets; (c) Subsurface habitability of Mars; and (d) Ocean habitability in Europa. These examples demonstrate that our framework can self-consistently inform astrobiology research over a broad range of questions. The proposed framework is modular so that future work can expand the range and complexity of models available, both for habitats and for metabolisms.
Kishankumar Bhimani, Khushbu Saradva
This research study explores the new dynamics of employee-organization relationships (EOR) [6] using advanced data science methodologies and presents findings through accessible visualizations. Leveraging a dataset procured from a comprehensive nationwide employee survey, this study employs an innovative strategy for theoretical researchers by using our state-of-the-art visualization. The results present insightful visualizations encapsulating demographic analysis, workforce satisfaction, work environment scrutiny, and the employee's view via word cloud interpretations and burnout predictions. The study underscores the profound implications of data science across various management sectors, enhancing understanding of workplace dynamics and promoting mutual growth and satisfaction. This multifaceted approach caters to a diverse array of readers, from researchers in sociology and management to firms seeking a detailed understanding of their workforce's satisfaction, emphasizing practicality and interpretability. The research encourages proactive measures to improve workplace environments, boost employee satisfaction, and foster healthier, more productive organizations. It serves as a resourceful tool for those committed to these objectives, manifesting the transformative potential of data science in driving insightful narratives about workplace dynamics and employee-organization relationships. In essence, this research unearths valuable insights to aid management, HR professionals, and companies
F. Archetti, Antonio Candelieri
F. Creutzig, S. Lohrey, X. Bai et al.
Non-technical summary: Manhattan, Berlin and New Delhi all need to take action to adapt to climate change and to reduce greenhouse gas emissions. While case studies on these cities provide valuable insights, comparability and scalability remain sidelined. It is therefore timely to review the state-of-the-art in data infrastructures, including earth observations, social media data, and how they could be better integrated to advance climate change science in cities and urban areas. We present three routes for expanding knowledge on global urban areas: mainstreaming data collections, amplifying the use of big data and taking further advantage of computational methods to analyse qualitative data to gain new insights. These data-based approaches have the potential to upscale urban climate solutions and effect change at the global scale. Technical summary: Cities have an increasingly integral role in addressing climate change. To gain a common understanding of solutions, we require adequate and representative data of urban areas, including data on related greenhouse gas emissions, climate threats and of socio-economic contexts. Here, we review the current state of urban data science in the context of climate change, investigating the contribution of urban metabolism studies, remote sensing, big data approaches, urban economics, urban climate and weather studies. We outline three routes for upscaling urban data science for global climate solutions: 1) Mainstreaming and harmonizing data collection in cities worldwide; 2) Exploiting big data and machine learning to scale solutions while maintaining privacy; 3) Applying computational techniques and data science methods to analyse published qualitative information for the systematization and understanding of first-order climate effects and solutions. Collaborative efforts towards a joint data platform and integrated urban services would provide the quantitative foundations of the emerging global urban sustainability science.
Ram Sagar
The Aryabhatta Research Institute of Observational Sciences (ARIES), a premier autonomous research institute under the Department of Science and Technology, Government of India, has a legacy of about seven decades of contributions to the observational sciences, namely atmospheric science and astrophysics. The Survey of India used a location at ARIES, determined with an accuracy of better than 10 meters on a world datum through the institute's participation in a global network imaging artificial Earth satellites during the late 1950s. Taking advantage of its high-altitude location, ARIES, for the first time, provided valuable input for climate change studies by long-term characterization of the physical and chemical properties of aerosols and trace gases in the central Himalayan regions. In astrophysical sciences, the institute has contributed precise and sometimes unique observations of celestial bodies, leading to a number of discoveries. With the installation of the 3.6-meter Devasthal optical telescope in the year 2015, India became the only Asian country to join those few nations of the world that host 4-meter-class optical telescopes. This telescope, having the advantage of its geographical location, is well-suited for multi-wavelength observations and for sub-arc-second resolution imaging of celestial objects, including follow-up of GMRT, AstroSat and gravitational-wave sources.
Yu Ding
Eve Kovacs, Yao-Yuan Mao, Michel Aguena et al.
Large simulation efforts are required to provide synthetic galaxy catalogs for ongoing and upcoming cosmology surveys. These extragalactic catalogs are being used for many diverse purposes covering a wide range of scientific topics. In order to be useful, they must offer realistically complex information about the galaxies they contain. Hence, it is critical to implement a rigorous validation procedure that ensures that the simulated galaxy properties faithfully capture observations and delivers an assessment of the level of realism attained by the catalog. We present here a suite of validation tests that have been developed by the Rubin Observatory Legacy Survey of Space and Time (LSST) Dark Energy Science Collaboration (DESC). We discuss how the inclusion of each test is driven by the scientific targets for static ground-based dark energy science and by the availability of suitable validation data. The validation criteria that are used to assess the performance of a catalog are flexible and depend on the science goals. We illustrate the utility of this suite by showing examples for the validation of cosmoDC2, the extragalactic catalog recently released for the LSST DESC second Data Challenge.
LSST Dark Energy Science Collaboration, Bela Abolfathi, Robert Armstrong et al.
In preparation for cosmological analyses of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), the LSST Dark Energy Science Collaboration (LSST DESC) has created a 300 deg$^2$ simulated survey as part of an effort called Data Challenge 2 (DC2). The DC2 simulated sky survey, in six optical bands with observations following a reference LSST observing cadence, was processed with the LSST Science Pipelines (19.0.0). In this Note, we describe the public data release of the resulting object catalogs for the coadded images of five years of simulated observations along with associated truth catalogs. We include a brief description of the major features of the available data sets. To enable convenient access to the data products, we have developed a web portal connected to Globus data services. We describe how to access the data and provide example Jupyter Notebooks in Python to aid first interactions with the data. We welcome feedback and questions about the data release via a GitHub repository.
A. Singleton, Daniel Arribas-Bel
It is widely acknowledged that the emergence of “Big Data” is having a profound and often controversial impact on the production of knowledge. In this context, Data Science has developed as an interdisciplinary approach that turns such “Big Data” into information. This article argues for the positive role that Geography can have on Data Science when being applied to spatially explicit problems; and inversely, makes the case that there is much that Geography and Geographical Analysis could learn from Data Science. We propose a deeper integration through an ambitious research agenda, including systems engineering, new methodological development, and work toward addressing some acute challenges around epistemology. We argue that such issues must be resolved in order to realize a Geographic Data Science, and that such a goal would be a desirable one.
Michael J. Muller, M. Feinberg, T. George et al.
With the rise of big data, there has been an increasing need to understand who is working in data science and how they are doing their work. HCI and CSCW researchers have begun to examine these questions. In this workshop, we invite researchers to share their observations, experiences, hypotheses, and insights, in the hopes of developing a taxonomy of work practices and open issues in the behavioral and social study of data science and data science workers.
Fábio C. P. Navarro, Hussein Mohsen, Chengfei Yan et al.
Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term, encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of well-known 3 V data and 4 M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural “exports” and “imports” between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data science applications, in general, and are especially relevant to genomics, due to the persistent nature of DNA.
Mahyuddin K. M. Nasution, Opim Salim Sitompul, Erna Budhiarti Nababan
Dirk P. Kroese, Z. Botev, T. Taimre et al.
The purpose of Data Science and Machine Learning: Mathematical and Statistical Methods is to provide an accessible, yet comprehensive textbook intended for students interested in gaining a better understanding of the mathematics and statistics that underpin the rich variety of ideas and machine learning algorithms in data science.