R. Bonney, Jennifer Shirk, T. Phillips et al.
Results for "data science"
Showing 20 of ~44761168 results · from DOAJ, CrossRef, Semantic Scholar
S. García, Julián Luengo, F. Herrera
P. Adams
T. Murdoch, A. Detsky
C. Kletzing, W. Kurth, Mario H. Acuña et al.
The Electric and Magnetic Field Instrument and Integrated Science (EMFISIS) investigation on the NASA Radiation Belt Storm Probes (now named the Van Allen Probes) mission provides key wave and very low frequency magnetic field measurements to understand radiation belt acceleration, loss, and transport. The key science objectives and the contribution that EMFISIS makes to providing measurements as well as theory and modeling are described. The key components of the instrument suite, both electronics and sensors, including key functional parameters, calibration, and performance, demonstrate that EMFISIS provides the needed measurements for the science of the RBSP mission. The EMFISIS operational modes and data products, along with online availability and data tools, provide the radiation belt science community with one of the most complete sets of data ever collected.
Peter Bühlmann, S. Geer
J. Danielson, D. Gesch
J. Dickinson, B. Zuckerberg, David N. Bonter
Geoffrey E. Hinton
A. S. McEwen, N. Thomas, HiRISE Team
Charles C. Ragin
S. Muthukrishnan
Julien Troudet, P. Grandcolas, A. Blin et al.
Studying and protecting each and every living species on Earth is a major challenge of the 21st century. Yet, most species remain unknown or unstudied, while others attract most of the public, scientific and government attention. Although known to be detrimental, this taxonomic bias continues to be pervasive in the scientific literature, but is still poorly studied and understood. Here, we used 626 million occurrences from the Global Biodiversity Information Facility (GBIF), the biggest biodiversity data portal, to characterize the taxonomic bias in biodiversity data. We also investigated how societal preferences and taxonomic research relate to biodiversity data gathering. For each species belonging to 24 taxonomic classes, we used the number of publications from Web of Science and the number of web pages from Bing searches to approximate research activity and societal preferences. Our results show that societal preferences, rather than research activity, strongly correlate with taxonomic bias, which leads us to assert that scientists should advertise less charismatic species and develop societal initiatives (e.g. citizen science) that specifically target neglected organisms. Ensuring that biodiversity is representatively sampled while this is still possible is an urgent prerequisite for achieving efficient conservation plans and a global understanding of our surrounding environment.
H. Yamada, Chang Liu, Stephen Wu et al.
There is a growing demand for the use of machine learning (ML) to derive fast-to-evaluate surrogate models of materials properties. In recent years, a broad array of materials property databases have emerged as part of a digital transformation of materials science. However, recent technological advances in ML are not fully exploited because of the insufficient volume and diversity of materials data. An ML framework called “transfer learning” has considerable potential to overcome the problem of limited amounts of materials data. Transfer learning relies on the concept that various property types, such as physical, chemical, electronic, thermodynamic, and mechanical properties, are physically interrelated. For a given target property to be predicted from a limited supply of training data, models of related proxy properties are pretrained using sufficient data; these models capture common features relevant to the target task. Repurposing such machine-acquired features on the target task yields outstanding prediction performance even with exceedingly small data sets, much as highly experienced human experts can make rational inferences even on tasks with which they have far less experience. In this study, to facilitate widespread use of transfer learning, we develop a pretrained model library called XenonPy.MDL. In this first release, the library comprises more than 140 000 pretrained models for various properties of small molecules, polymers, and inorganic crystalline materials. Along with these pretrained models, we describe some outstanding successes of transfer learning in different scenarios such as building models with only dozens of materials data, increasing the ability of extrapolative prediction through a strategic model transfer, and so on.
Remarkably, transfer learning has autonomously identified rather nontrivial transferability across different properties transcending the different disciplines of materials science; for example, our analysis has revealed underlying bridges between small molecules and polymers and between organic and inorganic chemistry.
J. Wallis, E. Rolando, C. Borgman
Research on practices to share and reuse data will inform the design of infrastructure to support data collection, management, and discovery in the long tail of science and technology. These are research domains in which data tend to be local in character, minimally structured, and minimally documented. We report on a ten-year study of the Center for Embedded Network Sensing (CENS), a National Science Foundation Science and Technology Center. We found that CENS researchers are willing to share their data, but few are asked to do so, and in only a few domain areas do their funders or journals require them to deposit data. Few repositories exist to accept data in CENS research areas. Data sharing tends to occur only through interpersonal exchanges. CENS researchers obtain data from repositories, and occasionally from registries and individuals, to provide context, calibration, or other forms of background for their studies. Neither CENS researchers nor those who request access to CENS data appear to use external data for primary research questions or for replication of studies. CENS researchers are willing to share data if they receive credit and retain first rights to publish their results. Practices of releasing, sharing, and reusing data in CENS reaffirm the gift culture of scholarship, in which goods are bartered between trusted colleagues rather than treated as commodities.
Ray M. Chang, R. Kauffman, YoungOk Kwon
The era of big data has created new opportunities for researchers to achieve high relevance and impact amid changes and transformations in how we study social science phenomena. With the emergence of new data collection technologies, advanced data mining and analytics support, there seem to be fundamental changes occurring in the research questions we can ask and the research methods we can apply. The contexts include social networks and blogs, political discourse, corporate announcements, digital journalism, mobile telephony, home entertainment, online gaming, financial services, online shopping, social advertising, and social commerce. The changing costs of data collection and the new capabilities that researchers have to conduct research that leverages micro-level, meso-level and macro-level data suggest the possibility of a scientific paradigm shift toward computational social science. The new thinking related to empirical regularities analysis, experimental design, and longitudinal empirical research further suggests that these approaches can be tailored for rapid acquisition of big data sets. This will allow business analysts and researchers to achieve frequent, controlled and meaningful observations of real-world phenomena. We discuss how our philosophy of science should be changing in step with the times, and illustrate our perspective with comparisons between earlier and current research inquiry. We argue against the assertion that theory no longer matters and offer some new research directions.
Huibo Yang, Mengxuan Hu, Amoreena Most et al.
Background: Large language models (LLMs) have demonstrated impressive performance on medical licensing and diagnosis-related exams. However, comparative evaluations to optimize LLM performance and ability in the domain of comprehensive medication management (CMM) are lacking. The purpose of this evaluation was to test various LLM performance optimization strategies and performance on critical care pharmacotherapy questions used in the assessment of Doctor of Pharmacy students. Methods: In a comparative analysis using 219 multiple-choice pharmacotherapy questions, five LLMs (GPT-3.5, GPT-4, Claude 2, Llama2-7b and 2-13b) were evaluated. Each LLM was queried five times to evaluate the primary outcome of accuracy (i.e., correctness). Secondary outcomes included variance, the impact of prompt engineering techniques (e.g., chain-of-thought, CoT) and training of a customized GPT on performance, and comparison to third-year Doctor of Pharmacy students on knowledge recall vs. knowledge application questions. Accuracy and variance were compared with Student's t-test to compare performance under different model settings. Results: ChatGPT-4 exhibited the highest accuracy (71.6%), while Llama2-13b had the lowest variance (0.070). All LLMs performed more accurately on knowledge recall vs. knowledge application questions (e.g., ChatGPT-4: 87% vs. 67%). When applied to ChatGPT-4, few-shot CoT across five runs improved accuracy (77.4% vs. 71.5%) with no effect on variance. Self-consistency and the custom-trained GPT demonstrated similar accuracy to ChatGPT-4 with few-shot CoT. Overall pharmacy student accuracy was 81%, compared to an optimal overall LLM accuracy of 73%. Comparing question types, six of the LLMs demonstrated equivalent or higher accuracy than pharmacy students on knowledge recall questions (e.g., self-consistency vs. students: 93% vs. 84%), but pharmacy students achieved higher accuracy than all LLMs on knowledge application questions (e.g., self-consistency vs. students: 68% vs. 80%). Conclusion: ChatGPT-4 was the most accurate LLM on critical care pharmacy questions and few-shot CoT improved accuracy the most. Average student accuracy was similar to LLMs overall, and higher on knowledge application questions. These findings support the need for future assessment of customized training for the type of output needed. Reliance on LLMs is only supported with recall-based questions.
Kenza Tazi, Andrew Orr, J. Scott Hosking et al.
Water resources from the Indus Basin sustain over 270 million people. However, water security in this region is threatened by climate change. This is especially the case for the upper Indus Basin, where most frozen water reserves are expected to decrease significantly by the end of the century, leaving rainfall as the main driver of river flow. However, future precipitation estimates from global climate models differ greatly for this region. To address this uncertainty, this paper explores the feasibility of using probabilistic machine learning to map large-scale circulation fields, better represented by global climate models, to local precipitation over the upper Indus Basin. More specifically, Gaussian processes are trained to predict monthly ERA5 precipitation data over a 15-year horizon. This paper also explores different Gaussian process model designs, including a non-stationary covariance function to learn complex spatial relationships in the data. Going forward, this approach could be used to make more accurate predictions from global climate model outputs and better assess the probability of future precipitation extremes.
Carme Carrion, Camilla Alay Llamas, Eka Dian Safitri et al.
Background: Planetary Health studies the impact of the global environmental crisis on health. Urgent transdisciplinary, intersectoral, and holistic solutions adapted to local realities are needed. Designing training programs attuned to contextual needs of diverse groups and geographical areas is crucial. Planetary health programs are emerging worldwide, but little is known about their scope and learning outcomes. A systematic scoping review is needed to shed light on the state of planetary health education. Objectives: This review aims to identify existing frameworks, competencies, content, and teaching methods in planetary health education. Methods: Following PRISMA Extension for Scoping Reviews (PRISMA-ScR) guidelines, we included studies targeting undergraduate and postgraduate students, focusing on skills, knowledge, and abilities related to planetary health, published in English or Spanish. No exclusions were made based on geographic area, study design, or publication period. Databases consulted were MEDLINE via PubMed, Scopus, Web of Science, and ProQuest. Selection and data extraction processes were conducted systematically. Results: We included 73 articles, with 88% from high-income countries and 49% focused on health professionals. Conceptual frameworks identified include "One Health," "Sustainable Development Goals," and the "Planetary Health Education Framework." Transversal skills (complex problem-solving, systemic thinking, collaboration, interdisciplinarity) and specific competencies (understanding health interactions with climate change, pollution) were outlined in 45% of studies. Half of the studies described 23 general topics and 93 specific content areas. Teaching methods included in-person (59%), virtual (12%), and hybrid models (29%). Conclusions: This review highlights the heterogeneity in conceptual frameworks, competencies, content, and teaching methods in planetary health education for health professionals.
Future research should focus on developing and evaluating evidence-based educational models to address the evolving challenges of planetary health. Recommendations include enhancing collaboration among stakeholders and integrating innovative teaching methods to improve planetary health education. Trial registration: The protocol has been registered in the Open Science Framework database (registration number: osf.io/h2b3j, March 2024). Clinical trial number: not applicable.