The Extent and Consequences of P-Hacking in Science
M. Head, L. Holman, R. Lanfear
et al.
A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.
1100 citations
en
Medicine, Biology
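One way to test for p-hacking using reported p-values is to compare how many fall in two narrow bins just below 0.05. If a real effect exists, p-values should become rarer as they approach 0.05, so a surplus in the bin nearest 0.05 is a warning sign. The sketch below is a minimal illustration, not the authors' code. It assumes the p-values have already been extracted into a list and that the bins are 0.04 < p < 0.045 and 0.045 < p < 0.05. The helper names `binomial_upper_tail` and `p_hacking_bin_test` are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_hacking_bin_test(p_values, split=0.045, low=0.04, high=0.05):
    """Hypothetical helper: count p-values in (low, split) and (split, high),
    then run a one-sided binomial test for an excess in the upper bin.

    Returns (n_lower, n_upper, one_sided_p).
    """
    n_lower = sum(1 for p in p_values if low < p < split)
    n_upper = sum(1 for p in p_values if split < p < high)
    n = n_lower + n_upper
    if n == 0:
        # No p-values fall in either bin, so there is nothing to test.
        return n_lower, n_upper, 1.0
    # Null hypothesis: a p-value in the window is equally likely to land in either bin.
    return n_lower, n_upper, binomial_upper_tail(n_upper, n)


# Usage with made-up p-values (illustrative only, not data from the study).
reported = [0.041, 0.042, 0.046, 0.047, 0.048, 0.049, 0.049, 0.012, 0.30]
print(p_hacking_bin_test(reported))  # (2, 5, 0.2265625)
```

Testing against an even split is conservative. Under a true effect the lower bin should hold more than half of the p-values, so a significant excess in the upper bin is harder to explain without p-hacking.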
Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data*
Andrew Bell, K. Jones
This article challenges Fixed Effects (FE) modeling as the ‘default’ for time-series-cross-sectional and panel data. Understanding different within and between effects is crucial when choosing modeling strategies. The downside of Random Effects (RE) modeling—correlated lower-level covariates and higher-level residuals—is omitted-variable bias, solvable with Mundlak's (1978a) formulation. Consequently, RE can provide everything that FE promises and more, as confirmed by Monte-Carlo simulations, which additionally show problems with Plümper and Troeger's FE Vector Decomposition method when data are unbalanced. As well as incorporating time-invariant variables, RE models are readily extendable, with random coefficients, cross-level interactions and complex variance functions. We argue not simply for technical solutions to endogeneity, but for the substantive importance of context/heterogeneity, modeled using RE. The implications extend beyond political science to all multilevel datasets. However, omitted variables could still bias estimated higher-level variable effects; as with any model, care is required in interpretation.
1434 citations
en
Mathematics
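A within-between random effects specification that builds on Mundlak's formulation can be written as below. This is a sketch in our own notation ($y_{it}$, $x_{it}$, $\bar{x}_i$, $z_i$, $u_i$, $e_{it}$ for unit $i$ at occasion $t$), not necessarily the article's.

```latex
y_{it} = \beta_0 + \beta_{1W}\,(x_{it} - \bar{x}_i) + \beta_{2B}\,\bar{x}_i + \beta_3 z_i + u_i + e_{it},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad e_{it} \sim \mathcal{N}(0, \sigma_e^2)
```

Here $\bar{x}_i$ is the mean of $x_{it}$ over unit $i$'s occasions. $\beta_{1W}$ is the within effect, the quantity FE estimates. $\beta_{2B}$ is the between effect, and $z_i$ is a time-invariant covariate that FE cannot estimate. Because $x_{it} - \bar{x}_i$ sums to zero within each unit, it is uncorrelated with $u_i$, so a correlation between $x_{it}$ and $u_i$ no longer biases $\beta_{1W}$. In Mundlak's original form, $x_{it}$ and $\bar{x}_i$ are entered directly. The coefficient on $\bar{x}_i$ is then the difference between the between and within effects.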
Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references
L. Bornmann, Rüdiger Mutz
Many studies (in information science) have looked at the growth of science. In this study, we reexamine the question of the growth of science. To do this we (a) use current data up to publication year 2012 and (b) analyze the data across all disciplines and also separately for the natural sciences and for the medical and health sciences. Furthermore, the data were analyzed with an advanced statistical technique—segmented regression analysis—which can identify specific segments with similar growth rates in the history of science. The study is based on two different sets of bibliometric data: (a) the number of publications held as source items in the Web of Science (WoS, Thomson Reuters) per publication year and (b) the number of cited references in the publications of the source items per cited reference year. We looked at the rate at which science has grown since the mid‐1600s. In our analysis of cited references we identified three essential growth phases in the development of science, which each led to growth rates tripling in comparison with the previous phase: from less than 1% up to the middle of the 18th century, to 2 to 3% up to the period between the two world wars, and 8 to 9% to 2010.
1391 citations
en
Geography, Computer Science
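A constant annual growth rate $r$ implies a doubling time of $\ln 2 / \ln(1 + r)$ years, which helps put the reported rates in perspective. The sketch below assumes smooth exponential growth within each phase. That is a simplification of the segmented regression fits. `doubling_time` is a hypothetical helper.

```python
import math


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate (0.08 means 8%)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # log1p(rate) computes ln(1 + rate) accurately for small rates.
    return math.log(2) / math.log1p(rate)


# Representative rates for the three phases reported in the abstract.
for r in (0.01, 0.02, 0.03, 0.08, 0.09):
    print(f"{r:.0%} per year -> doubles in about {doubling_time(r):.1f} years")
```

The first phase grew at less than 1% a year, so its doubling time exceeded roughly 70 years. At 8 to 9% a year, the volume doubles roughly every 8 to 9 years.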
Data Mining and Analysis: Fundamental Concepts and Algorithms
Mohammed J. Zaki
995 citations
en
Computer Science
JENDL-4.0: A New Library for Nuclear Science and Engineering
K. Shibata, O. Iwamoto, T. Nakagawa
et al.
Pegasus, a workflow management system for science automation
E. Deelman, K. Vahi, G. Juve
et al.
865 citations
en
Computer Science
A new dawn for citizen science.
J. Silvertown
Color Science, Concepts and Methods. Quantitative Data and Formulas
W. D. Wright
1203 citations
en
Engineering
Estimating dynamic panel data models: a guide for
Ruth A. Judson, Ann L. Owen
2202 citations
en
Computer Science
Linear Mixed Models for Longitudinal Data
G. Verbeke, G. Molenberghs
The KDD process for extracting useful knowledge from volumes of data
U. Fayyad, G. Piatetsky-Shapiro, Padhraic Smyth
2204 citations
en
Computer Science
Functional Data Analysis
J. Ramsay, Bernard Walter Silverman
2183 citations
en
Mathematics, Computer Science
Global products of vegetation leaf area and fraction absorbed PAR from year one of MODIS data
R. Myneni, S. Hoffman, Y. Knyazikhin
et al.
The Analysis of Social Science Data with Missing Values
R. Little, D. Rubin
1113 citations
en
Computer Science
Citation indexes for science; a new dimension in documentation through association of ideas.
E. Garfield
The View from Above: Applications of Satellite Data in Economics
D. Donaldson, Adam Storeygard
Mapping citizen science contributions to the UN sustainable development goals
D. Fraisl, Jillian Campbell, L. See
et al.
The UN Sustainable Development Goals (SDGs) are a vision for achieving a sustainable future. Reliable, timely, comprehensive, and consistent data are critical for measuring progress towards, and ultimately achieving, the SDGs. Data from citizen science represent one new source of data that could be used for SDG reporting and monitoring. However, information is still lacking regarding the current and potential contributions of citizen science to the SDG indicator framework. Through a systematic review of the metadata and work plans of the 244 SDG indicators, as well as the identification of past and ongoing citizen science initiatives that could directly or indirectly provide data for these indicators, this paper presents an overview of where citizen science is already contributing and could contribute data to the SDG indicator framework. The results demonstrate that citizen science is “already contributing” to the monitoring of 5 SDG indicators and “could contribute” to 76 more; together, these 81 indicators make up around 33% of the 244. Our analysis also shows that the greatest inputs from citizen science to the SDG framework relate to SDG 15 Life on Land, SDG 11 Sustainable Cities and Communities, SDG 3 Good Health and Wellbeing, and SDG 6 Clean Water and Sanitation. Realizing the full potential of citizen science requires demonstrating its value in the global data ecosystem, building partnerships around citizen science data to accelerate SDG progress, and leveraging investments to enhance its use and impact.
The Ames Stereo Pipeline: NASA's Open Source Software for Deriving and Processing Terrain Data
R. Beyer, O. Alexandrov, S. McMichael
The NASA Ames Stereo Pipeline is a suite of free and open source automated geodesy and stereogrammetry tools designed for processing stereo images captured from satellites (around Earth and other planets), robotic rovers, aerial cameras, and historical images, with and without accurate camera pose information. It produces cartographic products, including digital terrain models, ortho‐projected images, 3‐D models, and bundle‐adjusted networks of cameras. Ames Stereo Pipeline's data products are suitable for science analysis, mission planning, and public outreach.
373 citations
en
Computer Science
RESTplus: an improved toolkit for resting-state functional magnetic resonance imaging data processing.
Xize Jia, Jue Wang, Hai Sun
et al.
Center for Cognition and Brain Disorders, Institutes of Psychological Sciences, Hangzhou Normal University, Hangzhou 311121, China; Zhejiang Key Laboratory for Research in Assessment of Cognitive Impairments, Hangzhou 311121, China; Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA; School of Life Science and Technology, Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China; CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100101, China; Preclinical Pharmacology Section, National Institute on Drug Abuse, Baltimore 21224, USA
Jupyter: Thinking and Storytelling With Code and Data
B. Granger, Fernando Pérez
Project Jupyter is an open-source project for interactive computing widely used in data science, machine learning, and scientific computing. We argue that even though Jupyter helps users perform complex, technical work, Jupyter itself solves problems that are fundamentally human in nature. Namely, Jupyter helps humans to think and tell stories with code and data. We illustrate this by describing three dimensions of Jupyter: 1) interactive computing; 2) computational narratives; and 3) the idea that Jupyter is more than software. We illustrate the impact of these dimensions on a community of practice in earth and climate science.
167 citations
en
Computer Science