Results for "data science"

Showing 20 of ~44,761,805 results · from DOAJ, CrossRef, Semantic Scholar

S2 Open Access 2016
The Deluge of Spurious Correlations in Big Data

Cristian S. Calude, G. Longo

Very large databases are a major opportunity for science, and data analytics is a remarkable new field of investigation in computer science. The effectiveness of these tools is used to support a “philosophy” against the scientific method as developed throughout history. According to this view, computer-discovered correlations should replace understanding and guide prediction and action. Consequently, there will be no need to give scientific meaning to phenomena by proposing, say, causal relations, since regularities in very large databases are enough: “with enough data, the numbers speak for themselves”. The “end of science” is proclaimed. Using classical results from ergodic theory, Ramsey theory and algorithmic information theory, we show that this “philosophy” is wrong. For example, we prove that very large databases have to contain arbitrary correlations. These correlations appear only because of the size, not the nature, of the data. They can be found in “randomly” generated, large enough databases, which, as we will prove, implies that most correlations are spurious. Too much information tends to behave like very little information. The scientific method can be enriched by computer mining in immense databases, but not replaced by it.
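The abstract's central claim, that large enough data sets contain strong correlations purely because of their size, can be illustrated empirically. This is not the authors' proof, which relies on Ramsey theory and related results; it is a small simulation under stated assumptions. The sketch fills a table with independent Gaussian noise, so every correlation in it is spurious by construction, and reports the strongest pairwise Pearson correlation as more variables are added. The function names, the 20-row sample length, and the seeds are illustrative choices, not taken from the paper.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def random_table(num_vars, num_rows, seed=0):
    """Columns of independent standard-normal noise: no real relation exists."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_rows)] for _ in range(num_vars)]


def strongest_spurious_correlation(columns):
    """Largest absolute Pearson r over all pairs of columns."""
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    table = random_table(num_vars=200, num_rows=20, seed=42)
    for k in (10, 50, 200):
        # The first k columns, so each set contains the previous one.
        r = strongest_spurious_correlation(table[:k])
        print(f"{k:4d} random variables -> max |r| = {r:.2f}")
```

Because each larger set of columns contains the smaller one, the reported maximum can only grow. With a fixed number of rows, it typically climbs well above 0.5 even though no variable is related to any other.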

337 citations · en · Computer Science
DOAJ Open Access 2026
Dual-Region Encryption Model Based on a 3D-MNFC Chaotic System and Logistic Map

Jingyan Li, Yan Niu, Dan Yu et al.

Facial information carries sensitive personal data, so it is crucial to secure it through encryption. Traditional encryption for portrait images typically processes the entire image, even though most regions contain no sensitive facial information. This approach is notably inefficient and imposes unnecessary computational burdens. To address this inefficiency while maintaining security, we propose a novel dual-region encryption model for portrait images. First, a Multi-task Cascaded Convolutional Network (MTCNN) was adopted to efficiently segment facial images into two regions: facial and non-facial. Next, given the high sensitivity of facial regions, a robust encryption scheme was designed by integrating a CNN-based key generator, the proposed three-dimensional Multi-module Nonlinear Feedback-coupled Chaotic System (3D-MNFC), DNA encoding, and bit reversal. The 3D-MNFC, which incorporates time-varying parameters, nonlinear terms, state feedback terms and coupling mechanisms, has been proven to exhibit excellent chaotic performance. For non-facial regions, the Logistic map combined with XOR operations is used to balance efficiency and basic security. Finally, the encrypted image is obtained by restoring the two ciphertext images to their original positions. Comprehensive security analyses confirm the strong performance of the regional model: a large key space (2^536), near-ideal information entropy (7.9995), and NPCR and UACI values of 99.6055% and 33.4599%, respectively. Notably, the model has been verified to improve efficiency by at least 37.82%.
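For the non-facial region, the abstract pairs the Logistic map with XOR. A minimal sketch of that idea follows, assuming a control parameter r = 3.99, a 1000-step burn-in, and a byte quantisation of int(x * 1e14) % 256; none of these details come from the paper, and the function names are hypothetical. It also computes NPCR and UACI, the two differential metrics quoted above, using their usual definitions for 8-bit images. Note that an XOR keystream alone provides no diffusion (changing one plaintext pixel changes exactly one ciphertext pixel), which fits the abstract reserving it for "basic security".

```python
def logistic_keystream(x0, r, length, burn_in=1000):
    """Byte keystream from the logistic map x_{n+1} = r * x_n * (1 - x_n)."""
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie in (0, 1)")
    x = x0
    for _ in range(burn_in):  # discard transient iterations
        x = r * x * (1.0 - x)
    out = bytearray()
    for _ in range(length):
        x = r * x * (1.0 - x)
        out.append(int(x * 1e14) % 256)  # assumed quantisation of the state to one byte
    return bytes(out)


def xor_cipher(pixels, x0, r=3.99):
    """Encrypt or decrypt 8-bit pixels; XOR with the same keystream is its own inverse."""
    ks = logistic_keystream(x0, r, len(pixels))
    return bytes(p ^ k for p, k in zip(pixels, ks))


def npcr(c1, c2):
    """Number of Pixels Change Rate between two equal-size images, in percent."""
    return 100.0 * sum(a != b for a, b in zip(c1, c2)) / len(c1)


def uaci(c1, c2):
    """Unified Average Changing Intensity for 8-bit images, in percent."""
    return 100.0 * sum(abs(a - b) for a, b in zip(c1, c2)) / (255 * len(c1))


if __name__ == "__main__":
    plain = bytes(range(256)) * 4  # stand-in for a flattened non-facial region
    key = 0.3456                   # hypothetical secret initial value x0
    cipher = xor_cipher(plain, key)
    assert xor_cipher(cipher, key) == plain
    # Key sensitivity: a tiny change in x0 yields an unrelated keystream.
    other = xor_cipher(plain, key + 1e-10)
    print(f"NPCR {npcr(cipher, other):.2f}%  UACI {uaci(cipher, other):.2f}%")
```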

Science, Astrophysics
CrossRef Open Access 2026
From Open Data to Open Science?: A Semantic Diagnosis of Public Science and Technology Data

Junyoung Jeong

Open Science has emerged as a central paradigm in contemporary science and technology (S&T) policy, with Open Data widely regarded as one of its core components. Despite this prominence, limited empirical attention has been paid to whether Open Data occupies a structurally meaningful position within the semantic architecture of Open Science discourse. This study conducts a computational semantic analysis of public S&T data-related documents to diagnose the conceptual relationship between Open Data and Open Science. Using BERTopic-based modeling and hierarchical clustering, we examine how Open Data is positioned within the broader Open Science discourse, focusing on its centrality, proximity to key Open Science concepts, and alignment with FAIR principles. The results reveal that while Open Data is frequently referenced, it exhibits a distinct core-periphery structure: administrative and management-oriented metadata occupy a central semantic position, whereas scientifically rich raw data tend to remain on the periphery. The structural analysis further indicates that the semantic integration of Open Data remains uneven across domains, suggesting a partial decoupling between policy expectations and conceptual implementation. By providing a semantic diagnosis of Open Data within Open Science discourse, this study contributes to scientometric research by offering a structural perspective on how foundational concepts of Open Science are articulated and operationalized in practice. The findings highlight the need to move beyond declarative commitments toward a more conceptually integrated understanding of Open Data in the evolution of Open Science.

S2 Open Access 2016
The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences

Nirav C. Merchant, Eric Lyons, S. Goff et al.

The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.

303 citations · en · Biology, Medicine
S2 Open Access 2018
A Framework for Articulating and Measuring Individual Learning Outcomes from Participation in Citizen Science

T. Phillips, Norman Porticella, M. Constas et al.

Since first being introduced in the mid-1990s, the term “citizen science” (the intentional engagement of the public in scientific research) has seen phenomenal growth as measured by the number of projects developed, people involved, and articles published. In addition to contributing to scientific knowledge, many citizen science projects attempt to achieve learning outcomes among their participants; however, little guidance is available to practitioners on the types of learning that citizen science can support or on how to measure learning outcomes. This study provides empirical data on how intended learning outcomes, first described by the informal science education field, have been employed and measured within the citizen science field. We also present a framework for describing learning outcomes that should help citizen science practitioners, researchers, and evaluators design projects and study and evaluate their impacts. This is a first step in building evaluation capacity across the field of citizen science.

234 citations · en · Sociology
DOAJ Open Access 2025
A Baseline Assessment of Residential Wood Burning and Urban Air Quality in Climate-Vulnerable Chilean Cities

Ricardo Baettig, Ben Ingram

This study presents a comprehensive latitudinal analysis of airborne particulate matter (PM) across a 1400 km pollution corridor spanning Chile’s central-southern zone. We systematically analyzed PM2.5 and PM10 concentrations across eight major urban centers (2014–2015), providing crucial pre-Paris Agreement baseline data for South America’s most extensive air quality monitoring network. Our analysis reveals significant pollution gradients, with Coyhaique ranking as one of the world’s most severely polluted cities (95th percentile globally, WHO database) and showing an extreme 86% fine particulate matter ratio that far exceeds international urban standards. Residential wood combustion (RWC) shows systematic correlations with fine PM concentrations (R² > 0.96), suggesting that RWC is the dominant pollution driver across multiple climate zones. The documented pollution patterns are of concern at a continental scale, with 4900–6500 annual premature deaths directly attributable to PM2.5 exposure, one of the highest per-capita pollution mortality rates in South America. This work provides a methodological framework applicable to mountain-valley pollution systems globally while addressing critical knowledge gaps in regional air quality science. The evidence indicates the need for urgent implementation of comprehensive wood combustion control strategies and positions this research as essential baseline documentation for both national air quality policy and international climate change assessment frameworks.
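The two headline quantities in this abstract can be made concrete. The sketch below assumes that the "fine particulate matter ratio" means PM2.5 as a share of PM10 (a common reading, but not spelled out in the abstract) and computes R² for an ordinary least-squares line between some wood-burning indicator and PM2.5. The study's actual predictor and regression setup are not described here, so the function names and any inputs are hypothetical placeholders, not data from the paper.

```python
def fine_fraction(pm25, pm10):
    """PM2.5 as a percentage of PM10 (assumed meaning of the 'fine ratio')."""
    if pm10 <= 0:
        raise ValueError("pm10 must be positive")
    return 100.0 * pm25 / pm10


def r_squared(x, y):
    """Coefficient of determination of the OLS line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder numbers, for illustration only.
    print(fine_fraction(43.0, 50.0))             # 86.0
    print(r_squared([1, 2, 3], [1, 3, 2]))       # 0.25
```

An R² above 0.96 means the fitted line leaves less than 4% of the variance in PM2.5 unexplained; like any correlation, it supports but does not by itself establish that wood burning is the cause.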

Geography. Anthropology. Recreation, Social Sciences
DOAJ Open Access 2025
Novel forecasting of white maize futures volatility: a hybrid GARCH-based bi-directional LSTM model

Chun-Sung Huang, Ayesha Sayed

Price volatility in grain markets, especially for maize, has substantial socio-economic impacts, particularly in low-income regions where food security remains a critical concern. Accurate forecasting of grain price volatility is therefore crucial for safeguarding the financial interests of commodity traders and for shielding consumers from the detrimental effects of food price inflation. This study proposes a hybrid Bi-directional Long Short-Term Memory (BLSTM) model, integrated with generalised autoregressive conditional heteroscedasticity (GARCH)-type methods, to forecast white maize futures volatility in South Africa. Compared against several benchmarks, including standard LSTM and BLSTM models, the hybrid BLSTM model shows notable improvements in prediction accuracy, as measured by heteroscedasticity-adjusted performance metrics. The key contribution of this research is its enhancement of volatility forecasting by combining advanced machine learning with traditional econometric approaches, bridging a gap in predictive accuracy for commodity price dynamics. Additionally, this study supports the United Nations Sustainable Development Goals (SDGs), particularly Zero Hunger and Responsible Consumption and Production, by improving food price stability and risk management in agriculture. This approach exemplifies the evolving role of data science in financial analysis, offering market participants an effective tool to manage price risk and improve food security.

Impact Statement: This study introduces a novel hybrid forecasting model that integrates GARCH-type econometric techniques with Bi-directional Long Short-Term Memory (BLSTM) neural networks to predict the realised volatility of white maize futures. As white maize is a staple food, accurate volatility forecasting directly contributes to improved food security and price stability. The model significantly outperforms traditional approaches and standard deep learning models across multiple forecast horizons, offering a powerful risk management tool for farmers, traders, and policymakers. By enhancing the accuracy of agricultural price forecasts, this research supports the United Nations Sustainable Development Goals (SDGs), particularly Zero Hunger (SDG 2) and Responsible Consumption and Production (SDG 12), while also demonstrating the value of advanced data science methods in addressing real-world socio-economic challenges.
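The abstract does not give the exact GARCH-type specification or how its output is combined with the BLSTM. As a hedged illustration, the sketch below implements the textbook GARCH(1,1) recursion, sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1], with parameters supplied by hand rather than estimated, plus the standard multi-step forecast that decays toward the unconditional variance omega / (1 - alpha - beta). In hybrid designs of this kind, such conditional-variance series are often passed to the neural network as extra input features; that arrangement is an assumption here, not a description of the authors' model.

```python
def garch11_filter(returns, omega, alpha, beta):
    """Run the GARCH(1,1) recursion over a series of demeaned returns.

    Returns (sigma2, next_var): sigma2[t] is the conditional variance for
    period t given data up to t-1; next_var is the one-step-ahead forecast.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    var = omega / (1.0 - alpha - beta)  # initialise at the unconditional variance
    sigma2 = []
    for eps in returns:  # eps: the period's shock (demeaned return)
        sigma2.append(var)
        var = omega + alpha * eps * eps + beta * var
    return sigma2, var


def garch11_forecast(next_var, omega, alpha, beta, horizon):
    """Variance forecasts for h = 1..horizon steps ahead."""
    long_run = omega / (1.0 - alpha - beta)
    persistence = alpha + beta
    return [long_run + persistence ** (h - 1) * (next_var - long_run)
            for h in range(1, horizon + 1)]


if __name__ == "__main__":
    # Hypothetical daily shocks and hand-picked parameters, for illustration only.
    shocks = [0.012, -0.020, 0.005, 0.031, -0.008]
    sigma2, nxt = garch11_filter(shocks, omega=1e-6, alpha=0.08, beta=0.90)
    print([round(v ** 0.5, 4) for v in sigma2])        # conditional volatilities
    print(garch11_forecast(nxt, 1e-6, 0.08, 0.90, 5))  # decays toward long-run variance
```

Forecasting at several horizons from one fitted recursion is what makes a GARCH component cheap to evaluate against a neural model "across multiple forecast horizons", as the abstract puts it.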

Finance, Economic theory. Demography
S2 Open Access 2018
BIM for heritage science: a review

D. P. Pocobelli, Jan Boehm, P. Bryan et al.

Building Information Modelling (BIM) is a new process that is spreading through the Architecture, Engineering and Construction field. It allows the creation of virtual building models, which can be linked to numerical data, texts, images, and other types of information. Building components, such as walls and floors, are modelled as “smart objects”, i.e. they are defined by numerical parameters, such as dimensions, and are embedded with other kinds of information, such as building materials and properties. Stored data are accessible to, and modifiable by, all the professionals involved in the same project. The BIM process was developed for new buildings, and it makes it possible to plan and manage the whole building life-cycle. Research on BIM for built heritage began only recently, and its use is still not widespread. Indeed, built heritage is characterised by complex morphology and non-homogeneous features, which clash with BIM’s standardised procedures. Moreover, to date, BIM does not allow fully automated procedures to model heritage buildings. This review focuses on the survey and digitisation phases, which can be seen as the initial phases of application of BIM in conservation projects. It also briefly covers the modelling stage. Here we present the main methodologies developed for BIM for built heritage. Issues with digitisation are also highlighted, principally in connection with the unavailability of automated processes. During the last 10 years, research has led to promising results; for example, videogame interfaces have been used to simulate virtual 3D tours that display the 3D model and the database containing metadata in a single interface, and new software plug-ins have been developed to easily create “smart objects”. Nevertheless, further research is needed to establish how BIM can support the practice of building conservation.
There is a gap in BIM’s information-holding capacity, namely the storage of cultural and historical documentation, as well as of monitored and simulated data relevant for preventive conservation. Future work should focus on developing new tools that can store and share all the relevant metadata.
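To make the "smart object" idea concrete, here is a minimal sketch of a parametric wall whose geometry is driven by numerical parameters and which carries attached information, including a slot for the cultural and historical documentation the review identifies as missing. The class and field names are hypothetical; real BIM data exchange uses schemas such as IFC (Industry Foundation Classes), which this sketch does not attempt to model.

```python
from dataclasses import dataclass, field


@dataclass
class SmartWall:
    """Hypothetical BIM-style 'smart object' for a wall."""
    length_m: float
    height_m: float
    thickness_m: float
    material: str
    properties: dict = field(default_factory=dict)        # material and physical properties
    heritage_records: list = field(default_factory=list)  # cultural/historical documentation

    def volume_m3(self) -> float:
        # Derived from the parameters, so editing a dimension changes the result.
        return self.length_m * self.height_m * self.thickness_m

    def add_record(self, note: str) -> None:
        """Attach a piece of historical documentation to this component."""
        self.heritage_records.append(note)


if __name__ == "__main__":
    wall = SmartWall(5.0, 3.0, 0.4, "brick", {"density_kg_m3": 1800})  # placeholder values
    wall.add_record("placeholder: survey photo reference")
    print(wall.volume_m3(), wall.heritage_records)
```

Keeping geometry as parameters and documentation as attached fields mirrors the distinction the review draws; the heritage_records list is where the cultural and historical information that current BIM tools struggle to hold would live.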

216 citations · en
DOAJ Open Access 2024
123 Utilizing Project ECHO to mitigate environmental impacts on health through collaborative provider education

R. Ellen Hogentogler, George Garrow, Jessica Beiler et al.

OBJECTIVES/GOALS: Launch a case-based learning collaborative on best practices that meet the social, emotional and physical health needs of underserved communities as they relate to environmental toxins, specifically those related to the train derailment in OH. Topics discussed could also include disasters and spills, air quality, extreme heat, and water. METHODS/STUDY POPULATION: In response to a call for action from PA’s Acting Secretary of Health, we established a partnership between Penn State CTSI, Project ECHO at Penn State, and Primary Health Network (PHN). PHN is the largest Federally Qualified Health Center in PA, making it uniquely positioned to reach rural providers who diagnose and treat patients affected by environmental events. Using the ECHO model, we are hosting monthly 1-hour sessions on environmental determinants of health, starting in October 2023. Experts in pulmonology, toxicology, atmospheric science, and rural medicine (to whom many participants would have limited access outside the ECHO platform) and participants share and learn from their varied experiences, exemplifying a culture of ‘all teach, all learn’. RESULTS/ANTICIPATED RESULTS: Project ECHO is an ideal model for scaling up the workforce quickly, allowing participants to be responsive in caring for their community regardless of location and access to specialty clinics. Seventy-four participants across 26 PA counties registered for the series, including PCPs, medical directors, and state officials. At registration, nearly half of our direct patient-care participants reported that they do not routinely take an environmental exposure history, and almost 70% reported receiving questions from their patients about how the environment might affect their health. More than half of those providers reported feeling unprepared to answer these questions.
Evaluation data will be collected at enrollment, after each session, and post-series. DISCUSSION/SIGNIFICANCE: This series could result in:
* Reduction of health disparities caused by environmental events (no-cost, virtual learning)
* Increased preparedness to quickly address health questions/symptoms related to environmental exposures
* Increased awareness of the environmental impacts on health
* Improved testing/treatment for patients

Page 29 of 2,238,091