Results for "data science"

Showing 20 of ~44,754,237 results · from DOAJ, Semantic Scholar, CrossRef

S2 Open Access 2019
Current status of Landsat program, science, and applications

M. Wulder, T. Loveland, D. Roy et al.

Formal planning and development of what became the first Landsat satellite commenced over 50 years ago in 1967. Now, having collected earth observation data for well over four decades since the 1972 launch of Landsat-1, the Landsat program is increasingly complex and vibrant. Critical programmatic elements are ensuring the continuity of high quality measurements for scientific and operational investigations, including ground systems, acquisition planning, data archiving and management, and provision of analysis ready data products. Free and open access to archival and new imagery has resulted in a myriad of innovative applications and novel scientific insights. The planning of future compatible satellites in the Landsat series, which maintain continuity while incorporating technological advancements, has resulted in an increased operational use of Landsat data. Governments and international agencies, among others, can now build an expectation of Landsat data into a given operational data stream. International programs and conventions (e.g., deforestation monitoring, climate change mitigation) are empowered by access to systematically collected and calibrated data with expected future continuity further contributing to the existing multi-decadal record. The increased breadth and depth of Landsat science and applications have accelerated following the launch of Landsat-8, with significant improvements in data quality. Herein, we describe the programmatic developments and institutional context for the Landsat program and the unique ability of Landsat to meet the needs of national and international programs. We then present the key trends in Landsat science that underpin many of the recent scientific and application developments and follow-up with more detailed thematically organized summaries.
The historical context offered by archival imagery combined with new imagery allows for the development of time series algorithms that can produce information on trends and dynamics. Landsat-8 has figured prominently in these recent developments, as has the improved understanding and calibration of historical data. Following the communication of the state of Landsat science, an outlook for future launches and envisioned programmatic developments are presented. Increased linkages between satellite programs are also made possible through an expectation of future mission continuity, such as developing a virtual constellation with Sentinel-2. Successful science and applications developments create a positive feedback loop—justifying and encouraging current and future programmatic support for Landsat.
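The time-series idea described above, fitting per-pixel trends across combined archival and new imagery, can be sketched with a plain least-squares fit. The years and NDVI values below are invented illustration data, and no Landsat tooling is assumed:

```python
# Sketch: per-pixel linear trend over a Landsat-style time series.
# The years and NDVI values below are hypothetical illustration data.

def trend_slope(years, values):
    """Ordinary least-squares slope of values against years."""
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
    den = sum((t - mean_t) ** 2 for t in years)
    return num / den

years = [1990, 1995, 2000, 2005, 2010, 2015, 2020]
ndvi = [0.61, 0.58, 0.55, 0.52, 0.50, 0.47, 0.44]  # invented browning trend

slope = trend_slope(years, ndvi)
print(f"NDVI change per year: {slope:.4f}")  # negative => vegetation decline
```

Production algorithms (e.g. segmented change detection) are far more elaborate, but they reduce to statistics of exactly this kind computed per pixel over the archive.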

906 citations en Computer Science
S2 Open Access 2020
The misuse of colour in science communication

F. Crameri, G. Shephard, P. Heron

The accurate representation of data is essential in science communication. However, colour maps that visually distort data through uneven colour gradients or are unreadable to those with colour-vision deficiency remain prevalent in science. These include, but are not limited to, rainbow-like and red–green colour maps. Here, we present a simple guide for the scientific use of colour. We show how scientifically derived colour maps report true data variations, reduce complexity, and are accessible for people with colour-vision deficiencies. We highlight ways for the scientific community to identify and prevent the misuse of colour in science, and call for a proactive step away from colour misuse among the community, publishers, and the press.
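As a rough sketch of the "even colour gradient" criterion discussed above, one can check whether a colour map's lightness varies monotonically. Luma with Rec. 709 weights is used here as a cheap stand-in for true perceptual lightness (CIE L*), and the two 4-entry maps are invented:

```python
# Sketch: check whether a colour map's lightness varies monotonically,
# a rough proxy for the "even colour gradient" property discussed above.
# Luma (Rec. 709 weights) is a cheap stand-in for true CIE L* lightness.

def luma(rgb):
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_monotone_lightness(colormap):
    lumas = [luma(c) for c in colormap]
    increasing = all(a <= b for a, b in zip(lumas, lumas[1:]))
    decreasing = all(a >= b for a, b in zip(lumas, lumas[1:]))
    return increasing or decreasing

# Hypothetical 4-entry colour maps (RGB components in 0..1):
grey = [(x, x, x) for x in (0.0, 0.33, 0.66, 1.0)]       # monotone lightness
rainbowish = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0),
              (1.0, 1.0, 0.0), (1.0, 0.0, 0.0)]          # lightness oscillates

print(is_monotone_lightness(grey), is_monotone_lightness(rainbowish))
```

The rainbow-like map fails because green and yellow are far lighter than blue and red, which is one mechanism behind the visual distortion the paper describes.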

808 citations en Medicine, Computer Science
S2 Open Access 2014
Named data networking

Lixia Zhang, Alexander Afanasyev, Jeffrey Burke et al.

Named Data Networking (NDN) is one of five projects funded by the U.S. National Science Foundation under its Future Internet Architecture Program. NDN has its roots in an earlier project, Content-Centric Networking (CCN), which Van Jacobson first publicly presented in 2006. The NDN project investigates Jacobson's proposed evolution from today's host-centric network architecture (IP) to a data-centric network architecture (NDN). This conceptually simple shift has far-reaching implications for how we design, develop, deploy, and use networks and applications. We describe the motivation and vision of this new architecture, and its basic components and operations. We also provide a snapshot of its current design, development status, and research challenges. More information about the project, including prototype implementations, publications, and annual reports, is available on named-data.net.
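The data-centric shift can be illustrated with a toy sketch of NDN-style forwarding, in which an Interest is matched against a forwarding table by longest name-prefix match rather than by host address. The names and faces below are invented and this is not the project's actual implementation:

```python
# Toy sketch of name-based forwarding: an Interest is matched against a
# FIB (forwarding information base) by longest name-prefix match.
# The names and face identifiers below are invented for illustration.

fib = {
    "/edu": "face-1",
    "/edu/ucla": "face-2",
    "/edu/ucla/videos": "face-3",
}

def longest_prefix_match(name, table):
    components = name.strip("/").split("/")
    # Try progressively shorter prefixes until one is found in the table.
    for i in range(len(components), 0, -1):
        prefix = "/" + "/".join(components[:i])
        if prefix in table:
            return table[prefix]
    return None

print(longest_prefix_match("/edu/ucla/videos/demo.mp4", fib))  # face-3
print(longest_prefix_match("/edu/mit/papers", fib))            # face-1
```

Because the lookup key is the data's name, any node holding a cached copy can answer the Interest, which is the property that distinguishes NDN from host-centric IP forwarding.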

2183 citations en Biology, Computer Science
S2 Open Access 2018
Matminer: An open source toolkit for materials data mining

Logan T. Ward, Alex Dunn, Alireza Faghaninia et al.

As materials data sets grow in size and scope, the role of data mining and statistical learning methods to analyze these materials data sets and build predictive models is becoming more important. This manuscript introduces matminer, an open-source, Python-based software platform to facilitate data-driven methods of analyzing and predicting materials properties. Matminer provides modules for retrieving large data sets from external databases such as the Materials Project, Citrination, Materials Data Facility, and Materials Platform for Data Science. It also provides implementations for an extensive library of feature extraction routines developed by the materials community, with 47 featurization classes that can generate thousands of individual descriptors and combine them into mathematical functions. Finally, matminer provides a visualization module for producing interactive, shareable plots. These functions are designed in a way that integrates closely with machine learning and data analysis packages already developed and in use by the Python data science community. We explain the structure and logic of matminer, provide a description of its various modules, and showcase several examples of how matminer can be used to collect data, reproduce data mining studies reported in the literature, and test new methodologies.
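The featurization idea can be illustrated with a hypothetical sketch that aggregates per-element properties into fixed-length descriptors. The element data, feature names, and `featurize` function here are invented for illustration and are not matminer's API:

```python
# Hypothetical sketch of composition featurization in the spirit described
# above: aggregate per-element properties into fixed-length descriptors.
# The element table and feature names are illustrative, not matminer's.

ELEMENT_DATA = {  # element -> (atomic number, Pauling electronegativity)
    "Na": (11, 0.93),
    "Cl": (17, 3.16),
    "O":  (8, 3.44),
    "Ti": (22, 1.54),
}

def featurize(composition):
    """composition: dict element -> atomic fraction; returns descriptors."""
    props = [ELEMENT_DATA[el] for el in composition]
    fracs = [composition[el] for el in composition]
    features = {}
    for j, name in enumerate(["Z", "electronegativity"]):
        values = [p[j] for p in props]
        # Fraction-weighted mean and simple maximum as two aggregations.
        features[f"mean_{name}"] = sum(f * v for f, v in zip(fracs, values))
        features[f"max_{name}"] = max(values)
    return features

print(featurize({"Na": 0.5, "Cl": 0.5}))
```

Real featurizers apply many such aggregations over large curated property tables, producing the thousands of descriptors the abstract mentions, but each descriptor is a deterministic function of the composition in exactly this way.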

849 citations en Computer Science
S2 Open Access 2021
Gaia Early Data Release 3

S. Hodgkin, D. Harrison, E. Breedt et al.

Context. Since July 2014, the Gaia mission has been engaged in a high-spatial-resolution, time-resolved, precise, accurate astrometric, and photometric survey of the entire sky. Aims. We present the Gaia Science Alerts project, which has been in operation since 1 June 2016. We describe the system which has been developed to enable the discovery and publication of transient photometric events as seen by Gaia. Methods. We outline the data handling, timings, and performances, and we describe the transient detection algorithms and filtering procedures needed to manage the high false alarm rate. We identify two classes of events: (1) sources which are new to Gaia and (2) Gaia sources which have undergone a significant brightening or fading. Validation of the Gaia transit astrometry and photometry was performed, followed by testing of the source environment to minimise contamination from Solar System objects, bright stars, and fainter near-neighbours. Results. We show that the Gaia Science Alerts project suffers from very low contamination, that is, there are very few false positives. We find that the external completeness for supernovae, CE = 0.46, is dominated by the Gaia scanning law and the requirement of detections from both fields-of-view. Where we have two or more scans, the internal completeness is CI = 0.79 at 3 arcsec or larger from the centres of galaxies, but it drops closer in, especially within 1 arcsec. Conclusions. The per-transit photometry for Gaia transients is precise to 1% at G = 13, and 3% at G = 19. The per-transit astrometry is accurate to 55 mas when compared to Gaia DR2. The Gaia Science Alerts project is one of the most homogeneous and productive transient surveys in operation, and it is the only survey which covers the whole sky at high spatial resolution (subarcsecond), including the Galactic plane and bulge.
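The two alert classes described in the Methods can be sketched as a simple filter. The catalogue contents and the 1-magnitude trigger threshold below are invented; the real pipeline's detection and filtering criteria are considerably more involved:

```python
# Sketch of the two alert classes described above: (1) sources new to the
# catalogue and (2) known sources with a large brightness change. The
# baseline catalogue and the 1-magnitude threshold are invented.

catalogue = {"src-A": 15.2, "src-B": 18.9}  # source id -> baseline G mag
THRESHOLD = 1.0  # magnitudes; hypothetical trigger level

def classify(source_id, g_mag):
    if source_id not in catalogue:
        return "new-source alert"
    delta = g_mag - catalogue[source_id]
    if abs(delta) >= THRESHOLD:
        # Smaller magnitude means a brighter source.
        return "brightening alert" if delta < 0 else "fading alert"
    return "no alert"

print(classify("src-C", 17.0))   # unknown id -> new-source alert
print(classify("src-A", 13.9))   # magnitude dropped 1.3 -> brightening
print(classify("src-B", 19.1))   # 0.2 mag change -> no alert
```

The paper's emphasis on contamination testing corresponds to everything this sketch omits: vetting each candidate against Solar System objects, bright-star artefacts, and close neighbours before publication.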

693 citations en Physics
S2 Open Access 2020
A review of machine learning applications in wildfire science and management

P. Jain, Sean C. P. Coogan, Sriram Ganapathi Subramanian et al.

Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then, the field has rapidly progressed congruently with the wide adoption of machine learning (ML) methods in the environmental sciences. Here, we present a scoping review of ML applications in wildfire science and management. Our overall objective is to improve awareness of ML methods among wildfire researchers and managers, as well as illustrate the diverse and challenging range of problems in wildfire science available to ML data scientists. To that end, we first present an overview of popular ML approaches used in wildfire science to date and then review the use of ML in wildfire science as broadly categorized into six problem domains, including (i) fuels characterization, fire detection, and mapping; (ii) fire weather and climate change; (iii) fire occurrence, susceptibility, and risk; (iv) fire behavior prediction; (v) fire effects; and (vi) fire management. Furthermore, we discuss the advantages and limitations of various ML approaches relating to data size, computational requirements, generalizability, and interpretability, as well as identify opportunities for future advances in the science and management of wildfires within a data science context. In total, to the end of 2019, we identified 300 relevant publications in which the most frequently used ML methods across problem domains included random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms. As such, there exist opportunities to apply more current ML methods, including deep learning and agent-based learning, in the wildfire sciences, especially in instances involving very large multivariate datasets.
We must recognize, however, that despite the ability of ML models to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of fire processes across multiple scales, while the complexity of some ML methods such as deep learning requires a dedicated and sophisticated knowledge of their application. Finally, we stress that the wildfire research and management communities play an active role in providing relevant, high-quality, and freely available wildfire data for use by practitioners of ML methods.
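Random forests, the most frequently used method in the reviewed studies, rest on bagging: training simple learners on bootstrap samples of the data and taking a majority vote. A minimal pure-Python sketch follows, using decision stumps rather than full trees and an invented one-feature fire-occurrence dataset (feature = a dryness index, label = 1 if fire occurred):

```python
import random

# Minimal sketch of the bagging idea behind random forests: fit decision
# stumps on bootstrap samples and take a majority vote. The dataset is
# invented (feature = dryness index, label = 1 if a fire occurred).

def best_stump(data):
    """Pick the threshold minimising errors of the rule y_hat = (x > t)."""
    best = None
    for threshold, _ in data:
        errors = sum((x > threshold) != y for x, y in data)
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best[0]

def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # bootstrap sample
        stumps.append(best_stump(sample))
    return stumps

def predict(stumps, x):
    votes = sum(x > t for t in stumps)  # each stump votes 0 or 1
    return int(votes > len(stumps) / 2)

data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
forest = train_forest(data)
print(predict(forest, 0.1), predict(forest, 0.9))
```

A real random forest adds depth-limited trees and random feature subsetting at each split, but the variance-reducing vote over bootstrap replicates shown here is the core of the method.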

634 citations en Computer Science, Mathematics
S2 Open Access 2003
A New Kind of Science

Raymond Kurzweil

A nationwide data set of losses from 1975 to 1998 was compiled to assess the trends. Temporal patterns of deaths and injuries, monetary damages, and—in some cases—the number of events are systematically examined by year in chapter 5, and the authors undertake a systematic spatial assessment of the statewide totals in chapter 6. Explanations for some of the patterns are offered, particularly for the most significant disasters and for the states with most events or the greatest losses. Further refinement and evaluation of patterns of economic losses and death are undertaken by normalizing losses by population, land area, and gross domestic product (GDP). The authors advance the discussion from simple descriptions of loss patterns to explanations of the patterns of disaster-loss burden, and some surprises emerge from the arithmetic. For instance, North Dakota, Iowa, and Mississippi not only suffered the greatest monetary losses per capita during the period, but also suffered the greatest losses of property and crops compared to their state GDP! For a final analysis, the authors created an overall hazard score (averaged proportion of the states’ contributions to the national totals of events, deaths, and damages) and used it to rank the states. Using this ranking, states were assigned to categories of “proneness,” from highest (Florida, Texas, and California) to lowest (Rhode Island, Delaware, Alaska, and other small or lightly populated states). The conclusion we are to draw is that the amount of loss a state has experienced indicates its disaster proneness. Finally, “Charting a Course for the Next Two Decades” by Cutter describes what is needed to produce the models and data appropriate for mitigation and planning assessments.
In order for an effective assessment of events and losses to occur, progress is required in several areas: development of vulnerability science, the creation of a national hazard events and losses database, and the establishment of a national loss inventory and events clearinghouse. To do so, Cutter argues, we need to rethink the way we monitor, assess, and manage our vulnerabilities. She briefly describes the shifts needed in data gathering and provision, sustainability and distributive justice, strategic planning, research funding, and societal awareness of issues that influence the prospects for disaster. While American Hazardscapes is intended to provide a broad understanding of the geography of loss due to hazards in the United States, it suffers from its openly acknowledged limitations. Though criticizing the quality of currently available data, the authors use those data to indicate the prospects for future disasters. The elimination of extreme events is no longer believed to be the key to loss reduction. Instead, we must identify and avoid places too dynamic for permanent occupation and adjust to the inevitable events in ways that limit prospects for loss. Mitigation must address the vulnerabilities that cause greater exposure and profound upset of our social systems and create more complex catastrophes. The data employed in this assessment describe (however imperfectly) the losses suffered over two and a half decades. The largest disasters overwhelm the patterns of loss in their analysis. The authors imply, based on proneness rankings, that those who lost the most are the most prone to loss. But in reality, losses are byproducts of the interplay of two dynamic geographies: the pattern of extreme events and the pattern of human use of the landscape. The former is often poorly understood, may not behave consistently, and may operate on greater than twenty-five-year cycles.
The latter may change so rapidly that it surpasses our capacity to measure it and map it, and postdisaster land use and human perception may be radically changed. These geographies were outside the scope of this book, however, and given new homeland security efforts and reorganization of the Federal Emergency Management Agency, the past is an even poorer indicator of the future.

3671 citations en Computer Science
S2 Open Access 2021
A Review on Data Preprocessing Techniques Toward Efficient and Reliable Knowledge Discovery From Building Operational Data

C. Fan, Meiling Chen, Xinghua Wang et al.

The rapid development in data science and the increasing availability of building operational data have provided great opportunities for developing data-driven solutions for intelligent building energy management. Data preprocessing serves as the foundation for valid data analyses. It is an indispensable step in building operational data analysis considering the intrinsic complexity of building operations and deficiencies in data quality. Data preprocessing refers to a set of techniques for enhancing the quality of the raw data, such as outlier removal and missing value imputation. This article serves as a comprehensive review of data preprocessing techniques for analysing massive building operational data. A wide variety of data preprocessing techniques are summarised in terms of their applications in missing value imputation, outlier detection, data reduction, data scaling, data transformation, and data partitioning. In addition, three state-of-the-art data science techniques are proposed to tackle practical data challenges in the building field, i.e., data augmentation, transfer learning, and semi-supervised learning. In-depth discussions have been presented to describe the pros and cons of existing preprocessing methods, possible directions for future research and potential applications in smart building energy management. The research outcomes are helpful for the development of data-driven research in the building field.
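Three of the preprocessing steps named above (missing value imputation, outlier detection, and data scaling) can be sketched on invented sensor readings; the 2-sigma threshold and the data are illustrative only:

```python
import statistics

# Sketch of three preprocessing steps on invented temperature readings:
# mean imputation of missing values, z-score outlier removal, and
# min-max scaling. The threshold and data are illustrative only.

readings = [21.0, 21.5, None, 22.0, 95.0, 21.8, 22.3]  # None = missing

# 1. Missing value imputation with the mean of the observed values.
observed = [r for r in readings if r is not None]
imputed = [statistics.mean(observed) if r is None else r for r in readings]

# 2. Outlier removal: drop points more than 2 standard deviations out.
mu = statistics.mean(imputed)
sigma = statistics.stdev(imputed)
cleaned = [r for r in imputed if abs(r - mu) <= 2 * sigma]

# 3. Min-max scaling of the cleaned series to [0, 1].
lo, hi = min(cleaned), max(cleaned)
scaled = [(r - lo) / (hi - lo) for r in cleaned]

print(cleaned)
print([round(s, 2) for s in scaled])
```

Note the ordering pitfall this toy example exposes: imputing before outlier removal lets the faulty 95.0 reading contaminate the imputed value, which is one reason the review stresses careful pipeline design for building operational data.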

403 citations en
S2 Open Access 2019
Ethical Issues Relating to Scientific Discovery in Exercise Science.

J. Navalta, W. Stone

This work aims to present concepts related to ethical issues in conducting and reporting scientific research in a clear and straightforward manner. Considerations around research design including authorship, sound research practices, non-discrimination in subject recruitment, objectivity, respect for intellectual property, and financial interests are detailed. Further, concepts relating to the conducting of research including the competency of the researcher, conflicts of interest, accurately representing data, and ethical practices in human and animal research are presented. Attention pertaining to the dissemination of research including plagiarism, duplicate submission, redundant publication, and figure manipulation is offered. Other considerations including responsible mentoring, respect for colleagues, and social responsibility are set forth. The International Journal of Exercise Science will now require a statement in all subsequent published manuscripts that the authors have complied with each of the ethics statements contained in this work.

462 citations en Medicine, Sociology
DOAJ Open Access 2026
Green Finance and High-Quality Economic Development: Spatial Correlation, Technology Spillover, and Pollution Haven

Zunrong Zhou, Xiang Li

This study examines how green finance influences high-quality economic development, with a particular focus on its spatial spillover mechanisms. Specifically, we investigate the competing roles of technology spillover and the pollution haven effect. Using provincial panel data from China (2010–2021) and applying a Spatial Durbin Model (SDM), we deconstruct the total effect of green finance into three distinct components: the local technological progress effect, the positive technology spillover effect, and the negative pollution haven effect. While acknowledging limitations related to the macro-level data granularity and the indirect nature of the mechanism tests, our analysis yields three main findings. First, green finance development shows significant regional disparities. It has progressed most rapidly in the eastern region, remained relatively stable in the central region, and declined in the western region. Second, green finance exerts a strong positive direct effect on local high-quality economic development. This promoting effect becomes even stronger in more developed regions. Third, green finance generates significant negative spatial spillovers on neighboring regions. These are primarily driven by the pollution haven effect, which involves the cross-regional relocation of polluting industries. However, local technological progress partially mitigates these adverse externalities. Overall, our findings reveal the dual nature of the spatial externalities associated with green finance. They also highlight the urgency of coordinated regional environmental governance to prevent “green leakage” and to promote balanced, high-quality economic development.
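The Spatial Durbin Model referred to above has a standard form in the spatial econometrics literature; the notation below is the conventional one, not necessarily the authors':

```latex
% Standard Spatial Durbin Model (SDM); conventional notation assumed.
% y : vector of the outcome (high-quality development) across provinces
% X : covariates, including the green finance measure
% W : spatial weight matrix linking neighbouring provinces
y = \rho W y + X\beta + W X \theta + \varepsilon
% Effects of covariate k are read off the partial-derivative matrix
%   S_k(W) = (I - \rho W)^{-1} \left( I \beta_k + W \theta_k \right),
% whose diagonal gives the direct (local) effects and whose off-diagonal
% elements give the indirect (spillover) effects.
```

The paper's split of spillovers into a positive technology channel and a negative pollution haven channel is a further decomposition of the indirect effects and is specific to the authors' design.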

Systems engineering, Technology (General)
DOAJ Open Access 2026
How have outreach eye health services been delivered globally? Protocol for a scoping review

Jacqueline Ramke, Iris Gordon, Eric Lai et al.

Introduction. In all countries, some population groups experience barriers to accessing eye health services, contributing to health inequities. Outreach is a common strategy used to deliver healthcare services to populations experiencing inequities. This scoping review aims to summarise the nature and extent of the existing literature describing outreach as a service delivery model to improve access to eye health services, particularly among populations experiencing inequities. Methods and analysis. An information specialist will search academic databases (Medline, Embase and Global Health) without language restrictions to find peer-reviewed articles describing outreach eye health services, published in any country between 1 January 2010 and the search date. Grey literature sources will also be searched. In Covidence, two reviewers will independently screen titles and abstracts and subsequently relevant full texts against the inclusion criteria. Data extraction will also be performed independently by two reviewers in Covidence. This scoping review will summarise the characteristics of the included outreach eye health services, including the type of eye health service delivered, personnel involved, mode of transport, source of funding and whether the service targeted any specific PROGRESS-Plus group (Place of residence, Race/ethnicity/culture/language, Occupation, Gender/sex, Religion, Education, Socioeconomic status, Social capital, Plus). We will present our findings quantitatively using diagrams, tables and graphs. Ethics and dissemination. Ethics approval was not sought, as this scoping review will use only publicly available reports. The results of this review will be disseminated through publication in a peer-reviewed journal and will be presented at eye health conferences.
It will offer valuable insights for eye health providers, health and social service providers and policymakers who are interested in improving access to eye health services for populations experiencing inequities. This scoping review will inform a project in New Zealand which aims to develop outreach eye health services to populations experiencing inequities, such as unhoused people and refugees. Registration. This protocol was registered on the Open Science Framework on 11 November 2025 (https://osf.io/vyz32).

S2 Open Access 2016
The role of administrative data in the big data revolution in social science research.

R. Connelly, C. Playford, V. Gayle et al.

The term big data is currently a buzzword in social science; however, its precise meaning is ambiguous. In this paper we focus on administrative data, which is a distinctive form of big data. Exciting new opportunities for social science research will be afforded by new administrative data resources, but these are currently underappreciated by the research community. The central aim of this paper is to discuss the challenges associated with administrative data. We emphasise that it is critical for researchers to carefully consider how administrative data has been produced. We conclude that administrative datasets have the potential to contribute to the development of high-quality and impactful social science research, and should not be overlooked in the emerging field of big data.

305 citations en Computer Science, Medicine

Page 20 of 2,237,712