Results for "data science"

S2 Open Access 2018

Promoting novelty, rigor, and style in energy social science: Towards codes of practice for appropriate methods and research design

Benjamin Sovacool, John Axsen, S. Sorrell

A series of weaknesses in creativity, research design, and quality of writing continue to handicap energy social science. Many studies ask uninteresting research questions, make only marginal contributions, and lack innovative methods or application to theory. Many studies also have no explicit research design, lack rigor, or suffer from mangled structure and poor quality of writing. To help remedy these shortcomings, this Review offers suggestions for how to construct research questions; thoughtfully engage with concepts; state objectives; and appropriately select research methods. Then, the Review offers suggestions for enhancing theoretical, methodological, and empirical novelty. In terms of rigor, codes of practice are presented across seven method categories: experiments, literature reviews, data collection, data analysis, quantitative energy modeling, qualitative analysis, and case studies. We also recommend that researchers beware of hierarchies of evidence utilized in some disciplines, and that researchers place more emphasis on balance and appropriateness in research design. In terms of style, we offer tips regarding macro and microstructure and analysis, as well as coherent writing. Our hope is that this Review will inspire more interesting, robust, multi-method, comparative, interdisciplinary and impactful research that will accelerate the contribution that energy social science can make to both theory and practice.

1057 citations en Computer Science

S2 Open Access 2016

Qualitative Descriptive Methods in Health Science Research

Karen Colorafi, B. Evans

1215 citations en Medicine

S2 Open Access 2015

Deep learning applications and challenges in big data analytics

M. M. Najafabadi, Flavio Villanustre, T. Khoshgoftaar et al.

Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has become important as many organizations, both public and private, have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learnt at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and un-categorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing. We conclude by presenting insights into relevant future works by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.

2324 citations en Computer Science

S2 Open Access 2019

From DFT to machine learning: recent approaches to materials science–a review

G. R. Schleder, A. C. Padilha, C. M. Acosta et al.

Recent advances in experimental and computational methods are increasing the quantity and complexity of generated data. This massive amount of raw data needs to be stored and interpreted in order to advance the materials science field. Identifying correlations and patterns from large amounts of complex data has been performed by machine learning algorithms for decades. Recently, the materials science community started to invest in these methodologies to extract knowledge and insights from the accumulated data. This review follows a logical sequence starting from density functional theory as the representative instance of electronic structure methods, to the subsequent high-throughput approach, used to generate large amounts of data. Ultimately, data-driven strategies, which include data mining, screening, and machine learning techniques, employ the data generated. We show how these approaches to modern computational materials science are being used to uncover complexities and design novel materials with enhanced properties. Finally, we point to the present research problems, challenges, and potential future perspectives of this new exciting field.

744 citations en Physics

S2 Open Access 2017

Secondary Data Analysis: A Method of which the Time Has Come

Melissa P. Johnston

797 citations en Computer Science

S2 Open Access 2017

Crowdsourcing Multiple Choice Science Questions

Johannes Welbl, Nelson F. Liu, Matt Gardner

We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance or diversity in the answer options. Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice which aid the human question generation process. With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions. We demonstrate that the method produces in-domain questions by providing an analysis of this new dataset and by showing that humans cannot distinguish the crowdsourced questions from original questions. When using SciQ as additional training data to existing questions, we observe accuracy improvements on real science exams.

784 citations en Computer Science, Mathematics

S2 Open Access 2017

From Little Science to Big Science

R. Perrucci, C. Perrucci, M. Subramaniam

749 citations en Political Science

S2 Open Access 2019

A Systematic Review on Imbalanced Data Challenges in Machine Learning

H. Kaur, H. Pannu, A. Malhi

In machine learning, data imbalance poses challenges for performing data analytics in almost all areas of real-world research. The raw primary data often suffers from a skewed distribution of one class over the other, as in the case of computer vision, information security, marketing, and medical science. The goal of this article is to present a comparative analysis of the approaches from the reference of data pre-processing, algorithmic and hybrid paradigms for contemporary imbalance data analysis techniques, and their comparative study in lieu of different data distribution and their application areas.

654 citations en Computer Science

S2 Open Access 2007

Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar

Lokman I. Meho, Kiduk Yang

1086 citations en Computer Science

S2 Open Access 1998

Calibration of the Computer Science and Applications, Inc. accelerometer.

P. Freedson, E. Melanson, J. Sirard

3749 citations en Computer Science, Medicine

S2 Open Access 1979

Statistics for experimenters : an introduction to design, data analysis, and model building

G. D. Booth, G. Box, William G. Hunter et al.

4640 citations en Business, Mathematics

S2 Open Access 1991

Voronoi diagrams—a survey of a fundamental geometric data structure

F. Aurenhammer

4780 citations en Computer Science

S2 Open Access 2000

Data Management and Analysis Methods

G. Ryan, H. Bernard

4502 citations en Sociology

S2 Open Access 2019

The Space Physics Environment Data Analysis System (SPEDAS)

V. Angelopoulos, P. Cruce, A. Drozdov et al.

With the advent of the Heliophysics/Geospace System Observatory (H/GSO), a complement of multi-spacecraft missions and ground-based observatories to study the space environment, data retrieval, analysis, and visualization of space physics data can be daunting. The Space Physics Environment Data Analysis System (SPEDAS), a grass-roots software development platform (www.spedas.org), is now officially supported by NASA Heliophysics as part of its data environment infrastructure. It serves more than a dozen space missions and ground observatories and can integrate the full complement of past and upcoming space physics missions with minimal resources, following clear, simple, and well-proven guidelines. Free, modular and configurable to the needs of individual missions, it works in both command-line (ideal for experienced users) and Graphical User Interface (GUI) mode (reducing the learning curve for first-time users). Both options have “crib-sheets,” user-command sequences in ASCII format that can facilitate record-and-repeat actions, especially for complex operations and plotting. Crib-sheets enhance scientific interactions, as users can move rapidly and accurately from exchanges of technical information on data processing to efficient discussions regarding data interpretation and science. SPEDAS can readily query and ingest all International Solar Terrestrial Physics (ISTP)-compatible products from the Space Physics Data Facility (SPDF), enabling access to a vast collection of historic and current mission data. The planned incorporation of Heliophysics Application Programmer’s Interface (HAPI) standards will facilitate data ingestion from distributed datasets that adhere to these standards. Although SPEDAS is currently Interactive Data Language (IDL)-based (and interfaces to Java-based tools such as Autoplot), efforts are underway to expand it further to work with Python (first as an interface tool and potentially even receiving an under-the-hood replacement).
We review the SPEDAS development history, goals, and current implementation. We explain its “modes of use” with examples geared for users and outline its technical implementation and requirements with software developers in mind. We also describe SPEDAS personnel and software management, interfaces with other organizations, resources and support structure available to the community, and future development plans.

555 citations en Medicine

S2 Open Access 2014

Big data of materials science: critical role of the descriptor.

L. Ghiringhelli, J. Vybíral, S. Levchenko et al.

Statistical learning of materials properties or functions so far starts with a largely silent, nonchallenged step: the choice of the set of descriptive parameters (termed descriptor). However, when the scientific connection between the descriptor and the actuating mechanisms is unclear, the causality of the learned descriptor-property relation is uncertain. Thus, a trustful prediction of new promising materials, identification of anomalies, and scientific advancement are doubtful. We analyze this issue and define requirements for a suitable descriptor. For a classic example, the energy difference of zinc blende or wurtzite and rocksalt semiconductors, we demonstrate how a meaningful descriptor can be found systematically.

669 citations en Medicine, Physics

S2 Open Access 2022

Data-Driven Machine Learning in Environmental Pollution: Gains and Problems.

Xian Liu, Dawei Lu, A. Zhang et al.

The complexity and dynamics of the environment make it extremely difficult to directly predict and trace the temporal and spatial changes in pollution. In the past decade, the unprecedented accumulation of data, the development of high-performance computing power, and the rise of diverse machine learning (ML) methods provide new opportunities for environmental pollution research. The ML methodology has been used in satellite data processing to obtain ground-level concentrations of atmospheric pollutants, pollution source apportionment, and spatial distribution modeling of water pollutants. However, unlike the active practices of ML in chemical toxicity prediction, advanced algorithms such as deep neural networks in environmental process studies of pollutants are still deficient. In addition, over 40% of the environmental applications of ML go to air pollution, and its application range and acceptance in other aspects of environmental science remain to be increased. The use of ML methods to revolutionize environmental science and its problem-solving scenarios has its own challenges. Several issues should be taken into consideration, such as the tradeoff between model performance and interpretability, prerequisites of the machine learning model, model selection, and data sharing.

393 citations en Medicine

S2 Open Access 2022

Citizen science in environmental and ecological sciences

D. Fraisl, G. Hager, B. Bedessem et al.

Citizen science is an increasingly acknowledged approach applied in many scientific domains, and particularly within the environmental and ecological sciences, in which non-professional participants contribute to data collection to advance scientific research. We present contributory citizen science as a valuable method to scientists and practitioners within the environmental and ecological sciences, focusing on the full life cycle of citizen science practice, from design to implementation, evaluation and data management. We highlight key issues in citizen science and how to address them, such as participant engagement and retention, data quality assurance and bias correction, as well as ethical considerations regarding data sharing. We also provide a range of examples to illustrate the diversity of applications, from biodiversity research and land cover assessment to forest health monitoring and marine pollution. The aspects of reproducibility and data sharing are considered, placing citizen science within an encompassing open science perspective. Finally, we discuss its limitations and challenges and present an outlook for the application of citizen science in multiple science domains. Contributory citizen science is a method in which non-professional participants contribute to data collection in whole or in part to advance scientific research. This Primer outlines the use of citizen science in the environmental and ecological sciences, discussing participant engagement, data quality assurance and bias correction.

362 citations en

S2 Open Access 2021

Data-Driven Strategies for Accelerated Materials Design

R. Pollice, Gabriel dos Passos Gomes, Matteo Aldeghi et al.

Conspectus: The ongoing revolution of the natural sciences by the advent of machine learning and artificial intelligence sparked significant interest in the material science community in recent years. The intrinsically high dimensionality of the space of realizable materials makes traditional approaches ineffective for large-scale explorations. Modern data science and machine learning tools developed for increasingly complicated problems are an attractive alternative. An imminent climate catastrophe calls for a clean energy transformation by overhauling current technologies within only several years of possible action available. Tackling this crisis requires the development of new materials at an unprecedented pace and scale. For example, organic photovoltaics have the potential to replace existing silicon-based materials to a large extent and open up new fields of application. In recent years, organic light-emitting diodes have emerged as state-of-the-art technology for digital screens and portable devices and are enabling new applications with flexible displays. Reticular frameworks allow the atom-precise synthesis of nanomaterials and promise to revolutionize the field by the potential to realize multifunctional nanoparticles with applications from gas storage, gas separation, and electrochemical energy storage to nanomedicine. In the recent decade, significant advances in all these fields have been facilitated by the comprehensive application of simulation and machine learning for property prediction, property optimization, and chemical space exploration enabled by considerable advances in computing power and algorithmic efficiency. In this Account, we review the most recent contributions of our group in this thriving field of machine learning for material science. We start with a summary of the most important material classes our group has been involved in, focusing on small molecules as organic electronic materials and crystalline materials.
Specifically, we highlight the data-driven approaches we employed to speed up discovery and derive material design strategies. Subsequently, our focus lies on the data-driven methodologies our group has developed and employed, elaborating on high-throughput virtual screening, inverse molecular design, Bayesian optimization, and supervised learning. We discuss the general ideas, their working principles, and their use cases with examples of successful implementations in data-driven material discovery and design efforts. Furthermore, we elaborate on potential pitfalls and remaining challenges of these methods. Finally, we provide a brief outlook for the field as we foresee increasing adaptation and implementation of large scale data-driven approaches in material discovery and design campaigns.

349 citations en Computer Science, Medicine

DOAJ Open Access 2026

The rise and evolution of cancer mechanobiology: a bibliometric trajectory of three decades of research

Boyan Liu, Boyan Liu, Xufeng Liu et al.

Background: A growing body of research indicates that mechanobiology plays a pivotal role in cancer pathogenesis and holds considerable therapeutic potential. However, a comprehensive bibliometric analysis of this interdisciplinary field is lacking, partly due to challenges in cross-database data integration. In this study, we aim to construct a systematic knowledge map of cancer mechanobiology to delineate its research progress, core structure, and emerging trends. Methods: In this study, we integrated 1,947 publications from the Web of Science (WoS) Core Collection and Scopus (1976–2025). To address cross-database heterogeneity, we developed a novel, customized, multi-stage data-standardization workflow combining a bespoke Python parsing engine with fuzzy string matching algorithms and manual verification. The unified dataset was analyzed using CiteSpace, VOSviewer, and Bibliometrix. Results: The United States and China are the most prolific countries, while the University of California system is the most productive institution. Valerie M. Weaver is the most published author, while Matthew J. Paszek is the most co-cited, indicating foundational influence. Cell is the most influential journal based on co-citation frequency. Keyword analysis reveals a thematic evolution from “extracellular matrix stiffness” and “mechanotransduction” to frontier areas such as “cancer immunotherapy” and “YAP signaling protein.” Conclusion: In this study, we construct a comprehensive bibliometric map of cancer mechanobiology. Our findings elucidate the developmental trajectory and research hotspots of the field, providing a data-driven reference for future investigations, international collaborations, and clinical translation of physical oncology.

Therapeutics. Pharmacology

CrossRef Open Access 2026

Data Science Education Through the Lens of Statisticians

Juana Sanchez

en
