Results for "data science"

Showing 20 of ~44,692,528 results · from CrossRef, DOAJ, arXiv, Semantic Scholar

S2 Open Access 2016
Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Randal S. Olson, Nathan Bartley, R. Urbanowicz et al.

As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.

589 citations en Computer Science
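
TPOT's Pareto step trades classification accuracy against pipeline complexity. The sketch below illustrates only the underlying idea of Pareto dominance over two objectives. It is not TPOT's code or API: the `Candidate` type, the pipeline names, and all scores are hypothetical, and using the operator count as the complexity measure is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A hypothetical evaluated pipeline (illustrative only, not a TPOT object)."""
    name: str
    accuracy: float   # higher is better
    n_operators: int  # lower is better (assumed proxy for pipeline complexity)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.n_operators <= b.n_operators
    strictly_better = a.accuracy > b.accuracy or a.n_operators < b.n_operators
    return no_worse and strictly_better


def pareto_front(pool: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates (input order preserved)."""
    return [c for c in pool if not any(dominates(o, c) for o in pool if o is not c)]


# Made-up scores for illustration only.
pool = [
    Candidate("scaler+logreg", 0.91, 2),
    Candidate("pca+scaler+rf", 0.91, 3),       # same accuracy, more operators: dominated
    Candidate("rf", 0.89, 1),
    Candidate("poly+select+pca+gb", 0.93, 4),
    Candidate("pca+knn", 0.85, 2),             # less accurate at equal size: dominated
]
front = pareto_front(pool)
print([c.name for c in front])  # ['scaler+logreg', 'rf', 'poly+select+pca+gb']
```

Every pipeline left on the front is a defensible choice: a larger pipeline survives only if it buys extra accuracy, which is the sense in which Pareto selection favors compact pipelines without giving up accuracy.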
S2 Open Access 2017
Surgical data science for next-generation interventions

L. Maier-Hein, S. Vedula, S. Speidel et al.

Interventional healthcare will evolve from an artisanal craft based on the individual experiences, preferences and traditions of physicians into a discipline that relies on objective decision-making on the basis of large-scale data from heterogeneous sources.

482 citations en Computer Science, Medicine
S2 Open Access 2018
Data science empowering the public: Data-driven dashboards for transparent and accountable decision-making in smart cities

R. Matheus, M. Janssen, D. Maheshwari

Dashboards visualize a consolidated set of data for a certain purpose, which enables users to see what is happening and to initiate actions. Dashboards can be used by governments to support their decision-making and policy processes or to communicate and interact with the public. The objective of this paper is to understand and to support the design of dashboards for creating transparency and accountability. Two smart city cases are investigated, showing that dashboards can improve transparency and accountability; however, realizing these benefits was cumbersome and encountered various risks and challenges. Challenges include insufficient data quality, lack of understanding of data, poor analysis, wrong interpretation, confusion about the outcomes, and imposing a pre-defined view. These challenges can easily result in misconceptions, wrong decision-making, and a blurred picture, resulting in less transparency and accountability and ultimately in even less trust in the government. Principles guiding the design of dashboards are presented. Dashboards need to be complemented by mechanisms supporting citizens' engagement, data interpretation, governance and institutional arrangements.

321 citations en Computer Science, Business
S2 Open Access 2018
Situating Ecology as a Big-Data Science: Current Advances, Challenges, and Solutions

S. Farley, A. Dawson, S. Goring et al.

Ecology has joined a world of big data. Two complementary frameworks define big data: data that exceed the analytical capacities of individuals or disciplines or the “Four Vs” axes of volume, variety, veracity, and velocity. Variety predominates in ecoinformatics and limits the scalability of ecological science. Volume varies widely. Ecological velocity is low but growing as data throughput and societal needs increase. Ecological big-data systems include in situ and remote sensors, community data resources, biodiversity databases, citizen science, and permanent stations. Technological solutions include the development of open code- and data-sharing platforms, flexible statistical models that can handle heterogeneous data and sources of uncertainty, and cloud-computing delivery of high-velocity computing to large-volume analytics. Cultural solutions include training targeted at the early-career and current scientific workforce and strengthening collaborations among ecologists and data scientists. The broader goal is to maximize the power, scalability, and timeliness of ecological insights and forecasting.

273 citations en Computer Science
DOAJ Open Access 2025
How life-cycle real-world evidence can bridge evidentiary gaps in precision oncology

Emanuel Krebs, Deirdre Weymann et al.

Precision oncology uses omics-based diagnostic technologies to inform histology-agnostic cancer treatment. To date, health system implementation remains limited owing to high uncertainty in regulatory and reimbursement evidence submissions. In this perspective, we describe a life-cycle approach to the evaluation of precision oncology technologies that addresses evidentiary uncertainty and is grounded in real-world evidence (RWE) derived using data routinely collected by healthcare systems. We consider the role for RWE in international regulatory and reimbursement decision-making, review common biases for observational precision oncology evaluations, make specific recommendations for RWE study design and analysis, and specify healthcare system requirements for data collection. We then explore how decision-grade real-world data can support the generation of decision-grade RWE, ultimately enabling real-world life-cycle assessment for precision oncology.

Medicine (General)
DOAJ Open Access 2025
Risk Factors of Hemophagocytic Lymphohistiocytosis in Adults with Fever of Unknown Origin: A Retrospective Study

Tian F, Xie N, Sun W et al.

Fangbing Tian,1 Nana Xie,1 Wenjin Sun,2 Wencong Zhang,1 Wenyuan Zhang,1 Jia Chen,1 Qiurong Ruan,3,* Jianxin Song,1,*

1 Department of Infectious Diseases, Tongji Hospital Affiliated to Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People’s Republic of China; 2 Department of Infectious Diseases, Ezhou Central Hospital, Ezhou, People’s Republic of China; 3 Institute of Pathology, Tongji Hospital Affiliated to Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People’s Republic of China

*These authors contributed equally to this work.

Correspondence: Jianxin Song, Department of Infectious Diseases, Tongji Hospital Affiliated to Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People’s Republic of China, Email songsingsjx@sina.com; Qiurong Ruan, Institute of Pathology, Tongji Hospital Affiliated to Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People’s Republic of China, Email ruanqiurong@sina.com

Purpose: Hemophagocytic lymphohistiocytosis (HLH) is a critical syndrome with a high mortality rate. In clinical practice, some patients with fever of unknown origin (FUO) can develop HLH, further complicating the diagnosis and treatment. However, studies on HLH in adults with FUO are limited. This study aimed to investigate the clinical characteristics of adult patients with FUO to facilitate the early identification of those at high risk of developing HLH.

Patients and Methods: We collected data from hospitalized patients with FUO between January 2014 and December 2020. Risk factors for HLH in adults with FUO were analyzed using univariate and multivariate analysis.

Results: A total of 988 patients with FUO were included in the study. The incidence of HLH in adults with FUO was 6.4%, with hematological tumors being the primary cause. Multivariate analysis indicated that skin rash and elevated alanine aminotransferase, total bilirubin, triglycerides, lactate dehydrogenase, and ferritin levels were independent risk factors for HLH in adults with FUO.

Conclusion: This study revealed the incidence rate, etiology distribution, and risk factors for HLH in adults with FUO. Comprehensive assessment of clinical and laboratory data at admission can assist in the early identification of FUO patients at risk for HLH.

Keywords: Hemophagocytic lymphohistiocytosis, fever of unknown origin, etiology distribution, risk factors

Medicine (General)
DOAJ Open Access 2025
Exploring Scientific Outputs about Globalization: A Conceptual Framework Study

Saleh Rahimi, Faramarz Soheili

Introduction: Bibliometric analysis is widely acknowledged as a robust and systematic approach for examining extensive scholarly literature. It serves as a vital tool for mapping the landscape of contemporary research across various academic fields. The increase in bibliometric studies over the past decade highlights their growing importance in evaluating the evolution and impact of scientific inquiry. Among these methods, co-word analysis emerges as a powerful technique for uncovering conceptual connections between ideas and themes within a discipline. By analyzing term co-occurrences, this approach reveals underlying thematic clusters, prevailing trends, and evolving patterns over time, providing a dynamic perspective for interpreting the intellectual structure of a research domain.

Materials & Methods: This study utilized bibliometric analysis to examine scholarly literature. Data were processed using VOSviewer, UCINet, and BibExcel software. The data were extracted from the Islamic World Science Citation Center (ISC) database using the keywords “globalization” or “globalisation”. Plain text files obtained from the ISC database were imported into BibExcel. Employing natural language processing techniques within this software, key terms (nouns or noun phrases) were extracted. A frequency threshold of 4 was established, meaning a term had to appear at least 4 times in the sample to be included in the bibliometric map. This threshold is recommended to effectively eliminate irrelevant terms. Following several processing steps, a symmetric matrix was created and converted into a correlation matrix. This matrix was then imported into VOSviewer, which assessed the strength of relationships between the remaining terms that met the threshold. The extracted data spanned 25 years (1999–2023) and included 1,281 documents containing 4,502 author keywords. After standardization, 2,169 unique keywords remained. By applying the threshold, a 162×162 matrix was generated with diagonal cell values set to zero. Cluster analysis was conducted using the K-means method in VOSviewer.

Discussion of Results & Conclusion: The terms “globalization”, “Iran”, and “cultural globalization” ranked first to third with frequencies of 703, 54, and 45, respectively. The keyword “globalization”, with 703 occurrences, emerged as the central concept within the research domain. Cluster analysis in VOSviewer identified 11 clusters related to globalization concepts:
1. Globalization and economy
2. Geopolitics of globalization
3. Glocalization
4. Globalization and anti-globalization
5. Globalization and transnationalization
6. Cultural globalization
7. Globalization of education
8. Globalization and national security
9. Globalization and identity
10. Globalization and geoculture
11. Globalization and urban environment

Using UCINet, centrality and density scores were calculated for each cluster, resulting in a strategic diagram. The origin of the diagram was set at the mean centrality (7.14) and mean density (0.541). Notably, Cluster 7 (globalization of education) exhibited the highest centrality (18.857) and density (1.451), indicating strong internal and external conceptual linkages.
First Quadrant (High Density/Centrality): Clusters 6 (cultural globalization) and 7 (globalization of education) represented core themes characterized by high cohesion and centrality, demonstrating extensive interconnections with other aspects of globalization.
Second Quadrant (High Density, Lower Centrality): Clusters 9 (globalization and identity) and 10 (globalization and geoculture) were specialized subfields that exhibited cohesion but had limited influence on broader research trends.
Third Quadrant (Low Density/Centrality): Clusters 2 (geopolitics), 3 (glocalization), 4 (anti-globalization), 5 (transnationalization), 8 (national security), and 11 (urban environment) consisted of emerging or declining topics with underdeveloped connections.
Fourth Quadrant (Low Density, High Potential): Cluster 1 (globalization and economy) showed low centrality but high potential for future growth, reflecting globalization's impact on national and international economies through concepts like economic growth and the KOF Globalization Index.

This study underscored globalization as an interdisciplinary topic that spans foundational concepts and specialized applications. Researchers are encouraged to investigate the emerging areas identified in the third quadrant: geopolitics of globalization, glocalization, anti-globalization, transnationalization, national security, and urban environment. Although currently underdeveloped, these themes hold significant potential for shaping future scholarly discourse.

Social Sciences, Sociology (General)
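
The co-word workflow in this abstract (a frequency threshold, a symmetric co-occurrence matrix with a zero diagonal, cluster-level centrality and density, and a strategic diagram centered on the means) can be sketched in plain Python. This is an illustrative sketch under assumptions, not the BibExcel/VOSviewer/UCINet workflow: the correlation-matrix step and VOSviewer clustering are skipped (clusters are assigned by hand), the keyword lists are invented, and the centrality and density formulas below are one common reading of strategic-diagram measures that may differ from what UCINet computed. Quadrant Q4 here follows the usual convention of high centrality with low density.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def cooccurrence(docs: List[List[str]], min_freq: int) -> Dict[Pair, int]:
    """Count co-occurrences of terms that appear at least `min_freq` times overall.

    Each unordered pair is stored once under sorted keys, so the implied
    term-by-term matrix is symmetric and its diagonal stays zero."""
    freq = Counter(term for doc in docs for term in doc)
    kept = {t for t, n in freq.items() if n >= min_freq}
    pairs: Counter = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc) & kept), 2):
            pairs[(a, b)] += 1
    return dict(pairs)


def centrality_density(pairs: Dict[Pair, int],
                       cluster_of: Dict[str, int]) -> Dict[int, Tuple[float, float]]:
    """Assumed definitions: centrality = total weight of links leaving a cluster;
    density = mean weight of the observed links inside the cluster."""
    external: Counter = Counter()
    internal_sum: Counter = Counter()
    internal_n: Counter = Counter()
    for (a, b), w in pairs.items():
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            internal_sum[ca] += w
            internal_n[ca] += 1
        else:
            external[ca] += w
            external[cb] += w
    return {
        c: (float(external[c]), internal_sum[c] / internal_n[c] if internal_n[c] else 0.0)
        for c in sorted(set(cluster_of.values()))
    }


def quadrant(centrality: float, density: float, mean_c: float, mean_d: float) -> str:
    """Place a cluster relative to the diagram origin (mean centrality, mean density)."""
    if density >= mean_d:
        return "Q1" if centrality >= mean_c else "Q2"  # Q2: cohesive but peripheral
    return "Q4" if centrality >= mean_c else "Q3"      # Q4: central but loosely knit


# Toy author-keyword lists (hypothetical data).
docs = [
    ["globalization", "culture", "identity"],
    ["globalization", "culture", "education"],
    ["globalization", "education", "university"],
    ["globalization", "economy", "growth"],
    ["globalization", "economy", "culture"],
    ["education", "university", "culture"],
]
pairs = cooccurrence(docs, min_freq=2)  # "identity" and "growth" fall below the threshold
cluster_of = {"globalization": 1, "economy": 1, "culture": 2, "education": 3, "university": 3}
scores = centrality_density(pairs, cluster_of)
mean_c = sum(c for c, _ in scores.values()) / len(scores)
mean_d = sum(d for _, d in scores.values()) / len(scores)
for cid, (c, d) in scores.items():
    print(cid, c, d, quadrant(c, d, mean_c, mean_d))
```

With this toy data, cluster 1 lands in Q1, cluster 3 (dense but with fewer outside links) in Q2, and the single-term cluster 2 (no internal links, many external ones) in Q4. Centering the diagram on the means, as the study did, makes every placement relative to the other clusters rather than to an absolute scale.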
DOAJ Open Access 2025
Fusing content and social relationships: a multi-modal heterogeneous graph transformer approach for social bot detection

Jianhong Luo, Chaoqi Jin

Social bots pose a significant threat to online platforms, demanding robust methods to detect their increasingly complex behaviors. This paper introduces MM-HGT-Bot, a multi-modal framework that advances the field by operationalizing social network theory in a new way. Our core contribution is the deconstruction of social ties into two distinct, theoretically grounded dimensions: information source selection (the following network) and potential influence (the follower network). Our architecture employs a Heterogeneous Graph Transformer (HGT) to learn the unique patterns emerging from these different relationship types. It then synergistically fuses these relational insights with context-aware representations of user-generated content. Extensive experiments on the widely used Cresci-15 and Twibot-20 datasets demonstrate that our approach consistently outperforms state-of-the-art baselines. These findings highlight that a more fine-grained and theoretically informed modeling of social relationships is crucial for building effective and robust bot detection systems.

Computer applications to medicine. Medical informatics
DOAJ Open Access 2025
Hospitalisations for physical abuse in infants and children less than 5 years, 2013‒2021: a multinational cohort study using administrative data from five European countries

Catherine Quantin, Jonathan Cottenet, Colleen Chambers et al.

Objectives Child physical abuse (CPA) is a global public health problem associated with lifelong negative consequences, yet reliable epidemiologic data are lacking. We did a multinational cohort study to analyse trends in CPA hospitalisations between 2013 and 2021. Method We used medico-administrative databases to identify children aged one month to five years hospitalised in Denmark, England, France, Ireland, and Wales. Analysing data on more than 12 million hospitalisations, we identified CPA using a validated algorithm based on International Classification of Diseases-10 codes (ICD-10 codes). We calculated the number, proportion, and incidence rate of children hospitalised for CPA, and the number and proportion of total hospitalisations for CPA, by year and age group (<1 and <5). We assessed the distribution of ICD-10 codes used to identify CPA, in each country. Results The pooled incidence rate of infants <1 year hospitalised for CPA was stable over time (around 42/100,000 per year), ranging on average from 33 to 48/100,000 between countries. Average incidence rates for infants were highest in England and lowest in Wales. The pooled proportion of infant CPA hospitalisations was around 0.17% per year (range 0.15–0.21%), increasing significantly during the COVID-19 pandemic in 2020 (0.21%). In children <5, the incidence rate (around 18/100,000 per year) and proportion of CPA hospitalisations (around 0.11% per year, range 0.10–0.14%) were lower than in infants but also increased in 2020 (0.14%). There were national differences in the distribution of ICD-10 codes used to record CPA and differences in year-on-year trends between countries. Conclusions This study is, to our knowledge, the first large-scale analysis examining trends in CPA hospitalisations in more than two European countries. 
We demonstrated that comparing temporal trends in CPA hospitalisations between countries is feasible, implying that hospital data are one of several valuable sources of information for surveillance of CPA.

Demography. Population. Vital events
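
The abstract reports two different measures: an incidence rate (children hospitalised for CPA per 100,000 children per year) and a proportion (CPA hospitalisations as a share of all hospitalisations). A minimal sketch of both follows, assuming the denominators are the age-group population and the total number of hospital stays; pooling by summing counts across countries is also an assumption, since the abstract does not state how the pooled figures were computed. All counts are made up.

```python
from typing import Iterable, Tuple


def rate_per_100k(cases: int, population: int) -> float:
    """Children hospitalised for CPA in a year per 100,000 children of that age."""
    return cases / population * 100_000


def proportion_pct(cpa_stays: int, all_stays: int) -> float:
    """CPA hospitalisations as a percentage of all hospitalisations in the age group."""
    return cpa_stays / all_stays * 100


def pooled_rate_per_100k(by_country: Iterable[Tuple[int, int]]) -> float:
    """Pool (cases, population) pairs by summing counts before dividing (assumed method)."""
    total_cases = total_pop = 0
    for cases, population in by_country:
        total_cases += cases
        total_pop += population
    return rate_per_100k(total_cases, total_pop)


# Made-up counts for two hypothetical countries (infants under 1 year).
country_a = (300, 700_000)
country_b = (99, 300_000)
print(round(rate_per_100k(*country_a), 1))                      # 42.9
print(round(rate_per_100k(*country_b), 1))                      # 33.0
print(round(pooled_rate_per_100k([country_a, country_b]), 1))   # 39.9
print(round(proportion_pct(170, 100_000), 2))                   # 0.17
```

Pooling by summed counts weights the larger country more heavily than averaging the two national rates would (39.9 versus about 37.9 here), which is one reason pooled and per-country figures should be read side by side.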
DOAJ Open Access 2025
A Systematic Review on the Toxicology of European Union-Approved Triazole Fungicides in Cell Lines and Mammalian Models

Constantina-Bianca Vulpe, Adina-Daniela Iachimov-Datcu, Andrijana Pujicic et al.

Triazole fungicides are widely used in agriculture but may pose risks to human health through occupational, accidental, or environmental exposure. This systematic review aimed to evaluate the toxicity of ten European Union-approved triazole fungicides in rodent models and cell lines. A total of 70 studies were included, reporting quantitative in vivo oral, dermal, or inhalation toxicity in mammals or quantitative in vitro cytotoxicity in human or mammalian cell lines; the exclusion criteria comprised publications not in English or not accessible. Literature searches were conducted in Web of Science, Google Scholar, and the Pesticide Properties DataBase (PPDB), and risk of bias in included studies was assessed using ToxRTool. Due to heterogeneity in study designs, reporting formats, and endpoints, data were synthesized descriptively. Quantitative endpoints included LD50/LC50 values for in vivo studies and LOEC, IC50, LC50, and EC50 values for in vitro studies, while mechanistic endpoints highlighted apoptosis, oxidative stress, genotoxicity, and endoplasmic reticulum stress. Difenoconazole and tebuconazole were the most extensively studied compounds, whereas several triazoles had limited data. The limitations included heterogeneity of data and incomplete reporting, which restrict cross-study comparisons. Overall, the findings provide a comprehensive overview of potential human health hazards associated with EU-approved triazole fungicides and highlight critical knowledge gaps. The review was registered in Open Science Framework.

Therapeutics. Pharmacology, Toxicology. Poisons
arXiv Open Access 2025
You Can't Get There From Here: Redefining Information Science to address our sociotechnical futures

Scott Humr, Mustafa Canan

Current definitions of Information Science are inadequate both to describe comprehensively the nature of its field of study and to address the problems arising from intelligent technologies. The ubiquitous rise of artificial intelligence applications and their impact on society demand that the field of Information Science acknowledge the sociotechnical nature of these technologies. Previous definitions of Information Science over the last six decades have inadequately addressed the environmental, human, and social aspects of these technologies. This perspective piece advocates for an expanded definition of Information Science that fully includes the sociotechnical impacts information has on the conduct of research in this field. Proposing an expanded definition of Information Science that includes the sociotechnical aspects of this field should both stimulate conversation and widen the interdisciplinary lens necessary to address how intelligent technologies may be incorporated into society and our lives more fairly.

en cs.CY, cs.AI
arXiv Open Access 2025
RADx Data Hub: A Cloud Platform for FAIR, Harmonized COVID-19 Data

Marcos Martinez-Romero, Matthew Horridge, Nilesh Mistry et al.

The COVID-19 pandemic highlighted the urgent need for robust systems to enable rapid data collection, integration, and analysis for public health responses. Existing approaches often relied on disparate, non-interoperable systems, creating bottlenecks in comprehensive analyses and timely decision-making. To address these challenges, the U.S. National Institutes of Health (NIH) launched the Rapid Acceleration of Diagnostics (RADx) initiative in 2020, with the RADx Data Hub, a centralized repository for de-identified and curated COVID-19 data, as its cornerstone. The RADx Data Hub hosts diverse study data, including clinical data, testing results, smart sensor outputs, self-reported symptoms, and information on social determinants of health. Built on cloud infrastructure, the RADx Data Hub integrates metadata standards, interoperable formats, and ontology-based tools to adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) principles for data sharing. Initially developed for COVID-19 research, its architecture and processes are adaptable to other scientific disciplines. This paper provides an overview of the data hosted by the RADx Data Hub and describes the platform's capabilities and architecture.

en cs.DB
CrossRef Open Access 2024
Data on the Margins – Data from LGBTIQ+ Populations in European Social Science Data Archives

Jonas Recker, Anja Perry

Data gaps are a significant lack of data about marginalized groups existing due to unequal power relations (D’Ignazio and Klein, 2020). They both perpetuate and result in a dominance of male, white, hetero, and cis perspectives in how we make sense of and interact with the world. The most prominent data gap is the gender data gap notably described by Criado-Perez (2020). However, not only women but all marginalized groups are affected by such gaps, as data about them are frequently not collected because those in power disregard the need to do so. LGBTIQ+ people, considered a ‘hidden population’ by demographers, are a case in point. The acronym is used to refer to lesbian, gay, bisexual, trans, intersex, and queer people, as well as all people with non-normative sexual or gender identities, including asexual and agender people, who do not consider themselves as falling under one of these labels. A first step towards identifying and closing data gaps is to take stock of data that already exist. In this paper we give an overview of LGBTIQ+ data in European social science archives. We researched all data archives of CESSDA ERIC, the Consortium of European Social Science Data Archives, and found 66 LGBTIQ+ datasets in 9 of the 34 member and associated archives and 1 former member archive. We discuss characteristics, coverages, and findability of the identified datasets and approach the question of potential data gaps by analyzing the keywords assigned to each dataset by the archive.

2 citations en
CrossRef Open Access 2024
Data? What Data?

Rene Bekkers

Scientific research is based on data. How should researchers treat and document the data they analyze? Research Data Management policies recommend that data should be “as open as possible, and as closed as necessary”. What does that mean in practice? Only publish data you are allowed to publish. The “as open as possible” principle certainly does not mean that researchers should make all data they have public.

DOAJ Open Access 2024
New roles of research data infrastructure in research paradigm evolution

Li Yizhan, Dong Lu, Fan Xiaoxiao et al.

Research data infrastructures form the cornerstone in both cyber and physical spaces, driving the progression of the data-intensive scientific research paradigm. This opinion paper presents an overview of global research data infrastructure, drawing insights from national roadmaps and strategic documents related to research data infrastructure. It emphasizes the pivotal role of research data infrastructures by delineating four new missions aimed at positioning them at the core of the current scientific research and communication ecosystem. The four new missions of research data infrastructures are: (1) as a pioneer, to transcend disciplinary borders and address complex, cutting-edge scientific and social challenges with problem- and data-oriented insights; (2) as an architect, to establish a digital, intelligent, flexible research and knowledge services environment; (3) as a platform, to foster high-end academic communication; (4) as a coordinator, to balance scientific openness with ethical needs.

Information technology, Electronic computers. Computer science
DOAJ Open Access 2024
A Generative Super‐Resolution Model for Enhancing Tropical Cyclone Wind Field Intensity and Resolution

Joseph W. Lockwood, Avantika Gori, Pierre Gentine

Extreme winds associated with tropical cyclones (TCs) can cause significant loss of life and economic damage globally, highlighting the need for accurate, high‐resolution modeling and forecasting for wind. However, due to their coarse horizontal resolution, most global climate and weather models suffer from chronic underprediction of TC wind speeds, limiting their use for impact analysis and energy modeling. In this study, we introduce a cascading deep learning framework designed to downscale high‐resolution TC wind fields given low‐resolution data. Our approach maps 85 TC events from ERA5 data (0.25° resolution) to high‐resolution (0.05° resolution) observations at 6‐hr intervals. The initial component is a debiasing neural network designed to model accurate wind speed observations using ERA5 data. The second component employs a generative super‐resolution strategy based on a conditional denoising diffusion probabilistic model (DDPM) to enhance the spatial resolution and to produce ensemble estimates. The model is able to accurately model intensity and produce realistic radial profiles and fine‐scale spatial structures of wind fields, with a percentage mean bias of −3.74% compared to the high‐resolution observations. Our downscaling framework enables the prediction of high‐resolution wind fields using widely available low‐resolution and intensity wind data, allowing for the modeling of past events and the assessment of future TC risks.

Geophysics. Cosmic physics, Information technology
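
The abstract summarizes intensity error as a percentage mean bias of −3.74% against the high-resolution observations. A minimal sketch of one common definition follows; the formula is an assumption (the abstract does not give it), and the wind speeds are invented. Under this definition a negative value, as reported, means the downscaled winds are on average slightly weaker than observed.

```python
from typing import Sequence


def percent_mean_bias(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Percentage mean bias: 100 * sum(pred - obs) / sum(obs).

    Negative values indicate that predictions are too low on average.
    This is one common definition, assumed here; the paper's exact
    formula is not stated in the abstract."""
    if len(pred) != len(obs) or len(obs) == 0:
        raise ValueError("pred and obs must be non-empty and of equal length")
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("sum of observations must be non-zero")
    return 100.0 * sum(p - o for p, o in zip(pred, obs)) / total_obs


# Made-up wind speeds (m/s) at three grid cells: model output vs. observations.
obs = [40.0, 50.0, 60.0]
pred = [38.0, 48.0, 58.5]
print(round(percent_mean_bias(pred, obs), 2))  # -3.67
```

Because errors of opposite sign cancel in the sum, a small mean bias does not by itself rule out large local errors; it is typically reported alongside spatial or distributional checks such as the radial profiles the abstract mentions.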
DOAJ Open Access 2024
The Impact of Socioeconomic Factors on Kidney Transplantation: A Systematic Review of Low- and Middle-Income Countries

Nguyen Xuong Duong, Minh Sam Thai, Ngoc Sinh Tran et al.

Kidney transplantation (KT) is a preferred treatment for end-stage renal disease (ESRD) because it offers better long-term survival and cost-effectiveness compared to dialysis. Significant global disparities persist in access to KT, particularly in low- and middle-income countries (LMICs). This study aims to assess the epidemiology and outcomes of KT in LMICs while examining the relationship between a country’s income level and its KT prevalence. A systematic review of the literature was conducted, with searches of PubMed, Scopus, and Web of Science from inception to 31 May 2024. Relevant articles reporting on the epidemiology and outcomes of KT or ESRD patients undergoing kidney replacement therapy (KRT) in LMICs were included. A total of 8054 articles were identified, with 972 articles selected for full-text screening after initial title and abstract review. Following full-text screening, 35 articles met the inclusion criteria. The data showed significant variation in KRT and KT prevalence across different geographical locations. Higher-income countries within LMICs tended to have higher KT prevalence rates. Barriers such as inadequate healthcare infrastructure, limited financial resources, and insufficient organ donation frameworks were identified as contributing factors to the low KT rates in these regions. The study highlights the disparities in KT access and prevalence in LMICs, underscoring the need for targeted interventions and international collaboration to address these gaps. Efforts to increase both living and deceased donor transplants, expand health system capacity, and incorporate KT in healthcare planning are needed to close this gap. Global partnerships spearheaded by organizations such as The Transplantation Society (TTS) and the International Society of Nephrology (ISN) are crucial for improving KT rates and outcomes in LMICs.

Diseases of the genitourinary system. Urology

Page 8 of 2,234,627