Optimal hyperdimensional representation for learning and cognitive computation
Prathyush P. Poduval, Hamza Errahmouni Barkam, Xiangjian Liu
et al.
Hyperdimensional Computing (HDC) is a neurally inspired computing paradigm that leverages lightweight, high-dimensional operations to emulate key brain functions. Recent advances in HDC have primarily targeted two domains: learning, where the goal is to extract and generalize patterns for tasks such as classification, and cognitive computation, which requires accurate information retrieval for human-like reasoning. Although state-of-the-art HDC methods achieve strong performance in both areas, they lack a principled account of the fundamentally different requirements imposed by learning versus cognition. In particular, existing works provide limited guidance on designing encoding methods that generate optimal hyperdimensional representations for these distinct tasks. In this study, we propose the first universal hyperdimensional encoding method that dynamically adapts to the needs of both learning and cognitive computation. Our approach is based on neural-symbolic techniques that assign random complex hypervectors to atomic bases (e.g., alphabet definitions) and then apply algebraic operations in the high-dimensional hyperspace to control the correlation structure among encoded data points. Through theoretical analysis, we show that learning tasks benefit from correlated representations to maximize memorization and generalization capacity, whereas cognitive tasks require orthogonal, highly separable representations to enable accurate decoding and reasoning. We further derive a separation metric that quantifies this trade-off and validate it empirically across image classification and decoding tasks. Our results demonstrate that tuning the encoder to increase correlation improves classification accuracy from 65% to 95%, while maximizing separation enhances decoding accuracy from 85% to 100%. These findings provide the first systematic framework for designing hyperdimensional encoders that unify learning and cognition under a single, theoretically grounded representation model.
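As one concrete illustration of the encoding style described above (a minimal sketch of fractional power encoding with random complex phasor hypervectors, not the authors' implementation), a single bandwidth parameter trades off correlated codes, which favor learning, against near-orthogonal codes, which favor decoding:

```python
import numpy as np

def random_phasor(dim, rng):
    """Random complex hypervector with unit-magnitude (phasor) entries."""
    return np.exp(1j * rng.uniform(-np.pi, np.pi, dim))

def encode(value, base, bandwidth=1.0):
    """Fractional binding: base ** (value / bandwidth).

    Large bandwidth -> correlated codes for nearby values (favors learning);
    small bandwidth -> nearly orthogonal, separable codes (favors decoding).
    """
    return base ** (value / bandwidth)

def similarity(a, b):
    return np.real(np.vdot(a, b)) / len(a)

rng = np.random.default_rng(0)
base = random_phasor(10_000, rng)
print(similarity(encode(1.0, base, bandwidth=5.0), encode(1.2, base, bandwidth=5.0)))  # near 1: correlated
print(similarity(encode(1.0, base, bandwidth=0.1), encode(1.2, base, bandwidth=0.1)))  # near 0: separable
```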
Electronic computers. Computer science
From Awareness to Application: Strengthening Recruitment for NSF S-STEM Scholarships in Computer Science
Xiaohui Yuan
Recruiting academically strong students into NSF S-STEM scholarship programs remains a persistent challenge in computer science education. This paper presents the design and initial implementation of a suite of targeted recruitment strategies for our NSF-funded project. Our recruitment strategy leverages multiple channels: information sessions and early outreach efforts were employed to increase awareness and reduce perceived barriers to applying. Data from our recruitment include applicant demographics, academic performance, financial aid profiles, recruitment source tracking, and survey responses on students' awareness and decision-making processes. These data provide a foundation for evaluating the reach and effectiveness of various recruitment strategies and identifying factors that influence student application decisions. Quantitative and qualitative research approaches are employed to examine the implementation and outcomes of proactive recruitment strategies. Our preliminary analysis indicates that direct information sessions and departmental emails are effective recruitment strategies, accounting for a large portion of eligible applications. Our findings emphasize the importance of early communication about the program, clearly defined eligibility criteria, and a streamlined application process. By sharing ongoing progress and lessons learned from our project, this paper contributes evidence-based insights into recruitment practices and offers strategies that can be adapted by other institutions implementing NSF S-STEM programs.
Evaluating accuracy and reproducibility of large language model performance on critical care assessments in pharmacy education
Huibo Yang, Mengxuan Hu, Amoreena Most
et al.
Background: Large language models (LLMs) have demonstrated impressive performance on medical licensing and diagnosis-related exams. However, comparative evaluations to optimize LLM performance and ability in the domain of comprehensive medication management (CMM) are lacking. The purpose of this evaluation was to test various LLM performance-optimization strategies and performance on critical care pharmacotherapy questions used in the assessment of Doctor of Pharmacy students. Methods: In a comparative analysis using 219 multiple-choice pharmacotherapy questions, five LLMs (GPT-3.5, GPT-4, Claude 2, Llama2-7b, and Llama2-13b) were evaluated. Each LLM was queried five times to evaluate the primary outcome of accuracy (i.e., correctness). Secondary outcomes included variance, the impact of prompt engineering techniques (e.g., chain-of-thought, CoT) and training of a customized GPT on performance, and comparison to third-year Doctor of Pharmacy students on knowledge recall vs. knowledge application questions. Accuracy and variance were compared using Student's t-test across different model settings. Results: ChatGPT-4 exhibited the highest accuracy (71.6%), while Llama2-13b had the lowest variance (0.070). All LLMs performed more accurately on knowledge recall vs. knowledge application questions (e.g., ChatGPT-4: 87% vs. 67%). When applied to ChatGPT-4, few-shot CoT across five runs improved accuracy (77.4% vs. 71.5%) with no effect on variance. Self-consistency and the custom-trained GPT demonstrated accuracy similar to that of ChatGPT-4 with few-shot CoT. Overall pharmacy student accuracy was 81%, compared to an optimal overall LLM accuracy of 73%. Comparing question types, six of the LLMs demonstrated equivalent or higher accuracy than pharmacy students on knowledge recall questions (e.g., self-consistency vs. students: 93% vs. 84%), but pharmacy students achieved higher accuracy than all LLMs on knowledge application questions (e.g., self-consistency vs. students: 68% vs. 80%). Conclusion: ChatGPT-4 was the most accurate LLM on critical care pharmacy questions, and few-shot CoT improved accuracy the most. Average student accuracy was similar to that of the LLMs overall, and higher on knowledge application questions. These findings support the need for future assessment of customized training for the type of output needed. Reliance on LLMs is supported only for recall-based questions.
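A minimal sketch of the repeated-query evaluation protocol described above; `query_model` is a hypothetical stand-in for whichever LLM API is called, and the bookkeeping mirrors the primary (accuracy) and secondary (variance) outcomes:

```python
import statistics

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; returns the model's answer."""
    raise NotImplementedError

def evaluate(questions, n_runs=5):
    """questions: list of (prompt, correct_letter) pairs.

    Returns mean accuracy and variance across the repeated runs.
    """
    run_accuracies = []
    for _ in range(n_runs):
        correct = sum(
            query_model(prompt).strip().upper().startswith(answer)
            for prompt, answer in questions)
        run_accuracies.append(correct / len(questions))
    return statistics.mean(run_accuracies), statistics.variance(run_accuracies)

# Few-shot CoT would prepend worked examples with reasoning to each prompt
# before calling evaluate() again; the two settings are then compared
# with a t-test, as in the study.
```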
Electronic computers. Computer science
Precipitation prediction over the upper Indus Basin from large-scale circulation patterns using Gaussian processes
Kenza Tazi, Andrew Orr, J. Scott Hosking
et al.
Water resources from the Indus Basin sustain over 270 million people. However, water security in this region is threatened by climate change. This is especially the case for the upper Indus Basin, where most frozen water reserves are expected to decrease significantly by the end of the century, leaving rainfall as the main driver of river flow. However, future precipitation estimates from global climate models differ greatly for this region. To address this uncertainty, this paper explores the feasibility of using probabilistic machine learning to map large-scale circulation fields, better represented by global climate models, to local precipitation over the upper Indus Basin. More specifically, Gaussian processes are trained to predict monthly ERA5 precipitation data over a 15-year horizon. This paper also explores different Gaussian process model designs, including a non-stationary covariance function to learn complex spatial relationships in the data. Going forward, this approach could be used to make more accurate predictions from global climate model outputs and better assess the probability of future precipitation extremes.
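A minimal sketch of the basic setup, assuming flattened circulation fields as predictors and using scikit-learn's stationary RBF kernel (the paper additionally explores non-stationary covariance functions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# X: (n_months, n_features) large-scale circulation predictors (e.g., flattened
# geopotential/wind fields); y: (n_months,) monthly precipitation at one site.
rng = np.random.default_rng(1)
X = rng.normal(size=(180, 8))            # 15 years of monthly samples (illustrative)
y = 0.7 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=180)

# Anisotropic RBF (one length scale per predictor) plus a noise term.
kernel = 1.0 * RBF(length_scale=np.ones(8)) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X[:150], y[:150])

mean, std = gp.predict(X[150:], return_std=True)   # probabilistic forecasts
```

The predictive standard deviation is what makes the approach suitable for assessing the probability of precipitation extremes rather than returning point estimates alone.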
Environmental sciences, Electronic computers. Computer science
Implementation of personalized customization and enhanced experiences for cultural tourism resources using genetic algorithm-based virtual reality technology
Huiya Xing, Xiangyi Li, Min Liu
et al.
Cultural tourism is important for preserving cultural heritage and giving visitors immersive experiences, but tailoring it to each visitor's needs remains a major problem. This study offers a distinct method of improving cultural tourism by combining Virtual Reality (VR), a Genetic Algorithm (GA), and individual customization. Premature convergence and inadequate population diversity are addressed by the Dynamic Diversity-Enhanced Genetic Algorithm (DDE-GA), a variation of the conventional GA. DDE-GA improves the exploration of possible solutions by dynamically modifying selection pressure according to population diversity, which makes it particularly useful for tackling optimization problems that are complex, multi-modal, and high-dimensional. The objective of the virtual reality technology is to create an immersive environment that enables visitors to experience cultural heritage in a manner entirely tailored to their preferences, interests, and schedules. By adjusting to these individual parameters, the algorithm optimizes tourist itineraries. According to experimental data, the DDE-GA-powered VR system outperforms current methods, with improvements in reaction time (1.1 s), accuracy (98%), precision (97%), and modeling error (0.10). Compared to conventional algorithms, the suggested approach specifically enhances accuracy and drastically lowers error. This approach not only satisfies tourists with individualized experiences but also helps popularize and preserve cultural traditions through the use of modern technology. The research concludes that integrating DDE-GA with VR technology substantially enhances personalized cultural tourism by optimizing routes based on user-specific preferences, yielding notable improvements in accuracy, precision, and response time while minimizing modeling errors, and contributing to both enriching tourist experiences and advancing cultural heritage conservation through innovative technological applications.
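A minimal sketch of the general mechanism, not the authors' DDE-GA: a genetic algorithm whose selection pressure (here, tournament size) adapts to a population-diversity measure, with the adaptation rule assumed for illustration:

```python
import numpy as np

def diversity(pop):
    """Mean pairwise Euclidean distance: a simple population-diversity proxy."""
    diffs = pop[:, None, :] - pop[None, :, :]
    return np.sqrt((diffs ** 2).sum(-1)).mean()

def tournament(pop, fit, k, rng):
    idx = rng.integers(0, len(pop), size=k)
    return pop[idx[np.argmin(fit[idx])]]          # minimization

def diversity_adaptive_ga(f, dim=10, n=60, gens=300, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(n, dim))
    d0 = diversity(pop)
    for _ in range(gens):
        fit = np.apply_along_axis(f, 1, pop)
        # Assumed rule: when diversity collapses relative to the initial
        # population, shrink the tournament (weaker selection pressure)
        # so the search re-explores instead of converging prematurely.
        k = 2 if diversity(pop) < 0.3 * d0 else 4
        children = []
        for _ in range(n):
            a, b = tournament(pop, fit, k, rng), tournament(pop, fit, k, rng)
            child = np.where(rng.random(dim) < 0.5, a, b)   # uniform crossover
            child += rng.normal(scale=0.1, size=dim)        # Gaussian mutation
            children.append(child)
        pop = np.array(children)
    fit = np.apply_along_axis(f, 1, pop)
    return pop[np.argmin(fit)]

best = diversity_adaptive_ga(lambda x: (x ** 2).sum())   # toy test function
```

An itinerary-optimization variant would replace the real-valued genome with a permutation of attractions and the test function with a preference-weighted route cost.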
Information technology, Electronic computers. Computer science
Fix: externalizing network I/O in serverless computing
Yuhan Deng, Akshay Srivatsan, Sebastian Ingino
et al.
We describe a system for serverless computing where users, programs, and the underlying platform share a common representation of a computation: a deterministic procedure, run in an environment of well-specified data or the outputs of other computations. This representation externalizes I/O: data movement over the network is performed exclusively by the platform. Applications can describe the precise data needed at each stage, helping the provider schedule tasks and network transfers to reduce starvation. The design suggests an end-to-end argument for outsourced computing, shifting the service model from ``pay-for-effort'' to ``pay-for-results.''
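A minimal sketch of the representation idea, not Fix's actual API: a computation is a deterministic procedure plus an environment of named references, so its data dependencies are explicit and all network I/O can be left to the platform:

```python
import hashlib
from dataclasses import dataclass

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

@dataclass
class Computation:
    """A deterministic procedure applied to a well-specified environment.

    Inputs are named references: either content hashes of data objects or
    other Computations whose outputs feed this one. User code never performs
    network I/O; the platform resolves references and moves the bytes.
    """
    procedure: str      # content hash of the code to run
    environment: dict   # input name -> content hash or Computation

    def dependencies(self):
        """Exactly the data the platform must materialize before running."""
        return list(self.environment.values())
```

Because the platform sees the precise inputs of every stage, it can schedule tasks and transfers to reduce starvation, and it can charge for produced results rather than for effort expended.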
Computing Linear Regions in Neural Networks with Skip Connections
Johnny Joyce, Jan Verschelde
Neural networks are important tools in machine learning. Representing piecewise linear activation functions with tropical arithmetic enables the application of tropical geometry. Algorithms are presented to compute the regions where a neural network is a linear map. Through computational experiments, we provide insights into the difficulty of training neural networks, in particular the problem of overfitting and the benefits of skip connections.
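For intuition, a sampling-based sketch (not the paper's exact tropical-geometry algorithms): inputs that share a ReLU activation pattern lie in the same linear region, so counting distinct patterns lower-bounds the number of regions:

```python
import numpy as np

def activation_pattern(weights, biases, x):
    """Sign pattern of every ReLU in a feed-forward net. Inputs sharing this
    pattern see the same composition of affine maps, i.e., the same linear map."""
    pattern = []
    h = x
    for W, b in zip(weights, biases):
        pre = W @ h + b
        pattern.append(tuple(pre > 0))
        h = np.maximum(pre, 0)
    return tuple(pattern)

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 2)), rng.normal(size=(8, 8))]
biases = [rng.normal(size=8), rng.normal(size=8)]

samples = rng.uniform(-3, 3, size=(20_000, 2))
regions = {activation_pattern(weights, biases, x) for x in samples}
print(len(regions), "distinct linear regions found by sampling (a lower bound)")
```

A skip connection adds its input back to a layer's output, changing the affine map within each region but not the ReLU pattern bookkeeping above.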
Significance of relative phase features for shouted and normal speech classification
Khomdet Phapatanaburi, Longbiao Wang, Meng Liu
et al.
Shouted and normal speech classification plays an important role in many speech-related applications. Existing works are often based on magnitude-based features and ignore phase-based features, which carry information complementary to the magnitude spectrum. In this paper, the importance of phase-based features is explored for the detection of shouted speech. The novel contributions of this work are as follows. (1) Three phase-based features, namely, relative phase (RP), linear prediction analysis estimated speech-based RP (LPAES-RP), and linear prediction residual-based RP (LPR-RP) features, are explored for shouted and normal speech classification. (2) We propose a new RP feature, called the glottal source-based RP (GRP) feature. The main idea of the proposed GRP feature is to exploit the difference between the RP and LPAES-RP features to detect shouted speech. (3) A score combination of phase- and magnitude-based features is also employed to further improve the classification performance. The proposed feature and combination are evaluated using the shouted normal electroglottograph speech (SNE-Speech) corpus. The experimental findings show that the RP, LPAES-RP, and LPR-RP features provide promising results for the detection of shouted speech. We also find that the proposed GRP feature can provide better results than the standard mel-frequency cepstral coefficient (MFCC) feature. Moreover, compared to using individual features, the score combination of the MFCC and RP/LPAES-RP/LPR-RP/GRP features yields improved detection performance. Performance analysis under noisy environments shows that the score combination of the MFCC and the RP/LPAES-RP/LPR-RP features gives more robust classification. These outcomes show the importance of RP features in distinguishing shouted speech from normal speech.
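A simplified sketch of the relative-phase idea (the zero-base normalization below is an assumption for illustration, not the paper's exact extraction): raw STFT phase depends on where the analysis frame is cut, and RP re-expresses each bin's phase relative to a base frequency to remove that dependence:

```python
import numpy as np

def relative_phase(frame, base_bin=1, n_fft=256):
    """Phase spectrum normalized relative to a base frequency bin.

    Raw phase shifts linearly (in frequency) with the frame's cut position;
    subtracting a frequency-scaled copy of the base bin's phase cancels
    that shift, leaving a position-independent phase feature.
    """
    spec = np.fft.rfft(frame * np.hanning(len(frame)), n_fft)
    phase = np.angle(spec)
    k = np.arange(len(phase))
    rp = phase - (k / base_bin) * phase[base_bin]
    return np.angle(np.exp(1j * rp))    # wrap back into (-pi, pi]

# Per-frame RP vectors stacked over time would then feed the classifier,
# possibly fused with MFCCs at the score level as in the paper.
```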
Acoustics. Sound, Electronic computers. Computer science
Study on the influencing factors of shared-bike connections to rail transit stations on Xiamen Island
RAO Chuankun, LEI Sijing
With the advantages of high efficiency and convenience, shared bikes have rapidly gained popularity across major Chinese cities and have become an important connection mode for rail transit stations; however, problems such as supply-demand imbalance and disorderly parking pose challenges for urban transportation and the environment. Based on multi-source big data covering Xiamen's shared bikes and urban space, this article uses Python and GIS to analyze the spatial and temporal characteristics of shared-bike travel, and explores riding characteristics in station areas together with the effects of land use, the built environment, and other urban factors. The results show that shared-bike trips are concentrated over short spatial and temporal distances, providing important support for connecting traffic at urban metro stations; the connecting use of shared bikes around a station is affected by multiple urban factors and also reflects the stage and character of each station's development. Based on trip characteristics, rail transit stations can be classified into four types, and development strategies should be tailored to each type in order to improve the urban slow-traffic system, promote shared-bike riding, and strengthen the development of rail transit station areas.
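A sketch of the kind of Python analysis described, with assumed column names (end_lat, end_lon, end_time for trips; lat, lon for a station) and a simple haversine catchment standing in for a full GIS buffer:

```python
import numpy as np
import pandas as pd

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * np.arcsin(np.sqrt(a))

def station_arrivals(trips: pd.DataFrame, station, radius_m=300):
    """Hourly profile of shared-bike trips ending within radius_m of a station."""
    d = haversine_m(trips.end_lat, trips.end_lon, station.lat, station.lon)
    near = trips[d <= radius_m]
    return near.groupby(near.end_time.dt.hour).size()
```

Comparing such hourly arrival profiles across stations (e.g., commute peaks versus all-day activity) is one way the four station types could be distinguished.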
Electronic computers. Computer science, Physics
A Hybrid Learning-Architecture for Improved Brain Tumor Recognition
Jose Dixon, Oluwatunmise Akinniyi, Abeer Abdelhamid
et al.
The accurate classification of brain tumors is an important step for early intervention. Artificial intelligence (AI)-based diagnostic systems have been utilized in recent years to help automate the process and provide more objective and faster diagnosis. This work introduces an enhanced AI-based architecture for improved brain tumor classification. We introduce a hybrid architecture that integrates a vision transformer (ViT) and deep neural networks to create an ensemble classifier, resulting in a more robust brain tumor classification framework. The analysis pipeline begins with preprocessing and data normalization, followed by the extraction of three types of information-rich features from the MRIs. These include higher-order texture and structural feature sets that harness the spatial interactions between image intensities, derived using Haralick features and local binary patterns. Additionally, local deeper features of the brain images are extracted using an optimized convolutional neural network (CNN) architecture. Finally, ViT-derived features are also integrated due to their ability to handle dependencies across larger distances while being less sensitive to data augmentation. The extracted features are then weighted, fused, and fed to a machine learning classifier for the final classification of brain MRIs. The proposed weighted ensemble architecture has been evaluated on publicly available and locally collected brain MRIs of four classes using various metrics. Ablation studies showed that leveraging the benefits of the individual components of the proposed architecture leads to improved performance.
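A minimal sketch of the weighted fusion step, with the three feature branches represented by precomputed arrays (hypothetical placeholders for the Haralick/LBP, CNN, and ViT extractors) and an SVM as the final classifier:

```python
import numpy as np
from sklearn.svm import SVC

def fuse(features, weights):
    """Standardize each branch's feature block, weight it, then concatenate."""
    blocks = []
    for f, w in zip(features, weights):
        f = (f - f.mean(0)) / (f.std(0) + 1e-8)
        blocks.append(w * f)
    return np.concatenate(blocks, axis=1)

# texture_f, cnn_f, vit_f: (n_images, d_i) arrays from the three branches.
def train_fused_classifier(texture_f, cnn_f, vit_f, labels,
                           weights=(0.2, 0.4, 0.4)):   # assumed weighting
    X = fuse([texture_f, cnn_f, vit_f], weights)
    return SVC(kernel="rbf").fit(X, labels)
```

An ablation study in this setting amounts to retraining with one branch's weight set to zero and comparing the resulting metrics.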
Industrial engineering. Management engineering, Electronic computers. Computer science
Final Report for CHESS: Cloud, High-Performance Computing, and Edge for Science and Security
Nathan Tallent, Jan Strube, Luanzheng Guo
et al.
Automating the theory-experiment cycle requires effective distributed workflows that utilize a computing continuum spanning lab instruments, edge sensors, computing resources at multiple facilities, data sets distributed across multiple information sources, and potentially the cloud. Unfortunately, the obvious methods for constructing continuum platforms, orchestrating workflow tasks, and curating datasets over time fail to achieve scientific requirements for performance, energy, security, and reliability. Furthermore, achieving the best use of continuum resources depends upon the efficient composition and execution of workflow tasks, i.e., combinations of numerical solvers, data analytics, and machine learning. Pacific Northwest National Laboratory's LDRD "Cloud, High-Performance Computing (HPC), and Edge for Science and Security" (CHESS) has developed a set of interrelated capabilities for enabling distributed scientific workflows and curating datasets. This report describes the results and successes of CHESS from the perspective of open science.
The Existence, Transcendence, and Evolution of the Subject—A Method Based on Subject Information
Zheng Wu
Starting from the modern dilemma of the existence of the subject, this paper transforms information philosophy into an ontological notion of "subject information" and abstracts from it the basic elements of a virtual dimension and a real dimension. Then, through the alternating transformation of virtual-dimension and real-dimension information, the existence and evolution of subject information are explored.
Electronic computers. Computer science
Comparing Measured Agile Software Development Metrics Using an Agile Model-Based Software Engineering Approach versus Scrum Only
Moe Huss, Daniel R. Herber, John M. Borky
This study compares the reliability of estimation, productivity, and defect rate metrics for sprints driven by a specific instance of the agile approach (i.e., Scrum) and an agile model-based software engineering (MBSE) approach called the integrated Scrum Model-Based System Architecture Process (sMBSAP) when developing a software system. The quasi-experimental study conducted ten sprints using each approach. The approaches were then evaluated based on their effectiveness in helping the product development team estimate the backlog items that they could build during a time-boxed sprint and deliver more product backlog items (PBI) with fewer defects. The commitment reliability (CR) was calculated to compare the reliability of estimation, with a measured average Scrum-driven value of 0.81 versus a statistically different average sMBSAP-driven value of 0.94. Similarly, the average sprint velocity (SV) for the Scrum-driven sprints was 26.8 versus 31.8 for the sMBSAP-driven sprints. The average defect density (DD) for the Scrum-driven sprints was 0.91, while that of the sMBSAP-driven sprints was 0.63. The average defect leakage (DL) for the Scrum-driven sprints was 0.20, while that of the sMBSAP-driven sprints was 0.15. The t-test analysis concluded that the sMBSAP-driven sprints were associated with a statistically significant larger mean CR and SV, and a smaller mean DD and DL, than the Scrum-driven sprints. The overall results demonstrate formal quantitative benefits of an agile MBSE approach compared to an agile-only approach, thereby strengthening the case for considering agile MBSE methods within the software development community. Future work might include comparing agile and agile MBSE methods using alternative research designs and further software development objectives, techniques, and metrics.
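Under conventional definitions of these metrics (assumed here; the paper defines them precisely), each reduces to a simple per-sprint ratio, and the comparison is an independent-samples t-test. The numbers below are illustrative values consistent with the reported averages, not the paper's data:

```python
from scipy import stats

def commitment_reliability(committed_points, delivered_points):
    """CR: share of committed story points actually delivered in a sprint."""
    return delivered_points / committed_points

def defect_density(defects_found, delivered_pbis):
    """DD: defects found per delivered product backlog item (assumed unit)."""
    return defects_found / delivered_pbis

def defect_leakage(post_release_defects, total_defects):
    """DL: share of defects that escaped the sprint's own testing."""
    return post_release_defects / total_defects

# Ten Scrum sprints vs. ten sMBSAP sprints (illustrative CR values whose
# means match the reported 0.81 and 0.94):
scrum_cr  = [0.78, 0.85, 0.80, 0.83, 0.79, 0.82, 0.81, 0.80, 0.84, 0.78]
smbsap_cr = [0.93, 0.95, 0.92, 0.96, 0.94, 0.93, 0.95, 0.94, 0.92, 0.96]
t, p = stats.ttest_ind(smbsap_cr, scrum_cr)
print(f"t = {t:.2f}, p = {p:.4f}")
```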
Analytical Solution for Fluid Flow and Heat Transfer in a Three-Dimensional Inclined Horizontal Channel and Under The Influence of Thermal Radiation
Ahmed Jassim, Ahmed Salar
In this paper, an analytical solution to a heat transfer and fluid flow problem is obtained using the quadruple Laplace transform method. The temperature and fluid-flow distributions are presented; both increase as the value of z increases. The effect of the radiation parameter is also shown: the temperature increases as the value of the radiation coefficient increases. MATLAB was used to plot the results.
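As a toy illustration of the method's building block (a double, rather than quadruple, transform, using SymPy): iterated Laplace transforms are applied one variable at a time:

```python
import sympy as sp

t, x = sp.symbols("t x", positive=True)
s, p = sp.symbols("s p", positive=True)
a, b = sp.symbols("a b", positive=True)

f = sp.exp(-a * t) * sp.exp(-b * x)     # separable toy function

# Transform successively in each variable (t -> s, then x -> p); a quadruple
# transform iterates the same step over four variables.
F_t = sp.laplace_transform(f, t, s, noconds=True)
F_tx = sp.laplace_transform(F_t, x, p, noconds=True)
print(sp.simplify(F_tx))                # 1/((s + a)*(p + b))
```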
Mathematics, Electronic computers. Computer science
Prospects for Time-Domain and Multi-Messenger Science with AXIS
The AXIS Time-Domain and Multi-Messenger Science Working Group
et al.
The Advanced X-ray Imaging Satellite (AXIS) promises revolutionary science in the X-ray and multi-messenger time domain. AXIS will leverage excellent spatial resolution (<1.5 arcsec), sensitivity (80x that of Swift), and a large collecting area (5-10x that of Chandra) across a 24-arcmin diameter field of view to discover and characterize a wide range of X-ray transients from supernova-shock breakouts to tidal disruption events to highly variable supermassive black holes. The observatory's ability to localize and monitor faint X-ray sources opens up new opportunities to hunt for counterparts to distant binary neutron star mergers, fast radio bursts, and exotic phenomena like fast X-ray transients. AXIS will offer a response time of <2 hours to community alerts, enabling studies of gravitational wave sources, high-energy neutrino emitters, X-ray binaries, magnetars, and other targets of opportunity. This white paper highlights some of the discovery science that will be driven by AXIS in this burgeoning field of time domain and multi-messenger astrophysics.
astro-ph.HE, astro-ph.IM
Developing a Robust Computable Phenotype Definition Workflow to Describe Health and Disease in Observational Health Research
Jacob S. Zelko, Sarah Gasman, Shenita R. Freeman
et al.
Health informatics can inform decisions that practitioners, patients, policymakers, and researchers need to make about health and disease. Health informatics is built upon patient health data, leading to the need to codify patient health information. Such standardization is required to compute population statistics (such as prevalence, incidence, etc.) that are common metrics used in fields such as epidemiology. Reliable decision-making about health and disease rests on our ability to organize, analyze, and assess data repositories that contain patient health data. While standards exist to structure and analyze patient data across patient data sources such as health information exchanges, clinical data repositories, and health data marketplaces, analogous best practices for rigorously defining patient populations in health informatics contexts do not exist. Codifying best practices for developing disease definitions could support the effective development of clinical guidelines, inform algorithms used in clinical decision support systems, and inform additional patient guidelines. In this paper, we present a workflow for the development of phenotype definitions. This workflow presents a series of recommendations for defining health and disease. Various examples within this paper are presented to demonstrate this workflow in health informatics contexts.
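A minimal sketch of the population statistics such a computable phenotype feeds, assuming a patient table with a boolean has_condition column produced by applying the phenotype definition:

```python
import pandas as pd

def prevalence(cohort: pd.DataFrame) -> float:
    """Share of the population meeting the phenotype definition.

    cohort: one row per patient, with a boolean `has_condition` column
    (the output of a computable phenotype definition).
    """
    return cohort["has_condition"].mean()

def incidence_rate(new_cases: int, person_years: float) -> float:
    """New cases per 1,000 person-years of follow-up."""
    return 1000 * new_cases / person_years
```

The point of a rigorous phenotype workflow is that `has_condition` means the same thing across sites and data sources, so these statistics become comparable.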
Language Cognition and Language Computation -- Human and Machine Language Understanding
Shaonan Wang, Nai Ding, Nan Lin
et al.
Language understanding is a key scientific issue in the fields of cognitive science and computer science. However, the two disciplines differ substantially in their specific research questions. Cognitive science focuses on analyzing the specific mechanisms of the brain and investigating the brain's response to language; few studies have examined the brain's language system as a whole. By contrast, computer scientists focus on the efficiency of practical applications when choosing research questions but may ignore the most essential laws of language. Given these differences, can a combination of the disciplines offer new insights for building intelligent language models and studying language cognitive mechanisms? In the following text, we first review the research questions, history, and methods of language understanding in cognitive and computer science, focusing on current progress and challenges. We then compare and contrast the research on language understanding in the cognitive and computer sciences. Finally, we review existing work that combines insights from language cognition and language computation and offer prospects for future development trends.
A Blockchain Application Prototype for the Internet of Things
Mansour Mededjel, Ghalem Belalem, Fatima Zohra Nesrine Benadda
et al.
The emergence of the Internet of Things (IoT), associated with the explosion in the number of connected objects and the growth in user needs, makes the Internet network very complex. IoT objects are diverse and heterogeneous, which requires establishing interoperability and efficient identity management on the one hand. On the other hand, centralized architectures such as cloud-based ones can suffer from overhead and high latency, with a potential risk of failure. Facing these challenges, Blockchain technology, with its decentralized architecture based on a distributed peer-to-peer network, offers a new infrastructure that allows IoT objects to interact reliably and securely. In this paper, a new approach is proposed with a three-layer architecture: a sensing and data-collection layer made up of the IoT network, a processing and storage layer that records data exchanges at the Blockchain level, and an access and visualization layer exposed via a web interface. The prototype implemented in this study allows all transactions (data exchanges) generated by IoT devices to be recorded and stored on a dedicated Blockchain, ensuring the security of IoT objects' communications. This prototype also enables access to and visualization of all data and information, thus enhancing the IoT network's transparency.
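A minimal sketch of the middle layer's core idea (a hash-chained ledger of IoT data exchanges), not the prototype's actual stack:

```python
import hashlib
import json
import time

class Block:
    def __init__(self, transactions, prev_hash):
        self.timestamp = time.time()
        self.transactions = transactions       # IoT data exchanges in this block
        self.prev_hash = prev_hash
        self.hash = self.compute_hash()

    def compute_hash(self):
        payload = json.dumps(
            {"t": self.timestamp, "tx": self.transactions, "prev": self.prev_hash},
            sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

chain = [Block([], prev_hash="0" * 64)]                        # genesis block
chain.append(Block([{"device": "sensor-42", "temp": 21.5}], chain[-1].hash))

# Any tampering with a recorded exchange breaks the hash chain:
assert chain[1].prev_hash == chain[0].hash
```

In the full prototype, this ledger sits between the IoT sensing layer below and the web visualization layer above, which reads blocks rather than querying devices directly.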
ILA4: Overcoming missing values in machine learning datasets – An inductive learning approach
Ammar Elhassan, Saleh M. Abu-Soud, Firas Alghanim
et al.
This article introduces ILA4, a new algorithm designed to handle datasets with missing values. ILA4 is inspired by a series of ILA algorithms that also handle missing data, and adds further enhancements. ILA4 is applied to datasets with varying completeness and compared to other known approaches for handling datasets with missing values. In the majority of cases, ILA4 produced favorable performance on a par with many established approaches for treating missing values, including algorithms based on the Most Common Value (MCV), the Most Common Value Restricted to a Concept (MCVRC), and those that utilize the Delete strategy. ILA4 was also compared with three well-known algorithms, namely Logistic Regression, Naïve Bayes, and Random Forest; the accuracy obtained by ILA4 is comparable to or better than the best results obtained from these three algorithms.
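A minimal sketch of two of the baseline strategies ILA4 is compared against (the Delete strategy and MCV imputation), using pandas and scikit-learn; ILA4 itself is a rule-induction algorithm and is not reproduced here:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"f1": [1.0, np.nan, 3.0, 1.0],
                   "f2": ["a", "b", np.nan, "b"]})

# Delete strategy: drop every instance containing a missing value.
deleted = df.dropna()

# MCV strategy: replace missing entries with each feature's most common value.
mcv = pd.DataFrame(
    SimpleImputer(strategy="most_frequent").fit_transform(df),
    columns=df.columns)

# MCVRC would instead compute the most common value within each class
# (concept), requiring the labels during imputation.
```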
Electronic computers. Computer science
A Computational Inflection for Scientific Discovery
Tom Hope, Doug Downey, Oren Etzioni
et al.
We stand at the foot of a significant inflection in the trajectory of scientific discovery. As society continues on its fast-paced digital transformation, so does humankind's collective scientific knowledge and discourse. We now read and write papers in digitized form, and a great deal of the formal and informal processes of science are captured digitally -- including papers, preprints and books, code and datasets, conference presentations, and interactions in social networks and collaboration and communication platforms. The transition has led to the creation and growth of a tremendous amount of information -- much of which is available for public access -- opening exciting opportunities for computational models and systems that analyze and harness it. In parallel, exponential growth in data processing power has fueled remarkable advances in artificial intelligence, including large neural language models capable of learning powerful representations from unstructured text. Dramatic changes in scientific communication -- such as the advent of the first scientific journal in the 17th century -- have historically catalyzed revolutions in scientific thought. The confluence of societal and computational trends suggests that computer science is poised to ignite a revolution in the scientific process itself.