Ming Wang, Zheng Yan, Ting Wang et al.
Results for "Cybernetics"
Showing 20 of ~134,499 results · from arXiv, DOAJ, Semantic Scholar, CrossRef
Hongkun Jin, Hongcheng Jiang, Zejun Zhang et al.
Transformer-based methods have demonstrated strong potential in hyperspectral pansharpening by modeling long-range dependencies. However, their effectiveness is often limited by redundant token representations and a lack of multi-scale feature modeling. Hyperspectral images exhibit intrinsic spectral priors (e.g., abundance sparsity) and spatial priors (e.g., non-local similarity), which are critical for accurate reconstruction. From a spectral-spatial perspective, Vision Transformers (ViTs) face two major limitations: they struggle to preserve high-frequency components--such as material edges and texture transitions--and suffer from attention dispersion across redundant tokens. These issues stem from the global self-attention mechanism, which tends to dilute high-frequency signals and overlook localized details. To address these challenges, we propose the Token-wise High-frequency Augmentation Transformer (THAT), a novel framework designed to enhance hyperspectral pansharpening through improved high-frequency feature representation and token selection. Specifically, THAT introduces: (1) Pivotal Token Selective Attention (PTSA) to prioritize informative tokens and suppress redundancy; (2) a Multi-level Variance-aware Feed-forward Network (MVFN) to enhance high-frequency detail learning. Experiments on standard benchmarks show that THAT achieves state-of-the-art performance with improved reconstruction quality and efficiency. The source code is available at https://github.com/kailuo93/THAT.
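The internals of PTSA are not specified in this abstract. As a rough illustration of the general idea behind token-selective attention, the sketch below keeps only the top-k highest-scoring key tokens per query and masks out the rest; it is a plain NumPy, single-head toy without learned projections, all of which are simplifying assumptions rather than the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topk_token_attention(q, k, v, keep=4):
    """Self-attention that, for each query, keeps only the `keep`
    highest-scoring key tokens and zeroes out the rest.
    q, k, v: (num_tokens, dim) arrays."""
    scores = q @ k.T / np.sqrt(q.shape[-1])                  # (N, N)
    # indices of the top-`keep` keys for each query row
    kept = np.argpartition(scores, -keep, axis=-1)[:, -keep:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, kept, 0.0, axis=-1)
    attn = softmax(scores + mask, axis=-1)                   # redundant tokens get weight 0
    return attn @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
out = topk_token_attention(x, x, x, keep=4)
print(out.shape)  # (8, 16)
```

The masking step is what concentrates attention mass on a few "pivotal" tokens instead of dispersing it across all of them, which is the failure mode the abstract attributes to vanilla global self-attention.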
Hailong Zhang, Yinfeng Yu, Liejun Wang et al.
Audio-visual navigation represents a significant area of research in which intelligent agents utilize egocentric visual and auditory perceptions to identify audio targets. Conventional navigation methodologies typically adopt a staged modular design, which involves first executing feature fusion, then utilizing Gated Recurrent Unit (GRU) modules for sequence modeling, and finally making decisions through reinforcement learning. While this modular approach has demonstrated effectiveness, it may also lead to redundant information processing and inconsistencies in information transmission between the various modules during the feature fusion and GRU sequence modeling phases. This paper presents IRCAM-AVN (Iterative Residual Cross-Attention Mechanism for Audiovisual Navigation), an end-to-end framework that integrates multimodal information fusion and sequence modeling within a unified IRCAM module, thereby replacing the traditional separate components for fusion and GRU. This innovative mechanism employs a multi-level residual design that concatenates initial multimodal sequences with processed information sequences. This methodological shift progressively optimizes the feature extraction process while reducing model bias and enhancing the model's stability and generalization capabilities. Empirical results indicate that intelligent agents employing the iterative residual cross-attention mechanism exhibit superior navigation performance.
Obaidullah Zaland, Chanh Nguyen, Florian T. Pokorny et al.
Federated Learning (FL) is an emerging distributed machine learning paradigm, where the collaborative training of a model involves dynamic participation of devices to achieve broad objectives. Classical machine learning (ML) typically requires data to be located on-premises for training, whereas FL leverages numerous user devices to train a shared global model without the need to share private data. Current robotic manipulation tasks are constrained by the individual capabilities and speed of robots due to limited low-latency computing resources. Consequently, the concept of cloud robotics has emerged, allowing robotic applications to harness the flexibility and reliability of computing resources, effectively alleviating their computational demands across the cloud-edge continuum. Within this distributed computing context, as exemplified in cloud robotic manipulation scenarios, FL offers manifold advantages while also presenting several challenges and opportunities. In this paper, we present fundamental concepts of FL and their connection to cloud robotic manipulation. Additionally, we envision the opportunities and challenges associated with realizing efficient and reliable cloud robotic manipulation at scale through FL, which researchers can adopt to design and verify FL models in either centralized or decentralized settings.
Simona-Vasilica Oprea, Adela Bâra
In this work, the utility of multimodal vision–language models (VLMs) for visual product understanding in e-commerce is investigated, focusing on two complementary models: ColQwen2 (<i>vidore/colqwen2-v1.0</i>) and ColPali (<i>vidore/colpali-v1.2-hf</i>). These models are integrated into two architectures and evaluated across various product interpretation tasks, including image-grounded question answering, brand recognition and visual retrieval based on natural language prompts. ColQwen2, built on the Qwen2-VL backbone with LoRA-based adapter hot-swapping, demonstrates strong performance, allowing end-to-end image querying and text response synthesis. It excels at identifying attributes such as brand, color or usage based solely on product images and responds fluently to user questions. In contrast, ColPali, which utilizes the PaliGemma backbone, is optimized for explainability. It delivers detailed visual-token alignment maps that reveal how specific regions of an image contribute to retrieval decisions, offering transparency ideal for diagnostics or educational applications. Through comparative experiments using footwear imagery, it is demonstrated that ColQwen2 is highly effective in generating accurate responses to product-related questions, while ColPali provides fine-grained visual explanations that reinforce trust and model accountability.
Ali Kavoosi, Morgan P. Mitchell, Raveen Kariyawasam et al.
Sleep Stage Classification (SSC) is a labor-intensive task, requiring experts to examine hours of electrophysiological recordings for manual classification. This is a limiting factor when it comes to leveraging sleep stages for therapeutic purposes. With the increasing affordability and expansion of wearable devices, automating SSC may enable deployment of sleep-based therapies at scale. Deep learning has gained increasing attention as a potential method to automate this process. Previous research has shown accuracy comparable to manual expert scores. However, previous approaches require a sizable amount of memory and computational resources. This constrains the ability to classify in real time and deploy models on the edge. To address this gap, we aim to provide a model capable of predicting sleep stages in real time, without requiring access to external computational sources (e.g., mobile phone, cloud). The algorithm is power efficient to enable use on embedded battery-powered systems. Our compact sleep stage classifier can be deployed on most off-the-shelf microcontrollers (MCUs) with constrained hardware settings, owing to its small memory footprint and significantly reduced number of operations. The model was tested on three publicly available databases and achieved performance comparable to the state of the art, whilst reducing model complexity by orders of magnitude (up to 280 times smaller than the state of the art). We further optimized the model by quantizing its parameters to 8 bits, with only an average drop of 0.95% in accuracy. When implemented in firmware, the quantized model achieves a latency of 1.6 seconds on an Arm Cortex-M4 processor, allowing its use for online SSC-based therapies.
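The abstract reports quantizing parameters to 8 bits with only a 0.95% average accuracy drop, but does not state the scheme used. The sketch below illustrates one common choice, symmetric per-tensor int8 quantization, in plain NumPy; the weight tensor is synthetic and the scheme is an assumption, not the paper's method.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float weights to int8:
    each weight is rounded to the nearest multiple of one shared scale."""
    m = np.abs(w).max()
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=(64, 64)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()
print(q.dtype, err <= s / 2 + 1e-8)  # int8 True: rounding error is at most half a step
```

Storing int8 values plus one float scale per tensor is what shrinks the memory footprint roughly 4x versus float32, which is the property that matters for MCU deployment.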
Jan Kijonka, Jan Kijonka, Petr Vavra et al.
Introduction: This study proposes an algorithm for preprocessing VCG records to obtain a representative QRS loop. Methods: The proposed algorithm uses the following methods: digital filtering to remove noise from the signal, wavelet-based detection of ECG fiducial points and isoelectric PQ intervals, spatial alignment of QRS loops, QRS time synchronization using root mean square error minimization, and ectopic QRS elimination. The representative QRS loop is calculated as the average of all QRS loops in the VCG record. The algorithm is evaluated on 161 VCG records from a database of 58 healthy control subjects, 69 patients with myocardial infarction, and 34 patients with bundle branch block. The morphologic intra-individual beat-to-beat variability rate is calculated for each VCG record. Results and Discussion: The maximum relative deviation is 12.2% for healthy control subjects, 19.3% for patients with myocardial infarction, and 17.2% for patients with bundle branch block. The performance of the algorithm is assessed by measuring the morphologic variability before and after QRS time synchronization and ectopic QRS elimination. The variability is reduced by a factor of 0.36 for healthy control subjects, 0.38 for patients with myocardial infarction, and 0.41 for patients with bundle branch block. The proposed algorithm can be used to generate a representative QRS loop for each VCG record. This representative QRS loop can be used to visualize, compare, and further process VCG records for automatic VCG record classification.
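As a simplified 1-D illustration of the RMSE-based time synchronization and averaging steps described above (the actual algorithm works on 3-D VCG loops and also includes filtering, fiducial-point detection, and ectopic-beat elimination), the sketch below aligns each beat to the first one by the integer shift that minimizes RMSE and then averages; the synthetic pulse and shift range are invented for the example.

```python
import numpy as np

def best_shift(ref, beat, max_shift=10):
    """Integer shift of `beat` (within ±max_shift samples) that
    minimizes RMSE against `ref`; both are 1-D arrays of equal length."""
    best, best_rmse = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        rmse = np.sqrt(np.mean((ref - np.roll(beat, s)) ** 2))
        if rmse < best_rmse:
            best, best_rmse = s, rmse
    return best

def representative_beat(beats, max_shift=10):
    """Time-synchronize every beat to the first one, then average."""
    ref = beats[0]
    aligned = [np.roll(b, best_shift(ref, b, max_shift)) for b in beats]
    return np.mean(aligned, axis=0)

t = np.linspace(0, 1, 200)
template = np.exp(-((t - 0.5) ** 2) / 0.002)   # synthetic QRS-like pulse
beats = np.stack([np.roll(template, s) for s in (-3, 0, 2, 5)])
rep = representative_beat(beats)
print(np.allclose(rep, beats[0]))  # True: every beat realigns exactly to the first
```

Averaging after synchronization is what suppresses beat-to-beat timing jitter; without the shift search, the mean would smear the QRS complex.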
Esther Taiwo, Ahmed Akinsola, Edward Tella et al.
This study focuses on the ethics of Artificial Intelligence and its application in the United States. The paper highlights the impact AI has on every sector of the US economy and multiple facets of the technological space, and the resultant effect on entities spanning businesses, government, academia, and civil society. There is a need for ethical considerations as these entities begin to depend on AI for delivering various crucial tasks, which immensely influence their operations, decision-making, and interactions with each other. The adoption of ethical principles, guidelines, and standards of work is therefore required throughout the entire process of AI development, deployment, and usage to ensure responsible and ethical AI practices. Our discussion explores eleven fundamental 'ethical principles' structured as overarching themes. These encompass Transparency, Justice, Fairness, Equity, Non-Maleficence, Responsibility, Accountability, Privacy, Beneficence, Freedom, Autonomy, Trust, Dignity, Sustainability, and Solidarity. These principles collectively serve as a guiding framework, directing the ethical path for the responsible development, deployment, and utilization of artificial intelligence (AI) technologies across diverse sectors and entities within the United States. The paper also discusses the revolutionary impact of AI applications, such as Machine Learning, and explores various approaches used to implement AI ethics. This examination is crucial to address the growing concerns surrounding the inherent risks associated with the widespread use of artificial intelligence.
Stepan Perminov, Ivan Kalinov, Dzmitry Tsetserukou
Numerous mobile robots with mounted Ultraviolet-C (UV-C) lamps have been developed recently, yet they cannot work in the same space as humans without irradiating them with UV-C. This paper proposes a novel modular and scalable Human-Aware Genetic-based Coverage Path Planning algorithm (GHACPP), which aims to solve the problem of disinfecting unknown environments by UV-C irradiation while preventing harm to human eyes and skin. The proposed genetic-based algorithm alternates between the stages of exploring a new area, generating parts of the resulting disinfection trajectory, called mini-trajectories, and updating the current state around the robot. The system's effectiveness and human safety are validated and compared with one of the latest state-of-the-art online coverage path planning algorithms, SimExCoverage-STC. The experimental results confirmed both the high level of safety for humans and the efficiency of the developed algorithm in terms of reductions in path length (by 37.1%), number (39.5%) and size (35.2%) of turns, and time (7.6%) to complete the disinfection task, with a small loss in the percentage of area covered (0.6%), in comparison with the state-of-the-art approach.
Mawloud Mosbah
The query, which expresses the user's need and requirements, plays an important role in an information retrieval system for achieving a high-accuracy search. In this paper, we present an overview of the different refinement operations that a query may undergo in order to enhance the performance of an information retrieval system, such as automatic query formulation through word prediction, query reformulation, query expansion, and query optimization.
Muhammad Ghifari Ridwan, Thomas Altmann, Ahmed Yousry et al.
Efficient and reliable desalination through seawater reverse osmosis (SWRO) mandates optimized pre-treatment strategies to minimize organic and inorganic fouling. Coagulation, the process of agglomerating colloidal particles using chemical coagulants, in combination with media filtration to reduce colloidal fouling on reverse osmosis membranes, is commonly used in seawater pretreatment. Due to its inherent complexity and the absence of physical models to quantify the efficiency of coagulation, overdosing of coagulants is ubiquitously observed to maintain filtered water quality. To address this problem, we use artificial neural networks (ANNs) to optimize coagulant dosing by predicting the silt density index (SDI) after chemical dosing. The model is developed using large-scale plant data comprising different seawater physical parameters and plant operational data, including pH, SDI, turbidity, coagulant dosing rate, and flocculant dosing rate. Using feature engineering, feature selection, and our domain knowledge, new input parameters are derived, irrelevant parameters are eliminated, and these are used as inputs to train the model. The developed ANN model achieved a prediction accuracy of 95% and outperforms other machine learning methods; upon industrialization, it reduced annual coagulant consumption by 11.7% when implemented in a commercial SWRO plant producing 216,000 m³/day of desalinated water.
A.G. Nalimov, V.V. Kotlyar
A combined high-aperture metalens in a thin silicon nitride film, consisting of two inclined sector metalenses, is considered. Each sector metalens consists of a set of binary subwavelength gratings. The diameter of the lens is 14 μm. It has been shown using the finite-difference time-domain method that the metalens can simultaneously detect optical vortices with two topological charges, –1 and –2, in almost the entire visible wavelength range. The metalens can distinguish several wavelengths that are focused at different points in the focal plane: a 1 nm change in wavelength results in a focal spot shift of about 4 nm. When the metalens is illuminated by a Gaussian beam with left-handed circular polarization, two optical vortices with topological charges 1 and 2 are simultaneously formed, 6 nm apart, at a focal distance of 6 nm. This metalens can be used to increase the information capacity of transmission channels in wireless telecommunication systems by selecting the space-time modes of laser radiation with different topological charges and different wavelengths. The considered metalens is an example of a compact demultiplexer.
Xiaoyi Gu, Hongliang Ren
Robot-assisted technologies are being investigated to overcome the limitations of current solutions for transoral surgeries, which suffer from constrained insertion ports, lengthy and indirect passageways, and narrow anatomical structures. This paper reviews distal dexterity mechanisms, variable stiffness mechanisms, and triangulation mechanisms, which are closely related to the specific technical challenges of transoral robotic surgery (TORS). According to their structural features for moving and orienting end effectors, distal dexterity designs can be classified into four categories: serial, continuum, parallel, and hybrid mechanisms. To ensure adequate adaptability, conformability, and safety, surgical robots must have high flexibility, which can be achieved by varying their stiffness. Variable stiffness (VS) mechanisms in TORS can be classified by working principle into phase-transition-based, jamming-based, and structure-based VS mechanisms. Triangulation aims to obtain enough workspace and create adequate traction and counter-traction for various operations, including visualization, retraction, dissection, and suturing, with independently controllable manipulators. The merits and demerits of these designs are discussed to provide a reference for developing new surgical robotic systems (SRSs) capable of overcoming the limitations of existing systems and addressing the challenges imposed by TORS procedures.
Gongyang Li, Zhi Liu, Dan Zeng et al.
Salient object detection (SOD) in optical remote sensing images (RSIs), or RSI-SOD, is an emerging topic in understanding optical RSIs. However, due to the difference between optical RSIs and natural scene images (NSIs), directly applying NSI-SOD methods to optical RSIs fails to achieve satisfactory results. In this paper, we propose a novel Adjacent Context Coordination Network (ACCoNet) to explore the coordination of adjacent features in an encoder-decoder architecture for RSI-SOD. Specifically, ACCoNet consists of three parts: an encoder, Adjacent Context Coordination Modules (ACCoMs), and a decoder. As the key component of ACCoNet, ACCoM activates the salient regions of output features of the encoder and transmits them to the decoder. ACCoM contains a local branch and two adjacent branches to coordinate the multi-level features simultaneously. The local branch highlights the salient regions in an adaptive way, while the adjacent branches introduce global information of adjacent levels to enhance salient regions. Additionally, to extend the capabilities of the classic decoder block (i.e., several cascaded convolutional layers), we extend it with two bifurcations and propose a Bifurcation-Aggregation Block to capture the contextual information in the decoder. Extensive experiments on two benchmark datasets demonstrate that the proposed ACCoNet outperforms 22 state-of-the-art methods under nine evaluation metrics, and runs up to 81 fps on a single NVIDIA Titan X GPU. The code and results of our method are available at https://github.com/MathLee/ACCoNet.
Yujin WU, Mohamed Daoudi, Ali Amad et al.
Existing multimodal stress/pain recognition approaches generally extract features from different modalities independently and thus ignore cross-modality correlations. This paper proposes a novel geometric framework for multimodal stress/pain detection utilizing Symmetric Positive Definite (SPD) matrices as a representation that incorporates the correlation relationship of physiological and behavioural signals from covariance and cross-covariance. Given the non-linearity of the Riemannian manifold of SPD matrices, well-known machine learning techniques are not suited to classifying these matrices. Therefore, a tangent space mapping method is adopted to map the derived SPD matrix sequences to vector sequences in the tangent space, where an LSTM-based network can be applied for classification. The proposed framework has been evaluated on two public multimodal datasets, achieving state-of-the-art results on both stress and pain detection tasks.
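A minimal sketch of the tangent-space mapping for SPD matrices described above, using an eigendecomposition-based matrix logarithm in plain NumPy. The reference point and test matrix are arbitrary, and the √2 weighting of off-diagonal terms used in some formulations, as well as the downstream LSTM classifier, are omitted for brevity.

```python
import numpy as np

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric positive definite matrix
    through its eigendecomposition: f(S) = V f(w) V^T."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T

def tangent_map(P, C):
    """Map SPD matrix C to the tangent space at reference point P:
    S = logm(P^{-1/2} C P^{-1/2}), then vectorize the upper triangle."""
    P_inv_sqrt = sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    S = sym_fun(P_inv_sqrt @ C @ P_inv_sqrt, np.log)
    iu = np.triu_indices(S.shape[0])
    return S[iu]  # flat Euclidean vector usable by ordinary classifiers

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
C = A @ A.T + 4 * np.eye(4)   # a random SPD matrix
P = np.eye(4)                 # reference point (identity, for simplicity)
v = tangent_map(P, C)
print(v.shape)  # (10,)
```

The point of the mapping is that the image vectors live in a flat space, so sequence models that assume Euclidean inputs (such as an LSTM) can be applied without violating the manifold geometry.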
Toby St Clere Smithe
We extend our earlier work on the compositional structure of cybernetic systems in order to account for the embodiment of such systems. All their interactions proceed through their bodies' boundaries: sensations impinge on their surfaces, and actions correspond to changes in their configurations. We formalize this morphological perspective using polynomial functors. The 'internal universes' of systems are shown to constitute an indexed category of statistical games over polynomials; their dynamics form an indexed category of behaviours. We characterize 'active inference doctrines' as indexed functors between such categories, resolving a number of open problems in our earlier work, and pointing to a formalization of the 'free energy principle' as adjoint to such doctrines. We illustrate our framework through fundamental examples from biology, including homeostasis, morphogenesis, and autopoiesis, and suggest a formal connection between spatial navigation and the process of proof.
Fernando Alonso-Fernandez, Julian Fierrez, Daniel Ramos et al.
As biometric technology is increasingly deployed, it will be common to replace parts of operational systems with newer designs. The cost and inconvenience of reacquiring enrolled users when a new vendor solution is incorporated makes this approach difficult, and many applications will regularly need to deal with information from different sources. These interoperability problems can dramatically affect the performance of biometric systems and thus need to be overcome. Here, we describe and evaluate the ATVS-UAM fusion approach submitted to the quality-based evaluation of the 2007 BioSecure Multimodal Evaluation Campaign, whose aim was to compare fusion algorithms when biometric signals were generated using several biometric devices in mismatched conditions. Quality measures from the raw biometric data are available to allow system adjustment to changing quality conditions due to device changes. This system adjustment is referred to as quality-based conditional processing. The proposed fusion approach is based on linear logistic regression, in which fused scores tend to be log-likelihood ratios. This allows the easy and efficient combination of matching scores from different devices, assuming low dependence among modalities. In our system, quality information is used to switch between different system modules depending on the data source (the sensor, in our case) and to reject channels with low-quality data during the fusion. We compare our fusion approach to a set of rule-based fusion schemes over normalized scores. Results show that the proposed approach outperforms all the rule-based fusion schemes. We also show that with the quality-based channel rejection scheme, an overall improvement of 25% in the equal error rate is obtained.
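A minimal sketch of linear logistic regression score fusion of the kind described above, where the fused linear score behaves as a calibrated log-likelihood ratio. The quality-based module switching and channel rejection are omitted, and the two-modality genuine/impostor score distributions are synthetic.

```python
import numpy as np

def train_llr_fusion(scores, labels, lr=0.1, steps=2000):
    """Linear logistic regression over per-modality match scores.
    scores: (n_samples, n_modalities); labels: 1 genuine / 0 impostor.
    Returns weights w and bias b; the fused score w·s + b acts as a
    calibrated log-likelihood ratio (threshold at 0)."""
    n, d = scores.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(scores @ w + b)))   # sigmoid
        w -= lr * (scores.T @ (p - labels)) / n        # gradient steps on
        b -= lr * np.mean(p - labels)                  # the cross-entropy loss
    return w, b

rng = np.random.default_rng(3)
genuine = rng.normal(loc=[2.0, 1.5], scale=1.0, size=(200, 2))
impostor = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
X = np.vstack([genuine, impostor])
y = np.concatenate([np.ones(200), np.zeros(200)])
w, b = train_llr_fusion(X, y)
fused = X @ w + b
acc = np.mean((fused > 0) == y)
print(acc > 0.8)  # True: well-separated score distributions fuse cleanly
```

Because the fused score is a log-likelihood ratio, scores from heterogeneous devices land on a common, interpretable scale, which is what makes swapping sensors without re-enrollment tractable.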
Page 18 of 6725