X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth. Inspired by feature selection methods in machine learning, a simple stepwise network expansion approach is employed that expands a single axis in each step, such that a good accuracy-to-complexity trade-off is achieved. To expand X3D to a specific target complexity, we perform progressive forward expansion followed by backward contraction. X3D achieves state-of-the-art performance while requiring 4.8x and 5.5x fewer multiply-adds and parameters for similar accuracy as previous work. Our most surprising finding is that networks with high spatiotemporal resolution can perform well, while being extremely light in terms of network width and parameters. We report competitive accuracy at unprecedented efficiency on video classification and detection benchmarks. Code is available at: https://github.com/facebookresearch/SlowFast.
1275 citations
en
Computer Science
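The stepwise expansion described in the X3D abstract above (grow exactly one axis per step, keeping the candidate with the best trade-off) resembles greedy forward selection. The sketch below illustrates only that control flow. The axis names follow the abstract, but the doubling factors, the `toy_score` surrogate and its weights are hypothetical stand-ins for training and validating each candidate network, and the backward contraction step is not shown.

```python
def stepwise_expand(config, factors, score, steps):
    """Greedy forward expansion: grow exactly one axis per step."""
    history = []
    for _ in range(steps):
        candidates = []
        for axis, factor in factors.items():
            trial = dict(config)
            trial[axis] = config[axis] * factor  # expand this axis only
            candidates.append((score(trial), axis, trial))
        # Keep the single-axis expansion with the highest score.
        _, best_axis, config = max(candidates, key=lambda c: c[0])
        history.append(best_axis)
    return config, history


# Hypothetical surrogate with diminishing returns per axis (not from the paper).
TOY_WEIGHTS = {"time": 3.0, "space": 2.0, "width": 1.0, "depth": 1.0}


def toy_score(config):
    return sum(w * (1.0 - 1.0 / config[a]) for a, w in TOY_WEIGHTS.items())


start = {"time": 1, "space": 1, "width": 1, "depth": 1}
double_each = {axis: 2 for axis in start}
final_config, order = stepwise_expand(start, double_each, toy_score, steps=3)
```

Because the surrogate has diminishing returns, the greedy loop alternates between axes instead of expanding the same one repeatedly.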
U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection
Xuebin Qin, Zichen Zhang, Chenyang Huang
et al.
In this paper, we design a simple yet powerful deep network architecture, U2-Net, for salient object detection (SOD). The architecture of our U2-Net is a two-level nested U-structure. The design has the following advantages: (1) it is able to capture more contextual information from different scales thanks to the mixture of receptive fields of different sizes in our proposed ReSidual U-blocks (RSU), (2) it increases the depth of the whole architecture without significantly increasing the computational cost because of the pooling operations used in these RSU blocks. This architecture enables us to train a deep network from scratch without using backbones from image classification tasks. We instantiate two models of the proposed architecture, U2-Net (176.3 MB, 30 FPS on GTX 1080Ti GPU) and U2-Net† (4.7 MB, 40 FPS), to facilitate the usage in different environments. Both models achieve competitive performance on six SOD datasets. The code is available at: https://github.com/NathanUA/U-2-Net
2083 citations
en
Computer Science
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck
Current speaker verification techniques rely on a neural network to extract speaker representations. The successful x-vector architecture is a Time Delay Neural Network (TDNN) that applies statistics pooling to project variable-length utterances into fixed-length speaker characterizing embeddings. In this paper, we propose multiple enhancements to this architecture based on recent trends in the related fields of face verification and computer vision. Firstly, the initial frame layers can be restructured into 1-dimensional Res2Net modules with impactful skip connections. Similarly to SE-ResNet, we introduce Squeeze-and-Excitation blocks in these modules to explicitly model channel interdependencies. The SE block expands the temporal context of the frame layer by rescaling the channels according to global properties of the recording. Secondly, neural networks are known to learn hierarchical features, with each layer operating on a different level of complexity. To leverage this complementary information, we aggregate and propagate features of different hierarchical levels. Finally, we improve the statistics pooling module with channel-dependent frame attention. This enables the network to focus on different subsets of frames during each of the channel's statistics estimation. The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art TDNN based systems on the VoxCeleb test sets and the 2019 VoxCeleb Speaker Recognition Challenge.
1843 citations
en
Computer Science, Engineering
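The ECAPA-TDNN abstract above mentions statistics pooling with channel-dependent frame attention. A minimal sketch of that idea, assuming the attention scores are already given (the network that would produce them is not shown): each channel gets its own softmax over frames, and the weighted mean and standard deviation are concatenated into one fixed-length vector. The function name and the list-of-lists layout are illustrative choices, not the authors' implementation.

```python
import math


def attentive_stats_pooling(features, scores):
    """Pool T frames of C channels into a 2*C vector of weighted mean and std.

    features, scores: lists of T frames, each a list of C floats.
    scores are unnormalized attention logits (assumed given).
    """
    num_frames = len(features)
    num_channels = len(features[0])
    means, stds = [], []
    for c in range(num_channels):
        # Softmax over time, computed separately for each channel.
        logits = [scores[t][c] for t in range(num_frames)]
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted first and second moments of this channel.
        mean = sum(w * features[t][c] for t, w in enumerate(weights))
        second = sum(w * features[t][c] ** 2 for t, w in enumerate(weights))
        means.append(mean)
        stds.append(math.sqrt(max(second - mean ** 2, 0.0)))
    return means + stds


frames = [[1.0, 2.0], [3.0, 6.0]]
uniform = attentive_stats_pooling(frames, [[0.0, 0.0], [0.0, 0.0]])
focused = attentive_stats_pooling(frames, [[0.0, 0.0], [50.0, 0.0]])
```

With all-zero scores the weights are uniform and the result reduces to plain statistics pooling. A large score on one frame for channel 0 makes that channel's statistics follow that frame while channel 1 stays unchanged.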
Going Deeper in Spiking Neural Networks: VGG and Residual Architectures
Abhronil Sengupta, Yuting Ye, Robert Y. Wang
et al.
Over the past few years, Spiking Neural Networks (SNNs) have become popular as a possible pathway to enable low-power event-driven neuromorphic hardware. However, their application in machine learning has largely been limited to very shallow neural network architectures for simple problems. In this paper, we propose a novel algorithmic technique for generating an SNN with a deep architecture, and demonstrate its effectiveness on complex visual recognition problems such as CIFAR-10 and ImageNet. Our technique applies to both VGG and Residual network architectures, with significantly better accuracy than the state-of-the-art. Finally, we present analysis of the sparse event-driven computations to demonstrate reduced hardware overhead when operating in the spiking domain.
1174 citations
en
Computer Science, Medicine
Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models
Roman Klokov, V. Lempitsky
We present a new deep learning architecture (called Kd-network) that is designed for 3D model recognition tasks and works with unstructured point clouds. The new architecture performs multiplicative transformations and shares parameters of these transformations according to the subdivisions of the point clouds imposed onto them by kd-trees. Unlike the currently dominant convolutional architectures that usually require rasterization on uniform two-dimensional or three-dimensional grids, Kd-networks do not rely on such grids in any way and therefore avoid poor scaling behavior. In a series of experiments with popular shape recognition benchmarks, Kd-networks demonstrate competitive performance in a number of shape recognition tasks such as shape classification, shape retrieval and shape part segmentation.
1020 citations
en
Computer Science
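The Kd-network abstract above relies on the subdivisions that a kd-tree imposes on a point cloud. As a rough illustration of such a subdivision, the sketch below builds a balanced kd-tree over 3D points. Splitting along the axis with the largest coordinate range is an assumption made here for concreteness, and the dictionary-based node layout is hypothetical. The learned transformations that a Kd-network would attach to each node are not shown.

```python
def build_kd_tree(points):
    """Recursively split a non-empty list of 3D points; leaves hold one point."""
    if len(points) == 1:
        return {"leaf": points[0]}
    # Assumed heuristic: split along the axis with the largest extent.
    extents = [
        max(p[a] for p in points) - min(p[a] for p in points) for a in range(3)
    ]
    axis = extents.index(max(extents))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "axis": axis,
        "left": build_kd_tree(ordered[:mid]),
        "right": build_kd_tree(ordered[mid:]),
    }


def count_leaves(node):
    """Count the points stored at the leaves of the tree."""
    if "leaf" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
tree = build_kd_tree(cloud)
```

Each internal node records the split axis, which is the kind of structural information parameters could be shared by. How the paper keys its shared parameters is not specified in the abstract.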
Resnet in Resnet: Generalizing Residual Architectures
S. Targ, Diogo Almeida, Kevin Lyman
Residual networks (ResNets) have recently achieved state-of-the-art on challenging computer vision tasks. We introduce Resnet in Resnet (RiR): a deep dual-stream architecture that generalizes ResNets and standard CNNs and is easily implemented with no computational overhead. RiR consistently improves performance over ResNets, outperforms architectures with similar amounts of augmentation on CIFAR-10, and establishes a new state-of-the-art on CIFAR-100.
1048 citations
en
Computer Science, Mathematics
Microservices: Yesterday, Today, and Tomorrow
N. Dragoni, Saverio Giallorenzo, Alberto Lluch-Lafuente
et al.
Microservices is an architectural style inspired by service-oriented computing that has recently started gaining popularity. Before presenting the current state-of-the-art in the field, this chapter reviews the history of software architecture, the reasons that led to the diffusion of objects and services first, and microservices later. Finally, open problems and future challenges are introduced. This survey primarily addresses newcomers to the discipline, while offering an academic viewpoint on the topic. In addition, we investigate some practical issues and point out some potential solutions.
1113 citations
en
Computer Science, Engineering
ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules.
Andrew Leaver-Fay, M. Tyka, Steven M. Lewis
et al.
1742 citations
en
Medicine, Computer Science
Research Commentary - The New Organizing Logic of Digital Innovation: An Agenda for Information Systems Research
Youngjin Yoo, O. Henfridsson, K. Lyytinen
2548 citations
en
Computer Science
Developing Multi-agent Systems with JADE
F. Bellifemine, A. Poggi, G. Rimassa
2701 citations
en
Computer Science
The Biofilm Matrix
D. Allison
2987 citations
en
Medicine, Chemistry
Best practices for convolutional neural networks applied to visual document analysis
P. Simard, David Steinkraus, John C. Platt
3032 citations
en
Computer Science
Protocols and Architectures for Wireless Sensor Networks
H. Karl, A. Willig
2625 citations
en
Computer Science
Automatically characterizing large scale program behavior
T. Sherwood, Erez Perelman, Greg Hamerly
et al.
1870 citations
en
Computer Science
CATH--a hierarchic classification of protein domain structures.
C. Orengo, A. Michie, S. Jones
et al.
2701 citations
en
Medicine, Biology
Transactional Memory: Architectural Support For Lock-free Data Structures
M. Herlihy
2584 citations
en
Computer Science
Resource-Constrained Edge AI Solution for Real-Time Pest and Disease Detection in Chili Pepper Fields
Hoyoung Chung, Jin-Hwi Kim, Junseong Ahn
et al.
This paper presents a low-cost, fully on-premise Edge Artificial Intelligence (AI) system designed to support real-time pest and disease detection in open-field chili pepper cultivation. The proposed architecture integrates AI-Thinker ESP32-CAM module (ESP32-CAM) image acquisition nodes (“Sticks”) with a Raspberry Pi 5–based edge server (“Module”), forming a plug-and-play Internet of Things (IoT) pipeline that enables autonomous operation upon simple power-up, making it suitable for aging farmers and resource-limited environments. A Leaf-First 2-Stage vision model was developed by combining YOLOv8n-based leaf detection with a lightweight ResNet-18 classifier to improve the diagnostic accuracy for small lesions commonly occurring in dense pepper foliage. To address network instability, which is a major challenge in open-field agriculture, the system adopted a dual-protocol communication design using Hyper Text Transfer Protocol (HTTP) for Joint Photographic Experts Group (JPEG) transmission and Message Queuing Telemetry Transport (MQTT) for event-driven feedback, enhanced by Redis-based asynchronous buffering and state recovery. Deployment-oriented experiments under controlled conditions demonstrated an average end-to-end latency of 0.86 s from image capture to Light Emitting Diode (LED) alert, validating the system’s suitability for real-time decision support in crop management. Compared to heavier models (e.g., YOLOv11 and ResNet-50), the lightweight architecture reduced the computational cost by more than 60%, with minimal loss in detection accuracy. This study highlights the practical feasibility of resource-constrained Edge AI systems for open-field smart farming by emphasizing system-level integration, robustness, and real-time operability, and provides a deployment-oriented framework for future extension to other crops.
UV-induced reorganization of 3D genome mediates DNA damage response
Veysel Oğulcan Kaya, Ogün Adebali
While it is well-established that UV radiation threatens genomic integrity, the precise mechanisms by which cells orchestrate DNA damage response and repair within the context of 3D genome architecture remain unclear. Here, we address this gap by investigating the UV-induced reorganization of the 3D genome and its critical role in mediating damage response. Employing temporal maps of contact matrices and transcriptional profiles, we illustrate the immediate and holistic changes in genome architecture post-irradiation, emphasizing the significance of this reconfiguration for effective DNA repair processes. We demonstrate that UV radiation triggers a comprehensive restructuring of the 3D genome organization at all levels, including loops, topologically associating domains and compartments. Through the analysis of DNA damage and excision repair maps, we uncover a correlation between genome folding, gene regulation, damage formation probability, and repair efficacy. We show that adaptive reorganization of the 3D genome is a key mediator of the damage response, providing new insights into the complex interplay of genomic structure and cellular defense mechanisms against UV-induced damage, thereby advancing our understanding of cellular resilience.
Safe Maneuvering, Efficient Navigation and Intelligent Management for Ships
Chunhui Zhou, Yixiong He, Liang Huang
Maritime transport, serving as the cornerstone of global supply chains, facilitates over 80% of international trade by volume [...]
Naval architecture. Shipbuilding. Marine engineering, Oceanography
High-Precision Coal Mine Microseismic P-Wave Arrival Picking via Physics-Constrained Deep Learning
Kai Qin, Zhigang Deng, Xiaohan Li
et al.
The automatic identification of P-wave arrival times in microseismic signals is crucial for the intelligent monitoring and early warning of dynamic hazards in coal mines. Traditional methods suffer from low accuracy and poor stability due to complex underground geological conditions and substantial noise interference. This paper proposes a microseismic P-wave arrival time automatic picking model that integrates physical constraints with a deep learning architecture. This study trained and optimized the model using a high-quality, manually labeled dataset. A systematic comparison with the AR picker algorithm and the short-term–long-term average ratio method revealed that this model achieved a precision of 96.60%, a recall of 90.59%, and an F1 score of 93.50% on the test set, with a P-wave arrival time-picking error of less than 20 ms. The average arrival time error was only 5.49 ms, significantly outperforming traditional methods. In cross-mining area generalization tests, the model performed excellently in two mining areas with consistent sampling frequencies (1000 Hz) and high signal-to-noise ratios, demonstrating good engineering transferability. However, its performance decreased in a mining area with a higher sampling rate and stronger noise, indicating its sensitivity to data acquisition parameters. This study developed a high-precision, robust, and potentially cross-domain adaptive model for automatically picking microseismic P-wave arrival times. This model provides support for the automation, precision, and intelligence of coal mine microseismic monitoring systems and has significant practical value in promoting real-time early warning and risk prevention for mine dynamic hazards.
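The precision, recall and F1 figures in the abstract above can be cross-checked with the standard definition of F1 as the harmonic mean of precision and recall. The short sketch below does only that arithmetic. The function name is illustrative, and the inputs are the rounded percentages quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract: 96.60% precision, 90.59% recall.
f1 = f1_score(0.9660, 0.9059)
f1_percent = round(100 * f1, 2)
```

The result rounds to 93.50%, which agrees with the F1 score reported in the abstract.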