YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
Chien-Yao Wang, Alexey Bochkovskiy, H. Liao
Real-time object detection is one of the most important research topics in computer vision. As new approaches regarding architecture optimization and training optimization are continually being developed, we have identified two new research topics that arise when dealing with these latest state-of-the-art methods. To address them, we propose a trainable bag-of-freebies-oriented solution. We combine flexible and efficient training tools with the proposed architecture and the compound scaling method. YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 120 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100. Source code is released at https://github.com/WongKinYiu/yolov7.
9907 citations
en
Computer Science
Generative Adversarial Networks
I. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza
et al.
Generative Adversarial Networks (GANs) are a class of deep learning models that have shown remarkable success in generating realistic images, videos, and other types of data. This paper provides a comprehensive guide to GANs, covering their architecture, loss functions, training methods, applications, evaluation metrics, challenges, and future directions. We begin with an introduction to GANs and their historical development, followed by a review of the background and related work. We then provide a detailed overview of the GAN architecture, including the generator and discriminator networks, and discuss the key design choices and variations. Next, we review the loss functions utilized in GANs, including the original minimax objective, as well as more recent approaches such as the Wasserstein distance and gradient penalty. We then delve into the training of GANs, discussing common techniques such as alternating optimization, minibatch discrimination, and spectral normalization. We also provide a survey of the various applications of GANs across domains. In addition, we review the evaluation metrics utilized to assess the diversity and quality of GAN-produced data. Furthermore, we discuss the challenges and open issues in GANs, including mode collapse, training instability, and ethical considerations. Finally, we provide a glimpse into the future directions of GAN research, including improving scalability, developing new architectures, incorporating domain knowledge, and exploring new applications. Overall, this paper serves as a comprehensive guide to GANs, providing both theoretical and practical insights for researchers and practitioners in the field.
30460 citations
en
Engineering, Computer Science
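The original minimax objective mentioned in the abstract above can be sketched in a few lines. This is an illustrative toy, assuming the discriminator outputs probabilities; the generator loss shown is the common non-saturating heuristic rather than the pure minimax form, and the function name is hypothetical:

```python
import math

def gan_losses(d_real, d_fake):
    """Per-batch GAN losses from discriminator outputs.

    d_real: discriminator probabilities on real samples
    d_fake: discriminator probabilities on generated samples
    Returns (discriminator_loss, generator_loss); the generator
    uses the non-saturating variant (maximize log D(G(z))).
    """
    eps = 1e-12  # numerical guard for log(0)
    # Discriminator maximizes E[log D(x)] + E[log(1 - D(G(z)))],
    # so we minimize the negation of that sum.
    d_loss = (-sum(math.log(p + eps) for p in d_real) / len(d_real)
              - sum(math.log(1 - p + eps) for p in d_fake) / len(d_fake))
    # Non-saturating generator loss: minimize -E[log D(G(z))].
    g_loss = -sum(math.log(p + eps) for p in d_fake) / len(d_fake)
    return d_loss, g_loss
```

With a well-fooled discriminator (d_fake near 1) the generator loss approaches zero, while a confident discriminator (d_fake near 0) makes it large, which is the gradient signal the alternating optimization described above relies on.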
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke, Sam Gross, Francisco Massa
et al.
Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it was designed from first principles to support an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several commonly used benchmarks.
50978 citations
en
Computer Science, Mathematics
Analyzing and Improving the Image Quality of StyleGAN
Tero Karras, S. Laine, M. Aittala
et al.
The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.
6858 citations
en
Computer Science, Engineering
Improving Language Understanding by Generative Pre-Training
Alec Radford, Karthik Narasimhan
14531 citations
en
Computer Science
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar
et al.
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
172423 citations
en
Computer Science
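The attention mechanism at the heart of the Transformer abstract above can be sketched in plain Python. This is a minimal illustration of single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V, on lists of vectors; the paper's full model adds learned multi-head projections and masking, which are omitted here:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention on lists of vectors.

    Q: query vectors; K, V: key/value vectors (same count).
    Each output row is a softmax-weighted average of V, where the
    weights come from scaled query-key dot products.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

Because each output is a convex combination of the value vectors, attention weights always sum to one, and a query attends most strongly to the keys it is most similar to.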
Intel SGX Explained
Victor Costan, S. Devadas
2104 citations
en
Computer Science
ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars
A. Shafiee, Anirban Nag, N. Muralimanohar
et al.
A number of recent efforts have attempted to design accelerators for popular machine learning algorithms, such as those involving convolutional and deep neural networks (CNNs and DNNs). These algorithms typically involve a large number of multiply-accumulate (dot-product) operations. A recent project, DaDianNao, adopts a near data processing approach, where a specialized neural functional unit performs all the digital arithmetic operations and receives input weights from adjacent eDRAM banks. This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner. While the use of crossbar memory as an analog dot-product engine is well known, no prior work has designed or characterized a full-fledged accelerator based on crossbars. In particular, our work makes the following contributions: (i) We design a pipelined architecture, with some crossbars dedicated for each neural network layer, and eDRAM buffers that aggregate data between pipeline stages. (ii) We define new data encoding techniques that are amenable to analog computations and that can reduce the high overheads of analog-to-digital conversion (ADC). (iii) We define the many supporting digital components required in an analog CNN accelerator and carry out a design space exploration to identify the best balance of memristor storage/compute, ADCs, and eDRAM storage on a chip. On a suite of CNN and DNN workloads, the proposed ISAAC architecture yields improvements of 14.8×, 5.5×, and 7.5× in throughput, energy, and computational density (respectively), relative to the state-of-the-art DaDianNao architecture.
1915 citations
en
Computer Science
CDD/SPARCLE: functional classification of proteins via subfamily domain architectures
Aron Marchler-Bauer, Bo Yu, Lianyi Han
et al.
NCBI's Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints. An archive of pre-computed domain annotation is maintained for proteins tracked by NCBI's Entrez database, and live search services are offered as well. CDD curation staff supplements a comprehensive collection of protein domain and protein family models, which have been imported from external providers, with representations of selected domain families that are curated in-house and organized into hierarchical classifications of functionally distinct families and sub-families. CDD also supports comparative analyses of protein families via conserved domain architectures, and a recent curation effort focuses on providing functional characterizations of distinct subfamily architectures using SPARCLE: Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.
2444 citations
en
Medicine, Biology
Error bounds for approximations with deep ReLU networks
D. Yarotsky
We study the expressive power of shallow and deep neural networks with piecewise-linear activation functions. We establish new rigorous upper and lower bounds for the network complexity in the setting of approximations in Sobolev spaces. In particular, we prove that deep ReLU networks approximate smooth functions more efficiently than shallow networks. In the case of approximations of 1D Lipschitz functions, we describe adaptive depth-6 network architectures that are more efficient than the standard shallow architecture.
1440 citations
en
Computer Science, Mathematics
Convolutional Two-Stream Network Fusion for Video Action Recognition
Christoph Feichtenhofer, A. Pinz, Andrew Zisserman
Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information. We study a number of ways of fusing ConvNet towers both spatially and temporally in order to best take advantage of this spatio-temporal information. We make the following findings: (i) that rather than fusing at the softmax layer, a spatial and temporal network can be fused at a convolution layer without loss of performance, but with a substantial saving in parameters, (ii) that it is better to fuse such networks spatially at the last convolutional layer than earlier, and that additionally fusing at the class prediction layer can boost accuracy, finally (iii) that pooling of abstract convolutional features over spatiotemporal neighbourhoods further boosts performance. Based on these studies we propose a new ConvNet architecture for spatiotemporal fusion of video snippets, and evaluate its performance on standard benchmarks where this architecture achieves state-of-the-art results.
2749 citations
en
Computer Science
Bilinear CNN Models for Fine-Grained Visual Recognition
Tsung-Yu Lin, Aruni RoyChowdhury, Subhransu Maji
We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using an outer product at each location of the image and pooled to obtain an image descriptor. This architecture can model local pairwise feature interactions in a translationally invariant manner, which is particularly useful for fine-grained categorization. It also generalizes various orderless texture descriptors such as the Fisher vector, VLAD, and O2P. We present experiments with bilinear models where the feature extractors are based on convolutional neural networks. The bilinear form simplifies gradient computation and allows end-to-end training of both networks using image labels only. Using networks initialized from the ImageNet dataset followed by domain-specific fine-tuning, we obtain 84.1% accuracy on the CUB-200-2011 dataset, requiring only category labels at training time. We present experiments and visualizations that analyze the effects of fine-tuning and the choice of the two networks on the speed and accuracy of the models. Results show that the architecture compares favorably to the existing state of the art on a number of fine-grained datasets while being substantially simpler and easier to train. Moreover, our most accurate model is fairly efficient, running at 8 frames/sec on an NVIDIA Tesla K40 GPU. The source code for the complete system will be made available at http://vis-www.cs.umass.edu/bcnn.
2032 citations
en
Computer Science
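The bilinear pooling step described in the abstract above (outer product of the two extractors' features at each location, pooled over the image) can be sketched directly. This is a minimal sketch with plain Python lists; the function name and list-of-vectors interface are illustrative, not taken from the paper's released code:

```python
def bilinear_pool(feats_a, feats_b):
    """Bilinear pooling of two per-location feature streams.

    feats_a, feats_b: one feature vector per image location (same
    number of locations; dimensions D_a and D_b may differ).
    At each location the D_a x D_b outer product is formed, and the
    products are sum-pooled over locations into a flat descriptor.
    """
    da, db = len(feats_a[0]), len(feats_b[0])
    desc = [0.0] * (da * db)
    for a, b in zip(feats_a, feats_b):
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                desc[i * db + j] += ai * bj  # accumulate outer product
    return desc
```

Because only sums of products are involved, the descriptor is invariant to where in the image the interacting features occur, which is the translational invariance the abstract highlights.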
Named data networking
Lixia Zhang, Alexander Afanasyev, Jeffrey Burke
et al.
Named Data Networking (NDN) is one of five projects funded by the U.S. National Science Foundation under its Future Internet Architecture Program. NDN has its roots in an earlier project, Content-Centric Networking (CCN), which Van Jacobson first publicly presented in 2006. The NDN project investigates Jacobson's proposed evolution from today's host-centric network architecture (IP) to a data-centric network architecture (NDN). This conceptually simple shift has far-reaching implications for how we design, develop, deploy, and use networks and applications. We describe the motivation and vision of this new architecture, and its basic components and operations. We also provide a snapshot of its current design, development status, and research challenges. More information about the project, including prototype implementations, publications, and annual reports, is available on named-data.net.
2181 citations
en
Biology, Computer Science
Long short-term memory recurrent neural network architectures for large scale acoustic modeling
Hasim Sak, A. Senior, F. Beaufays
2543 citations
en
Computer Science
SlideMamba: entropy-based adaptive fusion of GNN and Mamba for enhanced representation learning in digital pathology
Shakib Khan, Fariba Dambandkhameneh, Nazim Shaikh
et al.
Whole-slide image (WSI) analysis requires integrating fine-grained spatial structure with long-range tissue context. This work introduces SlideMamba, a hybrid framework that performs embedding-level fusion of a graph neural network (capturing local topology) and a Mamba state-space branch (modeling global context) via entropy-based confidence weighting. The adaptive fusion emphasizes the branch with lower predictive entropy, providing a principled mechanism to combine complementary feature streams and improving multi-scale representation learning. Effectiveness is demonstrated on two clinically relevant tasks with class imbalance: (i) mutation/fusion prediction from the OAK clinical trial WSIs (40×), where SlideMamba attains a PRAUC of 0.740 ± 0.033, exceeding fixed-fusion (GAT-Mamba 0.632 ± 0.015) and single-branch baselines (Mamba 0.630 ± 0.015, SlideGraph+ 0.730 ± 0.026, MIL 0.502 ± 0.039, TransMIL 0.390 ± 0.016); and (ii) LUAD vs. LUSC classification on an independent proprietary cohort (20×), where SlideMamba achieves a PRAUC of 0.969 ± 0.015, outperforming MIL (0.946 ± 0.037), TransMIL (0.929 ± 0.033), SlideGraph+ (0.945 ± 0.025), GAT-Mamba (0.935 ± 0.011), and Mamba (0.962 ± 0.012). Beyond performance gains, the inclusion of the Mamba backbone ensures computational efficiency by avoiding the quadratic complexity of standard attention mechanisms. Furthermore, the adaptive fusion weights provide inherent interpretability, offering clinicians insight into whether local cellular graphs or global tissue architecture drove the final prediction. These attributes suggest SlideMamba offers a clinically feasible path toward spatially resolved, precision computational pathology.
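The entropy-based confidence weighting described in the abstract above can be sketched as follows. This is a hypothetical instantiation, assuming inverse-entropy weights normalized to sum to one; the paper's exact weighting formula may differ:

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_weighted_fusion(p_branch_a, p_branch_b):
    """Fuse two branch predictions by predictive confidence.

    Each argument is a class-probability distribution from one
    branch (e.g. GNN and Mamba). The branch with lower entropy
    (higher confidence) receives the larger weight; weights use
    inverse entropy, normalized to sum to one.
    """
    eps = 1e-8  # guards against division by zero for one-hot inputs
    w_a = 1.0 / (entropy(p_branch_a) + eps)
    w_b = 1.0 / (entropy(p_branch_b) + eps)
    z = w_a + w_b
    w_a, w_b = w_a / z, w_b / z
    fused = [w_a * a + w_b * b for a, b in zip(p_branch_a, p_branch_b)]
    return fused, (w_a, w_b)
```

The returned weights themselves are the interpretability signal the abstract mentions: they show which branch, local or global, dominated a given prediction.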
Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks
R. Jacobs, Michael I. Jordan, A. Barto
635 citations
en
Computer Science
Study on the Bearing Characteristics of a Novel Inner Support Structure for Deep Foundation Pits Based on Full-Scale Experiments
Xingui Zhang, Jianhang Liang, Gang Wei
et al.
Traditional internal support systems for deep foundation pits often suffer from issues such as insufficient stiffness, excessive displacement, and large support areas. To address these problems, the authors developed a novel spatial steel joist internal support system. Based on a large-scale field model test, this study investigates the bearing characteristics of the proposed system in deep foundation pits. A stiffness formulation for the novel support system was analytically derived and experimentally validated through a calibrated finite element model. After validation with test results, the effects of different vertical prestressing forces on the structure were analyzed. The results indicate that the proposed system provides significant support in deep foundation pits. The application of both horizontal and vertical prestressing increases the internal forces within the support structure while reducing overall displacement. The numerical predictions of horizontal displacement, bending moment, and the axial force distribution of the main support, as well as their development trends, align well with the model test results. Moreover, increasing the prestressing force of the steel tie rods effectively controls the deformation of the vertical arch support and enhances the stability of the spatial structure. The derived stiffness formula shows a small error compared with the finite element results, demonstrating its high accuracy. Furthermore, the diagonal support increases the stiffness of the lower chord bar support by 28.24%.
Non-Destructive Monitoring of External Quality of Date Palm Fruit (Phoenix dactylifera L.) During Frozen Storage Using Digital Camera and Flatbed Scanner
Younes Noutfia, Ewa Ropelewska, Zbigniew Jóźwiak
et al.
The emergence of new technologies focusing on “computer vision” has contributed significantly to the assessment of fruit quality. In this study, an innovative approach based on image analysis was used to assess the external quality of fresh and frozen ‘Mejhoul’ and ‘Boufeggous’ date palm cultivars stored for 6 months at −10 °C and −18 °C. Their quality was evaluated, in a non-destructive manner, based on texture features extracted from images acquired using a digital camera and a flatbed scanner. The whole image-processing pipeline was carried out using MATLAB R2024a and Q-MAZDA 23.10 software. The extracted features were then used as inputs for pre-established algorithm groups within WEKA 3.9 software to classify frozen date fruit samples after 0, 2, 4, and 6 months of storage. Among 599 features, only 5 to 36 attributes were selected as powerful predictors to build the desired classification models based on the “Functions-Logistic” classifier. The general architecture exhibited clear differences in classification accuracy depending mainly on the frozen storage period and the imaging device. Accordingly, confusion matrices showed high classification accuracy (CA), reaching 0.84 at M0 for both cultivars at the two frozen storage temperatures. The CA decreased markedly at M2 and M4 before increasing again by M6, confirming slight changes in external quality before the end of storage. Moreover, the models developed using the flatbed scanner achieved a correctness rate of up to 97.7%, compared with the digital camera, which did not exceed 85.5%. As a perspective, physicochemical attributes could be added to the developed models to establish correlations with image features and predict the behavior of date fruit during storage.
Applications of Shaped-Charge Learning
Boris Galitsky
It is well known that deep learning (DNN) has strong limitations due to a lack of explainability and a weak defense against possible adversarial attacks. These attacks would be a concern for autonomous teams producing a state of high entropy for the team’s structure. In our first article for this Special Issue, we propose a meta-learning/DNN → kNN architecture that overcomes these limitations by integrating deep learning with explainable nearest neighbor learning (kNN). This architecture is named “shaped charge”. The focus of the current article is the empirical validation of “shaped charge”. We evaluate the proposed architecture for summarization, question answering, and content creation tasks and observe a significant improvement in performance along with enhanced usability by team members. We observe a substantial improvement in question answering accuracy, as well as in the truthfulness of the generated content, due to the application of the shaped-charge learning approach.
Identification of a genomic DNA sequence that quantitatively modulates KLF1 transcription factor expression in differentiating human hematopoietic cells
M. N. Gnanapragasam, A. Planutis, J. A. Glassberg
et al.
The onset of erythropoiesis is under strict developmental control, with direct and indirect inputs influencing its derivation from the hematopoietic stem cell. A major regulator of this transition is KLF1/EKLF, a zinc finger transcription factor that plays a global role in all aspects of erythropoiesis. Here, we have identified a short, conserved enhancer element in KLF1 intron 1 that is important for establishing optimal levels of KLF1 in mouse and human cells. Chromatin accessibility of this site exhibits cell-type specificity and is under developmental control during the differentiation of human CD34+ cells towards the erythroid lineage. This site binds GATA1, SMAD1, TAL1, and ETV6. In vivo editing of this region in cell lines and primary cells reduces KLF1 expression quantitatively. However, we find that, similar to observations seen in pedigrees of families with KLF1 mutations, downstream effects are variable, suggesting that the global architecture of the site is buffered towards keeping the KLF1 genetic region in an active state. We propose that modification of intron 1 in both alleles is not equivalent to complete loss of function of one allele.