A comparative study of fine-tuning deep learning models for plant disease identification
Edna Chebet Too, Li Yujian, Sam Njuki
et al.
Deep learning has recently attracted a lot of attention with the aim of developing a quick, automatic and accurate system for image identification and classification. In this work, the focus was on fine-tuning and evaluating state-of-the-art deep convolutional neural networks for image-based plant disease classification. An empirical comparison of the deep learning architectures is carried out. The architectures evaluated include VGG 16, Inception V4, ResNet with 50, 101 and 152 layers, and DenseNet with 121 layers. The data used for the experiment comprise 38 classes of diseased and healthy leaf images of 14 plants from PlantVillage. Fast and accurate models for plant disease identification are desired so that accurate measures can be applied early, thus alleviating the problem of food security. In our experiment, DenseNet tends to improve consistently in accuracy as the number of epochs grows, with no signs of overfitting or performance deterioration. Moreover, DenseNet requires considerably fewer parameters and a reasonable computing time to achieve state-of-the-art performance. It achieves a testing accuracy of 99.75%, beating the rest of the architectures. Keras with a Theano backend was used to train the architectures.
1118 citations
en
Computer Science
Clinically applicable deep learning for diagnosis and referral in retinal disease
Jeffrey De Fauw, J. Ledsam, Bernardino Romera-Paredes
et al.
2293 citations
en
Computer Science, Medicine
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models
Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu
et al.
Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS) subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network. In previous work, we have shown that such architectures are comparable to state-of-the-art ASR systems on dictation tasks, but it was not clear if such architectures would be practical for more challenging tasks such as voice search. In this work, we explore a variety of structural and optimization improvements to our LAS model which significantly improve performance. On the structural side, we show that word piece models can be used instead of graphemes. We also introduce a multi-head attention architecture, which offers improvements over the commonly-used single-head attention. On the optimization side, we explore synchronous training, scheduled sampling, label smoothing, and minimum word error rate optimization, which are all shown to improve accuracy. We present results with a unidirectional LSTM encoder for streaming recognition. On a 12,500-hour voice search task, we find that the proposed changes improve the WER from 9.2% to 5.6%, while the best conventional system achieves 6.7%; on a dictation task our model achieves a WER of 4.1% compared to 5% for the conventional system.
1184 citations
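Label smoothing, one of the optimization techniques named in this abstract, can be illustrated with a small sketch. The version below is the common uniform variant; the abstract does not state which variant or smoothing value the authors used, so `epsilon=0.1` and the function names `smooth_labels` and `cross_entropy` are assumptions made only for illustration.

```python
import math


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Return a uniformly smoothed target distribution for one label.

    Every class receives epsilon / num_classes, and the true class
    additionally receives 1 - epsilon, so the result sums to 1.
    epsilon=0.1 is an illustrative default, not a value from the paper.
    """
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index out of range")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    uniform_share = epsilon / num_classes
    distribution = [uniform_share] * num_classes
    distribution[target_index] += 1.0 - epsilon
    return distribution


def cross_entropy(target_distribution, predicted_probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(
        t * math.log(p)
        for t, p in zip(target_distribution, predicted_probs)
        if t > 0.0
    )


# Four output symbols, true symbol at index 2.
smoothed = smooth_labels(2, 4, epsilon=0.1)
hard = smooth_labels(2, 4, epsilon=0.0)

# A very confident prediction is penalized more under the smoothed target.
confident = [0.001, 0.001, 0.997, 0.001]
loss_smoothed = cross_entropy(smoothed, confident)
loss_hard = cross_entropy(hard, confident)
```

With hard targets the confident prediction has a loss near zero; with smoothed targets part of the probability mass sits on the other symbols, so over-confident outputs are discouraged.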
en
Computer Science, Engineering
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks
Xiaohan Ding, Yuchen Guo, Guiguang Ding
et al.
As designing an appropriate Convolutional Neural Network (CNN) architecture for a given application usually involves heavy human effort or numerous GPU hours, the research community is seeking architecture-neutral CNN structures that can be easily plugged into multiple mature architectures to improve performance in real-world applications. We propose the Asymmetric Convolution Block (ACB), an architecture-neutral structure used as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels. For an off-the-shelf architecture, we replace the standard square-kernel convolutional layers with ACBs to construct an Asymmetric Convolutional Network (ACNet), which can be trained to reach a higher level of accuracy. After training, we equivalently convert the ACNet into the same original architecture, so that no extra computation is required. We have observed that ACNet can improve the performance of various models on CIFAR and ImageNet by a clear margin. Through further experiments, we attribute the effectiveness of ACB to its capability of enhancing the model's robustness to rotational distortions and strengthening the central skeleton parts of square convolution kernels.
841 citations
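The conversion step described in this abstract relies on the linearity of convolution: a 1x3 and a 3x1 kernel can be added onto the central row and column (the "skeleton") of a 3x3 kernel without changing the summed output. The sketch below checks this on a single channel with integer values. It is a minimal illustration, not the authors' implementation: the helper names are hypothetical, and any normalization layers a real network would have to fold in are ignored.

```python
def conv2d(image, kernel, pad_h, pad_w):
    """Stride-1 2D cross-correlation with zero padding, on nested lists."""
    height, width = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    padded = [[0] * (width + 2 * pad_w) for _ in range(height + 2 * pad_h)]
    for i in range(height):
        for j in range(width):
            padded[i + pad_h][j + pad_w] = image[i][j]
    out_h = height + 2 * pad_h - k_h + 1
    out_w = width + 2 * pad_w - k_w + 1
    return [
        [
            sum(
                padded[i + a][j + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def add_maps(*maps):
    """Element-wise sum of equally sized feature maps."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*maps)]


def fuse_kernels(square, horizontal, vertical):
    """Add a 1x3 and a 3x1 kernel onto the skeleton of a 3x3 kernel."""
    fused = [row[:] for row in square]
    for b in range(3):
        fused[1][b] += horizontal[0][b]  # central row
    for a in range(3):
        fused[a][1] += vertical[a][0]    # central column
    return fused


image = [[1, 2, 0, 3], [4, 1, 2, 0], [0, 5, 1, 2], [3, 0, 2, 1]]
square = [[1, 0, -1], [2, 1, 0], [0, -1, 1]]
horizontal = [[1, 2, 1]]       # 1x3 branch
vertical = [[-1], [3], [2]]    # 3x1 branch

# Training-time view: three parallel branches whose outputs are summed.
branch_output = add_maps(
    conv2d(image, square, 1, 1),
    conv2d(image, horizontal, 0, 1),
    conv2d(image, vertical, 1, 0),
)

# Deployment-time view: a single 3x3 convolution with the fused kernel.
fused_output = conv2d(image, fuse_kernels(square, horizontal, vertical), 1, 1)
```

Both views produce the same 4x4 feature map, which is why the fused network costs no more than the original architecture.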
en
Computer Science
Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo, Sangdoo Yun, Dongyoon Han
et al.
Vision Transformer (ViT) extends the application range of transformers from language processing to computer vision tasks as an alternative architecture to existing convolutional neural networks (CNNs). Because transformer-based architectures are new to computer vision modeling, design conventions for an effective architecture have so far been little studied. Drawing on the successful design principles of CNNs, we investigate the role of spatial dimension conversion and its effectiveness in transformer-based architectures. We particularly attend to the dimension reduction principle of CNNs: as the depth increases, a conventional CNN increases the channel dimension and decreases the spatial dimensions. We empirically show that such a spatial dimension reduction is beneficial to a transformer architecture as well, and propose a novel Pooling-based Vision Transformer (PiT) built upon the original ViT model. We show that PiT achieves improved model capability and generalization performance over ViT. Through extensive experiments, we further show that PiT outperforms the baseline on several tasks such as image classification, object detection, and robustness evaluation. Source code and ImageNet models are available at https://github.com/naver-ai/pit.
724 citations
en
Computer Science
High-performance bulk thermoelectrics with all-scale hierarchical architectures
K. Biswas, Jiaqing He, I. Blum
et al.
4157 citations
en
Medicine, Materials Science
Design Structure Matrix Methods and Applications
S. Eppinger, Tyson R. Browning
1007 citations
en
Engineering
Neural Network Ensembles
L. K. Hansen, P. Salamon
2619 citations
en
Computer Science
The Geology of Fluvial Deposits: Sedimentary Facies, Basin Analysis, and Petroleum Geology
A. Miall
Xylem Structure and the Ascent of Sap
M. Zimmermann
Stability and transparency in bilateral teleoperation
D. Lawrence
2220 citations
en
Engineering, Computer Science
Service-oriented computing: concepts, characteristics and directions
M. Papazoglou
1528 citations
en
Computer Science
Intelligence Without Reason
R. Brooks
2227 citations
en
Computer Science
The SPLASH-2 programs: characterization and methodological considerations
S. Woo, Moriyoshi Ohara, Evan Torrie
et al.
4257 citations
en
Computer Science
Optimizing polymorphic tomato picking detection: improved YOLOv8n architecture to tackle data under complex environments
Qiang Li, Jie Mao, Pengxin Zhao
et al.
Introduction: In modern agriculture, tomatoes, as key economic crops, face challenges during harvesting due to complex growth environments; traditional object detection technologies are limited in performance and struggle to accurately identify and locate ripe and small-target tomatoes under leaf occlusion and uneven illumination.
Methods: To address these issues, this study takes YOLOv8n as the baseline model and improves it in line with the core needs of tomato detection. First, it analyzes YOLOv8n's inherent bottlenecks in feature extraction and small-target recognition, then proposes targeted schemes: to boost feature extraction, a Space-to-Depth convolution module (SPD) is introduced by restructuring convolutional operations; to improve small-target detection, a dedicated small-target detection layer is added and integrated with the Parallelized Patch-Aware Attention mechanism (PPA); to balance performance and efficiency, a lightweight Slim-Neck structure and a self-developed Detect_CBAM detection head are adopted; finally, the Distance-Intersection over Union loss function (DIoU) optimizes gradient distribution during training. Experiments are conducted on the self-built “tomato_dataset” (7,160 images, divided into 5,008 for training, 720 for validation, and 1,432 for testing) with evaluation metrics including bounding box precision, recall, mAP@0.5, mAP@0.5:0.95, Parameters, and FLOPS. Performance is compared with mainstream YOLO models (YOLOv5n, YOLOv6n, YOLOv8n), lightweight models (SSD-MobileNetv2, EfficientDet-D0), and two-stage algorithms (Faster R-CNN, Cascade R-CNN).
Results: The improved model achieves 89.6% precision, 87.3% recall, 93.5% mAP@0.5, and 58.6% mAP@0.5:0.95, significantly outperforming YOLOv8n, most comparative models, and the two-stage algorithms in both detection accuracy and efficiency.
Discussion: In conclusion, this study solves detection problems of ripe and small-target tomatoes in polymorphic environments, improves the model's accuracy and robustness, provides reliable technical support for automated harvesting, and contributes to the intelligent development of modern agriculture.
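The Distance-IoU loss mentioned in the methods can be sketched for a single pair of axis-aligned boxes. This follows the standard DIoU definition (one minus IoU, plus the squared centre distance divided by the squared diagonal of the smallest enclosing box); how the authors integrate it into YOLOv8n training is not described in the abstract, and the function name and box layout `(x1, y1, x2, y2)` are assumptions for illustration.

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss for two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes, so areas are positive.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    iou = intersection / union

    # Squared distance between the two box centres.
    centre_dist_sq = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + (
        (ay1 + ay2) / 2 - (by1 + by2) / 2
    ) ** 2

    # Squared diagonal of the smallest box enclosing both.
    enclose_diag_sq = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (
        max(ay2, by2) - min(ay1, by1)
    ) ** 2

    return 1.0 - iou + centre_dist_sq / enclose_diag_sq


target = (0.0, 0.0, 1.0, 1.0)
near_miss = (2.0, 0.0, 3.0, 1.0)   # no overlap, centre 2 units away
far_miss = (4.0, 0.0, 5.0, 1.0)    # no overlap, centre 4 units away

loss_same = diou_loss(target, target)
loss_near = diou_loss(target, near_miss)
loss_far = diou_loss(target, far_miss)
```

Both misses have zero IoU, so a plain IoU loss could not tell them apart; the distance term makes the farther box more costly, which gives a useful training signal even for non-overlapping predictions.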
Design in the Age of Predictive Architecture: From Digital Models to Parametric Code to Latent Space
José Carlos López Cervantes, Cintya Eva Sánchez Morales
Over the last three decades, architecture has undergone a sustained digital transformation that has progressively displaced the ontology of the geometric generator, understood here as the primary artefact through which form is produced, controlled, and legitimized. This paper argues that, within one extended digital epoch, three successive regimes have reconfigured architectural agency. First, a digital model regime, in which computer-generated 3D models become the main generators of geometry. Second, a parametric code regime, in which scripted relations and numerical parameters supersede the individual model as the core design object, defining a space of possibilities rather than a single instance. Third, an emerging latent regime, in which diffusion and transformer systems produce high-plausibility synthetic images as image-first generators and subsequently impose a post hoc image-to-geometry translation requirement. To make this shifting paradigm comparable across time, the paper uses the blob as a stable morphological reference and develops a comparative reading of four blobs, Kiesler’s Endless House, Greg Lynn’s Embryological House, Marc Fornes’ Vaulted Willow, and an author-generated GenAI blob curated from a traceable AI image archive, to show how the geometric generator migrates from object, to model, to code, to latent image-space. As a pre-digital hinge case, Kiesler is selected not only for anticipating blob-like continuity, but for clarifying a recurrent disciplinary tension, “form-first generators” that precede tectonic and programmatic rationalization. The central hypothesis is that GenAI introduces an ontological shift not primarily at the level of style, but at the level of architectural judgement and evidentiary legitimacy. The project can begin with a predictive image that is visually convincing yet tectonically underdetermined.
To name this condition, the paper proposes the plausibility gap, the mismatch between visual plausibility and tectonic intelligibility, as an operational criterion for evaluating image-first workflows, and for specifying the verification tasks required to stabilize them as architecture. Selection establishes evidentiary legitimacy, while a friction map and Gap Index externalize the translation pressure required to turn predictive imagery into accountable geometry, making the plausibility gap operational rather than merely asserted. The paper concludes by outlining implications for authorship, pedagogy, and disciplinary judgement in emerging multi-agent design ecologies.
Preventing Posterior Collapse with DVAE for Text Modeling
Tianbao Song, Zongyi Huang, Xin Liu
et al.
This paper introduces a novel variational autoencoder model termed DVAE to prevent posterior collapse in text modeling. DVAE employs a dual-path architecture within its decoder: path A and path B. Path A feeds the text instances directly into the decoder, whereas path B replaces a subset of word tokens in the text instances with a generic unknown token before they are fed into the decoder. A stopping strategy is implemented, wherein both paths are concurrently active during the early phases of training. As the model progresses towards convergence, path B is removed. To further refine the performance, a KL weight dropout method is employed, which randomly sets certain dimensions of the KL weight to zero during the annealing process. Through path B, DVAE compels the latent variables to encode more information about the input texts; through path A and the stopping strategy, it fully utilizes the expressiveness of the decoder and avoids the local optimum that can arise while path B is active. Furthermore, the KL weight dropout method increases the number of active units within the latent variables. Experimental results show the excellent performance of DVAE in density estimation, representation learning, and text generation.
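The two mechanisms named in this abstract, the path B token replacement and the KL weight dropout, can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the replaced subset is chosen or which rates are used, so independent per-token sampling, the `<unk>` string, the rates, and all function names below are hypothetical.

```python
import random

UNK = "<unk>"  # hypothetical spelling of the generic unknown token


def mask_tokens(tokens, replace_rate, rng):
    """Path B input: replace each token with UNK with probability replace_rate."""
    return [UNK if rng.random() < replace_rate else token for token in tokens]


def kl_weight_dropout(kl_weight, latent_dim, drop_rate, rng):
    """Per-dimension KL weights, each zeroed with probability drop_rate."""
    return [
        0.0 if rng.random() < drop_rate else kl_weight for _ in range(latent_dim)
    ]


def weighted_kl(kl_per_dim, weights):
    """KL term of the objective with one weight per latent dimension."""
    return sum(w * kl for w, kl in zip(weights, kl_per_dim))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sentence = ["the", "model", "encodes", "the", "input", "text"]

path_a_input = list(sentence)                  # path A: unchanged tokens
path_b_input = mask_tokens(sentence, 0.5, rng)  # path B: partly masked tokens

# Annealed KL weight of 0.5 on an 8-dimensional latent, with dropout.
weights = kl_weight_dropout(0.5, 8, 0.25, rng)
kl_term = weighted_kl([0.2] * 8, weights)
```

Because path B hides some of the decoder's own input, the decoder has to rely more on the latent variables, while zeroing the KL weight on random dimensions temporarily removes the pressure that would otherwise push those dimensions towards the prior.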
Framework of Variables Impacting Form Generation in Parametric Architecture Using Computational Design Tools (Rhino-Grasshopper)
Shady Ayoub, Sherif El-Attar, Eman Shaqoor
et al.
Many variables affect form generation in architecture, and there are various trends with different bases for generating form. The study aims to formulate a matrix containing many variables for parametric form generation. The research methodology is divided into three approaches: a literature review, an analysis, and a case study. The matrix contains architectural style, formal characteristics, mathematical parameters, parametric tools, generation method, and application using Rhino/Grasshopper. The process of designing a building's exterior begins with identifying influencing factors, such as spatial context, programming needs, and parametric variables, followed by classifying and describing parameters, including geometric dimensions and their roles. Possible values for each parameter are determined to set design boundaries. Parametric tools like Rhino and Grasshopper are then integrated to facilitate modeling, simulation, and iterative exploration of design alternatives, while performance simulations evaluate factors like energy efficiency and thermal comfort. The design undergoes iterative improvements until it meets the project's specific goals and requirements, ensuring a systematic and efficient approach to achieving an optimal exterior design. The conclusion is that the proposed matrix can be used to evaluate the generated forms and select a more suitable form based on Height, Number of Floors, Number of Faces, Bending Angle, and Rotation Angle, according to the matrix variables.
Engineering (General). Civil engineering (General)
Improvement scheme of OSS intelligence capability for autonomous network
ZHAO Yongjian, ZHAO Zhanchun, ZHANG Ding
et al.
Driven by both business and technology, the communication industry is focused on smart network operations. It is an industry consensus to promote intelligent network operation with autonomous networks as the driving force. The analysis showed that the operational support system (OSS) was the core component of the three-layer architecture of autonomous networks, and that the key to raising the level of autonomous networks was to enhance the intelligence capability of the OSS. Specific implementation schemes were elaborated, such as the method for defining the business scope of OSS products, the analysis of shortcomings in OSS product capabilities, and the systematic improvement of OSS product capabilities driven by autonomous networks. The solution was described in detail using the digital operation value scenario of broadband services as an example. Finally, the intelligence capability map of OSS driven by autonomous networks was discussed. Improving the intelligence capability of OSS for autonomous networks can effectively advance the research and development direction of OSS, and guide operators in planning and developing OSS products.
Telecommunication, Technology
Enhancing Emergency Vehicle Detection: A Deep Learning Approach with Multimodal Fusion
Muhammad Zohaib, Muhammad Asim, Mohammed ELAffendi
Emergency vehicle detection plays a critical role in ensuring timely responses and reducing accidents in modern urban environments. However, traditional methods that rely solely on visual cues face challenges, particularly in adverse conditions. The objective of this research is to enhance emergency vehicle detection by leveraging the synergies between acoustic and visual information. By incorporating advanced deep learning techniques for both acoustic and visual data, our aim is to significantly improve the accuracy and response times. To achieve this goal, we developed an attention-based temporal spectrum network (ATSN) with an attention mechanism specifically designed for ambulance siren sound detection. In parallel, we enhanced visual detection tasks by implementing a Multi-Level Spatial Fusion YOLO (MLSF-YOLO) architecture. To combine the acoustic and visual information effectively, we employed a stacking ensemble learning technique, creating a robust framework for emergency vehicle detection. This approach capitalizes on the strengths of both modalities, allowing for a comprehensive analysis that surpasses existing methods. Through our research, we achieved remarkable results, including a misdetection rate of only 3.81% and an accuracy of 96.19% when applied to visual data containing emergency vehicles. These findings represent significant progress in real-world applications, demonstrating the effectiveness of our approach in improving emergency vehicle detection systems.