Small object detection in aerial images remains challenging due to limited pixel resolution, complex backgrounds, and high sensitivity to bounding box variations. Although Detection Transformer (DETR)-based methods have made progress, they still face significant limitations in small object detection, primarily because they rely on global features, which fail to capture fine-grained details and are sensitive to background noise and bounding box variations. This study proposes Progressive Fusion (PF)-DETR, a model specifically designed to refine small object detection through progressive feature fusion. Central to our approach is the Cross-Scale Feature Fusion with S2 (S2-CCFF) module, which integrates multi-level features with an S2 layer to preserve small object details. Coupled with SPace-to-Depth convolution (SPDConv) downsampling, this module reduces computational cost while maintaining critical information. Additionally, the Cross Stage Partial Omni-Kernel Fusion (CSPOK-Fusion) module achieves progressive fusion by gradually integrating multi-scale features from local, large, and global branches through successive convolutional layers. This refines the feature representation at each stage and mitigates background interference and occlusion effects, enhancing cross-scale spatial representation. We further introduce a Parallelized Patch-Aware (PPA) attention module in the backbone network to prioritize small object features and reduce information loss. Finally, a Normalized Wasserstein Distance (NWD) loss function is incorporated to increase robustness against minor localization errors by aligning bounding box position and shape, thus boosting detection accuracy. Experimental results on the VisDrone and NWPU VHR-10 datasets show that PF-DETR surpasses existing state-of-the-art methods, establishing its effectiveness and adaptability in complex aerial detection tasks.
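To illustrate why an NWD-style loss tolerates small localization errors, the sketch below follows the commonly used formulation in which each box (cx, cy, w, h) is modeled as a 2D Gaussian and the squared 2-Wasserstein distance reduces to a squared Euclidean distance between (cx, cy, w/2, h/2) vectors. It is not taken from the PF-DETR implementation: the function names and the normalizing constant `c` are assumptions chosen for illustration, and `c` is normally tuned per dataset.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two boxes given as (cx, cy, w, h).

    c is an assumed, dataset-dependent normalizing constant (placeholder value).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Map the distance into (0, 1]; identical boxes give 1.
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Loss form: 0 for identical boxes, approaching 1 as boxes diverge."""
    return 1.0 - nwd(box_a, box_b, c)


# A 2-pixel shift of a tiny 4x4 box: the boxes still overlap only partly,
# yet the NWD similarity decays smoothly rather than dropping sharply.
similarity = nwd((10.0, 10.0, 4.0, 4.0), (12.0, 10.0, 4.0, 4.0))
```

Unlike IoU, this similarity stays non-zero and smooth even when two small boxes do not overlap at all, which is the property the abstract relies on.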
Container startup latency is a critical performance metric for CI/CD pipelines, serverless computing, and auto-scaling systems, yet practitioners lack empirical guidance on how infrastructure choices affect this latency. We present a systematic measurement study that decomposes Docker container startup into constituent operations across three heterogeneous infrastructure tiers: Azure Premium SSD (cloud SSD), Azure Standard HDD (cloud HDD), and macOS Docker Desktop (developer workstation with hypervisor-based virtualization). Using a reproducible benchmark suite that executes 50 iterations per test across 10 performance dimensions, we quantify previously under-characterized relationships between infrastructure configuration and container runtime behavior. Our key findings include: (1) container startup is dominated by runtime overhead rather than image size, with only 2.5% startup variation across images ranging from 5 MB to 155 MB on SSD; (2) storage tier selection imposes a 2.04x startup penalty (HDD 1157 ms vs. SSD 568 ms); (3) Docker Desktop's hypervisor layer introduces a 2.69x startup penalty and 9.5x higher CPU throttling variance compared to native Linux; (4) OverlayFS write performance collapses by up to two orders of magnitude compared to volume mounts on SSD-backed storage; and (5) Linux namespace creation contributes only 8-10 ms (<1.5%) of total startup time. All measurement scripts, raw data, and analysis tools are publicly available.
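As a rough sketch of the measurement pattern described above (repeated timed iterations, then a ratio between tiers), the following uses only the Python standard library. It is not the authors' benchmark suite: `time_iterations` and `slowdown` are hypothetical helper names, and the timed action here is a no-op placeholder rather than a real container launch.

```python
import statistics
import time


def time_iterations(action, iterations=50):
    """Run `action` repeatedly and return per-iteration wall-clock times in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        action()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def slowdown(slow_ms, fast_ms):
    """Ratio of a slower startup time to a faster one."""
    return slow_ms / fast_ms


# Placeholder action; a real run would launch a container here instead.
samples = time_iterations(lambda: None)
median_ms = statistics.median(samples)

# Reproduce the reported HDD-vs-SSD penalty from the abstract's own figures.
hdd_vs_ssd = round(slowdown(1157, 568), 2)  # 2.04
```

In a real measurement, `action` would wrap the container start being studied, and the median (rather than the mean) is one reasonable choice for summarizing skewed latency samples.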
Taking the domain of polar questions in Bosnian/Croatian/Montenegrin/Serbian (BCMS) as the empirical background, the paper probes into the syntax–phonology (CS-PF) interface and discusses insertion and movement as PF-repair strategies mitigating the lack of convergence at PF. Contra previous accounts, the analysis treats li (lexicalization of Q) as a ‘run-of-the-mill’ 2P clitic in BCMS, whose host cannot always be provided by syntax. I provide evidence against Prosodic Inversion, ‘the usual suspect’ for post-syntactic movement in Slavic, thus adding to the body of evidence that Prosodic Inversion does not take place in BCMS. I argue that the PF Movement in such cases has to be raising and adopt Local Dislocation to account for them. Probing into the interaction between Future I and polar questions provides further insights into the ordering of PF Movement operations in BCMS.
ppOpen-AT is a domain-specific language designed to ease the workload for developers creating libraries with auto-tuning (AT) capabilities. It consists of a set of directives that allow for the automatic generation of code necessary for AT by placing annotations in the source program. This approach significantly reduces the effort required by numerical library developers. This technical report details the implementation of the AT software and its extended functions, and provides an explanation of the internal specifications of ppOpen-AT.
Fraser Young, Rachel Mason, Rosie E. Morris et al.
Walking/gait quality is a useful clinical tool to assess general health and is now broadly described as the sixth vital sign. This has been mediated by advances in sensing technology, including instrumented walkways and three-dimensional motion capture. However, it is wearable technology innovation that has spawned the highest growth in instrumented gait assessment due to the capabilities for monitoring within and beyond the laboratory. Specifically, instrumented gait assessment with wearable inertial measurement units (IMUs) has provided more readily deployable devices for use in any environment. Contemporary IMU-based gait assessment research has shown evidence of the robust quantifying of important clinical gait outcomes in, e.g., neurological disorders to gather more insightful habitual data in the home and community, given the relatively low cost and portability of IMUs. The aim of this narrative review is to describe the ongoing research regarding the need to move gait assessment out of bespoke settings into habitual environments and to consider the shortcomings and inefficiencies that are common within the field. Accordingly, we broadly explore how the Internet of Things (IoT) could better enable routine gait assessment beyond bespoke settings. As IMU-based wearables and algorithms mature in their corroboration with alternate technologies, such as computer vision, edge computing, and pose estimation, the role of IoT communication will enable new opportunities for remote gait assessment.
Running gait assessment is essential for the development of technical optimization strategies as well as to inform injury prevention and rehabilitation. Currently, running gait assessment relies on (i) visual assessment, exhibiting subjectivity and limited reliability, or (ii) use of instrumented approaches, which often carry high costs and can be intrusive due to the attachment of equipment to the body. Here, the use of an IoT-enabled markerless computer vision smartphone application based upon Google’s pose estimation model BlazePose was evaluated for running gait assessment for use in low-resource settings. That human pose estimation architecture was used to extract contact time, swing time, step time, knee flexion angle, and foot strike location from a large cohort of runners. The gold-standard Vicon 3D motion capture system was used as a reference. The proposed approach performs robustly, demonstrating good (ICC(2,1) > 0.75) to excellent (ICC(2,1) > 0.90) agreement in all running gait outcomes. Additionally, temporal outcomes exhibit low mean error (0.01–0.014 s) in left foot outcomes. However, there are some discrepancies in right foot outcomes, due to occlusion. This study demonstrates that the proposed low-cost and markerless system provides accurate running gait assessment outcomes. The approach may help routine running gait assessment in low-resource environments.
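The agreement statistic reported above, ICC(2,1), is the two-way random-effects, absolute-agreement, single-measurement intraclass correlation. The sketch below computes it from the standard ANOVA mean squares in pure Python. It is an illustration rather than the study's analysis code: the function name `icc_2_1` is an assumption, and the example values are made up.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per method."""
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of measurement methods (raters)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up example: the second method reads exactly 1 unit higher than the first.
offset_icc = icc_2_1([[1, 2], [2, 3], [3, 4]])  # 2/3
```

Because ICC(2,1) measures absolute agreement, a constant offset between two methods lowers the value even though the two sets of readings are perfectly correlated.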
This paper investigates the use of OpenMP for parallel post-processing in object detection on personal Android devices, where resources such as computational power, memory, and battery are limited. Specifically, it explores various configurations of thread count, CPU affinity, and chunk size on a Redmi Note 10 Pro with an ARM Cortex A76 CPU. The study finds that using four threads offers a maximum post-processing speedup of 2.3x but increases overall inference time by 2.7x. A balanced configuration of two threads achieves a 1.8x speedup in post-processing and a 2% improvement in overall program performance.
Background: Turning is a complex measure of gait that accounts for over 50% of daily steps. Traditionally, turning has been measured in a research-grade laboratory setting; however, there is demand for a low-cost and portable solution to measure turning using wearable technology. This study aimed to determine the suitability of a low-cost inertial sensor-based device (AX6, Axivity) to assess turning, by simultaneously capturing and comparing to a turn algorithm output from a previously validated reference inertial sensor-based device (Opal), in healthy young adults. Methodology: Thirty participants (aged 23.9 ± 4.89 years) completed the following turning protocol wearing the AX6 and reference device: a turn course, a two-minute walk (including 180° turns), and turning in place, alternating 360° turns to the right and left. Both devices were attached at the lumbar spine, the Opal via a belt and the AX6 via double-sided tape attached directly to the skin. Turning measures included number of turns, average turn duration, angle, velocity, and jerk. Results: Agreement between the outcomes from the AX6 and reference device was good to excellent for all turn characteristics (all ICCs > 0.850) during the 360° turning task. There was good agreement for all turn characteristics (all ICCs > 0.800) during the two-minute walk task, except for moderate agreement for turn angle (ICC 0.683). Agreement for turn outcomes was moderate to good during the turn course (ICC range: 0.580 to 0.870). Conclusions: A low-cost wearable sensor, AX6, can be a suitable and fit-for-purpose device when used with validated algorithms for assessment of turning outcomes, particularly during continuous turning tasks. Future work needs to determine the suitability and validity of turning in aging and clinical cohorts within low-resource settings.
This note considers the problem of statistical inference of the parameters of the input process to a queue from periodic workload observations. The main focus is the open problem of constructing statistically efficient estimators for a given observation scheme, in the sense of minimizing the asymptotic variance of the estimation error.
Interacting networks are different in nature to single networks. The study of queuing processes on interacting networks is underdeveloped. It presents new mathematical challenges and is of importance to applications. This area of operations research deserves careful study: queuing theory needs to incorporate high-order network interactions in the performance analysis of a queuing system.
With the rapid growth of next-generation sequencing technology and its implementation in clinical practice and life science research, the need for faster and more efficient data analysis methods has become pressing in the field of sequencing. Here we report on the evaluation of an optimized germline mutation calling pipeline, HummingBird, by assessing its performance against the widely accepted BWA-GATK pipeline. We found that the HummingBird pipeline can significantly reduce the running time of the primary data analysis for whole genome sequencing and whole exome sequencing without significantly sacrificing variant calling accuracy. Thus, we conclude that wider use of such software will help to improve the efficiency of primary data analysis for next-generation sequencing.
We present an approach that can be useful when the network or system performance is described by a model that is not Markovian. Although most performance models are based on Markov chains or Markov processes, in some cases the Markov property does not hold. This can occur, for example, when the system exhibits long range dependencies. For such situations, and other non-Markovian cases, our method may provide useful help.
In this paper, we prove that under mild stochastic assumptions, work-conserving disciplines are asymptotically optimal for minimizing total completion time. As a byproduct of our analysis, we obtain a tight upper bound on the competitive ratios of work-conserving disciplines for minimizing flow time.
Data serialization is a common and crucial component in high performance computing. In this paper, I present a C++11-based serialization library for performance-critical systems. It provides an interface similar to that of Boost but is up to 150% faster, and it outperforms several popular serialization libraries.