Video sharing platforms (VSPs) have become central information hubs but also facilitate the spread of information disorder, from misleading narratives to fabricated content. This survey synthesizes research on VSPs' multimedia ecosystems across three dimensions: (1) types of information disorder, (2) methodological approaches, and (3) platform features. We conclude by identifying key challenges and open questions for future research.
Luca Rossetto, Klaus Schoeffmann, Cathal Gurrin, et al.
This report presents the results of the 13th Video Browser Showdown, held at the 2024 International Conference on Multimedia Modeling on the 29th of January 2024 in Amsterdam, the Netherlands.
While traditional video representations are organized around discrete image frames, event-based video is a new paradigm that forgoes image frames altogether. Rather, pixel samples are temporally asynchronous and independent of one another. Until now, researchers have lacked a cohesive software framework for exploring the representation, compression, and applications of event-based video. I present the ADΔER software suite to fill this gap. This framework includes utilities for transcoding framed and multimodal event-based video sources to a common representation, rate control mechanisms, lossy compression, application support, and an interactive GUI for transcoding and playback. In this paper, I describe these various software components and their usage.
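The frame-free paradigm can be illustrated with a toy transcoder: an event fires for a pixel only when its intensity has changed appreciably since that pixel's last event, so each pixel's samples are asynchronous and independent. This is a minimal sketch of the general idea under assumed simplifications, not the actual ADΔER representation (which also encodes per-event integration times):

```python
def frames_to_events(frames, delta=10):
    """Convert framed video to per-pixel asynchronous events.

    An event (t, x, y, intensity) fires only when a pixel's intensity
    changes by more than `delta` since its last event, so static regions
    produce no samples. Illustrative sketch only, not ADΔER's format.
    """
    events = []   # (t, x, y, intensity) tuples
    last = {}     # last emitted intensity per pixel
    for t, frame in enumerate(frames):
        for y, row in enumerate(frame):
            for x, v in enumerate(row):
                if (x, y) not in last or abs(v - last[(x, y)]) > delta:
                    events.append((t, x, y, v))
                    last[(x, y)] = v
    return events
```

Note that the second frame below produces no events at all, which is where the compression potential of the paradigm comes from.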
Rehana Mahfuz, Yinyi Guo, Arvind Krishna Sridhar, et al.
Metrics for evaluating audio captions typically provide only a score, with little explanation of what may be wrong when the score is low; manual human intervention is needed to find the shortcomings of a caption. In this work, we introduce a metric which automatically identifies the shortcomings of an audio caption by detecting the misses and false alarms in a candidate caption with respect to a reference caption, and reports the recall, precision, and F-score. Such a metric is very useful for profiling the deficiencies of an audio captioning model, which is a milestone towards improving the quality of audio captions.
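The miss/false-alarm idea can be sketched with plain word overlap. This is an illustrative assumption, not the authors' metric, which presumably matches semantic units rather than raw tokens:

```python
def caption_diagnostics(reference, candidate):
    """Toy diagnostic: misses, false alarms, precision/recall/F-score.

    Word-overlap is a stand-in here; the actual metric described above
    works at a finer semantic level (an assumption of this sketch).
    """
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    misses = ref - cand          # reference content absent from candidate
    false_alarms = cand - ref    # candidate content absent from reference
    hits = ref & cand
    precision = len(hits) / len(cand) if cand else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return misses, false_alarms, precision, recall, f_score
```

Unlike a single opaque score, the returned `misses` and `false_alarms` sets point directly at what the captioning model omitted or hallucinated.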
Inferring the quality of network services is a vital basis of optimization for network operators. However, prevailing real-time video streaming applications adopt encryption for security, which makes it difficult to extract Quality of Service (QoS) indicators of real-time video. In this paper, we propose DaI, a traffic-based real-time video quality estimator. DaI partially decrypts the encrypted real-time video data and applies machine learning methods to estimate key objective Quality of Experience (QoE) metrics of real-time video. According to the experimental results, DaI can estimate objective QoE metrics with an average accuracy of 79%.
In this paper, we investigate for the first time whether robust hashing can reliably detect fake images even when multiple manipulation techniques, such as JPEG compression, are applied to them. In an experiment, the proposed fake detection with robust hashing is demonstrated to outperform a state-of-the-art method on various datasets, including fake images generated with GANs.
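A minimal sketch of the compare-by-robust-hash idea, using a toy average hash on a nested-list grayscale image. The paper's hashing scheme is more elaborate, and `threshold` is a hypothetical parameter introduced here for illustration:

```python
def average_hash(pixels):
    """Toy robust hash: threshold each pixel against the image mean.

    Mild distortions (e.g. light JPEG compression) barely move the hash,
    while content manipulation flips many bits. Sketch only; the scheme
    referenced above is more sophisticated.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p >= mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of differing hash bits."""
    return sum(a != b for a, b in zip(h1, h2))

def looks_fake(original, suspect, threshold=1):
    """Flag the suspect if its hash drifts past the (hypothetical) threshold."""
    return hamming(average_hash(original), average_hash(suspect)) > threshold
```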
Multimedia-related research and development has evolved rapidly in the last few years with advancements in hardware, software, and network infrastructures. As a result, multimedia has been integrated into domains like healthcare and medicine, human facial feature extraction and tracking, pose recognition, and disparity estimation. This survey gives an overview of the various multimedia technologies and algorithms developed in these domains.
Due to its excellent compression efficiency and high real-time performance, x265 is widely used in practical applications. Combined with CU-tree-based pre-analysis, x265 rate control can achieve high rate-distortion (R-D) performance. However, the pre-analysis information is not fully utilized, and the accuracy of rate control in x265 is not satisfactory because of an empirical linear model. In this paper, we propose an improved cost-guided rate control scheme for x265. Firstly, the pre-analysis information is further used to refine the bit allocation. Secondly, CU-tree is combined with the lambda-domain model for more accurate rate control and higher R-D performance. Experimental results show that, compared with the original x265, our method can achieve a 10.3% BD-rate gain with only 0.22‰ bitrate error.
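The lambda-domain model referred to above maps a target bits-per-pixel budget to a Lagrange multiplier via lambda = alpha * bpp^beta. The sketch below shows that mapping together with a naive cost-proportional bit allocation; the constants and the allocation rule are illustrative assumptions, not x265's actual update logic:

```python
# Hypothetical starting constants; a real encoder adapts alpha/beta per frame.
ALPHA, BETA = 3.2, -1.367

def bpp_to_lambda(bpp, alpha=ALPHA, beta=BETA):
    """lambda = alpha * bpp^beta: fewer bits per pixel -> larger lambda,
    i.e. the RDO leans harder toward rate saving."""
    return alpha * (bpp ** beta)

def allocate_bits(frame_costs, total_bits):
    """Cost-guided bit allocation sketch: each frame receives bits in
    proportion to its pre-analysis cost (a simplification of the
    CU-tree-informed refinement described above)."""
    total_cost = sum(frame_costs)
    return [total_bits * c / total_cost for c in frame_costs]
```

Because beta is negative, tightening the per-pixel budget raises lambda, which is the mechanism the lambda-domain model uses to hit a bitrate target.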
Lucjan Janowski, Jakub Nawała, Werner Robitza, et al.
It is believed that consistent notation helps the research community in many ways. First and foremost, it provides a consistent interface for communication: subjective experiments described according to uniform rules are easier to understand and analyze, and comparing various results is less complicated. In this publication we describe the notation proposed by the SAM (Statistical Analysis and Methods) working group of VQEG (Video Quality Experts Group).
This short paper provides further details of the Sloth Search System, which was developed by the NECTEC team for the Video Browser Showdown (VBS) 2018.
Jean-Marc Valin, Nathan E. Egge, Thomas Daede, et al.
Daala is a new royalty-free video codec based on perceptually-driven coding techniques. We explore using its keyframe format for still picture coding and show how it has improved over the past year. We believe the technology used in Daala could be the basis of an excellent, royalty-free image format.
Reading text is one of the essential needs of visually impaired people. We developed a mobile system that can read Turkish scene and book text, using a fast gradient-based multi-scale text detection algorithm for real-time operation and the Tesseract OCR engine for character recognition. We evaluated the OCR accuracy and running time of our system on a new, publicly available mobile Turkish scene text dataset we constructed, and also compared it with state-of-the-art systems. Our system proved to be much faster and able to run on a mobile device, with OCR accuracy comparable to the state of the art.
In DCT-based digital watermarking, the watermark is located within a range of DCT coefficients of the cover image. In this paper, a new method using a combination of the DCT and PCA transforms is proposed in order to use the low-frequency band. Compared with other DCT methods, our method is robust, keeps the quality of the cover image, and also increases the capacity of the watermarking.
For watermarking of a digital grayscale image, its Gray-code planes are used. With the help of the introduced representation over Gray planes, LSB embedding and detection are discussed. It is found that data (a binary image) hidden in the Gray planes is more robust to JPEG lossy compression than data hidden in the bit planes.
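Gray-plane embedding differs from ordinary bit-plane LSB only in the domain where the LSB is set. A minimal sketch, assuming standard binary-reflected Gray code; the paper's detection procedure is not reproduced here:

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer pixel value."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse Gray code (prefix-XOR of the bits)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def embed_bit(pixel, bit):
    """Set the LSB of the pixel's Gray-code representation, then map back.

    Sketch of Gray-plane LSB embedding only; robustness to JPEG comes from
    properties of the Gray domain discussed in the paper, not shown here.
    """
    g = to_gray(pixel)
    return from_gray((g & ~1) | bit)

def extract_bit(pixel):
    """Read the hidden bit back out of the Gray-code LSB plane."""
    return to_gray(pixel) & 1
```

Embedding and extraction round-trip exactly for every 8-bit pixel value, since Gray coding is a bijection on 0..255.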
Cost-efficient compressive sensing of big media data with fast reconstruction of high-quality results is very challenging. In this paper, we propose a new large-scale image compressive sensing method, composed of an operator-based strategy in the context of the fixed-point continuation method and a weighted LASSO with a tree-structured sparsity pattern. The main characteristic of our method is that it is free from any assumptions and restrictions. The feasibility of our method is verified via simulations and comparisons with state-of-the-art algorithms.
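The weighted-LASSO component can be illustrated by its proximal (shrinkage) step, which a fixed-point-continuation-style solver applies repeatedly with a decreasing threshold. This sketch omits the tree-structured sparsity pattern and the measurement operator, both central to the actual method:

```python
def weighted_soft_threshold(x, weights, lam):
    """Proximal step of weighted LASSO: shrink coefficient i by lam * w_i.

    Coefficients smaller in magnitude than their own threshold are zeroed,
    which is what produces sparsity. One inner step of a continuation
    solver (sketch only; tree structure and operator not modeled).
    """
    out = []
    for xi, wi in zip(x, weights):
        t = lam * wi
        if xi > t:
            out.append(xi - t)
        elif xi < -t:
            out.append(xi + t)
        else:
            out.append(0.0)
    return out
```

A continuation method would start with a large `lam` (aggressive sparsification) and shrink it toward the target value across outer iterations.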
This paper analyzes a revised fragile watermarking scheme proposed by Botta et al., itself a revision of the watermarking scheme previously proposed by Rawat et al. A new attack is presented that allows an attacker to apply a valid watermark to tampered images, thereby circumventing the protection that the watermarking scheme under study was supposed to offer. Furthermore, the presented attack has very low computational and memory requirements.
An adaptive visible/invisible watermarking scheme based on the Hadamard transform is proposed to protect the privacy and preserve the copyright of digital data. The watermark is embedded using a scaling factor derived from the image, whose value depends on a control parameter; the control parameter also determines whether the watermarking is visible or invisible. The proposed Hadamard transform domain method is more robust against image/signal processing attacks. Furthermore, various performance analyses and experimental results confirm its efficiency.
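The scaling-factor idea for the visible case can be sketched as a simple blend in the spatial domain. This is purely illustrative: the scheme above operates on Hadamard transform coefficients, and there the scaling factor is computed from a control parameter rather than supplied directly:

```python
def embed_visible(cover, mark, alpha):
    """Visible watermarking as a convex blend controlled by scaling factor alpha.

    alpha near 0 keeps the cover dominant (watermark barely visible);
    alpha near 1 makes the watermark dominate. Spatial-domain stand-in
    for the Hadamard-domain embedding described above.
    """
    return [[(1 - alpha) * c + alpha * w for c, w in zip(c_row, w_row)]
            for c_row, w_row in zip(cover, mark)]
```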
Although techniques for separate image and audio steganography are widely known, relatively little has been described concerning the hiding of information within video streams ("video steganography"). In this paper we review the current state of the art in this field, and describe the key issues we have encountered in developing a practical video steganography system. A supporting video is also available online at http://www.youtube.com/watch?v=YhnlHmZolRM
With the advancement of communication technology, data is exchanged digitally over networks. On the other hand, the same technology has also proven to be a tool for unauthorized access by attackers. Thus, the security of digitally transmitted data should receive prime focus. Data hiding is a common approach to securing data: in steganography, the very existence of the data is concealed. Genetic algorithms (GAs) are an emerging component of AI for providing suboptimal (near-optimal) solutions. In this paper, the use of GAs in steganography is explored to identify future scope for research.