Results for "cs.MM"

Showing 20 of ~183354 results · from CrossRef, arXiv

arXiv Open Access 2025
A Survey of Information Disorder on Video-Sharing Platforms

Meiyu Li, Wei Ai, Naeemul Hassan

Video sharing platforms (VSPs) have become central information hubs but also facilitate the spread of information disorder, from misleading narratives to fabricated content. This survey synthesizes research on VSPs' multimedia ecosystems across three dimensions: (1) types of information disorder, (2) methodological approaches, and (3) platform features. We conclude by identifying key challenges and open questions for future research.

en cs.MM, cs.CY
arXiv Open Access 2024
Results of the 2024 Video Browser Showdown

Luca Rossetto, Klaus Schoeffmann, Cathal Gurrin et al.

This report presents the results of the 13th Video Browser Showdown, held at the 2024 International Conference on Multimedia Modeling on the 29th of January 2024 in Amsterdam, the Netherlands.

en cs.MM, cs.IR
arXiv Open Access 2024
An Open Software Suite for Event-Based Video

Andrew C. Freeman

While traditional video representations are organized around discrete image frames, event-based video is a new paradigm that forgoes image frames altogether. Rather, pixel samples are temporally asynchronous and independent of one another. Until now, researchers have lacked a cohesive software framework for exploring the representation, compression, and applications of event-based video. I present the ADΔER software suite to fill this gap. This framework includes utilities for transcoding framed and multimodal event-based video sources to a common representation, rate control mechanisms, lossy compression, application support, and an interactive GUI for transcoding and playback. In this paper, I describe these various software components and their usage.

en cs.MM, cs.CV
arXiv Open Access 2023
Detecting False Alarms and Misses in Audio Captions

Rehana Mahfuz, Yinyi Guo, Arvind Krishna Sridhar et al.

Metrics to evaluate audio captions simply provide a score without much explanation regarding what may be wrong in case the score is low. Manual human intervention is needed to find any shortcomings of the caption. In this work, we introduce a metric which automatically identifies the shortcomings of an audio caption by detecting the misses and false alarms in a candidate caption with respect to a reference caption, and reports the recall, precision and F-score. Such a metric is very useful in profiling the deficiencies of an audio captioning model, which is a milestone towards improving the quality of audio captions.

en cs.MM
arXiv Open Access 2022
DaI: Decrypt and Infer the Quality of Real-Time Video Streaming

Sheng Cheng

Inferring the quality of network services is the vital basis of optimization for network operators. However, prevailing real-time video streaming applications adopt encryption for security, which makes it difficult to extract Quality of Service (QoS) indicators of real-time video. In this paper, we propose DaI, a traffic-based real-time video quality estimator. DaI can partially decrypt the encrypted real-time video data and apply machine learning methods to estimate key objective Quality of Experience (QoE) metrics of real-time video. According to the experimental results, DaI can estimate objective QoE metrics with an average accuracy of 79%.

en cs.MM, cs.NI
arXiv Open Access 2021
Fake-image detection with Robust Hashing

Miki Tanaka, Hitoshi Kiya

In this paper, we investigate, for the first time, whether robust hashing can robustly detect fake images even when multiple manipulation techniques such as JPEG compression are applied to the images. In an experiment, the proposed fake detection with robust hashing is demonstrated to outperform a state-of-the-art method on various datasets, including fake images generated with GANs.

en cs.MM, cs.CV
arXiv Open Access 2021
Multimedia Technology Applications and Algorithms: A Survey

Palak Tiwary, Sanjida Ahmed

Multimedia-related research and development has evolved rapidly in the last few years with advancements in hardware, software and network infrastructures. As a result, multimedia has been integrated into domains like healthcare and medicine, human facial feature extraction and tracking, pose recognition, disparity estimation, etc. This survey gives an overview of the various multimedia technologies and algorithms developed in the domains mentioned.

en cs.MM, cs.CV
arXiv Open Access 2021
Revisiting Pre-analysis Information Based Rate Control in x265

Hewei Liu

Due to its excellent compression and high real-time performance, x265 is widely used in practical applications. Combined with CU-tree based pre-analysis, x265 rate control can obtain high rate-distortion (R-D) performance. However, the pre-analysis information is not fully utilized, and the accuracy of rate control is not satisfactory in x265 because of an empirical linear model. In this paper, we propose an improved cost-guided rate control scheme for x265. Firstly, the pre-analysis information is further used to refine the bit allocation. Secondly, CU-tree is combined with the lambda-domain model for more accurate rate control and higher R-D performance. Experimental results show that compared with the original x265, our method can achieve 10.3% BD-rate gain with only 0.22‰ bitrate error.

en cs.MM
arXiv Open Access 2019
Notation for Subject Answer Analysis

Lucjan Janowski, Jakub Nawała, Werner Robitza et al.

It is believed that consistent notation helps the research community in many ways. First and foremost, it provides a consistent interface of communication. Subjective experiments described according to uniform rules are easier to understand and analyze. Additionally, a comparison of various results is less complicated. In this publication we describe the notation proposed by the VQEG (Video Quality Experts Group) working group SAM (Statistical Analysis and Methods).

en cs.MM
arXiv Open Access 2016
Daala: A Perceptually-Driven Still Picture Codec

Jean-Marc Valin, Nathan E. Egge, Thomas Daede et al.

Daala is a new royalty-free video codec based on perceptually-driven coding techniques. We explore using its keyframe format for still picture coding and show how it has improved over the past year. We believe the technology used in Daala could be the basis of an excellent, royalty-free image format.

en cs.MM
arXiv Open Access 2016
MT3S: Mobile Turkish Scene Text-to-Speech System for the Visually Impaired

Muhammet Bastan, Hilal Kandemir, Busra Canturk

Reading text is one of the essential needs of visually impaired people. We developed a mobile system that can read Turkish scene and book text, using a fast gradient-based multi-scale text detection algorithm for real-time operation and the Tesseract OCR engine for character recognition. We evaluated the OCR accuracy and running time of our system on a new, publicly available mobile Turkish scene text dataset we constructed and also compared it with state-of-the-art systems. Our system proved to be much faster, able to run on a mobile device, with OCR accuracy comparable to the state of the art.

en cs.MM
arXiv Open Access 2015
A New Method For Digital Watermarking Based on Combination of DCT and PCA

Arash Saboori, S. Abolfazl Hosseini

In digital watermarking with the DCT method, the watermark is located within a range of DCT coefficients of the cover image. In this paper, to use the low-frequency band, a new method is proposed that combines the DCT and PCA transforms. Compared with other DCT methods, the proposed method is robust, preserves the quality of the cover image, and increases the watermarking capacity.

arXiv Open Access 2015
Embedding of binary image in the Gray planes

V. N. Gorbachev, L. A. Denisov, E. M. Kainarova

The Gray planes of a digital grayscale image are used for watermarking. With the help of the introduced representation over Gray planes, the LSB embedding method and its detection are discussed. It is found that data, a binary image, hidden in the Gray planes is more robust to lossy JPEG compression than data hidden in the bit planes.

en cs.MM
arXiv Open Access 2015
Compressive Sensing of Large-Scale Images: An Assumption-Free Approach

Wei-Jie Liang, Gang-Xuan Lin, Chun-Shien Lu

Cost-efficient compressive sensing of big media data with fast, high-quality reconstruction is very challenging. In this paper, we propose a new large-scale image compressive sensing method, composed of an operator-based strategy in the context of the fixed point continuation method and weighted LASSO with a tree-structured sparsity pattern. The main characteristic of our method is that it is free from any assumptions and restrictions. The feasibility of our method is verified via simulations and comparisons with state-of-the-art algorithms.

en cs.MM, cs.IT
arXiv Open Access 2015
On the Security of a Revised Fragile Watermarking Scheme

Daniel Caragata

This paper analyzes a revised fragile watermarking scheme proposed by Botta et al. which was developed as a revision of the watermarking scheme previously proposed by Rawat et al. A new attack is presented that allows an attacker to apply a valid watermark on tampered images, therefore circumventing the protection that the watermarking scheme under study was supposed to offer. Furthermore, the presented attack has very low computational and memory requirements.

en cs.MM, cs.CR
arXiv Open Access 2014
An Adaptive Watermarking Process in Hadamard Transform

Parvathavarthini S., Shanthakumari R

An adaptive visible/invisible watermarking scheme is presented to protect the privacy and preserve the copyright of digital data, using the Hadamard transform based on the scaling factor of the image. The value of the scaling factor depends on the control parameter. The scaling factor is calculated to embed the watermark. Depending on the control parameter, visible or invisible watermarking is determined. The proposed Hadamard transform domain method is more robust against image/signal processing attacks. Furthermore, various performance analyses and experimental results confirm the efficiency of the proposed method.

en cs.MM
arXiv Open Access 2014
Developing a Video Steganography Toolkit

James Ridgway, Mike Stannett

Although techniques for separate image and audio steganography are widely known, relatively little has been described concerning the hiding of information within video streams ("video steganography"). In this paper we review the current state of the art in this field, and describe the key issues we have encountered in developing a practical video steganography system. A supporting video is also available online at http://www.youtube.com/watch?v=YhnlHmZolRM

en cs.MM
arXiv Open Access 2014
Genetic Algorithm in Audio Steganography

Manisha Rana, Rohit Tanwar

With the advancement of communication technology, data is exchanged digitally over the network. On the other hand, the technology has also proven to be a tool that gives attackers unauthorized access. Thus the security of data to be transmitted digitally should get prime focus. Data hiding is the common approach to secure data. In steganography, the existence of data is concealed. GA is an emerging component of AI that provides suboptimal solutions. In this paper the use of GA in steganography is explored to find the future scope of research.

Page 4 of 9168