diveXplore at the Video Browser Showdown 2024
Klaus Schoeffmann, Sahar Nasirihaghighi
Based on our experience from VBS2023 and the feedback from the IVR4B special session at CBMI2023, we have substantially revised the diveXplore system for VBS2024. It now integrates OpenCLIP trained on the LAION-2B dataset for image/text embeddings that are used for free-text and visual similarity search, a query server that can distribute different queries and merge the results, a user interface optimized for fast browsing, and an exploration view for large clusters of similar videos (e.g., weddings, paraglider events, snow and ice scenery).
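As a minimal sketch of the embedding-based similarity search mentioned above, the following ranks items by cosine similarity to a query vector. The 3-dimensional vectors and keyframe names are made up for illustration; in the real system the embeddings would come from OpenCLIP, which is not called here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Return the names of the k items most similar to the query."""
    ranked = sorted(index.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical keyframe embeddings (toy values, not real OpenCLIP output).
INDEX = {
    "wedding_001": [0.9, 0.1, 0.0],
    "paraglider_007": [0.1, 0.9, 0.2],
    "snow_042": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a free-text query.
query = [0.8, 0.2, 0.1]
print(top_k(query, INDEX))
```

The same ranking function serves both free-text and visual similarity search, since only the origin of the query vector differs.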
Point Cloud Streaming with Latency-Driven Implicit Adaptation using MoQ
Andrew Freeman, Michael Rudolph, Amr Rizk
Point clouds are a promising video representation for virtual and augmented reality. Their high bitrate, however, has so far limited the practicality of live streaming systems. In this work, we leverage the delivery timeout feature within the Media Over QUIC protocol to perform implicit server-side adaptation based on an application's latency target. Through experimentation with several publisher and network configurations, we demonstrate that our system unlocks a unique trade-off on a per-client basis: applications with lower latency requirements will receive lower-quality video, while applications with more relaxed latency requirements will receive higher-quality video.
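The per-client trade-off can be illustrated with a toy model. This is not the authors' system and does not use any MoQ API. It assumes each frame is split into quality layers that are sent in order over a link of fixed bandwidth, and that a layer is dropped if it cannot be fully delivered within the subscriber's delivery timeout. The layer sizes, bandwidth and timeouts are hypothetical.

```python
def layers_delivered(layer_sizes_bits, bandwidth_bps, timeout_s):
    """Count how many layers of one frame arrive before the timeout."""
    elapsed = 0.0
    delivered = 0
    for size in layer_sizes_bits:
        elapsed += size / bandwidth_bps  # time to finish sending this layer
        if elapsed > timeout_s:
            break  # this layer and all later ones miss the deadline
        delivered += 1
    return delivered

# Hypothetical values: three layers of one point cloud frame, 100 Mbit/s link.
LAYERS = [2_000_000, 4_000_000, 8_000_000]
BANDWIDTH = 100_000_000

for timeout in (0.03, 0.07, 0.2):
    print(timeout, layers_delivered(LAYERS, BANDWIDTH, timeout))
```

In this model a subscriber with a tight timeout receives only the base layer, while one with a relaxed timeout receives all layers, without any explicit quality negotiation.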
The Metaverse from a Multimedia Communications Perspective
Haiwei Dong, Jeannie S. A. Lee
eXtended reality (XR) technologies such as virtual reality and 360° stereoscopic streaming enable the concept of the Metaverse, an immersive virtual space for collaboration and interaction. To ensure high-fidelity display of immersive media, bandwidth, latency and network traffic patterns need to be considered in order to maintain a user's Quality of Experience (QoE). In this article, examples and calculations are explored to demonstrate the requirements for these parameters. Additionally, future methods such as network-awareness using reinforcement learning (RL) and XR content awareness using spatial or temporal difference in the frames could be explored from a multimedia communications perspective.
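A back-of-the-envelope bandwidth calculation of the kind referred to above might look as follows. The resolution, frame rate, bit depth and compression ratio are illustrative assumptions, not figures taken from the article.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel, views=1):
    """Uncompressed video bitrate in bits per second."""
    return width * height * fps * bits_per_pixel * views

def compressed_bitrate_bps(raw_bps, compression_ratio):
    """Bitrate after applying an assumed codec compression ratio."""
    return raw_bps / compression_ratio

# Assumed stream: 3840x2160 per eye, 60 fps, 24-bit color, two views (stereo).
raw = raw_bitrate_bps(3840, 2160, 60, 24, views=2)
compressed = compressed_bitrate_bps(raw, compression_ratio=100)

print(f"raw: {raw / 1e9:.2f} Gbit/s")
print(f"compressed (assumed 100:1): {compressed / 1e6:.2f} Mbit/s")
```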
Evaluation of a course mediatised with Xerte
Ghalia Merzougui, Roumaissa Dehkal, Maheiddine Djoudi
Interactive multimedia educational content has recently attracted interest as a way to hold learners' attention and improve their understanding. In parallel, several open-source authoring tools offer quick and easy production of this type of content. Our contribution is to mediatize a course, namely 'English', with the authoring system 'Xerte', which is intended both for simple users and for developers working in ActionScript. A trial of the course was conducted on a sample of students from a private school. At the end of the trial, we administered a questionnaire to evaluate the system; the results show a favorable reception of interactive multimedia integrated into educational content.
Video Processing on the Edge for Multimedia IoT Systems
Yang Cao, Zeyu Xu, Peng Qin, et al.
In this article, we first survey the current situation of video processing on the edge for multimedia Internet-of-Things (M-IoT) systems in three typical scenarios, i.e., smart cities, satellite networks, and Internet-of-Vehicles. By summarizing a general model of edge video processing, we highlight the importance of developing an edge computing platform. Then, we give a method of implementing cooperative video processing on an edge computing platform based on lightweight virtualization technologies. A performance evaluation is conducted and yields some insightful observations. Moreover, we summarize the challenges and opportunities of realizing effective edge video processing for M-IoT systems.
StegNet: Mega Image Steganography Capacity with Deep Convolutional Network
Pin Wu, Yang Yang, Xiaoqiang Li
Traditional image steganography tends to focus on safely embedding hidden information into cover images, with payload capacity almost neglected. This paper combines recent deep convolutional neural network methods with image-into-image steganography. It successfully hides an image of the same size as the cover image, with a decoding rate of 98.2% or 23.57 bpp (bits per pixel), by changing only 0.76% of the cover image on average. Our method directly learns end-to-end mappings between the cover image and the embedded image and between the hidden image and the decoded image. We further show that our embedded image, despite its mega payload capacity, is still robust to statistical analysis.
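The two capacity figures quoted above appear to be related by simple arithmetic, assuming the hidden image is 24-bit RGB (8 bits per channel). That bit depth is an assumption made here and is not stated in the abstract.

```python
def effective_bpp(decoding_rate, bits_per_pixel=24):
    """Correctly recovered payload bits per cover pixel.

    bits_per_pixel=24 assumes a 24-bit RGB hidden image (an assumption).
    """
    return decoding_rate * bits_per_pixel

print(round(effective_bpp(0.982), 2))
```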
An improved watermarking scheme for Internet applications
Christophe Guyeux, Jacques M. Bahi
In this paper, a data hiding scheme ready for Internet applications is proposed. An existing scheme based on chaotic iterations is improved to respond to some major Internet security concerns, such as digital rights management, communication over hidden channels, and social search engines. By using Reed-Solomon error-correcting codes and the wavelet domain, we show that this data hiding scheme can be improved to address the issues and requirements raised by these Internet fields.
A First Look at Quality of Mobile Live Streaming Experience: the Case of Periscope
Matti Siekkinen, Enrico Masala, Teemu Kämäräinen
Live multimedia streaming from mobile devices is rapidly gaining popularity, but little is known about the quality of experience (QoE) it provides. In this paper, we examine the Periscope service. We first crawl the service in order to understand its usage patterns. Then, we study the protocols used, the typical QoE indicators, such as playback smoothness and latency, video quality, and the energy consumption of the Android application.
Generic-Precision algorithm for DCT-Cordic architectures
Imen Ben Saad, Younes Lahbib, Yassine Hachaïchi, et al.
In this paper we propose a generic algorithm to calculate the rotation parameters of the CORDIC angles required for the Discrete Cosine Transform (DCT) algorithm. This allows the precision of the calculation to be increased to meet any required accuracy. Our contribution is to use this decomposition in a CORDIC-based DCT, which is appropriate for domains that require high quality and top precision. We then propose a hardware implementation of the novel transformation and, as expected, find a substantial improvement in PSNR.
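For readers unfamiliar with CORDIC, the sketch below shows the textbook rotation-mode iteration, in which a rotation angle is decomposed into a signed sum of elementary angles atan(2^-i). It is not the generic-precision algorithm proposed in the paper; the function name and the default iteration count are illustrative choices.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians using shift-and-add style steps.

    Converges for |angle| up to about 1.74 rad. More iterations give a
    smaller residual angle and therefore higher precision.
    """
    elementary = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; accumulate the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotation direction for this step
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * elementary[i]
    return x / gain, y / gain

# A DCT butterfly needs rotations by fixed angles; pi/3 is just an example.
cx, sy = cordic_rotate(1.0, 0.0, math.pi / 3)
print(cx, sy)  # close to cos(pi/3) and sin(pi/3)
```

The sequence of directions `d` chosen per step is what a hardware implementation would store as rotation parameters for each fixed DCT angle.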
A Novel Approach for Image Steganography in Spatial Domain
Fatema Akhter
This paper presents a new approach for hiding information in digital images in the spatial domain. In this approach, three bits of the message are embedded in a pixel using the Lucas number system, but only one bit plane is allowed to be altered. The experimental results show that the proposed method has a larger embedding capacity and a higher peak signal-to-noise ratio than existing methods, and is hardly detectable by steganalysis algorithms.
Arabic Text Watermarking: A Review
Reem Ahmed Alotaibi, Lamiaa A. Elrefaei
The use of the Internet, with its technologies and applications, has increased rapidly, so protecting text from illegal use is much needed. Text watermarking is used for this purpose. Arabic text has many characteristics, such as the existence of diacritics, kashida (extension character) and points above or under its letters. Each Arabic letter can take different shapes with different Unicode code points. These characteristics are utilized in the watermarking process. In this paper, several methods in the area of Arabic text watermarking are discussed, along with their advantages and disadvantages. These methods are compared in terms of capacity, robustness and imperceptibility.
Color to Gray and Back transformation for distributing color digital images
V. N. Gorbachev, E. M. Kaynarova, I. K. Metelev, et al.
The Color to Gray and Back transformation for watermarking with a secret key is considered. Color is embedded into the bit planes of the luminosity component of the YUV color space with the help of a block algorithm that allows using not only the least significant bits. An application to the problem of distributing color digital images from a database among legitimate users is discussed. The proposed protocol can protect original images from unauthorized copying.
An Approach for Text Steganography Based on Markov Chains
H. Hernan Moraldo
A text steganography method based on Markov chains is introduced, together with a reference implementation. This method allows for information hiding in texts that are automatically generated following a given Markov model. Other Markov-based systems of this kind rely on big simplifications of the language model to work, which produces less natural-looking and more easily detectable texts. The method described here is designed to generate texts within a good approximation of the original language model provided.
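To make the general idea concrete, here is a deliberately simplified sketch of hiding bits in a Markov-generated text: at each state, the next hidden bits select which successor word is emitted. This is the kind of simplified fixed-width scheme the abstract contrasts itself with, not the author's method, and it ignores transition probabilities entirely. The toy model, `encode` and `decode` are hypothetical.

```python
# Toy first-order Markov model: each word lists its allowed successors.
# Every state has 2 or 4 successors so it can carry 1 or 2 bits.
MODEL = {
    "the": ["cat", "dog"],
    "a": ["cat", "dog"],
    "cat": ["sat", "ran", "saw", "and"],
    "dog": ["sat", "ran"],
    "sat": ["near", "and"],
    "ran": ["near", "and"],
    "saw": ["the", "a"],
    "near": ["the", "a"],
    "and": ["the", "a"],
}

def bits_per_step(state):
    """Number of bits a transition out of `state` can carry."""
    return len(MODEL[state]).bit_length() - 1  # floor(log2(n))

def encode(bits, start="the"):
    """Generate a text whose transitions encode the bit string."""
    words, state, pos = [start], start, 0
    while pos < len(bits):
        k = bits_per_step(state)
        chunk = bits[pos:pos + k].ljust(k, "0")  # pad the final chunk
        state = MODEL[state][int(chunk, 2)]
        words.append(state)
        pos += k
    return " ".join(words)

def decode(text, n_bits):
    """Recover the first n_bits hidden in a text produced by encode()."""
    words = text.split()
    out = []
    for prev, cur in zip(words, words[1:]):
        k = bits_per_step(prev)
        out.append(format(MODEL[prev].index(cur), f"0{k}b"))
    return "".join(out)[:n_bits]

secret = "1011001"
stego = encode(secret)
print(stego)
print(decode(stego, len(secret)))
```

Because every successor is chosen with equal weight here, the output distribution deviates from the original language model, which is exactly the detectability problem the abstract describes.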
Image compression using anti-forensics method
M. S. Sreelakshmi, D. Venkataraman
A large number of image forensics methods are available that are capable of identifying image tampering. However, these techniques cannot cope with anti-forensics methods, which are able to hide the traces of image tampering. In this paper, an anti-forensics method for digital image compression is proposed. This method is capable of removing the traces of image compression. Additionally, the technique is able to remove the blocking artifacts left by image compression algorithms that divide an image into segments during the compression process. The method targets the compression fingerprints of JPEG compression.
Optimal Frame Transmission for Scalable Video with Hierarchical Prediction Structure
Saied Mehdian, Ben Liang
An optimal frame transmission scheme is presented for streaming scalable video over a link with limited capacity. The objective is to select a transmission sequence of frames and their transmission schedule such that the overall video quality is maximized. The problem is solved for two general classes of hierarchical prediction structures, which include as a special case the popular dyadic structure. Based on a new characterization of the interdependence among frames in terms of trees, structural properties of an optimal transmission schedule are derived. These properties lead to the development of a jointly optimal frame selection and scheduling algorithm, which has computational complexity that is quadratic in the number of frames. Simulation results show that the optimal scheme substantially outperforms three existing alternatives.
Comparison of Speech Activity Detection Techniques for Speaker Recognition
Md. Sahidullah, Goutam Saha
Speech activity detection (SAD) is an essential component for a variety of speech processing applications. It has been observed that the performance of various speech-based tasks depends heavily on the efficiency of the SAD. In this paper, we systematically review some popular SAD techniques and their applications in speaker recognition. Speaker verification systems using different SAD techniques are experimentally evaluated on NIST speech corpora using a Gaussian mixture model-universal background model (GMM-UBM) based classifier under clean and noisy conditions. It has been found that SAD based on two-Gaussian modeling performs comparatively better than other SAD techniques for different types of noise.
Compression and Quantitative Analysis of Buffer Map Message in P2P Streaming System
Chunxi Li, Changjia Chen, DahMing Chiu
Buffer map (BM) compression is a straightforward and practical way to reduce buffer message length as well as to improve system performance. In this paper, we thoroughly discuss the principles and protocol progress of different compression schemes, and for the first time present an original compression scheme that can remove nearly all redundant information from the buffer message. Theoretical limits on compression rates are derived using information theory. Through analysis of information content and simulation with our measured BM trace of UUSee, the validity and superiority of our compression scheme are confirmed in terms of compression ratio.
Alternatives to speech in low bit rate communication systems
Cristina Videira Lopes, Pedro M. Q. Aguiar
This paper describes a framework and a method with which speech communication can be analyzed. The framework consists of a set of acoustic communication systems that are, like speech, low bit rate and short-range, but that are otherwise quite different from speech. The method is to systematically compare these systems according to different objective functions such as data rate, computational overhead, psychoacoustic effects and semantics. One goal of this study is to better understand the nature of human communication. Another goal is to identify acoustic communication systems that are more efficient than human speech for some specific purposes.
An Optimal Prefix Replication Strategy for VoD Services
M Dakshayini, T R GopalaKrishnan Nair
In this paper we propose a scalable cluster architecture of interconnected proxy servers for high-quality, high-availability services. We also propose an optimal video prefix replication strategy based on regional popularity and a scene-change-based replica caching algorithm that uses the Zipf-like video popularity distribution to maximize the availability of videos closer to the client and the request-servicing rate, thereby reducing the client rejection ratio and the response time for the client. Simulation results show that the proposed architecture and algorithm increase the availability of videos and the client request-servicing rate, and reduce the initial start-up latency and the client rejection ratio.
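The role of the Zipf-like popularity distribution can be sketched as follows. This is an illustration of popularity-ranked prefix caching in general, not the authors' replication strategy. The skew parameter `alpha`, the catalogue size and the function names are assumptions.

```python
def zipf_popularity(n_videos, alpha=0.8):
    """Request probability per video, ranked from most to least popular."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def prefixes_to_replicate(popularity, cache_slots):
    """Pick the videos whose prefixes should be cached at a proxy."""
    ranked = sorted(range(len(popularity)),
                    key=lambda i: popularity[i],
                    reverse=True)
    return ranked[:cache_slots]

def hit_ratio(popularity, cached):
    """Fraction of requests whose prefix is served from the proxy."""
    return sum(popularity[i] for i in cached)

popularity = zipf_popularity(100)
cached = prefixes_to_replicate(popularity, cache_slots=10)
print(cached)
print(f"hit ratio with 10% of prefixes cached: {hit_ratio(popularity, cached):.2f}")
```

Because the distribution is skewed, caching the prefixes of a small share of the catalogue covers a disproportionately large share of requests, which is what reduces start-up latency and rejections in such designs.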
A New Trend in Optimization on Multi Overcomplete Dictionary toward Inpainting
SeyyedMajid Valiollahzadeh, Mohammad Nazari, Massoud Babaie-Zadeh, et al.
Recently, great attention has been directed toward overcomplete dictionaries and the sparse representations they can provide. In a wide variety of signal processing problems, sparsity serves as a crucial property leading to high performance. Inpainting, the process of reconstructing lost or deteriorated parts of images or videos, is an interesting application that can be handled by suitably decomposing an image through a combination of overcomplete dictionaries. This paper presents a novel technique for such a decomposition and investigates it through the inpainting of images. Simulations are presented to validate our approach.