Text2VR: Automated Instruction Generation in Virtual Reality using Large Language Models for Assembly Task
Subin Raj Peter
Virtual Reality (VR) has emerged as a powerful tool for workforce training, offering immersive, interactive, and risk-free environments that enhance skill acquisition, decision-making, and confidence. Despite its advantages, developing VR applications for training remains a significant challenge due to the time, expertise, and resources required to create accurate and engaging instructional content. To address these limitations, this paper proposes a novel approach that leverages Large Language Models (LLMs) to automate the generation of virtual instructions from textual input. The system comprises two core components: an LLM module that extracts task-relevant information from the text, and an intelligent module that transforms this information into animated demonstrations and visual cues within a VR environment. The intelligent module receives input from the LLM module and interprets the extracted information. Based on this, an instruction generator creates training content using relevant data from a database. The instruction generator generates the instruction by changing the color of virtual objects and creating animations to illustrate tasks. This approach enhances training effectiveness and reduces development overhead, making VR-based training more scalable and adaptable to evolving industrial needs.
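The two-module pipeline described in this abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `extract_steps` is a keyword-matching stub standing in for the LLM module, and `PART_DATABASE`, `Cue`, and `generate_instructions` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Hypothetical parts database: maps a part name to a scene object id.
PART_DATABASE = {"bolt": "obj_017", "bracket": "obj_042"}


@dataclass
class Cue:
    object_id: str        # which virtual object to act on
    highlight_color: str  # color change used as a visual cue
    animation: str        # name of the animation to play


def extract_steps(text):
    """Stand-in for the LLM module: returns (action, part) pairs.

    A real system would query a language model; this stub uses the first
    word of each sentence as the action and a naive keyword match for the part.
    """
    steps = []
    for sentence in text.split("."):
        words = sentence.lower().split()
        for part in PART_DATABASE:
            if part in words:
                steps.append((words[0], part))
    return steps


def generate_instructions(steps):
    """Stand-in for the instruction generator: looks up each part in the
    database and emits a color highlight plus an animation name."""
    cues = []
    for action, part in steps:
        cues.append(Cue(PART_DATABASE[part], "yellow", f"{action}_{part}"))
    return cues


cues = generate_instructions(extract_steps("Insert the bolt. Attach the bracket."))
for cue in cues:
    print(cue)
```

Keeping extraction and generation as separate functions mirrors the split between the LLM module and the intelligent module, so either side could be swapped out independently.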
HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection
Han Wang, Zhuoran Wang, Roy Ka-Wei Lee
Detecting hate speech in videos remains challenging due to the complexity of multimodal content and the lack of fine-grained annotations in existing datasets. We present HateClipSeg, a large-scale multimodal dataset with both video-level and segment-level annotations, comprising over 11,714 segments labeled as Normal or across five Offensive categories: Hateful, Insulting, Sexual, Violence, Self-Harm, along with explicit target victim labels. Our three-stage annotation process yields high inter-annotator agreement (Krippendorff's alpha = 0.817). We propose three tasks to benchmark performance: (1) Trimmed Hateful Video Classification, (2) Temporal Hateful Video Localization, and (3) Online Hateful Video Classification. Results highlight substantial gaps in current models, emphasizing the need for more sophisticated multimodal and temporally aware approaches. The HateClipSeg dataset is publicly available at https://github.com/Social-AI-Studio/HateClipSeg.git.
A survey of manifold learning and its applications for multimedia
Hannes Fassold
Manifold learning is an emerging research domain of machine learning. In this work, we give an introduction to manifold learning and how it is employed in important application fields in multimedia.
A computational analysis on the relationship between melodic originality and thematic fame in classical music from the Romantic period
Hudson Griffith
In this work, the researcher presents a novel approach to calculating melodic originality based on the research by Simonton (1994). This novel formula is then applied to a dataset of 428 classical music pieces from the Romantic period to analyze the relationship between melodic originality and thematic fame.
The Beauty of Repetition in Machine Composition Scenarios
Zhejing Hu, Xiao Ma, Yan Liu, et al.
Repetition, a basic form of artistic creation, appears in most musical works and delivers enthralling aesthetic experiences.
You were saying? -- Spoken Language in the V3C Dataset
Luca Rossetto
This paper presents an analysis of the distribution of spoken language in the V3C video retrieval benchmark dataset based on automatically generated transcripts. It finds that a large portion of the dataset is covered by spoken language. Since language transcripts can be quickly and accurately described, this has implications for retrieval tasks such as known-item search.
Competitive Video Retrieval with vitrivr at the Video Browser Showdown 2018 - Final Notes
Luca Rossetto, Ivan Giangreco, Ralph Gasser, et al.
This paper presents an after-the-fact summary of the participation of the vitrivr system in the 2018 Video Browser Showdown. A particular focus is on additions made since the original publication and the system's performance during the competition.
Perceptual Compressive Sensing based on Contrast Sensitivity Function: Can we avoid non-visible redundancies acquisition?
Seyed Hamid Safavi, Farah Torkamani-Azar
In this paper, we propose a novel CS approach in which the acquisition of non-visible information is also avoided.
StegIbiza: Steganography in Club Music Implemented in Python
Krzysztof Szczypiorski, Wojciech Zydecki
This paper introduces the implementation of a steganography method called StegIbiza, which uses tempo modulation as the hidden message carrier. Using the Python scripting language, a bit string was encoded and decoded using WAV and MP3 files. Once the message was hidden in a music file, an internet radio was created to evaluate broadcast possibilities. No dedicated music or signal processing equipment was used in this StegIbiza implementation.
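The idea of carrying bits in tempo changes can be illustrated with a toy model. The sketch below is an assumption-laden simplification and not the StegIbiza scheme itself: it maps each bit to a per-segment tempo slightly above or below a nominal value, and the names `BASE_BPM`, `DELTA_BPM`, `encode_bits`, and `decode_tempos` are hypothetical. It only models the tempo sequence; a real implementation would have to time-stretch the audio and estimate tempo from the signal.

```python
BASE_BPM = 120.0   # assumed nominal tempo of the track
DELTA_BPM = 1.0    # assumed tempo shift used to signal one bit


def text_to_bits(message):
    """Convert a text message to a string of '0'/'1' characters (UTF-8)."""
    return "".join(f"{byte:08b}" for byte in message.encode("utf-8"))


def bits_to_text(bits):
    """Inverse of text_to_bits; expects a multiple of 8 bits."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def encode_bits(bits, base=BASE_BPM, delta=DELTA_BPM):
    """Map each bit to a segment tempo: '1' speeds up, '0' slows down."""
    return [base + delta if b == "1" else base - delta for b in bits]


def decode_tempos(tempos, base=BASE_BPM):
    """Recover bits by comparing each segment tempo with the nominal tempo."""
    return "".join("1" if t > base else "0" for t in tempos)


tempos = encode_bits(text_to_bits("hi"))
recovered = bits_to_text(decode_tempos(tempos))
print(tempos[:4], recovered)
```

A small `DELTA_BPM` keeps the change hard to hear but makes decoding more sensitive to tempo-estimation error, which is the basic trade-off any tempo-based carrier faces.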
A Note on Efficiency of Downsampling and Color Transformation in Image Quality Assessment
Hossein Ziaei Nafchi, Mohamed Cheriet
Several existing and successful full reference image quality assessment (IQA) models use linear color transformation and downsampling before measuring similarity or quality of images. This paper points out the right order of these two procedures and shows that the existing models have not chosen the more efficient approach. In addition, the efficiency of these metrics is not compared on a fair basis in the literature.
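Why the order matters for efficiency can be seen in a small sketch. It assumes, beyond what the abstract states, that the downsampling is plain 2x2 block averaging; in that case both steps are linear, so they commute, and applying the color transform after downsampling touches four times fewer pixels. The matrix `M` and the helper names are illustrative choices, not taken from the paper.

```python
# Assumed example matrix: a luma row plus two simple opponent-style rows.
M = [[0.299, 0.587, 0.114],
     [1.0, -1.0, 0.0],
     [0.5, 0.5, -1.0]]


def transform(img, m=M):
    """Apply a 3x3 linear color transform to every pixel."""
    return [[tuple(sum(m[i][k] * px[k] for k in range(3)) for i in range(3))
             for px in row] for row in img]


def downsample(img):
    """Average each 2x2 block; assumes even height and width."""
    out = []
    for y in range(0, len(img), 2):
        row = []
        for x in range(0, len(img[0]), 2):
            block = [img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1]]
            row.append(tuple(sum(p[c] for p in block) / 4.0 for c in range(3)))
        out.append(row)
    return out


# Small synthetic 4x4 RGB image.
img = [[(float((x * 7 + y * 3) % 256), float((x * 5) % 256), float((y * 11) % 256))
        for x in range(4)] for y in range(4)]

a = downsample(transform(img))  # transform applied to 16 pixels
b = transform(downsample(img))  # transform applied to only 4 pixels

max_diff = max(abs(pa[c] - pb[c])
               for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb)
               for c in range(3))
print(max_diff)
```

The two results agree up to floating-point rounding, while the second ordering performs a quarter of the per-pixel matrix multiplications. With a non-linear color transform or a non-linear resampler, the two orders would generally no longer give the same output.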
A proposal project for a blind image quality assessment by learning distortions from the full reference image quality assessments
Stéfane Paris
This short paper presents a prospective plan to build a null reference image quality assessment. Its main goal is to deliver both the objective score and the distortion map for a given distorted image without knowledge of its reference image.
Compressive sensing based velocity estimation in video data
Ana Miletic, Nemanja Ivanovic
This paper considers the use of compressive sensing based algorithms for velocity estimation of moving vehicles. The procedure is based on sparse reconstruction algorithms combined with time-frequency analysis applied to video data. This algorithm provides an accurate estimate of an object's velocity even when only a very reduced number of video frames is available. The influence of crucial parameters is analysed for different types of moving vehicles.
A Digital Watermarking Approach Based on DCT Domain Combining QR Code and Chaotic Theory
Qingbo Kang, Ke Li, Jichun Yang
This paper proposes a robust watermarking approach in the Discrete Cosine Transform (DCT) domain that combines a Quick Response (QR) code with a chaotic system.
Robust Lossless Semi Fragile Information Protection in Images
Pushkar Dixit, Nishant Singh, Jay Prakash Gupta
Keeping information secure and maintaining data integrity is a difficult problem in internet security. Sending messages secretly over the internet is one of the major tasks, since the internet is widely used for passing messages.
Survey of Cognitive Radio Techniques in Wireless Network
Lu Lu
In this report, I survey cognitive radio techniques in wireless networks and examine the advantages and disadvantages of several kinds of cognitive techniques.
Web Publishing of the Files Obtained by Flash
Virgiliu Streian, Adela Ionescu
The aim of this article is to familiarize the user with the Web publishing of the files obtained by Flash. The article contains an overview of Macromedia Flash 5, as well as the playing of a Flash movie, information on Flash and Generator, the publishing of Flash movies, HTML publishing for Flash Player files, and publishing by Generator templates.
Virtual Reality
Dan L. Lacrama, Dorina Fera
This paper is focused on the presentation of Virtual Reality principles together with the main implementation methods and techniques. An overview of the main development directions is included.
Computer Art in the Former Soviet Bloc
Eric Engle
Documents early computer art in the Soviet bloc and describes Marxist art theory.
From Digital Television to Internet?
Vita Hinze-Hoare
This paper provides a general technical overview of the Multimedia Home Platform (MHP) specifications. MHP is a generic interface between digital applications and user machines, whether they happen to be set-top boxes, digital TV sets, or multimedia PCs. MHP extends the DVB open standards. The MHP architecture, system core, and MHP profiles are addressed.
Security Analysis of A Chaos-based Image Encryption Algorithm
Shiguo Lian, Jinsheng Sun, Zhiquan Wang
The security of the Fridrich Image Encryption Algorithm against brute-force attack, statistical attack, known-plaintext attack, and chosen-plaintext attack is analyzed by investigating the properties of the involved chaotic maps and diffusion functions. Based on these analyses, some means are proposed to strengthen the overall performance of the cryptosystem under study.