Results for "cs.MM" - JURNALIN

arXiv Open Access 2025

Text2VR: Automated instruction Generation in Virtual Reality using Large language Models for Assembly Task

Subin Raj Peter

Virtual Reality (VR) has emerged as a powerful tool for workforce training, offering immersive, interactive, and risk-free environments that enhance skill acquisition, decision-making, and confidence. Despite its advantages, developing VR applications for training remains a significant challenge due to the time, expertise, and resources required to create accurate and engaging instructional content. To address these limitations, this paper proposes a novel approach that leverages Large Language Models (LLMs) to automate the generation of virtual instructions from textual input. The system comprises two core components: an LLM module that extracts task-relevant information from the text, and an intelligent module that transforms this information into animated demonstrations and visual cues within a VR environment. The intelligent module receives input from the LLM module and interprets the extracted information. Based on this, an instruction generator creates training content using relevant data from a database. The instruction generator generates the instruction by changing the color of virtual objects and creating animations to illustrate tasks. This approach enhances training effectiveness and reduces development overhead, making VR-based training more scalable and adaptable to evolving industrial needs.

en cs.CV, cs.HC


arXiv Open Access 2025

HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection

Han Wang, Zhuoran Wang, Roy Ka-Wei Lee

Detecting hate speech in videos remains challenging due to the complexity of multimodal content and the lack of fine-grained annotations in existing datasets. We present HateClipSeg, a large-scale multimodal dataset with both video-level and segment-level annotations, comprising over 11,714 segments labeled as Normal or across five Offensive categories: Hateful, Insulting, Sexual, Violence, Self-Harm, along with explicit target victim labels. Our three-stage annotation process yields high inter-annotator agreement (Krippendorff's alpha = 0.817). We propose three tasks to benchmark performance: (1) Trimmed Hateful Video Classification, (2) Temporal Hateful Video Localization, and (3) Online Hateful Video Classification. Results highlight substantial gaps in current models, emphasizing the need for more sophisticated multimodal and temporally aware approaches. The HateClipSeg dataset is publicly available at https://github.com/Social-AI-Studio/HateClipSeg.git.

en cs.CV, cs.AI


arXiv Open Access 2023

A survey of manifold learning and its applications for multimedia

Hannes Fassold

Manifold learning is an emerging research domain of machine learning. In this work, we give an introduction to manifold learning and how it is employed in important application fields in multimedia.

en cs.MM, cs.AI


arXiv Open Access 2022

A computational analysis on the relationship between melodic originality and thematic fame in classical music from the Romantic period

Hudson Griffith

In this work, the researcher presents a novel approach to calculating melodic originality based on the research by Simonton (1994). This novel formula is then applied to a dataset of 428 classical music pieces from the Romantic period to analyze the relationship between melodic originality and thematic fame.

en cs.MM


arXiv Open Access 2022

The Beauty of Repetition in Machine Composition Scenarios

Zhejing Hu, Xiao Ma, Yan Liu et al.

Repetition, a basic form of artistic creation, appears in most musical works and delivers enthralling aesthetic experiences.

en cs.MM


arXiv Open Access 2022

You were saying? -- Spoken Language in the V3C Dataset

Luca Rossetto

This paper presents an analysis of the distribution of spoken language in the V3C video retrieval benchmark dataset based on automatically generated transcripts. It finds that a large portion of the dataset is covered by spoken language. Since language transcripts can be quickly and accurately described, this has implications for retrieval tasks such as known-item search.

en cs.MM, cs.IR


arXiv Open Access 2018

Competitive Video Retrieval with vitrivr at the Video Browser Showdown 2018 - Final Notes

Luca Rossetto, Ivan Giangreco, Ralph Gasser et al.

This paper presents an after-the-fact summary of the participation of the vitrivr system in the 2018 Video Browser Showdown. A particular focus is on additions made since the original publication and the system's performance during the competition.

en cs.MM


arXiv Open Access 2017

Perceptual Compressive Sensing based on Contrast Sensitivity Function: Can we avoid non-visible redundancies acquisition?

Seyed Hamid Safavi, Farah Torkamani-Azar

In this paper, we propose a novel CS approach in which the acquisition of non-visible information is also avoided.

en cs.MM


arXiv Open Access 2017

StegIbiza: Steganography in Club Music Implemented in Python

Krzysztof Szczypiorski, Wojciech Zydecki

This paper introduces the implementation of a steganography method called StegIbiza, which uses tempo modulation as the hidden message carrier. Using the Python scripting language, a bit string was encoded and decoded using WAV and MP3 files. Once the message was hidden in a music file, an internet radio was created to evaluate broadcast possibilities. No dedicated music or signal processing equipment was used in this StegIbiza implementation.
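The abstract does not say how bits are mapped to tempo, so the following is only a minimal sketch of the general idea of tempo modulation, not the authors' implementation. It assumes a hypothetical reference tempo (`BASE_BPM`) and a hypothetical relative shift (`DELTA`), and it works on a list of per-segment tempo values rather than on real WAV or MP3 audio.

```python
# Illustrative sketch only: all names and constants below are assumptions,
# not taken from the StegIbiza paper.
BASE_BPM = 120.0   # hypothetical reference tempo of the cover track
DELTA = 0.01       # hypothetical relative tempo shift (1 percent)


def text_to_bits(text):
    """Turn a string into a bit string, 8 bits per UTF-8 byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits):
    """Inverse of text_to_bits; expects a multiple of 8 bits."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def encode_bits(bits, base_bpm=BASE_BPM, delta=DELTA):
    """Assign one tempo per segment: '1' plays slightly faster, '0' slightly slower."""
    return [
        base_bpm * (1 + delta) if bit == "1" else base_bpm * (1 - delta)
        for bit in bits
    ]


def decode_tempos(tempos, base_bpm=BASE_BPM):
    """Recover the bit string by comparing each segment tempo with the reference."""
    return "".join("1" if tempo > base_bpm else "0" for tempo in tempos)


message = "hi"
tempos = encode_bits(text_to_bits(message))
recovered = bits_to_text(decode_tempos(tempos))
```

In a real system the per-segment tempos would have to be applied to and estimated from the audio itself, which this sketch leaves out.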

en cs.MM


arXiv Open Access 2016

A Note on Efficiency of Downsampling and Color Transformation in Image Quality Assessment

Hossein Ziaei Nafchi, Mohamed Cheriet

Several existing and successful full reference image quality assessment (IQA) models use linear color transformation and downsampling before measuring similarity or quality of images. This paper points out the right order of these two procedures and shows that the existing models have not chosen the more efficient approach. In addition, the efficiency of these metrics is not compared on a fair basis in the literature.
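The abstract does not state which order is the more efficient one. As a hedged illustration of why the order can affect cost without affecting the result, the sketch below assumes 2x2 mean-pooling as the downsampling step and a hypothetical 3x3 matrix `M` as the linear color transformation. Under these assumptions both steps are linear, so they commute, and only the number of pixels passed through the matrix differs.

```python
import math

# Hypothetical 3x3 linear color transform; the coefficients are illustrative only.
M = [
    [0.299, 0.587, 0.114],
    [0.5, -0.5, 0.0],
    [0.25, 0.25, -0.5],
]


def color_transform(image):
    """Apply M to every pixel. `image` is a list of rows of (r, g, b) tuples."""
    return [
        [tuple(sum(M[i][k] * px[k] for k in range(3)) for i in range(3)) for px in row]
        for row in image
    ]


def downsample2(image):
    """Average non-overlapping 2x2 blocks (height and width must be even)."""
    out = []
    for y in range(0, len(image), 2):
        row = []
        for x in range(0, len(image[0]), 2):
            block = (image[y][x], image[y][x + 1], image[y + 1][x], image[y + 1][x + 1])
            row.append(tuple(sum(p[c] for p in block) / 4.0 for c in range(3)))
        out.append(row)
    return out


def images_close(a, b, tol=1e-9):
    """Compare two images channel by channel within an absolute tolerance."""
    return all(
        math.isclose(u, v, abs_tol=tol)
        for row_a, row_b in zip(a, b)
        for px_a, px_b in zip(row_a, row_b)
        for u, v in zip(px_a, px_b)
    )


def pixel_count(image):
    """Number of pixels, i.e. how many times the matrix M would be applied."""
    return sum(len(row) for row in image)


# Small deterministic 4x4 test image.
image = [
    [((7 * x + 3 * y) % 256, (5 * x * y) % 256, (11 * x + y) % 256) for x in range(4)]
    for y in range(4)
]

transform_then_downsample = downsample2(color_transform(image))
downsample_then_transform = color_transform(downsample2(image))
```

Here the transform-first path applies `M` to 16 pixels and the downsample-first path to 4, while both paths give the same output up to floating-point rounding. Whether this matches the paper's argument in detail is an assumption.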

en cs.MM


arXiv Open Access 2015

A proposal project for a blind image quality assessment by learning distortions from the full reference image quality assessments

Stéfane Paris

This short paper presents a prospective plan to build a null reference image quality assessment. Its main goal is to deliver both the objective score and the distortion map for a given distorted image without knowledge of its reference image.

en cs.MM, cs.CV


arXiv Open Access 2015

Compressive sensing based velocity estimation in video data

Ana Miletic, Nemanja Ivanovic

This paper considers the use of compressive sensing based algorithms for velocity estimation of moving vehicles. The procedure is based on sparse reconstruction algorithms combined with time-frequency analysis applied to video data. This algorithm provides an accurate estimation of an object's velocity even when only a very small number of video frames is available. The influence of crucial parameters is analysed for different types of moving vehicles.

en cs.MM


arXiv Open Access 2014

A Digital Watermarking Approach Based on DCT Domain Combining QR Code and Chaotic Theory

Qingbo Kang, Ke Li, Jichun Yang

This paper proposes a robust watermarking approach based on the Discrete Cosine Transform domain that combines a Quick Response Code and a chaotic system.

en cs.MM, cs.CR


arXiv Open Access 2014

Robust Lossless Semi Fragile Information Protection in Images

Pushkar Dixit, Nishant Singh, Jay Prakash Gupta

Internet security finds it difficult to keep the information secure and to maintain the integrity of the data. Sending messages over the internet secretly is one of the major tasks as it is widely used for passing the message.

en cs.MM


arXiv Open Access 2011

Survey of Cognitive Radio Techniques in Wireless Network

Lu Lu

In this report, I survey cognitive radio techniques in wireless networks and examine the advantages and disadvantages of several kinds of cognitive techniques.

en cs.MM


arXiv Open Access 2009

Web Publishing of the Files Obtained by Flash

Virgiliu Streian, Adela Ionescu

The aim of this article is to familiarize the user with the Web publishing of files obtained by Flash. The article contains an overview of Macromedia Flash 5, as well as the playing of a Flash movie, information on Flash and Generator, the publishing of Flash movies, HTML publishing for Flash Player files, and publishing by Generator templates.

en cs.MM


arXiv Open Access 2009

Virtual Reality

Dan L. Lacrama, Dorina Fera

This paper is focused on the presentation of Virtual Reality principles together with the main implementation methods and techniques. An overview of the main development directions is included.

en cs.MM


arXiv Open Access 2008

Computer Art in the Former Soviet Bloc

Eric Engle

Documents early computer art in the Soviet bloc and describes Marxist art theory.

en cs.MM, cs.CY


arXiv Open Access 2004

From Digital Television to Internet?

Vita Hinze-Hoare

This paper provides a general technical overview of the Multimedia Home Platform (MHP) specifications. MHP is a generic interface between digital applications and user machines, whether they happen to be set top boxes, digital TV sets or Multimedia PCs. MHP extends the DVB open standards. Addressed are MHP architecture, System core and MHP Profiles.

en cs.MM, cs.CY


arXiv Open Access 2006

Security Analysis of A Chaos-based Image Encryption Algorithm

Shiguo Lian, Jinsheng Sun, Zhiquan Wang

The security of Fridrich Image Encryption Algorithm against brute-force attack, statistical attack, known-plaintext attack and select-plaintext attack is analyzed by investigating the properties of the involved chaotic maps and diffusion functions. Based on the given analyses, some means are proposed to strengthen the overall performance of the focused cryptosystem.

en cs.MM, cs.CR

