{"results":[{"id":"arxiv_2508.03699","title":"Text2VR: Automated instruction Generation in Virtual Reality using Large language Models for Assembly Task","authors":[{"name":"Subin Raj Peter"}],"abstract":"Virtual Reality (VR) has emerged as a powerful tool for workforce training, offering immersive, interactive, and risk-free environments that enhance skill acquisition, decision-making, and confidence. Despite its advantages, developing VR applications for training remains a significant challenge due to the time, expertise, and resources required to create accurate and engaging instructional content. To address these limitations, this paper proposes a novel approach that leverages Large Language Models (LLMs) to automate the generation of virtual instructions from textual input. The system comprises two core components: an LLM module that extracts task-relevant information from the text, and an intelligent module that transforms this information into animated demonstrations and visual cues within a VR environment. The intelligent module receives input from the LLM module and interprets the extracted information. Based on this, an instruction generator creates training content using relevant data from a database. The instruction generator generates the instruction by changing the color of virtual objects and creating animations to illustrate tasks. 
This approach enhances training effectiveness and reduces development overhead, making VR-based training more scalable and adaptable to evolving industrial needs.","source":"arXiv","year":2025,"language":"en","subjects":["cs.CV","cs.HC","cs.MM"],"url":"https://arxiv.org/abs/2508.03699","pdf_url":"https://arxiv.org/pdf/2508.03699","is_open_access":true,"published_at":"2025-07-19T07:37:48Z","score":69},{"id":"arxiv_2508.01712","title":"HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection","authors":[{"name":"Han Wang"},{"name":"Zhuoran Wang"},{"name":"Roy Ka-Wei Lee"}],"abstract":"Detecting hate speech in videos remains challenging due to the complexity of multimodal content and the lack of fine-grained annotations in existing datasets. We present HateClipSeg, a large-scale multimodal dataset with both video-level and segment-level annotations, comprising over 11,714 segments labeled as Normal or across five Offensive categories: Hateful, Insulting, Sexual, Violence, Self-Harm, along with explicit target victim labels. Our three-stage annotation process yields high inter-annotator agreement (Krippendorff's alpha = 0.817). We propose three tasks to benchmark performance: (1) Trimmed Hateful Video Classification, (2) Temporal Hateful Video Localization, and (3) Online Hateful Video Classification. Results highlight substantial gaps in current models, emphasizing the need for more sophisticated multimodal and temporally aware approaches. 
The HateClipSeg dataset is publicly available at https://github.com/Social-AI-Studio/HateClipSeg.git.","source":"arXiv","year":2025,"language":"en","subjects":["cs.CV","cs.AI"],"doi":"10.1145/3746027.3758289","url":"https://arxiv.org/abs/2508.01712","pdf_url":"https://arxiv.org/pdf/2508.01712","is_open_access":true,"published_at":"2025-08-03T10:46:06Z","score":69},{"id":"arxiv_2310.12986","title":"A survey of manifold learning and its applications for multimedia","authors":[{"name":"Hannes Fassold"}],"abstract":"Manifold learning is an emerging research domain of machine learning. In this work, we give an introduction to manifold learning and how it is employed for important application fields in multimedia.","source":"arXiv","year":2023,"language":"en","subjects":["cs.MM","cs.AI"],"url":"https://arxiv.org/abs/2310.12986","pdf_url":"https://arxiv.org/pdf/2310.12986","is_open_access":true,"published_at":"2023-09-08T07:16:45Z","score":67},{"id":"arxiv_2210.12201","title":"A computational analysis on the relationship between melodic originality and thematic fame in classical music from the Romantic period","authors":[{"name":"Hudson Griffith"}],"abstract":"In this work, the researcher presents a novel approach to calculating melodic originality based on the research by Simonton (1994). 
This novel formula is then applied to a dataset of 428 classical music pieces from the Romantic period to analyze the relationship between melodic originality and thematic fame.","source":"arXiv","year":2022,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/2210.12201","pdf_url":"https://arxiv.org/pdf/2210.12201","is_open_access":true,"published_at":"2022-10-21T19:03:29Z","score":66},{"id":"arxiv_2209.11426","title":"The Beauty of Repetition in Machine Composition Scenarios","authors":[{"name":"Zhejing Hu"},{"name":"Xiao Ma"},{"name":"Yan Liu"},{"name":"Gong Chen"},{"name":"Yongxu Liu"}],"abstract":"Repetition, a basic form of artistic creation, appears in most musical works and delivers enthralling aesthetic experiences.","source":"arXiv","year":2022,"language":"en","subjects":["cs.MM"],"doi":"10.1145/3503161.3548130","url":"https://arxiv.org/abs/2209.11426","pdf_url":"https://arxiv.org/pdf/2209.11426","is_open_access":true,"published_at":"2022-09-23T05:58:22Z","score":66},{"id":"arxiv_2212.07835","title":"You were saying? -- Spoken Language in the V3C Dataset","authors":[{"name":"Luca Rossetto"}],"abstract":"This paper presents an analysis of the distribution of spoken language in the V3C video retrieval benchmark dataset based on automatically generated transcripts. It finds that a large portion of the dataset is covered by spoken language. 
Since language transcripts can be quickly and accurately described, this has implications for retrieval tasks such as known-item search.","source":"arXiv","year":2022,"language":"en","subjects":["cs.MM","cs.IR"],"url":"https://arxiv.org/abs/2212.07835","pdf_url":"https://arxiv.org/pdf/2212.07835","is_open_access":true,"published_at":"2022-12-15T13:44:28Z","score":66},{"id":"arxiv_1805.02371","title":"Competitive Video Retrieval with vitrivr at the Video Browser Showdown 2018 - Final Notes","authors":[{"name":"Luca Rossetto"},{"name":"Ivan Giangreco"},{"name":"Ralph Gasser"},{"name":"Heiko Schuldt"}],"abstract":"This paper presents an after-the-fact summary of the participation of the vitrivr system to the 2018 Video Browser Showdown. A particular focus is on additions made since the original publication and the systems performance during the competition.","source":"arXiv","year":2018,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/1805.02371","pdf_url":"https://arxiv.org/pdf/1805.02371","is_open_access":true,"published_at":"2018-05-07T07:06:19Z","score":62},{"id":"arxiv_1702.05718","title":"Perceptual Compressive Sensing based on Contrast Sensitivity Function: Can we avoid non-visible redundancies acquisition?","authors":[{"name":"Seyed Hamid Safavi"},{"name":"Farah Torkamani-Azar"}],"abstract":"In this paper, we propose a novel CS approach in which the acquisition of non-visible information is also avoided.","source":"arXiv","year":2017,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/1702.05718","pdf_url":"https://arxiv.org/pdf/1702.05718","is_open_access":true,"published_at":"2017-02-19T08:21:20Z","score":61},{"id":"arxiv_1705.07788","title":"StegIbiza: Steganography in Club Music Implemented in Python","authors":[{"name":"Krzysztof Szczypiorski"},{"name":"Wojciech Zydecki"}],"abstract":"This paper introduces the implementation of steganography method called StegIbiza, which uses tempo modulation as hidden message carrier. 
With the use of Python scripting language, a bit string was encoded and decoded using WAV and MP3 files. Once the message was hidden into a music files, an internet radio was created to evaluate broadcast possibilities. No dedicated music or signal processing equipment was used in this StegIbiza implementation","source":"arXiv","year":2017,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/1705.07788","pdf_url":"https://arxiv.org/pdf/1705.07788","is_open_access":true,"published_at":"2017-05-22T14:56:49Z","score":61},{"id":"arxiv_1606.06152","title":"A Note on Efficiency of Downsampling and Color Transformation in Image Quality Assessment","authors":[{"name":"Hossein Ziaei Nafchi"},{"name":"Mohamed Cheriet"}],"abstract":"Several existing and successful full reference image quality assessment (IQA) models use linear color transformation and downsampling before measuring similarity or quality of images. This paper indicates to the right order of these two procedures and that the existing models have not chosen the more efficient approach. In addition, efficiency of these metrics is not compared in a fair basis in the literature.","source":"arXiv","year":2016,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/1606.06152","pdf_url":"https://arxiv.org/pdf/1606.06152","is_open_access":true,"published_at":"2016-06-20T14:52:47Z","score":60},{"id":"arxiv_1512.04354","title":"A proposal project for a blind image quality assessment by learning distortions from the full reference image quality assessments","authors":[{"name":"Stéfane Paris"}],"abstract":"This short paper presents a perspective plan to build a null reference image quality assessment. 
Its main goal is to deliver both the objective score and the distortion map for a given distorted image without the knowledge of its reference image.","source":"arXiv","year":2015,"language":"en","subjects":["cs.MM","cs.CV"],"doi":"10.1109/QoMEX.2012.6263876","url":"https://arxiv.org/abs/1512.04354","pdf_url":"https://arxiv.org/pdf/1512.04354","is_open_access":true,"published_at":"2015-11-04T12:21:04Z","score":59},{"id":"arxiv_1502.06103","title":"Compressive sensing based velocity estimation in video data","authors":[{"name":"Ana Miletic"},{"name":"Nemanja Ivanovic"}],"abstract":"This paper considers the use of compressive sensing based algorithms for velocity estimation of moving vehicles. The procedure is based on sparse reconstruction algorithms combined with time-frequency analysis applied to video data. This algorithm provides an accurate estimation of object's velocity even in the case of a very reduced number of available video frames. The influence of crucial parameters is analysed for different types of moving vehicles.","source":"arXiv","year":2015,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/1502.06103","pdf_url":"https://arxiv.org/pdf/1502.06103","is_open_access":true,"published_at":"2015-02-21T13:19:34Z","score":59},{"id":"arxiv_1407.7337","title":"A Digital Watermarking Approach Based on DCT Domain Combining QR Code and Chaotic Theory","authors":[{"name":"Qingbo Kang"},{"name":"Ke Li"},{"name":"Jichun Yang"}],"abstract":"This paper proposes a robust watermarking approach based on Discrete Cosine Transform domain that combines Quick Response Code and chaotic system.","source":"arXiv","year":2014,"language":"en","subjects":["cs.MM","cs.CR"],"doi":"10.1109/WOCN.2014.6923098","url":"https://arxiv.org/abs/1407.7337","pdf_url":"https://arxiv.org/pdf/1407.7337","is_open_access":true,"published_at":"2014-07-28T07:04:04Z","score":58},{"id":"arxiv_1407.4865","title":"Robust Lossless Semi Fragile Information Protection in 
Images","authors":[{"name":"Pushkar Dixit"},{"name":"Nishant Singh"},{"name":"Jay Prakash Gupta"}],"abstract":"Internet security finds it difficult to keep the information secure and to maintain the integrity of the data. Sending messages over the internet secretly is one of the major tasks as it is widely used for passing the message.","source":"arXiv","year":2014,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/1407.4865","pdf_url":"https://arxiv.org/pdf/1407.4865","is_open_access":true,"published_at":"2014-07-18T01:37:21Z","score":58},{"id":"arxiv_1105.0023","title":"Survey of Cognitive Radio Techniques in Wireless Network","authors":[{"name":"Lu Lu"}],"abstract":"In this report, I surveyed the cognitive radio technique in wireless networks. I researched several kinds of cognitive techniques and their advantages and disadvantages.","source":"arXiv","year":2011,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/1105.0023","pdf_url":"https://arxiv.org/pdf/1105.0023","is_open_access":true,"published_at":"2011-04-29T21:34:21Z","score":55},{"id":"arxiv_0906.0866","title":"Web Publishing of the Files Obtained by Flash","authors":[{"name":"Virgiliu Streian"},{"name":"Adela Ionescu"}],"abstract":"The aim of this article is to familiarize the user with the Web publishing of the files obtained by Flash. The article contains an overview of Macromedia Flash 5, as well as the running of a Playing Flash movie, information on Flash and Generator, the publishing of Flash movies, HTML publishing for Flash Player files and publishing by Generator templates.","source":"arXiv","year":2009,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/0906.0866","pdf_url":"https://arxiv.org/pdf/0906.0866","is_open_access":true,"published_at":"2009-06-04T09:52:22Z","score":53},{"id":"arxiv_0903.4314","title":"Virtual Reality","authors":[{"name":"Dan L. 
Lacrama"},{"name":"Dorina Fera"}],"abstract":"This paper is focused on the presentation of Virtual Reality principles together with the main implementation methods and techniques. An overview of the main development directions is included.","source":"arXiv","year":2009,"language":"en","subjects":["cs.MM"],"url":"https://arxiv.org/abs/0903.4314","pdf_url":"https://arxiv.org/pdf/0903.4314","is_open_access":true,"published_at":"2009-03-25T12:16:29Z","score":53},{"id":"arxiv_0809.0524","title":"Computer Art in the Former Soviet Bloc","authors":[{"name":"Eric Engle"}],"abstract":"Documents early computer art in the Soviet bloc and describes Marxist art theory.","source":"arXiv","year":2008,"language":"en","subjects":["cs.MM","cs.CY"],"url":"https://arxiv.org/abs/0809.0524","pdf_url":"https://arxiv.org/pdf/0809.0524","is_open_access":true,"published_at":"2008-09-02T21:48:28Z","score":52},{"id":"crossref_10.1071/ph610508","title":"A High Resolution Galactic Survey at 19·7 Mc/s","authors":[{"name":"The Late CA Shain"},{"name":"MM Komesaroff"},{"name":"CS Higgins"}],"abstract":"An extensive strip of the Southern Milky Way has been surveyed at 19·7 Mc/s, using a Mills Cross with a pencil beam 1·4° wide. The radio contours show a number of dark areas whose positions agree with those of optically-observed H II regions which at this frequency are seen in absorption. 
In addition, an intensity minimum along the galactic equator appears to represent the effect of absorption due to many H II regions extending to great distances in the galactic plane.","source":"CrossRef","year":1961,"language":"en","subjects":null,"doi":"10.1071/ph610508","url":"https://doi.org/10.1071/ph610508","pdf_url":"https://connectsci.au/ph/article-pdf/14/4/508/1347547/ph610508.pdf","is_open_access":true,"citations":30,"published_at":"","score":50.9},{"id":"crossref_10.1016/0042-207x(82)93812-x","title":"Automatic control of gas pressure in a vacuum system","authors":[{"name":"J Lucas"},{"name":"CS Smith"},{"name":"MM Meadows"}],"abstract":"","source":"CrossRef","year":1982,"language":"en","subjects":null,"doi":"10.1016/0042-207x(82)93812-x","url":"https://doi.org/10.1016/0042-207x(82)93812-x","is_open_access":true,"citations":5,"published_at":"","score":50.15}],"total":183353,"page":1,"page_size":20,"sources":["arXiv","CrossRef"],"query":"cs.MM"}