Results for "Art"

Showing 20 of ~3,678,753 results · from CrossRef, arXiv, DOAJ, Semantic Scholar

S2 Open Access 2024
The Llama 3 Herd of Models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey et al.

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

14,014 citations en Computer Science
S2 Open Access 2023
3D Gaussian Splatting for Real-Time Radiance Field Rendering

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler et al.

Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space. Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene. Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.

7,793 citations en Computer Science
S2 Open Access 2023
Improved Baselines with Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Yuheng Li et al.

Large multimodal models (LMMs) have recently shown encouraging progress with visual instruction tuning. In this paper, we present the first systematic study to investigate the design choices of LMMs in a controlled setting under the LLaVA framework. We show that the fully-connected vision-language connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks. Our final 13B checkpoint uses merely 1.2M publicly available data, and finishes full training in ~1 day on a single 8-A100 node. Furthermore, we present some early exploration of open problems in LMMs, including scaling to higher resolution inputs, compositional capabilities, and model hallucination. We hope this makes state-of-the-art LMM research more accessible. Code and model will be publicly available.

4,819 citations en Computer Science
S2 Open Access 2017
A Simple Yet Effective Baseline for 3d Human Pose Estimation

Julieta Martinez, Rayat Hossain, J. Romero et al.

Following the success of deep convolutional networks, state-of-the-art methods for 3d human pose estimation have focused on deep end-to-end systems that predict 3d joint locations given raw image pixels. Despite their excellent performance, it is often not easy to understand whether their remaining error stems from a limited 2d pose (visual) understanding, or from a failure to map 2d poses into 3-dimensional positions. With the goal of understanding these sources of error, we set out to build a system that, given 2d joint locations, predicts 3d positions. Much to our surprise, we have found that, with current technology, "lifting" ground truth 2d joint locations to 3d space is a task that can be solved with a remarkably low error rate: a relatively simple deep feedforward network outperforms the best reported result by about 30% on Human3.6M, the largest publicly available 3d pose estimation benchmark. Furthermore, training our system on the output of an off-the-shelf state-of-the-art 2d detector (i.e., using images as input) yields state-of-the-art results; this includes an array of systems that have been trained end-to-end specifically for this task. Our results indicate that a large portion of the error of modern deep 3d pose estimation systems stems from their visual analysis, and suggest directions to further advance the state of the art in 3d human pose estimation.

1,475 citations en Computer Science
S2 Open Access 2014
Striving for Simplicity: The All Convolutional Net

Jost Tobias Springenberg, Alexey Dosovitskiy, T. Brox et al.

Most modern convolutional neural networks (CNNs) used for object recognition are built using the same principles: Alternating convolution and max-pooling layers followed by a small number of fully connected layers. We re-evaluate the state of the art for object recognition from small images with convolutional networks, questioning the necessity of different components in the pipeline. We find that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks. Following this finding -- and building on other recent work for finding simple network structures -- we propose a new architecture that consists solely of convolutional layers and yields competitive or state of the art performance on several object recognition datasets (CIFAR-10, CIFAR-100, ImageNet). To analyze the network we introduce a new variant of the "deconvolution approach" for visualizing features learned by CNNs, which can be applied to a broader range of network structures than existing approaches.

4,962 citations en Mathematics, Computer Science
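The core replacement the All Convolutional Net abstract argues for is easy to illustrate. Below is a minimal 1-D sketch in plain Python (a toy illustration, not the paper's code): a max-pooling layer and a stride-2 convolution downsample the signal by the same factor, so the fixed max operation can in principle be swapped for a learned kernel.

```python
# Toy 1-D illustration of replacing max-pooling with a strided convolution.
# Both operations reduce the spatial size by the stride factor; only the
# aggregation differs (fixed max vs. learned weighted sum).

def max_pool1d(x, size=2, stride=2):
    """Plain max-pooling: take the max over each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]

def strided_conv1d(x, kernel, stride=2):
    """Valid 1-D convolution with stride; the kernel weights are learnable."""
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, x[i:i + k]))
            for i in range(0, len(x) - k + 1, stride)]

x = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 6.0, 1.0]
pooled = max_pool1d(x)                     # fixed, non-learned downsampling
convd = strided_conv1d(x, [0.5, 0.5])      # learned downsampling (avg here)
print(len(pooled), len(convd))             # both halve the spatial size: 4 4
```

In the paper's setting the same size bookkeeping holds in 2-D, which is why a stride-2 convolution is a drop-in replacement for a 2×2 max-pool in the network architecture.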
S2 Open Access 1997
Hypothalamic expression of ART, a novel gene related to agouti, is up-regulated in obese and diabetic mutant mice.

J. Shutter, M. Graham, C. Kinsey et al.

We have isolated cDNA clones that encode a novel human gene related to agouti. Sequence analysis of this gene, named ART, for agouti-related transcript, predicts a 132-amino-acid protein that is 25% identical to human agouti. The highest degree of identity is within the carboxyl terminus of both proteins. Like agouti, ART contains a putative signal sequence and a cysteine-rich carboxyl terminus, but lacks the region of basic residues and polyproline residues found in the middle of the agouti protein. Both agouti and ART contain 11 cysteines, and 9 of these are conserved spatially. ART is expressed primarily in the adrenal gland, subthalamic nucleus, and hypothalamus, with a lower level of expression occurring in testis, lung, and kidney. The murine homolog of ART was also isolated and is predicted to encode a 131-amino-acid protein that shares 81% amino acid identity with the human protein. The mouse gene was found to have the same expression pattern as the human gene when assessed by RT-PCR. Examination by in situ hybridization using mouse tissues showed localized expression in the arcuate nucleus of the hypothalamus, the median eminence, and the adrenal medulla. In addition, the hypothalamic expression of ART was elevated approximately 10-fold in ob/ob and db/db mice. ART was mapped to human chromosome 16q22 and to mouse chromosome 8D1-D2. The expression pattern and transcriptional regulation of ART, coupled with the known actions of agouti, suggest a role for ART in the regulation of melanocortin receptors within the hypothalamus and adrenal gland, and implicate this novel gene in the central control of feeding.

680 citations en Biology, Medicine
arXiv Open Access 2026
Exploring a Multimodal Chatbot as a Facilitator in Therapeutic Art Activity

Le Lin, Zihao Zhu, Rainbow Tin Hung Ho et al.

Therapeutic art activities, such as expressive drawing and painting, require synergy between creative visual production and interactive dialogue. Recent advancements in Multimodal Large Language Models (MLLMs) have expanded the capacity of computing systems to interpret both textual and visual data, offering a new frontier for AI-mediated therapeutic support. This work-in-progress paper introduces an MLLM-powered chatbot that analyzes visual creations in real time while engaging the creator in reflective conversation. We conducted an evaluation with five experts in art therapy and related fields, which demonstrated the chatbot's potential to facilitate therapeutic engagement and highlighted several areas for future development, including entryways and risk management, bespoke alignment of user profile and therapeutic style, balancing conversational depth and width, and enriching visual interactivity. These themes provide a roadmap for designing future AI-mediated creative expression tools.

en cs.HC
arXiv Open Access 2026
Does AI See like Art Historians? Interpreting How Vision Language Models Recognize Artistic Style

Marvin Limpijankit, Milad Alshomary, Yassin Oulad Daoud et al.

Vision language models (VLMs) have become increasingly proficient at a range of computer vision tasks, such as visual question answering and object detection. This includes increasingly strong capabilities in the domain of art, from analyzing artwork to generating art. In an interdisciplinary collaboration between computer scientists and art historians, we characterize the mechanisms underlying VLMs' ability to predict artistic style and assess the extent to which they align with the criteria art historians use to reason about artistic style. We employ a latent-space decomposition approach to identify concepts that drive art style prediction and conduct quantitative evaluations, causal analysis, and assessment by art historians. Our findings indicate that 73% of the extracted concepts are judged by art historians to exhibit a coherent and semantically meaningful visual feature, and 90% of the concepts used to predict the style of a given artwork were judged relevant. In cases where an irrelevant concept was used to successfully predict style, art historians identified possible reasons for its success; for example, the model might "understand" a concept in more formal terms, such as dark/light contrasts.

en cs.CV, cs.AI
arXiv Open Access 2025
A Relational (Re)Turn: Revisit Interactive Art through Interaction and Aesthetics

Aven-Le Zhou

This paper revisits the concept of interaction in interactive art, tracing its evolution from sociocultural origins to its narrowing within human-computer paradigms. It critiques this reduction and proposes a relational (re)turn through reclaiming interaction as intersubjective and relational. Through a synthesis of aesthetic theories and case studies from Ars Electronica, the paper introduces Techno Relational Aesthetics, a new conceptual lens that emphasizes technologically mediated relationality. This approach expands interactive art beyond audience-artwork interaction and opens the possibility to broader relational practices.

en cs.CY
arXiv Open Access 2025
Is Journal Citation Indicator a good metric for Art & Humanities Journals currently?

Yu Liao, Li Li, Zhesi Shen

Probably not. Journal Citation Indicator (JCI) was introduced to address the limitations of traditional metrics like the Journal Impact Factor (JIF), particularly its inability to normalize citation impact across different disciplines. This study reveals that JCI faces significant challenges in field normalization for Art & Humanities journals, as evidenced by much lower correlations with a more granular, paper-level metric, CNCI-CT. A detailed analysis of Architecture journals highlights how journal-level misclassification and the interdisciplinary nature of content exacerbate these issues, leading to less reliable evaluations. We recommend improving journal classification systems or adopting paper-level normalization methods, potentially supported by advanced AI techniques, to enhance the accuracy and effectiveness of JCI for Art & Humanities disciplines.

en cs.DL
arXiv Open Access 2025
The Art of Tool Interface Design

Yunnan Wu, Paul Chen, Deshank Baranwal et al.

We present an agentic framework, Thinker, which achieves state-of-the-art performance in challenging reasoning tasks for realistic customer service scenarios that involve complex business logic and human interactions over long horizons. On the τ-bench retail dataset, Thinker achieves an 82.6% success rate with GPT-4o (version 2024-06-01) (baseline: 68.3%), and an 81.9% success rate with Llama-3.1 405B (baseline: 49.6%), without any fine-tuning. Thinker effectively closes the gap in reasoning capabilities between the base models by introducing proper structure. The key features of the Thinker framework are: (1) State-Machine Augmented Generation (SMAG), which represents business logic as state machines that the LLM uses as tools. (2) Delegation of tasks from the main reasoning loop to LLM-powered tools. (3) Adaptive context management. Our prompting-only solution achieves significant gains, while still maintaining a standard agentic architecture with a ReAct-style reasoning loop. The key is to innovate on the tool interface design, as exemplified by SMAG and the LLM-powered tools.

en cs.AI
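The SMAG idea described in this abstract can be sketched concretely. The following toy Python sketch is hypothetical (all state names, actions, and function names are invented for illustration and do not come from the Thinker paper): business logic lives in an explicit transition table that the agent queries as a tool, so invalid actions are rejected by code rather than left to the LLM's recollection of the rules.

```python
# Hypothetical sketch of business logic as a state machine exposed as tools.
# The transition table encodes which action is legal in which state; the
# agent calls allowed_actions() to see its options and apply_action() to
# advance, so rule enforcement lives in code, not in the prompt.

TRANSITIONS = {
    ("start", "authenticate_user"): "authenticated",
    ("authenticated", "lookup_order"): "order_found",
    ("order_found", "issue_refund"): "refund_issued",
}

def allowed_actions(state):
    """Tool 1: list the actions business logic permits in this state."""
    return sorted(a for (s, a) in TRANSITIONS if s == state)

def apply_action(state, action):
    """Tool 2: advance the machine; illegal actions raise instead of drifting."""
    nxt = TRANSITIONS.get((state, action))
    if nxt is None:
        raise ValueError(f"action {action!r} not allowed in state {state!r}")
    return nxt

state = "start"
for step in ["authenticate_user", "lookup_order", "issue_refund"]:
    state = apply_action(state, step)
print(state)  # refund_issued
```

The design point is that the LLM never has to prove it followed the policy: any trajectory that reaches `refund_issued` did so through legal transitions by construction.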
arXiv Open Access 2025
GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art

Yiming Lei, Chenkai Zhang, Zeming Liu et al.

Video Comment Art enhances user engagement by providing creative content that conveys humor, satire, or emotional resonance, requiring a nuanced and comprehensive grasp of cultural and contextual subtleties. Although Multimodal Large Language Models (MLLMs) and Chain-of-Thought (CoT) have demonstrated strong reasoning abilities in STEM tasks (e.g. mathematics and coding), they still struggle to generate creative expressions such as resonant jokes and insightful satire. Moreover, existing benchmarks are constrained by their limited modalities and insufficient categories, hindering the exploration of comprehensive creativity in video-based Comment Art creation. To address these limitations, we introduce GODBench, a novel benchmark that integrates video and text modalities to systematically evaluate MLLMs' abilities to compose Comment Art. Furthermore, inspired by the propagation patterns of waves in physics, we propose Ripple of Thought (RoT), a multi-step reasoning framework designed to enhance the creativity of MLLMs. Extensive experiments reveal that existing MLLMs and CoT methods still face significant challenges in understanding and generating creative video comments. In contrast, RoT provides an effective approach to improve creative composing, highlighting its potential to drive meaningful advancements in MLLM-based creativity. GODBench is publicly available at https://github.com/stan-lei/GODBench-ACL2025.

en cs.CL, cs.AI
S2 Open Access 1999
The Art of Frame Theory

P. Casazza

The theory of frames for a Hilbert space plays a fundamental role in signal processing, image processing, data compression, sampling theory and more, as well as being a fruitful area of research in abstract mathematics. In this “tutorial” on abstract frame theory, we will try to point out the major directions of research in abstract frame theory and give some sample techniques from each of the areas. We will also bring out some of the important open questions, discuss some of the limitations of the existing theory, and point to some new directions for research.

619 citations en Mathematics
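For readers new to the area, the standard definition the tutorial is built around (a textbook fact, not quoted from the abstract) can be stated briefly:

```latex
% Standard definition: a frame generalizes an orthonormal basis by allowing
% redundancy while keeping stable reconstruction.
A sequence $\{f_k\}_{k=1}^{\infty}$ in a Hilbert space $H$ is a \emph{frame}
if there exist constants $0 < A \le B < \infty$ such that
\[
  A\,\|f\|^2 \;\le\; \sum_{k=1}^{\infty} |\langle f, f_k \rangle|^2 \;\le\; B\,\|f\|^2
  \qquad \text{for all } f \in H .
\]
When $A = B$ the frame is called \emph{tight}; an orthonormal basis is exactly
a tight frame with $A = B = 1$ whose elements are unit vectors.
```

The redundancy permitted by frames (more vectors than a basis needs) is what makes them robust for the signal processing and data compression applications the abstract mentions.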
arXiv Open Access 2024
Training A Small Emotional Vision Language Model for Visual Art Comprehension

Jing Zhang, Liang Zheng, Meng Wang et al.

This paper develops small vision language models to understand visual art: given an artwork, the model aims to identify its emotion category and explain this prediction in natural language. While small models are computationally efficient, their capacity is much more limited than that of large models. To break this trade-off, this paper builds a small emotional vision language model (SEVLM) through emotion modeling and input-output feature alignment. On the one hand, based on valence-arousal-dominance (VAD) knowledge annotated by psychology experts, we introduce and fuse emotional features derived from a VAD dictionary, and use a VAD head to align the VAD vectors of the predicted emotion explanation and the ground truth. This allows the vision language model to better understand and generate emotional texts, compared with using traditional text embeddings alone. On the other hand, we design a contrastive head to pull close the embeddings of the image, its emotion class, and the explanation, which aligns model outputs and inputs. On two public affective explanation datasets, we show that the proposed techniques consistently improve the visual art understanding performance of baseline SEVLMs. Importantly, the proposed model can be trained and evaluated on a single RTX 2080 Ti while exhibiting very strong performance: it not only outperforms state-of-the-art small models but is also competitive with LLaVA 7B after fine-tuning and with GPT-4(V). The code is available at https://github.com/BetterZH/SEVLM-code.

en cs.CV
arXiv Open Access 2024
AttnMod: Attention-Based New Art Styles

Shih-Chieh Su

We introduce AttnMod, a training-free technique that modulates cross-attention in pre-trained diffusion models to generate novel, unpromptable art styles. The method is inspired by how a human artist might reinterpret a generated image, for example by emphasizing certain features, dispersing color, twisting silhouettes, or materializing unseen elements. AttnMod simulates this intent by altering how the text prompt conditions the image through attention during denoising. These targeted modulations enable diverse stylistic transformations without changing the prompt or retraining the model, and they expand the expressive capacity of text-to-image generation.

en cs.CV, cs.AI

Page 23 of 183,938