As an important part of urbanization, the development monitoring of newly constructed parks is of great significance for evaluating the effect of urban planning and optimizing resource allocation. However, traditional change detection methods based on remote sensing imagery have obvious limitations in high-level and intelligent analysis, and thus struggle to meet the requirements of current urban planning and management. In the face of the growing demand for complex multi-modal data analysis in urban park development monitoring, these methods often fail to provide flexible analysis capabilities for diverse application scenarios. This study proposes a multi-modal LLM agent framework, which aims to make full use of the semantic understanding and reasoning capabilities of LLMs to meet the challenges in urban park development monitoring. In this framework, a general horizontal and vertical data alignment mechanism is designed to ensure the consistency and effective tracking of multi-modal data. At the same time, a specific toolkit is constructed to alleviate the hallucination issues that LLMs exhibit due to the lack of domain-specific knowledge. Compared to vanilla GPT-4o and other agents, our approach enables robust multi-modal information fusion and analysis, offering reliable and scalable solutions tailored to the diverse and evolving demands of urban park development monitoring.
Lucy Jiang, Amy Seunghyun Lee, Jon E. Froehlich
et al.
Public art can hold cultural, social, political, and aesthetic significance, enriching urban environments and promoting well-being. However, a majority of urban art is inaccessible to blind and low vision (BLV) people. Most art access research has focused on private and curated settings (e.g., museums, galleries) and most urban access work has centered on outdoor navigation, leaving urban and public art accessibility largely understudied. We conducted semi-structured interviews with 16 BLV participants, using design probes featuring AI-generated descriptions and real-time AI interactions to investigate preferences for both discovering and engaging with urban art. We found that BLV people valued spontaneous art exploration, multisensory (e.g., tactile, auditory, olfactory) engagement, and detailed descriptions of culturally significant artwork. Participants also highlighted challenges distinct to urban art contexts: safety took precedence over art exploration, multisensory access measures could be disruptive to others in the public space, and inaccurate AI descriptions could lead to cultural erasure. Our contributions include empirical insights on BLV preferences for urban art discovery and engagement, seven design dimensions for public art access solutions, and implications for expanding HCI urban accessibility research beyond navigation.
As a vital element of the built heritage of East Asia, historical districts in the region encapsulate the collective memory of urban development, reflecting the distinctive urban fabrics and cultural characteristics of the area. However, with the rapid socioeconomic progress in East Asia, traditional historical districts are facing an increasing number of challenges associated with renewal and transformation. This study aims to evaluate the spatial organisation and structural evolution of Suzhou historical districts from a multiscale perspective, with a focus on understanding the conflict and integration between traditional core areas and modern redevelopment zones. Specifically, the study examines the influence of spatial configuration on pedestrian movement by analysing spatial variables such as integration, choice, total depth, synergy, and intelligibility. To address these challenges, the paper proposes a research approach grounded in space syntax to provide a comprehensive analysis and description of the district’s spatial organisation. The findings reveal the characteristics of spatial organisation and structure across various scales, emphasising the importance of spatial logicality and consistency. Moreover, the paper provides recommendations for optimising spatial organisation and structure and practical strategies for reconciling the relationship between traditional core areas and modern redevelopment zones.
Contemporary art museums in Thailand often fail to engage working-class communities, revealing a disconnect between institutional narratives and the lived realities of marginalized urban citizens. This study investigates how cultural exclusion reflects broader socio-economic inequality, contributing to the journal’s focus on the economic ramifications of urbanization. Through qualitative fieldwork and interviews at four institutions—BACC and MOCA (Thailand), Tate Modern (UK), and Pirelli HangarBicocca (Italy)—it identifies four key dimensions of alienation: psychological, spatial, socio-cultural, and economic. Drawing on Bourdieu’s concept of cultural capital and Lefebvre’s Right to the City, the research introduces a typology of alienation that functions as both a theoretical contribution and a practical tool. It demonstrates that exclusion stems not only from cost but also from curatorial tone, spatial design, and symbolic inaccessibility. By centering the perspectives of lower-income participants, this study contributes an interdisciplinary framework that bridges museology, urban studies, and critical ethnography. By situating cultural alienation within the socio-economic transformations of contemporary urbanization, the study demonstrates how exclusion from museums parallels broader patterns of economic inequality and urban segregation in Thailand. Ultimately, it argues that inclusive cultural infrastructure is essential for fostering urban resilience and democratic participation.
The study examines Value Added Tax (VAT) performance in Nepal and its impact on economic growth using a mixed-method approach. An Auto-Regressive Distributed Lag (ARDL) model using annual time-series data (1998–2024) is employed to evaluate the VAT-growth relationship. The model uses the VAT rate, the VAT share of total tax revenue, and the C-efficiency ratio as key metrics of VAT design, structure, and performance. Additionally, a qualitative policy review explores VAT design, reform, and implementation challenges by incorporating insights from tax experts, academics, policymakers, and stakeholders. Findings reveal that the C-efficiency ratio peaked at 61.40% in FY2018/19 and averaged 40.49% over the study period, suggesting moderate VAT performance. Furthermore, empirical results show that a 1% rise in C-efficiency and in VAT revenue share increases Gross Domestic Product (GDP) per capita by 0.11% and 0.37%, respectively, in the long run. Conversely, a 1% rise in the VAT rate decreases GDP per capita by 0.12% in the long run. These results indicate that enhancing C-efficiency is more favorable to growth than raising the standard rate. The policy evaluation recommends cautiously applying a limited multi-rate VAT (reduced rates on essential goods and relatively higher rates on luxury and negative-externality goods) to lessen the tax burden on low-income groups. However, this requires prior reforms such as administrative modernization, institutional reform, development of a decentralized tax system, a digitalized system, mandatory e-invoicing, base broadening through reduced exemptions, and stronger compliance. This collective reform can increase revenue, support sustained economic growth, ensure social equity, and improve economic welfare in developing economies like Nepal.
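To make the C-efficiency metric concrete, the sketch below uses the commonly cited definition: actual VAT revenue divided by the revenue a uniform VAT at the standard rate would raise on all final consumption. The abstract does not state which variant of the formula the study uses, so this definition, the function names, and the input figures are assumptions for illustration only. The long-run elasticities are the ones reported above.

```python
def c_efficiency(vat_revenue: float, standard_rate: float, final_consumption: float) -> float:
    """C-efficiency under the common definition (an assumption here):
    actual VAT revenue / (standard rate * final consumption)."""
    if standard_rate <= 0 or final_consumption <= 0:
        raise ValueError("standard_rate and final_consumption must be positive")
    return vat_revenue / (standard_rate * final_consumption)


# Long-run elasticities of GDP per capita as reported in the abstract.
LONG_RUN_ELASTICITY = {
    "c_efficiency": 0.11,
    "vat_revenue_share": 0.37,
    "vat_rate": -0.12,
}


def long_run_gdp_change(variable: str, pct_change: float) -> float:
    """Approximate long-run % change in GDP per capita for a given % change
    in one VAT metric, holding the other metrics fixed."""
    return LONG_RUN_ELASTICITY[variable] * pct_change


if __name__ == "__main__":
    # Hypothetical figures, not data from the study.
    ratio = c_efficiency(vat_revenue=52.0, standard_rate=0.13, final_consumption=1000.0)
    print(f"C-efficiency: {ratio:.2%}")
    print(f"GDP effect of +5% C-efficiency: {long_run_gdp_change('c_efficiency', 5.0):+.2f}%")
```

Under this reading, a ratio well below 100% reflects revenue lost to exemptions, reduced rates, and non-compliance, which is why the abstract treats it as a performance measure rather than a design parameter.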
Modeling and evaluation of automated vehicles (AVs) in mixed-autonomy traffic is essential prior to their safe and efficient deployment. This is especially important at urban junctions where complex multi-agent interactions occur. Current approaches for modeling vehicular maneuvers and interactions at urban junctions have limitations in formulating non-cooperative interactions and vehicle dynamics within a unified mathematical framework. Previous studies either assume predefined paths or rely on cooperation and central controllability, limiting their realism and applicability in mixed-autonomy traffic. This paper addresses these limitations by proposing a modeling framework for trajectory planning and decentralized vehicular control at urban junctions. The framework employs a bi-level structure where the upper level generates kinematically feasible reference trajectories using an efficient graph search algorithm with a custom heuristic function, while the lower level employs a predictive controller for trajectory tracking and optimization. Unlike existing approaches, our framework does not require central controllability or knowledge sharing among vehicles. The vehicle kinematics are explicitly incorporated at both levels, and acceleration and steering angle are used as control variables. This intuitive formulation facilitates analysis of traffic efficiency, environmental impacts, and motion comfort. The framework's decentralized structure accommodates operational and stochastic elements, such as vehicles' detection range, perception uncertainties, and reaction delay, making the model suitable for safety analysis. Numerical and simulation experiments across diverse scenarios demonstrate the framework's capability in modeling accurate and realistic vehicular maneuvers and interactions at various urban junctions, including unsignalized intersections and roundabouts.
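The abstract says vehicle kinematics enter both levels with acceleration and steering angle as control variables, but it does not name the kinematic model. As an illustration only, the sketch below assumes the widely used kinematic bicycle model with forward-Euler integration. The class, function names, wheelbase, and time step are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from math import cos, sin, tan


@dataclass
class VehicleState:
    x: float        # position along the x axis [m]
    y: float        # position along the y axis [m]
    heading: float  # yaw angle [rad]
    speed: float    # longitudinal speed [m/s]


def step(state: VehicleState, acceleration: float, steering_angle: float,
         wheelbase: float = 2.7, dt: float = 0.1) -> VehicleState:
    """Advance one time step with a kinematic bicycle model (assumed here)."""
    return VehicleState(
        x=state.x + state.speed * cos(state.heading) * dt,
        y=state.y + state.speed * sin(state.heading) * dt,
        heading=state.heading + state.speed / wheelbase * tan(steering_angle) * dt,
        speed=max(0.0, state.speed + acceleration * dt),  # no reversing in this sketch
    )


def rollout(state: VehicleState, controls, wheelbase: float = 2.7, dt: float = 0.1):
    """Apply a sequence of (acceleration, steering_angle) pairs and return every state."""
    trajectory = [state]
    for acceleration, steering_angle in controls:
        state = step(state, acceleration, steering_angle, wheelbase, dt)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    start = VehicleState(x=0.0, y=0.0, heading=0.0, speed=10.0)
    for s in rollout(start, [(0.5, 0.05)] * 5):
        print(f"x={s.x:.2f} y={s.y:.2f} heading={s.heading:.3f} speed={s.speed:.2f}")
```

A rollout of this kind is what a graph search could use to expand kinematically feasible successors and what a predictive controller could use as its prediction model. How the paper couples the two levels is not specified in the abstract.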
Modern urban spaces are equipped with an increasingly diverse set of sensors, all producing an abundance of multimodal data. Such multimodal data can be used to identify and reason about important incidents occurring in urban landscapes, such as major emergencies, cultural and social events, as well as natural disasters. However, such data may be fragmented over several sources and difficult to integrate due to the reliance on human-driven reasoning for identifying relationships between the multimodal data corresponding to an incident, as well as for understanding the different components which define an incident. Such relationships and components are critical to identifying the causes of such incidents, as well as to forecasting the scale and intensity of future incidents as they begin to develop. In this work, we create SIGMUS, a system for Semantic Integration for Knowledge Graphs in Multimodal Urban Spaces. SIGMUS uses Large Language Models (LLMs) to produce the necessary world knowledge for identifying relationships between incidents occurring in urban spaces and data from different modalities, allowing us to organize evidence and observations relevant to an incident without relying on human-encoded rules for relating multimodal sensory data with incidents. This organized knowledge is represented as a knowledge graph, organizing incidents, observations, and much more. We find that our system is able to produce reasonable connections between 5 different data sources (news article text, CCTV images, air quality, weather, and traffic measurements) and relevant incidents occurring at the same time and location.
Accurate prediction of wind flow fields in urban canopies is crucial for ensuring pedestrian comfort, safety, and sustainable urban design. Traditional methods using wind tunnels and Computational Fluid Dynamics, such as Large-Eddy Simulations (LES), are limited by high costs, computational demands, and time requirements. This study presents a deep neural network (DNN) approach for fast and accurate predictions of urban wind flow fields, reducing computation time from an order of 10 hours on 32 CPUs for one LES evaluation to an order of 1 second on a single GPU using the DNN model. We employ a U-Net architecture trained on LES data including 252 synthetic urban configurations at seven wind directions ($0^{\circ}$ to $90^{\circ}$ in $15^{\circ}$ increments). The model predicts two key quantities of interest: mean velocity magnitude and streamwise turbulence intensity, at multiple heights within the urban canopy. The U-Net uses 2D building representations augmented with signed distance functions and their gradients as inputs, forming a $256\times256\times9$ tensor. In addition, a Spatial Attention Module is used for feature transfer through skip connections. The loss function combines the root-mean-square error of predictions, their gradient magnitudes, and L2 regularization. Model evaluation on 50 test cases demonstrates high accuracy with an overall mean relative error of 9.3% for velocity magnitude and 5.2% for turbulence intensity. This research shows the potential of deep learning approaches to provide fast, accurate urban wind assessments essential for creating comfortable and safe urban environments. Code is available at https://github.com/tvarg/Urban-FlowUnet.git
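The loss described above has three parts: RMSE on the predicted field, RMSE on its gradient magnitude, and an L2 penalty on the weights. A minimal NumPy sketch of one plausible reading follows. The relative weights, the finite-difference gradient, and all function names are assumptions and are not taken from the linked repository.

```python
import numpy as np


def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def gradient_magnitude(field: np.ndarray) -> np.ndarray:
    """Magnitude of the spatial gradient of a 2D field via finite differences."""
    d_rows, d_cols = np.gradient(field)
    return np.sqrt(d_rows ** 2 + d_cols ** 2)


def combined_loss(pred: np.ndarray, target: np.ndarray, weights,
                  grad_weight: float = 1.0, l2_weight: float = 1e-4) -> float:
    """RMSE of the field + RMSE of its gradient magnitude + L2 weight penalty.
    grad_weight and l2_weight are illustrative values, not the paper's."""
    data_term = rmse(pred, target)
    grad_term = rmse(gradient_magnitude(pred), gradient_magnitude(target))
    l2_term = sum(float(np.sum(w ** 2)) for w in weights)
    return data_term + grad_weight * grad_term + l2_weight * l2_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((256, 256))                    # stand-in for an LES velocity field
    pred = target + 0.05 * rng.standard_normal((256, 256))
    layer_weights = [rng.standard_normal((3, 3))]      # stand-in for network weights
    print(combined_loss(pred, target, layer_weights))
```

The gradient term penalizes predictions that match the field on average but smear sharp features such as wakes and shear layers near building edges, which a plain RMSE tolerates.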
David Nazareno Campo, Javier Conde, Álvaro Alonso
et al.
The proliferation of Generative Artificial Intelligence (AI), especially Large Language Models, presents transformative opportunities for urban applications through Urban Foundation Models. However, base models face limitations, as they only contain the knowledge available at the time of training, and updating them is both time-consuming and costly. Retrieval Augmented Generation (RAG) has emerged in the literature as the preferred approach for injecting contextual information into Foundation Models. It prevails over techniques such as fine-tuning, which are less effective in dynamic, real-time scenarios like those found in urban environments. However, traditional RAG architectures, based on semantic databases, knowledge graphs, structured data, or AI-powered web searches, do not fully meet the demands of urban contexts. Urban environments are complex systems characterized by large volumes of interconnected data, frequent updates, real-time processing requirements, security needs, and strong links to the physical world. This work proposes a real-time spatial RAG architecture that defines the necessary components for the effective integration of generative AI into cities, leveraging temporal and spatial filtering capabilities through linked data. The proposed architecture is implemented using FIWARE, an ecosystem of software components to develop smart city solutions and digital twins. The design and implementation are demonstrated through the use case of a tourism assistant in the city of Madrid. The use case serves to validate the correct integration of Foundation Models through the proposed RAG architecture.
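The core retrieval idea, filtering context by place and time before it reaches the model, can be illustrated without any particular platform. The sketch below is plain Python and does not use FIWARE or any of its APIs. The `Entity` type, function names, thresholds, and sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Entity:
    name: str
    lat: float
    lon: float
    observed_at: datetime
    description: str


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    earth_radius_m = 6_371_000.0
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def retrieve_context(entities, lat, lon, now, max_distance_m, max_age):
    """Keep entities that are both recent and nearby, nearest first."""
    hits = []
    for entity in entities:
        if now - entity.observed_at > max_age:
            continue  # temporal filter: drop stale observations
        distance = haversine_m(lat, lon, entity.lat, entity.lon)
        if distance <= max_distance_m:  # spatial filter
            hits.append((distance, entity))
    hits.sort(key=lambda pair: pair[0])
    return [entity for _, entity in hits]


def build_prompt(question: str, context) -> str:
    """Prepend the filtered entities to the user question."""
    lines = [f"- {e.name}: {e.description}" for e in context]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0)
    entities = [
        Entity("museum", 40.4170, -3.7035, now - timedelta(minutes=5), "open, short queue"),
        Entity("far_park", 41.0, -3.0, now - timedelta(minutes=5), "quiet"),
    ]
    nearby = retrieve_context(entities, 40.4168, -3.7038, now, 500.0, timedelta(hours=1))
    print(build_prompt("What can I visit right now?", nearby))
```

In a deployed system the filtering would more likely run as a geo-temporal query inside the context store rather than in application code. The in-memory loop here only shows the logic.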
Healthy urban forests comprising diverse trees and shrubs play a crucial role in mitigating climate change. They offer several key advantages, such as providing shade for energy conservation and intercepting rainfall to reduce flood runoff and soil erosion. Traditional approaches for monitoring the health of urban forests require instrumented inspection techniques, often involving a high amount of human labor and subjective evaluations. As a result, they are not scalable for cities which lack extensive resources. Recent approaches that use multi-spectral imaging data from terrestrial sensing and satellites are constrained, respectively, by the need for dedicated deployments and by limited spatial resolution. In this work, we propose an alternative approach for monitoring urban forests using simplified inputs: street view imagery, tree inventory data, and meteorological conditions. We propose to use image-to-image translation networks to estimate two urban forest health parameters, namely, NDVI and CTD. Finally, we aim to compare the generated results with ground truth data using an onsite campaign utilizing handheld multi-spectral and thermal imaging sensors. With the advent and expansion of street view imagery platforms such as Google Street View and Mapillary, this approach should enable effective management of urban forests for the authorities in cities at scale.
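For reference, NDVI is conventionally computed per pixel from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below shows that standard definition, which is how a ground-truth value from a multi-spectral sensor would typically be derived. It is not the image-to-image estimation the abstract proposes, and the function names and sample values are illustrative.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel.
    Returns 0.0 when both bands are zero to avoid dividing by zero."""
    denominator = nir + red
    if denominator == 0:
        return 0.0
    return (nir - red) / denominator


def ndvi_image(nir_band, red_band):
    """Apply NDVI element-wise to two equally sized 2D lists of reflectance."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


if __name__ == "__main__":
    # Hypothetical reflectance values: healthy canopy reflects strongly in NIR.
    nir = [[0.50, 0.45], [0.20, 0.00]]
    red = [[0.10, 0.15], [0.20, 0.00]]
    for row in ndvi_image(nir, red):
        print([round(v, 3) for v in row])
```

Values range from -1 to 1, with dense healthy vegetation toward the upper end and bare or built surfaces near zero.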
This study examines the dynamics of Urban Land Use Succession (ULUS) in Upper Hill, Nairobi, highlighting the impact of neoliberal policies and private sector-led urban redevelopment. It investigates how land tenure, public infrastructure, and planning controls shape urban landscapes, leading to patchwork land use patterns and environmental misalignments. The case of Upper Hill, transitioning from a serene residential area to a bustling commercial hub, is explored to understand the determinants of ULUS and propose strategies for streamlined urban development. Employing Neoliberal Theory and hypothesis testing, the research identifies spatial policy as the primary driver of ULUS. The study suggests innovative approaches, including land assembly and the establishment of an Urban Redevelopment Authority, to harmonize urban development. These strategies aim to bridge the gap between private and public land development, ensuring coherent urban growth. The research contributes to the understanding of urban redevelopment, particularly in Kenyan contexts, by offering a model that integrates public and private interests. This model serves as a blueprint for managing urban transformation in Nairobi and other similar urban settings, promoting sustainable and equitable urban development.
Milad Malekzadeh, Elias Willberg, Jussi Torkko
et al.
The visual appeal of urban environments significantly impacts residents' satisfaction with their living spaces and their overall mood, which in turn, affects their health and well-being. Given the resource-intensive nature of gathering evaluations on urban visual appeal through surveys or inquiries from residents, there is a constant quest for automated solutions to streamline this process and support spatial planning. In this study, we applied an off-the-shelf AI model to automate the analysis of urban visual appeal, using over 1,800 Google Street View images of Helsinki, Finland. By incorporating the GPT-4 model with specified criteria, we assessed these images. Simultaneously, 24 participants were asked to rate the images. Our results demonstrated a strong alignment between GPT-4 and participant ratings, although geographic disparities were noted. Specifically, GPT-4 showed a preference for suburban areas with significant greenery, contrasting with participants who found these areas less appealing. Conversely, in the city centre and densely populated urban regions of Helsinki, GPT-4 assigned lower visual appeal scores than participant ratings. While there was general agreement between AI and human assessments across various locations, GPT-4 struggled to incorporate contextual nuances into its ratings, unlike participants, who considered both context and features of the urban environment. The study suggests that leveraging AI models like GPT-4 allows spatial planners to gather insights into the visual appeal of different areas efficiently, aiding decisions that enhance residents' and travellers' satisfaction and mental health. Although AI models provide valuable insights, human perspectives are essential for a comprehensive understanding of urban visual appeal. This will ensure that planning and design decisions promote healthy living environments effectively.
Nádia Aparecida de Oliveira Silva, Rodrigo Moreira, Larissa Ferreira Rodrigues
et al.
People with visual impairments struggle with urban mobility and independent travel, opening up opportunities for technological advances to improve their quality of life. The Internet of Things (IoT) plays an essential role in bringing improvements and accessibility for visually impaired people. Although existing alternatives have aimed to use IoT in urban mobility, those solutions are still in the initial stages and do not support urban mobility for people with visual impairments. This paper proposes and evaluates a low-cost IoT architecture that uses Single-Board Computers (SBCs) to support urban mobility. A performance evaluation showed that our low-cost architecture handles a bus trace workload and is suitable for helping visually impaired people obtain information about bus locations in Smart City scenarios.
In light of the rapid global urbanization, urban design has been shown to contribute largely to promoting the health and well-being of urban citizens. However, studies of urban design are underrepresented in low- and middle-income countries in Asia, where urban forms are traditionally compact and complex with multiple layers. Hanoi, a typical city in low- and middle-income countries, exhibits five unique urban typologies generated through official planning, unregulated development, and historical fluctuations. This study examines the perceived urban design from a sample of 218 participants across five urban typologies in Hanoi using an established scale. The findings suggest that perceived urban design is significantly influenced by urban typologies. Old urban typologies tend to report higher scores of land use mix and access to services but lower scores of walking facilities and street connectivity than modern urban typologies. The study contributes to our understanding of urban design in Hanoi, providing policymakers and urban designers with essential insights for sustainable urban development.
This study investigates the relationship between street trading and urban planning in Enugu City, Nigeria, within the expanding informal economy of the global South. It particularly focuses on the perspectives of urban planners regarding the impacts and management of street trading. The research employed a mixed-method approach, including personal observation, questionnaires, and in-depth interviews, analyzed through basic statistical methods. Findings reveal that urban planners recognize the socio-economic importance and cultural relevance of street trading, despite its negative spatial externalities. Contrary to prevailing assumptions, planners favour negotiated solutions over forced evictions. This study highlights the need for inclusive urban planning practices that accommodate the socio-economic benefits of street trading while addressing its challenges, contributing to the discourse on sustainable urban development.
Narjes GHaempanah, Ebrahim Molavi, Mohammad Omidvarian
Urban walls hold significant influence over the quality of urban spaces, making them one of the most impactful environmental elements. Enhancing their quality contributes to an overall improvement in the physical aspects of a city. While the urban landscape and its aesthetic indicators are well-established, the question arises regarding the influential components that shape the urban landscape. Hence, the objective of this scholarly article is two-fold: to identify effective solutions for enhancing the aesthetic dimension of urban walls in terms of their objective-physical aspects, and to prioritize these solutions based on the perspective of citizens who frequent the Eram sidewalk in Qom city. Determining the sample size was achieved through Cochran's formula, resulting in a total of 384 participants. The research methodology employed was practical and aligned with the nature of the study, employing a descriptive-quantitative approach to answer research questions. To analyze the collected data, the Lisrel software was employed for designing the structural equation. The research findings highlight the need for improvement in various components of the urban landscape when it comes to the walls of the Eram pedestrian area. These components include enhancing the beauty and visual variety in wall decorations, paying attention to facade additions, and diversifying the texture of materials utilized. Notably, citizens rated the absence of disturbances and visual disruptions as positive and appropriate, assigning it a high impact factor of 0.82. This component emerged as their top priority when indicating their visual preferences. By addressing these findings and incorporating the indicated solutions, urban planners and policymakers can effectively enhance the aesthetic appeal of urban walls, thereby improving the overall perception and experience of citizens within urban spaces.
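The sample size of 384 matches what Cochran's formula gives for an unknown, large population under the usual default inputs. The abstract does not report those inputs, so the 95% confidence level (z = 1.96), maximum variability (p = 0.5), and 5% margin of error used below are assumptions chosen because they reproduce the reported figure.

```python
def cochran_sample_size(z: float = 1.96, p: float = 0.5, margin: float = 0.05) -> float:
    """Cochran's formula for a large population: n0 = z^2 * p * (1 - p) / e^2."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    return (z ** 2) * p * (1 - p) / (margin ** 2)


if __name__ == "__main__":
    n0 = cochran_sample_size()
    print(f"n0 = {n0:.2f}, reported as {round(n0)} participants")
```

With these inputs the formula gives about 384.16. Some texts round up to 385, whereas the abstract reports 384.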
The Abbas Abad area is one of the most significant natural and historical landmarks in Tehran. It has been developed and utilized over the past 17 years to establish a tourism and cultural hub for Iran's capital. However, with new activity areas being introduced, changes in leisure patterns and citizens' needs, and management conflicts arising from independent organizations established in the region, it is crucial to manage the complex's activities effectively. This requires attention to changes, a review of requirements and capacities, and consideration of present situations. To address these challenges, this study employed documentary research, field observations, in-depth interviews with stakeholders, and questionnaires to survey different visitor groups during various months and seasons throughout the year. The results offer appropriate solutions to improve the utilization status of the complex. These solutions include increasing diversity and functional mixing within the complex, particularly emphasizing the private sector's role in cultural activities. Additionally, providing basic comfort conditions in open spaces, improving communication methods, and upgrading management infrastructures in accordance with exploitation are essential steps towards enhancing the complex's management.
Urban region profiling from web-sourced data is of utmost importance for urban planning and sustainable development. We are witnessing a rising trend of LLMs in various fields, especially in multi-modal research such as vision-language learning, where the text modality serves as supplementary information for the image. Since the textual modality has never been introduced into modality combinations in urban region profiling, we aim to answer two fundamental questions in this paper: i) Can the textual modality enhance urban region profiling? ii) If so, in what ways and with regard to which aspects? To answer these questions, we leverage the power of Large Language Models (LLMs) and introduce the first-ever LLM-enhanced framework that integrates the knowledge of the textual modality into urban imagery profiling, named LLM-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining (UrbanCLIP). Specifically, it first generates a detailed textual description for each satellite image using an open-source Image-to-Text LLM. Then, the model is trained on the image-text pairs, seamlessly unifying natural language supervision for urban visual representation learning, jointly with contrastive loss and language modeling loss. Results on predicting three urban indicators in four major Chinese metropolises demonstrate its superior performance, with an average improvement of 6.1% on R^2 compared to the state-of-the-art methods. Our code and the image-language dataset will be released upon paper notification.
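The training objective pairs a contrastive loss with a language modeling loss. The NumPy sketch below shows a CLIP-style symmetric contrastive term over a batch of image and text embeddings and a simple weighted sum with a precomputed language modeling loss. The temperature, the weighting, and the function names are assumptions, as the abstract does not give UrbanCLIP's exact formulation.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: row i of image_emb is the positive match for row i of text_emb."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature           # pairwise cosine similarities, scaled
    diag = np.arange(logits.shape[0])
    image_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    text_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()
    return float((image_to_text + text_to_image) / 2)


def joint_loss(contrastive: float, language_modeling: float, lm_weight: float = 1.0) -> float:
    """Weighted sum of the two objectives; lm_weight is an illustrative choice."""
    return contrastive + lm_weight * language_modeling


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.standard_normal((4, 8))
    texts = images + 0.1 * rng.standard_normal((4, 8))   # captions close to their images
    print("matched pairs:  ", contrastive_loss(images, texts))
    print("shuffled pairs: ", contrastive_loss(images, np.roll(texts, 1, axis=0)))
```

The contrastive term pulls each satellite image toward its own generated description and away from the other descriptions in the batch, while the language modeling term keeps the text side able to generate descriptions.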