This thesis explores a multimodal AI framework for enhancing construction safety through the combined analysis of textual and visual data. In safety-critical environments such as construction sites, accident data often exists in multiple formats, including written reports, inspection records, and site imagery, making it challenging to synthesize hazard information using traditional approaches. To address this, the thesis proposes a multimodal AI framework that combines text and image analysis to assist in identifying safety hazards on construction sites. Two case studies were conducted to evaluate the capabilities of large language models (LLMs) and vision-language models (VLMs) for automated hazard identification. The first case study introduces a hybrid pipeline that uses GPT-4o and GPT-4o mini to extract structured insights from a dataset of 28,000 OSHA accident reports (2000-2025). The second case study extends this investigation to Molmo 7B and Qwen2 VL 2B, two lightweight, open-source VLMs. Using the public ConstructionSite10k dataset, the performance of the two models was evaluated on rule-level safety violation detection with natural language prompts. This experiment served as a cost-aware benchmark against proprietary models and allowed testing at scale with ground-truth labels. Despite their smaller size, Molmo 7B and Qwen2 VL 2B showed competitive performance under certain prompt configurations, reinforcing the feasibility of low-resource multimodal systems for rule-aware safety monitoring.
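To make the first case study's extraction step concrete, the sketch below sends a single accident narrative to GPT-4o mini and requests structured hazard fields as JSON. The prompt wording, the output fields, and the `extract_hazards` helper are illustrative assumptions, not the thesis's actual pipeline.

```python
# Minimal sketch of LLM-based structured extraction from an OSHA accident narrative.
# Prompt wording and output fields are illustrative assumptions, not the thesis's exact pipeline.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def extract_hazards(report_text: str) -> dict:
    """Ask GPT-4o mini to return structured hazard fields as JSON."""
    prompt = (
        "Extract the following fields from this construction accident report "
        "and answer in JSON: hazard_type, body_part_injured, equipment_involved, "
        "likely_osha_rule_violated.\n\nReport:\n" + report_text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    sample = "Worker fell from an unguarded scaffold while carrying rebar."
    print(extract_hazards(sample))
```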
We propose a Bayesian method of estimation for the semiparametric Additive Hazards Model (AHM) from survival analysis under right-censoring. To this end, we review the AHM and revisit its likelihood function in order to discuss the challenges posed by Bayesian estimation from the full likelihood. Through an algorithmic reformulation of that likelihood, we present an alternative method based on a hybrid Bayesian treatment that exploits the estimating-equation approach of Lin and Ying (1994) and chooses tractable priors for the parameters. We obtain the estimators from the posterior distributions in closed form, perform a small simulation experiment, and illustrate our method with the classical nickel miners dataset.
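As background (standard counting-process notation, not the paper's Bayesian formulation), the AHM and the Lin and Ying (1994) closed-form estimator on which the hybrid treatment builds can be written as
$$
\lambda(t \mid Z_i) = \lambda_0(t) + \beta^\top Z_i(t),
$$
$$
\hat{\beta} = \Big[\sum_{i=1}^{n} \int_0^{\tau} Y_i(t)\,\{Z_i(t) - \bar{Z}(t)\}^{\otimes 2}\,\mathrm{d}t\Big]^{-1} \sum_{i=1}^{n} \int_0^{\tau} \{Z_i(t) - \bar{Z}(t)\}\,\mathrm{d}N_i(t),
$$
where $N_i$ is the counting process, $Y_i$ the at-risk indicator, and $\bar{Z}(t)$ the average covariate value among subjects at risk at time $t$.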
Mert Ketenci, Vincent Jeanselme, Harry Reyes Nieva
et al.
Survival analysis is a fundamental tool for modeling time-to-event outcomes in healthcare. Recent advances have introduced flexible neural network approaches for improved predictive performance. However, most of these models do not provide interpretable insights into the association between exposures and the modeled outcomes, a critical requirement for decision-making in clinical practice. To address this limitation, we propose Additive Deep Hazard Analysis Mixtures (ADHAM), an interpretable additive survival model. ADHAM assumes a conditional latent structure that defines subgroups, each characterized by a combination of covariate-specific hazard functions. To select the number of subgroups, we introduce a post-training refinement that reduces the number of equivalent latent subgroups by merging similar groups. We perform comprehensive studies to demonstrate ADHAM's interpretability at the population, subgroup, and individual levels. Extensive experiments on real-world datasets show that ADHAM provides novel insights into the association between exposures and outcomes. Further, ADHAM remains on par with existing state-of-the-art survival baselines in terms of predictive performance, offering a scalable and interpretable approach to time-to-event prediction in healthcare.
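As a rough sketch of the additive-mixture idea (not ADHAM's actual neural architecture), a hazard of this form can be written as a mixture over latent subgroups of sums of covariate-specific hazard contributions; the component functions and group-weight model below are placeholders.

```python
# Toy illustration of an additive mixture-of-subgroups hazard:
#   h(t | x) = sum_g  pi_g(x) * sum_j  h_{g,j}(x_j, t)
# Component functions and mixture weights are placeholders, not ADHAM's learned networks.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hazard(x, t, group_logits_w, component_slopes):
    """x: (d,) covariates; t: time; returns a scalar hazard value."""
    pi = softmax(group_logits_w @ x)                # subgroup probabilities pi_g(x)
    # one additive, covariate-specific contribution per (group, covariate) pair;
    # softplus keeps each contribution positive
    contrib = np.log1p(np.exp(component_slopes * x[None, :] + 0.1 * t))
    return float(pi @ contrib.sum(axis=1))

rng = np.random.default_rng(0)
d, G = 4, 3
x = rng.normal(size=d)
print(hazard(x, t=1.0,
             group_logits_w=rng.normal(size=(G, d)),
             component_slopes=rng.normal(size=(G, d))))
```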
Autonomous driving systems face significant challenges in handling unpredictable edge-case scenarios, such as adversarial pedestrian movements, dangerous vehicle maneuvers, and sudden environmental changes. Current end-to-end driving models struggle to generalize to these rare events due to limitations in traditional detection and prediction approaches. To address this, we propose INSIGHT (Integration of Semantic and Visual Inputs for Generalized Hazard Tracking), a hierarchical vision-language model (VLM) framework designed to enhance hazard detection and edge-case evaluation. By using multimodal data fusion, our approach integrates semantic and visual representations, enabling precise interpretation of driving scenarios and accurate forecasting of potential dangers. Through supervised fine-tuning of VLMs, we optimize spatial hazard localization using attention-based mechanisms and coordinate regression techniques. Experimental results on the BDD100K dataset demonstrate a substantial improvement in hazard prediction accuracy over existing models, achieving a notable increase in generalization performance. This advancement enhances the robustness and safety of autonomous driving systems, ensuring improved situational awareness and decision-making in complex real-world scenarios.
Benjamin Feuer, Micah Goldblum, Teresa Datta
et al.
The release of ChatGPT in November 2022 sparked an explosion of interest in post-training and an avalanche of new preference optimization (PO) methods. These methods claim superior alignment by virtue of better correspondence with human pairwise preferences, often measured by LLM-judges. In this work, we attempt to answer the following question -- do LLM-judge preferences translate to progress on other, more concrete metrics for alignment, and if not, why not? We define a concrete metric for alignment, and introduce SOS-Bench (Substance Outweighs Style Benchmark), which is to the best of our knowledge the largest standardized, reproducible LLM meta-benchmark to date. We find that (1) LLM-judge preferences do not correlate with concrete measures of safety, world knowledge, and instruction following; (2) LLM-judges have powerful implicit biases, prioritizing style over factuality and safety; and (3) the supervised fine-tuning (SFT) stage of post-training, and not the PO stage, has the greatest impact on alignment, with data scaling and prompt diversity as the driving factors. Our codebase and complete results can be found at https://github.com/penfever/sos-bench.
I study a moral hazard problem between a principal and multiple agents who experience positive peer effects represented by a (weighted) network. Under the optimal linear contract, the principal provides high-powered incentives to central agents in the network in order to exploit the larger incentive spillovers such agents create. The analysis reveals a novel measure of network centrality that captures rich channels of direct and indirect incentive spillovers and characterizes the optimal contract and its induced equilibrium efforts. The notion of centrality relevant for incentive spillovers in the model emphasizes the role of pairs of agents who link to common neighbors in the network. This characterization leads to a measure of marginal network effects and identifies the agents whom the principal targets with stronger incentives in response to the addition (or strengthening) of a link. When the principal can position agents with heterogeneous costs of effort in the network, the principal prefers to place low-cost agents in central positions. The results shed light on how firms can increase productivity through corporate culture, office layout, and social interactions.
There is currently a focus on statistical methods that can use historical trial information to help accelerate the discovery, development, and delivery of medicines. Bayesian methods can be constructed so that the borrowing is "dynamic" in the sense that the similarity of the data helps to determine how much information is used. In the time-to-event setting with one historical data set, a popular model for a range of baseline hazards is the piecewise exponential model, where the time points are fixed and a borrowing structure is imposed on the model. Although convenient for implementation, this approach affects the borrowing capability of the model. We propose a Bayesian model which allows the time points to vary and a dependency to be placed between the baseline hazards. This serves to smooth the posterior baseline hazard, improving both model estimation and borrowing characteristics. We explore a variety of prior structures for the borrowing within our proposed model and assess their performance against established approaches. We demonstrate that this leads to improved type I error in the presence of prior data conflict and increased power. We have developed accompanying software, which is freely available and enables easy implementation of the approach.
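As background, a common way to write this kind of piecewise exponential baseline with a dependency between adjacent hazards is
$$
\lambda(t) = \lambda_j \ \text{ for } t \in (s_{j-1}, s_j], \qquad \log \lambda_j \mid \lambda_{j-1} \sim \mathcal{N}\big(\log \lambda_{j-1}, \sigma^2\big), \quad j = 2, \dots, J,
$$
where a small $\sigma^2$ smooths the posterior baseline hazard across intervals and, in the proposed model, the change points $s_1 < \dots < s_{J-1}$ are themselves assigned a prior rather than fixed; the specific dependency prior used in the paper may differ from this illustrative choice.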
Consumers are sensitive to medical prices when consuming care, but delays in price information may distort moral hazard. We study how medical bills affect household spillover spending following utilization, leveraging variation in insurer claim processing times. Households increase spending by 22\% after a scheduled service, but then reduce spending by 11\% after the bill arrives. Observed bill effects are consistent with resolving price uncertainty; bill effects are strongest when pricing information is particularly salient. A model of demand for healthcare with delayed pricing information suggests households misperceive pricing signals prior to bills, and that correcting these perceptions reduces average (median) spending by 16\% (7\%) annually.
In many applications of survival data analysis, the individuals are treated in different medical centres or belong to different clusters defined by geographical or administrative regions. The analysis of such data requires accounting for between-cluster variability. Ignoring such variability would impose unrealistic assumptions in the analysis and could affect the inference on the statistical models. We develop a novel parametric mixed-effects general hazard (MEGH) model that is particularly suitable for the analysis of clustered survival data. The proposed structure generalises the mixed-effects proportional hazards (MEPH) and mixed-effects accelerated failure time (MEAFT) structures, among other structures, which are obtained as special cases of the MEGH structure. We develop a likelihood-based algorithm for parameter estimation in general subclasses of the MEGH model, which is implemented in our R package {\tt MEGH}. We propose diagnostic tools for assessing the random effects and their distributional assumption in the proposed MEGH model. We investigate the performance of the MEGH model using theoretical and simulation studies, as well as a real data application on leukemia.
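For context, the fixed-effects general hazard (GH) structure that the MEGH model extends is commonly written as
$$
h(t \mid \mathbf{x}) = h_0\!\left(t\, e^{\tilde{\mathbf{x}}^\top \boldsymbol{\alpha}}\right) e^{\mathbf{x}^\top \boldsymbol{\beta}},
$$
which contains the proportional hazards structure as the special case $\boldsymbol{\alpha} = \mathbf{0}$ and the accelerated failure time structure as the case $\tilde{\mathbf{x}} = \mathbf{x}$, $\boldsymbol{\alpha} = \boldsymbol{\beta}$; in the mixed-effects extension, cluster-level random effects are added to the linear predictors. This is standard background notation and may differ in detail from the paper's exact parameterization.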
Transporting hazardous materials (hazmats) using tank cars offers greater economic benefits than other transportation modes. Although railway transportation is roughly four times more fuel-efficient than roadway transportation, a train derailment has greater potential to cause disastrous consequences than a truck incident. Train types, such as unit trains and manifest trains (also called mixed trains), can influence transport risks in several ways. For example, unit trains only experience risks on mainlines and when arriving at or departing from terminals, while manifest trains experience additional switching risks in yards. Based on prior studies and various data sources covering the years 1996-2018, this paper constructs event chains for line-haul risks on mainlines (for both unit trains and manifest trains), arrival/departure risks in terminals (for unit trains) and yards (for manifest trains), and yard switching risks for manifest trains using various probabilistic models, and finally determines expected casualties as the consequences of a potential train derailment and release incident. This is the first analysis to quantify the total risks a train may encounter throughout the shipment process, either on mainlines or in yards/terminals, while distinguishing between train types. It provides a methodology applicable to any train to calculate the expected risks (quantified here as expected casualties) from an origin to a destination.
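The event-chain logic described above multiplies conditional probabilities along the chain derailment, release, and consequence; the toy calculation below uses made-up placeholder rates purely to illustrate that structure and does not reproduce the paper's estimates.

```python
# Toy event-chain calculation of expected casualties for one shipment leg.
# All numeric rates below are illustrative placeholders, not values from the paper.
def expected_casualties(car_miles, derail_rate_per_car_mile,
                        p_release_given_derail, casualties_given_release):
    e_derail = derail_rate_per_car_mile * car_miles           # expected derailments on the leg
    e_release = e_derail * p_release_given_derail             # expected release events
    return e_release * casualties_given_release               # expected casualties

# Example: mainline line-haul segment vs. yard switching for a manifest train.
mainline = expected_casualties(car_miles=500, derail_rate_per_car_mile=1e-7,
                               p_release_given_derail=0.1, casualties_given_release=2.0)
yard = expected_casualties(car_miles=5, derail_rate_per_car_mile=5e-6,
                           p_release_given_derail=0.05, casualties_given_release=0.5)
print(f"expected casualties: mainline={mainline:.2e}, yard switching={yard:.2e}")
print(f"total for the leg: {mainline + yard:.2e}")
```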
Multiple small- to middle-scale cities, mostly located in northern China, became epidemic hotspots during the second wave of the spread of COVID-19 in early 2021. Despite qualitative discussions of potential socioeconomic causes, it remains unclear how this pattern could be accounted for from a quantitative approach. Through the development of an urban epidemic hazard index (EpiRank), we provide a mathematical explanation for this phenomenon. The index is constructed from epidemic simulations on a multi-layer transportation network model on top of local SEIR transmission dynamics, which characterizes intra- and inter-city compartment population flow with a detailed mathematical description. Essentially, we argue that these highlighted cities possess greater epidemic hazards due to the combined effect of a large regional population and limited inter-city transportation. The proposed index, dynamic and applicable to different epidemic settings, could be a useful indicator for the risk assessment and response planning of urban epidemic hazards in China; the model framework is modularized and can be adapted for other nations without much difficulty.
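To make the ingredients concrete, the sketch below runs a discrete-time SEIR metapopulation model with a fixed inter-city travel matrix; the two-city network and all parameters are placeholders and do not correspond to EpiRank's calibrated multi-layer transportation model.

```python
# Minimal discrete-time SEIR metapopulation sketch (placeholder parameters, two cities).
# EpiRank's actual model uses a calibrated multi-layer transportation network.
import numpy as np

def seir_step(state, beta, sigma, gamma, travel):
    """One day of SEIR dynamics plus inter-city mixing.
    state: array of shape (4, n_cities) holding S, E, I, R counts.
    travel: row-stochastic (n_cities, n_cities) matrix of daily movement fractions."""
    S, E, I, R = state
    N = state.sum(axis=0)
    new_exposed = beta * S * I / N            # infections within each city
    new_infectious = sigma * E                # E -> I transitions
    new_recovered = gamma * I                 # I -> R transitions
    S = S - new_exposed
    E = E + new_exposed - new_infectious
    I = I + new_infectious - new_recovered
    R = R + new_recovered
    # redistribute every compartment along the travel matrix
    return np.stack([travel.T @ S, travel.T @ E, travel.T @ I, travel.T @ R])

# Two-city toy example: a large city weakly connected to a small one.
travel = np.array([[0.99, 0.01],
                   [0.02, 0.98]])
state = np.array([[5e6, 2e5],     # S
                  [0.0, 0.0],     # E
                  [10.0, 0.0],    # I
                  [0.0, 0.0]])    # R
for _ in range(60):
    state = seir_step(state, beta=0.3, sigma=0.2, gamma=0.1, travel=travel)
print("infectious after 60 days:", state[2])
```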
We study optimal rating design under moral hazard and strategic manipulation. An intermediary observes a noisy indicator of effort and commits to a rating policy that shapes market beliefs and pay. We characterize optimal ratings via concavification of a gain function. Optimal ratings depend on the interaction of effort and risk: for activities that raise tail risk, optimal ratings exhibit lower censorship, pooling poor outcomes to insure and encourage risk-taking; for activities that reduce tail risk, upper censorship increases penalties for negligence. In multi-task environments with window dressing, less informative ratings deter manipulation. In redistributive test design, optimal tests exhibit mid-censorship.
This work presents novel and powerful tests for comparing non-proportional hazard functions, based on sample-space partitions. Right censoring introduces two major difficulties which make the existing sample-space partition tests for uncensored data non-applicable: (i) the actual event times of censored observations are unknown; and (ii) the standard permutation procedure is invalid in case the censoring distributions of the groups are unequal. We overcome these two obstacles, introduce invariant tests, and prove their consistency. Extensive simulations reveal that under non-proportional alternatives, the proposed tests are often of higher power compared with existing popular tests for non-proportional hazards. Efficient implementation of our tests is available in the R package KONPsurv, which can be freely downloaded from https://github.com/matan-schles/KONPsurv.
It is shown that considering a fixed magnitude increment at a fault is equivalent to factoring the mechanical moment at the fault, as is done for applied loads in the most widely used structural engineering standards (e.g., the Eurocodes). A special safety factor is introduced and related to the partial factor acting on the mechanical moment representing the fault. A comparison is then made between the hazard maps obtained for Italy with the Neo-Deterministic Seismic Hazard Assessment (NDSHA) technique, using two approaches for defining the seismic sources considered in the computation of the synthetic seismograms. The first is based on the magnitude of the events listed in the earthquake catalogue and located within the active seismogenic zones. This is the standard approach, used in most NDSHA computations performed to date. It is adequate for countries, like Italy, where the catalogue can reasonably be considered complete for events of M=5 and above. When the catalogue completeness is barely adequate, another approach can be adopted for the definition of the earthquake sources. It uses the seismogenic nodes identified by means of pattern recognition techniques applied to morphostructural seismic zonation (MSZ), and increases the reference magnitude by a constant increment tuned via the safety factor. The two approaches have been compared for Italy using a safety factor of 2.0: they produce largely comparable hazard maps. Since the two source sets are fully independent and the catalogue is very long, this constitutes a validation of the seismogenic nodes method and a tuning of the safety factor at about 2. Notable exceptions are the Central Alps, where nodes tend to overestimate the "observed" hazard, and southeastern Sicily, where the apparent underestimation by the nodes is negligible within experimental errors.
Jean Feng, David A. Shaw, Vladimir N. Minin
et al.
Antibodies, an essential part of our immune system, develop through an intricate process to bind a wide array of pathogens. This process involves randomly mutating DNA sequences encoding these antibodies to find variants with improved binding, though mutations are not distributed uniformly across sequence sites. Immunologists observe this nonuniformity to be consistent with "mutation motifs", which are short DNA subsequences that affect how likely a given site is to experience a mutation. Quantifying the effect of motifs on mutation rates is challenging: a large number of possible motifs makes this statistical problem high dimensional, while the unobserved history of the mutation process leads to a nontrivial missing data problem. We introduce an $\ell_1$-penalized proportional hazards model to infer mutation motifs and their effects. In order to estimate model parameters, our method uses a Monte Carlo EM algorithm to marginalize over the unknown ordering of mutations. We show that our method performs better on simulated data compared to current methods and leads to more parsimonious models. The application of proportional hazards to mutation processes is, to our knowledge, novel and formalizes the current methods in a statistical framework that can be easily extended to analyze the effect of other biological features on mutation rates.
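As a generic point of comparison (not the authors' Monte Carlo EM method), an L1-penalized proportional hazards fit on fully observed covariates can be obtained with the lifelines package; the synthetic columns below stand in for motif indicator features.

```python
# Generic L1-penalized Cox fit with lifelines; the synthetic motif indicators below are
# illustrative stand-ins, and this does not implement the paper's Monte Carlo EM procedure.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n, p = 500, 20
X = rng.binomial(1, 0.3, size=(n, p))                  # binary "motif present" features
true_beta = np.zeros(p); true_beta[:3] = [0.8, -0.5, 0.6]
T = rng.exponential(scale=np.exp(-X @ true_beta))      # event times with a PH-type effect
C = rng.exponential(scale=2.0, size=n)                 # independent censoring times
df = pd.DataFrame(X, columns=[f"motif_{j}" for j in range(p)])
df["duration"] = np.minimum(T, C)
df["event"] = (T <= C).astype(int)

cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)         # pure L1 (lasso) penalty
cph.fit(df, duration_col="duration", event_col="event")
print(cph.params_.head(10))
```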
The Dantzig selector for the proportional hazards model proposed by D.R. Cox is studied in a high-dimensional and sparse setting. We prove the $l_q$ consistency for all $q \geq 1$ of some estimators based on the compatibility factor, the weak cone invertibility factor, and the restricted eigenvalue of a certain deterministic matrix that approximates the Hessian matrix of the log partial likelihood. Our matrix conditions for these three factors are weaker than those of previous studies.
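For reference, the Dantzig selector in this setting is commonly written as the $\ell_1$-minimization
$$
\hat{\beta} \in \arg\min_{b \in \mathbb{R}^p} \|b\|_1 \quad \text{subject to} \quad \big\| \nabla \ell_n(b) \big\|_\infty \leq \lambda,
$$
where $\ell_n$ denotes the log partial likelihood and $\lambda > 0$ is a tuning parameter; the consistency results are then stated in terms of the compatibility factor, the weak cone invertibility factor, and the restricted eigenvalue of a deterministic approximation to the Hessian of $\ell_n$.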
We consider the smoothed maximum likelihood estimator and the smoothed Grenander-type estimator for a monotone baseline hazard rate $\lambda_0$ in the Cox model. We analyze their asymptotic behavior and show that they are asymptotically normal at rate $n^{m/(2m+1)}$, when $\lambda_0$ is $m \geq 2$ times continuously differentiable, and that both estimators are asymptotically equivalent. Finally, we present numerical results on pointwise confidence intervals that illustrate the comparable behavior of the two methods.
Mohammad Nourmohammadi, Mohammad Jafari Jozani, Brad Johnson
Randomized nomination sampling (RNS) is a rank-based sampling technique which has been shown to be effective in several nonparametric studies involving environmental and ecological applications. In this paper, we investigate parametric inference using RNS design for estimating the unknown vector of parameters $\boldsymbol{\theta}$ in the proportional hazard rate and proportional reverse hazard rate models. We examine both maximum likelihood (ML) and method of moments (MM) methods and investigate the relative precision of our proposed RNS-based estimators compared with those based on simple random sampling (SRS). We introduce four types of RNS-based data as well as necessary EM algorithms for the ML estimation, and evaluate the performance of corresponding estimators in estimating $\boldsymbol{\theta}$. We show that there are always values of the design parameters on which RNS-based estimators are more efficient than those based on SRS. Inference based on imperfect ranking is also explored and it is shown that the improvement holds even when the ranking is imperfect. Theoretical results are augmented with numerical evaluations and a case study.
We use a discrete-time proportional hazards model of time to involuntary employment termination. This model enables us to examine both the continuous effect of the age of an employee and whether that effect has varied over time, generalizing earlier work [Kadane and Woodworth J. Bus. Econom. Statist. 22 (2004) 182--193]. We model the log hazard surface (over age and time) as a thin-plate spline, a Bayesian smoothness-prior implementation of penalized likelihood methods of surface-fitting [Wahba (1990) Spline Models for Observational Data. SIAM]. The nonlinear component of the surface has only two parameters, smoothness and anisotropy. The first, a scale parameter, governs the overall smoothness of the surface, and the second, anisotropy, controls the relative smoothness over time and over age. For any fixed value of the anisotropy parameter, the prior is equivalent to a Gaussian process with linear drift over the time--age plane with easily computed eigenvectors and eigenvalues that depend only on the configuration of data in the time--age plane and the anisotropy parameter. This model has application to legal cases in which a company is charged with disproportionately disadvantaging older workers when deciding whom to terminate. We illustrate the application of the modeling approach using data from an actual discrimination case.
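As background on the discrete-time proportional hazards setup (not the paper's Bayesian thin-plate spline model), the per-period hazard of termination can be estimated by expanding the data into person-period records and fitting a binary regression; the sketch below uses synthetic data and a plain logistic fit.

```python
# Background sketch: discrete-time hazard via person-period expansion + logistic regression.
# Synthetic data and a plain logit fit; the paper instead places a thin-plate spline prior
# on the log hazard surface over (time, age).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
rows = []
for i in range(300):
    age0 = rng.integers(25, 60)
    for t in range(1, 21):                          # follow each employee up to 20 quarters
        age = age0 + t / 4
        p_term = 1 / (1 + np.exp(-(-4.0 + 0.03 * (age - 40))))  # true hazard rises with age
        event = rng.random() < p_term
        rows.append({"employee": i, "period": t, "age": age, "terminated": int(event)})
        if event:
            break                                   # record stops at termination
df = pd.DataFrame(rows)

X = df[["period", "age"]].to_numpy()
y = df["terminated"].to_numpy()
model = LogisticRegression().fit(X, y)              # discrete-time hazard model on person-periods
print("coefficients (period, age):", model.coef_[0])
```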