An Open-Source Robotics Research Platform for Autonomous Laparoscopic Surgery
Ariel Rodriguez, Lorenzo Mazza, Martin Lelis
et al.
Autonomous robot-assisted surgery demands reliable, high-precision platforms that strictly adhere to the safety and kinematic constraints of minimally invasive procedures. Existing research platforms, primarily based on the da Vinci Research Kit, suffer from cable-driven mechanical limitations that degrade state-space consistency and hinder the downstream training of reliable autonomous policies. We present an open-source, robot-agnostic Remote Center of Motion (RCM) controller based on a closed-form analytical velocity solver that enforces the trocar constraint deterministically without iterative optimization. The controller operates in Cartesian space, enabling any industrial manipulator to function as a surgical robot. We provide implementations for the UR5e and Franka Emika Panda manipulators, and integrate stereoscopic 3D perception. We integrate the robot control into a full-stack ROS-based surgical robotics platform supporting teleoperation, demonstration recording, and deployment of learned policies via a decoupled server-client architecture. We validate the system on a bowel grasping and retraction task across phantom, ex vivo, and in vivo porcine laparoscopic procedures. RCM deviations remain sub-millimeter across all conditions, and trajectory smoothness metrics (SPARC, LDLJ) are comparable to expert demonstrations from the JIGSAWS benchmark recorded on the da Vinci system. These results demonstrate that the platform provides the precision and robustness required for teleoperation, data collection and autonomous policy deployment in realistic surgical scenarios.
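As a rough illustration of the RCM deviation metric reported above, the sketch below computes the distance from a fixed trocar point to the instrument shaft axis, treating the shaft as an infinite line. The function name, the point/direction parameterisation, and the sample values are assumptions made for illustration; they are not taken from the platform's code.

```python
import math

def rcm_deviation(trocar, shaft_point, shaft_dir):
    """Distance from the trocar point to the line through shaft_point along shaft_dir."""
    # Vector from a point on the shaft to the trocar.
    v = [t - p for t, p in zip(trocar, shaft_point)]
    norm = math.sqrt(sum(d * d for d in shaft_dir))
    if norm == 0.0:
        raise ValueError("shaft_dir must be non-zero")
    u = [d / norm for d in shaft_dir]
    # Remove the component of v along the shaft; the remainder is the perpendicular offset.
    along = sum(a * b for a, b in zip(v, u))
    perp = [a - along * b for a, b in zip(v, u)]
    return math.sqrt(sum(c * c for c in perp))

# Hypothetical values in metres: shaft along z, trocar offset by 0.4 mm in x.
deviation_m = rcm_deviation((0.0004, 0.0, 0.1), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

A deviation of zero means the shaft axis passes exactly through the trocar point; a sub-millimeter result corresponds to a returned value below 0.001 when all inputs are in metres.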
Benchmarking CNN- and Transformer-Based Models for Surgical Instrument Segmentation in Robotic-Assisted Surgery
Sara Ameli
Accurate segmentation of surgical instruments in robotic-assisted surgery is critical for enabling context-aware computer-assisted interventions, such as tool tracking, workflow analysis, and autonomous decision-making. In this study, we benchmark five deep learning architectures (UNet, UNet, DeepLabV3, Attention UNet, and SegFormer) on the SAR-RARP50 dataset for multi-class semantic segmentation of surgical instruments in real-world radical prostatectomy videos. The models are trained with a compound loss function combining Cross Entropy and Dice loss to address class imbalance and capture fine object boundaries. Our experiments reveal that while convolutional models such as UNet and Attention UNet provide strong baseline performance, DeepLabV3 achieves results comparable to SegFormer, demonstrating the effectiveness of atrous convolution and multi-scale context aggregation in capturing complex surgical scenes. Transformer-based architectures like SegFormer further enhance global contextual understanding, leading to improved generalization across varying instrument appearances and surgical conditions. This work provides a comprehensive comparison and practical insights for selecting segmentation models in surgical AI applications, highlighting the trade-offs between convolutional and transformer-based approaches.
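The compound loss mentioned above can be sketched as follows, assuming per-pixel softmax probabilities, integer class labels, and a soft Dice term averaged over classes. The equal default weights, the smoothing constant, and the function name are assumptions; the abstract does not state how the two terms are weighted.

```python
import math

def compound_loss(probs, labels, num_classes, ce_weight=1.0, dice_weight=1.0, eps=1e-6):
    """Weighted sum of cross-entropy and soft Dice loss.

    probs:  list of per-pixel class-probability lists (each sums to 1)
    labels: list of integer class labels, one per pixel
    """
    n = len(labels)
    # Cross-entropy: mean negative log-probability of the true class.
    ce = -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / n
    # Soft Dice: overlap between predicted probabilities and one-hot labels, per class.
    dice_sum = 0.0
    for c in range(num_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        total = sum(p[c] for p in probs) + sum(1 for y in labels if y == c)
        dice_sum += (2.0 * inter + eps) / (total + eps)
    dice = 1.0 - dice_sum / num_classes
    return ce_weight * ce + dice_weight * dice

perfect = compound_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
uncertain = compound_loss([[0.6, 0.4], [0.4, 0.6]], [0, 1], num_classes=2)
```

Because the Dice term is averaged over classes rather than pixels, a rare class contributes as much as a frequent one, which is the usual motivation for pairing it with cross-entropy under class imbalance.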
Activation Surgery: Jailbreaking White-box LLMs without Touching the Prompt
Maël Jenny, Jérémie Dentan, Sonia Vanier
et al.
Most jailbreak techniques for Large Language Models (LLMs) primarily rely on prompt modifications, including paraphrasing, obfuscation, or conversational strategies. Meanwhile, abliteration techniques (also known as targeted ablations of internal components) have been used to study and explain LLM outputs by probing which internal structures causally support particular responses. In this work, we combine these two lines of research by directly manipulating the model's internal activations to alter its generation trajectory without changing the prompt. Our method constructs a nearby benign prompt and performs layer-wise activation substitutions using a sequential procedure. We show that this activation surgery method reveals where and how refusal arises, and prevents refusal signals from propagating across layers, thereby inhibiting the model's safety mechanisms. Finally, we discuss the security implications for open-weights models and instrumented inference environments.
Who Benefits From Sinus Surgery? Comparing Generative AI and Supervised Machine Learning for Predicting Surgical Outcomes in Chronic Rhinosinusitis
Sayeed Shafayet Chowdhury, Snehasis Mukhopadhyay, Shiaofen Fang
et al.
Artificial intelligence has reshaped medical imaging, yet the use of AI on clinical data for prospective decision support remains limited. We study pre-operative prediction of clinically meaningful improvement in chronic rhinosinusitis (CRS), defining success as a more than 8.9-point reduction in SNOT-22 at 6 months (MCID). In a prospectively collected cohort where all patients underwent surgery, we ask whether models using only pre-operative clinical data could have identified those who would have poor outcomes, i.e. those who should have avoided surgery. We benchmark supervised ML (logistic regression, tree ensembles, and an in-house MLP) against generative AI (ChatGPT, Claude, Gemini, Perplexity), giving each the same structured inputs and constraining outputs to binary recommendations with confidence. Our best ML model (MLP) achieves 85% accuracy with superior calibration and decision-curve net benefit. GenAI models underperform on discrimination and calibration in the zero-shot setting. Notably, GenAI justifications align with clinician heuristics and the MLP's feature importance, repeatedly highlighting baseline SNOT-22, CT/endoscopy severity, polyp phenotype, and psychology/pain comorbidities. We provide a reproducible tabular-to-GenAI evaluation protocol and subgroup analyses. Findings support an ML-first, GenAI-augmented workflow: deploy calibrated ML for primary triage of surgical candidacy, with GenAI as an explainer to enhance transparency and shared decision-making.
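To make the outcome definition and the decision-curve metric concrete, the sketch below labels a patient as a success when the SNOT-22 reduction exceeds the 8.9-point MCID and computes net benefit at a probability threshold using the standard decision-curve formula (true-positive rate minus false-positive rate weighted by the threshold odds). The function names and the sample data are hypothetical, and the study's own evaluation code may differ.

```python
MCID = 8.9  # SNOT-22 points, as stated in the abstract

def is_success(baseline_snot22, followup_snot22):
    """True when the reduction in SNOT-22 is strictly greater than the MCID."""
    return (baseline_snot22 - followup_snot22) > MCID

def net_benefit(y_true, y_prob, threshold):
    """Standard decision-curve net benefit at a threshold strictly between 0 and 1."""
    n = len(y_true)
    # Patients the model would recommend for surgery at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and not y)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

# Hypothetical cohort of four patients.
outcomes = [is_success(50, 30), is_success(40, 38), is_success(60, 45), is_success(35, 30)]
nb = net_benefit(outcomes, [0.9, 0.8, 0.2, 0.1], threshold=0.5)
```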
Federated Deep Reinforcement Learning for Privacy-Preserving Robotic-Assisted Surgery
Sana Hafeez, Sundas Rafat Mulkana, Muhammad Ali Imran
et al.
The integration of Reinforcement Learning (RL) into robotic-assisted surgery (RAS) holds significant promise for advancing surgical precision, adaptability, and autonomous decision-making. However, the development of robust RL models in clinical settings is hindered by key challenges, including stringent patient data privacy regulations, limited access to diverse surgical datasets, and high procedural variability. To address these limitations, this paper presents a Federated Deep Reinforcement Learning (FDRL) framework that enables decentralized training of RL models across multiple healthcare institutions without exposing sensitive patient information. A central innovation of the proposed framework is its dynamic policy adaptation mechanism, which allows surgical robots to select and tailor patient-specific policies in real-time, thereby ensuring personalized and optimised interventions. To uphold rigorous privacy standards while facilitating collaborative learning, the FDRL framework incorporates secure aggregation, differential privacy, and homomorphic encryption techniques. Experimental results demonstrate a 60% reduction in privacy leakage compared to conventional methods, with surgical precision maintained within a 1.5% margin of a centralized baseline. This work establishes a foundational approach for adaptive, secure, and patient-centric AI-driven surgical robotics, offering a pathway toward clinical translation and scalable deployment across diverse healthcare environments.
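One common way to combine federated averaging with differential privacy is to clip each institution's model update and add Gaussian noise to the average. The sketch below shows only that generic pattern; it is not the FDRL framework's algorithm, it omits secure aggregation and homomorphic encryption entirely, and every name and default value in it is an assumption.

```python
import math
import random

def clip(update, max_norm):
    """Scale an update so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def private_average(client_updates, max_norm=1.0, noise_std=0.0, rng=None):
    """Average clipped client updates and add Gaussian noise to each coordinate."""
    rng = rng or random.Random(0)
    clipped = [clip(u, max_norm) for u in client_updates]
    n = len(clipped)
    dim = len(clipped[0])
    mean = [sum(u[i] for u in clipped) / n for i in range(dim)]
    # noise_std would be calibrated to the privacy budget; 0.0 disables the noise.
    return [m + rng.gauss(0.0, noise_std) for m in mean]

# Two hypothetical hospitals, each contributing a 2-parameter update.
aggregate = private_average([[3.0, 4.0], [0.0, 0.0]], max_norm=1.0, noise_std=0.0)
```

Clipping bounds how much any single institution can move the shared model, which is what allows the added noise to be calibrated to a formal privacy guarantee.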
Towards Real-time Intrahepatic Vessel Identification in Intraoperative Ultrasound-Guided Liver Surgery
Karl-Philippe Beaudet, Alexandros Karargyris, Sidaty El Hadramy
et al.
While laparoscopic liver resection is less prone to complications and maintains patient outcomes compared to traditional open surgery, its complexity hinders widespread adoption due to challenges in representing the liver's internal structure. Laparoscopic intraoperative ultrasound offers efficient, cost-effective and radiation-free guidance. Our objective is to aid physicians in identifying internal liver structures using laparoscopic intraoperative ultrasound. We propose a patient-specific approach using preoperative 3D ultrasound liver volume to train a deep learning model for real-time identification of portal tree and branch structures. Our personalized AI model, validated on ex vivo swine livers, achieved superior precision (0.95) and recall (0.93) compared to surgeons, laying groundwork for precise vessel identification in ultrasound-based liver resection. Its adaptability and potential clinical impact promise to advance surgical interventions and improve patient care.
Classification of Lattices Bounded by Large Surgeries of Knots
Ali Naseri Sadr
We classify all the lattices realized as the intersection form of a positive definite four manifold with boundary $S_n^3(K)$ for a knot $K$ in the three sphere and a positive integer $n$ greater than $4g_4(K)+3$. We then use this result to define a concordance invariant and generalize a theorem of Rasmussen on lens space surgeries.
Special alternating knots with sufficiently many twist regions have no chirally cosmetic surgeries
Tetsuya Ito
We show that a special alternating knot with a sufficiently large number (more than $63$) of twist regions has no chirally cosmetic surgeries, a pair of Dehn surgeries producing orientation-reversingly homeomorphic $3$-manifolds. In the course of the proof, we provide the optimal upper bounds of the primitive finite type invariants of degree 2 and 3 that solve Willerton's conjecture.
Airway Label Prediction in Video Bronchoscopy: Capturing Temporal Dependencies Utilizing Anatomical Knowledge
Ron Keuth, Mattias Heinrich, Martin Eichenlaub
et al.
Purpose: Navigation guidance is a key requirement for a multitude of lung interventions using video bronchoscopy. State-of-the-art solutions focus on lung biopsies using electromagnetic tracking and intraoperative image registration w.r.t. preoperative CT scans for guidance. The requirement of patient-specific CT scans hampers the utilisation of navigation guidance for other applications such as intensive care units. Methods: This paper addresses navigation guidance solely incorporating bronchoscopy video data. In contrast to state-of-the-art approaches, we entirely omit the use of electromagnetic tracking and patient-specific CT scans. Guidance is enabled by means of topological bronchoscope localization w.r.t. an interpatient airway model. In particular, we take maximal advantage of the anatomical constraints of airway trees being sequentially traversed. This is realized by incorporating sequences of CNN-based airway likelihoods into a Hidden Markov Model. Results: Our approach is evaluated based on multiple experiments inside a lung phantom model. With the consideration of temporal context and use of anatomical knowledge for regularization, we are able to improve the accuracy up to 0.98 compared to 0.81 (weighted F1: 0.98 compared to 0.81) for a classification based on individual frames. Conclusion: We combine CNN-based single image classification of airway segments with anatomical constraints and temporal HMM-based inference for the first time. Our approach renders vision-only guidance for bronchoscopy interventions in the absence of electromagnetic tracking and patient-specific CT scans possible.
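The idea of feeding per-frame CNN likelihoods into a Hidden Markov Model can be illustrated with Viterbi decoding, where the transition matrix encodes which airway segments are anatomically adjacent. This is a minimal sketch: the three-segment chain, the probabilities, and the choice of Viterbi as the inference step are assumptions, not details taken from the paper.

```python
import math

def viterbi(likelihoods, transition, prior):
    """Most likely state sequence.

    likelihoods[t][s]: per-frame score for airway state s
    transition[i][j]:  probability of moving from state i to state j
    prior[s]:          probability of starting in state s
    """
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    n_states = len(prior)
    score = [log(prior[s]) + log(likelihoods[0][s]) for s in range(n_states)]
    back = []
    for frame in likelihoods[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n_states):
            # Best predecessor for state j under the anatomical transition model.
            best = max(range(n_states), key=lambda i: prev[i] + log(transition[i][j]))
            pointers.append(best)
            score.append(prev[best] + log(transition[best][j]) + log(frame[j]))
        back.append(pointers)
    # Trace the best final state back to the first frame.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical chain of three airway segments: 0 <-> 1 <-> 2, no direct 0 <-> 2 move.
transition = [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
frames = [[0.9, 0.05, 0.05], [0.25, 0.35, 0.4], [0.1, 0.8, 0.1]]
path = viterbi(frames, transition, prior=[1 / 3, 1 / 3, 1 / 3])
```

In this toy example the per-frame argmax would be 0, 2, 1, which requires an impossible jump from segment 0 to segment 2; the decoded path is 0, 1, 1, showing how the transition model regularizes an ambiguous frame.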
Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study
Patrick Saux, Pierre Bauvin, Violeta Raverdy
et al.
Background: Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods: In this multinational retrospective observational study we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5-year follow-up after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using least absolute shrinkage and selection operator to select variables and the classification and regression trees algorithm to build interpretable regression trees. The performance of the model was assessed through the median absolute deviation (MAD) and root mean squared error (RMSE) of BMI. Findings: 10 231 patients from 12 centres in ten countries were included in the analysis, corresponding to 30 602 patient-years. Among participants in all 12 cohorts, 7701 (75.3%) were female and 2530 (24.7%) were male. Among 434 baseline attributes available in the training cohort, seven variables were selected: height, weight, intervention type, age, diabetes status, diabetes duration, and smoking status. At 5 years, across external testing cohorts the overall mean MAD BMI was 2.8 kg/m$^2$ (95% CI 2.6-3.0) and mean RMSE BMI was 4.7 kg/m$^2$ (4.4-5.0), and the mean difference between predicted and observed BMI was -0.3 kg/m$^2$ (SD 4.7). This model is incorporated in an easy-to-use and interpretable web-based prediction tool to help inform clinical decisions before surgery. Interpretation: We developed a machine learning-based model, which is internationally validated, for predicting individual 5-year weight loss trajectories after three common bariatric interventions.
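The two error metrics reported above can be computed as in the sketch below. It assumes that MAD means the median of the absolute differences between predicted and observed BMI, which is one reading of the abstract. The helper names and the sample values are hypothetical.

```python
import math
import statistics

def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2

def mad(predicted, observed):
    """Median of absolute prediction errors (assumed meaning of MAD here)."""
    return statistics.median(abs(p - o) for p, o in zip(predicted, observed))

def rmse(predicted, observed):
    """Root mean squared prediction error."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predicted and observed BMI values at 5 years for three patients.
predicted = [30.0, 32.0, 35.0]
observed = [31.0, 30.0, 35.0]
mad_bmi = mad(predicted, observed)
rmse_bmi = rmse(predicted, observed)
```

RMSE penalizes large individual errors more heavily than MAD, which is consistent with the reported RMSE (4.7 kg/m$^2$) being larger than the reported MAD (2.8 kg/m$^2$).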
Generalizability analyses with a partially nested trial design: the Necrotizing Enterocolitis Surgery Trial
Sarah E. Robertson, Matthew A. Rysavy, Martin L. Blakely
et al.
We discuss generalizability analyses under a partially nested trial design, where part of the trial is nested within a cohort of trial-eligible individuals, while the rest of the trial is not nested. This design arises, for example, when only some centers participating in a trial are able to collect data on non-randomized individuals, or when data on non-randomized individuals cannot be collected for the full duration of the trial. Our work is motivated by the Necrotizing Enterocolitis Surgery Trial (NEST) that compared initial laparotomy versus peritoneal drain for infants with necrotizing enterocolitis or spontaneous intestinal perforation. During the first phase of the study, data were collected from randomized individuals as well as consenting non-randomized individuals; during the second phase of the study, however, data were only collected from randomized individuals, resulting in a partially nested trial design. We propose methods for generalizability analyses with partially nested trial designs. We describe identification conditions and propose estimators for causal estimands in the target population of all trial-eligible individuals, both randomized and non-randomized, in the part of the data where the trial is nested, while using trial information spanning both parts. We evaluate the estimators in a simulation study.
Chirally Cosmetic Surgeries on Kinoshita-Terasaka and Conway knot families
Xiliu Yang
In this note, we prove that a nontrivial Kinoshita-Terasaka or Conway knot does not admit chirally cosmetic surgeries, by calculating the finite type invariant of order 3.
Surgery calculus for classical $\operatorname{SL}_2(\mathbb{C})$ Chern-Simons theory
Calvin McPhail-Snyder
Classical $\operatorname{SL}_2(\mathbb{C})$-Chern-Simons theory assigns a $3$-manifold $M$ with representation $ρ: π_1(M) \to \operatorname{SL}_2(\mathbb{C})$ its complex volume $\operatorname{V}(M, ρ) \in \mathbb{C} / 2 π^2 i \mathbb{Z}$, with real part the volume and imaginary part the Chern-Simons invariant. The existing literature focuses on computing $\operatorname{V}$ using a triangulation. In this paper we show how to compute $\operatorname{V}(M, L, ρ)$ directly from a surgery diagram for $M$ a compact oriented $3$-manifold with torus boundary components, embedded cusps $L$, and representation $ρ: π_1(M \setminus L) \to \operatorname{SL}_2(\mathbb{C})$. When $M$ has nonempty boundary $\operatorname{V}(M, L, ρ)(\mathfrak{s})$ depends on some extra data $\mathfrak{s}$ we call a log-decoration. Our method describes $ρ$ in a coordinate system closely related to quantum groups, and we think of our construction as a classical, noncompact version of Witten-Reshetikhin-Turaev's quantum $\operatorname{SU}(2)$ Chern-Simons theory.
Quadratically pinched hypersurfaces of the sphere via mean curvature flow with surgery
Mat Langford, Huy The Nguyen
We study mean curvature flow in $\mathbb S_K^{n+1}$, the round sphere of sectional curvature $K>0$, under the quadratic curvature pinching condition $|A|^{2} < \frac{1}{n-2} H^{2} + 4 K$ when $n\ge 4$ and $|A|^{2} < \frac{3}{5}H^{2}+\frac{8}{3}K$ when $n=3$. This condition is related to a famous theorem of Simons, which states that the only minimal hypersurfaces satisfying $\vert A\vert^2<nK$ are the totally geodesic hyperspheres. It is related to but distinct from two-convexity. Notably, in contrast to two-convexity, it allows the mean curvature to change sign. We show that the pinching condition is preserved by mean curvature flow, and obtain a cylindrical estimate and corresponding pointwise derivative estimates for the curvature. As a result, we find that the flow becomes either uniformly convex or quantitatively cylindrical in regions of high curvature. This allows us to apply the surgery apparatus developed by Huisken and Sinestrari. We conclude that any smoothly, properly, isometrically immersed hypersurface $\mathcal{M}$ of $\mathbb S_K^{n+1}$ satisfying the pinching condition is diffeomorphic to $\mathbb S^n$ or the connected sum of a finite number of copies of $\mathbb S^1\times \mathbb S^{n-1}$. If $\mathcal M$ is embedded, then it bounds a 1-handlebody. The results are sharp when $n\ge 4$.
Automatic Operating Room Surgical Activity Recognition for Robot-Assisted Surgery
Aidean Sharghi, Helene Haugerud, Daniel Oh
et al.
Automatic recognition of surgical activities in the operating room (OR) is a key technology for creating next generation intelligent surgical devices and workflow monitoring/support systems. Such systems can potentially enhance efficiency in the OR, resulting in lower costs and improved care delivery to the patients. In this paper, we investigate automatic surgical activity recognition in robot-assisted operations. We collect the first large-scale dataset including 400 full-length multi-perspective videos from a variety of robotic surgery cases captured using Time-of-Flight cameras. We densely annotate the videos with the 10 most recognized and clinically relevant classes of activities. Furthermore, we investigate state-of-the-art computer vision action recognition techniques and adapt them for the OR environment and the dataset. First, we fine-tune the Inflated 3D ConvNet (I3D) for clip-level activity recognition on our dataset and use it to extract features from the videos. These features are then fed to a stack of 3 Temporal Gaussian Mixture layers, which extracts context from neighboring clips, and eventually pass through a Long Short Term Memory network to learn the order of activities in full-length videos. We extensively assess the model and reach a peak performance of 88% mean Average Precision.
Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection
Duygu Sarikaya, Jason J. Corso, Khurshid A. Guru
Video understanding of robot-assisted surgery (RAS) videos is an active research area. Modeling the gestures and skill level of surgeons presents an interesting problem. The insights drawn may be applied in effective skill acquisition, objective skill assessment, real-time feedback, and human-robot collaborative surgeries. We propose a solution to the open problem of tool detection and localization in RAS video understanding, using a strictly computer vision approach and the recent advances of deep learning. We propose an architecture using multimodal convolutional neural networks for fast detection and localization of tools in RAS videos. To our knowledge, this approach will be the first to incorporate deep neural networks for tool detection and localization in RAS videos. Our architecture applies a Region Proposal Network (RPN), and a multi-modal two stream convolutional network for object detection, to jointly predict objectness and localization on a fusion of image and temporal motion cues. Our results, with an Average Precision (AP) of 91% and a mean computation time of 0.1 seconds per test frame detection, indicate that our study is superior to conventionally used methods for medical imaging while also emphasizing the benefits of using RPN for precision and efficiency. We also introduce a new dataset, ATLAS Dione, for RAS video understanding. Our dataset provides video data of ten surgeons from Roswell Park Cancer Institute (RPCI) (Buffalo, NY) performing six different surgical tasks on the daVinci Surgical System (dVSS) with annotations of robotic tools per frame.
On $1$-bridge braids, satellite knots, the manifold $v2503$ and non-left-orderable surgeries and fillings
Zipei Nie
We define the property (D) for nontrivial knots. We show that the fundamental group of the manifold obtained by Dehn surgery on a knot $K$ with property (D) with slope $\frac{p}{q}\ge 2g(K)-1$ is not left orderable. By making full use of the fixed point method, we prove that (1) nontrivial knots which are closures of positive $1$-bridge braids have property (D); (2) L-space satellite knots, with positive $1$-bridge braid patterns, and companion with property (D), have property (D); (3) the fundamental group of the manifold obtained by Dehn filling on $v2503$ is not left orderable. Additionally, we prove that L-space twisted torus knots of form $T_{p,kp\pm 1}^{l,m}$ are closures of positive $1$-bridge braids.
Notes on fold maps obtained by surgery operations and algebraic information of their Reeb spaces
Naoki Kitazawa
The theory of Morse functions and their higher-dimensional versions, fold maps, on manifolds, together with its applications to the geometric theory of manifolds, is an important branch of geometry and mathematics. Related studies were started in the 1950s by differential topologists such as Thom and Whitney and have been pursued actively since. In this paper, we study fold maps obtained by surgery operations on fundamental fold maps, and especially their Reeb spaces, which are defined as the spaces of all connected components of preimages and which, in suitable situations, inherit fundamental and important algebraic invariants such as (co)homology groups. Reeb spaces are also fundamental and important tools in studying manifolds in general. The author has already studied the homology groups of these Reeb spaces and obtained several results; in this paper, we study their cohomology rings in several specific cases, as more precise information. These studies are motivated by the problem that constructing explicit fold maps is important in investigating (the worlds of explicit classes of) manifolds in geometric and constructive ways, yet difficult. It is not so difficult to construct these maps for the simplest manifolds, such as standard spheres, products of standard spheres, and manifolds represented as their connected sums. We observe various types of cohomology rings of Reeb spaces via systematic construction of fold maps.
Real-time image-based instrument classification for laparoscopic surgery
Sebastian Bodenstedt, Antonia Ohnemus, Darko Katic
et al.
During laparoscopic surgery, context-aware assistance systems aim to alleviate some of the difficulties the surgeon faces. To ensure that the right information is provided at the right time, the current phase of the intervention has to be known. Real-time localization and classification of the surgical tools currently in use are key components of both activity-based phase recognition and assistance generation. In this paper, we present an image-based approach that detects and classifies tools during laparoscopic interventions in real-time. First, potential instrument bounding boxes are detected using a pixel-wise random forest segmentation. Each of these bounding boxes is then classified using a cascade of random forests. For this, multiple features, such as histograms over hue and saturation, gradients and SURF features, are extracted from each detected bounding box. We evaluated our approach on five different videos from two different types of procedures. We distinguished between the four most common classes of instruments (LigaSure, atraumatic grasper, aspirator, clip applier) and background. Our method successfully located up to 86% of all instruments. On manually provided bounding boxes, we achieve an instrument type recognition rate of up to 58% and on automatically detected bounding boxes up to 49%. To our knowledge, this is the first approach that allows an image-based classification of surgical tools in a laparoscopic setting in real-time.
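One of the features named above, a histogram over hue and saturation, can be sketched as follows for the pixels inside a bounding box. The joint (two-dimensional) binning, the bin counts, and the function name are assumptions; the paper may use separate one-dimensional histograms or different bin sizes.

```python
import colorsys

def hue_saturation_histogram(pixels, hue_bins=8, sat_bins=4):
    """Normalized joint histogram over hue and saturation.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255
    """
    pixels = list(pixels)
    hist = [[0.0] * sat_bins for _ in range(hue_bins)]
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that h == 1.0 or s == 1.0 falls in the last bin.
        hi = min(int(h * hue_bins), hue_bins - 1)
        si = min(int(s * sat_bins), sat_bins - 1)
        hist[hi][si] += 1.0
    total = len(pixels)
    if total:
        hist = [[count / total for count in row] for row in hist]
    return hist

# Hypothetical bounding-box content: one pure red pixel and one mid-gray pixel.
features = hue_saturation_histogram([(255, 0, 0), (128, 128, 128)])
```

Flattening the returned matrix row by row gives a fixed-length vector that could be concatenated with other features before classification.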
Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery
Vivek Krishnan, Deva Ramanan
We consider the task of visual net surgery, in which a CNN can be reconfigured without extra data to recognize novel concepts that may be omitted from the training set. While most prior work makes use of linguistic cues for such "zero-shot" learning, we do so by using a pictorial language representation of the training set, implicitly learned by a CNN, to generalize to new classes. To this end, we introduce a set of visualization techniques that better reveal the activation patterns and relations between groups of CNN filters. We next demonstrate that knowledge of pictorial languages can be used to rewire certain CNN neurons into a part model, which we call a pictorial language classifier. We demonstrate the robustness of simple PLCs by applying them in a weakly supervised manner: labeling unlabeled concepts for visual classes present in the training data. Specifically, we show that a PLC built on top of a CNN trained for ImageNet classification can localize humans in Graz-02 and determine the pose of birds in PASCAL-VOC without extra labeled data or additional training. We then apply PLCs in an interactive zero-shot manner, demonstrating that pictorial languages are expressive enough to detect a set of visual classes in MS-COCO that never appear in the ImageNet training set.