Directed type theory, with a twist
Fernando Rafael Chu Rivera, Paige Randall North
In recent years, Homotopy Type Theory (HoTT) has had great success both as a foundation of mathematics and as an internal language to reason about $\infty$-groupoids (a.k.a. spaces). However, in many areas of mathematics and computer science, it is often the case that it is categories, not groupoids, which are the more important structures to consider. For this reason, multiple directed type theories have been proposed, i.e., theories whose semantics are based on categories. In this paper, we present a new such type theory, Twisted Type Theory (TTT). It features a novel ``twisting'' operation on types: given a type that depends both contravariantly and covariantly on some variables, its twist is a new type that depends only covariantly on the same variables. To provide the semantics of this operation, we introduce the notion of dependent 2-sided fibrations (D2SFibs), which generalize Street's notion of 2-sided fibrations. We develop the basic theory of D2SFibs and characterize them through a straightening-unstraightening theorem. With these results in hand, we introduce a new elimination rule for Hom-types. We argue that our syntax and semantics satisfy key features that allow reasoning in a HoTT-like style, which allows us to mimic the proof techniques of that setting. We end the paper by exemplifying this, using TTT to reason about categories and giving a syntactic proof of Yoneda's lemma.
Discovering the Higgsino at CTAO-North within the Decade
Shotaro Abe, Tomohiro Inada, Emmanuel Moulin
et al.
We demonstrate that higgsino dark matter (DM) could be discovered within the next few years using the Cherenkov Telescope Array Observatory's soon-to-be-operational northern site (CTAO-North). A 1.1 TeV thermal higgsino is a highly motivated yet untested model of DM. Despite its strong theoretical motivation in supersymmetry and beyond, the higgsino is notoriously difficult to detect; it lies deep within the neutrino fog of direct detection experiments and could pose a challenge even for a future muon collider. We show that, in contrast, higgsino detection could be possible within this decade with CTAO-North in La Palma, Spain. The Galactic Center is the region where the dominant DM annihilation signature emerges, but it only barely rises above the horizon at the CTAO-North site. However, we project that this challenge can be overcome with large-zenith-angle observations at the northern site, enabling the conclusive detection of a higgsino signal by 2030 for a range of DM density profiles in the inner Galaxy.
Properties of the Lower Segment of M31's North West Stream
Janet Preston, Denis Erkal, Michelle L. M. Collins
et al.
We present a kinematic and spectroscopic analysis of 40 red giant branch stars, in 9 fields, exquisitely delineating the lower segment of the North West Stream (NW-K2), which extends for $\sim$80 kpc from the centre of the Andromeda galaxy. We measure the stream's systemic velocity as -439.3$^{+4.1}_{-3.8}$ km/s with a velocity dispersion of 16.4$^{+5.6}_{-3.8}$ km/s, which is in keeping with its progenitor being a dwarf galaxy. We find no detectable velocity gradient along the stream. We determine $-$1.3$\pm$0.1 $\le$ <[Fe/H]$_{\rm spec}$> $\le$ $-$1.2$\pm$0.8 but find no metallicity gradient along the stream. We are able to plausibly associate NW-K2 with the globular clusters PAndAS-04, PAndAS-09, PAndAS-10, PAndAS-11 and PAndAS-12, but not with PAndAS-13 or PAndAS-15, which we find to be superimposed on the stream but not kinematically associated with it.
Syntactic Language Change in English and German: Metrics, Parsers, and Convergences
Yanran Chen, Wei Zhao, Anne Breitbarth
et al.
Many studies have shown that human languages tend to optimize for lower complexity and increased communication efficiency. Syntactic dependency distance, which measures the linear distance between dependent words, is often considered a key indicator of language processing difficulty and working memory load. The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives. Our analysis of syntactic language change goes beyond linear dependency distance and explores 15 metrics relevant to dependency distance minimization (DDM) and/or based on tree graph properties, such as the tree height and degree variance. Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results of syntactic language change are sensitive to the parsers involved, which is a caution against using a single parser for evaluating syntactic language change as done in previous work. We also show that syntactic language change over the time period investigated is largely similar between English and German for the different metrics explored: only 4% of the cases we examine yield opposite conclusions regarding upward and downward trends of syntactic metrics across German and English. We also show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions. To the best of our knowledge, ours is the most comprehensive analysis of syntactic language change using modern NLP technology in recent corpora of English and German.
Dataset of Quotation Attribution in German News Articles
Fynn Petersen-Frey, Chris Biemann
Extracting who says what to whom is a crucial part in analyzing human communication in today's abundance of data such as online news articles. Yet, the lack of annotated data for this task in German news articles severely limits the quality and usability of possible systems. To remedy this, we present a new, freely available, creative-commons-licensed dataset for quotation attribution in German news articles based on WIKINEWS. The dataset provides curated, high-quality annotations across 1000 documents (250,000 tokens) in a fine-grained annotation schema enabling various downstream uses for the dataset. The annotations not only specify who said what but also how, in which context, to whom and define the type of quotation. We specify our annotation schema, describe the creation of the dataset and provide a quantitative analysis. Further, we describe suitable evaluation metrics, apply two existing systems for quotation attribution, discuss their results to evaluate the utility of our dataset and outline use cases of our dataset in downstream tasks.
Analysis of the Impact of North Indian Ocean Cyclonic Disturbance on Human and Economic Losses
Monu Yadav, Laxminarayan Das
This paper explores the features of cyclonic disturbances (CDs) in the North Indian Ocean (NIO) by utilizing data from 1990 to 2022. It investigates the occurrence rate of these disturbances and their effects on human and economic losses throughout the mentioned period. The analysis demonstrates a rising trend in the occurrence of CDs in the NIO. While there has been a slight decline in CD-related fatalities since 2015, there has been a considerable increase in economic losses. These findings can be attributed to enhanced government initiatives in disaster prevention and mitigation in recent years, as well as rapid economic growth in regions prone to CDs. The study sheds light on the significance of addressing the impact of CDs on both human lives and economic stability in the NIO region.
Classification of Human- and AI-Generated Texts for English, French, German, and Spanish
Kristina Schaaff, Tim Schlippe, Lorenz Mindner
In this paper we analyze features to classify human- and AI-generated text for English, French, German and Spanish and compare them across languages. We investigate two scenarios: (1) The detection of text generated by AI from scratch, and (2) the detection of text rephrased by AI. For training and testing the classifiers in this multilingual setting, we created a new text corpus covering 10 topics for each language. For the detection of AI-generated text, the combination of all proposed features performs best, indicating that our features are portable to other related languages: The F1-scores are close with 99% for Spanish, 98% for English, 97% for German and 95% for French. For the detection of AI-rephrased text, the systems with all features outperform systems with other features in many cases, but using only document features performs best for German (72%) and Spanish (86%) and only text vector features leads to best results for English (78%).
The role of teleconnection patterns in the variability and trends of growing season indices across Europe
P. Craig, R. Allan
Teleconnection patterns affect the weather and climate on both interannual and decadal timescales, which in turn affect various socio‐economic sectors such as agriculture. We use three climate indices based on E‐OBS data from the INDECIS dataset (growing season onset [ogs10], growing season rainfall [gsr] and growing season temperature [ta_o]) to assess the interannual variability and trends over 1950–2017 associated with four teleconnection patterns (North Atlantic Oscillation [NAO], East Atlantic pattern [EA], Scandinavian pattern [SCA] and East Atlantic/West Russia pattern [EAWR]) using linear regression to extract the signal of each teleconnection pattern and their contribution to interannual variability. Trends towards an earlier growing season onset are found across most of Europe in low‐lying regions. The NAO dominates interannual variability in northwest Europe where an NAO index of 1 is associated with earlier ogs10 of about 10 days and the EA dominates the continent with a trend towards the positive EA phase driving an earlier growing season onset of 1.1–1.7 days·decade⁻¹ in five regions (Great Britain and Ireland, France, Italy, Poland and North Germany, Hungary, Balkans). The EA and SCA gsr signals have north/south splits of orientation: positive EA is linked to increased gsr in northern regions and reduced gsr in southern Europe, and vice versa for SCA. Correlations between gsr interannual variability and the teleconnection contributions are strongest in the Mediterranean regions and south Scandinavia with maxima of 0.41 and 0.46, respectively. Decreasing ta_o trends in Romania are explained by poor data coverage causing problems with the E‐OBS gridding algorithm when new stations are incorporated from 1961. The net effect is that Romanian ta_o is about 1.5°C cooler than expected compared to trends from surrounding countries. Improved spatial and temporal data coverage will benefit the E‐OBS dataset and prevent such erroneous trends.
Seasonal prediction of Euro-Atlantic teleconnections from multiple systems
L. Lledó, I. Cionni, V. Torralba
et al.
Seasonal mean atmospheric circulation in Europe can vary substantially from year to year. This diversity of conditions impacts many socioeconomic sectors. Teleconnection indices can be used to characterize this seasonal variability, while seasonal forecasts of those indices offer the opportunity to take adaptation actions a few months in advance. For instance, the North Atlantic Oscillation has proven useful as a proxy for atmospheric effects in several sectors, and dynamical forecasts of its evolution in winter have been shown to be skillful. However, the NAO only characterizes part of these seasonal circulation anomalies, and other teleconnections such as the East Atlantic, the East Atlantic Western Russia or the Scandinavian Pattern also play an important role in shaping atmospheric conditions in the continent throughout the year. This paper explores the quality of seasonal forecasts of these four teleconnection indices for the four seasons of the year, derived from five different seasonal prediction systems. We find that several teleconnection indices can be skillfully predicted in advance in winter, spring and summer. We also show that there is no single prediction system that performs better than the others for all seasons and teleconnections, and that a multi-system approach produces results that are as good as the best of the systems.
43 citations
en
Physics, Geography
What Leads to Persisting Surface Air Temperature Anomalies from Winter to Following Spring over Mid- to High-Latitude Eurasia?
R. Wu, Shangfeng Chen
Surface air temperature (SAT) anomalies tend to persist from winter to the following spring over the mid- to high latitudes of Eurasia. The present study compares two distinct cases of Eurasian SAT anomaly evolution and investigates the reasons for the persistence of continental-scale mid- to high-latitude Eurasian SAT anomalies from winter to following spring (termed persistent cases). The persisting SAT anomalies are closely associated with the sustenance of large-scale atmospheric circulation anomaly pattern over the North Atlantic and Eurasia, featuring a combination of the North Atlantic Oscillation/Arctic Oscillation (NAO/AO) and the Scandinavian pattern, from winter to spring. The combined circulation anomalies result in SAT warming over most of mid- to high-latitude Eurasia via anomalous wind-induced temperature advection. The sustenance of atmospheric circulation anomaly pattern is related to the maintenance of the North Atlantic triple sea surface temperature (SST) anomaly pattern due to air–sea interaction processes. The Barents Sea ice anomalies, which form in winter and increase in spring, also partly contribute to the sustenance of atmospheric circulation anomalies via modulating thermal state of the lower troposphere. In the cases that notable SAT warming (cooling) in winter is replaced by pronounced SAT cooling (warming) in the subsequent spring—termed reverse cases—the North Atlantic SST anomalies become small and the Greenland Sea ice change is a response to atmospheric change in spring. Without the support of lower boundary forcing, the atmospheric circulation anomaly pattern experiences a reverse in the spatial distribution from winter to spring likely due to internal atmospheric processes.
The Univalence Principle
Benedikt Ahrens, Paige Randall North, Michael Shulman
et al.
The Univalence Principle is the statement that equivalent mathematical structures are indistinguishable. We prove a general version of this principle that applies to all set-based, categorical, and higher-categorical structures defined in a non-algebraic and space-based style, as well as models of higher-order theories such as topological spaces. In particular, we formulate a general definition of indiscernibility for objects of any such structure, and a corresponding univalence condition that generalizes Rezk's completeness condition for Segal spaces and ensures that all equivalences of structures are levelwise equivalences. Our work builds on Makkai's First-Order Logic with Dependent Sorts, but is expressed in Voevodsky's Univalent Foundations (UF), extending previous work on the Structure Identity Principle and univalent categories in UF. This enables indistinguishability to be expressed simply as identification, and yields a formal theory that is interpretable in classical homotopy theory, but also in other higher topos models. It follows that Univalent Foundations is a fully equivalence-invariant foundation for higher-categorical mathematics, as intended by Voevodsky.
Scribosermo: Fast Speech-to-Text models for German and other Languages
Daniel Bermuth, Alexander Poeppel, Wolfgang Reif
Recent Speech-to-Text models often require a large amount of hardware resources and are mostly trained in English. This paper presents Speech-to-Text models for German, as well as for Spanish and French with special features: (a) They are small and run in real-time on microcontrollers like a RaspberryPi. (b) Using a pretrained English model, they can be trained on consumer-grade hardware with a relatively small dataset. (c) The models are competitive with other solutions and outperform them in German. In this respect, the models combine advantages of other approaches, which only include a subset of the presented features. Furthermore, the paper provides a new library for handling datasets, which is focused on easy extension with additional datasets and shows an optimized way for transfer-learning new languages using a pretrained model from another language with a similar alphabet.
MexPub: Deep Transfer Learning for Metadata Extraction from German Publications
Zeyd Boukhers, Nada Beili, Timo Hartmann
et al.
Extracting metadata from scientific papers can be considered a solved problem in NLP due to the high accuracy of state-of-the-art methods. However, this does not apply to German scientific publications, which have a variety of styles and layouts. In contrast to most English scientific publications, which follow standard and simple layouts, the order, content, position and size of metadata in German publications vary greatly among publications. This variety makes traditional NLP methods fail to accurately extract metadata from these publications. In this paper, we present a method that extracts metadata from PDF documents with different layouts and styles by viewing the document as an image. We used a Mask R-CNN trained on the COCO dataset and fine-tuned on the PubLayNet dataset, which consists of ~200K PDF snapshots with five basic classes (e.g. text, figure, etc). We then fine-tuned the model further on our proposed synthetic dataset consisting of ~30K article snapshots to extract nine patterns (e.g. author, title, etc). Our synthetic dataset is generated using content in both German and English and a finite set of challenging templates obtained from German publications. Our method achieved an average accuracy of around $90\%$, which validates its capability to accurately extract metadata from a variety of PDF documents with challenging templates.
Effects of the tropospheric large‐scale circulation on European winter temperatures during the period of amplified Arctic warming
T. Vihma, R. Graversen, Linling Chen
et al.
We investigate factors influencing European winter (DJFM) air temperatures for the period 1979–2015 with the focus on changes during the recent period of rapid Arctic warming (1998–2015). We employ meteorological reanalyses analysed with a combination of correlation analysis, two pattern clustering techniques, and back‐trajectory airmass identification. In all five selected European regions, severe cold winter events lasting at least 4 days are significantly correlated with warm Arctic episodes. Relationships during opposite conditions of warm Europe/cold Arctic are also significant. Correlations have become consistently stronger since 1998. Large‐scale pattern analysis reveals that cold spells are associated with the negative phase of the North Atlantic Oscillation (NAO‐) and the positive phase of the Scandinavian (SCA+) pattern, which in turn are correlated with the divergence of dry‐static energy transport. Warm European extremes are associated with opposite phases of these patterns and the convergence of latent heat transport. Airmass trajectory analysis is consistent with these findings, as airmasses associated with extreme cold events typically originate over continents, while warm events tend to occur with prevailing maritime airmasses. Despite Arctic‐wide warming, significant cooling has occurred in northeastern Europe owing to a decrease in adiabatic subsidence heating in airmasses arriving from the southeast, along with increased occurrence of circulation patterns favouring low temperature advection. These dynamic effects dominated over the increased mean temperature of most circulation patterns. Lagged correlation analysis reveals that SCA‐ and NAO+ are typically preceded by cold Arctic anomalies during the previous 2–3 months, which may aid seasonal forecasting.
62 citations
en
Medicine, Environmental Science
Revisiting the identification of wintertime atmospheric circulation regimes in the Euro‐Atlantic sector
Swinda K. J. Falkena, J. Wiljes, A. Weisheimer
et al.
Atmospheric circulation is often clustered in so‐called circulation regimes, which are persistent and recurrent patterns. For the Euro‐Atlantic sector in winter, most studies identify four regimes: the Atlantic Ridge, Scandinavian Blocking and the two phases of the North Atlantic Oscillation. These results are obtained by applying k‐means clustering to the first several empirical orthogonal functions (EOFs) of geopotential height data. Studying the observed circulation in reanalysis data, it is found that when the full field data are used for the k‐means cluster analysis instead of the EOFs, the optimal number of clusters is no longer four but six. The two extra regimes that are found are the opposites of the Atlantic Ridge and Scandinavian Blocking, meaning they have a low‐pressure area roughly where the original regimes have a high‐pressure area. This introduces an appealing symmetry in the clustering result. Incorporating a weak persistence constraint in the clustering procedure is found to lead to a longer duration of regimes, extending beyond the synoptic time‐scale, without changing their occurrence rates. This is in contrast to the commonly used application of a time‐filter to the data before the clustering is executed, which, while increasing the persistence, changes the occurrence rates of the regimes. We conclude that applying a persistence constraint within the clustering procedure is a better way of stabilizing the clustering results than low‐pass filtering the data.
50 citations
en
Environmental Science, Geology
Network-Based Analysis of Public Transportation Systems in North American Cities
Abbas Masoumzadeh, Tilemachos Pechlivanoglou
A comprehensive data analysis system is implemented for the extraction of information and comparison of North American public transport systems. The system is based on network representations of the transport systems and uses a range of metrics and algorithms, from established graph-theoretic properties to complex domain-specific measurements. Due to the nature of big data systems and the requirement of scalability, many heuristic optimizations and approximations have been considered in the system. Integration with other sources of data, especially population density maps, is also performed in the system. Formal evaluations are done on subcomponents of the system to make sure the approximations have reasonable precision. Results from a comparison of four cities, San Francisco, Boston, Toronto and Los Angeles, confirm that the big data approach to comparison of public transit systems can successfully reveal the underlying similarities and differences.
A Higher Structure Identity Principle
Benedikt Ahrens, Paige Randall North, Michael Shulman
et al.
The ordinary Structure Identity Principle states that any property of set-level structures (e.g., posets, groups, rings, fields) definable in Univalent Foundations is invariant under isomorphism: more specifically, identifications of structures coincide with isomorphisms. We prove a version of this principle for a wide range of higher-categorical structures, adapting FOLDS-signatures to specify a general class of structures, and using two-level type theory to treat all categorical dimensions uniformly. As in the previously known case of 1-categories (which is an instance of our theory), the structures themselves must satisfy a local univalence principle, stating that identifications coincide with "isomorphisms" between elements of the structure. Our main technical achievement is a definition of such isomorphisms, which we call "indiscernibilities", using only the dependency structure rather than any notion of composition.
Linking crop yield anomalies to large-scale atmospheric circulation in Europe
A. Ceglar, M. Turco, A. Toreti
et al.
Understanding the effects of climate variability and extremes on crop growth and development represents a necessary step to assess the resilience of agricultural systems to changing climate conditions. This study investigates the links between the large-scale atmospheric circulation and crop yields in Europe, providing the basis to develop seasonal crop yield forecasting and thus enabling a more effective and dynamic adaptation to climate variability and change. Four dominant modes of large-scale atmospheric variability have been used: North Atlantic Oscillation, Eastern Atlantic, Scandinavian and Eastern Atlantic-Western Russia patterns. Large-scale atmospheric circulation explains on average 43% of inter-annual winter wheat yield variability, ranging between 20% and 70% across countries. As for grain maize, the average explained variability is 38%, ranging between 20% and 58%. Spatially, the skill of the developed statistical models strongly depends on the large-scale atmospheric variability impact on weather at the regional level, especially during the most sensitive growth stages of flowering and grain filling. Our results also suggest that preceding atmospheric conditions might provide an important source of predictability especially for maize yields in south-eastern Europe. Since the seasonal predictability of large-scale atmospheric patterns is generally higher than the one of surface weather variables (e.g. precipitation) in Europe, seasonal crop yield prediction could benefit from the integration of derived statistical models exploiting the dynamical seasonal forecast of large-scale atmospheric circulation.
61 citations
en
Environmental Science, Medicine
Nordic Slow Adventure: Explorations in Time and Nature
P. Varley, Tristan Semple
Teleconnection–extreme precipitation relationships over the Mediterranean region
S. Krichak, J. Breitgand, S. Gualdi
et al.
139 citations
en
Environmental Science