Results for "stat.ML"
Showing 20 of ~159330 results · from CrossRef, Semantic Scholar, DOAJ
Joaquin Carbonara, Ernest Fokoue
The rapid advancements in artificial intelligence (AI), machine learning (ML), neural networks (NN) and language models (LM) research, coupled with the widespread availability of large language models as a service (LLMaaS), have begun to influence most domains, particularly the field of statistics, in unprecedented ways that are difficult to forecast. The awarding of two Nobel Prizes in 2024 for computational work in AI—to Hopfield and Hinton for their foundational discoveries and inventions in machine learning with artificial neural networks, and to Baker, Hassabis and Jumper for developing an AI model to solve the longstanding problem of predicting proteins' complex structures—is a testament to the significant impact of AI in these fields. Two key contributors to the current revolution are statistics and data science. The merger of data science with AI research led to the creation of tools like LLMs, profound advancements in AI as a tool, and speculation that humanity is close to creating AGI. These transformative technologies have opened up a vast array of opportunities, but they have also presented new challenges that necessitate careful consideration. Here, we discuss what is needed to successfully navigate these stormy times in the current sea of information surrounding us.
Zach Furman, Edmund Lau
The local learning coefficient (LLC) is a principled way of quantifying model complexity, originally derived in the context of Bayesian statistics using singular learning theory (SLT). Several methods are known for numerically estimating the local learning coefficient, but so far these methods have not been extended to the scale of modern deep learning architectures or data sets. Using a method developed in arXiv:2308.12108 [stat.ML], we empirically show how the LLC may be measured accurately and self-consistently for deep linear networks (DLNs) up to 100M parameters. We also show that the estimated LLC has the rescaling invariance that holds for the theoretical quantity.
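As a minimal illustration of the rescaling invariance mentioned above (not the authors' estimator), a deep linear network computes the same function when one layer is scaled by c and the next by 1/c, so any well-behaved LLC estimate should agree across the two parameterizations. The sketch below, under these assumptions, only checks the functional invariance in NumPy; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy deep linear network: f(x) = W2 @ W1 @ x
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))

x = rng.normal(size=(4, 16))  # a small batch of inputs (columns)

# Rescale one layer by c and the next by 1/c: the product of the weight
# matrices, and hence the network function, is unchanged.
c = 2.5
W1_rescaled = c * W1
W2_rescaled = W2 / c

out_original = W2 @ W1 @ x
out_rescaled = W2_rescaled @ W1_rescaled @ x

# Parameters differ, but the function (and thus any parameterization-invariant
# quantity such as the theoretical LLC) must agree.
assert np.allclose(out_original, out_rescaled)
print("max abs difference:", np.abs(out_original - out_rescaled).max())
```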
Adrien Corenflos, A. Finke
State-of-the-art methods for Bayesian inference in state-space models are (a) conditional sequential Monte Carlo (CSMC) algorithms; (b) sophisticated 'classical' MCMC algorithms like MALA or mGRAD from Titsias and Papaspiliopoulos (2018, arXiv:1610.09641v3 [stat.ML]). The former propose $N$ particles at each time step to exploit the model's 'decorrelation-over-time' property and thus scale favourably with the time horizon, $T$, but break down if the dimension of the latent states, $D$, is large. The latter leverage gradient-/prior-informed local proposals to scale favourably with $D$ but exhibit sub-optimal scalability with $T$ due to a lack of model-structure exploitation. We introduce methods which combine the strengths of both approaches. The first, Particle-MALA, spreads $N$ particles locally around the current state using gradient information, thus extending MALA to $T>1$ time steps and $N>1$ proposals. The second, Particle-mGRAD, additionally incorporates (conditionally) Gaussian prior dynamics into the proposal, thus extending the mGRAD algorithm to $T>1$ time steps and $N>1$ proposals. We prove that Particle-mGRAD interpolates between CSMC and Particle-MALA, resolving the 'tuning problem' of choosing between CSMC (superior for highly informative prior dynamics) and Particle-MALA (superior for weakly informative prior dynamics). We similarly extend other 'classical' MCMC approaches like auxiliary MALA, aGRAD, and preconditioned Crank-Nicolson-Langevin (PCNL) to $T>1$ time steps and $N>1$ proposals. In experiments, for both highly and weakly informative prior dynamics, our methods substantially improve upon both CSMC and sophisticated 'classical' MCMC approaches.
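For readers unfamiliar with MALA, the classical building block being extended above, a single Metropolis-adjusted Langevin step for a generic log-density looks roughly as follows. This is a textbook sketch in NumPy, not the Particle-MALA or Particle-mGRAD algorithm; the target (a standard Gaussian) and the step size are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi(x):
    # Placeholder target: standard Gaussian log-density (up to a constant).
    return -0.5 * np.sum(x ** 2)

def grad_log_pi(x):
    return -x

def mala_step(x, eps):
    """One MALA update with step size eps: gradient-informed local proposal
    followed by a Metropolis-Hastings accept/reject correction."""
    mean_fwd = x + 0.5 * eps ** 2 * grad_log_pi(x)
    prop = mean_fwd + eps * rng.normal(size=x.shape)

    mean_bwd = prop + 0.5 * eps ** 2 * grad_log_pi(prop)
    log_q_fwd = -0.5 * np.sum((prop - mean_fwd) ** 2) / eps ** 2
    log_q_bwd = -0.5 * np.sum((x - mean_bwd) ** 2) / eps ** 2

    log_accept = log_pi(prop) - log_pi(x) + log_q_bwd - log_q_fwd
    if np.log(rng.uniform()) < log_accept:
        return prop
    return x

x = np.zeros(10)
samples = []
for _ in range(5000):
    x = mala_step(x, eps=0.5)
    samples.append(x.copy())
print("empirical std of first coordinate:", np.std([s[0] for s in samples[1000:]]))
```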
Stefan Bedbur, Anton Imm, Udo Kamps
Zhangjie Chen, Min Wang, Ya Wang
We recently developed a synchronized low-energy electronically chopped passive infrared (SLEEPIR) sensor node to detect stationary and moving occupants. It uses a liquid crystal shutter to modulate the infrared signal received by a traditional passive infrared (PIR) sensor and thus enables its capability to detect stationary occupants. However, the detection accuracy of the SLEEPIR sensor can be easily influenced by infrared environmental disturbances. To address this problem, in this article, we propose two long short-term memory (LSTM) models to filter infrared environmental disturbance, named baseline LSTM (Base.LSTM) and statistical LSTM (Stat.LSTM). They use the sensor node raw output and statistical features as their respective inputs. For comparison, we propose two other models: the occupancy state switch detection (SSD) algorithm, which directly uses a predetermined threshold voltage value to classify the occupancy state and its status change; and the multilayer perceptron (MLP) classifier with statistical feature inputs (Stat.ML). To validate their detection performance, we designed two testing scenarios in different environment settings: 1) daily occupancy tests and 2) EDGE case tests. The first scenario intends to restore complex real-life environmental situations as much as possible in the lab and apartment rooms. The second scenario aims to verify their detection accuracy under different environmental temperatures. This scenario also considers different occupancy postures, such as lying down. Experimental results show that the detection accuracy of both LSTM models (>95%) in both testing scenarios outperforms that of the SSD (around 82%–94%) and the Stat.ML (around 80%–90%).
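To make the two simple baselines concrete, a rough sketch of what statistical feature extraction (the kind of input used by Stat.LSTM/Stat.ML) and a threshold-style state-switch check might look like is given below. The window length, feature set, and threshold value are illustrative guesses, not the paper's settings.

```python
import numpy as np

def window_features(voltage_window):
    """Statistical features over one window of raw sensor output (illustrative set)."""
    v = np.asarray(voltage_window, dtype=float)
    return np.array([v.mean(), v.std(), v.max() - v.min(), np.median(v)])

def ssd_occupied(voltage_window, threshold=0.03):
    """Threshold-style check: flag occupancy if the modulation amplitude in the
    window exceeds a predetermined voltage threshold (placeholder value)."""
    v = np.asarray(voltage_window, dtype=float)
    return (v.max() - v.min()) > threshold

# Example: a synthetic 30-sample window with a small modulated signal plus noise.
rng = np.random.default_rng(2)
window = 0.02 * np.sin(np.linspace(0, 6 * np.pi, 30)) + 0.005 * rng.normal(size=30)
print("features:", window_features(window))
print("occupied (SSD-style):", ssd_occupied(window))
```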
Quanhan Xi, Benjamin Bloem-Reddy
Most modern probabilistic generative models, such as the variational autoencoder (VAE), have certain indeterminacies that are unresolvable even with an infinite amount of data. Different tasks tolerate different indeterminacies; however, recent applications have indicated the need for strongly identifiable models, in which an observation corresponds to a unique latent code. Progress has been made towards reducing model indeterminacies while maintaining flexibility, and recent work excludes many, but not all, indeterminacies. In this work, we motivate model-identifiability in terms of task-identifiability, then construct a theoretical framework for analyzing the indeterminacies of latent variable models, which enables their precise characterization in terms of the generator function and prior distribution spaces. We reveal that strong identifiability is possible even with highly flexible nonlinear generators, and give two such examples. One is a straightforward modification of iVAE (arXiv:1907.04809 [stat.ML]); the other uses triangular monotonic maps, leading to novel connections between optimal transport and identifiability.
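As background for the second construction, a triangular monotonic (Knothe-Rosenblatt-style) map in two dimensions sends z1 to a strictly increasing function of z1, and z2 to a function that is strictly increasing in z2 given z1. The sketch below is a generic example of such a map, not the generator proposed in the paper.

```python
import numpy as np

def triangular_monotonic_map(z):
    """A generic 2-D triangular map, strictly increasing coordinate-wise."""
    z1, z2 = z[..., 0], z[..., 1]
    x1 = np.tanh(z1) + 0.5 * z1                    # depends only on z1, strictly increasing
    x2 = np.exp(0.3 * z1) * z2 + z2 ** 3 / 10.0    # strictly increasing in z2 for fixed z1
    return np.stack([x1, x2], axis=-1)

z = np.random.default_rng(3).normal(size=(5, 2))
print(triangular_monotonic_map(z))
```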
Samuel Albanie, Erika Lu, João F. Henriques
In the quiet backwaters of cs.CV, cs.LG and stat.ML, a cornucopia of new learning systems is emerging from a primordial soup of mathematics: learning systems with no need for external supervision. To date, little thought has been given to how these self-supervised learners have sprung into being or the principles that govern their continuing diversification. After a period of deliberate study and dispassionate judgement during which each author set their Zoom virtual background to a separate Galapagos island, we now entertain no doubt that each of these learning machines is a lineal descendant of some older and generally extinct species. We make five contributions: (1) We gather and catalogue row-major arrays of machine learning specimens, each exhibiting heritable discriminative features; (2) We document a mutation mechanism by which almost imperceptible changes are introduced to the genotype of new systems, but their phenotype (birdsong in the form of tweets and vestigial plumage such as press releases) communicates dramatic changes; (3) We propose a unifying theory of self-supervised machine evolution and compare to other unifying theories on standard unifying theory benchmarks, where we establish a new (and unifying) state of the art; (4) We discuss the importance of digital biodiversity, in light of the endearingly optimistic Paris Agreement.
M. Surace, H. Angell, Christopher Innocenti et al.
Predictive biomarkers for response to IO therapies remain insufficient. Although multiplex immunofluorescence has the potential to provide superior biomarkers, the information garnered from these studies is frequently underleveraged. Due to the large number of markers that must be analyzed (6–40+), and the complexity of the spatial information, the number of hypotheses is large and must be tested systematically and automatically. GraphITE (Graphs-based Investigation of Tissues with Embeddings) is a novel method of converting multiplex IF image analysis results into embeddings, numerical vectors which represent the phenotype of each cell as well as its immediate neighborhood. This allows for the clustering of embeddings based on similarity as well as the discovery of novel predictive biomarkers based on both the spatial and multimarker data in multiplex IF images. Here we demonstrate initial observations from deployment of GraphITE on 564 commercially sourced NSCLC and HNSCC resections stained with a multiplex IF panel containing CD8, PDL1, PD1, CD68, Ki67, and CK. 4 μm FFPE tumor sections were stained with CD8, PDL1, PD1, CD68, Ki67, and CK at Akoya Biosciences using OPAL TSA-linked fluorophores and imaged on a Vectra Polaris. Images were analyzed by Computational Biology (AstraZeneca). Graphs were built by mapping each cell in the mIF image as a node, using the X, Y coordinates, and connecting nodes with edges according to distance. 64-dimensional embeddings were generated using Deep Graph InfoMax (DGI) [1]. Embeddings are downprojected to 2 dimensions using UMAP [2]. Details are available in the preprint of the GraphITE methods manuscript [3]. A single downprojection was developed using embeddings from 158 HNSCC and 406 NSCLC cases. 60–80 distinct clusters were observed, some of which contained embeddings from both indications and others which were exclusive to one indication. Exclusive clusters describe tissue neighborhoods observed only in one indication. Drivers of cluster exclusivity included increased cell density in HNSCC as compared to NSCLC, both in PD-L1- tumor centers with few infiltrating lymphocytes and in PD-L1- macrophage-dominated neighborhoods. HNSCC and NSCLC embeddings were more colocalized in PD-L1+ tumor centers and in tumor stroma with high CD8+ or CD68+ immune cell content and high PD-L1+ expression. This study demonstrates the utility and potential of the GraphITE platform to discriminate between and describe both unique and common neighborhood-level features of the tumor microenvironment. Deploying GraphITE across multiple indications effectively leverages spatial heterogeneity and multimarker information from multiplex IF panels. The study was approved by AstraZeneca.
[1] Veličković P, Fedus W, Hamilton WL, Liò P, Bengio Y, Devon Hjelm R. Deep Graph Infomax. 2018. arXiv:1809.10341 [stat.ML].
[2] McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. 2020. arXiv:1802.03426 [stat.ML].
[3] Innocenti C, Zhang Z, Selvaraj B, Gaffney I, Frangos M, Cohen-Setton J, Dillon LAL, Surace MJ, Pedrinaci C, Hipp J, Baykaner K. An unsupervised graph embeddings approach to multiplex immunofluorescence image exploration. bioRxiv 2021.06.09.447654; doi: https://doi.org/10.1101/2021.06.09.447654.
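A rough sketch of the first step described above, building a graph from per-cell X, Y coordinates by connecting cells within a distance cutoff, might look like the following with SciPy. The 30 μm radius and the synthetic coordinates are placeholders, and the DGI/UMAP stages are not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(4)

# Placeholder cell centroids (in microns); in practice these come from mIF image analysis.
coords = rng.uniform(0, 500, size=(200, 2))

# Connect every pair of cells closer than a chosen distance cutoff (placeholder value).
radius_um = 30.0
tree = cKDTree(coords)
edges = sorted(tree.query_pairs(r=radius_um))

print(f"{len(coords)} nodes, {len(edges)} edges")
print("first few edges:", edges[:5])
```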
Antonio Silveti-Falls, Cesare Molinari, Jalal Fadili
In this paper we propose and analyze inexact and stochastic versions of the CGALP algorithm developed in [25], which we denote ICGALP, that allow for errors in the computation of several important quantities. In particular this allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion that facilitates the practical application of the algorithm to computationally intensive settings, e.g., in high (or possibly infinite) dimensional Hilbert spaces commonly found in machine learning problems. The algorithm is able to solve composite minimization problems involving the sum of three convex proper lower-semicontinuous functions subject to an affine constraint of the form Ax = b for some bounded linear operator A. Only one of the functions in the objective is assumed to be differentiable, the other two are assumed to have an accessible proximal operator and a linear minimization oracle. As main results, we show convergence of the Lagrangian values (so-called convergence in the Bregman sense) and asymptotic feasibility of the affine constraint as well as strong convergence of the sequence of dual variables to a solution of the dual problem, in an almost sure sense. Almost sure convergence rates are given for the Lagrangian values and the feasibility gap for the ergodic primal variables. Rates in expectation are given for the Lagrangian values and the feasibility gap subsequentially in the pointwise sense. Numerical experiments verifying the predicted rates of convergence are shown as well.
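For reference, the composite problem class described in the abstract can be written out explicitly; the sketch below uses generic symbols rather than the paper's notation.

```latex
% Problem class addressed by (I)CGALP (generic notation):
% f is differentiable; g has an accessible proximal operator;
% h admits a linear minimization oracle; A is a bounded linear operator.
\begin{equation*}
  \min_{x \in \mathcal{H}} \; f(x) + g(x) + h(x)
  \quad \text{subject to} \quad A x = b .
\end{equation*}
```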
James W. Dunham, Jennifer Melot, D. Murdick
We describe a strategy for identifying the universe of research publications relevant to the application and development of artificial intelligence. The approach leverages the arXiv corpus of scientific preprints, in which authors choose subject tags for their papers from a set defined by editors. We compose a functional definition of AI relevance by learning these subjects from paper metadata, and then inferring the arXiv-subject labels of papers in larger corpora: Clarivate Web of Science, Digital Science Dimensions, and Microsoft Academic Graph. This yields predictive classification $F_1$ scores between .75 and .86 for Natural Language Processing (cs.CL), Computer Vision (cs.CV), and Robotics (cs.RO). For a single model that learns these and four other AI-relevant subjects (cs.AI, cs.LG, stat.ML, and cs.MA), we see precision of .83 and recall of .85. We evaluate the out-of-domain performance of our classifiers against other sources of topic information and predictions from alternative methods. We find that a supervised solution can generalize to identify publications that belong to the high-level fields of study represented on arXiv. This offers a method for identifying AI-relevant publications that updates at the pace of research output, without reliance on subject-matter experts for query development or labeling.
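A minimal sketch of the general approach, learning arXiv subject labels from paper titles/abstracts and evaluating F1, could look like the following with scikit-learn. The toy texts, labels, features, and model below are placeholders, not the authors' pipeline or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Placeholder metadata: (title + abstract text, has the cs.CV tag?). Real training
# data would come from the arXiv corpus described in the abstract.
texts = [
    "Deep convolutional networks for image segmentation",
    "Sparse estimation of cointegration vectors in macroeconomic panels",
    "Self-supervised visual representation learning with contrastive objectives",
    "A Bayesian hierarchical model for clinical trial data",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

preds = clf.predict(texts)
print("in-sample F1 (toy data only):", f1_score(labels, preds))
```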
Anil K. Bera, Osman Doğan, Süleyman Taşpınar
Felipe Torres
Current human knowledge is written. Documenting is the most common way to preserve memories and to store fantastic stories. Thus, to distinguish reality from fiction, scientific writing cites previous works in addition to reporting results from experimental setups. Books and scientific papers are only a small part of the existing literature, but they are considered more trustworthy as information sources. It is useful to find more relations, and to know where to focus the search on a topic, using information about the authors and the keywords in titles and abstracts. This is possible using relational databases or knowledge graphs, a semantic approach; but with the tensor memory hypothesis, which adds a temporal dimension, it is possible to process the information with an episodic memory approach. Although knowledge graphs are widely used in question answering and chatbots, they need a prior relational schema, generated automatically or by hand, and stored in an easy-to-query file format. I use JATS, a standard format that allows integrating scientific papers into semantic searches but is not adopted by all scientific publishers, to extract the markup tags from PDF files (current-year journal articles on one particular topic), and then construct the tensor memory with their references to extract relations and predictions with statistical relational learning techniques.
Introduction. Memory is defined as the ability to record information and later recall it. Writing is a human invention that facilitates this capacity, in particular for declarative memories, which are facts or events that can be expressed with language and which can be of two types: semantic or episodic (Tresp et al., 2017). The memories and knowledge of humanity are stored in written documents, which gain reliability if they include references to previous works by other authors. Scientific articles are the model of well-structured presentation and storage of information, each with its own title, explicit authorship, and references to information in other documents or within the same document. But what is almost always relevant for the decision to read them, the retrieval action, is their publication year. Thus, their ordered structure makes it possible to use them as a representation of global human episodic knowledge and memories. Also, scientific publication as a human activity can be modeled as a social network. From this kind of network the expression "trending topic" emerged, naming the most frequent term or word used in a specific temporal window, understood as the principal theme or main subject of the information described in a piece of content. In a mathematical and computational framework, semantic memories can be represented as knowledge graphs, where the entities are nodes and the links are relations between them. A relation between entities can then be defined as a triple (s, p, o), i.e., a simple sentence subject-predicate-object. An episodic memory adds a time marker, so a temporal prepositional phrase is added to the simple sentence, subject-predicate-object-temporal_preposition, giving a quad (s, p, o, t). This approach is widely used in semantic web technologies under the Linked Data methodology (Bizer et al., 2011). Thus, it is plausible to use complex network analysis tools to search for the most relevant relations between authors, paper titles, or keywords.
The scientific publication databases can easily contain millions of authors, papers, and their respective citations. A reduced number of relevant documents is expected from a specific topic query, not the thousands of results that search engines like Google Scholar or publishers' own engines can generate for a given string of words. The field of the science of science studies these relations, and earlier works used knowledge graphs, which are expressed as adjacency matrices. If the temporal dimension and various types of relationships are considered, then it is possible to form tensors of fourth order. A matrix X of the network can be bipartite (X ∈ R^{n×m}) if there are two types of nodes (authors-articles, authors-words, articles-words) or monopartite (X ∈ R^{n×n}); unweighted (x_{ij} ∈ {0, 1}) or weighted (x_{ij} ∈ R); directed or undirected (X^T = X) (Zeng et al., 2017). Tresp and Ma (2017) introduced the Tensor Memory Hypothesis, in which a knowledge graph is represented by a Tucker decomposition of the tensors. It is based on representational learning, i.e., a discrete entity e is associated with a vector of real numbers a_e called latent variables. Tresp and Ma (2017) also argue that representational learning might be the basis for perception, planning, and decision making. From a physiological point of view, there is evidence that the hippocampus plays a central role in the temporal organization of memories and supports the disambiguation of overlapping episodes (Eichenbaum, 2014a). In the standard consolidation theory of memory (SCT), episodic memory is a neocortical representation that arises from hippocampal activity, while in the multiple trace theory (MTT) episodic memory is represented only in the hippocampus and is used to form semantic memories in the neocortex. There is also evidence of the existence of "place cells" and "time cells" in the hippocampus, and that these support associative networks representing spatiotemporal relations between the entities of memories (Eichenbaum, 2014b).
Table 1. PCA variance for the number of latent components.
Latent Components | PCA variance (%)
3 | 2.93
5 | 4.3
10 | 7.32
15 | 10.03
20 | 12.5
25 | 14.8
50 | 24.99
100 | 41.88
200 | 63.9
There are some previous works on trending or hot topics in science: Griffiths and Steyvers (2004) used Latent Dirichlet Allocation (LDA) to analyze the abstracts from the Proceedings of the National Academy of Sciences (PNAS) from 1991 to 2001. Wei et al. (2013) performed a statistical analysis to find whether scientists follow hot topics in their investigations, using papers published in the American Physical Society (APS) Physical Review journals between 1976 and 2009. Kang and Lin (2018) used non-smooth non-negative matrix factorization (nsNMF) to extract the more prominent topics from a dataset of keywords of scientific articles related to "Machine Learning" from 2014 to 2016 in arXiv.org stat.ML; the similarity of that work to the Tensor Memory Hypothesis lies in the use of matrix decomposition to reduce the rank of the matrix. Alshareef et al. (2018) used indexes based on cosine similarity to estimate a score representing the anticipation of a prospective relationship between authors; they used two subsets of the IEEE digital library containing the keywords "database" and "multimedia".
Results. The number of latent components is not associated with a specific statistical measure of the data.
However, as a point of reference, Table 1 presents the corresponding percentage of variance if the same number of PCA components were employed.
Table 2. Most probable words for the query, by entity type.
Latent Components | Authors | Articles | Words
3 | neuromodulation | neuromodulation | neuromodulation
5 | stimulus, presented | stimulus, presented | stimulus, technique
10 | presented | presented | presented
15 | sleep, memory | sleep | sleep
20 | stimulus, memory | stimulus, cued | stimulus, cued
25 | memory, sws | memory, spatial, sws | memory, sws
50 | sleep, stimulus | sleep, stimulus | sleep, stimulus
100 | assr, memory | assr, memory | assr, memory
200 | wireless, monitoring | sleep, slow | sleep, slow
Table 3. Most probable words with NMF decomposition.
Latent Components | Authors | Articles | Words
3 | slow, sleep, auditory | stimulation, sleep | sleep, memory
5 | spindles, auditory, sleep | sleep | sleep
10 | sleep, stimulation | sleep, stimulation | sleep, memory
15 | sleep, memory | brain, consolidation | sleep, memory
20 | sleep, memory | oscillations, sleep | sleep, memory
25 | sleep, stimulation | activity, memory | sleep, memory
50 | sleep, memory | oscillations, humans | sleep, memory
100 | sleep, role | reactivation, slow-wave | sleep, memory
200 | sleep, slow | sleep, brain | sleep, memory
The words with the most relations in the complete tensor, before decomposition, are sleep, memory, stimulation, slow, brain, consolidation, auditory, spindles, reactivation, and activity. Table 2 is populated using a selection strategy of the most frequent word from queries of the type
word_i = argmax_o P(s, o, t),   (1)
where s is each author, paper title, or word in the database, o is a word, t is a year, and i is the index of an entity. The most probable words obtained from these queries are more numerous with many latent components than with few: for example, there are 21 different words among the query results when using 200 latent components, while for few latent components the query results contain only the words shown in Table 2. Table 3 is populated using the NMF decomposition of the time-collapsed matrix, obtained by adding the weights of each year. The most frequent words are selected as those that are maximal for each topic, i.e., each row k of the matrix H of the decomposition. The same processing with the nsNMF decomposition yields the words sleep and memory as the most probable in all cases. The analysis of relationships between entities needs a distance metric. Each entity is represented by latent vectors, so one choice of metric could be the Euclidean distance; but given this particular type of data, content from documents, the usual metric is cosine similarity. However, computing distances in the original data space demands high computational cost; using a reduced space alleviates the cost of calculating distances but requires a prior, expensive space transformation. Figure 1 is an example of the Euclidean
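As a rough sketch of the pipeline this record describes, (s, p, o, t) quads can be accumulated into a count tensor, collapsed over time, and factorized with NMF to surface the most probable word per topic. The toy quads, entity list, and dimensions below are placeholders, not the author's data or exact procedure.

```python
import numpy as np
from sklearn.decomposition import NMF

# Placeholder vocabulary of entities (authors, titles, words) and years.
entities = ["author_a", "paper_x", "sleep", "memory", "stimulation"]
years = [2016, 2017, 2018]
idx_e = {e: i for i, e in enumerate(entities)}
idx_t = {t: i for i, t in enumerate(years)}

# Toy (subject, object, year) quads, with a single relation type for simplicity.
quads = [
    ("author_a", "sleep", 2016), ("author_a", "memory", 2017),
    ("paper_x", "sleep", 2017), ("paper_x", "stimulation", 2018),
    ("author_a", "sleep", 2018),
]

# Third-order count tensor: subject x object x time.
X = np.zeros((len(entities), len(entities), len(years)))
for s, o, t in quads:
    X[idx_e[s], idx_e[o], idx_t[t]] += 1.0

# Collapse over time (sum the yearly weights) and factorize with NMF.
M = X.sum(axis=2)
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(M)
H = model.components_

# Most probable word (column) for each topic k = row of H.
for k, row in enumerate(H):
    print(f"topic {k}: {entities[int(np.argmax(row))]}")
```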
Marco P Lehmann, Alexander Aivazidis, Mohammad Javad Faraji et al.
FS Nathoo, ML Lesperance, AB Lawson et al.
In this article, we consider methods for Bayesian computation within the context of brain imaging studies. In such studies, the complexity of the resulting data often necessitates the use of sophisticated statistical models; however, the large size of these data can pose significant challenges for model fitting. We focus specifically on the neuroelectromagnetic inverse problem in electroencephalography, which involves estimating the neural activity within the brain from electrode-level data measured across the scalp. The relationship between the observed scalp-level data and the unobserved neural activity can be represented through an underdetermined dynamic linear model, and we discuss Bayesian computation for such models, where parameters represent the unknown neural sources of interest. We review the inverse problem and discuss variational approximations for fitting hierarchical models in this context. While variational methods have been widely adopted for model fitting in neuroimaging, they have received very little attention in the statistical literature, where Markov chain Monte Carlo is often used. We derive variational approximations for fitting two models: a simple distributed source model and a more complex spatiotemporal mixture model. We compare the approximations to Markov chain Monte Carlo using both synthetic data as well as through the analysis of a real electroencephalography dataset examining the evoked response related to face perception. The computational advantages of the variational method are demonstrated and the accuracy associated with the resulting approximations is clarified.
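For orientation, the underdetermined linear relationship referred to above is commonly written as a forward model mapping many latent sources to comparatively few electrodes; a generic sketch (with symbols of our choosing, not the article's notation) is:

```latex
% Generic EEG forward model: y_t are scalp measurements at D_e electrodes,
% s_t the latent source amplitudes, L the lead-field matrix, and
% \varepsilon_t measurement noise; D_e << D_s makes the inverse problem underdetermined.
\begin{equation*}
  y_t = L\, s_t + \varepsilon_t,
  \qquad y_t \in \mathbb{R}^{D_e},\; s_t \in \mathbb{R}^{D_s},\; D_e \ll D_s .
\end{equation*}
```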
J.-H. Zhao, Philip L. H. Yu, Qibao Jiang
Ralf Brüggemann, Helmut Lütkepohl
Johansen's reduced-rank maximum likelihood (ML) estimator for cointegration parameters in vector error correction models is known to produce occasional extreme outliers. Using a small monetary system and German data, we illustrate the practical importance of this problem. We also consider an alternative generalized least squares (GLS) system estimator which has better properties in this respect. The two estimators are compared in a small simulation study. It is found that the GLS estimator can indeed be an attractive alternative to ML estimation of cointegration parameters.
Karen Paul
Fe(2 ML)/V(y ML) and interleaved Fe(2 ML)/V(y ML)/Fe(3 ML)/V(y ML) superlattice systems with spacer thicknesses y (4 ≤ y ≤ 17) were investigated macro-magnetically to estimate the coupling strength and the magnetoresistance in these materials, and particularly in the antiferromagnetically coupled monolayers. The results from the magnetic and magnetoresistive measurements indicate that adding one monolayer of Fe increases the antiferromagnetic coupling and the magnetoresistivity ratio from 0.0075 mJ/m2 at 20 K and 2% at 10 K for Fe(2 ML)/V(y ML), to 0.05 mJ/m2 and 2.5% for Fe(2 ML)/V(y ML)/Fe(3 ML)/V(y ML) at the same temperatures. Both systems exhibit in-plane magnetic and magnetoresistive isotropy; therefore, the increase of these physical parameters is attributed mainly to the stresses at the interface as governing mechanisms over the magnetoelastic forces.
Dimitris Karlis
Naoto Kunitomo
Page 1 of 7967