For a simple model of shallow and wide neural networks, we show that the epigraph of its input-output map as a function of the network parameters approximates epigraph of a. convex function in a precise sense. This leads to a plausible explanation of their observed good performance.
This paper proposes the Next-Depth Lookahead Tree (NDLT), a single-tree model designed to improve performance by evaluating node splits not only at the node being optimized but also by evaluating the quality of the next depth level.
This paper introduces a novel Kalman filter framework designed to achieve robust state estimation under both process and measurement noise. Inspired by the Weighted Observation Likelihood Filter (WoLF), which provides robustness against measurement outliers, we applied generalized Bayesian approach to build a framework considering both process and measurement noise outliers.
Despite their performance and widespread use, little is known about the theory of Random Forests. A major unanswered question is whether, or when, the Random Forest algorithm is consistent. The literature explores various variants of the classic Random Forest algorithm to address this question and known short-comings of the method. This paper is a contribution to this literature. Specifically, the suitability of grafting consistent estimators onto a shallow CART is explored. It is shown that this approach has a consistency guarantee and performs well in empirical settings.
We study a new technique for understanding convergence of learning agents under small modifications of data. We show that such convergence can be understood via an analogue of Fatou's lemma which yields gamma-convergence. We show it's relevance and applications in general machine learning tasks and domain adaptation transfer learning.
Variational inference uses optimization, rather than integration, to approximate the marginal likelihood, and thereby the posterior, in a Bayesian model. Thanks to advances in computational scalability made in the last decade, variational inference is now the preferred choice for many high-dimensional models and large datasets. This tutorial introduces variational inference from the parametric perspective that dominates these recent developments, in contrast to the mean-field perspective commonly found in other introductory texts.
Wesley Cowan, Michael N. Katehakis, Sheldon M. Ross
We study new types of dynamic allocation problems the {\sl Halting Bandit} models. As an application, we obtain new proofs for the classic Gittins index decomposition result and recent results of the authors in `Multi-armed bandits under general depreciation and commitment.'
This work provides an online learning rule that is universally consistent under processes on (X,Y) pairs, under conditions only on the X process. As a special case, the conditions admit all processes on (X,Y) such that the process on X is stationary. This generalizes past results which required stationarity for the joint process on (X,Y), and additionally required this process to be ergodic. In particular, this means that ergodicity is superfluous for the purpose of universally consistent online learning.
We consider an extension of Leo Breiman's thesis from "Statistical Modeling: The Two Cultures" to include a bifurcation of algorithmic modeling, focusing on parametric regressions, interpretable algorithms, and complex (possibly explainable) algorithms.
Background: Laboratory examination has several factors that can affect the results of the examination, one of which is the pre-analytic factor that can affect the results of erythrocyte examination is the ratio between blood volume and anticoagulant. If the blood volume is insufficient, the anticoagulant causes red blood cells to become krenated, and if the excess blood volume can cause anticoagulants it can cause blood clots. Research Objective: This study aims to determine the ratio of the number of erythrocytes in the blood sample volume of 3 mL, 2 mL, and 1 mL with anticoagulant K2EDTA. Research Methods: This study used primary data with a hematological examination at the UTD RSUD Dr. H. Abdul Moeloek Bandar Lampung. This type of research is quantitative using an observational analytic design with a crossapproach sectional through a hematological examination using the Hematology Alayzer Mindray BC-3600 with a sample size of 40 respondents who meet the inclusion and exclusion criteria. Results: The results of the mean examination of the number of erythrocytes between the blood volume of 1 mL, 2 mL, 3 mL with the anticoagulant K2EDTA had different results, at a volume of 3 mL showed the lowest results. Conclusion: There is no significant difference between the examination of the number of erythrocytes with the blood sample volume of 1 mL, 2 mL, and 3 mL in thetube vacutainer K2EDTA
 Keywords: Hematology Examinatio;, Blood Volume; K2EDTA.
Information theory gives rise to a novel method for causal skeleton discovery by expressing associations between variables as tensors. This tensor-based approach reduces the dimensionality of the data needed to test for conditional independence, e.g., for systems comprising three variables, the causal skeleton can be determined using pair-wise determined tensors. To arrive at this result, an additional information measure, path information, is proposed.
We derive the limiting distribution for the largest eigenvalues of the adjacency matrix for a stochastic blockmodel graph when the number of vertices tends to infinity. We show that, in the limit, these eigenvalues are jointly multivariate normal with bounded covariances. Our result extends the classic result of Füredi and Komlós on the fluctuation of the largest eigenvalue for Erdős-Rényi graphs.
This is the Proceedings of the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), which was held in Stockholm, Sweden, July 14, 2018. Invited speakers were Barbara Engelhardt, Cynthia Rudin, Fernanda Viégas, and Martin Wattenberg.
In this paper we cast the well-known convolutional neural network in a Gaussian process perspective. In this way we hope to gain additional insights into the performance of convolutional networks, in particular understand under what circumstances they tend to perform well and what assumptions are implicitly made in the network. While for fully-connected networks the properties of convergence to Gaussian processes have been studied extensively, little is known about situations in which the output from a convolutional network approaches a multivariate normal distribution.
This is full length article (draft version) where problem number of topics in Topic Modeling is discussed. We proposed idea that Renyi and Tsallis entropy can be used for identification of optimal number in large textual collections. We also report results of numerical experiments of Semantic stability for 4 topic models, which shows that semantic stability play very important role in problem topic number. The calculation of Renyi and Tsallis entropy based on thermodynamics approach.
Been Kim, Dmitry M. Malioutov, Kush R. Varshney
et al.
This is the Proceedings of the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), which was held in Sydney, Australia, August 10, 2017. Invited speakers were Tony Jebara, Pang Wei Koh, and David Sontag.
In this document we are going to derive the equations needed to implement a Variational Bayes estimation of the parameters of the simplified probabilistic linear discriminant analysis (SPLDA) model. This can be used to adapt SPLDA from one database to another with few development data or to implement the fully Bayesian recipe. Our approach is similar to Bishop's VB PPCA.
Annealed importance sampling (AIS) is a common algorithm to estimate partition functions of useful stochastic models. One important problem for obtaining accurate AIS estimates is the selection of an annealing schedule. Conventionally, an annealing schedule is often determined heuristically or is simply set as a linearly increasing sequence. In this paper, we propose an algorithm for the optimal schedule by deriving a functional that dominates the AIS estimation error and by numerically minimizing this functional. We experimentally demonstrate that the proposed algorithm mostly outperforms conventional scheduling schemes with large quantization numbers.
The problem of convex optimization is studied. Usually in convex optimization the minimization is over a d-dimensional domain. Very often the convergence rate of an optimization algorithm depends on the dimension d. The algorithms studied in this paper utilize dictionaries instead of a canonical basis used in the coordinate descent algorithms. We show how this approach allows us to reduce dimensionality of the problem. Also, we investigate which properties of a dictionary are beneficial for the convergence rate of typical greedy-type algorithms.