This paper provides a formalism for an important class of causal inference problems inspired by user-advertiser interaction in online advertising. This formalism is then specialized to an extension of temporal marked point processes, and neural point processes are suggested as practical solutions to some interesting special cases.
This article considers a semi-supervised classification setting on a Gaussian mixture model, where the data are not given hard labels as usual, but instead uncertain labels. Our main aim is to compute the Bayes risk for this model. We compare the behavior of the Bayes risk with that of the best known algorithm for this model; this comparison eventually gives new insights into the algorithm.
This note addresses the computational difficulty of the Gromov-Wasserstein distance that is frequently mentioned in the literature. We provide details on the structure of the Gromov-Wasserstein optimization problem that show its non-convex quadratic nature for any instance of the input data. We further illustrate the non-convexity of the problem with several explicit examples.
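As a minimal numerical illustration (our own toy instance, not one of the paper's examples), the quadratic Gromov-Wasserstein objective already violates convexity along a segment between two couplings of two 2-point metric spaces:

```python
import numpy as np

# Two 2-point metric spaces with uniform marginals (1/2, 1/2).
C1 = np.array([[0.0, 1.0], [1.0, 0.0]])  # distances in the first space
C2 = np.array([[0.0, 2.0], [2.0, 0.0]])  # distances in the second space

def gw_objective(T):
    """Quadratic GW objective: sum_{ijkl} (C1[i,k] - C2[j,l])^2 T[i,j] T[k,l]."""
    M = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2
    return np.einsum('ijkl,ij,kl->', M, T, T)

T0 = np.array([[0.5, 0.0], [0.0, 0.5]])  # identity coupling
T1 = np.array([[0.0, 0.5], [0.5, 0.0]])  # swap coupling
Tm = 0.5 * (T0 + T1)                     # midpoint of the segment

f0, f1, fm = gw_objective(T0), gw_objective(T1), gw_objective(Tm)
print(f0, f1, fm)  # 0.5, 0.5, 1.5: the midpoint value exceeds the chord,
                   # so the objective is non-convex on the coupling polytope
```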
In this note, we consider the problem of robustly learning mixtures of linear regressions. Via a simple thresholding, we connect mixtures of linear regressions to mixtures of Gaussians, so that a quasi-polynomial-time algorithm can be obtained under a mild separation condition. This algorithm has significantly better robustness than previous results.
We prove Carl-type inequalities for the error of approximation of compact sets $K$ by deep and shallow neural networks. This in turn gives lower bounds on how well the functions in $K$ can be approximated when the approximants are required to be outputs of such networks. Our results are obtained as a byproduct of the study of the recently introduced Lipschitz widths.
Current mainstream gradient optimization algorithms neglect, or treat inconsistently, the fluctuation in the expectation and variance of the gradient caused by the parameter update between consecutive iterations. This paper remedies this issue by introducing a novel unbiased stratified statistic $\bar{G}_{mst}$, and establishes a sufficient condition for fast convergence under $\bar{G}_{mst}$. A novel algorithm named MSSG, designed based on $\bar{G}_{mst}$, outperforms other SGD-like algorithms. Theoretical conclusions and experimental evidence strongly suggest employing MSSG when training deep models.
We study the problem of testing the existence of a heterogeneous dense subhypergraph. The null hypothesis corresponds to a heterogeneous Erdős-Rényi uniform random hypergraph, and the alternative hypothesis corresponds to a heterogeneous uniform random hypergraph that contains a dense subhypergraph. We establish detection boundaries when the edge probabilities are known and construct an asymptotically powerful test for distinguishing the hypotheses. We also construct an adaptive test that does not involve the edge probabilities and is hence more useful in practice.
Background: Several factors can affect the results of laboratory examination; one pre-analytic factor that can affect erythrocyte counts is the ratio between blood volume and anticoagulant. If the blood volume is insufficient, the excess anticoagulant causes red blood cells to become crenated, and if the blood volume is excessive, the insufficient anticoagulant can cause blood clots. Research Objective: This study aims to compare erythrocyte counts in blood sample volumes of 3 mL, 2 mL, and 1 mL with the anticoagulant K2EDTA. Research Methods: This study used primary data from hematological examination at the UTD RSUD Dr. H. Abdul Moeloek, Bandar Lampung. It is a quantitative study using an observational analytic design with a cross-sectional approach, performing hematological examination on the Mindray BC-3600 Hematology Analyzer with a sample of 40 respondents who met the inclusion and exclusion criteria. Results: The mean erythrocyte counts for blood volumes of 1 mL, 2 mL, and 3 mL with the anticoagulant K2EDTA differed, with the 3 mL volume showing the lowest result. Conclusion: There is no significant difference in erythrocyte counts among blood sample volumes of 1 mL, 2 mL, and 3 mL in the K2EDTA vacutainer tube.
Keywords: Hematology Examination; Blood Volume; K2EDTA.
We explore whether it is possible to learn a directed acyclic graph (DAG) from data without explicitly imposing the acyclicity constraint. In particular, for Gaussian distributions, we frame structural learning as a sparse matrix factorization problem, and we empirically show that solving an $\ell_1$-penalized optimization yields good recovery of the true graph and, in general, almost-DAG graphs. Moreover, this approach is computationally efficient and does not suffer from the combinatorial explosion that affects classical structural learning algorithms.
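The flavor of $\ell_1$-penalized structure recovery can be sketched on a toy linear SEM (this uses a plain Lasso regression as an illustration, not the paper's sparse matrix factorization; the graph and coefficients are our own choices):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 2000
# Linear SEM over a known DAG: X1 -> X2 -> X3
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(size=n)
x3 = 2.0 * x2 + rng.normal(size=n)

# l1-penalized regression of X3 on the remaining variables:
# the true parent X2 should get a large coefficient, X1 a negligible one.
X = np.column_stack([x1, x2])
lasso = Lasso(alpha=0.1).fit(X, x3)
print(lasso.coef_)  # roughly [~0, ~2]
```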
We study the implicit bias of AdaGrad on separable linear classification problems. We show that AdaGrad converges to a direction that can be characterized as the solution of a quadratic optimization problem with the same feasible set as the hard SVM problem. We also discuss how different choices of the AdaGrad hyperparameters may affect this direction. This provides a deeper understanding of why adaptive methods do not seem to generalize as well as gradient descent in practice.
Bias and heterogeneity in peer assessment can lead to unfair scoring in educational settings. To deal with this problem, we propose a reference ranking method for an online peer assessment system using HodgeRank. Such a scheme provides instructors with an objective, mathematically grounded scoring reference.
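A minimal sketch of the HodgeRank least-squares step on a toy set of pairwise comparisons (the latent scores and edges are hypothetical, and real peer-assessment data would be noisy):

```python
import numpy as np

# Pairwise comparisons: edge (i, j) with observed score difference y = s_j - s_i.
# Hypothetical true latent scores: s = [3, 1, 2] for three submissions.
edges = [(0, 1), (1, 2), (0, 2)]
y = np.array([-2.0, 1.0, -1.0])  # s_j - s_i for each edge

# Incidence matrix B: one row per comparison, -1 at i, +1 at j.
n = 3
B = np.zeros((len(edges), n))
for r, (i, j) in enumerate(edges):
    B[r, i], B[r, j] = -1.0, 1.0

# HodgeRank global ranking: least-squares potential s with B s ~ y.
s, *_ = np.linalg.lstsq(B, y, rcond=None)
s -= s.mean()  # scores are identified only up to an additive constant
print(np.argsort(-s))  # ranking: item 0 first, then item 2, then item 1
```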
We propose a new class of transforms, which we call the {\it Lehmer transform}, motivated by the {\it Lehmer mean function}. The proposed transform decomposes a function of a sample into its constituent statistical moments. Theoretical properties of the proposed transform are presented. This transform could provide a useful alternative for analyzing non-stationary signals such as EEG brain waves.
In this paper, we analyze the Wisconsin Diagnostic Breast Cancer data using machine learning classification techniques, namely the SVM, Bayesian logistic regression (with a variational approximation), and $k$-nearest neighbors. We describe each model and compare their performance through different measures. We conclude that the SVM performs best among these classifiers, while competing closely with Bayesian logistic regression, which ranks second for this dataset.
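A rough sketch of such a comparison (the WDBC dataset ships with scikit-learn; a plain logistic regression stands in for the Bayesian variational version, which scikit-learn does not provide, and exact scores depend on the split):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # the WDBC data
Xtr, Xte, ytr, yte = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "LogReg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
# Held-out accuracy for each classifier
scores = {name: m.fit(Xtr, ytr).score(Xte, yte) for name, m in models.items()}
print(scores)  # all typically above 0.93 on this split
```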
$\textit{Pymc-learn}$ is a Python package providing a variety of state-of-the-art probabilistic models for supervised and unsupervised machine learning. It is inspired by $\textit{scikit-learn}$ and focuses on bringing probabilistic machine learning to non-specialists. It provides a general-purpose, high-level API that mimics $\textit{scikit-learn}$. Emphasis is put on ease of use, productivity, flexibility, performance, documentation, and an API consistent with $\textit{scikit-learn}$. It depends on $\textit{scikit-learn}$ and $\textit{pymc3}$ and is distributed under the new BSD-3 license, encouraging its use in both academia and industry. Source code, binaries, and documentation are available at http://github.com/pymc-learn/pymc-learn.
The standard interpretation of importance-weighted autoencoders is that they maximize a tighter lower bound on the marginal likelihood than the standard evidence lower bound. We give an alternate interpretation of this procedure: that it optimizes the standard variational lower bound, but using a more complex distribution. We formally derive this result, present a tighter lower bound, and visualize the implicit importance-weighted distribution.
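The gap between the two bounds can be checked numerically on a toy Gaussian model (our own example, not the paper's; the proposal $q$ is deliberately mismatched so the bounds differ):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy model: z ~ N(0,1), x | z ~ N(z,1), observed x = 2; proposal q = prior.
x = 2.0

def log_w(z):
    # log importance weight: log p(x|z) + log p(z) - log q(z) = log p(x|z) here
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x - z) ** 2

K, M = 10, 20000
z = rng.normal(size=(M, K))
lw = log_w(z)

elbo = lw.mean()                                     # standard (K = 1) bound
iwae = np.mean(np.log(np.mean(np.exp(lw), axis=1)))  # importance-weighted K = 10 bound
print(elbo, iwae)  # the K = 10 bound is noticeably tighter
                   # (closer to log p(x) = -2.27 for this model)
```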
We consider stationary autoregressive processes with coefficients restricted to an ellipsoid, which includes autoregressive processes with absolutely summable coefficients. We provide consistency results under different norms for the estimation of such processes using constrained and penalized estimators. As an application, we show a weak form of universal consistency. Simulations show that directly including the constraint in the estimation can lead to more robust results.
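A minimal sketch of penalized autoregressive estimation (ridge-penalized least squares on a simulated AR(1); the constraint sets and penalties studied in the paper are more general than this toy choice):

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulate a stationary AR(1): X_t = 0.8 X_{t-1} + e_t
n, phi = 3000, 0.8
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

# Penalized AR(p) estimation: ridge-regularized least squares on the lagged design
p, lam = 3, 1.0
Y = x[p:]
Z = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
coef = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ Y)
print(coef)  # first coefficient close to 0.8, higher lags near 0
```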
We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is indexable in the sense that the Whittle index is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about Schur-convexity and mechanical words, which are particular binary strings intimately related to palindromes.
This technical note extends recent results on the computational complexity of globally minimizing the error of piecewise-affine models to the related problem of minimizing the error of switching linear regression models. In particular, we show that, on the one hand, the problem is NP-hard, but, on the other hand, it admits a polynomial-time algorithm with respect to the number of data points for any fixed data dimension and number of modes.
We show that if $F$ is a convex class of functions that is $L$-subgaussian, the error rate of learning problems generated by independent noise is equivalent to a fixed point determined by `local' covering estimates of the class, rather than by the Gaussian averages. To that end, we establish new sharp upper and lower estimates on the error rate for such problems.
It is shown that bootstrap approximations of support vector machines (SVMs) based on a general convex and smooth loss function and on a general kernel are consistent. This result is useful for approximating the unknown finite-sample distribution of SVMs by the bootstrap approach.
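The bootstrap procedure itself can be sketched as follows (synthetic data and an RBF kernel chosen for illustration; note that scikit-learn's SVC uses the non-smooth hinge loss, so this is only a schematic of the resampling scheme, not an instance covered by the smooth-loss result):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
x0 = X[:1]  # evaluate the SVM decision function at one fixed point

B = 200
vals = np.empty(B)
for b in range(B):
    idx = rng.integers(0, len(X), size=len(X))   # resample with replacement
    clf = SVC(kernel="rbf").fit(X[idx], y[idx])  # refit the SVM on the resample
    vals[b] = clf.decision_function(x0)[0]

# Percentile interval approximating the sampling distribution of the SVM output
lo, hi = np.percentile(vals, [2.5, 97.5])
print(lo, hi)
```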