The Regressinator: A Simulation Tool for Teaching Regression Assumptions and Diagnostics in R
Alex Reinhart
When students learn linear regression, they must learn to use diagnostics to check and improve their models. Model-building is an expert skill requiring the interpretation of diagnostic plots, an understanding of model assumptions, the selection of appropriate changes to remedy problems, and an intuition for how potential problems may affect results. Simulation offers opportunities to practice these skills, and is already widely used to teach important concepts in sampling, probability, and statistical inference. Visual inference, which uses simulation, has also recently been applied to regression instruction. This article presents the regressinator, an R package designed to facilitate simulation and visual inference in regression settings. Simulated regression problems can be easily defined with minimal programming, using the same modeling and plotting code students may already learn. The simulated data can then be used for model diagnostics, visual inference, and other activities, with the package providing functions to facilitate common tasks with a minimum of programming. Example activities covering model diagnostics, statistical power, and model selection are shown for both advanced undergraduate and Ph.D.-level regression courses.
Probabilities. Mathematical statistics, Special aspects of education
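The simulation-based diagnostic practice described in this abstract can be sketched in a few lines. The example below is a language-neutral illustration in Python, not the regressinator API (which is an R package and is not reproduced here). The population, coefficients, and function names are assumptions chosen for illustration. It simulates data from a known quadratic relationship, fits a straight line, and shows that the residuals carry the curvature a diagnostic plot would reveal.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def simulate(n, seed):
    """Hypothetical population: the true mean is quadratic in x."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    ys = [1.0 + 0.5 * x + x ** 2 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

xs, ys = simulate(200, seed=1)
a, b = fit_line(xs, ys)  # deliberately misspecified: no quadratic term
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# A residual-vs-x plot would show a U shape; summarize it numerically
# by comparing mean residuals in the middle and in the tails of x.
middle = [r for x, r in zip(xs, residuals) if abs(x) < 1]
tails = [r for x, r in zip(xs, residuals) if abs(x) >= 1]
mean_middle = sum(middle) / len(middle)
mean_tails = sum(tails) / len(tails)
print("mean residual, |x| < 1: ", round(mean_middle, 2))
print("mean residual, |x| >= 1:", round(mean_tails, 2))
```

Because the true population is known, students can check whether a diagnostic flags the problem that was deliberately built in.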
Adaptive Lasso Regression, Separated Regularized Regression, and Dlasso Regression: A Simulation Study of Variable Selection
HUSSEIN HASHEM
In this paper, we compare three main methods of variable selection for linear regression models: adaptive lasso regression, separated regularized regression (SRR), and lasso regression (AIC, GIC, BIC, CGV). We present the performance of these methods in a simulation study by considering the mean model error. We also consider the case in which the number of independent variables exceeds the number of observations. The simulation study identifies the best-performing methods across all linear regression scenarios.
Probabilities. Mathematical statistics
Comment on ‘What Protects the Autonomy of the Federal Statistics Agencies? An Assessment of the Procedures in Place That Protect the Independence and Objectivity of Official Statistics’ by Pierson et al.
Stephen Penneck
Political institutions and public administration (General), Probabilities. Mathematical statistics
Learner Engagement and Demographic Influences in Brazilian Massive Open Online Courses: Aprenda Mais Platform Case Study
Júlia Marques Carvalho da Silva, Gabriela Hahn Pedroso, Augusto Basso Veber et al.
This paper explores the dynamics of student engagement and demographic influences in Massive Open Online Courses (MOOCs). The study analyzes multiple facets of Brazilian MOOC participation, including re-enrollment patterns, course completion rates, and the impact of demographic characteristics on learning outcomes. Using survey data and statistical analyses from the public Aprenda Mais Platform, this study reveals that MOOC learners exhibit a strong tendency toward continuous learning, with a majority re-enrolling in subsequent courses within a short timeframe. The average completion rate across courses is 42.14%, with learners maintaining consistent academic performance. Demographic factors, notably race/color and disability, are found to influence enrollment and completion rates, underscoring the importance of inclusive educational practices. Geographical location affects students’ decisions to enroll in and complete courses, highlighting the necessity for region-specific educational strategies. The research concludes that a diverse array of factors, including content interest, personal motivation, and demographic attributes, shape student engagement in MOOCs. These insights are vital for educators and course designers in creating effective, inclusive, and engaging online learning experiences.
Electronic computers. Computer science, Probabilities. Mathematical statistics
On the Behavior of the Nonlinear Difference Equation $y_{n+1} = Ay_{n-1} + By_{n-3} + \frac{Cy_{n-1} + Dy_{n-3}}{Fy_{n-3} - E}$
Turki D. Alharbi, Elsayed M. Elsayed
The theory of difference equations occupies a significant position in applied analysis. Studying the qualitative behavior of difference equations is therefore a fruitful area of research that has attracted a growing number of researchers. In this paper, we demonstrate the stability and the existence of periodic solutions of the nonlinear difference equation. Moreover, we provide some numerical simulations to confirm our results.
Probabilities. Mathematical statistics, Analysis
Analyzing Online Marketing for Contemporary Snacks Through Instagram using SIR Model
Nur Syazwani Atikah Mohd Hilmi, Noorzila Sharif
During the implementation of the Movement Control Order (MCO), e-marketing became a favourable solution for sellers to interact with customers at home. Today, even though the MCO is no longer in force, online shopping has become a new norm because of its long-term impact on the interactions between sellers and buyers. Since the volume of online marketing is growing, sellers must understand the concept of viral marketing that suits their products. This study analyses the virality of contemporary snack marketing on the Instagram platform using the Susceptible-Infected-Recovered (SIR) model. Such an analysis is essential for sellers or marketers who want to see how users react to various types of posts. It also makes it easier for customers to get information from the posts. The parameters used in this model are the number of followers, number of reached users, and number of homepage visits. Data were collected from 14 July 2021 until 11 October 2021 from an Instagram account that promotes a type of chocolate-in-jar snack. The admin of the Instagram account frequently uses three types of posts to gain customer interaction: Feed, Reels, and Stories. The most viral post of each type is observed. The results reveal that, among the viral posts, the Reel posted on 11 September 2021 (Saturday) recorded the highest virality.
Probabilities. Mathematical statistics, Technology
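For readers unfamiliar with the SIR model used in this study, a minimal discrete-time sketch follows. The abstract does not state how followers, reached users, and homepage visits map onto the compartments, so the mapping in the comments is hypothetical, as are the rates `beta` and `gamma` and the starting counts.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps):
    """Discrete-time (one step per day) SIR dynamics.

    Hypothetical reading for a viral post:
      S = followers who have not yet engaged with the post,
      I = users currently engaging with or spreading it,
      R = users who have lost interest.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infected = beta * s * i / n   # S -> I
        new_recovered = gamma * i         # I -> R
        s = s - new_infected
        i = i + new_infected - new_recovered
        r = r + new_recovered
        history.append((s, i, r))
    return history

# Illustrative values only; they are not taken from the study.
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.4, gamma=0.1, steps=60)
infected = [i for _, i, _ in history]
peak_day = infected.index(max(infected))
print("day of peak engagement:", peak_day)
```

In this sketch the ratio `beta / gamma` plays the role of a basic reproduction number: a post spreads only while it exceeds the inverse of the susceptible fraction.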
Assessing Performance of the Generalized Exponential Model in the Presence of the Interval Censored Data with Covariate
Nada Alharbi, Jayanthi A., Haizum A. et al.
This study aims to extend the generalized exponential model (GEM) to include covariates in the presence of interval-censored data. The maximum likelihood estimator (MLE) was obtained for the parameter of the formulated model. A thorough simulation study was then carried out to evaluate the estimator's performance based on bias, standard error (SE), and root mean square error (RMSE). The results indicated that the SE and RMSE decrease as the sample size increases and the censoring proportion decreases. Finally, the performance of the Wald confidence interval estimation technique for the GEM with interval-censored data and covariates was assessed by a coverage probability study at several censoring proportions and sample sizes.
Probabilities. Mathematical statistics, Statistics
A Survey in Mathematical Language Processing
Jordan Meadows, Andre Freitas
Informal mathematical text underpins real-world quantitative reasoning and communication. Developing sophisticated methods of retrieval and abstraction from this dual modality is crucial in the pursuit of the vision of automating discovery in quantitative science and mathematics. We track the development of informal mathematical language processing approaches across five strategic sub-areas in recent years, highlighting the prevailing successful methodological elements along with existing limitations.
Appropriation of Information and Communication Technologies and Knowledge Management Processes in the Moroccan Public Administration
FATIMA EZZAHRA EL ADRAOUI, Lalla Hind EL IDRISSI
Today, the implementation of knowledge management projects is strongly encouraged and facilitated by the mobilization of information and communication technologies (ICT), which offer practical means of visibility for generating a vast knowledge base that is easily accessible and usable by different users.
This work aims to explore the role assigned to information and communication technologies (ICT) in knowledge management processes, in order to facilitate the management of organizations' intellectual capital.
Specifically, it seeks to understand the contribution of the adoption and use of information and communication technologies (ICT) at the different stages of the knowledge management process, in particular in the case of a Moroccan public administration.
Science, Probabilities. Mathematical statistics
On the mathematical structure of quantum models of computation based on Hamiltonian minimisation
Jacob Biamonte
Determining properties of ground states of spin Hamiltonians remains a topic of central relevance connecting disciplines of mathematical, theoretical and applied physics. In the last few decades, ground state properties of physical systems have been increasingly considered as computational resources. This thesis develops parts of the mathematical apparatus to create (program) ground states relevant for quantum and classical computation. The core findings presented in this thesis (now over a decade old) include that (i) logic operations (gates) can be embedded into the low-energy sector of Ising spins, whereas three (and higher) body Ising interaction terms can be mimicked through the minimisation of 2- and 1-body Ising terms yet require the introduction of slack spins; (ii) perturbation theory gadgets enable the emulation of interactions not present in a given Hamiltonian, e.g. $YY$ interactions can be realized from $ZZ$ and $XX$. The thesis also contains a result from 2007 showing that physically relevant two-body model Hamiltonians have a QMA-hard ground state energy decision problem. Merged with other results, this established that these models provide a universal resource for ground state quantum computation. More recent findings include the proof that an idealised version of the contemporary variational approach to quantum algorithms enables a universal model of quantum computation. Other related results are also presented as they relate to ground state quantum computation and the minimisation of Hamiltonians by quantum circuits. The topics covered include: Ising model reductions, stochastic versus quantum processes on graphs, quantum gates and circuits as tensor networks, variational quantum algorithms and Hamiltonian gadgets.
Concentration inequality for U-statistics of order two for uniformly ergodic Markov chains
Quentin Duchemin, Yohann de Castro, Claire Lacour
We prove a new concentration inequality for U-statistics of order two for uniformly ergodic Markov chains. Working with bounded and $\pi$-canonical kernels, we show that we can recover the convergence rate of Arcones and Giné, who proved a concentration result for U-statistics of independent random variables and canonical kernels. Our result allows the kernels $h_{i,j}$ to depend on the indices in the sums, which prevents the use of standard blocking tools. Our proof relies on an inductive analysis in which we use martingale techniques, uniform ergodicity, Nummelin splitting and a Bernstein-type inequality. Assuming further that the Markov chain starts from its invariant distribution, we prove a Bernstein-type concentration inequality that provides a sharper convergence rate for small variance terms.
m-Polar Fuzzy Hyperideals in LA-Semihypergroups
Ahmed Elmoasry, Naveed Yaqoob
In this paper, we initiate a study of m-polar fuzzy sets in hyperstructure theory, particularly in left almost semihypergroups. We define an m-polar fuzzy left (right, two-sided) hyperideal in a left almost semihypergroup and provide some results on these m-polar fuzzy left (right, two-sided) hyperideals.
Probabilities. Mathematical statistics, Analysis
On a Geometric Representation of Probability Laws and of a Coherent Prevision-Function According to Subjectivistic Conception of Probability
Pierpaolo Angelini, Angela De Sanctis
We distinguish the two extreme aspects of the logic of certainty by identifying their corresponding structures into a linear space. We extend probability laws P formally admissible in terms of coherence to random quantities. We give a geometric representation of these laws P and of a coherent prevision function P which we previously defined in an original way. We are the first in the world to do this kind of work: it is the foundation of our next and extensive study concerning the formulation of a geometric, well-organized and original theory of random quantities.
Mathematics, Probabilities. Mathematical statistics
Preservice Secondary Mathematics Teachers’ Statistical Knowledge: A Snapshot of Strengths and Weaknesses
Jennifer N. Lovett, Hollylynne S. Lee
Amid the implementation of new curriculum standards regarding statistics and new recommendations for preservice secondary mathematics teachers (PSMTs) to teach statistics, there is a need to examine the current state of PSMTs’ knowledge of the statistical content they will be expected to teach. This study reports on the statistical knowledge of 217 PSMTs from a purposeful sample of 18 universities across the United States. The results show that PSMTs may not have the strong Common Statistical Knowledge that is needed to teach statistics to high school students. PSMTs’ strengths include identifying appropriate measures of center, while weaknesses involve issues with variability, sampling distributions, p-values, and confidence intervals.
Special aspects of education, Probabilities. Mathematical statistics
Estimation Through Array-Based Group Tests
João Paulo Martins, Miguel Felgueiras, Rui Santos
Pooling individual samples for batch testing is a common procedure for reducing costs. The recent use of multidimensional array algorithms, due to the emergence of robotic pooling, is an innovative way of pooling. We show that the two-dimensional array-based group tests can provide accurate estimates for the prevalence rate even for situations in which the traditional estimators, applied to one-dimensional arrays, are not valid. Hence, a computational script was developed to determine which prevalence rate estimate minimizes the sum of the squared deviations between the number of observed and expected rows and columns whose pooled sample had a positive test result.
Statistics, Probabilities. Mathematical statistics
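The least-squares idea in this abstract can be illustrated with a small sketch. It is not the authors' script. It assumes a perfect test (no false positives or negatives) and independent individuals, so a pooled row of `cols` samples is positive with probability 1 - (1 - p)^cols, and similarly for columns. The function names and the grid search are illustrative choices.

```python
import random

def expected_positive(p, rows, cols, n_arrays):
    """Expected numbers of positive row pools and column pools."""
    exp_rows = n_arrays * rows * (1 - (1 - p) ** cols)
    exp_cols = n_arrays * cols * (1 - (1 - p) ** rows)
    return exp_rows, exp_cols

def estimate_prevalence(obs_rows, obs_cols, rows, cols, n_arrays, grid=10000):
    """Grid search for the p minimizing the sum of squared deviations
    between observed and expected positive rows and columns."""
    best_p, best_loss = 0.0, float("inf")
    for k in range(grid + 1):
        p = k / grid
        exp_rows, exp_cols = expected_positive(p, rows, cols, n_arrays)
        loss = (obs_rows - exp_rows) ** 2 + (obs_cols - exp_cols) ** 2
        if loss < best_loss:
            best_p, best_loss = p, loss
    return best_p

def simulate_counts(p, rows, cols, n_arrays, rng):
    """Simulate arrays of individuals and count positive row/column pools."""
    total_rows = total_cols = 0
    for _ in range(n_arrays):
        cells = [[rng.random() < p for _ in range(cols)] for _ in range(rows)]
        total_rows += sum(any(row) for row in cells)
        total_cols += sum(any(cells[r][c] for r in range(rows))
                          for c in range(cols))
    return total_rows, total_cols

rng = random.Random(0)
obs_rows, obs_cols = simulate_counts(0.05, 10, 10, 200, rng)
p_hat = estimate_prevalence(obs_rows, obs_cols, 10, 10, 200)
print("estimated prevalence:", p_hat)
```

A real application would also need to account for test sensitivity and specificity, which this sketch omits.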
Powerful statistical inference for nested data using sufficient summary statistics
Irene Dowding, Stefan Haufe
Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This 'naive' approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment.
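One common way to take within-subject variances into account is inverse-variance weighting of the subject-level summaries. The sketch below shows that fixed-effect version next to the naive mean of subject means. It is an illustration under stated assumptions, not necessarily the exact estimator reviewed in the article, and it ignores between-subject variability. All function names are hypothetical.

```python
import math
import random
import statistics

def subject_summary(samples):
    """Return (mean, estimated variance of that mean) for one subject."""
    mean = statistics.fmean(samples)
    var_of_mean = statistics.variance(samples) / len(samples)
    return mean, var_of_mean

def naive_estimate(summaries):
    """Unweighted mean of subject means (ignores within-subject variance)."""
    return statistics.fmean(m for m, _ in summaries)

def weighted_estimate(summaries):
    """Inverse-variance weighted group effect and its standard error."""
    weights = [1.0 / v for _, v in summaries]
    total = sum(weights)
    estimate = sum(w * m for w, (m, _) in zip(weights, summaries)) / total
    return estimate, math.sqrt(1.0 / total)

# Simulated subjects with a common effect of 0.5 but unequal noise levels
# and trial counts (illustrative values only).
rng = random.Random(42)
summaries = []
for noise_sd, n_trials in [(0.5, 200), (1.0, 100), (4.0, 20), (6.0, 10)]:
    trials = [rng.gauss(0.5, noise_sd) for _ in range(n_trials)]
    summaries.append(subject_summary(trials))

estimate, se = weighted_estimate(summaries)
print("naive:   ", round(naive_estimate(summaries), 3))
print("weighted:", round(estimate, 3), "z =", round(estimate / se, 2))
```

Noisy subjects with few trials receive little weight, which is the source of the power gain over treating every subject mean as equally informative.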
Book Reviews: Rehman M. KHAN (2013). Problem Solving and Data Analysis using Minitab — A Clear and Easy Guide to Six Sigma Methodology.
R. Viertl
Rehman M. KHAN (2013). Problem Solving and Data Analysis using Minitab: A Clear and Easy Guide to Six Sigma Methodology. (Wiley Series in Probability and Statistics), John Wiley & Sons Ltd, Chichester, UK, 484 pages, hardcover, ISBN 978-1-118-30757-1 (cloth) (€57.60)
Probabilities. Mathematical statistics, Statistics
Tightness of stationary distributions of a flexible-server system in the Halfin-Whitt asymptotic regime
Alexander L. Stolyar
A large-scale flexible service system with two large server pools and two types of customers is considered. Servers in pool 1 can only serve type 1 customers, while servers in pool 2 are flexible: they can serve both types 1 and 2. (This is a so-called “N-system.”) The customer arrival processes are Poisson, and customer service requirements are independent and exponentially distributed. The service rate of a customer depends both on its type and on the pool where it is served. A priority service discipline, where type 2 has priority in pool 2 and type 1 prefers pool 1, is considered. We consider the Halfin-Whitt asymptotic regime, where the arrival rate of customers and the number of servers in each pool increase to infinity in proportion to a scaling parameter $n$, while the overall system capacity exceeds its load by $O(\sqrt{n})$.
For this system we prove tightness of diffusion-scaled stationary distributions. Our approach relies on a single common Lyapunov function $G^{(n)}(x)$, depending on parameter $n$ and defined on the entire state space as a functional of the drift-based fluid limits (DFL). Specifically, $G^{(n)}(x)=\int_0^\infty g(y^{(n)}(t))\,dt$, where $y^{(n)}(\cdot)$ is the DFL starting at $x$, and $g(\cdot)$ is a “distance” to the origin. ($g(\cdot)$ is the same for all $n$.) The key part of the analysis is the study of the (first and second) derivatives of the DFLs and function $G^{(n)}(x)$. The approach, as well as many parts of the analysis, is quite generic and may be of independent interest.
Probabilities. Mathematical statistics, Applied mathematics. Quantitative methods
Statistical Methods and Computing for Big Data
Chun Wang, Ming-Hui Chen, Elizabeth Schifano et al.
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard software tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article reviews recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and sequential updating for stream data. Software review focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay.
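The divide-and-conquer class mentioned in this abstract can be illustrated with simple linear regression, where per-chunk sufficient statistics combine into exactly the full-data least-squares fit. This is a deliberately simpler stand-in for the logistic regression case study, which generally needs an approximate combination rule. The data, chunk size, and function names are assumptions made for illustration.

```python
import random

def chunk_stats(chunk):
    """Sufficient statistics (n, sum x, sum y, sum x^2, sum xy) of one chunk."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)

def combine(stats_list):
    """Add the statistics of all chunks element by element."""
    return tuple(sum(values) for values in zip(*stats_list))

def solve(stats):
    """Least-squares intercept and slope from the combined statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Illustrative data: y = 1 + 2x + noise. In practice each chunk would be
# read from disk or a stream so that only one chunk is in memory at a time.
rng = random.Random(7)
data = [(x, 1.0 + 2.0 * x + rng.gauss(0, 1))
        for x in (rng.uniform(0, 10) for _ in range(10000))]

chunks = [data[i:i + 1000] for i in range(0, len(data), 1000)]
intercept, slope = solve(combine([chunk_stats(c) for c in chunks]))
full_intercept, full_slope = solve(chunk_stats(data))
print("chunked fit:", round(intercept, 3), round(slope, 3))
```

The same statistics can be updated one observation at a time, which links this sketch to the sequential-updating class for stream data.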
Fruit flies and moduli: interactions between biology and mathematics
Ezra Miller
Possibilities for using geometry and topology to analyze statistical problems in biology raise a host of novel questions in geometry, probability, algebra, and combinatorics that demonstrate the power of biology to influence the future of pure mathematics. This expository article is a tour through some biological explorations and their mathematical ramifications. The article starts with evolution of novel topological features in wing veins of fruit flies, which are quantified using the algebraic structure of multiparameter persistent homology. The statistical issues involved highlight mathematical implications of sampling from moduli spaces. These lead to geometric probability on stratified spaces, including the sticky phenomenon for Fréchet means and the origin of this mathematical area in the reconstruction of phylogenetic trees.