StataCorp announces the 16th major release of its data science software—Stata. Stata has been a mainstay of health, economic, social science, and business researchers for 34 years. This latest release will not only delight Stata’s core users but also entice other data scientists to take a fresh look at Stata.
Bill Gould, President of StataCorp and Head of Development, said about the release, “This release posed challenges like we have never seen. Statistical. Software engineering. Epistemological. Our users don’t and shouldn’t care. The results solve real problems elegantly, and that’s what matters.”
Stata 16 adds lasso models, Stata’s first official suite of commands for machine learning. True to Stata’s roots in statistical methods, its lasso tools provide not just model selection and prediction but also inferential lasso models that allow researchers to estimate the effects that answer real scientific questions.
This release also expands Stata’s tools for reproducible reporting. Release 16 includes more automation of Word documents that include both statistical and graphical results. Its dynamic features allow reports to update automatically as data update. And Stata’s integrated versioning ensures users can reproduce any results, any time.
Stata’s core user camps will be pleased that release 16 incorporates a complete suite of meta-analysis features. The new suite—with additions such as forest plots and meta-regression—allows users to combine results from multiple studies in a principled way.
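As a rough illustration of what combining results "in a principled way" can mean, the sketch below pools study estimates with inverse-variance (fixed-effect) weights, one standard meta-analysis approach. It is plain Python, not Stata's implementation; the function name `pool_fixed_effect` and the three study results are hypothetical.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Return the inverse-variance pooled estimate and its standard error."""
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its sampling variance,
    # so more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.30, 0.10, 0.25]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_fixed_effect(effects, std_errors)
# Approximate 95% confidence interval using the normal critical value 1.96.
ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The imprecise second study (standard error 0.20) receives a quarter of the weight of each of the other two, which is why the pooled estimate sits close to them.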
The new integration with Python will appeal to more technical data scientists. Stata now has seamless two-way communication with Python, including instructions, data, and metadata. The integration provides access to the full range of Python packages—machine learning, 3-D graphics, data scraping, and more. Alan Riley, Vice President of Software Development, said, “Stata and Python are perfect complements. I keep finding reasons to use them together, and I can’t wait to see what our users create.”
Stata already possessed the most approachable set of Bayesian analysis features available—opening Bayesian statistics to those otherwise put off by the specialized requirements of other software. Release 16 adds support for multiple chains, Bayesian predictions, the Gelman–Rubin convergence diagnostic, and posterior predictive p-values.
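For readers unfamiliar with the Gelman–Rubin diagnostic, the following minimal Python sketch shows the classic form of the statistic: it compares between-chain and within-chain variance across multiple chains, and values well above 1 suggest the chains have not converged. This is an illustration only, under the assumption of equal-length chains for a single parameter; it is not Stata's code, Stata's exact computation may differ, and the function name and chain values are hypothetical.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic Gelman-Rubin statistic for equal-length chains of one parameter."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    # W: average of the within-chain sample variances.
    within = mean([variance(c) for c in chains])
    if within == 0:
        raise ValueError("within-chain variance is zero")
    # B: n times the sample variance of the chain means.
    between = n * variance([mean(c) for c in chains])
    # Pooled estimate of the posterior variance.
    pooled_var = (n - 1) / n * within + between / n
    return (pooled_var / within) ** 0.5


# Hypothetical draws: two chains that agree, and two that clearly do not.
agreeing = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
disagreeing = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]

rhat_agreeing = gelman_rubin(agreeing)
rhat_disagreeing = gelman_rubin(disagreeing)
print(f"agreeing chains:    {rhat_agreeing:.3f}")
print(f"disagreeing chains: {rhat_disagreeing:.3f}")
```

The chains that explore different regions produce a value far above 1, which is the signal the diagnostic is designed to give.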
Release 16 also builds on Stata’s unique extended regression models. These models address common problems that arise in data: endogenous covariates (unobserved confounding), sample selection, and nonrandom treatment assignment. What is unique is that all of these problems can be handled simultaneously. As Vince Wiggins, Vice President of Scientific Development, stated, “The statistics aren’t new, just the way we stitched them together.” Stata 16 adds panel data (multilevel data) to the list of problems these models can handle.
As usual, StataCorp has packed many other new features into this release. Features not mentioned above include multiple datasets in memory, importing from SAS and SPSS, nonparametric series regression, sample-size analysis for confidence intervals, panel-data mixed logit, nonlinear DSGE models, multiple-group IRT, panel-data Heckman-selection models, time extensions for nonlinear mixed-effects models for pharmacokinetic and growth models, numerical integration, linear programming, and Do-file Editor autocompletion.
For over 30 years, StataCorp has been a leader in statistical and data science software. Stata provides everything for research professionals’ data science needs—data manipulation, visualization, statistics, and reproducible reporting.