Tutorial on tidymodels for Machine Learning

Contents: Set Up; Data Set: Diamonds; Separating Testing and Training Data: rsample; Data Pre-Processing and Feature Engineering: recipes; Defining and Fitting Models: parsnip; Summarizing Fitted Models: broom; Evaluating Model Performance: yardstick; Tuning Model Parameters: tune and dials; Preparing a parsnip Model for Tuning; Preparing Data for Tuning: recipes; Combine Everything: workflows; Selecting the Best Model to Make the Final Predictions; Summary; Further Resources; Session Info; Updates

caret is a well-known R package for machine learning, which includes almost everything from data pre-processing to cross-validation. [Read More]

Regression Modeling With Proportion Data (Part 2)

Attendance in Handball-Bundesliga Rose by 7 % After World Championship

Contents: Data; Analyses: Beta and Quasi-Binomial Regression; Results; Plot; Model Comparison; Effect Size

In the first part of this post, I demonstrated how beta and quasi-binomial regression can be used with dependent variables that are proportions or ratios. I applied these models to attendance rates of the German Handball-Bundesliga. In the second part, I want to investigate whether attendance increased after the World Championship that took place in January 2019 in Denmark and Germany (with a new spectator record). [Read More]
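For readers who want the two models in symbols, here is a minimal sketch. The notation is assumed here and not taken from the post: y_i is the observed proportion, x_i the predictors, n_i the number of trials behind each proportion, φ the precision of the beta distribution, and ψ the quasi-binomial dispersion. The logit link and the mean-precision form of the beta distribution are common defaults, not necessarily the choices made in the post.

```latex
% Both models relate the mean proportion to the predictors; a logit link is assumed
\operatorname{logit}(\mu_i) = \log\frac{\mu_i}{1-\mu_i} = x_i^\top \beta

% Beta regression (mean-precision parameterization), for 0 < y_i < 1
y_i \sim \operatorname{Beta}\bigl(\mu_i \phi,\; (1-\mu_i)\phi\bigr), \qquad
\operatorname{Var}(y_i) = \frac{\mu_i (1-\mu_i)}{1+\phi}

% Quasi-binomial regression: only the mean-variance relation is specified
\operatorname{E}(y_i) = \mu_i, \qquad
\operatorname{Var}(y_i) = \psi\,\frac{\mu_i (1-\mu_i)}{n_i}
```

In both cases the variance shrinks as the mean approaches 0 or 1, which is the feature that makes these models better suited to proportions than ordinary linear regression.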

Regression Modeling With Proportion Data (Part 1)

Predicting Attendance in the German Handball-Bundesliga

Contents: Modeling Proportion Data; Application: Handball-Bundesliga; Setup; Selected Variables; Initial Results for Beta Regression; Illustrative Plot of Estimates; Residuals; Model Comparisons; Models Considered; Model Performance; Prediction of Future Matches; Resources

As a data scientist, one often encounters dependent variables that are proportions: for example, the number of successes divided by the number of attempts, a party's vote share, the proportion of money spent on something, or the attendance rate of public events. [Read More]

Categorical Predictors in ANOVA and Regression

Contents: Regression Perspective; ANOVA and SPSS Perspective; How to Combine the Perspectives?; Solution; Examples; Example data; Dummy Coding; Planned Comparisons/Contrast Coding; Helmert Coding; Orthogonal and Nonorthogonal Contrasts; References

Data with categorical predictors such as groups, conditions, or countries can be analyzed in a regression framework as well as in an ANOVA framework. In either case, the grouping variable needs to be recoded; it cannot enter the model like a continuous predictor such as age or income. [Read More]
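To illustrate what "recoded" means, here is a minimal sketch of dummy (treatment) coding for a factor with three groups. The three-group setup, the choice of group 1 as the reference, and the symbols are assumptions made for illustration and are not taken from the post.

```latex
% Dummy (treatment) coding: two indicator variables represent three groups
y_i = \beta_0 + \beta_1 D_{2i} + \beta_2 D_{3i} + \varepsilon_i,
\qquad
D_{gi} =
\begin{cases}
  1 & \text{if observation } i \text{ belongs to group } g \\
  0 & \text{otherwise}
\end{cases}

% Implied group means
\mu_1 = \beta_0, \qquad \mu_2 = \beta_0 + \beta_1, \qquad \mu_3 = \beta_0 + \beta_2
```

Under this coding, the intercept is the mean of the reference group, and each slope is the difference between one group's mean and the reference mean. Other coding schemes, such as Helmert coding, change what the coefficients compare but not the fitted group means.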

Tutorial: Rasch and 2PL Model in R

Contents: Setup; Data; Rasch Model; Plots; Model Identification; Note on Item Parameters in eRm Package; MML Estimation; 2PL Model; Model Fit; Relative Fit of Rasch and 2PL Model; Absolute Fit of the Rasch Model; DIF; Person Parameters; ML; MAP and EAP; Item and Test Information; References

Recently, I wrote a summary of some illustrative IRT analyses for my students. I quickly realized that this might be of interest to others as well, so I am posting a tutorial for the Rasch model and the 2PL model in R here. [Read More]
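For orientation, the two models can be written as follows. This is a sketch in one common notation, which is an assumption here and not taken from the post: θ_p is the ability of person p, β_i the difficulty of item i, and α_i the discrimination of item i. Software packages may use different parameterizations, for example easiness instead of difficulty, or a slope-intercept form.

```latex
% Rasch model: probability that person p solves item i
P(X_{pi} = 1 \mid \theta_p, \beta_i)
  = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}

% 2PL model: adds an item-specific discrimination parameter \alpha_i
P(X_{pi} = 1 \mid \theta_p, \alpha_i, \beta_i)
  = \frac{\exp\bigl(\alpha_i(\theta_p - \beta_i)\bigr)}{1 + \exp\bigl(\alpha_i(\theta_p - \beta_i)\bigr)}
```

The Rasch model is the special case of the 2PL model in which every item has the same discrimination, α_i = 1, which is why the two models can be compared in terms of relative fit.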