Tutorial on tidymodels for Machine Learning

Contents: Set Up; Data Set: Diamonds; Separating Testing and Training Data: rsample; Data Pre-Processing and Feature Engineering: recipes; Defining and Fitting Models: parsnip; Summarizing Fitted Models: broom; Evaluating Model Performance: yardstick; Tuning Model Parameters: tune and dials; Preparing a parsnip Model for Tuning; Preparing Data for Tuning: recipes; Combine Everything: workflows; Selecting the Best Model to Make the Final Predictions; Summary; Further Resources; Session Info; Updates

caret is a well-known R package for machine learning that covers almost everything from data pre-processing to cross-validation. [Read More]

Regression Modeling With Proportion Data (Part 2)

Attendance in Handball-Bundesliga Rose By 7 % After World Championship

Contents: Data; Analyses: Beta and Quasi-Binomial Regression; Results; Plot; Model Comparison; Effect Size

In the first part of this post, I demonstrated how beta and quasi-binomial regression can be used with dependent variables that are proportions or ratios. I applied these models to attendance rates of the German Handball-Bundesliga. In the second part, I want to investigate whether attendance increased after the World Championship that took place in January 2019 in Denmark and Germany (with a new spectator record). [Read More]

Regression Modeling With Proportion Data (Part 1)

Predicting Attendance in the German Handball-Bundesliga

Contents: Modeling Proportion Data; Application: Handball-Bundesliga; Setup; Selected Variables; Initial Results for Beta Regression; Illustrative Plot of Estimates; Residuals; Model Comparisons; Models Considered; Model Performance; Prediction of Future Matches; Resources

As a data scientist, one often encounters dependent variables that are proportions: for example, the number of successes divided by the number of attempts, a party's vote share, the proportion of money spent on something, or the attendance rate of public events. [Read More]
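To make the idea of modeling a proportion concrete, here is a minimal sketch of a quasi-binomial regression in base R. It is not the analysis from the post: the data are simulated, and the names `matches`, `capacity`, `attendance`, and `derby` are made up for illustration.

```r
# Simulated data (hypothetical): 200 matches, each with an arena capacity
# and one binary predictor. None of this comes from the post's data set.
set.seed(1)
n <- 200
capacity <- sample(2000:10000, n, replace = TRUE)
derby <- rbinom(n, 1, 0.2)                 # hypothetical predictor
p_true <- plogis(0.5 + 0.8 * derby)        # true attendance rate per match
attendance <- rbinom(n, capacity, p_true)  # seats actually filled
matches <- data.frame(attendance, capacity, derby)

# Quasi-binomial GLM: the response is given as (successes, failures), and
# the dispersion parameter is estimated rather than fixed at 1.
fit <- glm(cbind(attendance, capacity - attendance) ~ derby,
           family = quasibinomial(link = "logit"), data = matches)

# Predictions on the proportion scale always lie between 0 and 1.
pred <- predict(fit, newdata = data.frame(derby = c(0, 1)), type = "response")
print(round(pred, 2))
```

The logit link is what keeps predicted rates inside the unit interval, which a plain linear model on the raw proportion would not guarantee.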

Average Age in the Districts of Mannheim

Plotting Geodata With ggplot2

I had always wanted to plot geodata, and today I finally got around to trying it. The plotting itself is actually quite simple …

fortunes::fortune("done it.")
#>
#> It was simple, but you know, it's always simple when you've done it.
#>    -- Simone Gabbriellini (after solving a problem with a trick suggested
#>       on the list)
#>       R-help (August 2005)

The real challenge lies in preparing the data, in this case the polygons of Mannheim's districts. [Read More]

Lowest Number of Red Cards in 40 Years

Web Scraping FIFA World Cup Data

While watching a FIFA World Cup game, I suddenly had the impression that games had become fairer over the years. Everybody remembers Zinedine Zidane's headbutt, but I haven’t seen anything similar in 2018. I had always wanted to try web scraping, and this was the opportunity to do so. In this post, I will give a very brief intro to web scraping, focusing mostly on scraping the World Cup data. [Read More]

Categorical Predictors in ANOVA and Regression

Contents: Regression Perspective; ANOVA and SPSS Perspective; How to Combine the Perspectives?; Solution; Examples; Example data; Dummy Coding; Planned Comparisons/Contrast Coding; Helmert Coding; Orthogonal and Nonorthogonal Contrasts; References

Data with categorical predictors such as groups, conditions, or countries can be analyzed in a regression framework as well as in an ANOVA framework. In either case, the grouping variable needs to be recoded; it cannot enter the model like a continuous predictor such as age or income. [Read More]
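As a rough illustration of what "recoding" means here, the following sketch fits the same three-group model twice in base R, once with dummy coding and once with Helmert coding. The data and the names `group` and `y` are made up, and this is not taken from the post itself.

```r
# Made-up example data: three groups with two observations each.
group <- factor(c("control", "control", "treat_a", "treat_a",
                  "treat_b", "treat_b"))
y <- c(1, 3, 4, 6, 8, 10)

# Dummy (treatment) coding: the first level is the reference category, so
# the intercept is its mean and the slopes are differences from it.
contrasts(group) <- contr.treatment(3)
fit_dummy <- lm(y ~ group)

# Helmert coding as implemented in R: each level is compared with the
# mean of the preceding levels, and the intercept is the grand mean.
group_h <- group
contrasts(group_h) <- contr.helmert(3)
fit_helmert <- lm(y ~ group_h)

print(coef(fit_dummy))
print(coef(fit_helmert))
```

Both fits reproduce the same group means; only the interpretation of the coefficients changes with the coding scheme. Note that R's `contr.helmert()` compares each level with the preceding levels, which is not the same direction as the Helmert contrasts in some other software.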

Tutorial: Rasch and 2PL Model in R

Contents: Setup; Data; Rasch Model; Plots; Model Identification; Note on Item Parameters in eRm Package; MML Estimation; 2PL Model; Model Fit; Relative Fit of Rasch and 2PL Model; Absolute Fit of the Rasch Model; DIF; Person Parameters; ML; MAP and EAP; Item and Test Information; References

Recently, I wrote a summary of some illustrative IRT analyses for my students. I quickly realized that this might be of interest to others as well, so I am posting a tutorial for the Rasch model and the 2PL model in R here. [Read More]
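For readers new to these models, here is a small sketch of the item response function behind both of them, written in base R. The function name `irf` and the parameterization `alpha * (theta - beta)` are assumptions made for illustration; IRT packages may use a different parameterization (for example, slope and intercept).

```r
# Probability of a correct response for a person with ability theta on an
# item with difficulty beta and discrimination alpha (2PL model).
# With alpha = 1 for every item, this reduces to the Rasch model.
irf <- function(theta, beta, alpha = 1) {
  plogis(alpha * (theta - beta))
}

theta <- c(-2, 0, 2)                          # three hypothetical abilities
p_rasch <- irf(theta, beta = 0)               # Rasch: common slope of 1
p_2pl   <- irf(theta, beta = 0, alpha = 2)    # 2PL: steeper item

print(round(p_rasch, 3))
print(round(p_2pl, 3))
```

When ability equals difficulty, the probability is 0.5 regardless of the discrimination; a larger `alpha` only makes the curve steeper around that point.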

Working With Different Versions of an R Package

Contents: Installing Packages to a Custom Location; Installing Development Packages; Installing Outdated Packages; Special cases; Reproducibility via a Project-Specific Library

Recently, I had to install an older version of an R package because a function I wanted to use had been deprecated. I wanted to install the old version in addition to the new version, not instead of it. This post was updated in April 2020. Previously, install_version() and install_github() had no lib argument, which made it necessary to use a workaround via the withr package to install to a non-standard library. [Read More]
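A minimal sketch of the general idea, assuming the remotes package is installed and that its install_version() forwards `lib` to install.packages(): the package name `somepkg`, the version `1.0.0`, and the directory name are hypothetical placeholders, and the install step is switched off by default so that nothing is downloaded.

```r
# Hypothetical custom library location; any writable directory works.
lib_dir <- file.path(tempdir(), "r-lib-old")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Set to TRUE to actually download and install (requires the remotes
# package and internet access). "somepkg" and "1.0.0" are placeholders.
run_install <- FALSE
if (run_install) {
  # Install the old version into lib_dir, next to (not instead of) the
  # current version in the default library.
  remotes::install_version("somepkg", version = "1.0.0", lib = lib_dir)

  # Load the old version explicitly from the custom location.
  library("somepkg", lib.loc = lib_dir)
}
```

Because the old version lives in its own directory, the default library stays untouched, and `lib.loc` decides which copy is loaded in a given session.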

Font Embedding for LaTeX and R Users

Use cairo_pdf()

Contents: What is Font Embedding; Problem: R does not embed fonts; Solution in R: Use cairo_pdf(); Solution With Completed PDF

Recently, I sent my dissertation as a PDF file to a copy shop and got an email back saying that I had not embedded all fonts and that they would not print it for me. What? So instead of celebrating the submission, I had to search online for pdf latex “font embedding”, and this blog post is a summary of that afternoon. [Read More]
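The fix named in the subtitle can be sketched in a few lines of base R. This is only an illustration of the device call, not the post's own code; the file name and plot are made up, and the device is only opened if the R build reports cairo support.

```r
# Hypothetical output file in the session's temporary directory.
out_file <- file.path(tempdir(), "embedded-fonts.pdf")

# cairo_pdf() is only usable if this R build was compiled with cairo.
if (capabilities("cairo")) {
  cairo_pdf(out_file, width = 6, height = 4)  # size in inches
  plot(1:10, main = "Written with the cairo PDF device")
  dev.off()                                   # close the device to write the file
}
```

The call is a drop-in replacement for pdf() in most plotting scripts, so switching devices usually means changing a single line.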