Tutorial on tidymodels for Machine Learning

Contents: Set Up; Data Set: Diamonds; Separating Testing and Training Data: rsample; Data Pre-Processing and Feature Engineering: recipes; Defining and Fitting Models: parsnip; Summarizing Fitted Models: broom; Evaluating Model Performance: yardstick; Tuning Model Parameters: tune and dials; Preparing a parsnip Model for Tuning; Preparing Data for Tuning: recipes; Combine Everything: workflows; Selecting the Best Model to Make the Final Predictions; Summary; Further Resources; Session Info; Updates

caret is a well-known R package for machine learning that covers almost everything from data pre-processing to cross-validation. [Read More]

Regression Modeling With Proportion Data (Part 2)

Attendance in Handball-Bundesliga Rose By 7 % After World Championship

Contents: Data; Analyses: Beta and Quasi-Binomial Regression; Results; Plot; Model Comparison; Effect Size

In the first part of this post, I demonstrated how beta and quasi-binomial regression can be used with dependent variables that are proportions or ratios. I applied these models to attendance rates of the German Handball-Bundesliga. In the second part, I want to investigate whether attendance increased after the World Championship that took place in January 2019 in Denmark and Germany (with a new spectator record). [Read More]

Regression Modeling With Proportion Data (Part 1)

Predicting Attendance in the German Handball-Bundesliga

Contents: Modeling Proportion Data; Application: Handball-Bundesliga; Setup; Selected Variables; Initial Results for Beta Regression; Illustrative Plot of Estimates; Residuals; Model Comparisons; Models Considered; Model Performance; Prediction of Future Matches; Resources

As a data scientist, one often encounters dependent variables that are proportions: for example, the number of successes divided by the number of attempts, a party's vote share, the share of money spent on something, or the attendance rate of public events. [Read More]
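To make the idea of a proportion outcome concrete, here is a minimal sketch in base R. It fits a quasi-binomial GLM, one common choice for such data, to simulated attendance rates. The data, the variable names (`capacity`, `derby`, `rate`), and the effect sizes are all assumptions made up for illustration; they are not taken from the post or from the Handball-Bundesliga data.

```r
# Simulated data: every name and number below is an illustrative assumption.
set.seed(42)
n_matches <- 200
capacity  <- sample(2000:10000, n_matches, replace = TRUE)  # hypothetical arena sizes
derby     <- rbinom(n_matches, 1, 0.2)                      # hypothetical 0/1 predictor

# True attendance probability, defined on the logit scale
p_true   <- plogis(0.3 + 0.7 * derby)
attended <- rbinom(n_matches, capacity, p_true)
rate     <- attended / capacity                             # proportion in [0, 1]

# Quasi-binomial GLM with a logit link; the dispersion is estimated from the data.
# The weights carry the number of "attempts" (seats) behind each proportion.
fit <- glm(rate ~ derby,
           family  = quasibinomial(link = "logit"),
           weights = capacity)

summary(fit)$coefficients

# Predicted attendance rates for derby = 0 and derby = 1
pred <- predict(fit, newdata = data.frame(derby = c(0, 1)), type = "response")
pred
```

Because the model works on the logit scale, the predicted rates always fall between 0 and 1, which an ordinary linear regression on the raw proportions would not guarantee.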

Average Age in Mannheim's Districts

Plotting Geodata With ggplot2

I have always wanted to plot geodata, and today I finally got around to trying it. The plotting itself is actually quite simple …

fortunes::fortune("done it.")
#>
#> It was simple, but you know, it's always simple when you've done it.
#>    -- Simone Gabbriellini (after solving a problem with a trick suggested
#>       on the list)
#>       R-help (August 2005)

The real challenge lies in preparing the data, in this case the polygons of Mannheim's districts. [Read More]

Lowest Number of Red Cards in 40 Years

Web Scraping FIFA World Cup Data

While watching a FIFA World Cup game, I suddenly had the impression that games have become fairer over the years. Everybody remembers Zinedine Zidane's headbutt, but I haven't seen anything similar in 2018. I had always wanted to try out web scraping, and this was the opportunity to do so. In this post, I will give a very brief intro to web scraping, focusing mostly on scraping the World Cup data. [Read More]