## Tutorial on tidymodels for Machine Learning

Contents: Set Up; Data Set: Diamonds; Separating Testing and Training Data: rsample; Data Pre-Processing and Feature Engineering: recipes; Defining and Fitting Models: parsnip; Summarizing Fitted Models: broom; Evaluating Model Performance: yardstick; Tuning Model Parameters: tune and dials; Preparing a parsnip Model for Tuning; Preparing Data for Tuning: recipes; Combine Everything: workflows; Selecting the Best Model to Make the Final Predictions; Summary; Further Resources; Session Info; Updates

caret is a well-known R package for machine learning that covers almost everything from data pre-processing to cross-validation. [Read More]

## Regression Modeling With Proportion Data (Part 2)

### Attendance in Handball-Bundesliga Rose By 7 % After World Championship

Contents: Data; Analyses: Beta and Quasi-Binomial Regression; Results; Plot; Model Comparison; Effect Size

In the first part of this post, I demonstrated how beta and quasi-binomial regression can be used when the dependent variable is a proportion or ratio, and I applied these models to attendance rates in the German Handball-Bundesliga. In this second part, I investigate whether attendance increased after the World Championship, which took place in Denmark and Germany in January 2019 and set a new spectator record. [Read More]
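As a rough illustration of the quasi-binomial approach mentioned above, here is a minimal sketch in base R. It does not use the post's data: the data frame `matches`, its columns (`capacity`, `after_event`, `rate`), and all numbers are simulated assumptions chosen only to show the shape of such a model.

```r
# Simulated data: every name and number below is an illustrative assumption,
# not the Handball-Bundesliga data analysed in the post.
set.seed(42)
n <- 200
capacity    <- sample(2000:10000, n, replace = TRUE)  # hypothetical arena capacities
after_event <- rep(c(0, 1), each = n / 2)             # 0 = before, 1 = after the event
true_rate   <- plogis(0.5 + 0.3 * after_event)        # assumed true attendance rates
spectators  <- rbinom(n, size = capacity, prob = true_rate)

matches <- data.frame(
  capacity    = capacity,
  after_event = after_event,
  rate        = spectators / capacity                 # proportion in [0, 1]
)

# Quasi-binomial GLM with a logit link: the response is a proportion,
# and the weights give the number of "trials" (seats) behind each proportion.
fit <- glm(
  rate ~ after_event,
  family  = quasibinomial(link = "logit"),
  weights = capacity,
  data    = matches
)

# Coefficients are on the log-odds scale.
print(summary(fit)$coefficients)

# Predicted attendance rates before and after the event, on the response scale.
pred <- predict(fit, newdata = data.frame(after_event = c(0, 1)), type = "response")
print(pred)
```

The quasi-binomial family estimates a dispersion parameter instead of fixing it at 1, which is why it is often preferred over a plain binomial model when proportions vary more than binomial sampling alone would suggest.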

## Regression Modeling With Proportion Data (Part 1)

### Predicting Attendance in the German Handball-Bundesliga

Contents: Modeling Proportion Data; Application: Handball-Bundesliga; Setup; Selected Variables; Initial Results for Beta Regression; Illustrative Plot of Estimates; Residuals; Model Comparisons; Models Considered; Model Performance; Prediction of Future Matches; Resources

As a data scientist, one often encounters dependent variables that are proportions: for example, the number of successes divided by the number of attempts, a party's vote share, the share of a budget spent on a particular item, or the attendance rate at public events. [Read More]

## Average Age in Mannheim's Districts

### Plotting Geodata with ggplot2

I have always wanted to plot geodata, and today I finally got around to trying it. The plotting itself is actually quite simple …

```r
fortunes::fortune("done it.")
#>
#> It was simple, but you know, it's always simple when you've done it.
#>    -- Simone Gabbriellini (after solving a problem with a trick suggested
#>       on the list)
#>       R-help (August 2005)
```

The real challenge lies in preparing the data, in this case the polygons of Mannheim's districts. [Read More]

## Lowest Number of Red Cards in 40 Years

### Web Scraping FIFA World Cup Data

While watching a FIFA World Cup game, I suddenly had the impression that games have become fairer over the years. Everybody remembers Zinedine Zidane's headbutt, but I did not see anything similar in 2018. I had always wanted to try web scraping, and this was the opportunity to do so. In this post, I give a very brief introduction to web scraping, focusing mostly on scraping the World Cup data. [Read More]