In this project, I scrape the most recent Bundesliga football table with R and the rvest
package.
Setup
### Bundesliga Web Scraping In R
# Reference: https://stackoverflow.com/questions/45450981/rvest-scrape-2-classes-in-1-tag
# Load libraries:
library(dplyr)
library(tidyr)
library(rvest)
To extract the table, I read the page and select its tbody element with html_elements('tbody').
bundesliga <- read_html("https://www.bundesliga.com/en/bundesliga/table")
page <- bundesliga %>% html_elements('tbody')
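To see what this selection step returns without hitting the live site, here is a minimal sketch on a small hand-written table. The markup, team names, and the names demo_page, demo_body and n_rows are assumptions made up for the illustration; the real page is larger and may be structured differently.

```r
library(rvest)

# Hypothetical stand-in for the live page: one table with a header and two rows.
demo_page <- minimal_html('
  <table>
    <thead><tr><th>Rank</th><th>Team</th></tr></thead>
    <tbody>
      <tr><td class="rank">1</td><td class="team">Team A</td></tr>
      <tr><td class="rank">2</td><td class="team">Team B</td></tr>
    </tbody>
  </table>')

# html_elements() returns every node that matches the CSS selector.
demo_body <- demo_page %>% html_elements("tbody")

# Later selections run inside the matched node, so header cells are skipped.
n_rows <- length(html_elements(demo_body, "tr"))
```

Here demo_body holds a single tbody node and n_rows counts only the two body rows, which is why the header row never reaches the later steps.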
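In this section, I use html_nodes() from the rvest package. I pass a class-based CSS selector to html_nodes() for each column to pull the matching cells out of the football table. (In rvest 1.0.0 and later, html_nodes() is superseded by html_elements(), which accepts the same selectors.)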
Teams
# Teams are in the class 'team':
teams <- page %>%
html_nodes("[class='team']") %>%
html_text2()
Rank
# Rank & convert into integer:
team_rank <- page %>%
html_nodes("[class='rank']") %>%
html_text2() %>%
readr::parse_integer()
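As a small illustration of the conversion step, the sketch below runs readr::parse_integer() on made-up strings rather than scraped text. The values and the "n/a" placeholder are assumptions for the example; the na argument tells readr which strings to treat as missing.

```r
# Made-up cell text, standing in for what html_text2() might return.
raw_rank <- c("1", " 2 ", "n/a")

# Surrounding whitespace is trimmed by default (trim_ws = TRUE);
# strings listed in `na` become NA instead of being reported as parse problems.
demo_rank <- readr::parse_integer(raw_rank, na = "n/a")
```

The result is an integer vector, so it can be sorted and summed like any other numeric column.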
Matches Played
# Matches played
matches <- page %>%
html_nodes("[class='matches']") %>%
html_text2() %>%
readr::parse_integer()
Points
# Points
points <- page %>%
html_nodes("[class='pts']") %>%
html_text2() %>%
readr::parse_integer()
Wins
# Wins:
wins <- page %>%
html_nodes("[class='d-none d-lg-table-cell wins']") %>%
html_text2() %>%
readr::parse_integer()
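The selector for wins spells out the full class attribute because [class='...'] is an exact match on the attribute value. The sketch below, on a made-up table cell, compares it with a class selector such as td.wins, which matches any cell whose class list contains wins. The markup and the names demo_row, exact, by_class and partial are assumptions for the illustration.

```r
library(rvest)

# Hypothetical row with one single-class cell and one multi-class cell.
demo_row <- minimal_html('
  <table><tbody><tr>
    <td class="team">Team A</td>
    <td class="d-none d-lg-table-cell wins">22</td>
  </tr></tbody></table>')

# Exact attribute match: the whole class string must be identical.
exact <- demo_row %>%
  html_elements("[class='d-none d-lg-table-cell wins']") %>%
  html_text2()

# Class selector: matches if "wins" is one of the classes on the cell.
by_class <- demo_row %>%
  html_elements("td.wins") %>%
  html_text2()

# Exact match on only part of the class string finds nothing.
partial <- demo_row %>% html_elements("[class='wins']")
```

Both exact and by_class pick up the same cell, while partial is empty. The class selector may be less sensitive to layout classes being added or removed, at the cost of matching more broadly.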
Draws
# Draws:
draws <- page %>%
html_nodes("[class='d-none d-lg-table-cell draws']") %>%
html_text2() %>%
readr::parse_integer()
Losses
# Losses:
losses <- page %>%
html_nodes("[class='d-none d-lg-table-cell looses']") %>%
html_text2() %>%
readr::parse_integer()
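Rank, matches played, points, wins, draws and losses all follow the same select, extract, parse pattern. One possible way to cut the repetition is a small wrapper; scrape_int below is a hypothetical helper, not an rvest function, and the demo markup is made up so the sketch runs offline.

```r
library(rvest)

# Hypothetical helper: select cells by CSS selector and parse their text
# as integers.
scrape_int <- function(page, css) {
  page %>%
    html_elements(css) %>%
    html_text2() %>%
    readr::parse_integer()
}

# Made-up table body with two rows.
demo_tbody <- minimal_html('
  <table><tbody>
    <tr><td class="pts">69</td><td class="d-none d-lg-table-cell wins">22</td></tr>
    <tr><td class="pts">60</td><td class="d-none d-lg-table-cell wins">19</td></tr>
  </tbody></table>') %>%
  html_elements("tbody")

demo_points <- scrape_int(demo_tbody, "[class='pts']")
demo_wins   <- scrape_int(demo_tbody, "[class='d-none d-lg-table-cell wins']")
```

With a helper like this, each integer column becomes a single call that differs only in its selector.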
Goals
# Goals:
goals <- page %>%
html_nodes("[class='d-none d-md-table-cell goals']") %>%
html_text2()
Goal Difference
# Goal Difference:
goal_diff <- page %>%
html_nodes("[class='difference']") %>%
html_text2() %>%
readr::parse_integer()
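Before the scraped vectors are combined, it can help to confirm that they all have the same length. If a selector stops matching after a site redesign, its vector comes back empty and data.frame() fails with a message about differing numbers of rows. The sketch below runs such a check on made-up vectors; demo_cols and its values are assumptions for the illustration.

```r
# Made-up vectors standing in for the scraped columns.
demo_cols <- list(
  Team   = c("Team A", "Team B", "Team C"),
  Points = c(69L, 60L, 52L),
  Wins   = c(22L, 19L, 15L)
)

# lengths() returns one length per list element; all must agree
# before the columns can be combined row by row.
col_lengths <- lengths(demo_cols)
same_length <- length(unique(col_lengths)) == 1

if (!same_length) {
  stop("Scraped columns differ in length: ",
       paste(names(col_lengths), col_lengths, sep = "=", collapse = ", "))
}

demo_df <- as.data.frame(demo_cols)
```

The error message names each column with its length, which points directly at the selector that needs attention.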
### Create Bundesliga dataframe:
bundes_df <- data.frame(Rank = team_rank, Team = teams, Points = points,
Played = matches, Wins = wins, Draws = draws,
Losses = losses, Goals = goals,
Goal_Difference = goal_diff)
## Split the Goals column into Goals For and Goals Against (convert = TRUE stores them as integers):
bundes_df <- bundes_df %>% separate(Goals, c("Goals For", "Goals Against"), convert = TRUE)
bundes_df
##    Rank                                Team Points Played Wins Draws Losses
## 1     1        FCB Bayern FC Bayern München     69     29   22     3      4
## 2     2      BVB Dortmund Borussia Dortmund     60     29   19     3      7
## 3     3  B04 Leverkusen Bayer 04 Leverkusen     52     29   15     7      7
## 4     4              RBL Leipzig RB Leipzig     51     29   15     6      8
## 5     5            SCF Freiburg SC Freiburg     48     29   13     9      7
## 6     6       TSG Hoffenheim TSG Hoffenheim     44     29   13     5     11
## 7     7 FCU Union Berlin 1. FC Union Berlin     44     29   12     8      9
## 8     8                 KOE Köln 1. FC Köln     43     29   11    10      8
## 9     9   SGE Frankfurt Eintracht Frankfurt     39     29   10     9     10
## 10   10           M05 Mainz 1. FSV Mainz 05     38     29   11     5     13
## 11   11  BMG M'gladbach Borussia M'gladbach     37     29   10     7     12
## 12   12          BOC Bochum VfL Bochum 1848     36     29   10     6     13
## 13   13         WOB Wolfsburg VfL Wolfsburg     34     29   10     4     15
## 14   14            FCA Augsburg FC Augsburg     32     29    8     8     13
## 15   15         VFB Stuttgart VfB Stuttgart     27     29    6     9     14
## 16   16     DSC Bielefeld Arminia Bielefeld     26     29    5    11     13
## 17   17     BSC Hertha Berlin Hertha Berlin     26     29    7     5     17
## 18   18      SGF Fürth SpVgg Greuther Fürth     16     29    3     7     19
##    Goals For Goals Against Goal_Difference
## 1         86            29              57
## 2         70            42              28
## 3         68            42              26
## 4         64            31              33
## 5         46            34              12
## 6         50            45               5
## 7         38            39              -1
## 8         41            43              -2
## 9         40            40               0
## 10        43            36               7
## 11        41            52             -11
## 12        30            40             -10
## 13        33            45             -12
## 14        34            46             -12
## 15        36            53             -17
## 16        23            43             -20
## 17        31            66             -35
## 18        24            72             -48
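To make the goals split easier to follow, here is a minimal sketch of separate() on a made-up data frame. The "86:29" format with a colon is an assumption about how the scraped text looks; by default separate() splits on any run of non-alphanumeric characters, so other delimiters would be handled the same way. The optional convert = TRUE argument asks tidyr to turn the new columns into integers.

```r
library(tidyr)

# Made-up goals column in an assumed "for:against" format.
demo_goals <- data.frame(
  Team  = c("Team A", "Team B"),
  Goals = c("86:29", "70:42")
)

# Split one text column into two; convert = TRUE parses the pieces as integers.
demo_split <- separate(
  demo_goals, Goals,
  into = c("Goals For", "Goals Against"),
  sep = ":",
  convert = TRUE
)
```

Without convert = TRUE the two new columns stay as character vectors, which matters if they are later summed or compared numerically.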
If you would like to save the data frame to a .csv file, you can use this code chunk.
## Save the Bundesliga table to a dated .csv file, e.g. Bundesliga_<date>.csv.
write.csv(bundes_df, paste0("Bundesliga_", Sys.Date(), ".csv"))
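As a quick check that a saved file can be read back, the sketch below writes a made-up data frame to a temporary directory and reloads it. The demo data, the use of tempdir(), and row.names = FALSE (which leaves out the row-number column) are choices made for this illustration.

```r
# Made-up table standing in for the scraped data frame.
demo_table <- data.frame(Rank = 1:2, Team = c("Team A", "Team B"))

# Build a dated file name with an explicit .csv extension.
out_file <- file.path(tempdir(), paste0("Bundesliga_", Sys.Date(), ".csv"))

# row.names = FALSE keeps R's row numbers out of the file.
write.csv(demo_table, out_file, row.names = FALSE)

# Read the file back to confirm the columns survived the round trip.
round_trip <- read.csv(out_file)
```

Because the date is part of the file name, running the script on different days keeps one snapshot per day rather than overwriting the previous one.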