In this project, I scrape the most recent Bundesliga football table with R and the rvest package. The output shown below is the table after 29 matches played.

Setup

### Bundesliga Web Scraping In R
# Reference: https://stackoverflow.com/questions/45450981/rvest-scrape-2-classes-in-1-tag

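# Load libraries (readr must also be installed, because
# readr::parse_integer() is called below without loading it):

library(dplyr)
library(tidyr)
library(rvest)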

To extract the table, I read the page with read_html() and then select the table body with html_elements('tbody').

bundesliga <- read_html("https://www.bundesliga.com/en/bundesliga/table")

page <- bundesliga %>% html_elements('tbody')
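
If the page layout changes, or the table is not part of the HTML that read_html() receives, html_elements() simply returns an empty node set, and every later step quietly produces empty vectors. A minimal sketch of an early check, run on a hypothetical stand-in for the page built with rvest's minimal_html() (the markup is invented for illustration, not copied from bundesliga.com):

```r
library(rvest)

# Hypothetical stand-in for the league table (not the real page's markup)
toy_doc <- minimal_html('
  <table>
    <tbody>
      <tr><td class="rank">1</td><td class="team">Team A</td></tr>
      <tr><td class="rank">2</td><td class="team">Team B</td></tr>
    </tbody>
  </table>')

toy_body <- html_elements(toy_doc, "tbody")

# Fail loudly instead of carrying an empty node set through the script
if (length(toy_body) == 0) {
  stop("No <tbody> found; the page layout may have changed.")
}

length(toy_body)  # 1
```

On the live page, the same check would go right after `page <- bundesliga %>% html_elements('tbody')`.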

## Extract Teams, Rank, Matches, Points, Wins, Draws, Losses, Goals, Goal Difference

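In this section, I use html_nodes() from the rvest package, passing each cell's class name as a CSS attribute selector to pull out one column of the football table at a time. (Since rvest 1.0.0, html_nodes() is superseded by html_elements(), which works the same way.)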
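
Selectors such as "[class='team']" are CSS attribute selectors: they compare the whole class attribute as one string, which is why cells with several classes need the full value, for example 'd-none d-lg-table-cell wins'. A class selector such as ".wins" instead matches any element that carries that class token. A small comparison on invented markup that reuses two of the class strings from this post:

```r
library(rvest)

# Invented markup reusing class strings from this post
toy_row <- minimal_html('
  <table><tbody><tr>
    <td class="team">Team A</td>
    <td class="d-none d-lg-table-cell wins">22</td>
  </tr></tbody></table>')

# Attribute selector: the entire class attribute must match exactly
exact_short <- html_elements(toy_row, "[class='wins']")
exact_full  <- html_elements(toy_row, "[class='d-none d-lg-table-cell wins']")

# Class selector: matches any element whose class list includes "wins"
by_token <- html_elements(toy_row, ".wins")

c(length(exact_short), length(exact_full), length(by_token))  # 0 1 1
```

The class selector tolerates extra or reordered classes, but it can also pick up unrelated elements that happen to share the token, so which approach holds up better depends on the page.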

Teams

# Teams are in the class 'team':

teams <- page %>% 
         html_nodes("[class='team']") %>%
         html_text2()


Rank

# Rank, converted to integer:

team_rank <- page %>% 
             html_nodes("[class='rank']") %>%
             html_text2() %>% 
             readr::parse_integer()
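
Converting is worthwhile because html_text2() always returns character vectors, which sort as text and cannot be used in arithmetic. A base-R illustration on made-up values; it uses as.integer() only to avoid extra dependencies, whereas readr::parse_integer(), used in the post, is stricter about malformed input:

```r
# Scraped values arrive as text (made-up example values)
rank_text <- c("2", "10", "1")

# Character vectors sort lexicographically, so "10" lands before "2"
sort(rank_text)  # "1" "10" "2"

# After conversion, ordering and arithmetic behave numerically
rank_int <- as.integer(rank_text)
sort(rank_int)   # 1 2 10
sum(rank_int)    # 13
```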


Matches Played

# Matches played:

matches <- page %>%
           html_nodes("[class='matches']") %>%
           html_text2() %>%
           readr::parse_integer()

Points

# Points:

points <- page %>%
          html_nodes("[class='pts']") %>%
          html_text2() %>%
          readr::parse_integer()

Wins

# Wins:

wins <- page %>% 
           html_nodes("[class='d-none d-lg-table-cell wins']") %>%
           html_text2() %>% 
           readr::parse_integer()


Draws

# Draws:

draws <- page %>% 
           html_nodes("[class='d-none d-lg-table-cell draws']") %>%
           html_text2() %>% 
           readr::parse_integer()


Losses

# Losses ('looses' is the class name as spelled in the page's HTML):

losses <- page %>%
          html_nodes("[class='d-none d-lg-table-cell looses']") %>%
          html_text2() %>%
          readr::parse_integer()

Goals

# Goals (kept as text for now; split into Goals For and Goals Against later):

goals <- page %>% 
         html_nodes("[class='d-none d-md-table-cell goals']") %>%
         html_text2() 
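
The goals cell holds both numbers in a single string, which is why it stays as text here and is split into two columns once the data frame exists. A toy illustration of that split with tidyr::separate(), assuming the text has the form "86:29" (the raw cell text is not shown in this post). By default separate() returns character columns; with convert = TRUE it runs type.convert() on the new pieces:

```r
library(tidyr)

# Toy goals text, assuming a "for:against" form (not taken from the page)
toy_goals <- data.frame(Team = c("Team A", "Team B"),
                        Goals = c("86:29", "70:42"),
                        stringsAsFactors = FALSE)

# Without convert = TRUE, the new columns stay character
as_text <- separate(toy_goals, Goals, c("Goals For", "Goals Against"))
class(as_text[["Goals For"]])  # "character"

# With convert = TRUE, the pieces become integers
as_num <- separate(toy_goals, Goals, c("Goals For", "Goals Against"),
                   convert = TRUE)
class(as_num[["Goals For"]])   # "integer"
```

In tidyr 1.3.0 and later, separate() is superseded by separate_wider_delim(), which takes an explicit delimiter; separate() still works.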


Goal Difference

# Goal Difference:

goal_diff <- page %>% 
             html_nodes("[class='difference']") %>%
             html_text2() %>% 
             readr::parse_integer()
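
Each column above repeats the same three steps: select cells by their class attribute, extract the text, and convert it to integer. One way to remove the repetition is a small helper. In this sketch, scrape_column() is a hypothetical name, the markup is invented, and base as.integer() stands in for readr::parse_integer() to keep it dependency-free:

```r
library(rvest)

# Hypothetical helper (not an rvest function): select cells whose class
# attribute equals class_attr exactly, extract their text, and optionally
# convert it to integer
scrape_column <- function(page, class_attr, as_int = TRUE) {
  selector <- sprintf("[class='%s']", class_attr)
  values <- html_text2(html_elements(page, selector))
  if (as_int) as.integer(values) else values
}

# Invented markup standing in for the table body
toy_doc <- minimal_html('
  <table><tbody>
    <tr><td class="rank">1</td><td class="team">Team A</td>
        <td class="d-none d-lg-table-cell looses">4</td></tr>
    <tr><td class="rank">2</td><td class="team">Team B</td>
        <td class="d-none d-lg-table-cell looses">7</td></tr>
  </tbody></table>')
toy_page <- html_elements(toy_doc, "tbody")

scrape_column(toy_page, "rank")                          # 1 2
scrape_column(toy_page, "team", as_int = FALSE)          # "Team A" "Team B"
scrape_column(toy_page, "d-none d-lg-table-cell looses") # 4 7
```

Against the real page this would read, for example, scrape_column(page, "pts") or scrape_column(page, "d-none d-md-table-cell goals", as_int = FALSE).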


Creating the Data Frame

### Create the Bundesliga data frame:

bundes_df <- data.frame(Rank = team_rank, Team = teams, Points = points,
                        Played = matches, Wins = wins, Draws = draws,
                        Losses = losses, Goals = goals, 
                        Goal_Difference = goal_diff)

## Split the Goals column into Goals For and Goals Against
## (convert = TRUE makes the new columns integers instead of text):

bundes_df <- bundes_df %>%
             separate(Goals, c("Goals For", "Goals Against"), convert = TRUE)

bundes_df
##    Rank                                Team Points Played Wins Draws Losses
## 1     1        FCB Bayern FC Bayern München     69     29   22     3      4
## 2     2      BVB Dortmund Borussia Dortmund     60     29   19     3      7
## 3     3  B04 Leverkusen Bayer 04 Leverkusen     52     29   15     7      7
## 4     4              RBL Leipzig RB Leipzig     51     29   15     6      8
## 5     5            SCF Freiburg SC Freiburg     48     29   13     9      7
## 6     6       TSG Hoffenheim TSG Hoffenheim     44     29   13     5     11
## 7     7 FCU Union Berlin 1. FC Union Berlin     44     29   12     8      9
## 8     8                 KOE Köln 1. FC Köln     43     29   11    10      8
## 9     9   SGE Frankfurt Eintracht Frankfurt     39     29   10     9     10
## 10   10           M05 Mainz 1. FSV Mainz 05     38     29   11     5     13
## 11   11  BMG M'gladbach Borussia M'gladbach     37     29   10     7     12
## 12   12          BOC Bochum VfL Bochum 1848     36     29   10     6     13
## 13   13         WOB Wolfsburg VfL Wolfsburg     34     29   10     4     15
## 14   14            FCA Augsburg FC Augsburg     32     29    8     8     13
## 15   15         VFB Stuttgart VfB Stuttgart     27     29    6     9     14
## 16   16     DSC Bielefeld Arminia Bielefeld     26     29    5    11     13
## 17   17     BSC Hertha Berlin Hertha Berlin     26     29    7     5     17
## 18   18      SGF Fürth SpVgg Greuther Fürth     16     29    3     7     19
##    Goals For Goals Against Goal_Difference
## 1         86            29              57
## 2         70            42              28
## 3         68            42              26
## 4         64            31              33
## 5         46            34              12
## 6         50            45               5
## 7         38            39              -1
## 8         41            43              -2
## 9         40            40               0
## 10        43            36               7
## 11        41            52             -11
## 12        30            40             -10
## 13        33            45             -12
## 14        34            46             -12
## 15        36            53             -17
## 16        23            43             -20
## 17        31            66             -35
## 18        24            72             -48
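
Because each column is scraped separately, a misaligned or mis-parsed column would not raise an error on its own. The league's scoring rules give a cheap consistency check, barring any point deductions: a win is worth 3 points and a draw 1, so Points should equal 3 * Wins + Draws; Played should equal Wins + Draws + Losses; and Goal_Difference should equal goals for minus goals against. A sketch on two rows copied from the output above (applying it to bundes_df requires the goal columns to be numeric, for example via convert = TRUE in separate()):

```r
# Two rows copied from the printed table above (team labels shortened)
check_df <- data.frame(Team = c("Bayern", "Dortmund"),
                       Points = c(69L, 60L), Played = c(29L, 29L),
                       Wins = c(22L, 19L), Draws = c(3L, 3L),
                       Losses = c(4L, 7L),
                       `Goals For` = c(86L, 70L),
                       `Goals Against` = c(29L, 42L),
                       Goal_Difference = c(57L, 28L),
                       check.names = FALSE)

# Each check should be TRUE if the columns were scraped and aligned correctly
checks <- with(check_df, c(
  points   = all(Points == 3L * Wins + Draws),
  played   = all(Played == Wins + Draws + Losses),
  goal_dif = all(Goal_Difference == `Goals For` - `Goals Against`)
))

checks
stopifnot(all(checks))
```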

If you would like to save the data frame to a .csv file, you can use this code chunk.

## Save Bundesliga Table Into A .csv file.

write.csv(bundes_df, paste0("Bundesliga_", Sys.Date(), ".csv"),
          row.names = FALSE, fileEncoding = "UTF-8")
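
When writing the table out, it helps to give the file a .csv extension and to pass row.names = FALSE, since write.csv() otherwise adds a column of row numbers; fileEncoding = "UTF-8" can keep names such as München intact on systems whose native encoding is not UTF-8. A round-trip sketch on a small stand-in data frame, written to a temporary directory so nothing lands in the working directory:

```r
# Small stand-in for bundes_df (illustrative values)
toy_table <- data.frame(Rank = 1:2, Team = c("Team A", "Team B"),
                        Points = c(69L, 60L), stringsAsFactors = FALSE)

# Same naming pattern as the post, but inside a temporary directory
out_file <- file.path(tempdir(), paste0("Bundesliga_", Sys.Date(), ".csv"))

# row.names = FALSE avoids an extra unnamed index column in the file
write.csv(toy_table, out_file, row.names = FALSE, fileEncoding = "UTF-8")

# Read it back and compare with the original
round_trip <- read.csv(out_file, fileEncoding = "UTF-8",
                       stringsAsFactors = FALSE)
all.equal(round_trip, toy_table)  # TRUE
```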