In this R webscraping project, I use RSelenium to extract the latest list of Billionaires net worth according to Bloomberg. Note that the Bloomberg Billionaires index does update every week.
With the loaded libraries I use RSelenium, rvest, tidyverse, tidyr along with stringi for working with text and ggplot2 for the data analysis portion. One main resource I used for learning some aspects of RSelenium is here.
## Bloomberg Billionaires Index Webscraping With RSelenium
# Made on Feb 28, 2022, updated on Apr 14, 2022
# Load libraries:
library(RSelenium)
library(rvest)
library(tidyverse)
library(stringi)
library(tidyr)
library(ggplot2)
This next code chunk tells R to open a new window of Firefox web browser. Then it goes to the Bloomberg Billionaires Index website and the reads in the page.
# Restart R (CTRL + SHIFT + F10) before running code below again if code does not work.
# Open Firefox browser:
rD <- rsDriver(browser = "firefox", port = 4545L, verbose = F)
remDr <- rD$client
# Bloomberg Billionaires Page, it updates each day.
url <- "https://www.bloomberg.com/billionaires/"
remDr$navigate(url)
# No need to click to accept cookies actually.
# Scroll down after more ranks appear (About 2 down arrows for each rank)
webElem <- remDr$findElement("css", "body")
Sys.sleep(3)
## Extract Billionaires Data:
# Format: Rank, Billionaire, Last Change in $, YTD $ Change, Country/Region, Industry
# Get html:
html <- remDr$getPageSource()[[1]]
page <- read_html(html)
The webpage HTML has been extracted into R. We can obtain parts of the table. I use xpaths with html_nodes().
Rank
### Get Rank
rank <- page %>%
html_nodes(xpath = '/html/body/div[6]/section[2]/div/div/div[1]') %>%
html_text2()
rank[1:105]
## [1] "Rank" "1" "2" "3" "4" "5" "6" "7" "8" "9"
## [11] "10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
## [21] "20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
## [31] "30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
## [41] "40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
## [51] "50" "" "51" "52" "53" "54" "55" "56" "57" "58"
## [61] "59" "60" "61" "62" "63" "64" "65" "66" "67" "68"
## [71] "69" "70" "71" "72" "73" "74" "75" "76" "77" "78"
## [81] "79" "80" "81" "82" "83" "84" "85" "86" "87" "88"
## [91] "89" "90" "91" "92" "93" "94" "95" "96" "97" "98"
## [101] "99" "100" "" "101" "102"
The rank output is close to ideal. The Rank text in index one needs to be removed and there are empty strings to be removed. In the next code chunk I remove the first index element and remove empty strings. The texts are converted into numeric numbers with as.numeric(). Empty strings are removed with billionaires names, total net worth and the other columns.
# Remove Rank (index 1) and blanks (stringi pkg):
rank <- rank[seq(2, length(rank))]
rank <- as.numeric(stri_remove_empty(rank))
Billionaire Names
### Billionaires Name
billionaires <- page %>%
html_nodes(xpath = '/html/body/div[6]/section[2]/div/div/div[2]') %>%
html_text2()
billionaires <- billionaires[seq(2, length(billionaires))] %>% stri_remove_empty()
Total Net Worth
### Total Net Worth:
total_net_worth <- page %>%
html_nodes(xpath = '/html/body/div[6]/section[2]/div/div/div[3]') %>%
html_text2() %>%
stri_remove_empty()
total_net_worth <- total_net_worth[seq(2, length(total_net_worth))] %>% stri_remove_empty()
Last Change In $
### Last Change in $
last_change <- page %>%
html_nodes(xpath = '/html/body/div[6]/section[2]/div/div/div[4]') %>%
html_text2() %>%
stri_remove_empty()
last_change <- last_change[seq(2, length(last_change))] %>% stri_remove_empty()
Year to Date Change In $
### YTD Change $
ytd_change <- page %>%
html_nodes(xpath = '/html/body/div[6]/section[2]/div/div/div[5]') %>%
html_text2() %>%
stri_remove_empty()
ytd_change <- ytd_change[seq(2, length(ytd_change))] %>% stri_remove_empty()
Country/Region
### Country/Region
region <- page %>%
html_nodes(xpath = '/html/body/div[6]/section[2]/div/div/div[6]') %>%
html_text2()
region <- region[seq(2, length(region))] %>% stri_remove_empty()
Industry
### Industry
industry <- page %>%
html_nodes(xpath = '/html/body/div[6]/section[2]/div/div/div[7]') %>%
html_text2()
industry <- industry[seq(2, length(industry))] %>% stri_remove_empty()
# Create dataframe:
bloomberg_billionaires_df <- data.frame(
Rank = rank,
Name = billionaires,
Total_Net_Worth = total_net_worth,
Last_Change = last_change,
YTD_Change = ytd_change,
Region = region,
Industry = industry
)
# Preview dataframe:
head(bloomberg_billionaires_df, 10)
## Rank Name Total_Net_Worth Last_Change YTD_Change Region
## 1 1 Elon Musk $259B +$8.38B -$11.0B United States
## 2 2 Jeff Bezos $180B +$4.85B -$12.2B United States
## 3 3 Bernard Arnault $142B +$1.53B -$36.0B France
## 4 4 Bill Gates $130B +$1.37B -$8.13B United States
## 5 5 Warren Buffett $125B -$1.23B +$16.4B United States
## 6 6 Gautam Adani $118B +$805M +$41.9B India
## 7 7 Larry Page $116B +$1.70B -$12.1B United States
## 8 8 Sergey Brin $111B +$1.60B -$12.0B United States
## 9 9 Steve Ballmer $101B +$1.85B -$4.77B United States
## 10 10 Larry Ellison $100B +$1.08B -$6.71B United States
## Industry
## 1 Technology
## 2 Technology
## 3 Consumer
## 4 Technology
## 5 Diversified
## 6 Industrial
## 7 Technology
## 8 Technology
## 9 Technology
## 10 Technology
tail(bloomberg_billionaires_df, 10)
## Rank Name Total_Net_Worth Last_Change YTD_Change
## 491 491 Margot Perot & family $5.33B +$50.0M -$50.0M
## 492 492 Horst Pudwill $5.32B +$40.1M -$2.06B
## 493 493 Charles Johnson $5.31B +$65.9M -$729M
## 494 494 Mary Malone $5.29B +$30.3M +$100M
## 495 495 Mat Ishbia $5.28B +$113M -$1.99B
## 496 496 Zhang Lei $5.27B +$36.4M -$1.18B
## 497 497 Mark Scheinberg $5.25B +$50.0M -$75.0M
## 498 498 Zhao Yan $5.24B -$126M -$2.41B
## 499 499 Stephane Bancel $5.23B +$308M -$2.56B
## 500 500 Autry Stephens $5.22B $0 +$724M
## Region Industry
## 491 United States Diversified
## 492 Germany Consumer
## 493 United States Finance
## 494 United States Food & Beverage
## 495 United States Finance
## 496 China Finance
## 497 Isle of Man Entertainment
## 498 China Health Care
## 499 France Health Care
## 500 United States Energy
Sys.Date()
## [1] "2022-04-14"
# String for file name:
paste0("Bloomberg_Billionaires_", Sys.Date(), '.csv')
## [1] "Bloomberg_Billionaires_2022-04-14.csv"
You can use the following line to save the dataframe into a .csv file.
# Optional - Save Dataframe:
write.csv(bloomberg_billionaires_df, paste0("Bloomberg_Billionaires_", Sys.Date(), '.csv'), row.names = FALSE)
In this data analysis, I uncover insights from Bloomberg billionaires.
### Data Analysis Portion:
## Billionaires Count by Region:
by_region <- bloomberg_billionaires_df %>%
group_by(Region) %>%
summarise(Count = n()) %>%
arrange(desc(Count)) %>%
data.frame()
Apologies for the small plot output, I was trying to find ways to fix the size in RMarkdown. Solutions are hard to find.
## Create bar graph in ggplot2, Billionaires Count by Region:
# Sorting by counts for the Region
by_region$Region <- factor(by_region$Region,
levels = by_region$Region[order(by_region$Count)])
# Billionaires Count by Region (Top 10 Regions):
ggplot(head(by_region, 10), aes(x = Region, y = Count)) +
geom_bar(stat = "identity", fill = "#238c6d") +
coord_flip() +
geom_text(aes(label = Count), hjust = 1.2, colour = "white", fontface = "bold") +
labs(x = "\n Region \n", y = "\n Count \n",
title = paste0("\n Bloomberg_Billionaires_", Sys.Date(), "\n")) +
theme(plot.title = element_text(hjust = 0.5, size = 15),
axis.title.x = element_text(face="bold", colour="#063970", size = 12),
axis.title.y = element_text(face="bold", colour="#063970", size = 12))

Canadian Billionaires
## Canadian Billionaires (CZ from Binance highest for Canada):
bloomberg_billionaires_df %>% filter(Region == 'Canada')
## Rank Name Total_Net_Worth Last_Change YTD_Change Region
## 1 36 Changpeng Zhao $33.6B +$676M -$62.2B Canada
## 2 137 Sherry Brydson $13.9B +$327M -$648M Canada
## 3 259 Taylor Thomson $8.57B +$184M -$420M Canada
## 4 260 Peter Thomson $8.57B +$184M -$420M Canada
## 5 261 David Thomson $8.57B +$181M -$419M Canada
## 6 265 James Pattison $8.47B +$114M -$200M Canada
## 7 339 Anthony von Mandl $7.18B +$34.0M -$770M Canada
## 8 371 Linda Campbell $6.73B +$129M -$334M Canada
## 9 372 Gaye Farncombe $6.73B +$129M -$334M Canada
## 10 381 J K Irving $6.58B +$16.5M -$197M Canada
## 11 382 Chip Wilson $6.57B +$211M -$46.1M Canada
## 12 391 Joseph Tsai $6.41B +$11.5M -$672M Canada
## 13 392 Arthur Irving $6.40B +$154M +$1.45B Canada
## 14 425 David Cheriton $5.99B +$117M -$259M Canada
## 15 434 Alain Bouchard $5.93B +$63.5M +$435M Canada
## 16 476 Tobi Lutke $5.47B +$113M -$6.27B Canada
## Industry
## 1 Finance
## 2 Media & Telecom
## 3 Media & Telecom
## 4 Media & Telecom
## 5 Media & Telecom
## 6 Media & Telecom
## 7 Consumer
## 8 Media & Telecom
## 9 Media & Telecom
## 10 Commodities
## 11 Retail
## 12 Technology
## 13 Energy
## 14 Technology
## 15 Retail
## 16 Technology
American Billionaires
## American Billionaires (Top 10):
bloomberg_billionaires_df %>%
filter(Region == 'United States') %>%
head(10)
## Rank Name Total_Net_Worth Last_Change YTD_Change Region
## 1 1 Elon Musk $259B +$8.38B -$11.0B United States
## 2 2 Jeff Bezos $180B +$4.85B -$12.2B United States
## 3 4 Bill Gates $130B +$1.37B -$8.13B United States
## 4 5 Warren Buffett $125B -$1.23B +$16.4B United States
## 5 7 Larry Page $116B +$1.70B -$12.1B United States
## 6 8 Sergey Brin $111B +$1.60B -$12.0B United States
## 7 9 Steve Ballmer $101B +$1.85B -$4.77B United States
## 8 10 Larry Ellison $100B +$1.08B -$6.71B United States
## 9 13 Mark Zuckerberg $79.0B +$298M -$46.5B United States
## 10 16 Jim Walton $70.0B +$1.59B +$5.51B United States
## Industry
## 1 Technology
## 2 Technology
## 3 Technology
## 4 Diversified
## 5 Technology
## 6 Technology
## 7 Technology
## 8 Technology
## 9 Technology
## 10 Retail
Billionaires By Category
## Billionaires Count by Category
by_category <- bloomberg_billionaires_df %>%
group_by(Industry) %>%
summarise(Count = n()) %>%
arrange(desc(Count)) %>%
data.frame()
head(by_category, 10)
## Industry Count
## 1 Technology 73
## 2 Industrial 59
## 3 Finance 57
## 4 Diversified 47
## 5 Consumer 38
## 6 Retail 33
## 7 Energy 32
## 8 Food & Beverage 32
## 9 Real Estate 31
## 10 Health Care 29
## Bar Graph Of Billionaires By Category:
# Sorting by counts for the Category
by_category$Industry <- factor(by_category$Industry,
levels = by_category$Industry[order(by_category$Count)])
# Top 10 Billionaires Count by Industry, labels added, blue bars
ggplot(head(by_category, 10), aes(x = Industry, y = Count)) +
geom_bar(stat = "identity", fill = "#555387") +
coord_flip() +
geom_text(aes(label = Count), hjust = 1.2, colour = "white", fontface = "bold") +
labs(x = "\n Industry \n", y = "\n Count \n",
title = paste0("\n Bloomberg_Billionaires_", Sys.Date(), "\n")) +
theme(plot.title = element_text(hjust = 0.5, size = 15),
axis.title.x = element_text(face="bold", colour="#063970", size = 12),
axis.title.y = element_text(face="bold", colour="#063970", size = 12))

Technology Billionaires
## Technology Billionaires (Top 10):
bloomberg_billionaires_df %>% filter(Industry == 'Technology') %>% head(10)
## Rank Name Total_Net_Worth Last_Change YTD_Change Region
## 1 1 Elon Musk $259B +$8.38B -$11.0B United States
## 2 2 Jeff Bezos $180B +$4.85B -$12.2B United States
## 3 4 Bill Gates $130B +$1.37B -$8.13B United States
## 4 7 Larry Page $116B +$1.70B -$12.1B United States
## 5 8 Sergey Brin $111B +$1.60B -$12.0B United States
## 6 9 Steve Ballmer $101B +$1.85B -$4.77B United States
## 7 10 Larry Ellison $100B +$1.08B -$6.71B United States
## 8 13 Mark Zuckerberg $79.0B +$298M -$46.5B United States
## 9 23 Michael Dell $51.0B +$579M -$3.96B United States
## 10 25 MacKenzie Scott $48.5B +$1.65B -$7.78B United States
## Industry
## 1 Technology
## 2 Technology
## 3 Technology
## 4 Technology
## 5 Technology
## 6 Technology
## 7 Technology
## 8 Technology
## 9 Technology
## 10 Technology
Finance Billionaires
## Finance Billionaires (Top 10):
bloomberg_billionaires_df %>% filter(Industry == 'Finance') %>% head(10)
## Rank Name Total_Net_Worth Last_Change YTD_Change Region
## 1 34 Stephen Schwarzman $34.9B +$788M -$2.94B United States
## 2 36 Changpeng Zhao $33.6B +$676M -$62.2B Canada
## 3 42 Ken Griffin $30.5B +$84.8M +$9.19B United States
## 4 49 James Simons $26.0B +$175M +$525M United States
## 5 65 Abigail Johnson $22.1B +$42.3M -$3.90B United States
## 6 72 Thomas Peterffy $20.7B +$214M -$4.11B United States
## 7 75 Sam Bankman-Fried $20.5B +$805M +$4.28B United States
## 8 112 Vicky Safra $16.3B $0 +$150M Greece
## 9 114 David Tepper $16.2B +$50.0M +$1.32B United States
## 10 116 Ray Dalio $16.1B $0 +$510M United States
## Industry
## 1 Finance
## 2 Finance
## 3 Finance
## 4 Finance
## 5 Finance
## 6 Finance
## 7 Finance
## 8 Finance
## 9 Finance
## 10 Finance
Retail Billionaires
## Retail Billonaires (Top 10):
bloomberg_billionaires_df %>% filter(Industry == 'Retail') %>% head(10)
## Rank Name Total_Net_Worth Last_Change YTD_Change Region
## 1 16 Jim Walton $70.0B +$1.59B +$5.51B United States
## 2 17 Rob Walton $69.2B +$1.56B +$5.12B United States
## 3 18 Alice Walton $67.6B +$1.54B +$5.00B United States
## 4 26 Amancio Ortega $47.4B +$433M -$20.1B Spain
## 5 47 Dieter Schwarz $27.1B +$272M -$1.85B Germany
## 6 55 Tadashi Yanai $24.6B +$589M -$5.01B Japan
## 7 59 Lukas Walton $23.8B +$537M +$1.66B United States
## 8 64 Henry Cheng $22.2B +$49.3M -$770M Hong Kong
## 9 69 Radhakishan Damani $21.4B +$25.3M -$3.22B India
## 10 90 John Menard $18.6B +$237M -$7.09B United States
## Industry
## 1 Retail
## 2 Retail
## 3 Retail
## 4 Retail
## 5 Retail
## 6 Retail
## 7 Retail
## 8 Retail
## 9 Retail
## 10 Retail