Reading In Various File Types In R

Hi there. In this page, I showcase how to read in various file types into R. This R code and work is experimental in nature as I was testing out stuff.

Reading In STATA Files Into R
Reading In SPSS Files Into R
SAS Files Into R
Excel Files Into R
Loading In .csv Files Into R
Loading In .data Files Into R
References

Note that the first three sections deal with R’s haven package.

Reading In STATA Files Into R

Not all places keep up to date with software and use R. There may be certain places that use STATA.

I have once used STATA for a survival analysis course where the instructor was familiar with STATA and not R. A classmate told me that STATA is used a lot in econometrics (statistics and economics).

There are two ways to load in a STATA file into R. You can use read_stata() or read_dta(). The code below deals with url links.

# Reading Files In R

library(haven) # For SPSS, STATA and SAS files

library(readxl) # For excel files.

### 1) STATA file
# http://www.principlesofeconometrics.com/stata.htm

# read_stata() function:

quizzes_data <- read_stata("http://www.principlesofeconometrics.com/stata/quizzes.dta")

head(quizzes_data) # Preview data

## # A tibble: 6 x 2
##   midterm missed
##     <dbl>  <dbl>
## 1    92.5      0
## 2    86.8      0
## 3    81.1      0
## 4    75.5      0
## 5    86.8      0
## 6    92.5      0

# read_dta() function:

housing_df <- data.frame(read_dta("http://www.principlesofeconometrics.com/stata/housing.dta"))

head(housing_df) # Preview data

##   house    ir
## 1  1711 13.08
## 2  1632 12.92
## 3  1800 13.17
## 4  1821 13.20
## 5  1680 12.91
## 6  1676 12.22

Reading In SPSS Files Into R

The main function for reading in an SPSS file is the read_spss() function from the haven package in R.

### 2) SPSS file
# http://staff.bath.ac.uk/pssiw/stats2/page16/page16.html
# relates people's body temperatures and heart rates

spss_url <- "http://staff.bath.ac.uk/pssiw/stats2/temprate.sav"

# Use read_sav or read_spss()

temp_rate_df <- data.frame(read_spss(spss_url))

head(temp_rate_df)

##       temp hrtrate
## 1 35.72222      70
## 2 35.94444      71
## 3 36.05556      74
## 4 36.11111      80
## 5 36.16667      73
## 6 36.16667      75

SAS Files Into R

Loading in a SAS file into R is pretty straightforward as well. The key function here is the read_sas() function.

### 3) SAS files
# http://libguides.library.kent.edu/SAS/SubsetData

sas_url <- "http://libguides.library.kent.edu/ld.php?content_id=11205331"

sas_dataset <- data.frame(read_sas(sas_url))

head(sas_dataset) # Preview data

##     ids       bday enrolldate expgradate Rank                   Major
## 1 43783 1995-03-22                         NA                        
## 2 20278 1995-01-01                         NA              Philosophy
## 3 20389 1994-12-31                         NA                        
## 4 22820 1994-12-01                         NA business administration
## 5 24559 1994-11-10                         NA                        
## 6 28980 1994-09-17                         NA            astrophysics
##   Gender Athlete Height Weight Smoking Sprint      MileMinDur English
## 1      0       0  72.35     NA       0  7.978              NA   88.24
## 2      0       0  70.66 179.20       0  8.004 00:06:21.249567   89.45
## 3      0       0  70.68 198.52       0     NA 00:07:00.966993   96.73
## 4      1       0     NA 198.34      NA  8.473 00:12:44.740049   74.06
## 5      1       1  67.43 128.17       2     NA 00:06:25.506137   82.61
## 6      0       1  68.45 171.61       0  4.650 00:07:24.585445   70.10
##   Reading  Math Writing        State LiveOnCampus HowCommute CommuteTime
## 1   81.50 60.02   81.44     In state            1         NA          NA
## 2   85.25 70.19   73.27                         1         NA          NA
## 3   86.88 71.20   84.24     In state           NA         NA          NA
## 4   88.68 55.89   73.16     In state            1         NA          NA
## 5   77.30 65.52   80.45 Out of state            1         NA          NA
## 6      NA 61.40   77.48     In state            1         NA          NA
##   SleepTime StudyTime
## 1         7         1
## 2         5         2
## 3         8         7
## 4         2         6
## 5         7         3
## 6         8         3

Excel Files Into R

From experimentation, I have found loading in Excel files into R somewhat difficult. I have found out that (I think) there is no way to load in an .xlsx file from a URL alone. The solution I propose is to save the file into a folder. Then you would set that folder as the working directory in RStudio with the Set As Working Directory from the More drop down.

Once you have set your working directory folder, you can load in the Excel file into R. Be mindful of spelling and punctuation.

excel_data = data.frame(read_excel("SuperStoreUS_2015.xlsx"))

head(excel_data, n = 10)

Loading In .csv Files Into R

Loading in .csv files is not very difficult. Do make sure to include header = TRUE if the first rows are column titles and the sep = argument.

### 5) .csv files

# Ref: http://introcs.cs.princeton.edu/java/data/

countries_data <- read.csv("http://introcs.cs.princeton.edu/java/data/countries.csv", 
                        header = TRUE, sep = ",")
head(countries_data)

##   UN.code.number X2.letter.ISO.abbreviation X3.letter.ISO.abbreviation
## 1              4                         AF                        AFG
## 2              8                         AL                        ALB
## 3             12                         DZ                        DZA
## 4             20                         AD                        AND
## 5             24                         AO                        AGO
## 6             28                         AG                        ATG
##                  name          capital
## 1         Afghanistan            Kabul
## 2             Albania           Tirana
## 3             Algeria          Algiers
## 4             Andorra Andorra la Vella
## 5              Angola           Luanda
## 6 Antigua and Barbuda       St. John's

Loading In .data Files Into R

From time to time I do play around with some datasets from the UCI Machine Learning Repository website. Some of them have this .data file extension. For this case, the read.table() function is used.

### 6) .data files

# Source: UCI Machine Learning Repository
# Read in the .data file:
# https://stackoverflow.com/questions/21101927/read-files-with-extension-data-into-r

url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/balloons/adult-stretch.data"

balloons_data <- read.table(url, sep = "\t", dec = ",") # Reading in a .data file.

head(balloons_data)

##                             V1
## 1 YELLOW,SMALL,STRETCH,ADULT,T
## 2 YELLOW,SMALL,STRETCH,CHILD,T
## 3     YELLOW,SMALL,DIP,ADULT,T
## 4     YELLOW,SMALL,DIP,CHILD,F
## 5     YELLOW,SMALL,DIP,CHILD,F
## 6 YELLOW,LARGE,STRETCH,ADULT,T

It appears that this balloons_data comes in as a single column with values that have four commas. This single column could be separated with the separate function from the tidyr package in R.

library(tidyr)

col_names <- c("Color", "size", "act", "age", "inflated")

sep_data <- separate(balloons_data, col = "V1", into = col_names, sep = ",")

head(sep_data)

##    Color  size     act   age inflated
## 1 YELLOW SMALL STRETCH ADULT        T
## 2 YELLOW SMALL STRETCH CHILD        T
## 3 YELLOW SMALL     DIP ADULT        T
## 4 YELLOW SMALL     DIP CHILD        F
## 5 YELLOW SMALL     DIP CHILD        F
## 6 YELLOW LARGE STRETCH ADULT        T