Reading and Writing Data in R: A Comprehensive Tutorial
Table of Content:
Before we start with analyzing any data, we must load it into our R workspace. This can be done directly either by loading an external R object (typical file extensions are .rda
or .RData
, but it is not limited to these extensions) or an internal R object for a package or a TXT, CSV, or Excel file. This recipe explains the methods that can be used to read data from a table or the .csv
format and/or write similar files into an R session.
Getting ready
We will use an iris dataset for this recipe, which is available with R Base packages. The dataset bears quantified features of the morphologic variation of the three related species of Iris flowers.
How to do it…
Perform the following steps to read and write functions in R:
- Load internal R data (already available with a package or base R) using the following
data
function:> data(iris)
- To learn more about iris data, check the
help
function in R using the following function:> ?iris
- Load external R data (conventionally saved as
.rda
or.RData
, but not limited to this) with the followingload
function:> load(file="mydata.RData")
- To save a data object, say,
D
, you can use thesave
function as follows:> save(D, file="myData.RData")
- To read the tabular data in the form of a
.csv
file withread.csv
orread.table
, type the following command:> mydata <- read.table("file.dat", header = TRUE, sep="\t", row.names = 1) > mydata <- read.csv("mydata.csv")
- It is also possible to read an Excel file in R. You can achieve this with various packages such as
xlsx
andgdata
. Thexlsx
package requires Java settings, whilegdata
is relatively simple. However, thexlsx
package offers more functionalities, such as read permissions for different sheets in a workbook and the newer versions of Excel files. For this example, we will use thexlsx
package. Use theread.xlsx
function to read an Excel file as follows:> install.packages("xlsx", dependencies=TRUE) > library(gdata) > mydata <- read.xls("mydata.xls")
- To write these data frames or table objects into a CSV or table file, use the
read.csv
orwrite.table
function as follows:> write.table(x, file = "myexcel.xls", append = FALSE, quote = TRUE, sep = " ") > write.csv(x, col.names = NA, sep = ",")
How it works…
The read.csv
or write.csv
commands take the filename in the current working directory—if a complete path has not been specified—and based on the separators (usually the sep
argument), import the data frames (or export them in case of write
commands). To find out the current working directory, use the getwd()
command. In order to change it to your desired directory, use the setwd
function as follows:
> setwd("path/to desired/directory")
The second argument header indicates whether or not the first row is a set of labels by taking the Boolean values TRUE
or FALSE
. The read.csv
function may not work in the case of incomplete tables with the default argument fill
. To overcome such issues, use the value, TRUE
for the fill
argument. To learn more about optional arguments, take a look at the help section of the read.table
function. Both the functions (read.table
and read.csv
) can use the headers (usually the first row) as column names and specify certain column numbers as row names.
There's more…
To get further information about the loaded dataset, use the class
function for the dataset to get the type of dataset (object class). The data or object type in R can be of numerous types. This is beyond the scope of the book. It is expected that the reader is acquainted with these terms. Here, in the case of the iris data, the type is a data frame with 150 rows and five columns (type the dim
command with iris
as the argument). A data frame class is like a matrix but can accommodate objects of different types, such as character, numeric, and factor, within it. You can take a look at the first or last few rows using the head
or tail
functions (there are six rows by default) respectively, as follows:
> class(iris) > dim(iris) > head(iris) > tail(iris)
The following WriteXLS
package allows us to write an object into an Excel file for the x
data object:
> install.packages(WriteXLS) > library(WriteXLS) > WriteXLS(x, ExcelFileName = "R.xls")
The package also allows us to write a list of data frames into the different sheets of an Excel file. The WriteXLS
function uses Perl in the background to carry out tasks. The sheet
argument can be set within the function and assigned the sheet number where you want to write the data.
The save
function in R is a standard way to save an object. However, the saveRDS
function offers an advantage as it doesn't save both the object and its name; it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that will be different from the name it had when it was originally serialized. Let's take a look at the following example:
> saveRDS(myObj, "myObj.rds") > myObj2 <- readRDS("myObj.rds") > ls() [1] "myObj" "myObj2"
Another package named data.table
can be used to perform data reading at a faster speed, which is especially suited for larger data. To know more about the package, visit the CRAN page for the package at http://cran.r-project.org/web/packages/data.table/index.html.
The foreign
package (http://cran.r-project.org/web/packages/foreign/index.html) is available to read/write data for other programs such as SPSS and SAS.