Reading and Writing Data in R: A Comprehensive Tutorial

Rumman Ansari   Software Engineer   2024-07-09 04:01:35   6835  Share
Subject Syllabus DetailsSubject Details
☰ TContent
☰Fullscreen

Table of Content:

Before we start with analyzing any data, we must load it into our R workspace. This can be done directly either by loading an external R object (typical file extensions are .rda or .RData, but it is not limited to these extensions) or an internal R object for a package or a TXT, CSV, or Excel file. This recipe explains the methods that can be used to read data from a table or the .csv format and/or write similar files into an R session.

Getting ready

We will use an iris dataset for this recipe, which is available with R Base packages. The dataset bears quantified features of the morphologic variation of the three related species of Iris flowers.

How to do it…

Perform the following steps to read and write functions in R:

  1. Load internal R data (already available with a package or base R) using the following datafunction:
    > data(iris) 
    
  2. To learn more about iris data, check the help function in R using the following function:
    > ?iris
    
  3. Load external R data (conventionally saved as .rda or .RData, but not limited to this) with the following load function:
    > load(file="mydata.RData")
    
  4. To save a data object, say, D, you can use the save function as follows:
    > save(D, file="myData.RData") 
    
  5. To read the tabular data in the form of a .csv file with read.csv or read.table, type the following command:
    > mydata <- read.table("file.dat", header = TRUE, sep="\t", row.names = 1)
    > mydata <- read.csv("mydata.csv")
    
  6. It is also possible to read an Excel file in R. You can achieve this with various packages such as xlsx and gdata. The xlsx package requires Java settings, while gdata is relatively simple. However, the xlsx package offers more functionalities, such as read permissions for different sheets in a workbook and the newer versions of Excel files. For this example, we will use the xlsxpackage. Use the read.xlsx function to read an Excel file as follows:
    > install.packages("xlsx", dependencies=TRUE)
    > library(gdata)
    > mydata <- read.xls("mydata.xls")
    
  7. To write these data frames or table objects into a CSV or table file, use the read.csv or write.table function as follows:
    > write.table(x, file = "myexcel.xls", append = FALSE, quote = TRUE, sep = " ") 
    > write.csv(x, col.names = NA, sep = ",")
    

How it works…

The read.csv or write.csv commands take the filename in the current working directory—if a complete path has not been specified—and based on the separators (usually the sep argument), import the data frames (or export them in case of write commands). To find out the current working directory, use the getwd() command. In order to change it to your desired directory, use the setwdfunction as follows:

 
> setwd("path/to desired/directory") 

The second argument header indicates whether or not the first row is a set of labels by taking the Boolean values TRUE or FALSE. The read.csv function may not work in the case of incomplete tables with the default argument fill. To overcome such issues, use the value, TRUE for the fillargument. To learn more about optional arguments, take a look at the help section of the read.tablefunction. Both the functions (read.table and read.csv) can use the headers (usually the first row) as column names and specify certain column numbers as row names.

There's more…

To get further information about the loaded dataset, use the class function for the dataset to get the type of dataset (object class). The data or object type in R can be of numerous types. This is beyond the scope of the book. It is expected that the reader is acquainted with these terms. Here, in the case of the iris data, the type is a data frame with 150 rows and five columns (type the dim command with iris as the argument). A data frame class is like a matrix but can accommodate objects of different types, such as character, numeric, and factor, within it. You can take a look at the first or last few rows using the head or tail functions (there are six rows by default) respectively, as follows:

> class(iris) 
> dim(iris) 
> head(iris) 
> tail(iris) 

The following WriteXLS package allows us to write an object into an Excel file for the x data object:

> install.packages(WriteXLS)
> library(WriteXLS)
> WriteXLS(x, ExcelFileName = "R.xls")

The package also allows us to write a list of data frames into the different sheets of an Excel file. The WriteXLS function uses Perl in the background to carry out tasks. The sheet argument can be set within the function and assigned the sheet number where you want to write the data.

The save function in R is a standard way to save an object. However, the saveRDS function offers an advantage as it doesn't save both the object and its name; it just saves a representation of the object. As a result, the saved object can be loaded into a named object within R that will be different from the name it had when it was originally serialized. Let's take a look at the following example:

> saveRDS(myObj, "myObj.rds")
> myObj2 <- readRDS("myObj.rds")
> ls()
[1] "myObj"  "myObj2"

Another package named data.table can be used to perform data reading at a faster speed, which is especially suited for larger data. To know more about the package, visit the CRAN page for the package at http://cran.r-project.org/web/packages/data.table/index.html.

The foreign package (http://cran.r-project.org/web/packages/foreign/index.html) is available to read/write data for other programs such as SPSS and SAS.