Data Frame Creation in R Programming Language: Methods and Examples

Rumman Ansari   Software Engineer   2024-07-05 06:59:36

Data frames are used to store tabular data in R. They are an important type of object in R and are used in a variety of statistical modeling applications. Hadley Wickham’s package dplyr has an optimized set of functions designed to work efficiently with data frames.

Data frames are represented as a special type of list where every element of the list has to have the same length. Each element of the list can be thought of as a column and the length of each element of the list is the number of rows.
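The list structure described above can be checked directly. This is a minimal sketch with a small made-up data frame; the names df, id, and score are illustrative and do not come from the original text.

```r
# Hypothetical example data; df, id, and score are illustrative names.
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

is.list(df)          # TRUE: a data frame is stored as a list
length(df)           # number of list elements, i.e. the number of columns
sapply(df, length)   # every column has the same length
nrow(df)             # that shared length is the number of rows
```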

Unlike matrices, data frames can store different classes of objects in each column. Matrices must have every element be the same class (e.g. all integers or all numeric).
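A short sketch of the difference, assuming made-up values: combining numbers and strings in a matrix coerces everything to one class, while a data frame keeps a separate class per column. The argument stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Illustrative values only.
# Mixing integers and strings in a matrix coerces every element to character.
m <- cbind(1:3, c("a", "b", "c"))
typeof(m)            # "character"

# A data frame keeps each column's own class.
df <- data.frame(n = 1:3, s = c("a", "b", "c"), stringsAsFactors = FALSE)
sapply(df, class)    # n is "integer", s is "character"
```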

In addition to column names, which indicate the names of the variables or predictors, data frames have a special attribute called row.names, which holds information about each row of the data frame.

Data frames are usually created by reading in a dataset with read.table() or read.csv(). However, they can also be created explicitly with the data.frame() function or coerced from other types of objects, such as lists.
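Both routes can be sketched briefly. read.csv() normally takes a file path; here the text argument is used only so the example is self-contained, and the city data and object names are made up for illustration.

```r
# Made-up CSV content passed via text= instead of a file path.
csv_text <- "city,population\nBoston,650000\nSeattle,750000"
cities <- read.csv(text = csv_text)
class(cities)        # "data.frame"

# Coercing a named list whose elements all have the same length.
lst <- list(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
x <- as.data.frame(lst)
dim(x)               # 4 rows, 2 columns
```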

A data frame can be converted to a matrix by calling data.matrix(). While it might seem that the as.matrix() function should be used to coerce a data frame to a matrix, what you almost always want is the result of data.matrix(), which converts every column to numeric.

> x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
> nrow(x)
[1] 4
> ncol(x)
[1] 2
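The difference between data.matrix() and as.matrix() shows up once a data frame has a non-numeric column. The following is a minimal sketch with an illustrative data frame (df, id, and grp are assumed names). Note that the numbers data.matrix() produces for a character column are factor codes, not meaningful measurements.

```r
# Illustrative data frame with one integer and one character column.
df <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = FALSE)

am <- as.matrix(df)    # the character column forces a character matrix
dm <- data.matrix(df)  # character column -> factor -> integer codes

typeof(am)             # "character"
is.numeric(dm)         # TRUE
dm[, "grp"]            # codes 1 2 1 for "a" "b" "a"
```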
 

Names

R objects can have names, which is very useful for writing readable code and self-describing objects. Here is an example of assigning names to an integer vector.

> x <- 1:3
> names(x)
NULL
> names(x) <- c("New York", "Seattle", "Los Angeles")
> x
   New York     Seattle Los Angeles
          1           2           3
> names(x)
[1] "New York"    "Seattle"     "Los Angeles"
 

Lists can also have names, which is often very useful.

> x <- list("Los Angeles" = 1, Boston = 2, London = 3)
> x
$`Los Angeles`
[1] 1

$Boston
[1] 2

$London
[1] 3

> names(x)
[1] "Los Angeles" "Boston"      "London"
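One reason names are useful is that elements can then be looked up by name rather than by position. A small sketch, reusing the values from the examples above under the assumed object names v and lst:

```r
# Named integer vector: index by name with single brackets.
v <- 1:3
names(v) <- c("New York", "Seattle", "Los Angeles")
v["Seattle"]            # the element named "Seattle", value 2

# Named list: use $ for syntactic names, [[ ]] for any name.
lst <- list("Los Angeles" = 1, Boston = 2, London = 3)
lst$Boston              # 2
lst[["Los Angeles"]]    # 1; the name contains a space, so [[ ]] is convenient
```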
 

Matrices can have both column and row names.

> m <- matrix(1:4, nrow = 2, ncol = 2)
> dimnames(m) <- list(c("a", "b"), c("c", "d"))
> m
  c d
a 1 3
b 2 4
 

Column names and row names can be set separately using the colnames() and rownames() functions.

> colnames(m) <- c("h", "f")
> rownames(m) <- c("x", "z")
> m
  h f
x 1 3
z 2 4
 

Note that data frames have their own function for setting row names, the row.names() function. Also, data frames do not have column names as such; they just have names (like lists). So to set the column names of a data frame, just use the names() function (colnames() and rownames() also work on data frames). Yes, I know it's confusing. Here’s a quick summary:

Object       Set column names   Set row names
data frame   names()            row.names()
matrix       colnames()         rownames()
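The summary above can be sketched in code. The objects df and m and their labels are illustrative, not taken from the original text.

```r
# Data frame: names() for columns, row.names() for rows.
df <- data.frame(a = 1:2, b = c("u", "v"))
names(df) <- c("count", "label")
row.names(df) <- c("first", "second")

# Matrix: colnames() for columns, rownames() for rows.
m <- matrix(1:4, nrow = 2)
colnames(m) <- c("h", "f")
rownames(m) <- c("x", "z")

names(df)
row.names(df)
dimnames(m)   # row names first, then column names
```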

Example of Data Frame

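> # Two columns of equal length: one numeric, one logical.
> myvalues1 <- c(348, -343, 937, 394, 124)
> myvalues2 <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
> # Row labels; called trial_names rather than names to avoid confusion with the names() function.
> trial_names <- c("trial 1", "trial 2", "trial 3", "trial 4", "trial 5")
> dataframe1 <- data.frame(myvalues1, myvalues2, row.names = trial_names)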

Get the Structure of the Data Frame

The structure of the data frame can be seen by using the str() function.

> str(dataframe1)

Output

When we execute the above code, it produces the following result:


'data.frame':	5 obs. of  2 variables:
 $ myvalues1: num  348 -343 937 394 124
 $ myvalues2: logi  TRUE FALSE TRUE TRUE FALSE

Summary of Data in Data Frame

A statistical summary of each column can be obtained by applying the summary() function.

 
> # Print the summary.
> print(summary(dataframe1))

Output

When we execute the above code, it produces the following result:


   myvalues1    myvalues2      
 Min.   :-343   Mode :logical  
 1st Qu.: 124   FALSE:2        
 Median : 348   TRUE :3        
 Mean   : 292                  
 3rd Qu.: 394                  
 Max.   : 937  

Extract Data from Data Frame

Extract a specific column from a data frame using its column name.

 
> # Extract a specific column.
> result <- data.frame(dataframe1$myvalues1)
> print(result)

Output

When we execute the above code, it produces the following result:


  dataframe1.myvalues1
1                  348
2                 -343
3                  937
4                  394
5                  124
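As the output shows, wrapping dataframe1$myvalues1 in data.frame() renames the column to dataframe1.myvalues1 and replaces the row names with 1 to 5. A hedged sketch of some alternatives follows; the data frame is recreated here so the block runs on its own, and the result names vec, vec2, and sub are illustrative.

```r
# Recreate the example data frame so this block is self-contained.
dataframe1 <- data.frame(
  myvalues1 = c(348, -343, 937, 394, 124),
  myvalues2 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  row.names = c("trial 1", "trial 2", "trial 3", "trial 4", "trial 5")
)

vec  <- dataframe1$myvalues1        # plain numeric vector
vec2 <- dataframe1[["myvalues1"]]   # the same vector, selected by name
sub  <- dataframe1["myvalues1"]     # one-column data frame, row names kept

class(vec)        # "numeric"
class(sub)        # "data.frame"
row.names(sub)    # "trial 1" ... "trial 5"
```

Single brackets with a column name keep the result as a data frame, which is usually what is wanted when the row names and the original column name matter.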

Extract the first two rows with all columns

 
> # Extract first two rows.
> result <- dataframe1[1:2,]
> print(result)

Output

When we execute the above code, it produces the following result:


        myvalues1 myvalues2
trial 1       348      TRUE
trial 2      -343     FALSE

Extract the 3rd and 5th rows with the 1st and 2nd columns

 
> # Extract the 3rd and 5th rows with the 1st and 2nd columns.
> result <- dataframe1[c(3,5),c(1,2)]
> print(result)

Output

When we execute the above code, it produces the following result:


        myvalues1 myvalues2
trial 3       937      TRUE
trial 5       124     FALSE
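Because this data frame has row names, rows can also be selected by name, and a logical condition on a column works as a row index too. This is a related sketch rather than part of the original example; the threshold of 300 and the result names by_name and large are arbitrary choices for illustration.

```r
# Recreate the example data frame so this block is self-contained.
dataframe1 <- data.frame(
  myvalues1 = c(348, -343, 937, 394, 124),
  myvalues2 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  row.names = c("trial 1", "trial 2", "trial 3", "trial 4", "trial 5")
)

# Select rows by row name, keeping all columns.
by_name <- dataframe1[c("trial 3", "trial 5"), ]

# Select rows where myvalues1 exceeds an arbitrary threshold.
large <- dataframe1[dataframe1$myvalues1 > 300, ]

row.names(by_name)   # "trial 3" "trial 5"
row.names(large)     # "trial 1" "trial 3" "trial 4"
```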
