Data Frames in R Programming Language: Definition and Examples

Rumman Ansari   Software Engineer   2024-07-05 06:56:11

A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

A data frame has the following characteristics.

  • The column names should be non-empty.
  • The row names should be unique.
  • Each column can hold its own type of data, such as numeric, character, factor, logical or Date.
  • Every column must contain the same number of data items.

The following example creates a data frame and prints it.

# Create the data frame.
emp.dataFra <- data.frame(
  emp_No = 1:6,
  emp_Name = c("Rumman", "Rambo", "Atnyla", "Ansari", "Kulut", "Hello"),
  salary = c(923.15, 232.0, 511.0, 629.0, 743.25, 898.23),
  start_date = as.Date(c("2018-01-01", "2017-09-23", "2017-11-15", "2015-05-11",
                         "2013-03-27", "2018-03-27")),
  stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.dataFra)
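
The characteristics of a data frame can be checked directly in base R. The sketch below builds a smaller data frame (the name `emp_small` is illustrative and not part of the original example), inspects the type of each column and the dimensions, and shows that `data.frame()` rejects columns of unequal length.

```r
# A smaller, illustrative data frame (emp_small is a hypothetical name).
emp_small <- data.frame(
  emp_No = 1:3,
  emp_Name = c("Rumman", "Rambo", "Atnyla"),
  salary = c(923.15, 232.0, 511.0),
  start_date = as.Date(c("2018-01-01", "2017-09-23", "2017-11-15")),
  stringsAsFactors = FALSE
)

# Each column keeps its own type.
col_types <- sapply(emp_small, class)
print(col_types)

# Number of rows and number of columns.
print(dim(emp_small))

# Columns of lengths 3 and 2 cannot be combined, so data.frame() fails.
# tryCatch() turns the error into a message so the script keeps running.
length_check <- tryCatch(
  data.frame(a = 1:3, b = c("x", "y")),
  error = function(e) "error: columns differ in length"
)
print(length_check)
```

`str(emp_small)` gives a similar overview in a single call.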

Expand Data Frame

A data frame can be expanded by adding columns and rows.

Add Column

To add a column, assign a vector to a new column name. The vector must supply one value for each row.

# Add the "Post" column.
emp.dataFra$Post <- c("System Engineer", "Operations", "IT", "HR", "Finance", "IT")
v <- emp.dataFra
print(v)

Output

> print(v)
  emp_No emp_Name salary start_date            Post
1      1   Rumman 923.15 2018-01-01 System Engineer
2      2    Rambo 232.00 2017-09-23      Operations
3      3   Atnyla 511.00 2017-11-15              IT
4      4   Ansari 629.00 2015-05-11              HR
5      5    Kulut 743.25 2013-03-27         Finance
6      6    Hello 898.23 2018-03-27              IT
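
As a sketch of the same idea, a column can also be computed from existing columns, added with `[[ ]]`, or appended with `cbind()`. The data frame `emp` and the columns `bonus` and `dept` are illustrative names, not part of the original example.

```r
# Illustrative data frame (emp, bonus and dept are hypothetical names).
emp <- data.frame(
  emp_No = 1:3,
  emp_Name = c("Rumman", "Rambo", "Atnyla"),
  salary = c(923.15, 232.0, 511.0),
  stringsAsFactors = FALSE
)

# Derived column computed from an existing one.
emp$bonus <- emp$salary * 0.10

# Same effect as `$`, but the column name is given as a string.
emp[["Post"]] <- c("System Engineer", "Operations", "IT")

# cbind() returns a new data frame; assign it to keep the result.
emp <- cbind(emp, dept = c("A", "B", "A"))

print(names(emp))
print(ncol(emp))
```

In each case the new vector has three values, one per row of `emp`.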

Add Row

To add rows to an existing data frame, supply the new rows as a data frame with the same columns as the existing one and combine the two with the rbind() function.

In the example below we create a second data frame holding the new rows and bind it to the existing data frame to create the final data frame.

# Create the data frame.
emp.dataFra <- data.frame(
  emp_No = 1:6,
  emp_Name = c("Rumman", "Rambo", "Atnyla", "Ansari", "Kulut", "Hello"),
  salary = c(923.15, 232.0, 511.0, 629.0, 743.25, 898.23),
  start_date = as.Date(c("2018-01-01", "2017-09-23", "2017-11-15", "2015-05-11",
                         "2013-03-27", "2018-03-27")),
  stringsAsFactors = FALSE
)

# Create the second data frame.
emp.newdata <- data.frame(
  emp_No = 7:8,
  emp_Name = c("Jaman", "Inza"),
  salary = c(722.5, 632.8),
  start_date = as.Date(c("2013-07-30", "2014-06-17")),
  stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.finaldata <- rbind(emp.dataFra, emp.newdata)
print(emp.finaldata)

Output

> print(emp.finaldata)
  emp_No emp_Name salary start_date
1      1   Rumman 923.15 2018-01-01
2      2    Rambo 232.00 2017-09-23
3      3   Atnyla 511.00 2017-11-15
4      4   Ansari 629.00 2015-05-11
5      5    Kulut 743.25 2013-03-27
6      6    Hello 898.23 2018-03-27
7      7    Jaman 722.50 2013-07-30
8      8     Inza 632.80 2014-06-17
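
Because `rbind()` matches data frame columns by name, the new rows need the same column names as the existing data frame. The sketch below uses illustrative names (`staff`, `new_staff`, `bad_rows`) to show a successful bind and the error raised when the names differ. `rbind()` returns a new data frame, so the result has to be assigned to keep it.

```r
# Illustrative data frames (all names here are hypothetical).
staff <- data.frame(
  emp_No = 1:2,
  emp_Name = c("Rumman", "Rambo"),
  stringsAsFactors = FALSE
)

new_staff <- data.frame(
  emp_No = 3L,
  emp_Name = "Atnyla",
  stringsAsFactors = FALSE
)

# Same column names: the row is appended and the result is assigned back.
staff <- rbind(staff, new_staff)
print(nrow(staff))

# Different column names: rbind() stops with an error.
# tryCatch() turns the error into a message so the script keeps running.
bad_rows <- data.frame(id = 4L, name = "Ansari", stringsAsFactors = FALSE)
bind_result <- tryCatch(
  rbind(staff, bad_rows),
  error = function(e) "error: column names do not match"
)
print(bind_result)
```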
