A data frame is a data structure similar to a matrix, with the special feature that different columns can have different data types. It is very useful for combining vectors of the same length but different data types into a single data structure. As in a matrix, all the columns of a data frame must have the same number of rows.
A data frame is made up of individual vectors of the same length placed as columns. We can easily create a data frame from vectors using the data.frame() function:
> data1 <- c("Iron", "Sulphur", "Calcium", "Magnecium", "Copper")
> data2 <- c(12.5, 32.6, 16.7, 20.6, 7.5)
> data3 <- c(1122, 1123, 1124, 1125, 1126)
> frm1 <- data.frame(data1, data2, data3)
> frm1
      data1 data2 data3
1      Iron  12.5  1122
2   Sulphur  32.6  1123
3   Calcium  16.7  1124
4 Magnecium  20.6  1125
5    Copper   7.5  1126
In the above example, note that the column names of the data frame 'frm1' are just the names of the objects we passed in. A sequence of indices 1, 2, 3, 4 and 5 has been added as row names by default.
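To check that each column really keeps its own data type, we can ask for the class of every column. The following is a minimal sketch; the frame is rebuilt so that the snippet runs on its own, and 'col_types' is a name introduced here only for illustration. In R versions before 4.0.0, data.frame() converted character vectors to factors by default, so the first column would be reported as "factor" there.

```r
data1 <- c("Iron", "Sulphur", "Calcium", "Magnecium", "Copper")
data2 <- c(12.5, 32.6, 16.7, 20.6, 7.5)
data3 <- c(1122, 1123, 1124, 1125, 1126)
frm1 <- data.frame(data1, data2, data3)

# class of every column, returned as a named character vector
# (col_types is an illustrative name, not part of the original example)
col_types <- sapply(frm1, class)
print(col_types)

# number of rows and number of columns
print(dim(frm1))
```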
To get the column names of a data frame, call names() or colnames(); rownames() returns the row names:
> names(frm1)
> rname = rownames(frm1)
> rname
> cname = colnames(frm1)
> cname
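As a quick check of these three functions, the sketch below rebuilds the frame with its default names and compares the results. The variable 'same_names' is introduced here for illustration only. Note that the default row names come back as character strings, not numbers.

```r
frm1 <- data.frame(
  data1 = c("Iron", "Sulphur", "Calcium", "Magnecium", "Copper"),
  data2 = c(12.5, 32.6, 16.7, 20.6, 7.5),
  data3 = c(1122, 1123, 1124, 1125, 1126)
)

cname <- colnames(frm1)   # "data1" "data2" "data3"
rname <- rownames(frm1)   # "1" "2" "3" "4" "5" (character strings)

# for a data frame, names() and colnames() give the same vector
same_names <- identical(names(frm1), cname)
print(same_names)
```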
The columns of a data frame can be named explicitly using a vector of strings. For the above frame "frm1", we can set the column names with our own vector of strings.
> names(frm1) <- c("Element", "Proportion", "Product_ID")
> frm1
    Element Proportion Product_ID
1      Iron       12.5       1122
2   Sulphur       32.6       1123
3   Calcium       16.7       1124
4 Magnecium       20.6       1125
5    Copper        7.5       1126
In the above example, we can use colnames(frm1) in place of names(frm1) with the same result.
Similarly, the row names can be initialized by a vector of strings:
> rownames(frm1) = c("elmt-1","elmt-2","elmt-3","elmt-4","elmt-5")
> frm1
         Element Proportion Product_ID
elmt-1      Iron       12.5       1122
elmt-2   Sulphur       32.6       1123
elmt-3   Calcium       16.7       1124
elmt-4 Magnecium       20.6       1125
elmt-5    Copper        7.5       1126
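The column and row names can also be supplied at the moment the frame is created, instead of being set afterwards. The sketch below assumes the same data as above and uses the 'row.names' argument of data.frame(); the frame name 'frm2' is hypothetical and chosen only to avoid overwriting 'frm1'.

```r
# frm2 is an illustrative name; the data are the same as in frm1
frm2 <- data.frame(
  Element    = c("Iron", "Sulphur", "Calcium", "Magnecium", "Copper"),
  Proportion = c(12.5, 32.6, 16.7, 20.6, 7.5),
  Product_ID = c(1122, 1123, 1124, 1125, 1126),
  row.names  = paste0("elmt-", 1:5)   # "elmt-1" ... "elmt-5"
)

print(names(frm2))
print(rownames(frm2))
```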
The elements of a data frame are accessed using the same subscript convention as matrices.
Thus,
> frm1[1,3]
> frm1[1,]
> frm1[,2]
> frm1[1:3,]
         Element Proportion Product_ID
elmt-1      Iron       12.5       1122
elmt-2   Sulphur       32.6       1123
elmt-3   Calcium       16.7       1124
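The four subscript forms return different kinds of objects, which is worth checking once. The sketch below rebuilds the frame so that it runs on its own; all variable names on the left-hand side are introduced here for illustration. It also shows the standard 'drop = FALSE' argument, which keeps a single column as a data frame instead of a vector.

```r
frm1 <- data.frame(
  Element    = c("Iron", "Sulphur", "Calcium", "Magnecium", "Copper"),
  Proportion = c(12.5, 32.6, 16.7, 20.6, 7.5),
  Product_ID = c(1122, 1123, 1124, 1125, 1126)
)

one_cell  <- frm1[1, 3]     # a single value: 1122
one_row   <- frm1[1, ]      # a data frame with one row
one_col   <- frm1[, 2]      # a plain numeric vector
first3    <- frm1[1:3, ]    # a data frame with the first three rows

# drop = FALSE keeps the result as a one-column data frame
col_frame <- frm1[, 2, drop = FALSE]

print(class(one_row))
print(class(one_col))
print(class(col_frame))
```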
We can also access a column of a data frame by its name, by typing the frame name and the column name separated by a '$' sign. The accessed column is treated as a vector. For example, columns of the data frame 'frm1' can be accessed by their names as shown here:
> frm1$Element
> frm1$Proportion
> frm1$Product_ID
> 1000*frm1$Proportion
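Because a column accessed with '$' is an ordinary vector, any vector operation can be applied to it. The following sketch assumes the same frame as above; the names 'scaled', 'largest' and 'same_col' are illustrative. It also shows the double-bracket form frm1[["Proportion"]], which returns the same vector as frm1$Proportion.

```r
frm1 <- data.frame(
  Element    = c("Iron", "Sulphur", "Calcium", "Magnecium", "Copper"),
  Proportion = c(12.5, 32.6, 16.7, 20.6, 7.5),
  Product_ID = c(1122, 1123, 1124, 1125, 1126)
)

# arithmetic is applied to every element of the column
scaled <- 1000 * frm1$Proportion

# element with the largest proportion
largest <- as.character(frm1$Element[which.max(frm1$Proportion)])
print(largest)   # "Sulphur"

# $ and [[ ]] return the same vector
same_col <- identical(frm1$Proportion, frm1[["Proportion"]])
print(same_col)
```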
We can add a new column to an existing data frame by creating a vector and assigning it as a new named column of the frame. This vector should have as many elements as the existing frame has rows. We will add a new column called "symbol" to the existing frame "frm1":
> frm1$symbol = c("Fe","S","Ca","Mg","Cu")
> frm1
         Element Proportion Product_ID symbol
elmt-1      Iron       12.5       1122     Fe
elmt-2   Sulphur       32.6       1123      S
elmt-3   Calcium       16.7       1124     Ca
elmt-4 Magnecium       20.6       1125     Mg
elmt-5    Copper        7.5       1126     Cu
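To see what happens when the lengths do not match, the sketch below tries to add a two-element vector to a five-row frame and catches the resulting error with tryCatch(). The column name 'bad' and the variable 'outcome' are hypothetical, used only for this demonstration.

```r
frm1 <- data.frame(
  Element    = c("Iron", "Sulphur", "Calcium", "Magnecium", "Copper"),
  Proportion = c(12.5, 32.6, 16.7, 20.6, 7.5),
  Product_ID = c(1122, 1123, 1124, 1125, 1126)
)

# a vector with one element per row is accepted
frm1$symbol <- c("Fe", "S", "Ca", "Mg", "Cu")

# a vector of length 2 does not fit 5 rows and raises an error
outcome <- tryCatch({
  frm1$bad <- c("Fe", "S")
  "assigned"
}, error = function(e) "rejected")

print(outcome)       # "rejected"
print(names(frm1))   # the failed column was not added
```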
A column can be removed from a data frame by accessing it by name and assigning the value NULL to it. In the following example, we will access the column named "Product_ID" from frame "frm1" and remove it:
> frm1
         Element Proportion Product_ID symbol
elmt-1      Iron       12.5       1122     Fe
elmt-2   Sulphur       32.6       1123      S
elmt-3   Calcium       16.7       1124     Ca
elmt-4 Magnecium       20.6       1125     Mg
elmt-5    Copper        7.5       1126     Cu
> frm1$Product_ID <- NULL
> frm1
         Element Proportion symbol
elmt-1      Iron       12.5     Fe
elmt-2   Sulphur       32.6      S
elmt-3   Calcium       16.7     Ca
elmt-4 Magnecium       20.6     Mg
elmt-5    Copper        7.5     Cu
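Assigning NULL changes the frame in place. If the original frame should stay untouched, one alternative (not the method used above) is to build a new frame from the columns we want to keep. The sketch below shows both routes side by side; 'keep' and 'frm_small' are names introduced for illustration.

```r
frm1 <- data.frame(
  Element    = c("Iron", "Sulphur", "Calcium", "Magnecium", "Copper"),
  Proportion = c(12.5, 32.6, 16.7, 20.6, 7.5),
  Product_ID = c(1122, 1123, 1124, 1125, 1126),
  symbol     = c("Fe", "S", "Ca", "Mg", "Cu")
)

# alternative: select every column except Product_ID into a new frame
keep <- setdiff(names(frm1), "Product_ID")
frm_small <- frm1[, keep]
print(names(frm_small))   # frm1 still has all four columns here

# the approach described above: remove the column in place
frm1$Product_ID <- NULL
print(names(frm1))
```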
We learnt to access a column of a data frame by giving the frame name and the column name separated by a '$' sign. When more than one data frame in memory has the same column name(s), this format distinguishes between them. Suppose we do not have such a naming conflict. In that case it is more convenient to access the column by its name alone, dropping the frame name. We use the attach() function for this.
The attach() function places the data frame on R's search path. Before the call, the bare name 'symbol' is not found; after it, the name refers to the column of 'frm1':
> frm1
         Element Proportion symbol
elmt-1      Iron       12.5     Fe
elmt-2   Sulphur       32.6      S
elmt-3   Calcium       16.7     Ca
elmt-4 Magnecium       20.6     Mg
elmt-5    Copper        7.5     Cu
> symbol
> attach(frm1)
> symbol
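An attached frame stays on the search path until it is removed with detach(), and the attached copy does not follow later changes made to 'frm1' itself. The sketch below pairs attach() with detach(), and also shows with(), a different base R function that evaluates one expression inside the frame without attaching anything. The names 'sym_attached' and 'scaled' are illustrative, and 'stringsAsFactors = FALSE' is set explicitly so that the result is the same in older R versions.

```r
frm1 <- data.frame(
  Element    = c("Iron", "Sulphur", "Calcium", "Magnecium", "Copper"),
  Proportion = c(12.5, 32.6, 16.7, 20.6, 7.5),
  symbol     = c("Fe", "S", "Ca", "Mg", "Cu"),
  stringsAsFactors = FALSE
)

# attach: columns become visible by their bare names
attach(frm1)
sym_attached <- symbol
detach(frm1)          # remove the frame from the search path again

# with(): column names are visible only inside this one expression
scaled <- with(frm1, 1000 * Proportion)

print(sym_attached)
print(scaled)
```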