DNA and RNA sequence strings are an important component of the data produced by genomics experiments. Any language used for genomics data analysis should support a standard set of operations on strings, and R has an excellent set of library functions for them. We will learn them one by one.
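String values can be assigned to variables with the = operator:
> str = "abcacabac"
> str1 = "qqqqqq"
> str2 = " ++++++"
> str1
[1] "qqqqqq"
or, equivalently, with the <- operator:
> str <- "abcacabac"
> str1 <- "qqqqqq"
> str2 <- " ++++++"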
The = operator assigns the value on its right to the variable on its left. The arrow operators can assign in either direction: <- assigns the value on its right to the variable on its left, while -> assigns the value on its left to the variable on its right. Thus the following two assignments to the variable str are equivalent:
> str <- "abcacabac"
> "abcacabac" -> str
The number of characters in a string (its string length) is returned by the function nchar(). In the following commands, the number of characters in the string astr is returned by nchar() and stored in a variable called slen:
> astr = "ATGCGCTAGACAG"
> slen = nchar(astr)
> slen
[1] 13
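Because nchar() is vectorised, it can return the lengths of many sequences at once. The sketch below assumes a small, made-up vector of reads; the names reads and min_len and the threshold of 10 are illustrative choices, not part of any standard workflow. It keeps only the reads that reach the threshold:

```r
# Hypothetical reads; any character vector works the same way
reads <- c("ATGCGCTAGACAG", "ATGC", "GGGTTTAAACCC")

# nchar() returns one integer length per element
read_lengths <- nchar(reads)
print(read_lengths)   # 13, 4 and 12

# Keep reads of at least min_len characters (illustrative threshold)
min_len <- 10
long_reads <- reads[read_lengths >= min_len]
print(long_reads)
```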
We can concatenate (join) two or more strings using the paste() function. By default, paste() joins its arguments with a single space between them:
> str1 = "ATGCTGAG"
> str2 = "XXXXX"
> ps = paste(str1,str2)
> ps
[1] "ATGCTGAG XXXXX"
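paste() is also vectorised: when one argument is a vector and another is a single string, the shorter argument is recycled, and numbers are converted to text automatically. A small sketch, with the label "read" chosen arbitrarily, that builds one name per read number:

```r
# "read" is recycled against each element of 1:3; the default separator is a space
read_names <- paste("read", 1:3)
print(read_names)   # "read 1" "read 2" "read 3"
```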
The sep argument of paste() sets a different separator to be placed between the joined strings:
> scat <- paste(str, str1, sep="---")
> scat
[1] "abcacabac---ATGCTGAG"
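Different separators can be combined by nesting calls to paste(). As a sketch, the hypothetical vectors chrom, start and end below are joined into region labels of the form chr:start-end (the label format is only an example). Integer literals such as 100L keep the positions in plain notation; a large double such as 1e5 would otherwise be pasted as "1e+05":

```r
# Hypothetical region coordinates
chrom <- c("chr1", "chr2")
start <- c(100L, 2500L)
end   <- c(250L, 2600L)

# Inner paste() joins start and end with "-", outer paste() prefixes the chromosome with ":"
regions <- paste(chrom, paste(start, end, sep = "-"), sep = ":")
print(regions)   # "chr1:100-250" "chr2:2500-2600"
```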
To concatenate the strings str1 and str2 without any gap between them, use an empty separator:
> scat = paste(str1,str2,sep="")
> scat
[1] "ATGCTGAGXXXXX"
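Base R also provides paste0(), which behaves like paste() with sep = "" and is a convenient shorthand for the empty-separator case. A minimal check, reusing the same two example strings:

```r
str1 <- "ATGCTGAG"
str2 <- "XXXXX"

# paste0() concatenates with no separator, like paste(..., sep = "")
a <- paste0(str1, str2)
b <- paste(str1, str2, sep = "")
print(a)                # "ATGCTGAGXXXXX"
print(identical(a, b))  # TRUE
```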
We can concatenate more than two strings with a single call to paste() by passing all of them as arguments:
> st1 = "AAAAA"
> st2 = "TTTT"
> st3 = "GGGG"
> combstr = paste(st1,st2,st3,sep="_")
> combstr
[1] "AAAAA_TTTT_GGGG"
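When the pieces are already stored in one character vector, the collapse argument of paste() joins the elements of that vector into a single string, instead of pairing them element by element. A sketch using the same three fragments, plus a made-up vector of codons joined into one sequence:

```r
fragments <- c("AAAAA", "TTTT", "GGGG")
# collapse joins the elements of a single vector into one string
joined <- paste(fragments, collapse = "_")
print(joined)   # "AAAAA_TTTT_GGGG"

# Hypothetical codons joined without any separator
codons <- c("ATG", "GCC", "TAA")
orf <- paste(codons, collapse = "")
print(orf)      # "ATGGCCTAA"
```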
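A substring can be extracted by calling substr() with the string, a start position and an end position (positions count from 1, and both ends are included):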
> str = "Mitochondria and Golgi bodies"
> su = substr(str,4,8)
> su
[1] "ochon"
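substr() returns one substring per input string. To cut one sequence into many pieces, the related base R function substring() accepts vectors of start and end positions. The sketch below uses it to split a made-up coding sequence into codons, assuming the sequence length is a multiple of three:

```r
cds <- "ATGGCCTAA"   # hypothetical coding sequence, length divisible by 3
n <- nchar(cds)

# Start positions 1, 4, 7, ... and end positions 3, 6, 9, ...
starts <- seq(1, n - 2, by = 3)
ends   <- seq(3, n, by = 3)

codons <- substring(cds, starts, ends)
print(codons)   # "ATG" "GCC" "TAA"
```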
We can also replace a portion of a string with another substring by assigning to substr():
> substr(scat,4,8) <- "UUUUU"
> scat
[1] "ATGUUUUUXXXXX"
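Assigning to substr() never changes the length of the string: if the replacement is longer than the window from start to stop, only its first characters are used, and if it is shorter, only that many characters are overwritten. A small sketch on a made-up sequence:

```r
x <- "ATGCTGAG"   # hypothetical sequence

y <- x
substr(y, 2, 3) <- "NNNN"   # window is 2 characters, so only "NN" is used
print(y)                    # "ANNCTGAG"

z <- x
substr(z, 2, 5) <- "N"      # replacement is 1 character, so only position 2 changes
print(z)                    # "ANGCTGAG"

# Both results keep the original length
print(c(nchar(y), nchar(z)) == nchar(x))
```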
To take a substring from a given start position to the end of the original string, we can give an arbitrarily large integer as the end position; substr() simply stops at the last character:
> str3 = "WWW.objsite.com"
> sublg <- substr(str3,4,100000000L)
> sublg
[1] ".objsite.com"
Instead of using a large integer to represent the end of the string, we can use the nchar() function to supply the exact string length:
> str3 = "WWW.objsite.com"
> sublg <- substr(str3,4,nchar(str3))
> sublg
[1] ".objsite.com"
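Two related idioms may help here. substring() has a default end position of 1000000L, so for ordinary strings it can be called with a start position only. nchar() can also be used to count a start position back from the end, for example to take the last three bases of a sequence. A sketch, in which the sequence is made up:

```r
str3 <- "WWW.objsite.com"
# substring()'s 'last' argument defaults to 1000000L, i.e. "to the end" for ordinary strings
print(substring(str3, 4))   # ".objsite.com"

# Last k characters: start k - 1 positions before the final character
seq_x <- "ATGGCCTAA"   # hypothetical sequence
k <- 3
tail_k <- substr(seq_x, nchar(seq_x) - k + 1, nchar(seq_x))
print(tail_k)   # "TAA"
```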
A string can be truncated to a given number of characters, counted from its beginning, with strtrim():
> str4 <- "AECH9939-ALM"
> strunk <- strtrim(str4, 4)
> strunk
[1] "AECH"
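strtrim() is vectorised, and strings already shorter than the requested width are returned unchanged. strtrim() measures display width rather than characters, so for plain ASCII text such as sequence identifiers the result matches substr(x, 1, width), while some non-ASCII characters may behave differently. A sketch with made-up identifiers:

```r
ids <- c("AECH9939-ALM", "XY", "BRCA1-ex11")   # hypothetical identifiers
trimmed <- strtrim(ids, 4)
print(trimmed)   # "AECH" "XY" "BRCA"

# For ASCII text this equals taking characters 1 to 4 with substr()
print(identical(trimmed, substr(ids, 1, 4)))   # TRUE
```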
The function strsplit() splits a string at every occurrence of a separator and returns the pieces in a list:
> st = "filename_doc"
> strsplit(st, "_")
[[1]]
[1] "filename" "doc"
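Splitting with an empty string "" breaks a sequence into single characters, which, combined with table(), gives a quick count of each base. A sketch, reusing the example sequence from the nchar() section:

```r
astr <- "ATGCGCTAGACAG"
# strsplit() returns a list; [[1]] takes the character vector for the first (only) string
bases <- strsplit(astr, "")[[1]]
base_counts <- table(bases)
print(base_counts)   # A: 4, C: 3, G: 4, T: 2
```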
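strsplit() always returns a list, with one element per input string. The unlist() function converts that list into a character vector, whose pieces can then be accessed by position. More on lists later: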
> aa <- unlist(strsplit("fname.doc", "\\."))
> aa[1]
[1] "fname"
> aa[2]
[1] "doc"
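Because strsplit() returns one list element per input string, splitting several file names at once needs a step that picks a piece from each element; sapply() with the indexing function `[` is one common way to do this. A sketch with made-up file names (for a name with several dots, such as "a.b.txt", element 1 is only the part before the first dot):

```r
files <- c("fname.doc", "report.txt", "reads.fastq")   # hypothetical file names
parts <- strsplit(files, "\\.")   # list with one character vector per file

base_names <- sapply(parts, `[`, 1)   # first piece of each vector
extensions <- sapply(parts, `[`, 2)   # second piece of each vector
print(base_names)   # "fname" "report" "reads"
print(extensions)   # "doc" "txt" "fastq"
```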
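The separator given to strsplit() is interpreted as a regular expression, in which the dot (.) is a special character that matches any single character. To split at a literal dot, or at another special character, we precede the character with a double backslash inside the quotes: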
> ss = "filename.doc"
> strsplit(ss, "\\.")
[[1]]
[1] "filename" "doc"
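strsplit() also has a fixed argument: with fixed = TRUE the separator is matched literally, so no escaping is needed. The sketch below compares both forms and shows why an unescaped "." causes trouble: as a regular expression it matches every character, leaving only empty strings:

```r
ss <- "filename.doc"
escaped <- strsplit(ss, "\\.")[[1]]              # regex with the dot escaped
literal <- strsplit(ss, ".", fixed = TRUE)[[1]]  # literal match, no escaping
print(identical(escaped, literal))   # TRUE

# Unescaped regex "." matches every character, so every piece is empty
wrong <- strsplit(ss, ".")[[1]]
print(all(wrong == ""))   # TRUE
```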
To convert a string to upper case or to lower case, we use the functions toupper() and tolower():
> str = "THIS IS a sentence"
> toupper(str)
[1] "THIS IS A SENTENCE"
> tolower(str)
[1] "this is a sentence"
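In sequence data, lower-case letters are often used to mark soft-masked (for example, repeat) regions, so two sequences can differ only in case. Normalising with toupper() before comparing is a simple way to ignore such masking; the sequences below are made up:

```r
seq_a <- "ATGcgtTAA"   # hypothetical sequence with lower-case (soft-masked) bases
seq_b <- "atgCGTtaa"

same_raw  <- seq_a == seq_b                     # FALSE: comparison is case-sensitive
same_norm <- toupper(seq_a) == toupper(seq_b)   # TRUE once both are upper case
print(c(same_raw, same_norm))
```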