Character vector splitting in R

Occasionally I run into a problem in R where I want to split every string in a character column on the same separator and keep just one of the resulting fields. As far as I know, there is no base R function that does this directly, and the strsplit function always returns a list, which I then have to unlist.

So today I finally wrote an Rcpp function to make this easier, and the code is as follows.

The function takes three inputs: the character vector to split, the separator, and the 1-based position of the field to return.
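The listing itself is not reproduced here, so what follows is only a minimal sketch of how such a function might look, not the original code. It is written in plain standard C++ so that it compiles on its own. The helper name nth_field, the parameter name field, and the choice to return an empty string when a string has too few pieces are all assumptions, guessed from how string_split is called in the test further down. For use from R, a function with this signature could be tagged with // [[Rcpp::export]] and loaded with sourceCpp, since Rcpp converts between R character vectors and std::vector<std::string>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the `field`-th piece (1-based, as in R) of `s`
// split on `sep`. Returns "" when `s` has fewer than `field` pieces
// (an assumption; an Rcpp version might return NA_STRING instead).
std::string nth_field(const std::string& s, const std::string& sep, int field) {
    assert(field >= 1 && !sep.empty());  // preconditions of this sketch
    std::size_t start = 0;
    // Skip over the first (field - 1) separators.
    for (int i = 1; i < field; ++i) {
        std::size_t hit = s.find(sep, start);
        if (hit == std::string::npos) return "";  // not enough pieces
        start = hit + sep.size();
    }
    // The wanted piece runs up to the next separator, or to the end of `s`.
    std::size_t end = s.find(sep, start);
    return s.substr(start, end == std::string::npos ? std::string::npos
                                                    : end - start);
}

// Apply nth_field to every element, giving one output string per input string.
std::vector<std::string> string_split(const std::vector<std::string>& x,
                                      const std::string& sep, int field) {
    std::vector<std::string> out;
    out.reserve(x.size());
    for (const std::string& s : x) out.push_back(nth_field(s, sep, field));
    return out;
}
```

The sketch scans each string only as far as the requested field instead of building every piece first, which is one plausible reason a compiled version can beat an strsplit-then-unlist approach.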


Test the code:

library(Rcpp)
sourceCpp('~/scripts/R/Rcpp/string_split.cpp')

testVector <- rep('I~am~a~boy',10)
for (i in 1:4){
	print(string_split(testVector,'~',i))
}
##  [1] "I" "I" "I" "I" "I" "I" "I" "I" "I" "I"
##  [1] "am" "am" "am" "am" "am" "am" "am" "am" "am" "am"
##  [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
##  [1] "boy" "boy" "boy" "boy" "boy" "boy" "boy" "boy" "boy" "boy"

Benchmarking:

library(rbenchmark)
library(ggplot2)
## Loading required package: methods
r_string_split <- function(x){
	# split each element on '~' and keep its second field
	sapply(x,function(y) unlist(strsplit(y,'~'))[2])
}

bm <- benchmark(string_split(testVector,'~',2),r_string_split(testVector))
bm
##                               test replications elapsed relative user.self
## 2       r_string_split(testVector)          100   0.050       25     0.049
## 1 string_split(testVector, "~", 2)          100   0.002        1     0.001
##   sys.self user.child sys.child
## 2        0          0         0
## 1        0          0         0
ggplot(data = bm,aes(x = test, y = relative)) +
		geom_bar(stat='identity') +
		theme(axis.text.x = element_text(angle=90,
										hjust = 1,
										vjust = 0.5))+
		labs(y = 'relative speed',title = 'benchmarking result')

[Figure: bar chart titled 'benchmarking result', showing relative speed for each benchmarked test]

In this benchmark, the C++ function is about 25x faster than the R version.




This work is licensed under a Creative Commons Attribution 4.0 International License.