Sunday 26 May 2013

R gotcha - regular expressions

Just a quick post: I came across this today and thought it was worth mentioning.

By default, the regular expression functions grep, gsub, regexpr, etc., use POSIX extended regular expressions. Passing perl=TRUE as an argument switches them to Perl-compatible regular expressions.
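As a rough illustration of why you might reach for perl=TRUE in the first place, here is a minimal sketch using a lookahead, which the Perl engine supports and the default engine rejects. The vector `x` and the variable names are made up for the example; the error is caught with tryCatch so the script keeps running.

```r
x <- c("price: 100USD", "price: 250EUR")

# Perl engine: "(?=USD)" is a lookahead, so only digits followed by "USD" match
usd <- regmatches(x, regexpr("[0-9]+(?=USD)", x, perl = TRUE))
print(usd)
## [1] "100"

# Default engine: lookaheads are not supported, so the pattern fails to compile
# (suppressWarnings hides the accompanying compilation warning; tryCatch catches the error)
default_result <- tryCatch(
  suppressWarnings(regexpr("[0-9]+(?=USD)", x)),
  error = function(e) "invalid regular expression"
)
print(default_result)
## [1] "invalid regular expression"
```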

Note that with the default (extended) engine, the . character matches the newline character '\n'. With perl=TRUE it does not, by default.

grep('.', '\n')
## [1] 1
grep('.', '\n', perl = TRUE)
## integer(0)
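Where this tends to bite is with greedy patterns like `.*` in sub or gsub. The sketch below (the string `x` and the variable names are just for illustration) shows the two engines disagreeing, plus two ways of making them agree: the Perl `(?s)` ("dotall") modifier, which lets `.` match newlines, and a `[^\n]` bracket expression in place of `.` under the default engine, which excludes them.

```r
x <- "first line\nsecond line"

# Default engine: '.' matches '\n', so "line.*" runs on into the second line
default_dot <- sub("line.*", "", x)
print(default_dot)
## [1] "first "

# Perl engine: '.' stops at '\n', so only the (empty) rest of the first line goes
perl_dot <- sub("line.*", "", x, perl = TRUE)
print(perl_dot)
## [1] "first \nsecond line"

# Perl engine with the (?s) modifier: '.' matches '\n' again
perl_dotall <- sub("(?s)line.*", "", x, perl = TRUE)
print(perl_dotall)
## [1] "first "

# Default engine with "[^\n]" (a literal newline in the brackets) instead of '.'
default_no_newline <- sub("line[^\n]*", "", x)
print(default_no_newline)
## [1] "first \nsecond line"
```

Which fix to use is mostly a matter of taste: being explicit in the pattern (`(?s)` or `[^\n]`) means the behaviour no longer depends on which engine happens to be selected.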

Something to keep in mind if you apply regular expressions in R to strings containing embedded newlines and have been getting puzzling results.
