Regex Cheat Sheet
29 Aug 2012
Regexes (regular expressions) are an extremely useful tool, but I keep getting tripped up by differences in their APIs and capabilities across languages. Here’s a regex Rosetta Stone covering how to use regexes in several programming languages.
Supported syntax
Different languages and libraries support different syntaxes (supported special characters and so on).
Perl
Basic syntax, extended patterns
Python
Raw strings are useful to cut down on the number of backslashes you need.
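A small sketch of what raw strings buy you, using made-up sample input. A raw string keeps each backslash as typed, so r'hello \d+' is exactly the same string as the escaped 'hello \\d+', and the savings grow when the pattern itself has to match a literal backslash.

```python
import re

# A raw string keeps every backslash as typed, so these two
# spellings produce exactly the same pattern string.
raw = r'hello \d+'
escaped = 'hello \\d+'
print(raw == escaped)                          # True

# Both therefore find the same match.
print(re.search(raw, 'say hello 42').group())  # hello 42

# Matching one literal backslash takes \\ in the regex; without a raw
# string you would have to write '\\\\' in the Python source.
print(re.search(r'C:\\Users', r'C:\Users\me') is not None)  # True
print(r'C:\\Users' == 'C:\\\\Users')                        # True
```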
C++
Boost.Regex syntax (Perl-compatible by default, including support for Perl extended patterns)
Raw string literals (R"(...)", new in C++11) are useful to cut down on the number of backslashes you need, if your compiler supports them.
C#
Regular Expression Language Elements
@-quoted string literals are useful to cut down on the number of backslashes you need.
JavaScript
“Writing a Regular Expression Pattern” on the Mozilla Developer Network
Supported modifiers
Regexes generally support modifiers to control case sensitivity, etc.
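Perl
Modifiers such as /i, /m, /s and /x, written after the closing delimiter (see perlre)
Python
Flags such as re.I (re.IGNORECASE), re.M (re.MULTILINE), re.S (re.DOTALL) and re.X (re.VERBOSE)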
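In Python the flags are bit flags, so several can be combined with |, and each can also be written inline at the start of the pattern, for example (?i) for case-insensitivity and (?m) for multiline anchors. A minimal sketch with made-up text:

```python
import re

text = "Hello 1\nhello 2\nHELLO 3"

# Combine flags with |: IGNORECASE makes matching case-insensitive,
# MULTILINE makes ^ and $ match at every line boundary.
combined = re.findall(r'^hello \d+$', text, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['Hello 1', 'hello 2', 'HELLO 3']

# The same modifiers written inline at the start of the pattern.
inline = re.findall(r'(?im)^hello \d+$', text)
print(inline == combined)  # True

# Without MULTILINE, ^ only matches at the very start of the string.
print(re.findall(r'^hello \d+', text, re.I))  # ['Hello 1']
```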
C++
boost::regex_constants::syntax_option_type
C#
System.Text.RegularExpressions.RegexOptions
JavaScript
See the “flags” section of the parameters to the RegExp object.
At the top of the file
To make regex functionality available in your module or source file:
Perl
Nothing necessary
Python
import re
C++
#include <boost/regex.hpp>
// Optionally:
using namespace boost;
C#
using System.Text.RegularExpressions;
JavaScript
Nothing necessary
Matching an entire string
Perl
if ($text =~ /^hello \d+$/) { ... }
Perl regexes must be explicitly anchored using ^ and $ to match the entire string.
Python
if re.match(r'hello \d+$', text):
    ...
re.match anchors the match at the beginning of the string, but you still need $ to anchor it at the end.
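One Python-specific wrinkle, sketched below with made-up inputs: $ also matches just before a trailing newline, so a pattern ending in $ accepts 'hello 42\n'. Anchoring with \Z instead, or calling re.fullmatch (available since Python 3.4 and not part of the original examples), requires the match to cover the whole string. The helper matches_entire is hypothetical, written only for illustration.

```python
import re

pattern = r'hello \d+'

def matches_entire(text):
    # Hypothetical helper: \Z anchors at the true end of the string,
    # unlike $, which also matches just before a trailing newline.
    return re.match(pattern + r'\Z', text) is not None

# $ tolerates a trailing newline; \Z does not.
print(bool(re.match(pattern + '$', 'hello 42\n')))  # True
print(matches_entire('hello 42\n'))                 # False
print(matches_entire('hello 42'))                   # True

# Python 3.4+ also offers re.fullmatch for the same purpose.
print(bool(re.fullmatch(pattern, 'hello 42')))      # True
print(bool(re.fullmatch(pattern, 'hello 42 x')))    # False
```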
C++
if (regex_match(text, boost::regex("hello \\d+"))) { ... }
Use boost::regex_match if you want to require that the entire string match.
C#
if (Regex.IsMatch(text, @"^hello \d+$")) { ... }
C# regexes must be explicitly anchored using ^ and $ to match the entire string.
JavaScript
if (/^hello \d+$/.test(text)) { ... }
JavaScript regexes must be explicitly anchored using ^ and $ to match the entire string.
Matching a substring
Perl
if ($text =~ /Hello \d+/) { ... }
Python
if re.search(r'Hello \d+', text):
    ...
Use re.search instead of re.match to search for a substring anywhere within the string.
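To make the difference concrete, here is a minimal Python sketch with an invented input string: re.match gives up if the pattern doesn't match at position 0, while re.search keeps scanning. re.findall, which the original examples don't use, collects every non-overlapping match.

```python
import re

text = 'Greeting: Hello 42'

# re.match only tries at position 0, so a substring later on is missed.
print(re.match(r'Hello \d+', text))   # None

# re.search scans forward and returns the first match anywhere.
m = re.search(r'Hello \d+', text)
print(m.group(), m.start(), m.end())  # Hello 42 10 18

# To collect every occurrence rather than the first, use re.findall.
print(re.findall(r'Hello \d+', 'Hello 1, Hello 2'))  # ['Hello 1', 'Hello 2']
```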
C++
if (regex_search(text, boost::regex("Hello \\d+"))) { ... }
Use boost::regex_search if you want to search for a substring anywhere within the string.
C#
if (Regex.IsMatch(text, @"Hello \d+")) { ... }
JavaScript
if (/Hello \d+/.test(text)) { ... }
Performing a case-insensitive match
Perl
if ($text =~ /hello \d+/i) { ... }
Python
if re.search(r'hello \d+', text, re.I):
    ...
C++
if (regex_search(text, boost::regex("hello \\d+",
        boost::regex_constants::icase))) { ... }
C#
if (Regex.IsMatch(text, @"hello \d+", RegexOptions.IgnoreCase)) { ... }
JavaScript
if (/hello \d+/i.test(text)) { ... }
Storing a regex for later use
Perl
$r = qr/hello \d+/i;
if ($text =~ $r) { ... }
Python
r = re.compile(r'hello \d+', re.I)
if r.search(text):
    ...
Note that Python’s re module automatically caches recently used patterns passed to its module-level functions (see here and here), so you won’t necessarily see a performance gain by compiling a regex yourself.
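A brief sketch of reusing a compiled pattern, with made-up sample lines. The compiled object offers the same operations as the module-level functions (search, match, findall, sub and so on) as methods, and records the pattern and flags it was built from.

```python
import re

# Compile once; the flags are baked into the pattern object.
r = re.compile(r'hello \d+', re.I)

lines = ['Hello 1', 'goodbye 2', 'HELLO 3']

# Reuse the same object for every test.
hits = [line for line in lines if r.search(line)]
print(hits)                           # ['Hello 1', 'HELLO 3']

# The object also records its source pattern and flags.
print(r.pattern)                      # hello \d+
print(bool(r.flags & re.IGNORECASE))  # True
```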
C++
const boost::regex r("hello \\d+", boost::regex_constants::icase);
if (regex_search(text, r)) { ... }
C#
Regex r = new Regex(@"hello \d+", RegexOptions.IgnoreCase);
if (r.IsMatch(text)) { ... }
Note that .NET automatically caches recently used patterns passed to the static Regex methods, so you won’t necessarily see a performance gain by storing a regex for later use. Also note that, while in most other languages “compiling” a regex just means parsing it into an internal form that is then interpreted, .NET can also compile regexes to actual IL (via RegexOptions.Compiled), as described here and here.
JavaScript
var r = /hello \d+/i;
// or var r = new RegExp("hello \\d+", "i");
if (r.test(text)) { ... }
Replacing part of a string
Perl
# Replace all occurrences:
$text =~ s/Hello/Goodbye/g;
# Replace the first occurrence only:
$text =~ s/Hello/Goodbye/;
Python
# Replace all occurrences:
text = re.sub('Hello', 'Goodbye', text)
# Replace the first occurrence only:
text = re.sub('Hello', 'Goodbye', text, count=1)
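A few more replacement options from Python's re module, sketched with made-up text: re.subn also returns the number of substitutions, the replacement string can refer to captured groups with \1, and the replacement can be a function that receives each match object.

```python
import re

text = 'Hello Bob. Hello Alice.'

# count=1 limits the substitution to the first match.
print(re.sub('Hello', 'Goodbye', text, count=1))
# Goodbye Bob. Hello Alice.

# re.subn also reports how many replacements were made.
new_text, n = re.subn('Hello', 'Goodbye', text)
print(new_text, n)  # Goodbye Bob. Goodbye Alice. 2

# Groups captured in the pattern can be referenced in the replacement.
print(re.sub(r'Hello (\w+)', r'Goodbye \1', text))
# Goodbye Bob. Goodbye Alice.

# The replacement can also be a function that receives each match.
print(re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b20'))
# a2 b40
```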
C++
// Replace all occurrences:
text = regex_replace(text, boost::regex("Hello"), "Goodbye");
// Replace the first occurrence only:
text = regex_replace(text, boost::regex("Hello"), "Goodbye",
        boost::regex_constants::format_first_only);
C#
// Replace all occurrences:
text = Regex.Replace(text, "Hello", "Goodbye");
// Replace the first occurrence only:
Regex r = new Regex("Hello");
text = r.Replace(text, "Goodbye", 1);
JavaScript
// Replace all occurrences:
text = text.replace(/Hello/g, 'Goodbye');
// Replace the first occurrence only:
text = text.replace(/Hello/, 'Goodbye');
Extracting parts of a string
Perl
if (($title, $name) = $text =~ /(Mr\.|Mrs\.|Dr\.) (\w+)/) { ... }
Python
m = re.search(r'(Mr\.|Mrs\.|Dr\.) (\w+)', text)
if m:
    title, name = m.groups()
    ...
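A sketch extending the extraction example to multiple matches and named groups; the sample sentence is invented for illustration. re.finditer yields a match object for every occurrence, and (?P<name>...) groups can be read back by name.

```python
import re

text = 'Letters from Dr. Watson and Mrs. Hudson'
pattern = r'(Mr\.|Mrs\.|Dr\.) (\w+)'

# search returns only the first match; groups() gives the captures.
m = re.search(pattern, text)
if m:
    title, name = m.groups()
    print(title, name)  # Dr. Watson

# finditer walks every match in order.
for m in re.finditer(pattern, text):
    print(m.group(1), m.group(2))
# Dr. Watson
# Mrs. Hudson

# Named groups make the captures self-describing.
m = re.search(r'(?P<title>Mr\.|Mrs\.|Dr\.) (?P<name>\w+)', text)
print(m.group('name'), m.groupdict())
# Watson {'title': 'Dr.', 'name': 'Watson'}
```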
C++
boost::smatch m;
if (regex_search(text, m, boost::regex("(Mr\\.|Mrs\\.|Dr\\.) (\\w+)"))) {
    const std::string& title = m[1].str();
    const std::string& name = m[2].str();
    ...
}
C#
Match m = Regex.Match(text, @"(Mr\.|Mrs\.|Dr\.) (\w+)");
if (m.Success) {
    string title = m.Groups[1].Value;
    string name = m.Groups[2].Value;
    ...
}
JavaScript
var match = /(Mr\.|Mrs\.|Dr\.) (\w+)/.exec(text);
if (match !== null) {
    var title = match[1];
    var name = match[2];
    ...
}