Regular Expressions in Ruby
Regular expressions (regex) are a powerful tool for pattern matching and text manipulation. In Ruby, regex is built into the language itself: there is literal syntax for patterns, a dedicated match operator, and direct support in many String methods. Whether you need to validate input, extract data, or transform strings, regex is an essential skill for any Ruby developer.
Creating Regular Expressions in Ruby
Ruby provides two literal syntaxes for regular expressions: slashes, /pattern/, and %r{pattern}. The slash notation is most common for simple patterns, while %r{} saves you from escaping forward slashes inside the pattern. Regexp.new is also available when the pattern has to be built from a string at runtime.
# Simple regex using slashes
email_pattern = /\w+@\w+\.\w+/
# Using %r{} syntax - useful when pattern contains slashes
url_pattern = %r{https?://[\w.-]+}
# Regex with options (i for case-insensitive, x for extended)
case_insensitive = /ruby/i
The =~ operator returns the index of the first match or nil if no match is found. Ruby also provides the .match() method, which returns a MatchData object with rich information about the match.
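As a rough sketch of how these calls differ (the sample string is made up for illustration), the snippet below also shows match?, which returns a plain true or false without building a MatchData object:

```ruby
line = "Ruby 3.3 released"   # made-up sample input

line =~ /\d/             # => 5 (index of the first digit)
line.match(/\d\.\d/)[0]  # => "3.3" (text taken from the MatchData object)
line.match?(/ruby/i)     # => true (boolean only; no MatchData is built)

# match returns nil on failure, so guard before indexing into the result
version = line.match(/v(\d+)/)
version ? version[1] : "unknown"  # => "unknown"
```

When only a yes-or-no answer is needed, match? is usually the clearest choice of the three.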
Basic Pattern Matching
At its core, regex matches characters against patterns. The most basic patterns match literal characters, but the real power comes from character classes and quantifiers.
text = "The quick brown fox jumps over the lazy dog"
# Find position of first match
text =~ /fox/ # => 16
# Using match method - returns MatchData object
match = text.match(/\w+/)
match[0] # => "The"
match.begin(0) # => 0
Character Classes
Character classes let you match sets of characters. Use brackets [] to define a class, and put ^ right after the opening bracket to negate it.
"Ruby 123" =~ /[0-9]/ # => 5 (first digit position)
"Ruby 123" =~ /[a-z]/ # => 1 (first lowercase letter, "u")
"Ruby 123" =~ /[^a-z]/ # => 0 (first non-lowercase character, "R")
"test" =~ /\w/ # => 0 (word character: a-z, A-Z, 0-9, _)
Common Shortcuts
Ruby regex provides convenient shortcuts for common patterns:
| Shortcut | Matches |
|---|---|
| \d | Any digit (0-9) |
| \D | Any non-digit |
| \w | Word character (a-z, A-Z, 0-9, _) |
| \W | Non-word character |
| \s | Whitespace |
| \S | Non-whitespace |
| . | Any character except newline |
phone = "Call me at 555-123-4567"
phone =~ /\d{3}-\d{3}-\d{4}/ # => 11
phone =~ /\d+/ # => 11 (first sequence of digits)
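Shortcuts can also appear inside brackets, and most metacharacters lose their special meaning there. A brief sketch with made-up sample strings:

```ruby
# A shortcut combined with a literal hyphen inside one class
"Call me at 555-123-4567"[/[\d-]+/]  # => "555-123-4567"

# Inside brackets, . and + are literal characters
"a.b+c".scan(/[.+]/)                  # => [".", "+"]

# Uppercase shortcuts negate: \D matches everything \d does not
"2026-03-07".scan(/\D/)               # => ["-", "-"]
```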
Quantifiers
Quantifiers specify how many times a pattern should match. Ruby supports greedy and non-greedy variants.
text = "aaaa"
text[/a{2,4}/] # => "aaaa" (2 to 4 times, greedy)
text[/a{2,4}?/] # => "aa" (non-greedy: as few as possible)
text[/a+/] # => "aaaa" (one or more)
text[/a*/] # => "aaaa" (zero or more; would match "" in "bbb")
text[/a?/] # => "a" (zero or one)
The difference between greedy and non-greedy quantifiers matters when extracting matched text. Greedy quantifiers match as much as possible, while non-greedy match as little as possible.
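A short sketch of that difference, using a made-up HTML fragment as input:

```ruby
html = "<b>bold</b> and <i>italic</i>"   # made-up sample input

# Greedy: .+ runs to the last ">" in the string
html[/<.+>/]        # => "<b>bold</b> and <i>italic</i>"

# Non-greedy: .+? stops at the first ">" it can
html[/<.+?>/]       # => "<b>"

# With scan, the non-greedy form picks out each tag separately
html.scan(/<.+?>/)  # => ["<b>", "</b>", "<i>", "</i>"]
```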
Capturing Groups
Groups let you extract specific parts of a match. Use parentheses () to create groups, which you can then reference by number.
date = "2026-03-07"
match = date.match(/(\d{4})-(\d{2})-(\d{2})/)
match[0] # => "2026-03-07" (full match)
match[1] # => "2026" (first group)
match[2] # => "03" (second group)
match[3] # => "07" (third group)
match.captures # => ["2026", "03", "07"]
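Because match returns nil when the pattern does not match, real code usually guards before reading groups. A minimal sketch, where parse_iso_date is a hypothetical helper name and not part of Ruby:

```ruby
# Hypothetical helper: returns [year, month, day] as integers, or nil.
def parse_iso_date(text)
  match = text.match(/\A(\d{4})-(\d{2})-(\d{2})\z/)
  return nil unless match

  match.captures.map(&:to_i)
end

year, month, day = parse_iso_date("2026-03-07")
# year => 2026, month => 3, day => 7

parse_iso_date("07/03/2026")  # => nil (format does not match)
```

Destructuring the captures array keeps the call site short and avoids scattering numeric indexes through the code.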
Named Captures
Ruby also supports named capture groups, which make your code more readable:
text = "John Doe"
match = text.match(/(?<first>\w+) (?<last>\w+)/)
match[:first] # => "John"
match[:last] # => "Doe"
match["first"] # => "John"
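Two related conveniences are worth a quick sketch: MatchData can return all named groups at once, and a regex literal on the left side of =~ assigns each named group to a local variable. The local-variable form only works with a literal pattern that contains no interpolation.

```ruby
text = "John Doe"
match = text.match(/(?<first>\w+) (?<last>\w+)/)

match.names           # => ["first", "last"]
match.named_captures  # => {"first" => "John", "last" => "Doe"}

# Regex literal on the left of =~ : named groups become local variables
if /(?<first>\w+) (?<last>\w+)/ =~ text
  puts "#{last}, #{first}"  # prints "Doe, John"
end
```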
Anchors
Anchors don’t match characters—they match positions in the string. They’re essential for validating entire strings.
# ^ matches the start of a line (use \A for the start of the string)
"hello" =~ /^hello/ # => 0
"ohello" =~ /^hello/ # => nil
# $ matches the end of a line (use \z for the end of the string)
"hello" =~ /hello$/ # => 0
"helloo" =~ /hello$/ # => nil
# \b matches word boundary
"cat catalog" =~ /\bcat\b/ # => 0 (matches "cat", not "catalog")
Ruby-Specific Regex Features
Ruby builds regex support into several String methods:
String#scan
The scan method returns an array of all matches:
"one two three".scan(/\w+/) # => ["one", "two", "three"]
"abc123def456".scan(/\d+/) # => ["123", "456"]
# With captures - returns array of arrays
"john@email.com".scan(/(\w+)@(\w+)/)
# => [["john", "email"]]
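Since scan with groups returns pairs, the result converts naturally into a hash. A small sketch with a made-up settings string; Array#to_h with a block assumes Ruby 2.6 or later:

```ruby
settings = "width=80 height=24"   # made-up sample input

# Two groups in the pattern, so each match becomes a [key, value] pair
pairs = settings.scan(/(\w+)=(\d+)/)
# => [["width", "80"], ["height", "24"]]

# Convert the pairs into a hash with symbol keys and integer values
pairs.to_h { |key, value| [key.to_sym, value.to_i] }
# => { width: 80, height: 24 }

# Given a block, scan yields each match instead of building an array
settings.scan(/\d+/) { |number| puts number.to_i * 2 }  # prints 160, then 48
```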
String#gsub and String#sub
Replace text using regex:
"hello world".gsub(/world/, "Ruby") # => "hello Ruby"
# Using captured groups in replacement
"first second".gsub(/(\w+) (\w+)/, '\2, \1')
# => "second, first"
# Block form - the block receives the matched text; $1 and $~ are also set
"price: $100".gsub(/\$(\d+)/) { |m| "##{$1}" }
# => "price: #100"
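A few more replacement forms, sketched with made-up inputs: sub replaces only the first match, a hash can supply replacements keyed by the matched text, and named groups can be referenced with \k<name>.

```ruby
# sub replaces only the first match; gsub replaces every match
"hello world".sub(/o/, "0")   # => "hell0 world"
"hello world".gsub(/o/, "0")  # => "hell0 w0rld"

# A hash replacement looks up each matched text as a key
"hello".gsub(/[el]/, { "e" => "3", "l" => "1" })  # => "h311o"

# Named groups are available in the replacement string as \k<name>
"2026-03-07".sub(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '\k<d>/\k<m>/\k<y>')
# => "07/03/2026"
```

Single quotes around the replacement string keep the backslash sequences from being interpreted by Ruby before gsub sees them.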
String#partition and String#rpartition
Split string at first (or last) regex match:
"filename.txt".partition(/\./)
# => ["filename", ".", "txt"]
"a-b-c-d".partition(/-/)
# => ["a", "-", "b-c-d"]
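For comparison, a brief sketch of rpartition, which splits at the last match, along with what both methods return when nothing matches:

```ruby
# rpartition splits at the last match instead of the first
"a-b-c-d".rpartition(/-/)          # => ["a-b-c", "-", "d"]
"archive.tar.gz".rpartition(/\./)  # => ["archive.tar", ".", "gz"]

# No match: the whole string lands at the start (partition)
# or at the end (rpartition), with empty strings elsewhere
"filename".partition(/\./)         # => ["filename", "", ""]
"filename".rpartition(/\./)        # => ["", "", "filename"]
```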
Practical Examples
Email Validation
def valid_email?(email)
  # match? returns true or false, which suits a predicate method
  email.match?(/\A[\w.+-]+@[\w.-]+\.\w{2,}\z/)
end
valid_email?("user@example.com") # => true
valid_email?("invalid") # => false
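The \A and \z anchors in the pattern above are deliberate. In Ruby, ^ and $ match at the start and end of any line, so a multi-line input can slip past a pattern anchored with them. A minimal sketch with a made-up input:

```ruby
input = "user@example.com\n<script>alert(1)</script>"   # made-up hostile input

# Line anchors: the first line alone satisfies the pattern
input.match?(/^[\w.+-]+@[\w.-]+\.\w{2,}$/)    # => true

# String anchors: the entire input must match
input.match?(/\A[\w.+-]+@[\w.-]+\.\w{2,}\z/)  # => false

# \Z (capital) tolerates one trailing newline; \z does not
"user@example.com\n".match?(/\w\Z/)  # => true
"user@example.com\n".match?(/\w\z/)  # => false
```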
Extracting Numbers from Text
text = "Total: $1,250.00 (USD)"
# Extract digit runs, each with an optional decimal part
text.scan(/\d+\.?\d*/)
# => ["1", "250.00"] (the comma splits the number in two)
# Or more precisely
text.scan(/[\d,]+\.\d{2}/)
# => ["1,250.00"]
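To go from matched text to numeric values, the separators have to be removed before conversion. A minimal sketch, where extract_amounts is a hypothetical helper and the pattern assumes commas as thousands separators and a dot as the decimal point:

```ruby
# Hypothetical helper: finds numbers such as "1,250.00" or "19.99"
# and returns them as Floats.
def extract_amounts(text)
  text.scan(/\d+(?:,\d{3})*(?:\.\d+)?/).map { |number| number.delete(",").to_f }
end

extract_amounts("Total: $1,250.00 (USD)")  # => [1250.0]
extract_amounts("3 items at 19.99 each")   # => [3.0, 19.99]
```

The groups use (?:...) so that scan returns whole matches rather than arrays of captures.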
URL Parsing
url = "https://rubyguides.dev/guides regex/?page=2#section"
match = url.match(%r{
(?<protocol>https?)://
(?<domain>[\w.-]+)
(?<path>/[^?#]*)?
(?:\?(?<query>[^#]*))?
(?:\#(?<fragment>.*))?
}x)
match[:protocol] # => "https"
match[:domain] # => "rubyguides.dev"
match[:path] # => "/guides regex/"
match[:query] # => "page=2"
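Optional groups that do not participate in a match come back as nil, which matters for URLs without a query or fragment. A sketch that reuses the same pattern with added \A and \z anchors; URL_PATTERN, url_parts, and the example.com URLs are illustrative names, not an established API:

```ruby
# Same structure as the pattern above, anchored to the whole string.
URL_PATTERN = %r{
  \A
  (?<protocol>https?)://
  (?<domain>[\w.-]+)
  (?<path>/[^?#]*)?
  (?:\?(?<query>[^#]*))?
  (?:\#(?<fragment>.*))?
  \z
}x

# Hypothetical helper: returns a hash of the parts that were present,
# or nil when the input does not look like an http(s) URL.
def url_parts(url)
  URL_PATTERN.match(url)&.named_captures&.compact
end

url_parts("https://example.com/docs?page=2#top")
# => {"protocol" => "https", "domain" => "example.com", "path" => "/docs",
#     "query" => "page=2", "fragment" => "top"}

url_parts("http://example.com")
# => {"protocol" => "http", "domain" => "example.com"}

url_parts("not a url")  # => nil
```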
When to Use Regular Expressions
Regular expressions excel at pattern matching tasks: validation, extraction, and replacement. However, they’re not always the right tool. For simple string operations, Ruby methods like start_with?, end_with?, include?, or split may be clearer and faster.
Use regex when:
- You need to match complex, variable patterns
- You’re validating input against a specific format
- You need to extract structured data from unstructured text
Use simpler methods when:
- You’re doing exact matches or simple prefix/suffix checks
- Performance is critical and the pattern is simple
- The code needs to be readable by others
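The trade-off can be sketched side by side, using a made-up filename:

```ruby
filename = "report_2026-03.csv"   # made-up sample input

# Fixed prefix, suffix, or substring: plain String methods read clearly
filename.start_with?("report_")  # => true
filename.end_with?(".csv")       # => true
filename.include?("2026")        # => true

# Variable structure (four digits, a dash, two digits): regex fits better
filename.match?(/\Areport_\d{4}-\d{2}\.csv\z/)  # => true
filename[/\d{4}-\d{2}/]                          # => "2026-03"
```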
Summary
Ruby’s regex support is elegant and powerful. The =~ operator and MatchData object make pattern matching intuitive, while methods like scan, gsub, and partition provide functional ways to work with matches. Remember to anchor with \A and \z when validating entire strings, since ^ and $ match at line boundaries in Ruby, and consider named captures for readability when working with complex patterns.