String#scan
scan finds every match of a pattern in the string and returns them as an array. When a block is given, it yields each match and returns the string itself.
Signature
str.scan(pattern) → array
str.scan(pattern) { |match| block } → self
Parameters:
pattern — a Regexp or String
Returns: Without a block, an Array of all matches. With a block, returns self after yielding each match in turn.
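As a small sketch of the String form of pattern: a String argument is matched literally, so regex metacharacters such as a dot carry no special meaning. The sample strings and variable names below are illustrative only.

```ruby
# A String pattern is matched literally; "." is just a dot here.
literal = "a.b.c".scan(".")
# => [".", "."]

# The same character written as a Regexp matches any character except a newline.
regex = "a.b.c".scan(/./)
# => ["a", ".", "b", ".", "c"]
```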
Basic Usage
Without a block, scan returns an array:
"the quick brown fox".scan(/\w+/)
# => ["the", "quick", "brown", "fox"]
"hello world".scan(/[aeiou]/)
# => ["e", "o", "o"]
With a block, each match is passed to the block and the original string is returned. The block's return value is discarded:
result = "hello world".scan(/[aeiou]/) { |v| v.upcase }
result # => "hello world"
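Because scan ignores what the block returns, any result has to be collected through a side effect. A minimal sketch, where the vowels array is a name introduced only for this illustration:

```ruby
vowels = []
result = "hello world".scan(/[aeiou]/) { |v| vowels << v.upcase }

vowels # => ["E", "O", "O"]
result # => "hello world"
```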
Capturing Groups
When the pattern includes capture groups, each match is an array with one element per group, so the result is an array of arrays:
"price: $100, total: $50".scan(/(\$\d+)/)
# => [["$100"], ["$50"]]
"abc123xyz456".scan(/([a-z]+)(\d+)/)
# => [["abc", "123"], ["xyz", "456"]]
Without groups, each match is just the matched string:
"abc123xyz456".scan(/\d+/)
# => ["123", "456"]
"abc123xyz456".scan(/[a-z]+/)
# => ["abc", "xyz"]
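Grouping that is needed only for alternation or repetition can be written as a non-capturing group, (?:...), so that scan still returns plain strings. The following sketch uses made-up sample text to contrast the two forms:

```ruby
# With a capturing group, only the group's text is returned.
captured = "cat hat bat".scan(/(c|h)at/)
# => [["c"], ["h"]]

# (?:...) groups without capturing, so scan returns the full match.
whole = "cat hat bat".scan(/(?:c|h)at/)
# => ["cat", "hat"]
```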
Practical Examples
Extracting Structured Data
data = "user: alice, user: bob, user: carol"
names = data.scan(/user: (\w+)/)
names # => [["alice"], ["bob"], ["carol"]]
# Flatten if you just want the names:
names.flatten
# => ["alice", "bob", "carol"]
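When each match has exactly two groups, the array of pairs can be turned into a Hash with to_h. This is a sketch on an invented config string; the key=value format is an assumption for illustration:

```ruby
config = "host=localhost port=5432 user=admin"

pairs = config.scan(/(\w+)=(\w+)/)
# => [["host", "localhost"], ["port", "5432"], ["user", "admin"]]

settings = pairs.to_h
# => {"host"=>"localhost", "port"=>"5432", "user"=>"admin"}
```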
Parsing Log Lines
log = <<~LOG
2024-01-15 ERROR connection timeout
2024-01-16 WARN retry attempt 3
2024-01-17 ERROR db query failed
LOG
errors = log.scan(/^\d{4}-\d{2}-\d{2} ERROR (.+)$/)
errors.flatten
# => ["connection timeout", "db query failed"]
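The same log can be parsed with one group per field, which keeps the date and level alongside the message. A possible sketch, reusing the sample log from above; the variable names are illustrative:

```ruby
log = <<~LOG
  2024-01-15 ERROR connection timeout
  2024-01-16 WARN retry attempt 3
  2024-01-17 ERROR db query failed
LOG

# Each entry is [date, level, message].
entries = log.scan(/^(\d{4}-\d{2}-\d{2}) (\w+) (.+)$/)

# Group the entries by their level field.
by_level = entries.group_by { |_date, level, _message| level }

error_lines = by_level["ERROR"].map { |date, _level, message| "#{date}: #{message}" }
# => ["2024-01-15: connection timeout", "2024-01-17: db query failed"]
```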
Building a Word Frequency Map
text = "the cat sat on the mat"
frequency = Hash.new(0)
text.scan(/\w+/).each { |word| frequency[word] += 1 }
frequency
# => {"the"=>2, "cat"=>1, "sat"=>1, "on"=>1, "mat"=>1}
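On Ruby 2.7 or later, Enumerable#tally can replace the manual counting loop. The sketch below also downcases the text first, which is an added assumption so that "The" and "the" count as the same word:

```ruby
text = "The cat sat on the mat"

frequency = text.downcase.scan(/\w+/).tally
# => {"the"=>2, "cat"=>1, "sat"=>1, "on"=>1, "mat"=>1}
```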
With a Block for Streaming
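When a string contains many matches, use a block to avoid building a large intermediate array. Note that File.read still loads the whole file into memory; only the array of matches is avoided: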
File.read("large.log").scan(/ERROR: (.+)/) do |message|
puts "Error found: #{message.first}"
end
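To keep the input itself out of memory, one option is to read line by line and scan each line. In this sketch a StringIO stands in for a real file so the example is self-contained; with a file on disk, File.foreach(path) yields lines in the same way. The messages array exists only to make the result visible.

```ruby
require "stringio"

# StringIO stands in for a real log file in this sketch.
io = StringIO.new(<<~LOG)
  INFO: started
  ERROR: connection timeout
  ERROR: db query failed
LOG

messages = []
io.each_line do |line|
  # |(message)| unpacks the single-element array of captures.
  line.scan(/ERROR: (.+)/) { |(message)| messages << message }
end

messages # => ["connection timeout", "db query failed"]
```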
When the pattern has groups, the block receives an array of captures, which Ruby destructures across multiple block parameters:
"abc123def456".scan(/(.+?)(\d+)/) do |letters, digits|
puts "#{digits} digits follow #{letters}"
end
# Output:
# 123 digits follow abc
# 456 digits follow def
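Inside the block, Regexp.last_match (also available as $~) refers to the MatchData of the match currently being yielded, which gives access to details that scan does not pass in, such as the match offset. A brief sketch:

```ruby
positions = []
"abc123def456".scan(/\d+/) do |digits|
  # Regexp.last_match is the MatchData for the match being yielded.
  positions << [digits, Regexp.last_match.begin(0)]
end

positions # => [["123", 3], ["456", 9]]
```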
Compared to Other Methods
scan vs split:
"one,two,three".split(",")
# => ["one", "two", "three"]
"one,two,three".scan(/[^,]+/)
# => ["one", "two", "three"]
# split is cleaner for simple delimiter-based splitting
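The two approaches stop being interchangeable once the input has empty fields: split keeps an empty string between adjacent delimiters, while the scan pattern above skips it because [^,]+ requires at least one character. A quick sketch with an invented row:

```ruby
row = "a,,b"

split_fields = row.split(",")
# => ["a", "", "b"]

scan_fields = row.scan(/[^,]+/)
# => ["a", "b"]
```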
scan vs match:
"abc123".match(/\d+/)
# => #<MatchData "123"> (first match only)
"abc123xyz456".scan(/\d+/)
# => ["123", "456"] (all matches)
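The MatchData returned by match also carries context that scan's plain strings do not, such as named groups and the text around the match. A short sketch; the group name digits is chosen only for this example:

```ruby
m = "abc123xyz456".match(/(?<digits>\d+)/)
m[:digits]   # => "123"
m.pre_match  # => "abc"
m.post_match # => "xyz456"

# scan treats the named group like any other group and keeps only its text.
all = "abc123xyz456".scan(/(?<digits>\d+)/)
# => [["123"], ["456"]]
```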
scan vs gsub:
# gsub returns the transformed string
"hello".gsub(/[aeiou]/, "*")
# => "h*ll*"
# scan returns the matched values
"hello".scan(/[aeiou]/)
# => ["e", "o"]
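The difference also shows up in the block forms: gsub substitutes the block's return value for each match, whereas scan discards it and returns the receiver. A minimal sketch:

```ruby
# gsub uses the block's return value as the replacement.
upcased = "hello".gsub(/[aeiou]/) { |v| v.upcase }
# => "hEllO"

# scan ignores the block's return value and returns the original string.
unchanged = "hello".scan(/[aeiou]/) { |v| v.upcase }
# => "hello"
```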
See Also
- /reference/string-methods/match/ — find the first match, returns MatchData
- /reference/string-methods/gsub-bang/ — find and replace all occurrences in place
- /reference/string-methods/sub/ — replace only the first occurrence