
String#scan

scan finds every match of a pattern in the string and returns them as an array. When a block is given, it yields each match and returns the string itself.

Signature

str.scan(pattern)           → array
str.scan(pattern) { |match| block } → self

Parameters:

  • pattern — a Regexp or String

Returns: Without a block, an Array of all matches. With a block, returns self after yielding each match in turn.
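
The two pattern types behave differently: a String pattern is matched literally, while a Regexp is interpreted as a pattern. A small sketch of the difference (the sample string is made up for illustration):

```ruby
# A String pattern is matched literally, character for character.
literal = "a.b.c".scan(".")
# => [".", "."]

# As a Regexp, "." means "any character", so every character matches.
regex = "a.b.c".scan(/./)
# => ["a", ".", "b", ".", "c"]
```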

Basic Usage

Without a block, scan returns an array:

"the quick brown fox".scan(/\w+/)
# => ["the", "quick", "brown", "fox"]

"hello world".scan(/[aeiou]/)
# => ["e", "o", "o"]

With a block, each match is passed to the block, the block's return value is ignored, and the original string is returned:

result = "hello world".scan(/[aeiou]/) { |v| v.upcase }
result  # => "hello world"
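
Because the block form returns the original string, any transformed values have to be collected through a side effect, or by calling map on the array form instead. A minimal sketch (the variable names are illustrative):

```ruby
# Collect results inside the block via a side effect.
upcased = []
result = "hello world".scan(/[aeiou]/) { |v| upcased << v.upcase }
# upcased => ["E", "O", "O"]
# result  => "hello world"

# Or use the array form and transform it with map.
mapped = "hello world".scan(/[aeiou]/).map(&:upcase)
# => ["E", "O", "O"]
```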

Capturing Groups

When the pattern includes capture groups, each match is an array with one element per group, in the order the groups appear in the pattern:

"price: $100, total: $50".scan(/(\$\d+)/)
# => [["$100"], ["$50"]]

"abc123xyz456".scan(/([a-z]+)(\d+)/)
# => [["abc", "123"], ["xyz", "456"]]
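
If parentheses are needed only for grouping, a non-capturing group `(?:...)` keeps the result as a flat array of strings. A short sketch reusing the price string from above:

```ruby
text = "price: $100, total: $50"

# Capturing group: each match is wrapped in an array of captures.
captured = text.scan(/\$(\d+)/)
# => [["100"], ["50"]]

# Non-capturing group: the whole match is returned as a plain string.
plain = text.scan(/\$(?:\d+)/)
# => ["$100", "$50"]
```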

Without groups, each match is just the matched string:

"abc123xyz456".scan(/\d+/)
# => ["123", "456"]

"abc123xyz456".scan(/[a-z]+/)
# => ["abc", "xyz"]

Practical Examples

Extracting Structured Data

data = "user: alice, user: bob, user: carol"
names = data.scan(/user: (\w+)/)
names  # => [["alice"], ["bob"], ["carol"]]

# Flatten if you just want the names:
names.flatten
# => ["alice", "bob", "carol"]
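
With more than one group, flatten would merge unrelated values, so it is usually better to keep the pairs and convert them. A sketch assuming a made-up "user/age" format, and Ruby 2.6 or later for the block form of to_h:

```ruby
data = "user: alice, age: 30; user: bob, age: 25"

# Each match is a [name, age] pair of strings.
pairs = data.scan(/user: (\w+), age: (\d+)/)
# => [["alice", "30"], ["bob", "25"]]

# Build a Hash, converting the age to an Integer along the way.
ages = pairs.to_h { |name, age| [name, age.to_i] }
# => {"alice"=>30, "bob"=>25}
```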

Parsing Log Lines

log = <<~LOG
  2024-01-15 ERROR connection timeout
  2024-01-16 WARN retry attempt 3
  2024-01-17 ERROR db query failed
LOG

errors = log.scan(/^\d{4}-\d{2}-\d{2} ERROR (.+)$/)
errors.flatten
# => ["connection timeout", "db query failed"]
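
The same log can be parsed into date, level, and message in one pass by adding more groups, then filtered afterwards. A sketch using the log format shown above:

```ruby
log = <<~LOG
  2024-01-15 ERROR connection timeout
  2024-01-16 WARN retry attempt 3
  2024-01-17 ERROR db query failed
LOG

# One [date, level, message] array per line.
entries = log.scan(/^(\d{4}-\d{2}-\d{2}) (\w+) (.+)$/)
# entries.first => ["2024-01-15", "ERROR", "connection timeout"]

# Keep only the dates of ERROR lines.
error_dates = entries.select { |_date, level, _msg| level == "ERROR" }
                     .map { |date, _level, _msg| date }
# => ["2024-01-15", "2024-01-17"]
```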

Building a Word Frequency Map

text = "the cat sat on the mat"
frequency = Hash.new(0)
text.scan(/\w+/).each { |word| frequency[word] += 1 }
frequency
# => {"the"=>2, "cat"=>1, "sat"=>1, "on"=>1, "mat"=>1}
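
On Ruby 2.7 and later, Enumerable#tally can replace the manual counting loop. A minimal equivalent of the example above:

```ruby
text = "the cat sat on the mat"

# tally counts how many times each element occurs.
frequency = text.scan(/\w+/).tally
# => {"the"=>2, "cat"=>1, "sat"=>1, "on"=>1, "mat"=>1}
```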

With a Block for Streaming

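When a string produces many matches, pass a block to avoid building a large intermediate array of results. Note that File.read still loads the entire file into memory; only the array of matches is avoided: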

File.read("large.log").scan(/ERROR: (.+)/) do |message|
  puts "Error found: #{message.first}"
end
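
To keep memory bounded for the input as well, one option is to read line by line and scan each line. The sketch below assumes a hypothetical helper named each_error and uses a StringIO as a stand-in for an open file; neither comes from the original example:

```ruby
require "stringio"

# Hypothetical helper: yields each captured error message from any object
# that responds to each_line (a File, a StringIO, and so on).
def each_error(io)
  return enum_for(:each_error, io) unless block_given?

  io.each_line do |line|
    # One capture group, so scan yields a one-element array; unpack it.
    line.scan(/ERROR: (.+)/) { |(message)| yield message }
  end
end

# StringIO stands in for File.open("large.log") so the sketch runs anywhere.
log = StringIO.new("INFO: started\nERROR: disk full\nERROR: timeout\n")
each_error(log) { |message| puts "Error found: #{message}" }
```

Only one line is held in memory at a time, at the cost of not matching patterns that span multiple lines.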

When the pattern has several groups, the block receives one array of captures per match, which Ruby destructures into separate block parameters:

"abc123def456".scan(/(.+?)(\d+)/) do |letters, digits|
  puts "#{digits} digits follow #{letters}"
end

# Output:
# 123 digits follow abc
# 456 digits follow def
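
Inside the block, Regexp.last_match (also available as $~) refers to the MatchData for the current match, which gives access to details the yielded string lacks, such as the match position. A small sketch:

```ruby
positions = []

# begin(0) is the start offset of the whole match in the original string.
"hello world".scan(/o/) { positions << Regexp.last_match.begin(0) }

positions
# => [4, 7]
```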

Compared to Other Methods

scan vs split:

"one,two,three".split(",")
# => ["one", "two", "three"]

"one,two,three".scan(/[^,]+/)
# => ["one", "two", "three"]
# split is cleaner for simple delimiter-based splitting
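
The two are not interchangeable when fields can be empty: split keeps an empty field between adjacent delimiters, while a scan for runs of non-delimiter characters skips it. A sketch with a made-up input:

```ruby
csv = "a,,b"

# split preserves the empty field between the two commas.
split_fields = csv.split(",")
# => ["a", "", "b"]

# scan only returns non-empty runs of non-comma characters.
scan_fields = csv.scan(/[^,]+/)
# => ["a", "b"]
```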

scan vs match:

"abc123".match(/\d+/)
# => #<MatchData "123"> (first match only)

"abc123xyz456".scan(/\d+/)
# => ["123", "456"] (all matches)

scan vs gsub:

# gsub returns the transformed string
"hello".gsub(/[aeiou]/, "*")
# => "h*ll*"

# scan returns the matched values
"hello".scan(/[aeiou]/)
# => ["e", "o"]

See Also