Lazy Enumerators for Large Datasets
When working with large datasets in Ruby, loading everything into memory can become a bottleneck. Ruby provides Enumerator::Lazy to solve this problem — it lets you process data piece by piece, stopping as soon as you have what you need.
What Are Lazy Enumerators?
A lazy enumerator doesn’t process elements until a result is requested. Eager Enumerable methods such as map and select run over the whole collection and build a complete array at every step. A lazy enumerator instead builds a pipeline of operations and passes elements through it one at a time, only as far as the final result requires.
# Eager evaluation - tries to process everything immediately
(1..Float::INFINITY).map { |i| i * 2 }.first(5)
# Never returns: map tries to build an infinite array before first(5) can run
# Lazy evaluation - processes only what you ask for
(1..Float::INFINITY).lazy.map { |i| i * 2 }.first(5)
# => [2, 4, 6, 8, 10]
# Works fine - only computes the 5 values needed
The key difference: .lazy returns an Enumerator::Lazy, and methods such as map and select called on it return another Enumerator::Lazy instead of an Array.
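To make the pipeline behaviour concrete, the sketch below records the order in which the blocks run. The `log` array and its labels are illustrative names only, not part of any API.

```ruby
log = []

result = (1..Float::INFINITY).lazy
  .map    { |i| log << "map #{i}"; i * 2 }
  .select { |i| log << "select #{i}"; i % 4 == 0 }
  .first(2)

p result # => [4, 8]
puts log
# map 1, select 2, map 2, select 4, map 3, select 6, map 4, select 8
```

Each element travels through map and then select before the next element is touched, and the work stops as soon as two results have been collected.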
Creating Lazy Enumerators
The simplest way is calling .lazy on any enumerable:
# From a range
(1..1000).lazy
# From an array
[1, 2, 3].lazy
# From a file (common use case); File.foreach closes the file when iteration finishes
File.foreach('large_file.txt').lazy
You can also use Enumerator::Lazy.new for custom lazy enumerators:
def filter_map(sequence)
  Enumerator::Lazy.new(sequence) do |yielder, *values|
    result = yield(*values)
    yielder << result if result
  end
end
filter_map(1..10) { |i| i * 2 if i.even? }.first(3)
# => [4, 8, 12]
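Another route, sketched here as one possible approach, is to wrap a generator built with Enumerator.new and call .lazy on it. The Fibonacci generator and the names `fib` and `even_fibs` are illustrative choices.

```ruby
# Illustrative generator: an endless Fibonacci sequence.
fib = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a
    a, b = b, a + b
  end
end

# The generator never ends, so it must be consumed lazily.
even_fibs = fib.lazy.select(&:even?).first(5)
p even_fibs # => [0, 2, 8, 34, 144]
```

The generator only advances as far as needed to find five even values.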
Lazy Methods
All these Enumerable methods work lazily on a lazy enumerator:
| Method | Description |
|---|---|
| .map | Transforms each element |
| .select / .filter | Filters elements by condition |
| .filter_map | Filter and transform in one pass |
| .flat_map / .collect_concat | Flattens nested results |
| .take(n) | Takes first n elements |
| .drop(n) | Skips first n elements |
| .take_while | Takes elements until condition fails |
| .drop_while | Drops elements until condition fails |
| .grep(pattern) | Filters matching a pattern |
| .chunk | Groups consecutive elements |
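As a rough illustration of how several of these methods combine, the following sketch chains select, map, drop, and take_while over an infinite range. The numbers are arbitrary and chosen only so the result is easy to verify by hand.

```ruby
result = (1..Float::INFINITY).lazy
  .select { |i| i % 3 == 0 }      # 3, 6, 9, 12, ...
  .map { |i| i * i }              # 9, 36, 81, 144, ...
  .drop(1)                        # skip the first square (9)
  .take_while { |sq| sq < 500 }   # stop at the first square >= 500
  .to_a

p result # => [36, 81, 144, 225, 324, 441]
```

Calling to_a is safe here even though the source is infinite, because take_while ends the iteration once its condition fails.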
Forcing Evaluation
Lazy enumerators don’t execute until you force them. Use these methods to trigger evaluation:
lazy_enum = (1..).lazy.map { |i| i * 2 }
# force - returns an array of all elements
lazy_enum.force
# Warning: Will hang on infinite sequences!
# first(n) - returns first n elements (most common)
(1..).lazy.map { |i| i * 2 }.first(10)
# => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
# to_a - converts to array (same as force)
lazy_enum.take(5).to_a
# each - evaluates and iterates
lazy_enum.take(5).each { |i| puts i }
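One detail worth checking for yourself is the difference between take(n) and first(n) on a lazy enumerator: take stays lazy, while first evaluates right away. A small sketch, with illustrative variable names:

```ruby
doubled = (1..Float::INFINITY).lazy.map { |i| i * 2 }

taken  = doubled.take(3)   # still lazy, nothing computed yet
firsts = doubled.first(3)  # evaluated immediately, returns an Array

p taken.class  # => Enumerator::Lazy
p firsts       # => [2, 4, 6]
p taken.to_a   # => [2, 4, 6]
```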
Converting Back to Eager
Sometimes you need a regular enumerator. Use .eager (available since Ruby 3.0):
lazy_enum = (1..100).lazy.map { |i| i * 2 }
eager_enum = lazy_enum.eager
# Returns Enumerator, not Enumerator::Lazy
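The effect is easiest to see by calling a method on the converted enumerator. This sketch assumes Ruby 3.0 or later, where Enumerator::Lazy#eager is available.

```ruby
lazy_enum  = (1..5).lazy.map { |i| i * 2 }
eager_enum = lazy_enum.eager

p eager_enum.class # => Enumerator

# Methods called on the eager enumerator return arrays again.
incremented = eager_enum.map { |i| i + 1 }
p incremented # => [3, 5, 7, 9, 11]
```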
Practical Example: Processing a Large File
Lazy enumerators shine when processing files too big to fit in memory:
# Read the file line by line and print the first 10 matches
# (File.foreach closes the file once iteration stops)
File.foreach('server.log')
  .lazy
  .grep(/ERROR/)
  .take(10)
  .each { |line| puts line }
# Process a CSV without loading the entire file
require 'csv'
CSV.foreach('massive.csv') # without a block, returns an Enumerator
  .lazy
  .select { |row| row[3].to_i > 1000 }
  .map { |row| [row[0], row[3]] }
  .first(100)
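To check that a lazy file scan really stops early, the sketch below writes a made-up log to a temporary file and counts how many lines are read. The file contents and the `lines_read` counter are assumptions for illustration.

```ruby
require 'tempfile'

# Made-up log contents, written to a temp file so the sketch runs on its own.
log = Tempfile.new('server.log')
log.write((1..1000).map { |n| n % 100 == 0 ? "ERROR line #{n}\n" : "INFO line #{n}\n" }.join)
log.flush

lines_read = 0
errors = File.foreach(log.path)
  .lazy
  .map { |line| lines_read += 1; line.chomp }
  .grep(/ERROR/)
  .first(3)

p errors     # => ["ERROR line 100", "ERROR line 200", "ERROR line 300"]
p lines_read # => 300
log.close!   # close and delete the temp file
```

Only 300 of the 1000 lines pass through the pipeline, because first(3) stops iterating once the third match is found.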
Common Gotchas
1. Lazy Doesn’t Mean “No Computation”
Lazy evaluation still runs each block for every element it has to inspect; it only defers when that happens:
# The select block runs 10 times (for 1 through 10), not 5
(1..).lazy.select { |i| i.even? }.take(5).each { |i| puts i }
# Output: 2, 4, 6, 8, 10
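A counter makes the number of block calls visible. The `calls` variable below is an illustrative addition.

```ruby
calls = 0
evens = (1..Float::INFINITY).lazy.select { |i| calls += 1; i.even? }.first(5)

p evens # => [2, 4, 6, 8, 10]
p calls # => 10
```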
2. Performance Overhead
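Enumerator::Lazy adds per-element overhead, so for small in-memory collections it is often noticeably slower than eager evaluation. The exact factor depends on the Ruby version and the workload, so benchmark before relying on a number. Only use lazy when dealing with: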
- Infinite sequences
- Large files
- Remote API streams
- Cases where you need only the first few results
# Don't use lazy for small arrays - just use regular map
[1, 2, 3].map { |i| i * 2 } # Fast
# Use lazy for large/infinite data
(1..1_000_000).lazy.map { |i| i * 2 }.first(5) # Efficient
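The overhead is easy to measure on your own machine. Here is a minimal sketch using the standard Benchmark module; the collection size and iteration count are arbitrary assumptions, and the timings will vary between runs and Ruby versions.

```ruby
require 'benchmark'

small = (1..100).to_a
n = 10_000

eager_result = small.map { |i| i * 2 }.select { |i| i % 3 == 0 }
lazy_result  = small.lazy.map { |i| i * 2 }.select { |i| i % 3 == 0 }.to_a

eager_time = Benchmark.realtime do
  n.times { small.map { |i| i * 2 }.select { |i| i % 3 == 0 } }
end

lazy_time = Benchmark.realtime do
  n.times { small.lazy.map { |i| i * 2 }.select { |i| i % 3 == 0 }.to_a }
end

puts format('eager: %.4fs, lazy: %.4fs', eager_time, lazy_time)
```

Both versions produce the same array, so any difference in the printed timings comes from the evaluation strategy alone.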
3. Forgetting to Force
A lazy enumerator that is never forced only describes the work to be done; none of its blocks run:
result = (1..).lazy.map { |i| puts "computing #{i}"; i * 2 }
# Nothing printed yet - no evaluation happened
result.first(3)
# Now it evaluates: computing 1, computing 2, computing 3
4. Infinite Loops
Without limiting results, you can hang your program:
# Bad - will run forever
(1..).lazy.map { |i| i * 2 }.each { |i| puts i }
# Good - limits results
(1..).lazy.map { |i| i * 2 }.first(10).each { |i| puts i }
Performance Tip: filter_map
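Ruby 2.7+ provides filter_map, which filters and transforms in a single block. In a lazy chain, select followed by map already traverses the source only once, so the gain is one fewer pipeline stage rather than one fewer pass:
# Two pipeline stages (slightly more overhead per element)
(1..10).lazy.select { |i| i.even? }.map { |i| i * 2 }.force
# One pipeline stage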
(1..10).lazy.filter_map { |i| i * 2 if i.even? }.force
# => [4, 8, 12, 16, 20]
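One caveat when swapping select plus map for filter_map: filter_map discards every nil or false result, so the two forms are not interchangeable when the transformed value can legitimately be false. A small sketch with illustrative names:

```ruby
numbers = (1..6)

# filter_map drops the false results produced for 2 and 4.
with_filter_map = numbers.lazy.filter_map { |i| i > 4 if i.even? }.to_a
p with_filter_map # => [true]

# select + map keeps them.
with_select_map = numbers.lazy.select(&:even?).map { |i| i > 4 }.to_a
p with_select_map # => [false, false, true]
```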
When to Use Lazy Enumerators
Use lazy when:
- Processing files too large for memory
- Working with infinite sequences
- Only needing the first N results from a large dataset
- Streaming data from external sources
Avoid lazy when:
- Working with small, fixed-size data
- You need all results anyway
- Performance is critical for small datasets
See Also
- Enumerable#reduce — Combining elements into a single value
- Enumerable#each_cons — Iterating consecutive elements
- Enumerable#chunk — Grouping consecutive elements