Lazy Enumerators for Large Datasets
When working with large datasets in Ruby, loading everything into memory can become a bottleneck. Ruby provides Enumerator::Lazy to solve this problem: it lets you process data piece by piece, stopping as soon as you have what you need.
What are lazy enumerators?
A lazy enumerator doesn’t process elements until you explicitly request them. Eager Enumerable methods such as map and select walk the whole collection and return a complete array at each step; a lazy enumerator instead builds a pipeline of operations and runs them only when a result is requested.
# Eager evaluation - processes everything immediately
(1..Float::INFINITY).map { |i| i * 2 }.first(5)
# Problem: never returns, because map tries to build an infinite array before first(5) runs
# Lazy evaluation - processes only what you ask for
(1..Float::INFINITY).lazy.map { |i| i * 2 }.first(5)
# => [2, 4, 6, 8, 10]
# Works fine - only computes the 5 values needed
The key difference: .lazy returns an Enumerator::Lazy instead of an Enumerator or Array. That means the pipeline can keep a little state, but it does not force Ruby to build every intermediate result up front.
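A quick way to see that difference is to inspect what each call returns. The following is a minimal sketch; the `doubled` helper is a hypothetical name introduced only for this illustration.

```ruby
# Hypothetical helper: builds a lazy doubling pipeline over any enumerable.
def doubled(source)
  source.lazy.map { |i| i * 2 }
end

pipeline = doubled(1..Float::INFINITY)

puts pipeline.class             # Enumerator::Lazy - nothing has been computed yet
puts pipeline.first(3).class    # Array - first(3) forced three elements
puts pipeline.first(3).inspect  # [2, 4, 6]
```

Because the pipeline is only a description of the work, calling `first(3)` twice simply runs it twice from the start of the source.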
Key takeaways
- Lazy enumerators delay work until you ask for results, which helps when the source might be huge or infinite
- The biggest win is stopping early, not making every pipeline faster
- They are a poor fit for tiny arrays where eager evaluation is simpler and quicker
- Lazy pipelines are easiest to reason about when you keep each step small and readable
A lazy pipeline is best understood as a promise. You say how the data should be transformed, but Ruby waits until you request a value before doing the work. That gives you control over memory use and makes it possible to combine filtering, mapping, and limiting in a way that would be awkward with eager arrays.
That flexibility is useful in scripts, background jobs, and data processing tasks. It is less useful when the source is already tiny or when every item must be processed anyway. In those cases, lazy evaluation adds ceremony without saving much time.
Lazy enumerators also make it easier to express an early-exit workflow. Instead of saying, “build everything, then throw away most of it,” you describe the shape of the data and then stop as soon as the answer is good enough. That is a nice fit for search results, log scanning, and any other task where the first few matches matter more than the whole collection.
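The early-exit behaviour can be made visible by counting how many source elements are examined before the pipeline stops. This is a sketch under assumptions: `first_matches` is a hypothetical helper, and the counter exists only for demonstration.

```ruby
# Hypothetical helper: returns the first `limit` elements for which the
# given block is truthy, plus how many source elements were examined.
def first_matches(source, limit)
  examined = 0
  matches = source.lazy
                  .select do |n|
                    examined += 1
                    yield(n)
                  end
                  .first(limit)
  [matches, examined]
end

matches, examined = first_matches(1..1_000_000, 3) { |n| (n % 7).zero? }
puts matches.inspect  # [7, 14, 21]
puts examined         # 21
```

Only 21 of the million candidates are touched, because `first(3)` stops pulling values as soon as the third match arrives.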
Building a lazy pipeline
The simplest way is calling .lazy on any enumerable. This keeps the code close to the eager version, which makes the lazy version easier to teach and easier to review later:
# From a range
(1..1000).lazy
# From an array
[1, 2, 3].lazy
# From a file (common use case); File.foreach closes the file when iteration ends
File.foreach('large_file.txt').lazy
You can also use Enumerator::Lazy.new for custom lazy enumerators. That is useful when you want to wrap a custom data source or add a small transformation layer before the caller starts consuming values. The constructor takes an existing enumerable and a block that acts as a lazy combinator, giving you control over how each value gets yielded down the pipeline:
def filter_map(sequence)
  Enumerator::Lazy.new(sequence) do |yielder, *values|
    result = yield(*values)
    yielder << result if result
  end
end
filter_map(1..10) { |i| i * 2 if i.even? }.first(3)
# => [4, 8, 12]
Lazy methods
All these Enumerable methods work lazily on a lazy enumerator:
| Method | Description |
|---|---|
| .map | Transforms each element |
| .select / .filter | Filters elements by condition |
| .filter_map | Filters and transforms in one pass |
| .flat_map / .collect_concat | Flattens nested results |
| .take(n) | Takes the first n elements |
| .drop(n) | Skips the first n elements |
| .take_while | Takes elements until the condition fails |
| .drop_while | Drops elements until the condition fails |
| .grep(pattern) | Keeps elements matching a pattern |
| .chunk | Groups consecutive elements |
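Several of these methods can be chained, and each one returns another Enumerator::Lazy until something forces the result. The sketch below assumes a small in-memory log; `error_words` and the sample lines are hypothetical.

```ruby
# Hypothetical helper: skips a header, keeps lines up to the first blank
# one, then extracts the words from lines that mention "error".
def error_words(lines)
  lines.lazy
       .drop(1)                             # skip the header line
       .take_while { |l| !l.strip.empty? }  # stop at the first blank line
       .grep(/error/i)                      # keep matching lines
       .flat_map { |l| l.split }            # one word per element
end

log = ["HEADER", "disk error", "all good", "Error again", "", "error after blank"]
pipeline = error_words(log)

puts pipeline.class         # Enumerator::Lazy
puts pipeline.to_a.inspect  # ["disk", "error", "Error", "again"]
```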
Forcing evaluation
Lazy enumerators don’t execute until you force them. Use these methods to trigger evaluation:
lazy_enum = (1..).lazy.map { |i| i * 2 }
# force - returns an array of all elements
lazy_enum.force
# Warning: Will hang on infinite sequences!
# first(n) - returns first n elements (most common)
(1..).lazy.map { |i| i * 2 }.first(10)
# => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
# to_a - converts to array (same as force)
lazy_enum.take(5).to_a
# each - evaluates and iterates
lazy_enum.take(5).each { |i| puts i }
Each of these forcing methods serves a different purpose. first(n) is the most common choice because it returns a plain array of the first n results without evaluating the rest of the pipeline. force and to_a evaluate everything, which only makes sense when the collection is finite. each lets you iterate over a limited window with take.
Converting back to eager
Sometimes you need a regular enumerator again, for example when a later API expects an ordinary enumerator or array pipeline. Use .eager (available since Ruby 2.7):
lazy_enum = (1..100).lazy.map { |i| i * 2 }
eager_enum = lazy_enum.eager
# Returns Enumerator, not Enumerator::Lazy
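The practical effect shows up when the enumerator is handed to code that expects eager semantics. In this sketch `labels` is a hypothetical stand-in for such an API, and Ruby 2.7 or later is assumed because `eager` was introduced in that release.

```ruby
# Hypothetical stand-in for an API that calls map and expects an Array back.
def labels(enum)
  enum.map { |n| "item-#{n}" }
end

lazy_enum  = (1..3).lazy.map { |i| i * 2 }
eager_enum = lazy_enum.eager

puts labels(lazy_enum).class     # Enumerator::Lazy - map is still deferred
puts labels(eager_enum).class    # Array - map ran immediately
puts labels(eager_enum).inspect  # ["item-2", "item-4", "item-6"]
```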
Practical example: processing a large file
Lazy enumerators shine when processing files too big to fit in memory. They also help when the data source is external, because you can stop once you have enough rows instead of reading the whole file or response body:
# Read file line by line, find first 10 matches
# File.foreach closes the file once iteration stops
File.foreach('server.log')
    .lazy
    .grep(/ERROR/)
    .take(10)
    .each { |line| puts line }
# Process CSV without loading entire file
require 'csv'
CSV.foreach('massive.csv')
   .lazy
   .select { |row| row[3].to_i > 1000 }
   .map { |row| [row[0], row[3]] }
   .take(100)
   .to_a
That pattern is often the sweet spot for lazy enumerators: read a stream, narrow it down, and stop as soon as you have the rows you want. If you needed all rows anyway, eager code would probably be easier to read. If you only need a slice of a huge file, lazy evaluation keeps the memory profile modest.
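To check that a lazy file pipeline really stops reading early, you can count the lines pulled from disk. This is a sketch with assumptions: `first_matching_lines` is a hypothetical helper, and the log content is generated into a temporary file purely for the demonstration.

```ruby
require 'tempfile'

# Hypothetical helper: returns the first `limit` lines matching `pattern`,
# plus the number of lines read from the file to find them.
def first_matching_lines(path, pattern, limit)
  lines_read = 0
  matches = File.foreach(path)
                .lazy
                .map do |line|
                  lines_read += 1
                  line.chomp
                end
                .grep(pattern)
                .first(limit)
  [matches, lines_read]
end

Tempfile.create('server.log') do |file|
  # Every tenth line is an error; the rest are informational.
  1.upto(1_000) do |i|
    file.puts((i % 10).zero? ? "ERROR request #{i}" : "INFO request #{i}")
  end
  file.flush

  matches, lines_read = first_matching_lines(file.path, /ERROR/, 2)
  puts matches.inspect  # ["ERROR request 10", "ERROR request 20"]
  puts lines_read       # 20
end
```

The remaining 980 lines are never read, which is where the memory and time savings come from on a genuinely large file.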
Common gotchas
1. Lazy does not mean “no computation”
Lazy evaluation still runs each block for every element it pulls; it only changes when that happens:
# This still calls the block 10 times, not 5
(1..).lazy.select { |i| i.even? }.take(5).each { |i| puts i }
# Output: 2, 4, 6, 8, 10
2. Performance overhead
Enumerator::Lazy adds per-element overhead; for small datasets it is often 2-4x slower than eager evaluation, though the exact ratio depends on the Ruby version and the pipeline. Reserve lazy for:
- Infinite sequences
- Large files
- Remote API streams
- Cases where you need only the first few results
The overhead exists because every element travels through the pipeline one at a time, and each stage adds its own block call and yielder bookkeeping instead of running as one tight loop over an array. For a large dataset where you only take the first few results, that bookkeeping is worth the trade. For a tiny array, the bookkeeping is just noise on top of work that was already cheap:
# Don't use lazy for small arrays - just use regular map
[1, 2, 3].map { |i| i * 2 } # Fast
# Use lazy for large/infinite data
(1..1_000_000).lazy.map { |i| i * 2 }.first(5) # Efficient
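If you want to measure the overhead on your own machine rather than rely on a rule of thumb, the standard Benchmark module is enough. The sketch below is illustrative only: `eager_doubles`, `lazy_doubles`, and the iteration count are assumptions, and the timings will vary by Ruby version and hardware.

```ruby
require 'benchmark'

# Both hypothetical helpers compute the same result; only the evaluation
# strategy differs.
def eager_doubles(data)
  data.map { |i| i * 2 }.select { |i| (i % 3).zero? }
end

def lazy_doubles(data)
  data.lazy.map { |i| i * 2 }.select { |i| (i % 3).zero? }.to_a
end

small = (1..10).to_a
iterations = 20_000

eager_time = Benchmark.realtime { iterations.times { eager_doubles(small) } }
lazy_time  = Benchmark.realtime { iterations.times { lazy_doubles(small) } }

puts format('eager: %.3fs, lazy: %.3fs', eager_time, lazy_time)
```

Measuring with realistic data sizes matters more than the absolute numbers here, since the balance shifts once early exit lets the lazy version skip most of the input.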
3. Forgetting to force
A lazy enumerator that never gets forced just builds a chain of promises:
result = (1..).lazy.map { |i| puts "computing #{i}"; i * 2 }
# Nothing printed yet - no evaluation happened
result.first(3)
# Now it evaluates: computing 1, computing 2, computing 3
That silence can be confusing when you are first learning the pattern. The pipeline is ready but nothing runs until a terminal method arrives. That same property, though, is what makes lazy enumerators composable: you can pass a half-built pipeline to another method and let the final consumer decide when to pull values.
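That composability can be sketched with one method that describes the work and another that decides how much of it to run. Both `squares` and `first_even` are hypothetical names used only for this example.

```ruby
# Hypothetical producer: describes an infinite sequence but runs nothing.
def squares
  (1..Float::INFINITY).lazy.map { |i| i * i }
end

# Hypothetical consumer: adds its own stage and decides how much to pull.
def first_even(pipeline, count)
  pipeline.select(&:even?).first(count)
end

puts first_even(squares, 3).inspect  # [4, 16, 36]
```

The producer never needs to know how many values the consumer wants, and the consumer never needs to know that the source is infinite.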
4. Infinite loops
Without limiting results, you can hang your program:
# Bad - will run forever
(1..).lazy.map { |i| i * 2 }.each { |i| puts i }
# Good - limits results
(1..).lazy.map { |i| i * 2 }.first(10).each { |i| puts i }
Performance tip: filter_map
Ruby 2.7+ provides filter_map which combines filter and map in a single pass:
# Two stages: select, then map
(1..10).lazy.select { |i| i.even? }.map { |i| i * 2 }.force
# => [4, 8, 12, 16, 20]
# One stage: filter and transform in the same block
(1..10).lazy.filter_map { |i| i * 2 if i.even? }.force
# => [4, 8, 12, 16, 20]
By combining filter and map in a single block, filter_map reduces the number of block invocations. In an eager chain it also avoids the intermediate array that select would otherwise build. In a lazy chain no intermediate array exists either way, so the saving is smaller: one fewer pipeline stage and one fewer block call for each element that passes the filter.
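One way to check the difference in block invocations is to count them directly. In this sketch `compare_block_calls` is a hypothetical helper, and the counters exist only for the demonstration.

```ruby
# Hypothetical helper: runs both pipelines over `range` and returns each
# result together with the number of block calls it needed.
def compare_block_calls(range)
  two_stage_calls = 0
  two_stage = range.lazy
                   .select { |i| two_stage_calls += 1; i.even? }
                   .map    { |i| two_stage_calls += 1; i * 2 }
                   .force

  one_stage_calls = 0
  one_stage = range.lazy
                   .filter_map { |i| one_stage_calls += 1; i * 2 if i.even? }
                   .force

  { two_stage: [two_stage, two_stage_calls],
    one_stage: [one_stage, one_stage_calls] }
end

result = compare_block_calls(1..10)
puts result[:two_stage].inspect  # [[4, 8, 12, 16, 20], 15]
puts result[:one_stage].inspect  # [[4, 8, 12, 16, 20], 10]
```

The select block runs for all ten elements and the map block for the five that pass, giving fifteen calls against ten for the single filter_map block.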
When to reach for lazy evaluation
Use lazy when:
- Processing files too large for memory
- Working with infinite sequences
- Only needing the first N results from a large dataset
- Streaming data from external sources
Avoid lazy when:
- Working with small, fixed-size data
- You need all results anyway
- Performance is critical for small datasets
Frequently asked questions
When should I choose lazy enumerators over regular arrays?
Choose lazy enumerators when you only need part of the result set, when the input might be huge, or when the source could be infinite. Regular arrays are still better when the dataset is small and you know you need every result.
Do lazy enumerators save memory automatically?
They usually save memory because Ruby processes one element at a time instead of materializing the whole chain. The actual gain depends on the source and on whether your block creates large intermediate objects.
What is the most common mistake?
The most common mistake is using lazy evaluation everywhere by default. That makes simple code harder to read and can slow down small workloads. Treat lazy enumerators as a tool for the cases where they clearly help.
Summary
Lazy enumerators help Ruby process data step by step, which is especially useful for large files, streaming inputs, and infinite sequences. They are not a universal optimization, but they can make the difference between a pipeline that stops early and one that tries to build more data than memory can comfortably hold.
The practical rule is simple: use lazy enumerators when you need to defer work, limit a large stream, or stop after a few results. Use ordinary eager methods when the dataset is small and clarity matters more than saving a few allocations.
See Also
- Enumerable#reduce — Combining elements into a single value
- Enumerable#each_cons — Iterating consecutive elements
- Enumerable#chunk — Grouping consecutive elements