
String manipulation in Ruby: create, index, and format text

Strings are one of the most used data types in Ruby. Whether you are processing user input, building formatted output, or sanitizing data, string manipulation in Ruby is something you will do constantly. This guide covers the essential techniques for creating, accessing, modifying, and formatting strings in Ruby, with each section building on the one before it.

TL;DR

  • Create strings with double quotes for interpolation and single quotes for literal text.
  • Index with str[0] and slice with str[0, length], str[range], or regex patterns.
  • Use gsub to replace all occurrences and sub to replace only the first.
  • Clean edges with strip, lstrip, rstrip, and chomp.
  • Convert case with upcase, downcase, capitalize, and swapcase.
  • Split strings into arrays with split and join arrays back with join.

Intro context

String work is everywhere in Ruby because nearly every boundary between your program and the outside world passes through text. HTTP responses, CSV rows, command-line arguments, log lines, and configuration files all arrive as strings first. That makes string handling feel simple at the surface, but in practice it is one of the places where small mistakes turn into bugs quickly.

It helps to think of strings as the raw material you shape before the rest of your code can use it. Sometimes you only need to display the text as-is. Other times you need to trim it, split it, compare it, or convert it into numbers. Keeping that intent clear makes the methods in this guide easier to choose and easier to remember.

The same workflow also connects to Ruby arrays because split text usually becomes a list, and to file I/O in Ruby because text files are usually read into strings before you process them. Once you see strings as the starting point for those other tasks, the API feels much more natural.

How do I create strings in Ruby?

Ruby offers several ways to create strings, and each one has a specific trade-off between convenience and control:

# Double quotes, allows interpolation
name = "Alice"
greeting = "Hello, #{name}!"  # => "Hello, Alice!"

# Single quotes, literal, no interpolation
path = 'C:\Users\Alice'       # => "C:\\Users\\Alice"

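# %Q syntax (interpolates like double quotes; %q is the literal form)
message = %Q(Multi-line string
with "quotes" and #{name})

# Squiggly heredoc, strips the common leading indentation
heredoc = <<~TEXT
  Multi-line string
  without escaping
TEXT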

# String.new (rarely needed)
empty = String.new

The #{} interpolation syntax inside double quotes lets you embed Ruby expressions directly into a string. Single quotes keep almost everything literal (only \\ and \' are treated as escapes), which is useful for paths on Windows or strings containing #{ that should not be evaluated.

When you choose a string literal, you are also choosing how much Ruby should help you. Double quotes are useful when the text should be assembled from variables and expressions. Single quotes are better when the contents should stay exactly as typed, such as file paths or literal format strings. That small decision affects everything from display messages to file access.
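As a rough side-by-side of the literal forms discussed above, the sketch below builds the same text with double quotes, single quotes, %q, and a heredoc whose tag is single-quoted. The variable names are illustrative only.

```ruby
name = "Alice"

double  = "It's #{name}\n"     # interpolates and turns \n into a newline
single  = 'It\'s #{name}\n'    # only \' and \\ are escapes; the rest stays literal
literal = %q(It's #{name}\n)   # %q behaves like single quotes, so ' needs no escape

# Quoting the heredoc tag disables interpolation inside the body.
raw_doc = <<~'TEXT'
  Hello, #{name}
TEXT

puts double
puts single
puts literal
puts raw_doc
```

The single-quoted and %q versions hold exactly the same characters, so %q is mostly a convenience when the text itself contains apostrophes.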

How do I index and slice strings?

Ruby strings are sequences of characters, and each character has a numeric index. Understanding indexing is the first step toward extracting specific parts of a string without splitting the whole thing:

str = "hello"

str[0]   # => "h"
str[-1]  # => "o"  (negative indices count from the end)
str[1]   # => "e"
str[10]  # => nil  (out of bounds returns nil)

Older tutorials sometimes describe a different behavior: in Ruby 1.8, str[0] returned an Integer character code. Since Ruby 1.9, str[0] returns a single-character String, so str[i] and str[i, 1] behave consistently and both return values chain with other string methods:

# Ruby 1.9 and later
str = "hello"
str[0]     # => "h"  (was 104 in Ruby 1.8)
str[0, 1]  # => "h"  (substring of length 1)
str[0..2]  # => "hel"

That change matters mainly when you are maintaining very old code or reading examples written for Ruby 1.8. If you remember strings returning character codes in the past, the current behavior is easier to work with because it stays consistent with other substring operations. An in-bounds index always returns a string (out of bounds returns nil), which means you can chain additional string methods directly after the index without converting.
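A small sketch of what that chaining looks like in practice, including a guard for the nil that an out-of-bounds index returns. The variable names are made up for the example.

```ruby
str = "hello"

# Indexing returns a String, so string methods chain directly.
initial = str[0].upcase      # first character, upcased
rest    = str[1..]           # everything after it (endless range, Ruby 2.6+)
titled  = initial + rest

# Out-of-bounds indexing returns nil, so use &. before chaining.
missing = str[10]&.upcase    # nil instead of a NoMethodError

puts titled
p missing
```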

Slicing with []

The [] method supports several argument types for slicing, making it one of the most flexible accessors on Ruby strings:

str = "hello world"

str[6, 5]      # => "world"  (start index, length)
str[6..10]     # => "world"  (inclusive range)
str[6...11]    # => "world"  (exclusive range)
str["world"]   # => "world"  (substring search, returns nil if not found)
str[/wo+/]     # => "wo"     (regex match)

Ruby also provides slice and slice! for callers that prefer a named method over bracket notation. The key difference between them is mutability: slice returns a new string without touching the original, while slice! removes the extracted portion from the receiver and returns it. This makes slice! the right choice when you are consuming a string character by character in a parser or tokenizer:

str = "hello"
str.slice(0)      # => "h"   (returns a new string)
str               # => "hello"  (original unchanged)
str.slice!(0)     # => "h"
str               # => "ello"   (original mutated)

These access methods are most useful when you know the position you want, such as the first character, a prefix, or a short fixed-length segment. When the slice itself matters more than the rest of the string, they are often clearer than splitting the string into an array first.
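To make the "consume the input" idea concrete, here is a minimal tokenizer sketch built on slice!. The tokenize helper and its token rules (runs of digits, single-character operators, spaces skipped) are assumptions for illustration, not part of any library.

```ruby
# Hypothetical helper: splits a string of digits and operators into tokens.
def tokenize(input)
  buffer = input.dup            # work on a copy so the caller's string survives
  tokens = []
  until buffer.empty?
    if (number = buffer.slice!(/\A\d+/))   # remove a leading run of digits, if any
      tokens << number
    else
      char = buffer.slice!(0)              # otherwise remove exactly one character
      tokens << char unless char == " "
    end
  end
  tokens
end

source = "12 + 345*6"
tokens = tokenize(source)
p tokens
```

Because slice! returns nil when the pattern does not match, the if branch doubles as the "is there a number here?" check.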

How do I substitute text with gsub and sub?

Ruby gives you two main tools for find-and-replace operations on strings. gsub handles global substitution while sub limits itself to the first match.

gsub, global substitution

gsub replaces all occurrences of a pattern:

text = "foo bar foo baz foo"

text.gsub("foo", "qux")       # => "qux bar qux baz qux"
text.gsub("foo", "qux").object_id == text.object_id  # => false (returns new string)

That return value is easy to miss, but it is important. gsub is non-destructive: it returns a new string and leaves the receiver unchanged, so assign the result if you want to use it. In cleanup code that distinction keeps the original input available for logging, debugging, or later comparison.

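You can also use a hash to map multiple replacements at once. Each match is looked up as a key, and a match that is not in the hash is replaced with an empty string, so keep the pattern limited to the keys you defined:

replacements = { "foo" => "qux", "bar" => "quux" }
text.gsub(/foo|bar/, replacements)  # => "qux quux qux baz qux"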

With a block, gsub passes each matched string to your code and substitutes the block’s return value. This is powerful when the replacement is not a fixed string but depends on the matched content itself. For example, capitalizing every word in a sentence:

"hello world".gsub(/\w+/) { |word| word.capitalize }
# => "Hello World"

gsub is the method you want when the replacement should happen everywhere in the string. If you are normalizing user input, collapsing repeated spacing, or rewriting punctuation across an entire document, gsub handles the full sweep. The block form is particularly expressive because it lets you compute the replacement dynamically from the match itself.
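As a sketch of both uses mentioned above, the first call collapses runs of whitespace with a fixed replacement, and the second computes each replacement from the match in a block. The sample text is invented.

```ruby
raw = "total:  3   apples,   12 pears"

# Fixed replacement: every run of whitespace becomes one space.
collapsed = raw.gsub(/\s+/, " ")

# Block form: each run of digits is replaced by twice its numeric value.
doubled = collapsed.gsub(/\d+/) { |digits| (digits.to_i * 2).to_s }

puts collapsed
puts doubled
```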

sub, single substitution

sub replaces only the first occurrence:

"foo bar foo".sub("foo", "baz")  # => "baz bar foo"

Choose sub when only the first match should change, for example when you are normalizing a prefix or replacing a leading label. Using the narrower method keeps the code honest about intent and avoids accidental replacements in the rest of the text that might share the same pattern.
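For instance, a leading label can be rewritten without touching the same text later in the line. The log line below is invented, and the \A anchor is an extra safeguard that pins the match to the start of the string.

```ruby
line = "WARN: disk usage passed the WARN: threshold"

first_only = line.sub("WARN:", "WARNING:")      # changes the first occurrence only
anchored   = line.sub(/\AWARN:/, "WARNING:")    # changes it only if it is a true prefix
everywhere = line.gsub("WARN:", "WARNING:")     # for contrast: changes both

puts first_only
puts everywhere
```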

Destructive variants

Both gsub! and sub! modify the string in place. They return the modified string when at least one substitution was made, or nil if nothing was replaced:

str = "hello world"
str.gsub!("world", "ruby")
str  # => "hello ruby"

str = "foo bar foo"
str.sub!("foo", "baz")  # => "baz bar foo"
str  # => "baz bar foo"

The destructive versions are useful in tight loops or in code that deliberately mutates a buffer, but they also deserve more caution. If you are not sure whether a string is shared elsewhere, the non-destructive methods are usually safer. The nil return on no-match is a gotcha to watch for; it means you cannot safely chain a method call after sub! or gsub! without a guard.
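One way to handle that nil return is sketched below: branch on the return value, or fall back to the non-destructive form when you want to chain. The variable names are illustrative.

```ruby
str = "hello world".dup     # dup so the example also works with frozen string literals

# No match: sub! returns nil and leaves str untouched.
# Chaining here, as in str.sub!("planet", "ruby").upcase, would raise NoMethodError.
changed = str.sub!("planet", "ruby")

# Option 1: branch on the return value.
status = str.gsub!("world", "ruby") ? "changed" : "unchanged"

# Option 2: use the non-destructive form when you want to chain.
shout = str.sub("planet", "ruby").upcase

p changed
puts status
puts shout
```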

How do I handle whitespace with chomp and strip?

Cleaning up whitespace at the edges of strings is one of the most common string manipulation tasks in Ruby. Input from forms, files, and command-line tools almost always carries extra whitespace that needs removal before processing.

chomp

Removes trailing record separators (usually newlines) from a string:

"hello\n".chomp        # => "hello"
"hello\r\n".chomp     # => "hello"
"hello".chomp         # => "hello"  (no trailing newline, no change)

chomp removes the default record separator, but you can pass a custom string argument to remove a specific suffix instead. This is useful when cleaning up punctuation, stripping known file extensions, or removing a trailing delimiter from a data format:

"hello!".chomp("!")   # => "hello"

The destructive variant chomp! modifies in place. chomp is often the right choice when you are reading lines from a file or STDIN. It removes the trailing separator without stripping meaningful spaces from the middle of the string, which makes it better suited to text that should stay otherwise untouched.
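A short sketch of the line-reading case, using StringIO as a stand-in for a file or STDIN so the example is self-contained. The sample content is made up.

```ruby
require "stringio"

# StringIO stands in for a File or STDIN here.
input = StringIO.new("  indented\nplain\r\nlast")

# chomp drops the line ending but keeps the leading spaces on the first line.
lines = input.each_line.map(&:chomp)

# With an argument, chomp removes that exact suffix if it is present.
basename = "report.csv".chomp(".csv")

p lines
puts basename
```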

strip

Removes leading and trailing whitespace:

"  hello  ".strip          # => "hello"
"\thello\r\n".strip        # => "hello"

strip removes whitespace from both sides at once. When you only need to clean one side, lstrip handles leading whitespace and rstrip handles trailing whitespace. These directional methods are useful when one side of the string has meaningful content that should not be touched, such as indentation you want to preserve or a trailing marker:

"  hello  ".lstrip   # => "hello  "
"  hello  ".rstrip   # => "  hello"

These methods are small, but they show up constantly in input cleanup. If a form field or command-line argument has accidental leading or trailing spaces, trimming early prevents bugs later when comparisons or lookups fail.
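The lookup failure described above might look like this. The roles hash is a hypothetical table invented for the example.

```ruby
# Hypothetical lookup table keyed by clean role names.
roles = { "admin" => :full, "guest" => :read_only }

raw = "  admin \n"            # e.g. a form field with stray whitespace

untrimmed = roles[raw]        # nil: the padded string is a different key
trimmed   = roles[raw.strip]  # found once the edges are cleaned

p untrimmed
p trimmed
```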

How do I convert string case?

Ruby provides four case-conversion methods, each serving a different purpose in output formatting and input normalization:

"Hello World".upcase    # => "HELLO WORLD"
"HELLO world".downcase  # => "hello world"
"hello world".capitalize  # => "Hello world"  (first char up, rest down)
"Hello World".swapcase    # => "hELLO wORLD"

These methods return new strings by default and leave the original object unchanged. When you want to modify the original string in place instead, each method has a destructive bang variant that mutates the receiver directly. The bang variants return the modified string, or nil if no change was needed:

str = "Hello"
str.upcase!
str  # => "HELLO"

Case conversion is usually part of a larger workflow rather than the final step. You might downcase a username before comparing it, capitalize a title before displaying it, or swap case as a debugging trick to prove that you are looking at the right string. Each method is small, but choosing the right one keeps the surrounding code honest about what it is doing to the text.
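A minimal sketch of the username comparison mentioned above, with invented values. String#casecmp? is a built-in alternative when you only need a case-insensitive equality check.

```ruby
stored = "alice"
typed  = "  Alice "

# Normalize the input before comparing it to the stored value.
normalized_match = stored == typed.strip.downcase

# casecmp? compares case-insensitively without building a new string.
casecmp_match = stored.casecmp?("ALICE")

# Bang variants return nil when nothing changed, so avoid chaining on them.
already_upper = "HELLO".dup.upcase!

p normalized_match
p casecmp_match
p already_upper
```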

Splitting and joining

String manipulation in Ruby frequently moves between text and arrays. You split a string into pieces, transform each piece, and then join them back together.

split

Breaks a string into an array of substrings:

"apple,banana,cherry".split(",")
# => ["apple", "banana", "cherry"]

"one   two  three".split(/\s+/)
# => ["one", "two", "three"]

"hello".split("")
# => ["h", "e", "l", "l", "o"]

Splitting is one of the most common ways to turn text into structure. Once the string becomes an array, you can map over it, count it, filter it, or join it back together in a different shape. That makes split a bridge between plain text and data you can process step by step.

You can cap the number of pieces with a second argument (the limit counts returned elements, so 2 means a single split), which is useful when the tail of the string should stay intact:

"a,b,c,d".split(",", 2)
# => ["a", "b,c,d"]
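The limit is handy when the delimiter can also appear inside the value. The header-style line below is invented for illustration.

```ruby
line = "Location: https://example.com:8080/path"

# Split on the first colon only; later colons stay inside the value.
name, value = line.split(":", 2)
value = value.strip

puts name
puts value
```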

Once you have split a string into an array and processed the pieces, you often need to assemble them back into a single string. That is where join comes in. It is called on an array with an optional delimiter, inserting the delimiter between elements but never at the start or end of the result:

["apple", "banana", "cherry"].join(",")
# => "apple,banana,cherry"

["hello", "world"].join(" ")
# => "hello world"

["a", "b", "c"].join
# => "abc"

Joining is the matching step that turns processed parts back into a readable string. It is easy to think of split and join as opposites, but in practice they are both part of the same text-processing loop. You split, transform, and then join when the output should be human-readable again.

A common idiom is split followed by map and join to transform every element:

"one two three".split.map(&:capitalize).join(", ")
# => "One, Two, Three"

This pattern appears in formatting code, URL construction, and any place where you need to convert a delimited string into a different delimited format. The chain reads left to right: break apart, transform each piece, reassemble with a new separator.
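Two more instances of the same chain, sketched with invented inputs: building a URL-style slug from a title, and converting a semicolon-separated list into a comma-separated one.

```ruby
title = "  String Manipulation in Ruby  "

# Break apart on whitespace, then reassemble with hyphens.
slug = title.strip.downcase.split.join("-")

# Break apart on semicolons, transform each piece, reassemble with commas.
names = "alice;bob;carol".split(";").map(&:capitalize).join(", ")

puts slug
puts names
```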

How do I use string interpolation?

Interpolation lets you embed Ruby expressions directly inside a string. It works in double-quoted strings and their equivalents (%Q and unquoted heredocs), but not in single quotes:

name = "Alice"
age = 30

"#{name} is #{age} years old"
# => "Alice is 30 years old"

'#{name} is #{age} years old'
# => "\#{name} is \#{age} years old"  (literal, no interpolation)

Interpolation is not limited to simple variable substitution. You can call methods on objects, perform arithmetic, or embed any Ruby expression inside the #{} braces. The expression is evaluated, converted to a string with to_s, and inserted at that position:

"I have #{[1, 2, 3].length} items"
# => "I have 3 items"

"Sum: #{1 + 2 + 3}"
# => "Sum: 6"

Interpolation is convenient, but it can also hide logic inside what looks like display code. If the expression inside #{} becomes too long, consider pulling it into a variable first. That keeps the string readable and makes the calculation easier to test on its own.
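The sketch below contrasts an inline calculation with one pulled into a variable first, and also shows interpolation calling to_s on a custom object. The Point struct and the sample prices are hypothetical.

```ruby
# Hypothetical value object; interpolation calls its to_s.
Point = Struct.new(:x, :y) do
  def to_s
    "(#{x}, #{y})"
  end
end

prices = [4.5, 12.25, 3.0]

# Harder to read: the calculation is buried in the string.
inline = "Average: #{(prices.sum / prices.length).round(2)}"

# Easier to test: compute first, then interpolate the result.
average   = (prices.sum / prices.length).round(2)
extracted = "Average: #{average}"

origin = "Origin: #{Point.new(0, 0)}"

puts extracted
puts origin
```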

For cleaner output with numbers, Kernel#format or String#% is often preferable to inline arithmetic:

price = 19.99
format("Price: $%.2f", price)
# => "Price: $19.99"

"Price: $#{'%.2f' % price}"
# => "Price: $19.99"

Formatted output becomes easier to maintain when the template is obvious. Instead of mixing arithmetic into the string itself, let the formatting method handle presentation and keep the data preparation separate.
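As an illustration of keeping the template separate from the data, this sketch uses width and precision flags to line up a small price list, plus named references for a sentence-style template. The item data is invented.

```ruby
items = [["Coffee", 3.5], ["Sandwich", 12.0]]

# %-10s left-aligns the name in 10 columns; %8.2f right-aligns the price in 8.
rows = items.map { |name, price| format("%-10s %8.2f", name, price) }

# Named references keep longer templates readable.
summary = format("%<name>s owes %<amount>.2f", name: "Alice", amount: 5)

puts rows
puts summary
```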

Encoding and string manipulation

Ruby strings carry an encoding. Misaligned encodings are a common source of errors when working with non-ASCII text or file I/O. Strings that travel between systems can pick up encoding problems quickly, especially when files, databases, and APIs do not agree on the same character set.

Checking and setting encoding

str = "café"
str.encoding          # => #<Encoding:UTF-8>

str.encode("ISO-8859-1")
# => "caf\xE9"  (in ISO-8859-1 encoding)

encode performs a full transcoding from the current encoding to the target. This is the safe path when you need to pass a string to a system that expects a specific encoding. It raises Encoding::UndefinedConversionError if the string contains characters that cannot be represented in the target encoding.
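A sketch of both outcomes, assuming the source file is UTF-8 (Ruby's default): a string that fits in ISO-8859-1, and one that does not. The undef: and replace: options are standard String#encode options for substituting a placeholder instead of raising.

```ruby
text = "café ☕"   # the coffee-cup character has no ISO-8859-1 equivalent

# Transcoding changes the bytes: é is one byte (233) in ISO-8859-1.
latin1_bytes = "café".encode("ISO-8859-1").bytes

# Unrepresentable characters raise by default.
error = nil
begin
  text.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  error = e.class
end

# Or substitute a placeholder for anything that cannot be converted.
fallback = text.encode("ISO-8859-1", undef: :replace, replace: "?")

p latin1_bytes
p error
p fallback.bytes
```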

Force encoding

raw = "café".b  # .b returns a copy tagged as binary (ASCII-8BIT)
raw.encoding    # => #<Encoding:ASCII-8BIT>

force_encoding changes the encoding tag on the string without modifying the underlying bytes, and it does so in place; .b is the copying shortcut for the binary case. Retagging is lightweight but only safe when you already know the bytes are valid in the target encoding. If the encoding assumption is wrong, the string may contain invalid byte sequences that cause errors later.
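The difference between retagging and transcoding can be seen by comparing bytes before and after. In this sketch the binary string stands in for data read from a socket or file, and the second case shows a wrong assumption being caught by valid_encoding?.

```ruby
# Binary-tagged bytes that happen to be UTF-8 for "café".
bytes = "caf\xC3\xA9".b

# Retag a copy as UTF-8: same bytes, different interpretation.
text = bytes.dup.force_encoding("UTF-8")

puts bytes.length    # counts bytes while tagged as binary
puts text.length     # counts characters once tagged as UTF-8

# Wrong assumption: these are ISO-8859-1 bytes, not UTF-8.
wrong = "caf\xE9".b.force_encoding("UTF-8")
p wrong.valid_encoding?
```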

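A common place encoding problems surface is during file I/O. Reading a file without specifying the encoding usually succeeds, but the resulting string is tagged with the default external encoding. If the bytes do not match it, later operations can raise ArgumentError (invalid byte sequence) or Encoding::CompatibilityError when the string is combined with text in another encoding:

File.read("data.txt", encoding: "UTF-8")
# or, equivalently, as part of the mode string
File.read("data.txt", mode: "r:UTF-8")  # read and tag the contents as UTF-8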

When you know your application will primarily work with UTF-8 text, setting the default external encoding at startup is a one-line change that makes files and other IO streams opened without an explicit encoding be read as UTF-8. It does not alter strings that already exist, so data from other sources may still need checking:

Encoding.default_external = Encoding::UTF_8

Even with the correct encoding set, incoming data can contain invalid byte sequences from corrupted transfers or mixed-encoding sources. The valid_encoding? method lets you check a string before passing it through encoding-sensitive operations, giving you a chance to fix, scrub, or reject the input gracefully. This check is cheap and is worth running on any string that crossed a network boundary:

"café".valid_encoding?   # => true
"\xff".valid_encoding?  # => false (invalid UTF-8 byte)

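When the check fails and you would rather repair than reject, String#scrub replaces each invalid byte sequence. A minimal sketch with an invented input:

```ruby
dirty = "caf\xFF latte"    # \xFF is never valid in UTF-8

valid_before = dirty.valid_encoding?

# Replace invalid bytes with a marker of your choice...
clean = dirty.scrub("?")

# ...or with the default, the Unicode replacement character U+FFFD.
default_clean = dirty.scrub

p valid_before
puts clean
p clean.valid_encoding?
```

scrub returns a new string, so the original bytes remain available if you want to log them.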
Frequently asked questions about string manipulation

What is the difference between single and double quotes in Ruby?

Double quotes support string interpolation with #{} and escape sequences like \n. Single quotes keep almost everything literal (only \\ and \' are treated as escapes), which makes them suitable for file paths, regex patterns, and any string where #{ or \ should stay as typed.

When should I use gsub instead of sub?

Use gsub when you want to replace every occurrence of a pattern in the string. Use sub when only the first match should change. The choice communicates intent: gsub says “clean the whole thing” while sub says “fix the first one only.”

How do I handle encoding errors during string manipulation?

Check the encoding with .encoding first, then decide whether to transcode with .encode or simply tag the bytes with .force_encoding. Always validate with .valid_encoding? before passing a string to encoding-sensitive code, and set Encoding.default_external early in your program to avoid surprises with file I/O.
