String#encoding

Updated May 30, 2026· String Methods


Basic Inspection

Every Ruby string carries an encoding. The encoding method tells you which one:

str = "Hello"
str.encoding
# => #<Encoding:UTF-8>

Literals created in source files inherit the script’s encoding, which is UTF-8 by default in Ruby 2.0 and later. The encoding method returns an Encoding object that describes how Ruby interprets the bytes, which affects comparison, concatenation, and I/O operations. Checking the encoding is a useful first step when debugging text that doesn’t display as expected.

A string created with String.new and no arguments is the exception: it gets ASCII-8BIT (binary) encoding instead of the script encoding. Ruby 3.4 and later print this encoding as #<Encoding:BINARY (ASCII-8BIT)>:

empty = String.new
empty.encoding
# => #<Encoding:ASCII-8BIT>
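As a rough sketch of the rule above, the snippet below compares a few ways of creating a string. The creation_encodings helper and its hash keys are names invented for this illustration; the literal entry assumes the file uses Ruby's default UTF-8 source encoding.

```ruby
# Hypothetical helper: collects the encoding produced by several ways of
# creating a string.
def creation_encodings
  {
    literal:      "abc".encoding,                                  # script (source) encoding
    string_new:   String.new.encoding,                             # binary when called with no arguments
    new_with_arg: String.new("abc").encoding,                      # copied from the argument
    new_with_enc: String.new(encoding: Encoding::UTF_8).encoding,  # requested explicitly
    binary_copy:  "abc".b.encoding                                 # String#b returns a binary copy
  }
end

creation_encodings.each { |how, enc| puts "#{how}: #{enc.name}" }
```

If you build up text in an initially empty buffer, passing encoding: to String.new avoids starting from a binary string.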

The encoding determines how Ruby interprets a string’s bytes, and with that whether comparison and concatenation succeed. Two strings can hold exactly the same bytes and still be unequal because their encoding labels differ:

a = "é"
b = "é".force_encoding("ISO-8859-1")

a == b
# => false
a.encoding
# => #<Encoding:UTF-8>
b.encoding
# => #<Encoding:ISO-8859-1>

Both strings hold the same two bytes (0xC3 0xA9), because force_encoding changes only the label. Read as UTF-8, those bytes are the single character “é”; read as ISO-8859-1, they are the two characters “Ã©”. When both strings contain non-ASCII bytes and their encodings differ, Ruby treats them as unequal no matter what the bytes are. This explains why encoding-aware operations like comparison and concatenation can fail even when the underlying data matches.
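To see what each string actually contains, it helps to look at the bytes. The sketch below contrasts re-labeling with real conversion; relabel_latin1 and transcode_latin1 are hypothetical helper names, and "\u00E9" is é written as an escape so the example does not depend on the file's own encoding.

```ruby
# Hypothetical helpers contrasting a label change with a real conversion.
def relabel_latin1(str)
  str.dup.force_encoding(Encoding::ISO_8859_1)  # same bytes, new label
end

def transcode_latin1(str)
  str.encode(Encoding::ISO_8859_1)              # new bytes, same character
end

utf8      = "\u00E9"
relabeled = relabel_latin1(utf8)
converted = transcode_latin1(utf8)

p utf8.bytes                          # => [195, 169]
p relabeled.bytes                     # => [195, 169]
p converted.bytes                     # => [233]
p relabeled.length                    # => 2, the two bytes read as two Latin-1 characters
p converted.encode("UTF-8") == utf8   # => true
```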

Transcoding with `encode`

The encode method converts a string’s bytes to a different encoding:

utf8_string = "Résumé"
utf8_string.encoding
# => #<Encoding:UTF-8>

iso_string = utf8_string.encode("ISO-8859-1")
iso_string.encoding
# => #<Encoding:ISO-8859-1>

encode actually transcodes the bytes. UTF-8 uses multiple bytes for non-ASCII characters, while ISO-8859-1 uses one byte per character, so the conversion works character by character: each character is decoded from the source encoding and written out again in the target encoding. This matters when exporting data to systems that require a specific encoding, such as a legacy database, a fixed-width file format, or an API that only accepts ASCII. Transcoding early in the data pipeline avoids encoding errors later.
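One way to confirm that a conversion really happened is to compare character and byte counts before and after. This is a small sketch; byte_report is a hypothetical helper introduced only for the illustration.

```ruby
# Hypothetical helper: summarizes how a string is stored.
def byte_report(str)
  { encoding: str.encoding.name, chars: str.length, bytes: str.bytesize }
end

utf8   = "R\u00E9sum\u00E9"          # "Résumé"
latin1 = utf8.encode("ISO-8859-1")

before = byte_report(utf8)
after  = byte_report(latin1)

puts "#{before[:encoding]}: #{before[:chars]} chars, #{before[:bytes]} bytes"  # 6 chars, 8 bytes
puts "#{after[:encoding]}: #{after[:chars]} chars, #{after[:bytes]} bytes"     # 6 chars, 6 bytes

# Converting back yields a string equal to the original.
puts latin1.encode("UTF-8") == utf8  # true
```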

Handling invalid or undefined characters

When transcoding fails because a character can’t be represented in the target encoding, encode raises an exception by default. You can control this behavior with keyword arguments:

# Replace characters that can't be encoded
str = "R\u00E9sum\u00E9"  # "Résumé", written with Unicode escapes
str.encode("ASCII", invalid: :replace, undef: :replace, replace: "?")
# => "R?sum?"

The available options are:

invalid: what to do with byte sequences that are invalid in the source encoding (:replace substitutes the replacement string)
undef: what to do with characters that have no equivalent in the target encoding (:replace substitutes the replacement string)
replace: the replacement string to use (defaults to "\uFFFD" when the target is a Unicode encoding and to "?" otherwise)
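The options above can be sketched as follows. Both to_ascii and binary_to_utf8 are hypothetical helpers written for this example, not part of Ruby; the second one shows that the default replacement is U+FFFD when the target is a Unicode encoding.

```ruby
# Hypothetical helper: lossy conversion to ASCII with a configurable replacement.
def to_ascii(str, replacement = "?")
  str.encode("US-ASCII", invalid: :replace, undef: :replace, replace: replacement)
end

# Hypothetical helper: converts binary data to UTF-8 and relies on the default
# replacement string for bytes that have no UTF-8 equivalent.
def binary_to_utf8(bytes)
  bytes.b.encode("UTF-8", invalid: :replace, undef: :replace)
end

puts to_ascii("R\u00E9sum\u00E9")           # R?sum?
puts to_ascii("R\u00E9sum\u00E9", "")       # Rsum
p binary_to_utf8("ok\xFF") == "ok\uFFFD"    # true
```

Passing an empty replacement string drops the offending characters entirely, which is sometimes preferable to a visible placeholder.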

Without these options, encode raises Encoding::InvalidByteSequenceError when the source contains bytes that are invalid in its own encoding, and Encoding::UndefinedConversionError when a character has no equivalent in the target encoding. Choosing the right option depends on whether you can tolerate data loss or need strict validation. For user-facing text, replacement is usually acceptable; for data integrity, raising an error is safer.
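When strict behavior is preferred, the two exceptions can be rescued separately. The try_encode wrapper below is a hypothetical helper for illustration; it returns a status symbol along with either the converted string or the error message.

```ruby
# Hypothetical wrapper that reports which kind of failure occurred.
def try_encode(str, target)
  [:ok, str.encode(target)]
rescue Encoding::UndefinedConversionError => e
  [:undefined, e.message]   # character does not exist in the target encoding
rescue Encoding::InvalidByteSequenceError => e
  [:invalid, e.message]     # source bytes are malformed in the source encoding
end

p try_encode("R\u00E9sum\u00E9", "ISO-8859-1").first  # => :ok
p try_encode("snow \u2603", "ISO-8859-1").first       # => :undefined (no snowman in Latin-1)
p try_encode("bad \xFF", "UTF-16LE").first            # => :invalid (0xFF is never valid UTF-8)
```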

Re-labeling with `force_encoding`

force_encoding does something different. It changes the encoding label without touching the bytes:

raw = "\xC3\xA9".b            # the UTF-8 bytes for "é", tagged as binary
raw.force_encoding("UTF-8")   # same bytes, new label
raw.encoding
# => #<Encoding:UTF-8>
raw
# => "é"

The bytes stay the same. Ruby just starts interpreting them as UTF-8. This is useful when you know the actual encoding of some data but the string carries a different label, as often happens with bytes read from a socket or a binary file. No conversion happens during re-labeling, so you are responsible for ensuring the bytes are valid for the encoding you assign.

The danger is that force_encoding can create invalid strings:

raw = "\xFF".force_encoding("UTF-8")
raw.valid_encoding?
# => false

A string whose bytes do not match its label can cause confusing errors far from the point where the label was applied. Before passing a re-labeled string to other methods, verify that its bytes are valid for the encoding you just assigned.
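A common pattern combines both methods: re-label bytes with the encoding they are actually in, then transcode to the encoding you want. The sketch below assumes the input is known to be ISO-8859-1; latin1_bytes_to_utf8 is a hypothetical helper name.

```ruby
# Hypothetical helper: the caller asserts that the bytes are ISO-8859-1.
def latin1_bytes_to_utf8(bytes)
  labeled = bytes.dup.force_encoding(Encoding::ISO_8859_1)  # label only, no conversion
  labeled.encode(Encoding::UTF_8)                           # real conversion
end

raw  = "caf\xE9".b                  # Latin-1 bytes that arrived tagged as binary
text = latin1_bytes_to_utf8(raw)

p text                   # => "café"
p text.valid_encoding?   # => true
p raw.bytes == "caf\xE9".b.bytes   # => true, the input is left untouched
```

Working on a dup matters here because force_encoding modifies its receiver in place.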

Checking Validity

Use valid_encoding? to check whether a string’s bytes are valid for its encoding:

valid = "hello"
valid.valid_encoding?
# => true

invalid = "\xFF".force_encoding("UTF-8")
invalid.valid_encoding?
# => false

This matters when you read data from external sources like files, databases, or network sockets. Always validate before processing to prevent encoding errors from propagating through your application. A quick check at the point of entry saves time debugging issues later in the data pipeline.
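A check at the point of entry might look like the sketch below. It assumes incoming data is expected to be UTF-8; ensure_utf8 and its strict flag are hypothetical, and String#scrub is used as one possible way to repair invalid bytes rather than the only one.

```ruby
# Hypothetical entry-point check for bytes that are expected to be UTF-8.
def ensure_utf8(bytes, strict: false)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  raise ArgumentError, "input is not valid UTF-8" if strict

  text.scrub("?")  # replace each invalid byte sequence with "?"
end

p ensure_utf8("caf\xC3\xA9".b)   # => "café"
p ensure_utf8("ok\xFF".b)        # => "ok?"
```

Strict mode suits data where silent repair would hide a real problem; the lenient path suits logs and other display-only text.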

Encoding Compatibility

When you concatenate two strings whose encodings differ and that both contain non-ASCII bytes, Ruby raises an error:

a = "\u00A9".force_encoding("ASCII-8BIT")
b = "héllo"

a + b
# raises Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8

Roughly speaking, Ruby considers two strings compatible when they share an encoding, or when both encodings are ASCII-compatible and at least one of the strings contains only ASCII characters. That is why appending plain "hello" to binary data works while appending "héllo" does not: an ASCII-8BIT (binary) string can hold arbitrary byte values, and Ruby will not guess how to combine them with non-ASCII text. Checking compatibility before concatenation helps avoid runtime errors.

Use Encoding.compatible? to check before concatenating. It returns the encoding the result would have, or nil when the strings cannot be combined:

a = "\xC0".force_encoding("ASCII-8BIT")
b = "héllo"

Encoding.compatible?(a, b)
# => nil (not compatible)
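Compatibility depends on the strings' contents as well as their labels: an ASCII-only string can be combined with a string in any ASCII-compatible encoding. The sketch below shows both outcomes; concat_if_compatible is a hypothetical helper that returns nil instead of raising.

```ruby
# Hypothetical helper: concatenates only when Ruby reports the pair as compatible.
def concat_if_compatible(a, b)
  return nil unless Encoding.compatible?(a, b)

  a + b
end

binary   = "\xC0".b       # non-ASCII byte, binary label
ascii    = "hello"        # ASCII-only text
accented = "h\u00E9llo"   # UTF-8 text with a non-ASCII character

p Encoding.compatible?(binary, ascii) == Encoding::ASCII_8BIT  # => true
p Encoding.compatible?(binary, accented)                       # => nil
p concat_if_compatible(binary, ascii).bytesize                 # => 6
p concat_if_compatible(binary, accented)                       # => nil
```

When the check returns nil, the usual remedy is to transcode or re-label one side so both strings share an encoding before joining them.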

Common encoding constants

Ruby provides named constants for common encodings, so you can pass an encoding without remembering its exact string name. Methods such as encode and force_encoding accept either form. These constants are available anywhere in your program:

Encoding::UTF_8
Encoding::ASCII_8BIT
Encoding::ISO_8859_1
Encoding::Windows_1252

You can also look up encodings by name, which is useful when the name comes from configuration or an external system. Encoding.find accepts a string, matches it case-insensitively, and returns the corresponding Encoding object; it raises ArgumentError if the name is unknown:

Encoding.find("UTF-8")
# => #<Encoding:UTF-8>
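When the name comes from outside your program, a small wrapper can supply a fallback. This is a sketch only: lookup_encoding and its default keyword are hypothetical, and falling back to UTF-8 is an assumption that may not suit every application.

```ruby
# Hypothetical wrapper: resolves an encoding name, falling back to a default
# when Ruby does not recognize it.
def lookup_encoding(name, default: Encoding::UTF_8)
  Encoding.find(name)
rescue ArgumentError
  default
end

p lookup_encoding("utf-8") == Encoding::UTF_8            # => true, names are case-insensitive
p lookup_encoding("ASCII") == Encoding::US_ASCII         # => true, "ASCII" is an alias
p lookup_encoding("CP1252") == Encoding::Windows_1252    # => true, "CP1252" is an alias
p lookup_encoding("no-such-encoding") == Encoding::UTF_8 # => true, fell back to the default
```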