force_encoding

str.force_encoding(encoding)

Returns String (self)· Added in v1.9.0· Updated May 29, 2026· String Methods


Syntax

str.force_encoding(encoding)

encoding can be an encoding name given as a String (for example "UTF-8") or an Encoding object (for example Encoding::UTF_8). Returns self.

What it does

String#force_encoding assigns an encoding label to a string without modifying its underlying byte sequence. The bytes stay exactly as they are — only Ruby’s interpretation of those bytes changes.

Ruby stores strings as a sequence of bytes tagged with an encoding. That tag tells Ruby how to read the bytes as characters. When data comes from an external source (file, network, database), Ruby does not detect the real encoding; it applies a default tag, which is sometimes wrong. force_encoding lets you correct the label after the fact.
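To make the "same bytes, different reading" idea concrete, here is a minimal sketch. The helper name relabel_report is invented for this illustration; it works on a copy so the caller's string keeps its original tag.

```ruby
# Hypothetical helper: relabel a copy of `bytes` and report what changed.
def relabel_report(bytes, encoding_name)
  original  = bytes.dup                               # leave the caller's string untouched
  relabeled = original.force_encoding(encoding_name)  # mutates and returns `original`
  {
    same_object: relabeled.equal?(original),  # true: force_encoding returns self
    bytes:       relabeled.bytes,             # unchanged by the relabel
    encoding:    relabeled.encoding,
    length:      relabeled.length             # character count under the new tag
  }
end

binary = "caf\xC3\xA9".b   # five bytes tagged ASCII-8BIT
report = relabel_report(binary, "UTF-8")
report[:bytes] == binary.bytes  # true: the bytes are identical
report[:length]                 # 4, while binary.length is 5
```

The byte array is the same before and after; only the character count changes, because UTF-8 reads the last two bytes as a single character.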

Understanding Ruby’s default encoding settings

Before reaching for force_encoding, understand Ruby’s global encoding settings — they affect every string your program handles.

Encoding.default_external is the encoding Ruby assigns to strings read from external sources (file I/O, network, and so on) when no encoding is given explicitly. Encoding.default_internal, when set, is the encoding Ruby transcodes that incoming data to; it defaults to nil, which means no automatic transcoding.

Encoding.default_external
# => #<Encoding:UTF-8>

Encoding.default_external = "ISO-8859-1"
Encoding.default_external
# => #<Encoding:ISO-8859-1>

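Set default_external at the top of your script, before any file or network I/O happens, because Ruby uses it to tag every incoming string. The magic comment at the top of a source file is a different setting: it declares the encoding of the string literals in that file and does not change default_external:

# encoding: utf-8
# The magic comment above only sets the encoding of literals in this source file.
# default_external is set programmatically (or with the -E command-line option):
Encoding.default_external = "UTF-8"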

A common pitfall: if default_external is wrong for your data (for example ASCII-8BIT, or UTF-8 when the files are really ISO-8859-1), Ruby’s I/O layer tags every incoming string with that encoding. The bytes are not damaged, but multi-byte characters are misread until the label is corrected. Use force_encoding to correct individual strings, or fix default_external to prevent the problem at the source.
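When only one file has an unusual encoding, an alternative to changing the global default is to state the encoding on that read. The sketch below assumes a file known to be ISO-8859-1; read_as_utf8 is a hypothetical helper, and the temporary file exists only to make the example self-contained.

```ruby
require "tempfile"

# Hypothetical helper: read `path` as `external` and transcode the result to UTF-8.
# The "external:internal" form is the standard encoding syntax for File.open and File.read.
def read_as_utf8(path, external)
  File.read(path, encoding: "#{external}:UTF-8")
end

Tempfile.create("latin1") do |f|
  f.binmode
  f.write("caf\xE9".b)   # "café" encoded as ISO-8859-1 (4 bytes)
  f.flush

  text = read_as_utf8(f.path, "ISO-8859-1")
  text.encoding   # UTF-8
  text.bytes      # [99, 97, 102, 195, 169]
end
```

Because the encoding is given per call, Encoding.default_external stays untouched for the rest of the program.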

Examples

Relabeling raw bytes as UTF-8

raw = "caf\xC3\xA9".b
raw.encoding
# => #<Encoding:ASCII-8BIT>

raw.force_encoding("UTF-8")
# => "café"
raw.encoding
# => #<Encoding:UTF-8>

The example above relabels raw ASCII-8BIT bytes as UTF-8, changing only the encoding tag while the bytes stay identical. The encoding argument can be a name given as a String or an Encoding object:

Accepts a name or an Encoding object

"hello".force_encoding("UTF-8")
# => "hello"
"hello".force_encoding(Encoding::UTF_8)
# => "hello"

Both "UTF-8" and Encoding::UTF_8 select the same encoding; names are matched case-insensitively, so "utf-8" works too. A Symbol such as :UTF_8 is not accepted and raises TypeError. Because force_encoding returns self, you can chain it with other string methods to inspect the result without storing an intermediate variable:

Chainable — returns self

str = "\xE2\x80\x93".force_encoding("UTF-8")
str.valid_encoding?
# => true

Chaining valid_encoding? immediately confirms whether the relabeled bytes form valid characters in the target encoding — a quick sanity check that prevents silent corruption downstream. To examine the encoding object itself for its name, dummy status, or ASCII compatibility, call .encoding on the string:

Inspecting the encoding object

s = "café"
s.force_encoding("ISO-8859-1").encoding
# => #<Encoding:ISO-8859-1>

force_encoding vs. encode

These two methods look similar but behave very differently. Knowing when to relabel and when to transcode is one of the most important distinctions in Ruby’s encoding system.

encode transcodes — it converts the actual byte content to match the target encoding:

utf8 = "résumé"
iso = utf8.encode("ISO-8859-1")
iso.bytes
# => [114, 233, 115, 117, 109, 233]

Transcoding with encode creates a new string whose bytes differ from the original — the new bytes represent the same characters in the target encoding. The in-place variant encode! modifies the original string instead of allocating a new one, reducing object allocation when processing many strings:

utf8 = "résumé"
utf8.encode!("ISO-8859-1")
utf8.bytes
# => [114, 233, 115, 117, 109, 233]

Both encode and encode! actually transform the byte sequence itself by converting characters between encodings. In contrast, force_encoding changes only the label — the bytes remain untouched and are simply relabeled as a different encoding. Inspect the byte values to see the difference clearly:

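utf8 = "résumé"
iso = utf8.force_encoding("ISO-8859-1")  # relabels utf8 itself; iso is the same object
iso.bytes
# => [114, 195, 169, 115, 117, 109, 195, 169]  # bytes unchanged, just relabeled
iso.valid_encoding?
# => true  # every byte is a legal ISO-8859-1 character, so the mistake goes undetected
iso.encode("UTF-8")
# => "rÃ©sumÃ©"  # mojibake: each é is now read as two characters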

force_encoding is re-labeling — only the tag changes. encode is conversion — the bytes themselves change. Keeping this distinction clear is the key to avoiding encoding bugs in Ruby.
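A common repair combines both methods: relabel first so Ruby knows what the bytes mean, then transcode. The following is a minimal sketch rather than a general-purpose converter; to_utf8 is a hypothetical helper, and it assumes the caller really knows the source encoding.

```ruby
# Hypothetical helper: `bytes` hold text in `source_encoding` but carry the wrong tag.
def to_utf8(bytes, source_encoding)
  labeled = bytes.dup.force_encoding(source_encoding)  # step 1: fix the label, bytes unchanged
  labeled.encode(Encoding::UTF_8)                      # step 2: transcode, bytes change
end

latin1_bytes = "r\xE9sum\xE9".b   # "résumé" as ISO-8859-1, tagged ASCII-8BIT
fixed = to_utf8(latin1_bytes, "ISO-8859-1")
fixed.encoding   # UTF-8
fixed.bytes      # [114, 195, 169, 115, 117, 109, 195, 169]
```

Swapping the two steps does not work: encode trusts the current label, so the label has to be right before any transcoding happens.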

Gotchas and edge cases

Once you understand the relabel-versus-transcode distinction, the practical pitfalls become clearer. The most common issues are passing invalid encoding names, ignoring validation after relabeling, and mixing incompatible encodings during concatenation.

Invalid encoding names raise ArgumentError

"hello".force_encoding("Fake-Encoding")
# => ArgumentError: unknown encoding name - Fake-Encoding
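When encoding names come from configuration or user input, one option is to resolve them before relabeling anything. The sketch below relies on Encoding.find, which raises the same ArgumentError for unknown names; resolve_encoding is a made-up helper name.

```ruby
# Hypothetical helper: return the Encoding for `name`, or nil if Ruby does not know it.
def resolve_encoding(name)
  Encoding.find(name)
rescue ArgumentError
  nil
end

resolve_encoding("UTF-8")          # Encoding::UTF_8
resolve_encoding("Fake-Encoding")  # nil
```

The returned Encoding object can be passed straight to force_encoding.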

Ruby raises ArgumentError immediately for unknown encoding names, so this kind of mistake is caught early. A subtler problem: force_encoding performs no validation of the byte content — it labels any byte sequence as any encoding, even if the bytes are invalid for that encoding:

No validation — invalid sequences pass silently

raw = "\xFF".force_encoding("UTF-8")
raw.valid_encoding?
# => false

Nothing raises at the point of relabeling, but downstream operations can. Regular-expression matching against an invalid string raises ArgumentError, and transcoding it with encode raises Encoding::InvalidByteSequenceError. After force_encoding, use scrub to replace or remove invalid byte sequences:

raw = "\xFF\xFE".force_encoding("UTF-8")
raw.scrub
# => "��"  # one U+FFFD replacement character per invalid byte
raw.scrub("")
# => ""
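The relabel, validate, scrub sequence can be wrapped in one place. This is a sketch under the assumption that the input is meant to be UTF-8; clean_utf8 is a hypothetical helper, not part of Ruby.

```ruby
# Hypothetical helper: tag `bytes` as UTF-8 and replace anything that is not valid UTF-8.
def clean_utf8(bytes, replacement = "\uFFFD")
  text = bytes.dup.force_encoding(Encoding::UTF_8)  # relabel a copy
  text.valid_encoding? ? text : text.scrub(replacement)
end

clean_utf8("ok \xE2\x9C\x93".b)        # valid UTF-8, returned as "ok ✓"
clean_utf8("bad \xFF byte".b, "?")     # "bad ? byte"
```

Checking valid_encoding? first is optional, since scrub leaves valid strings unchanged; it is spelled out here to show the decision.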

Scrub the string before it reaches other parts of your program, so that later regex or encode calls cannot raise on it. Concatenation is another source of trouble:

Encoding incompatibility on concatenation

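Combining strings raises Encoding::CompatibilityError when their encodings differ and both contain non-ASCII bytes. If either string is ASCII-only, the concatenation succeeds:

a = "\u00A9".force_encoding("ASCII-8BIT")
b = "héllo"
a + b
# => Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
a + "hello"
# => "\xC2\xA9hello"  # works: "hello" is ASCII-only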

An unrescued Encoding::CompatibilityError aborts the current operation, which is risky in production code that processes unpredictable input from users or external systems. Rather than rescuing the exception, which can clutter your code, test the pair with Encoding.compatible? before attempting concatenation:

Encoding.compatible?("\xC0".b, "héllo")
# => nil
Encoding.compatible?("\xC0".b, "hello")
# => #<Encoding:ASCII-8BIT>
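One way to act on that check is to fall back to transcoding when the pair is incompatible. The sketch below assumes both strings carry correct labels and can be represented in UTF-8; concat_utf8 is a hypothetical helper.

```ruby
# Hypothetical helper: concatenate directly when Ruby allows it, otherwise
# transcode both sides to UTF-8 first. Raises if either side cannot be transcoded.
def concat_utf8(left, right)
  if Encoding.compatible?(left, right)
    left + right
  else
    left.encode(Encoding::UTF_8) + right.encode(Encoding::UTF_8)
  end
end

latin1 = "caf\xE9".b.force_encoding("ISO-8859-1")  # "café" correctly labeled as ISO-8859-1
utf8   = " d\u00E9j\u00E0"                         # " déjà" in UTF-8

Encoding.compatible?(latin1, utf8)  # nil: both sides contain non-ASCII characters
joined = concat_utf8(latin1, utf8)
joined.encoding                     # UTF-8
```

This does not help with ASCII-8BIT data that contains high bytes: binary strings have no character meaning to transcode, so they need force_encoding with the real encoding first.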

When Encoding.compatible? returns nil, the two strings cannot be concatenated as they are; transcode one of them with encode first. Frozen strings are a separate trap: although force_encoding leaves the bytes alone, it still mutates the receiver, so calling it on a frozen string raises FrozenError:

Frozen strings raise FrozenError

force_encoding changes the encoding tag in place, which counts as a modification. Relabel an unfrozen copy instead:

frozen = "test".freeze
frozen.force_encoding("ASCII-8BIT")
# => FrozenError: can't modify frozen String: "test"
copy = frozen.dup.force_encoding("ASCII-8BIT")
copy.encoding
# => #<Encoding:ASCII-8BIT>
frozen.encoding
# => #<Encoding:UTF-8>

The same applies to literals in files that use the frozen_string_literal: true magic comment; String#dup and String#b both return unfrozen copies. (Before Ruby 2.5 the error was a plain RuntimeError.) A related concern when reading files is the Byte Order Mark:

BOM handling

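A Byte Order Mark (BOM) at the start of a UTF-8 file survives a plain read: Ruby does not strip it unless the file is opened with a bom|utf-8 encoding. After force_encoding("UTF-8") the three BOM bytes are still the first character of the string: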

content = File.binread("file_with_bom.txt").force_encoding("UTF-8")
content.valid_encoding?
# => true (BOM is technically valid UTF-8)
content.bytes.first(3)
# => [239, 187, 191]  # UTF-8 BOM
content = content.sub(/\A\uFEFF/, "")
# => content without BOM
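As an alternative to stripping the BOM by hand, Ruby can drop it at read time when the encoding is given as bom|utf-8. The sketch below is self-contained thanks to a temporary file; read_utf8_without_bom is a hypothetical helper name.

```ruby
require "tempfile"

# Hypothetical helper: read UTF-8 text and let Ruby skip a leading BOM if one is present.
def read_utf8_without_bom(path)
  File.read(path, encoding: "bom|utf-8")
end

Tempfile.create("with_bom") do |f|
  f.binmode
  f.write("\xEF\xBB\xBFhello".b)   # UTF-8 BOM followed by "hello"
  f.flush

  text = read_utf8_without_bom(f.path)
  text.bytes.first(3)   # [104, 101, 108], the BOM is gone
  text.encoding         # UTF-8
end
```

Files without a BOM are read normally, so the same call works for both kinds of input.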

The three-byte BOM at the start of a UTF-8 file is valid UTF-8 (it encodes U+FEFF), so valid_encoding? returns true even though you probably do not want it in your processed text. Strip it before parsing or comparing the content; otherwise the first line or field carries an invisible extra character. Another real-world trap involves CSV files exported from Excel on Windows:

CSV and Excel encoding edge case

CSV files saved from Excel on Windows are often encoded in Windows-1252, not UTF-8. Read the raw bytes, label them as Windows-1252, and transcode to UTF-8 before parsing:

require "csv"
content = File.binread("excel_export.csv").force_encoding("Windows-1252")
csv = CSV.parse(content.encode("UTF-8"))

If you skip the force_encoding("Windows-1252") step, the bytes keep the wrong label. Tagged as UTF-8, accented characters become invalid byte sequences, and encode cannot repair them because it trusts the source label.

Common use cases

Reading binary file data

When you read a file opened in binary mode, Ruby tags it as ASCII-8BIT. If you know the actual encoding, relabel it:

data = File.binread("data.bin")
data.encoding
# => #<Encoding:ASCII-8BIT>
text = data.force_encoding("UTF-8")

Relabeling with force_encoding("UTF-8") is the right fix only when you know the bytes are UTF-8; check valid_encoding? afterwards if there is any doubt. The same issue affects network data, because raw socket reads also come back tagged as ASCII-8BIT:

Fixing mislabeled network data

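require "socket"

socket = TCPSocket.open("example.com", 80)
# Send a request first; otherwise the server has nothing to reply to and the read blocks.
socket.write("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
response = socket.readpartial(1024)
socket.close
# The raw bytes come back tagged as ASCII-8BIT
text = response.force_encoding("UTF-8")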
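A partial read can end in the middle of a multi-byte character, so relabeling each chunk separately may yield invalid strings. One approach, sketched here with in-memory chunks instead of a live socket, is to collect the bytes in a binary buffer and relabel once at the end. assemble_utf8 is a hypothetical helper, and the sketch assumes the payload is UTF-8.

```ruby
# Hypothetical helper: join binary chunks, then tag the complete payload as UTF-8.
def assemble_utf8(chunks)
  buffer = String.new(encoding: Encoding::BINARY)  # empty, mutable, ASCII-8BIT
  chunks.each { |chunk| buffer << chunk }
  buffer.force_encoding(Encoding::UTF_8)
end

chunks = ["caf\xC3".b, "\xA9".b]   # the two bytes of "é" arrive in separate reads

# Relabeled one by one, neither chunk is valid UTF-8:
chunks.map { |c| c.dup.force_encoding("UTF-8").valid_encoding? }  # [false, false]

text = assemble_utf8(chunks)
text.valid_encoding?   # true
text.length            # 4
```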

After receiving network data, force_encoding corrects the encoding tag so that slicing, regex matching, and comparison work on characters. Without this step Ruby treats every byte as one character, so length and slicing count bytes, and combining the data with non-ASCII UTF-8 text raises Encoding::CompatibilityError. When comparing strings that carry different encodings, relabel them to match:

Preparing strings for comparison

a = "\xC0".force_encoding("UTF-8")
b = "\xC0".force_encoding("ISO-8859-1")
a == b
# => false  (identical bytes, different encoding tags)

Strings with different encodings compare as unequal even when their bytes are identical, unless both are ASCII-only: Ruby treats the encoding tag as part of the string’s identity. For reliable comparisons, make sure both strings carry the same encoding first.
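Which comparison is right depends on whether you care about the stored bytes or the text they represent. The two hypothetical helpers below sketch both options; same_text? assumes each string is correctly labeled and valid, and raises otherwise.

```ruby
# Hypothetical helper: compare raw bytes, ignoring the encoding tags.
def same_bytes?(a, b)
  a.b == b.b   # String#b returns a binary-tagged copy
end

# Hypothetical helper: compare the characters, after transcoding both sides to UTF-8.
def same_text?(a, b)
  a.encode(Encoding::UTF_8) == b.encode(Encoding::UTF_8)
end

latin1 = "\xE9".b.force_encoding("ISO-8859-1")  # "é" as one ISO-8859-1 byte
utf8   = "\u00E9"                               # "é" as two UTF-8 bytes

latin1 == utf8              # false: different tags
same_bytes?(latin1, utf8)   # false: [233] versus [195, 169]
same_text?(latin1, utf8)    # true: both are "é"
```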

Return Value

Returns self, the same string object with the updated encoding tag. This allows chaining:

result = raw.force_encoding("UTF-8").valid_encoding?
# => true or false

Version Information

Ruby 1.9.0 introduced the encoding system that includes force_encoding.
No breaking changes in Ruby 2.0 through 3.3.