force_encoding

str.force_encoding(encoding)

Syntax

str.force_encoding(encoding)

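encoding is either a String naming the encoding (for example "UTF-8") or an Encoding object (for example Encoding::UTF_8). A Symbol is not accepted and raises TypeError. Returns self.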

What It Does

String#force_encoding assigns an encoding label to a string without modifying its underlying byte sequence. The bytes stay exactly as they are — only Ruby’s interpretation of those bytes changes.

Ruby stores strings as a sequence of bytes tagged with an encoding. That tag tells Ruby how to read the bytes as characters. Ruby does not detect the encoding of data from an external source (file, network, database); it tags the string with Encoding.default_external, or with ASCII-8BIT for binary reads, and that tag may not match the actual bytes. force_encoding lets you correct the label after the fact.
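As a minimal sketch of the "same bytes, different reading" idea, the snippet below relabels one two-byte string twice and counts the characters Ruby sees each time. The variable names are illustrative only.

```ruby
# The same two bytes, read under two different encoding labels.
s = "\xC3\xA9".b                  # String#b returns a binary (ASCII-8BIT) copy
before = s.bytes                  # [195, 169]

s.force_encoding("UTF-8")
utf8_chars = s.chars              # one two-byte character

s.force_encoding("ISO-8859-1")
latin1_chars = s.chars            # two one-byte characters

puts utf8_chars.length            # 1
puts latin1_chars.length          # 2
puts s.bytes == before            # true
```

The byte array never changes; only the number of characters Ruby reports does.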

Understanding Ruby’s Default Encoding Settings

Before reaching for force_encoding, understand Ruby’s global encoding settings — they affect every string your program handles.

Encoding.default_external is the encoding Ruby assigns to strings read from external sources (file I/O, standard input, etc.) when no encoding is given explicitly. Encoding.default_internal, when set, is the encoding Ruby transcodes that incoming data to; it defaults to nil, which means no transcoding takes place.

Encoding.default_external
# => #<Encoding:UTF-8>

Encoding.default_external = "ISO-8859-1"
Encoding.default_external
# => #<Encoding:ISO-8859-1>

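Set default_external at the top of your script before any I/O happens, or pass it on the command line with ruby -E UTF-8. The # encoding: magic comment is a separate setting: it declares the encoding of string literals in that source file and has no effect on I/O:

# encoding: utf-8
# The magic comment above covers string literals in this file only.
# For I/O, set the default programmatically:
Encoding.default_external = "UTF-8"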

A common pitfall: if default_external is set incorrectly (e.g., to ASCII-8BIT), Ruby’s I/O layer tags all incoming strings with that encoding. The bytes themselves are intact, but multi-byte characters are misread, and later operations can raise Encoding::CompatibilityError or produce mojibake. Use force_encoding to correct individual strings, or fix default_external to prevent the problem at the source.
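The following sketch shows how the tag on a string depends on how the file was read, not on the bytes in it. It assumes Encoding.default_internal is nil (the default), and the scratch file name is arbitrary.

```ruby
require "tempfile"

# Write the two UTF-8 bytes for "é" to a scratch file.
Tempfile.create("encoding-demo") do |f|
  f.binmode
  f.write("\xC3\xA9".b)
  f.flush

  default_tagged = File.read(f.path)                          # tagged with Encoding.default_external
  latin1_tagged  = File.read(f.path, encoding: "ISO-8859-1")  # tagged as requested
  binary_tagged  = File.binread(f.path)                       # always ASCII-8BIT

  puts default_tagged.encoding == Encoding.default_external   # true
  puts latin1_tagged.encoding                                 # ISO-8859-1
  puts binary_tagged.encoding == Encoding::BINARY             # true
  puts default_tagged.bytes == latin1_tagged.bytes            # true: same bytes, different labels
end
```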

Examples

Relabeling raw bytes as UTF-8

raw = "\xC3\xA9".b   # String#b returns a copy tagged ASCII-8BIT
raw.encoding
# => #<Encoding:ASCII-8BIT>

raw.force_encoding("UTF-8")
# => "é"
raw.encoding
# => #<Encoding:UTF-8>

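Accepts a String name or an Encoding object

"hello".force_encoding("UTF-8")
# => "hello"
"hello".force_encoding(Encoding::UTF_8)
# => "hello"
"hello".force_encoding(:UTF_8)
# => TypeError: no implicit conversion of Symbol into String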

Chainable — returns self

str = "\xE2\x80\x93".force_encoding("UTF-8")
str.valid_encoding?
# => true

Inspecting the encoding object

s = "café"
s.force_encoding("ISO-8859-1").encoding
# => #<Encoding:ISO-8859-1>

force_encoding vs. encode

These two methods look similar but behave very differently.

encode transcodes — it converts the actual byte content to match the target encoding:

utf8 = "resumé"
iso = utf8.encode("ISO-8859-1")
iso.bytes
# => [114, 101, 115, 117, 109, 233]

encode! is the in-place variant, modifying the string directly:

utf8 = "resumé"
utf8.encode!("ISO-8859-1")
utf8.bytes
# => [114, 101, 115, 117, 109, 233]

force_encoding changes only the label. The bytes remain untouched — they are just relabeled as a different encoding:

utf8 = "resumé"
iso = utf8.force_encoding("ISO-8859-1")  # same object as utf8, now relabeled
iso.bytes
# => [114, 101, 115, 117, 109, 195, 169]  # bytes unchanged, only relabeled
iso.valid_encoding?
# => true  # every byte is a valid ISO-8859-1 character
iso.encode("UTF-8")
# => "resumÃ©"  # the two UTF-8 bytes of é now read as two separate characters

force_encoding is re-labeling. encode is conversion.
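The two methods are often used together. A minimal sketch, assuming the bytes are known to be ISO-8859-1 but arrived tagged as UTF-8: relabel first so Ruby reads the bytes correctly, then transcode.

```ruby
# Bytes for "resumé" in ISO-8859-1, wrongly tagged as UTF-8.
mislabeled = "resum\xE9".b.force_encoding("UTF-8")
puts mislabeled.valid_encoding?        # false: 0xE9 on its own is not valid UTF-8

# Step 1: relabel with the encoding the bytes really use (bytes untouched).
# Step 2: transcode to UTF-8 (bytes rewritten).
repaired = mislabeled.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts repaired                          # resumé
puts repaired.bytes.inspect            # [114, 101, 115, 117, 109, 195, 169]
puts repaired.valid_encoding?          # true
```

Calling encode("UTF-8") on the mislabeled string directly would not help, because Ruby already believes the string is UTF-8.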

Gotchas and Edge Cases

Invalid encoding names raise ArgumentError

"hello".force_encoding("Fake-Encoding")
# => ArgumentError: unknown encoding name - Fake-Encoding

No validation — invalid sequences pass silently

raw = "\xFF".force_encoding("UTF-8")
raw.valid_encoding?
# => false

The string does not crash immediately, but downstream operations can fail. Regular expression matching raises ArgumentError (invalid byte sequence in UTF-8), and encode to another encoding raises Encoding::InvalidByteSequenceError. After force_encoding, use scrub to replace or remove invalid byte sequences:

raw = "\xFF\xFE".force_encoding("UTF-8")
raw.scrub
# => "��"  # one U+FFFD replacement character per invalid byte
raw.scrub("")
# => ""
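One way to package the relabel, validate, scrub sequence is a small helper. The method relabel_as_utf8 and its strict: option are hypothetical names for this sketch, not part of Ruby.

```ruby
# Hypothetical helper (not part of Ruby): relabel a copy as UTF-8,
# then either scrub invalid sequences or raise.
def relabel_as_utf8(str, strict: false)
  copy = str.dup.force_encoding(Encoding::UTF_8)  # dup leaves the caller's string alone
  return copy if copy.valid_encoding?
  raise ArgumentError, "input is not valid UTF-8" if strict

  copy.scrub("?")                                 # replace each invalid sequence with "?"
end

puts relabel_as_utf8("caf\xC3\xA9".b)             # café
puts relabel_as_utf8("caf\xFF".b)                 # caf?
```

Working on a copy keeps the helper usable with frozen input and avoids surprising the caller with a relabeled argument.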

Encoding incompatibility on concatenation

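Combining strings with incompatible encodings raises Encoding::CompatibilityError, but only when both strings contain non-ASCII bytes. An ASCII-only string is compatible with any ASCII-compatible encoding:

a = "\u00A9".force_encoding("ASCII-8BIT")
b = "café"
a + b
# => Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
a + "hello"
# => "\xC2\xA9hello"  # no error: "hello" is ASCII-only

Check compatibility first:

Encoding.compatible?(a, b)
# => nil
Encoding.compatible?(a, "hello")
# => #<Encoding:ASCII-8BIT>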
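A short sketch of checking compatibility and then relabeling before concatenation. It assumes the binary string is known to hold UTF-8 bytes; the variable names are illustrative.

```ruby
binary = "\xC3\xA9".b    # ASCII-8BIT, contains non-ASCII bytes
ascii  = "hello"         # UTF-8, ASCII-only
accent = "\u00E9"        # UTF-8, non-ASCII ("é")

p Encoding.compatible?(binary, ascii)    # an Encoding: an ASCII-only operand is always compatible
p Encoding.compatible?(binary, accent)   # nil: concatenation would raise

# Relabel a copy so both operands share an encoding before joining.
joined = binary.dup.force_encoding("UTF-8") + accent
puts joined                              # éé
```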

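Frozen strings raise FrozenError

force_encoding changes the receiver’s encoding tag in place, so Ruby treats it as a mutation. Calling it on a frozen string, including literals under # frozen_string_literal: true, raises FrozenError. Relabel an unfrozen copy instead:

frozen = "test".freeze
frozen.force_encoding("ASCII-8BIT")
# => FrozenError: can't modify frozen String
copy = frozen.dup.force_encoding("ASCII-8BIT")
copy.frozen?
# => false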

BOM handling

Byte Order Marks (BOMs) can cause confusion when reading files tagged as UTF-8. Ruby does not strip a BOM unless the file is opened with a BOM-aware mode such as "r:BOM|UTF-8", so after a binary read you have to remove it yourself:

content = File.binread("file_with_bom.txt").force_encoding("UTF-8")
content.valid_encoding?
# => true (BOM is technically valid UTF-8)
content.bytes.first(3)
# => [239, 187, 191]  # UTF-8 BOM
content = content.sub(/\A\uFEFF/, "")
# => content without BOM
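To have Ruby drop the BOM at read time, the file can be opened with the "BOM|UTF-8" encoding specifier. A minimal sketch using a scratch file (the file name is arbitrary):

```ruby
require "tempfile"

Tempfile.create("bom-demo") do |f|
  f.binmode
  f.write("\xEF\xBB\xBFhello".b)   # UTF-8 BOM followed by "hello"
  f.flush

  with_bom    = File.binread(f.path).force_encoding("UTF-8")
  without_bom = File.read(f.path, mode: "r:BOM|UTF-8")

  puts with_bom.bytes.first(3).inspect   # [239, 187, 191]
  puts without_bom                       # hello
  puts without_bom.encoding              # UTF-8
end
```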

CSV and Excel encoding edge case

CSV files exported from Excel on Windows are often encoded as Windows-1252, not UTF-8, so characters that are valid in one encoding are misread in the other. Relabel the bytes with their real encoding, then transcode to UTF-8 before parsing:

require "csv"
content = File.binread("excel_export.csv").force_encoding("Windows-1252").encode("UTF-8")
csv = CSV.parse(content)

If you skip the force_encoding("Windows-1252") step, the Windows-1252 bytes are treated as UTF-8. Accented characters then become invalid byte sequences, and encode cannot repair them because it trusts the existing label.

Common Use Cases

Reading binary file data

When you read a file opened in binary mode, Ruby tags it as ASCII-8BIT. If you know the actual encoding, relabel it:

data = File.binread("data.bin")
data.encoding
# => #<Encoding:ASCII-8BIT>
text = data.force_encoding("UTF-8")

Fixing mislabeled network data

require "socket"

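socket = TCPSocket.open("example.com", 80)
socket.write("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")  # send a request so the server replies
response = socket.readpartial(1024)
socket.close
# readpartial returns a binary (ASCII-8BIT) string
text = response.force_encoding("UTF-8")
text.valid_encoding?  # may be false if the 1024-byte cut splits a multi-byte character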

Preparing strings for comparison

a = "\xC0".force_encoding("UTF-8")
b = "\xC0".force_encoding("ISO-8859-1")
a == b
# => false  # same byte, but different encodings
b.force_encoding("UTF-8")
a == b
# => true  # same byte, same encoding
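A sketch of the two kinds of comparison that usually matter in practice: comparing raw bytes, and comparing text after transcoding to a shared encoding. The variable names are illustrative.

```ruby
a = "\xE9".b.force_encoding("UTF-8")        # invalid as UTF-8
b = "\xE9".b.force_encoding("ISO-8859-1")   # "é" in ISO-8859-1

puts a == b                    # false: encodings differ and the content is non-ASCII
puts a.bytes == b.bytes        # true: byte-for-byte identical
puts a.b == b.b                # true: both compared as binary copies

# To compare as text, transcode to a shared encoding first.
c = "\u00E9"                   # UTF-8 "é" (bytes 195, 169)
puts b.encode("UTF-8") == c    # true
```

Relabeling answers "are these the same bytes?", while encode answers "are these the same characters?".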

Return Value

Returns self, the same string object with the updated encoding tag. This allows chaining:

result = raw.force_encoding("UTF-8").valid_encoding?
# => true or false
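Because the return value is the receiver itself, assigning it to a new variable does not create a copy. A minimal sketch, with dup shown as one way to keep the original label:

```ruby
original = +"caf\xC3\xA9"             # unary plus gives an unfrozen string
returned = original.force_encoding("ISO-8859-1")

puts returned.equal?(original)        # true: same object
puts original.encoding                # ISO-8859-1: the receiver was relabeled

# dup first if the original label should survive.
source = +"caf\xC3\xA9"
copy   = source.dup.force_encoding("ISO-8859-1")
puts source.encoding                  # UTF-8
puts copy.encoding                    # ISO-8859-1
```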

Version Information

  • Ruby 1.9.0 introduced the encoding system that includes force_encoding.
  • No breaking changes in Ruby 2.0 through 3.3.

See Also