String#valid_encoding?

str.valid_encoding? → true or false

Returns Boolean· Added in v1.9.1· Updated May 30, 2026· String Methods


What `valid_encoding?` Checks

String#valid_encoding? returns true if the byte sequence is valid for its currently assigned encoding. It returns false if the bytes form an invalid sequence.

The method checks the bytes against the string's current encoding, not against some intended or assumed encoding. It never raises an exception and never modifies the string.

str = "hello"
str.encoding        # => #<Encoding:UTF-8>
str.valid_encoding? # => true

The return value comes from checking the string's bytes against the rules of its encoding. When the encoding is UTF-8, the method verifies that every lead byte is one UTF-8 allows and that each multi-byte sequence is followed by the right number of continuation bytes. This is a fast, in-memory check that does not require any external validation or ICU library calls.
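To illustrate that validity is judged against the current label rather than the bytes alone, here is a minimal sketch that tags the same single byte two different ways. The variable names are only illustrative.

```ruby
# One byte, 0xE9, tagged as binary to start with.
byte = "\xE9".b

# Same byte, two different encoding labels (dup so each copy is independent).
latin1 = byte.dup.force_encoding("ISO-8859-1")
utf8   = byte.dup.force_encoding("UTF-8")

latin1_ok = latin1.valid_encoding?  # 0xE9 is "é" in ISO-8859-1
utf8_ok   = utf8.valid_encoding?    # in UTF-8, 0xE9 is a lead byte that needs continuation bytes

puts latin1_ok                      # true
puts utf8_ok                        # false
puts latin1.bytes == utf8.bytes     # true: only the label differs
```

The bytes never change; only the rules they are checked against do.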

Valid and invalid byte sequences

UTF-8 strings with properly formed characters return true. This includes ASCII text, accented characters, and CJK characters.

"hello".valid_encoding?          # => true
"héllo".valid_encoding?          # => true
"日本語".valid_encoding?         # => true

These all return true because every byte belongs to a well-formed UTF-8 sequence. ASCII characters use a single byte, accented Latin letters use two, and most CJK characters use three (some rarer ones use four). As long as every byte in the sequence follows the encoding specification, the method returns true.

Invalid byte sequences return false. This happens when a multi-byte character is truncated, or when bytes that are invalid in UTF-8 appear in the string.

# \xc2 starts a 2-byte UTF-8 character but has no continuation byte
"\xc2".force_encoding("UTF-8").valid_encoding?       # => false

# Completely invalid UTF-8 start bytes
"bad\xff\xfefood".force_encoding("UTF-8").valid_encoding?  # => false

A byte like \xff can never start a valid UTF-8 character, so valid_encoding? returns false immediately upon encountering it. The check short-circuits at the first invalid position rather than scanning the remaining bytes, which keeps the method fast even on long strings with early corruption.
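valid_encoding? only answers yes or no. If you also want to know where the first problem sits, one possible approach is to walk the string character by character. The helper below, first_invalid_byte_offset, is a hypothetical name introduced for this sketch, not part of Ruby's API; it assumes each_char hands back invalid bytes as their own invalid chunks.

```ruby
# Hypothetical helper (not a Ruby built-in): returns the byte offset of the
# first invalid chunk, or nil when the whole string is valid.
def first_invalid_byte_offset(str)
  offset = 0
  str.each_char do |ch|
    # Each chunk carries the parent's encoding, so it can be checked on its own.
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

broken = "bad\xff\xfefood".dup.force_encoding("UTF-8")

puts first_invalid_byte_offset(broken)          # 3
puts first_invalid_byte_offset("hello").inspect # nil
```

This is slower than valid_encoding? itself, so it makes sense only after the quick check has already returned false.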

The full sequence "\xc2\xa1" (which represents the character \u00a1) is valid UTF-8:

"\xc2\xa1".force_encoding("UTF-8").valid_encoding?  # => true

But if you truncate it by removing the continuation byte \xa1, validity fails:

"\xc2".force_encoding("UTF-8").valid_encoding?     # => false

The difference between these two cases shows how multi-byte encoding works at the byte level. A complete UTF-8 sequence includes the right number of continuation bytes after the lead byte. When one of those bytes is missing, the entire character becomes invalid and valid_encoding? returns false.

The ASCII-8BIT Gotcha

Strings tagged with ASCII-8BIT (also called BINARY) always return true, because every byte value from 0x00 to 0xFF is valid in that encoding. This can be misleading when you’re dealing with arbitrary binary data: a true result says nothing about whether the bytes form text.

raw = "\xde\xad\xbe\xef".b  # .b returns a copy tagged ASCII-8BIT
raw.valid_encoding?          # => true  (any byte is valid in ASCII-8BIT)

raw.encoding                 # => #<Encoding:ASCII-8BIT>  (Ruby 3.4 and later print #<Encoding:BINARY (ASCII-8BIT)>)

Remember that ASCII-8BIT is effectively a pass-through encoding: every byte from 0x00 to 0xFF is considered valid, so the encoding check cannot distinguish between harmless text and arbitrary binary data. When you are processing data from an external source, checking the encoding alongside the validity test gives you a much clearer picture of what you are working with.

A true result from an ASCII-8BIT string does not mean the data is valid UTF-8. It only means the bytes happen to be valid in the BINARY encoding. Check encoding alongside valid_encoding? to know what you’re actually dealing with.
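One way to check the encoding alongside validity is to ask explicitly whether the bytes would be valid under a UTF-8 label. The helper name valid_utf8_bytes? below is hypothetical and exists only for this sketch; it works on a copy so the caller's string keeps its original label.

```ruby
# Hypothetical helper: would these bytes be valid if they were tagged UTF-8?
# dup keeps the original string and its encoding label untouched.
def valid_utf8_bytes?(str)
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

raw  = "\xde\xad\xbe\xef".b   # arbitrary binary data
text = "h\xC3\xA9llo".b       # the UTF-8 bytes of "héllo", tagged as binary

puts raw.valid_encoding?      # true: any byte is valid in ASCII-8BIT
puts valid_utf8_bytes?(raw)   # false: \xbe is a stray continuation byte
puts valid_utf8_bytes?(text)  # true
puts raw.encoding == Encoding::BINARY  # true: the original label is unchanged
```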

How It Differs from `encode` and `scrub`

These three methods handle invalid byte sequences differently:

Method	Behavior on invalid bytes
`valid_encoding?`	Returns `false`, no exception
`encode`	Raises `Encoding::InvalidByteSequenceError` when transcoding to a different encoding
`scrub`	Returns a copy with invalid bytes replaced

# valid_encoding?: no exception
"\xc2".force_encoding("UTF-8").valid_encoding?  # => false

# encode: raises when transcoding to a different encoding
"\xc2".force_encoding("UTF-8").encode("UTF-16")
# raises Encoding::InvalidByteSequenceError
# (encode("UTF-8") on a string already tagged UTF-8 is a no-op and does not raise)

# scrub: replaces invalid bytes
"\xc2".force_encoding("UTF-8").scrub  # => "�"

Each of these three methods handles the same input differently, so the right choice depends on your error-handling strategy. When you want to inspect a string and decide what to do without catching exceptions, valid_encoding? is the right first step. When you are transcoding to another encoding and want an exception on bad input, use encode; note that encoding to the string's current encoding is a no-op and does not validate anything. When you need a repaired copy, scrub keeps the valid characters and replaces each invalid byte sequence with a placeholder (U+FFFD by default for UTF-8).

Use valid_encoding? as a diagnostic check before deciding whether to encode or scrub. It tells you the problem exists without throwing you into exception handling.
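The three behaviors can be seen side by side in a short sketch. The sample string and variable names are assumptions made for illustration; the target encoding UTF-16LE is chosen only because it differs from the source encoding.

```ruby
# "caf" followed by a lone UTF-8 lead byte, i.e. a truncated character.
broken = "caf\xC3".dup.force_encoding(Encoding::UTF_8)

# 1. Diagnose without raising.
is_valid = broken.valid_encoding?

# 2. Transcoding to a different encoding raises on the bad byte.
error_class =
  begin
    broken.encode(Encoding::UTF_16LE)
    nil
  rescue Encoding::InvalidByteSequenceError => e
    e.class
  end

# 3. Encoding to the same encoding is a no-op, so the result is still broken.
same = broken.encode(Encoding::UTF_8)

# 4. scrub returns a repaired copy; the original is untouched.
repaired = broken.scrub("?")

puts is_valid              # false
puts error_class           # Encoding::InvalidByteSequenceError
puts same.valid_encoding?  # false
puts repaired              # caf?
```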

Practical Examples

Validate input before processing

def ensure_utf8(str)
  unless str.valid_encoding?
    str = str.encode("UTF-8", invalid: :replace, replace: "?")
  end
  str
end

ensure_utf8("Hello")                        # => "Hello"
ensure_utf8("\xc2".force_encoding("UTF-8")) # => "?"

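The ensure_utf8 helper wraps the encoding check in a conditional that replaces bad bytes with a question mark. This approach works well when you need to clean up a single input value before storing or displaying it. Note that the helper returns any string that is already valid in its own encoding unchanged, so despite its name it does not convert, for example, a valid ISO-8859-1 string to UTF-8. For larger collections, you may prefer to filter out invalid strings entirely instead of replacing their contents with placeholder characters.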

Filter a collection by encoding validity

strings = ["hello", "\xff Invalid".force_encoding("UTF-8"), "日本"]
strings.select(&:valid_encoding?)  # => ["hello", "日本"]
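If the rejected strings matter too (for logging or later repair, say), Enumerable#partition splits the collection in one pass. This is a minimal sketch; the variable names are illustrative.

```ruby
strings = [
  "hello",
  "\xff Invalid".dup.force_encoding("UTF-8"),
  "\u65E5\u672C"  # 日本
]

# partition returns two arrays: elements where the block is true, then the rest.
valid, invalid = strings.partition(&:valid_encoding?)

# Repair the rejects instead of discarding them.
repaired = invalid.map { |s| s.scrub("?") }

puts valid.size      # 2
puts invalid.size    # 1
puts repaired.first  # ? Invalid
```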

Common Mistakes

Assuming true means UTF-8. A string in ASCII-8BIT encoding returns true no matter what bytes it holds. The string is valid for BINARY, not necessarily for UTF-8. Always check encoding if you need to know which encoding is in use.

Using valid_encoding? after force_encoding without understanding what it checks. force_encoding only changes the encoding label — it does not change the bytes. valid_encoding? then checks whether those bytes are valid for the new label. It cannot detect that you originally intended a different encoding.

valid_utf8_bytes = "\xc2\xa1"
valid_utf8_bytes.force_encoding("ASCII-8BIT").valid_encoding?  # => true
valid_utf8_bytes.force_encoding("UTF-8").valid_encoding?        # => true

Both return true because the bytes are genuinely valid in both encodings. But if those bytes were actually meant to be interpreted as, say, Windows-1252, the UTF-8 label gives you a false sense of correctness.
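To make the Windows-1252 case concrete, the sketch below labels the same two bytes both ways and converts the Windows-1252 reading to UTF-8 for comparison. Both labels pass valid_encoding?, yet they describe different text.

```ruby
bytes = "\xC2\xA1".b

as_utf8   = bytes.dup.force_encoding("UTF-8")
as_cp1252 = bytes.dup.force_encoding("Windows-1252")

both_valid = as_utf8.valid_encoding? && as_cp1252.valid_encoding?

# Read as UTF-8, the bytes are one character, U+00A1 (inverted exclamation mark).
# Read as Windows-1252, they are two characters, U+00C2 followed by U+00A1.
cp1252_text = as_cp1252.encode("UTF-8")

puts both_valid          # true
puts as_utf8.length      # 1
puts cp1252_text.length  # 2
```

Validity alone cannot tell you which reading the sender intended; that has to come from metadata such as an HTTP header or a file format specification.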
