
String#bytes

Basic Usage

bytes returns an array of integers, one per byte in the string:

"hello".bytes
# => [104, 101, 108, 108, 111]

Each integer is a value between 0 and 255. No arguments, no block — just the array.

Signature

str.bytes -> array
  • Receiver: Any String
  • Arguments: None
  • Return: Array<Integer>

Multi-byte Characters Are Not Single Bytes

This is where bytes trips up many developers. A UTF-8 character can take 1 to 4 bytes. The string "café" has 4 characters but 5 bytes:

"café".bytes
# => [99, 97, 102, 195, 169]

"café".chars.count   # => 4
"café".bytes.count   # => 5

The é (U+00E9) encodes as two bytes: 195 and 169. If you loop over bytes expecting one element per character, you will get surprises.

The Japanese characters below take 3 bytes each in UTF-8:

s = "日本語"
s.bytes
# => [230, 151, 165, 230, 156, 172, 232, 170, 158]

s.chars.count   # => 3
s.bytes.count   # => 9
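
To see which bytes belong to which character, one option is to pair each character with its own bytes. This is a minimal sketch that assumes the source file is UTF-8 (Ruby's default); the variable name breakdown is only illustrative:

```ruby
# Pair each character with the bytes that encode it (UTF-8 source assumed).
breakdown = "café".each_char.map { |ch| [ch, ch.bytes] }

breakdown.each { |ch, bytes| puts "#{ch} -> #{bytes.inspect}" }
# c -> [99]
# a -> [97]
# f -> [102]
# é -> [195, 169]
```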

Encoding Changes the Byte Values

The same character sequence produces different bytes depending on the string’s encoding:

"é".encode("UTF-8").bytes
# => [195, 169]

"é".encode("ISO-8859-1").bytes
# => [233]

UTF-8 uses multiple bytes for non-ASCII characters. ISO-8859-1 (Latin-1) uses one byte per character. Always check str.encoding before interpreting byte values.
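
A related point that is easy to miss: encode transcodes the string (the bytes change, the character stays the same), while force_encoding only relabels the existing bytes. The sketch below assumes a UTF-8 source file, and the variable names are illustrative:

```ruby
utf8   = "é"                          # UTF-8 literal (default source encoding)
latin1 = utf8.encode("ISO-8859-1")    # transcode: same character, new bytes

utf8.encoding.name    # => "UTF-8"
latin1.encoding.name  # => "ISO-8859-1"
latin1.bytes          # => [233]

# force_encoding keeps the bytes and changes only the label,
# so the two UTF-8 bytes are now read as two Latin-1 characters.
relabeled = utf8.dup.force_encoding("ISO-8859-1")
relabeled.bytes       # => [195, 169]
relabeled.length      # => 2
```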

bytes vs codepoints vs chars

These three methods get confused constantly. Here is the difference:

"café".chars       # => ["c", "a", "f", "é"]  — array of single-char strings
"café".codepoints  # => [99, 97, 102, 233]     — Unicode codepoint integers
"café".bytes       # => [99, 97, 102, 195, 169] — raw byte integers
  • chars gives you one array element per character
  • codepoints gives you one element per character as a Unicode integer
  • bytes gives you one element per byte — non-ASCII characters span multiple elements

Iterating with each_byte

If you need to process each byte individually, each_byte is an explicit iterator:

byte_values = []
"hello".each_byte { |b| byte_values << b }
byte_values
# => [104, 101, 108, 108, 111]

each_byte yields the same values that bytes returns, but it does not build an intermediate array. Using each_byte also makes the iteration intent clear in your code.
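
Called without a block, each_byte returns an Enumerator, so it can be chained with other Enumerable methods. A short sketch (the offsets variable is just for illustration):

```ruby
# Label each byte with its offset in the string.
offsets = "hi!".each_byte.with_index.map { |byte, i| "#{i}:#{byte}" }
offsets
# => ["0:104", "1:105", "2:33"]
```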

Checking Bytesize Separately

If you only need the count of bytes, bytesize is more direct:

"café".bytesize
# => 5

"hello".bytesize
# => 5

Calling bytes.count is equivalent but creates an intermediate array first.
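
Comparing bytesize with length is one quick way to tell whether a string contains any multi-byte characters. The helper name multibyte? below is hypothetical, not part of Ruby; for a plain "is this all ASCII?" check, the built-in ascii_only? may be the better fit:

```ruby
# Hypothetical helper: true when at least one character takes more than one byte.
def multibyte?(str)
  str.bytesize != str.length
end

multibyte?("hello")  # => false
multibyte?("café")   # => true
```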

Working with Binary Data

bytes is useful when you need to inspect or manipulate binary data stored in a string:

data = "\xFF\x00\x80"
data.bytes
# => [255, 0, 128]

# Check if a specific byte value is present
data.bytes.include?(255)
# => true

Ruby strings can hold arbitrary binary data when using BINARY (ASCII-8BIT) encoding:

binary_string = "hello".b
binary_string.encoding
# => #<Encoding:ASCII-8BIT>

binary_string.bytes
# => [104, 101, 108, 108, 111]
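
When inspecting binary data, it often helps to render the bytes as hex and to turn a byte array back into a string. The sketch below uses format and Array#pack with the "C*" directive (unsigned 8-bit values); the variable names are illustrative:

```ruby
data = "\xFF\x00\x80".b   # .b returns a binary-encoded copy

# Render each byte as two hex digits.
hex = data.bytes.map { |b| format("%02x", b) }.join(" ")
hex
# => "ff 00 80"

# Rebuild a binary string from the byte array.
rebuilt = data.bytes.pack("C*")
rebuilt == data
# => true
```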

Practical Example: Checksum Calculation

A common use case is computing a simple byte-sum checksum:

def byte_sum(str)
  str.bytes.sum
end

byte_sum("hello")
# => 532  (104 + 101 + 108 + 108 + 111)
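
Two common variations on the byte sum are keeping the result within a single byte (sum modulo 256) and XOR-ing all bytes together. Both method names below are hypothetical, and neither is a substitute for a real checksum such as CRC32; they are only sketches built on each_byte to avoid an intermediate array:

```ruby
# Sum of all bytes, reduced to a single byte (0..255).
def byte_sum_mod256(str)
  str.each_byte.sum % 256
end

# XOR of all bytes, starting from 0.
def xor_checksum(str)
  str.each_byte.reduce(0) { |acc, b| acc ^ b }
end

byte_sum_mod256("hello")
# => 20  (532 % 256)

xor_checksum("hello")
# => 98
```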

See Also