String#bytes
Basic Usage
bytes returns an array of integers, one per byte in the string:
"hello".bytes
# => [104, 101, 108, 108, 111]
Each integer is a value between 0 and 255. No arguments, no block — just the array.
Signature
str.bytes -> array
- Receiver: Any String
- Arguments: None
- Return: Array<Integer>
Multi-byte Characters Are Not Single Bytes
This is where bytes trips up many developers. A UTF-8 character can take 1 to 4 bytes. The string "café" has 4 characters but 5 bytes:
"café".bytes
# => [99, 97, 102, 195, 169]
"café".chars.count # => 4
"café".bytes.count # => 5
The é (U+00E9) encodes as two bytes: 195 and 169. If you loop over bytes expecting one element per character, you will get surprises.
Each of these Japanese characters takes 3 bytes in UTF-8:
s = "日本語"
s.bytes
# => [230, 151, 165, 230, 156, 172, 232, 170, 158]
s.chars.count # => 3
s.bytes.count # => 9
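To see where the extra bytes come from, it can help to pair each character with its own byte count. The following is a minimal sketch; `bytes_per_char` is a hypothetical helper name introduced here for illustration, and the strings are assumed to be UTF-8 encoded.

```ruby
# Hypothetical helper: pair each character with the number of bytes it occupies.
def bytes_per_char(str)
  str.each_char.map { |ch| [ch, ch.bytesize] }
end

bytes_per_char("café")
# => [["c", 1], ["a", 1], ["f", 1], ["é", 2]]
bytes_per_char("日本語")
# => [["日", 3], ["本", 3], ["語", 3]]
```

Summing the second element of each pair gives the same number as `bytes.count`.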
Encoding Changes the Byte Values
The same character sequence produces different bytes depending on the string’s encoding:
"é".encode("UTF-8").bytes
# => [195, 169]
"é".encode("ISO-8859-1").bytes
# => [233]
UTF-8 uses multiple bytes for non-ASCII characters. ISO-8859-1 (Latin-1) uses one byte per character. Always check str.encoding before interpreting byte values.
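A related point that is easy to miss: `encode` transcodes the string, so the bytes change, while `force_encoding` only relabels the existing bytes. The sketch below illustrates the difference; the variable names are arbitrary and the `\u00E9` escape is used so the example does not depend on how the source file is saved.

```ruby
utf8   = "\u00E9"                     # "é" as a UTF-8 string
latin1 = utf8.encode("ISO-8859-1")    # transcodes: the bytes change

utf8.encoding      # => #<Encoding:UTF-8>
latin1.encoding    # => #<Encoding:ISO-8859-1>
utf8.bytes         # => [195, 169]
latin1.bytes       # => [233]

# force_encoding only relabels the string; the bytes stay the same.
relabeled = utf8.dup.force_encoding("ISO-8859-1")
relabeled.bytes    # => [195, 169]
relabeled.length   # => 2 (each byte now counts as one Latin-1 character)
```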
bytes vs codepoints vs chars
These three methods get confused constantly. Here is the difference:
"café".chars # => ["c", "a", "f", "é"] — array of single-char strings
"café".codepoints # => [99, 97, 102, 233] — Unicode codepoint integers
"café".bytes # => [99, 97, 102, 195, 169] — raw byte integers
- chars gives you one array element per character
- codepoints gives you one element per character as a Unicode integer
- bytes gives you one element per byte, so non-ASCII characters span multiple elements
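The relationship between the three methods can also be checked directly. This is a small sketch, assuming a UTF-8 string with a precomposed é (written as `\u00E9` to make that explicit):

```ruby
s = "caf\u00E9"   # "café" with a precomposed é

# codepoints is the ordinal of each element of chars
s.chars.map(&:ord) == s.codepoints      # => true

# bytes is the concatenation of each character's own bytes
s.chars.flat_map(&:bytes) == s.bytes    # => true

# only the lengths differ once a non-ASCII character appears
[s.chars.length, s.codepoints.length, s.bytes.length]
# => [4, 4, 5]
```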
Iterating with each_byte
If you need to process each byte individually, each_byte is an explicit iterator:
byte_values = []
"hello".each_byte { |b| byte_values << b }
byte_values
# => [104, 101, 108, 108, 111]
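each_byte yields the same values, in the same order, that bytes returns. With a block, each_byte returns the original string rather than an array, so no intermediate array is built. Using each_byte also makes the iteration intent clear in your code.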
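Called without a block, `each_byte` returns an Enumerator, which chains with other Enumerable methods. As an illustrative sketch (the `offsets` name and the output format are arbitrary choices made here), this labels each byte with its position in hexadecimal:

```ruby
# Without a block, each_byte returns an Enumerator, so with_index and map
# can be chained without first calling bytes.
offsets = "hi!".each_byte.with_index.map { |byte, index|
  format("%02d: 0x%02X", index, byte)
}
offsets
# => ["00: 0x68", "01: 0x69", "02: 0x21"]
```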
Checking Bytesize Separately
If you only need the count of bytes, bytesize is more direct:
"café".bytesize
# => 5
"hello".bytesize
# => 5
Calling bytes.count is equivalent but creates an intermediate array first.
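One place this matters is when checking input against a size limit expressed in bytes rather than characters. A minimal sketch follows; `fits_in_bytes?` and its default limit of 5 are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: true when the string occupies at most `limit` bytes.
def fits_in_bytes?(str, limit: 5)
  str.bytesize <= limit
end

fits_in_bytes?("hello")        # => true  (5 characters, 5 bytes)
fits_in_bytes?("caf\u00E9s")   # => false (5 characters, 6 bytes)
```

Comparing `length` here instead would accept the second string, since both strings have five characters.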
Working with Binary Data
bytes is useful when you need to inspect or manipulate binary data stored in a string:
data = "\xFF\x00\x80"
data.bytes
# => [255, 0, 128]
# Check if a specific byte value is present
data.bytes.include?(255)
# => true
Ruby strings can hold arbitrary binary data when using BINARY (ASCII-8BIT) encoding:
binary_string = "hello".b
binary_string.encoding
# => #<Encoding:ASCII-8BIT>
binary_string.bytes
# => [104, 101, 108, 108, 111]
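Going the other way, an array of byte values can be turned back into a binary string with `Array#pack`. The sketch below round-trips the earlier example data; the variable names are arbitrary.

```ruby
original = "\xFF\x00\x80".b
values   = original.bytes    # => [255, 0, 128]

# "C*" packs each integer as one unsigned 8-bit byte.
rebuilt = values.pack("C*")
rebuilt == original                     # => true
rebuilt.encoding == Encoding::BINARY    # => true
```

This makes it possible to edit individual values in the array and then rebuild the string from the result.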
Practical Example: Checksum Calculation
A common use case is computing a simple byte-sum checksum:
def byte_sum(str)
  str.bytes.sum
end
byte_sum("hello")
# => 532 (104 + 101 + 108 + 108 + 111)
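Simple checksums of this kind are often reduced to a single byte. The variant below is a sketch under that assumption; `byte_sum_mod256` is a hypothetical name, and the modulus of 256 is one common choice rather than anything the method itself requires.

```ruby
# Hypothetical helper: byte sum reduced to a single byte (0..255).
def byte_sum_mod256(str)
  str.bytes.sum % 256
end

byte_sum_mod256("hello")   # => 20 (532 % 256)

# A plain sum ignores byte order, so reordered input collides.
byte_sum_mod256("olleh")   # => 20
```

Because reordered bytes produce the same result, a sum like this can catch some accidental corruption but is not a substitute for a real hash function.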
See Also
- String#chars — array of character strings
- String#bytesize — number of bytes without creating an array
- String#encode — convert between encodings
- String#each_char — iterate over characters
- String#slice — extract substrings by index or regex