rubyguides

JSON and XML in Ruby: How to Parse, Generate, and Transform Data

Ruby ships with solid built-in support for both JSON and XML, making it easy to interchange data with web APIs, configuration files, and external services. This guide covers the parsing, generating, and transformation tasks you are most likely to meet in everyday Ruby code.

Key takeaways

  • Use JSON when you want a lightweight format that maps cleanly to Ruby hashes and arrays.
  • Use XML when you need namespaces, mixed content, or a stricter document structure.
  • JSON.parse and JSON.generate cover the most common JSON workflows in Ruby.
  • REXML ships with Ruby, while Nokogiri is the stronger choice when you want faster and more powerful XML tooling.
  • Choosing the right format is often about the system on the other side of the wire, not just the Ruby code you are writing.

If you think about the structure first, the library choice becomes easier. JSON is usually the best fit for API payloads and simple data exchange. XML still matters for feeds, legacy integrations, and documents that need namespaces or richer markup.

JSON versus XML

JSON and XML solve similar problems, but they feel very different in Ruby code. JSON usually turns into hashes and arrays without much ceremony, which makes it easy to inspect and transform. XML is more verbose, but that extra structure can be helpful when the data needs attributes, nested elements, or namespace-aware lookups.

That means you do not need to treat them as competing technologies. In practice, the question is often which format the other system already speaks. Once that is clear, the Ruby side becomes a matter of choosing the parser that matches the shape of the data.

JSON in Ruby

Ruby includes the JSON module in its standard library. No extra installation is required for basic use.

When people say “working with JSON in Ruby,” they usually mean a small set of repeatable tasks: parse incoming text, generate output, pretty-print for humans, and handle the occasional custom object. The examples below cover those tasks in the order most projects need them.

Parsing JSON

Use JSON.parse to convert a JSON string into Ruby objects:

require "json"

data = '{"name": "Alice", "age": 30, "active": true}'
person = JSON.parse(data)

person["name"]    # => "Alice"
person["age"]     # => 30
person["active"]  # => true
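JSON.parse maps the other JSON value types just as directly. Here is a quick check against a made-up payload (the field names are illustrative only):

```ruby
require "json"

# A made-up payload that exercises floats, arrays, null, and integers.
payload = '{"price": 9.99, "tags": ["new", "sale"], "discount": null, "stock": 3}'
item = JSON.parse(payload)

item["price"].class  # => Float
item["tags"]         # => ["new", "sale"]
item["discount"]     # => nil
item["stock"].class  # => Integer

# A top-level JSON array parses to a Ruby Array, not a Hash.
numbers = JSON.parse("[1, 2, 3]")  # => [1, 2, 3]
```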

When the top-level value is a JSON object, JSON.parse returns a Ruby hash with string keys, and each value maps to its closest Ruby equivalent: strings stay strings, numbers become Integer or Float, arrays become Arrays, booleans become true/false, and null becomes nil. The mapping is automatic and predictable, which makes JSON parsing feel natural in Ruby. Because the parser raises JSON::ParserError on malformed input, wrap the call in a begin/rescue block when dealing with untrusted data:

require "json"

dangerous_input = '{"name": "Alice",'  # truncated, so it is not valid JSON

begin
  result = JSON.parse(dangerous_input)
rescue JSON::ParserError => e
  puts "Invalid JSON: #{e.message}"
end

That pattern is worth using any time the input comes from a request, a queue, or a file that might be edited by hand. The rescue block keeps a bad payload from crashing the entire process and gives you a place to log or report the problem. Once the parsing path is covered, the next logical step is producing JSON from Ruby objects. The JSON module can serialize hashes, arrays, strings, numbers, and booleans back into valid JSON text with a single call.

Generating JSON

Convert Ruby objects to JSON strings with JSON.generate or the shorter #to_json method:

require "json"

data = { name: "Bob", scores: [95, 87, 92] }

JSON.generate(data)  # => "{\"name\":\"Bob\",\"scores\":[95,87,92]}"
data.to_json        # => "{\"name\":\"Bob\",\"scores\":[95,87,92]}"

Both produce identical output. Use whichever reads better in context.

If you are generating API payloads, JSON.generate is usually the most explicit option. If you are serializing a Ruby object that already knows how to represent itself, to_json can keep the call site short and readable.
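One consequence of generating JSON is worth seeing early: JSON object keys are always strings, so symbol keys do not survive a round trip unchanged. A small sketch using the same data hash:

```ruby
require "json"

data = { name: "Bob", scores: [95, 87, 92] }

# Generating turns the symbol keys into JSON strings...
text = JSON.generate(data)

# ...so parsing the text back yields string keys, not the original symbols.
round_trip = JSON.parse(text)
round_trip == data  # => false
round_trip["name"]  # => "Bob"

# Asking the parser for symbol keys restores an equal hash.
restored = JSON.parse(text, symbolize_names: true)
restored == data    # => true
```

If the rest of your code expects symbol keys, parse with symbolize_names: true or normalize the keys once at the boundary.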

Pretty Printing

For human-readable output, use JSON.pretty_generate:

require "json"

data = { user: "Carol", roles: ["admin", "editor"] }

puts JSON.pretty_generate(data)
# {
#   "user": "Carol",
#   "roles": [
#     "admin",
#     "editor"
#   ]
# }

Pretty-printed JSON is especially useful when humans need to read the output later: configuration files, test fixtures, or a debug dump of an API response that you plan to inspect outside of Ruby.
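A minimal sketch of the config-file case, assuming a small config hash written to a temporary directory (the file name config.json is a placeholder). The whitespace that pretty_generate adds makes no difference when the file is parsed again:

```ruby
require "json"
require "tmpdir"

config = { "user" => "Carol", "roles" => ["admin", "editor"] }

# Dir.mktmpdir returns the block's value and deletes the directory afterwards.
loaded = Dir.mktmpdir do |dir|
  path = File.join(dir, "config.json") # placeholder file name

  # Write indented JSON that a person can review or edit by hand.
  File.write(path, JSON.pretty_generate(config))

  # Read it back: indentation and newlines do not change the data.
  JSON.parse(File.read(path))
end

loaded == config  # => true
```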

Custom object serialization

When you need to serialize custom objects, implement to_json on your class:

require "json"

class Point
  attr_accessor :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Forward the generator arguments so nested and pretty-printed output still work
  def to_json(*args)
    { x: @x, y: @y }.to_json(*args)
  end
end

point = Point.new(3, 4)
point.to_json  # => "{\"x\":3,\"y\":4}"

Custom serialization is one of the places where Ruby’s flexibility shows up nicely. You keep the public object small, but still control exactly how it turns into transport data.
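A slightly more defensive variant of the same idea, offered as one possible sketch: keep the field mapping in a to_h method, forward *args from to_json so the generator's state (such as pretty-print indentation) reaches nested objects, and add a hypothetical from_json class method for the reverse direction. The json library does not require either to_h or from_json; both are conventions chosen for this example.

```ruby
require "json"

class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Single place that decides which fields are serialized.
  def to_h
    { x: x, y: y }
  end

  # Forwarding *args passes the generator state through to Hash#to_json.
  def to_json(*args)
    to_h.to_json(*args)
  end

  # Hypothetical helper: rebuild a Point from a JSON string.
  def self.from_json(text)
    data = JSON.parse(text, symbolize_names: true)
    new(data[:x], data[:y])
  end
end

points = [Point.new(3, 4), Point.new(5, 6)]
compact = JSON.generate(points)       # => "[{\"x\":3,\"y\":4},{\"x\":5,\"y\":6}]"
pretty = JSON.pretty_generate(points) # nested points are indented too

copy = Point.from_json(Point.new(1, 2).to_json)
copy.x  # => 1
```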

JSON with Symbols

By default, JSON keys become strings in Ruby. If you prefer symbol keys, use the symbolize_names option:

require "json"

data = '{"count": 42}'
parsed = JSON.parse(data, symbolize_names: true)

parsed[:count]  # => 42 (the key is a Symbol, not a String)

The symbolized form is often easier to work with inside Ruby because it matches the way many Ruby APIs already use hashes. Just remember that the keys are still strings in the JSON text itself.
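symbolize_names applies recursively, and it only changes keys. A short check with an invented nested payload:

```ruby
require "json"

text = '{"user": {"name": "Dana", "roles": ["admin"]}, "tags": ["a", "b"]}'
parsed = JSON.parse(text, symbolize_names: true)

# Nested objects get symbol keys as well, not just the top level.
parsed[:user][:name]   # => "Dana"
parsed[:user][:roles]  # => ["admin"]

# Values are untouched: strings inside arrays stay strings.
parsed[:tags]          # => ["a", "b"]
```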


XML in Ruby

Ruby offers two main approaches for working with XML: REXML, which ships with the standard library, and Nokogiri, a popular gem with a more powerful API.

When the XML is small and you want to avoid extra dependencies, REXML is usually enough. When the XML is large, heavily queried, or part of a real application that already depends on Nokogiri, the gem gives you a smoother experience.

REXML: built-in XML parsing

REXML ships with Ruby (since Ruby 3.0 as a bundled gem, so add it to your Gemfile if you use Bundler) and handles both tree-based and stream-based parsing.
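The stream-based side is the less familiar of the two. Below is a minimal sketch using REXML's StreamListener callbacks; NameCollector is a hypothetical class that records the text of every <name> element without building a document tree, which is the trade-off that makes streaming useful for large inputs.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener: collects the text inside <name> elements.
class NameCollector
  include REXML::StreamListener # provides no-op defaults for other events

  attr_reader :names

  def initialize
    @names = []
    @in_name = false
  end

  def tag_start(name, _attrs)
    @in_name = (name == "name")
  end

  def tag_end(_name)
    @in_name = false
  end

  # Called for text nodes, including whitespace between tags.
  def text(data)
    @names << data if @in_name
  end
end

xml = <<~XML
  <users>
    <user id="1"><name>Alice</name></user>
    <user id="2"><name>Bob</name></user>
  </users>
XML

listener = NameCollector.new
REXML::Document.parse_stream(xml, listener)
listener.names  # => ["Alice", "Bob"]
```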

Parsing a Document

require "rexml/document"

xml_string = <<~XML
  <users>
    <user id="1">
      <name>Alice</name>
      <email>alice@example.com</email>
    </user>
    <user id="2">
      <name>Bob</name>
      <email>bob@example.com</email>
    </user>
  </users>
XML

doc = REXML::Document.new(xml_string)

# Get all user names
doc.elements.each("users/user/name") do |element|
  puts element.text
end
# Alice
# Bob

REXML is easy to reach for because it is already available. The tradeoff is that the API feels older and more verbose than Nokogiri, so it is better for straightforward parsing than for complex document manipulation.

Extracting Attributes

Read an element's attributes through its attributes collection, which you can index like a hash:

doc.elements.each("users/user") do |user|
  puts "ID: #{user.attributes['id']}, Name: #{user.elements['name'].text}"
end
# ID: 1, Name: Alice
# ID: 2, Name: Bob

The string passed to elements.each is an XPath expression, which works well for straightforward document shapes. The block variable gives you each matching element, and from there you can read child text or drill into nested elements with another elements call. For documents where the structure is known ahead of time, this is often enough to get the data out without bringing in extra dependencies.
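When you need filtering rather than a plain path, REXML::XPath can evaluate predicates directly. A sketch against a trimmed copy of the same users document:

```ruby
require "rexml/document"

xml = <<~XML
  <users>
    <user id="1"><name>Alice</name></user>
    <user id="2"><name>Bob</name></user>
  </users>
XML

doc = REXML::Document.new(xml)

# XPath.first returns the first matching node, or nil when nothing matches.
bob = REXML::XPath.first(doc, "//user[@id='2']/name")
puts bob.text  # prints "Bob"

# XPath.match returns every matching node as an Array.
ids = REXML::XPath.match(doc, "//user").map { |user| user.attributes["id"] }
p ids  # prints ["1", "2"]
```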

Building XML

Create XML documents programmatically with REXML::Element:

require "rexml/document"
require "rexml/formatters/pretty"

root = REXML::Element.new("config")
root.add_attribute("version", "1.0")

database = root.add_element("database")
database.add_element("host").text = "localhost"
database.add_element("port").text = "5432"

formatter = REXML::Formatters::Pretty.new
formatter.compact = true  # keep short text-only elements on one line
formatter.write(root, $stdout)
# <config version='1.0'>
#   <database>
#     <host>localhost</host>
#     <port>5432</port>
#   </database>
# </config>

Building XML by hand is useful when the structure is small and predictable. Once the document starts getting large or nested, you may prefer a library that makes querying and transformation easier.
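REXML also escapes special characters when it writes text, so values containing & or < do not need manual escaping. A small sketch with an invented note element:

```ruby
require "rexml/document"

config = REXML::Element.new("config")

# Assign the raw value; REXML escapes it when the element is written.
config.add_element("note").text = "Tom & Jerry <draft>"

xml = config.to_s
puts xml  # the & and < appear as &amp; and &lt;

# Parsing the output back returns the original, unescaped text.
copy = REXML::Document.new(xml).root
copy.elements["note"].text  # => "Tom & Jerry <draft>"
```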

Nokogiri: faster and more powerful

Nokogiri is a gem that wraps the native C libraries libxml2 (for parsing) and libxslt (for XSLT transforms). It is much faster than REXML and the de facto standard for XML processing in Ruby on Rails applications.

Nokogiri is the practical choice when you want better selectors, more speed, or a richer XML API. It is the standard answer for many Ruby projects because it bridges the gap between low-level XML data and the kind of traversal code Ruby developers usually want to write.

Install it with:

gem install nokogiri

Once installed, Nokogiri parses both XML and HTML through the same Ruby-friendly API. Here is what parsing a simple HTML document looks like:

Parsing HTML or XML

require "nokogiri"

html = <<~HTML
  <html>
    <body>
      <article>
        <h1>Hello World</h1>
        <p class="intro">Welcome to the site.</p>
        <p>More content here.</p>
      </article>
    </body>
  </html>
HTML

doc = Nokogiri::HTML(html)

# Find elements with CSS selectors
doc.css("h1").text          # => "Hello World"
doc.css("p.intro").text     # => "Welcome to the site."
doc.css("p").map(&:text)    # => ["Welcome to the site.", "More content here."]

Nokogiri’s CSS selector support reads like familiar web-scraping code: you pass a selector string and get back a node set you can iterate over or call methods like .text and .attr on. This is often enough for simple extraction tasks. When the document structure is deeper or you need to match on relationships that CSS selectors handle poorly, XPath gives you more precise control.

XPath Queries

For precise element selection, use XPath:

require "nokogiri"

# xml_string is the <users> document from the REXML example above
doc = Nokogiri::XML(xml_string)

# Select all user elements
doc.xpath("//user").each do |node|
  puts node.at_xpath("name").text
end
# Alice
# Bob

XPath expressions can also filter on attributes and positions, as in //user[@id='2']/name, which keeps the selection logic in a single query. Namespaces need attention too: when an XML document declares a default namespace with xmlns, unprefixed XPath queries silently return nothing unless you account for the namespace.

Searching with Namespaces

Pass a hash that maps a prefix of your choice to the namespace URI as the second argument to xpath or at_xpath:

xml = <<~XML
  <feed xmlns="https://example.org/feed/1.0">
    <entry>
      <title>Sample Post</title>
    </entry>
  </feed>
XML

doc = Nokogiri::XML(xml)
ns = { "f" => "https://example.org/feed/1.0" }
title = doc.at_xpath("//f:title", ns)
puts title.text  # => "Sample Post"

Namespaces are the part of XML that often surprises people moving from JSON. The prefix in the query does not have to match the prefix in the source document, but the namespace URI has to be correct. Once you see that pattern a few times, it becomes much easier to reason about the lookups. With parsing and querying covered, Nokogiri can also build entirely new XML documents from Ruby code, which is useful for generating feeds, configuration files, or API payloads programmatically.

Building Documents

Nokogiri also excels at creating new XML documents:

require "nokogiri"

builder = Nokogiri::XML::Builder.new do |xml|
  xml.products {
    xml.product(id: "p1") {
      xml.name("Widget")
      xml.price("29.99")
    }
    xml.product(id: "p2") {
      xml.name("Gadget")
      xml.price("49.99")
    }
  }
end

puts builder.to_xml
# <?xml version="1.0"?>
# <products>
#   <product id="p1">
#     <name>Widget</name>
#     <price>29.99</price>
#   </product>
#   <product id="p2">
#     <name>Gadget</name>
#     <price>49.99</price>
#   </product>
# </products>

Document building is where Nokogiri starts to feel more like a general XML toolkit than a parser. You can query, mutate, and generate documents without switching mental models.


Performance considerations

JSON Speed

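The standard json library uses a C extension on CRuby (with a pure-Ruby fallback) and is fast enough for most workloads. For high-throughput scenarios, consider these alternatives:

Gem | Description
--- | ---
oj | Optimized JSON parser and generator written in C; often noticeably faster than the stdlib, though the gap depends on the workload and json version
yajl-ruby | Bindings to the yajl C library; supports streaming parses with a low memory footprint
json (stdlib) | Convenient, fully compatible, no extra dependency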

Using oj is as simple as:

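require "oj"

json_string = '{"name": "Bob"}'
data = Oj.load(json_string)  # parse

# mode: :compat emits standard JSON like JSON.generate; Oj's default :object mode would write the symbol key as ":name"
output = Oj.dump({ name: "Bob" }, mode: :compat)  # generate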

That example is here to show the shape of a faster replacement, not to suggest that you need another gem for every JSON task. The stdlib is still the right starting point when convenience matters more than raw throughput.

XML Speed

For XML processing, Nokogiri generally outperforms REXML by a wide margin because it relies on compiled C libraries, while REXML is written in pure Ruby. Prefer Nokogiri when:

  • Processing large documents (megabytes of XML)
  • Parsing HTML from web pages
  • Performance is critical in your application

Use REXML for quick scripts, small documents, or when you want to avoid adding gem dependencies.

Memory Usage

The tree-based parsers shown so far (JSON.parse, REXML::Document, and Nokogiri::XML) hold the entire document in memory. For very large files, consider:

  • Streaming XML parsers such as Nokogiri::XML::Reader, which process the document node by node
  • Line-delimited JSON (JSON Lines), where each line is a separate JSON object that you parse on its own instead of loading the whole file at once

Large files are where parser choice stops being abstract. If the document size is measured in megabytes or the data arrives continuously, streaming becomes much more important than the convenience of a single parse call.
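JSON Lines keeps memory flat because each line is a complete JSON document. A minimal sketch, assuming a file of records that each carry a numeric amount field (total_amount, the file name, and the field name are hypothetical):

```ruby
require "json"
require "tmpdir"

# Hypothetical helper: sums the "amount" field one record at a time.
def total_amount(path)
  total = 0
  File.foreach(path) do |line|      # reads the file line by line
    next if line.strip.empty?       # tolerate blank lines
    record = JSON.parse(line)
    total += record.fetch("amount") # raises KeyError if a record lacks it
  end
  total
end

total = Dir.mktmpdir do |dir|
  path = File.join(dir, "events.jsonl")
  File.write(path, <<~JSONL)
    {"id": 1, "amount": 10}
    {"id": 2, "amount": 25}
    {"id": 3, "amount": 7}
  JSONL
  total_amount(path)
end

total  # => 42
```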


Choosing between JSON and XML

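Factor | JSON | XML
--- | --- | ---
Typical use | Web APIs, config files | Document formats, SOAP services
Verbosity | Compact | Verbose, with opening and closing tags
Schema support | JSON Schema (a separate specification; validation needs a gem) | XSD and DTD (Nokogiri can validate against XSD)
Ruby stdlib | Yes (json) | Yes (REXML, a bundled gem since Ruby 3.0)
Recommended for large docs | oj gem | Nokogiri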

For most modern web APIs, JSON is the default choice. XML remains relevant for legacy enterprise systems, document formats like RSS/Atom, and scenarios requiring formal schema validation.

When you pick a format deliberately, the rest of the code gets easier to explain. A method named parse_json or build_xml_feed tells the reader what kind of input or output to expect, which is often more valuable than squeezing every line into the shortest form possible.
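Converting between the two formats is a common integration task. Below is a minimal sketch, using only the standard library, that turns a JSON product list into XML; products_json_to_xml and the field names are hypothetical, chosen to mirror the Nokogiri builder example.

```ruby
require "json"
require "rexml/document"

# Hypothetical helper: converts a JSON array of products into an XML string.
def products_json_to_xml(json_text)
  products = JSON.parse(json_text)
  root = REXML::Element.new("products")

  products.each do |product|
    # Each JSON object becomes a <product> element with an id attribute.
    node = root.add_element("product", { "id" => product["id"] })
    node.add_element("name").text = product["name"]
    node.add_element("price").text = product["price"].to_s
  end

  root.to_s
end

json = '[{"id": "p1", "name": "Widget", "price": 29.99}, {"id": "p2", "name": "Gadget", "price": 49.99}]'
puts products_json_to_xml(json)
```

Keeping the conversion in one named method means the field mapping lives in a single place if either side of the integration changes.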


Common Pitfalls

JSON:

  • Forgetting to require "json"; plain Ruby does not load the module automatically
  • Mixing up JSON.parse (string to object) and JSON.generate (object to string)
  • Not handling JSON::ParserError when parsing external input

XML:

  • REXML is an XML parser and rejects most real-world HTML; use Nokogiri for HTML
  • XPath expressions are case-sensitive; double-check element names
  • A default namespace (xmlns) makes unprefixed XPath queries return nothing unless you bind the namespace URI to a prefix in the query

Frequently asked questions

Should I prefer JSON over XML in new Ruby code?

Usually yes, unless the service or file format already requires XML. JSON is smaller, easier to read, and maps cleanly to Ruby hashes and arrays.

Do I need Nokogiri for every XML task?

No. REXML is fine for smaller or simpler tasks. Nokogiri becomes more attractive when you need speed, richer traversal, or better selector support.

Is pretty-printed JSON safe for production?

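Yes. Whitespace between JSON tokens is insignificant, so any conforming parser reads pretty-printed output the same way as compact output. The main cost is size, so compact output is the usual choice for network payloads, while pretty-printing pays off in config files, logs, and generated fixtures.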

Conclusion

JSON and XML are both useful in Ruby, but they solve slightly different problems. JSON is the natural fit for most API work because it stays compact and maps cleanly to Ruby data structures. XML is still the right choice when you need document structure, namespaces, or compatibility with older systems.

The main decision is not which parser is better in the abstract. It is which data format makes the integration easier to read, easier to validate, and easier to keep working over time. Once you answer that question, the Ruby code itself usually stays straightforward.


See Also