XML Formatting and Validation: A Modern Developer's Guide
JSON won the API format war, but XML never went away. It powers the SOAP services behind many banking, healthcare, and government systems. It defines every SVG image on the web. It structures Android layouts, Maven build files, .NET configuration, RSS feeds, and XHTML documents. If you work in software long enough, you will encounter XML, and you will need to read, format, and validate it.
Where XML still lives
The perception that XML is a legacy format does not match reality. Here are the domains where XML remains the primary or only option:
SOAP web services: Banking, healthcare, government, and enterprise systems rely heavily on SOAP. SOAP messages are XML envelopes, and the service contracts are described in WSDL, which is itself XML.
SVG: Every scalable vector graphic is an XML document. When you open an SVG in a text editor, you see XML:
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <circle cx="50" cy="50" r="40" fill="#2E86C1" />
  <text x="50" y="55" text-anchor="middle" fill="white">Hi</text>
</svg>
Configuration files: Maven's pom.xml, .NET's app.config and .csproj, Android's AndroidManifest.xml, and Spring's XML application contexts are all XML.
RSS and Atom feeds: Blog syndication, podcast directories, and news aggregators consume XML feeds.
Office documents: DOCX, XLSX, and PPTX files are ZIP archives containing XML files. The OpenDocument Format (ODF) uses XML too.
XSLT and XPath: Transformation and query languages built around the XML data model, still widely used in publishing and data pipelines.
Formatting rules
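Well-formatted XML follows a small set of conventions that make it readable. Parsers accept the minified form just as well, so these conventions are for the people who read and review the file: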
Indentation
Use consistent indentation (2 or 4 spaces) for nested elements:
<!-- Hard to read -->
<catalog><book><title>Clean Code</title><author>Robert Martin</author><price>29.99</price></book></catalog>
<!-- Formatted -->
<catalog>
  <book>
    <title>Clean Code</title>
    <author>Robert Martin</author>
    <price>29.99</price>
  </book>
</catalog>
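A formatter can apply this indentation mechanically. The sketch below is one minimal way to do it with Python's standard library; the helper name format_xml is an illustrative assumption, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def format_xml(xml_text: str, indent: str = "  ") -> str:
    """Parse xml_text and return it re-serialized with consistent indentation."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=indent)  # rewrites whitespace between elements in place
    return ET.tostring(root, encoding="unicode")


minified = (
    "<catalog><book><title>Clean Code</title>"
    "<author>Robert Martin</author><price>29.99</price></book></catalog>"
)
formatted = format_xml(minified)
print(formatted)
```

Note that ET.fromstring discards comments and the XML declaration by default, so this approach suits data-style XML better than hand-annotated files.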
Self-closing tags
Elements with no content should use self-closing syntax:
<!-- Verbose -->
<input type="text"></input>
<!-- Preferred -->
<input type="text" />
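The two spellings are equivalent to a parser, so choosing between them is purely a formatting decision. As a small sketch with Python's standard-library ElementTree, the following parses both forms, compares them, and then serializes the same element either way:

```python
import xml.etree.ElementTree as ET

verbose = ET.fromstring('<input type="text"></input>')
short = ET.fromstring('<input type="text" />')

# Both forms parse to the same element: same tag, same attributes,
# no text and no children.
same = (verbose.tag, verbose.attrib, verbose.text, len(verbose)) == (
    short.tag, short.attrib, short.text, len(short)
)

# The serializer emits the short form by default; the long form is opt-in.
short_form = ET.tostring(verbose, encoding="unicode")
long_form = ET.tostring(verbose, encoding="unicode", short_empty_elements=False)

print(same)
print(short_form)
print(long_form)
```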
Attribute alignment
For elements with many attributes, place each attribute on its own line:
<dependency
    groupId="org.springframework"
    artifactId="spring-core"
    version="6.1.0"
    scope="compile"
/>
CDATA sections
When element content contains characters that would need escaping (<, >, &), use a CDATA section instead of entity references:
<!-- Entity escaping (verbose) -->
<script>if (a &lt; b &amp;&amp; c &gt; d) { }</script>
<!-- CDATA (readable) -->
<script><![CDATA[if (a < b && c > d) { }]]></script>
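CDATA changes only how the text is written in the file, not what the application receives. The following sketch, using Python's standard-library ElementTree, parses an entity-escaped version and a CDATA version of the same content and compares the results; it also shows that leaving the characters raw is rejected.

```python
import xml.etree.ElementTree as ET

escaped = ET.fromstring("<script>if (a &lt; b &amp;&amp; c &gt; d) { }</script>")
cdata = ET.fromstring("<script><![CDATA[if (a < b && c > d) { }]]></script>")

# Both spellings give the application the same character data.
print(escaped.text)
print(escaped.text == cdata.text)

# Leaving the characters raw, with neither escaping nor CDATA,
# is not well-formed and the parser refuses it.
try:
    ET.fromstring("<script>if (a < b && c > d) { }</script>")
    raw_parses = True
except ET.ParseError:
    raw_parses = False
print(raw_parses)
```

One limit to keep in mind: a CDATA section cannot contain the sequence ]]>, because that sequence ends the section.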
Namespaces
Namespaces prevent element name collisions when combining XML from different vocabularies. They are often the most confusing aspect of XML for newcomers.
<root xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr>
      <h:td>Cell</h:td>
    </h:tr>
  </h:table>
  <f:table>
    <f:name>Coffee Table</f:name>
    <f:width>120</f:width>
  </f:table>
</root>
Without namespaces, both <table> elements would collide. The namespace URI does not need to resolve to anything — it is just a unique identifier.
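In code, a namespaced element is identified by its URI plus local name, not by the prefix used in the file. Here is a minimal sketch with Python's standard-library ElementTree that tells the two table elements apart; the NS mapping is defined by the example itself, and its prefixes do not have to match the ones in the document.

```python
import xml.etree.ElementTree as ET

doc = """<root xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Cell</h:td></h:tr></h:table>
  <f:table><f:name>Coffee Table</f:name><f:width>120</f:width></f:table>
</root>"""

# Prefix-to-URI mapping used only for lookups in this script.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "f": "http://example.com/furniture",
}

root = ET.fromstring(doc)
html_table = root.find("h:table", NS)
furniture = root.find("f:table", NS)

print(html_table.tag)   # the parser stores the name as {uri}local
print(furniture.findtext("f:name", namespaces=NS))
print(root.find("table"))  # an unqualified name matches neither element
```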
Well-formed vs valid
These are two different things:
Well-formed XML follows the syntax rules:
- Single root element
- Every opening tag has a closing tag (or is self-closing)
- Tags are properly nested (no <a><b></a></b>)
- Attribute values are quoted
- Special characters are escaped
Valid XML is well-formed AND conforms to a schema (DTD, XSD, or RELAX NG) that defines which elements and attributes are allowed, their types, and their structure.
Most formatters check for well-formedness. Validation requires a schema definition, which is a separate step.
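A well-formedness check amounts to asking a parser whether it can read the document. The sketch below does that with Python's standard-library ElementTree; the helper name well_formedness_error is an illustrative assumption. Schema validation is out of scope here, since the standard library does not include an XSD validator and that step typically relies on a third-party library.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


samples = {
    "ok": "<catalog><book id='1'/></catalog>",
    "bad nesting": "<a><b></a></b>",
    "two roots": "<a/><b/>",
    "unquoted attribute": "<a x=1/>",
    "unescaped character": "<a>1 < 2</a>",
}

results = {name: well_formedness_error(text) for name, text in samples.items()}
for name, error in results.items():
    print(name, "->", "well-formed" if error is None else error)
```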
XML vs JSON: when XML is the right choice
JSON is simpler and lighter. Use it when you can. But XML has genuine advantages in specific scenarios:
Document markup: XML can mix content and structure naturally. <p>This is <em>important</em> text.</p> has no clean JSON equivalent.
Schema validation: XSD provides rich validation: data types, patterns, enumerations, occurrence constraints, and complex type inheritance. JSON Schema exists but is newer, and tool support varies by draft version.
Transformation: XSLT can transform XML into HTML, plain text, or other XML formats declaratively. That includes XSL-FO, which a separate formatter can render to PDF. There is no widely adopted JSON equivalent with comparable power.
Namespaces: When combining data from multiple sources, namespaces prevent conflicts. JSON has no namespace mechanism.
Existing ecosystems: If your industry uses XML (finance with FIXML and FpML, healthcare with HL7 CDA, publishing with DITA), converting to JSON adds complexity without benefit.
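To make the document-markup point above concrete, here is a small sketch of how Python's standard-library ElementTree exposes mixed content: text before a child element lives on the parent, and text after it lives on the child's tail, so the reading order is preserved.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>This is <em>important</em> text.</p>")
em = p.find("em")

print(repr(p.text))    # text before the <em> child
print(repr(em.text))   # text inside <em>
print(repr(em.tail))   # text after </em>, still inside <p>

# itertext() walks the pieces in document order.
plain = "".join(p.itertext())
print(plain)
```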
Practical XML workflow
When working with XML in a modern development environment:
- Format first. Take minified or messy XML and apply consistent indentation. This alone makes the structure visible.
- Validate against the schema. If an XSD or DTD is available, validate before processing. Catch structural errors early.
- Use XPath for querying. Instead of walking the tree by hand, use XPath expressions to extract specific values:
    //book[price > 20]/title        → All titles of books over $20
    /catalog/book[1]/author         → Author of the first book
    //book[@category='fiction']     → All fiction books
- Transform with XSLT when needed. For converting XML reports to HTML or restructuring data between systems, XSLT is purpose-built and reliable.
- Consider streaming for large files. DOM parsing loads the entire document into memory. For files over 100 MB, use SAX or StAX (streaming) parsers that process elements one at a time.
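The querying and streaming steps above can be sketched with Python's standard library. Two assumptions to flag: the CATALOG sample and the stream_titles helper are invented for illustration, and ElementTree implements only a subset of XPath, so the numeric comparison in //book[price > 20] is done in Python rather than in the path expression.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative sample data, not taken from any real catalog.
CATALOG = """<catalog>
  <book category="fiction"><title>Dune</title><author>Frank Herbert</author><price>18.00</price></book>
  <book category="software"><title>Clean Code</title><author>Robert Martin</author><price>29.99</price></book>
</catalog>"""

root = ET.fromstring(CATALOG)

# /catalog/book[1]/author (paths here are relative to the root element)
first_author = root.findtext("book[1]/author")

# //book[@category='fiction']
fiction = [b.findtext("title") for b in root.findall(".//book[@category='fiction']")]

# //book[price > 20]/title, with the comparison done in Python
over_20 = [
    b.findtext("title")
    for b in root.iter("book")
    if float(b.findtext("price")) > 20
]


def stream_titles(source, min_price):
    """Yield titles of books priced above min_price, one <book> at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            if float(elem.findtext("price")) > min_price:
                yield elem.findtext("title")
            elem.clear()  # drop the children of the <book> just processed


streamed = list(stream_titles(io.BytesIO(CATALOG.encode("utf-8")), 20))
print(first_author, fiction, over_20, streamed)
```

ET.iterparse also accepts a file path, which is the more likely input for a large file. Calling elem.clear() keeps memory use roughly flat per record, although the emptied elements remain attached to the root until parsing ends.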