Understanding Diffs: A Visual Guide to Text Comparison
Diffs are everywhere in software development: git commits, code reviews, deployment configs, database migrations. Yet many developers treat diff output as something to skim rather than understand. This guide explains how diffing works, what the output means, and how to use diffs effectively.
How diffing works: the LCS algorithm
At its core, computing a diff between two texts is about finding the longest common subsequence (LCS). The LCS is the longest sequence of lines that appear in both files in the same order, though not necessarily consecutively.
Given file A and file B, the diff algorithm:
- Finds the LCS -- the lines that both files share, in the same order
- Lines in A but not in the LCS are marked as removed (deletions)
- Lines in B but not in the LCS are marked as added (insertions)
Consider a simple example:
File A:       File B:
apple         apple
banana        cherry
cherry        cherry pie
date          date
The LCS is apple, cherry, date: all three lines appear in both files in the same order. The diff therefore shows banana as removed from A and cherry pie as added in B, while apple, cherry, and date are unchanged.
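The dynamic-programming approach can be sketched in a few lines of Python. This is a minimal illustration, not production code: lcs_diff is a hypothetical helper that fills the LCS table for the two files and then walks it, emitting unified-style prefixes (a space for common lines, - for removals, + for additions).

```python
def lcs_diff(a: list[str], b: list[str]) -> list[str]:
    """Diff two lists of lines using the classic O(n*m) LCS table."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk from the start of both files, always staying on a longest path.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(" " + a[i])  # common line: part of the LCS
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append("-" + a[i])  # dropping a[i] keeps the LCS length
            i += 1
        else:
            out.append("+" + b[j])  # inserting b[j] keeps the LCS length
            j += 1
    out.extend("-" + line for line in a[i:])  # leftovers exist in only one file
    out.extend("+" + line for line in b[j:])
    return out


file_a = ["apple", "banana", "cherry", "date"]
file_b = ["apple", "cherry", "cherry pie", "date"]
print("\n".join(lcs_diff(file_a, file_b)))
```

For the example above this prints apple, cherry, and date as context lines, plus -banana and +cherry pie.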
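The classic LCS algorithm uses dynamic programming with O(n*m) time and space complexity, where n and m are the line counts. Myers' diff algorithm, which Git uses by default, solves the equivalent shortest-edit-script problem in O((n+m)*D) time, where D is the number of deleted plus inserted lines, so it is far faster than filling the full table when the files are mostly similar.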
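For comparison, here is a minimal Python sketch of the greedy forward search described in Myers' paper. It simplifies things on purpose: it returns only D, the length of the shortest edit script, and does not record the path needed to print the actual diff. Because every line outside the LCS is either deleted or inserted, D always equals n + m - 2 * (LCS length).

```python
def myers_distance(a: list[str], b: list[str]) -> int:
    """Length D of the shortest edit script (deletions + insertions) from a to b."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d  # shifts diagonal k (which may be negative) to a valid index
    # v[offset + k] = furthest x reached so far on diagonal k, where k = x - y
    v = [0] * (2 * max_d + 2)
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Step down (an insertion) from diagonal k+1, or right (a deletion)
            # from diagonal k-1, whichever path has reached further into a.
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]
            else:
                x = v[offset + k - 1] + 1
            y = x - k
            # Follow the "snake": matching lines cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[offset + k] = x
            if x >= n and y >= m:
                return d
    return max_d


print(myers_distance(["apple", "banana", "cherry", "date"],
                     ["apple", "cherry", "cherry pie", "date"]))  # 2
```

On the fruit example it returns 2 (banana deleted, cherry pie inserted), which matches 4 + 4 - 2 * 3.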
Unified diff format
The unified diff format is the most common output you will encounter. It is what git diff produces and what most code review tools display.
--- a/config.json
+++ b/config.json
@@ -1,6 +1,7 @@
 {
   "name": "my-app",
-  "version": "1.2.0",
+  "version": "1.3.0",
   "description": "A sample application",
+  "license": "MIT",
   "main": "index.js"
 }
Here is what each part means:
File headers: --- a/config.json is the original file, +++ b/config.json is the modified file.
Hunk header: @@ -1,6 +1,7 @@ means the hunk starts at line 1 in the original (showing 6 lines) and line 1 in the modified (showing 7 lines).
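A hunk header can be parsed with a small regular expression. The sketch below is illustrative, and parse_hunk_header is a hypothetical helper. It relies on one rule of the unified format: when a range covers exactly one line, the count is omitted (as in @@ -3 +3 @@), so a missing count means 1. Git may also append a section heading, such as a function name, after the closing @@, which the pattern simply ignores.

```python
import re

# @@ -<start>[,<count>] +<start>[,<count>] @@ [optional section heading]
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count); a missing count means 1."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))


print(parse_hunk_header("@@ -1,6 +1,7 @@"))  # (1, 6, 1, 7)
```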
Line prefixes:
- Lines starting with a space are context (unchanged)
- Lines starting with - are removed from the original
- Lines starting with + are added in the modified version
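Python's standard-library difflib.unified_diff emits this same format, which makes it handy for checking how the prefixes and the hunk header relate. This sketch rebuilds the config.json example (two-space JSON indentation assumed) and recounts the header from the prefixes: the original side is context plus removed lines, and the modified side is context plus added lines.

```python
import difflib

original = [
    "{",
    '  "name": "my-app",',
    '  "version": "1.2.0",',
    '  "description": "A sample application",',
    '  "main": "index.js"',
    "}",
]
modified = [
    "{",
    '  "name": "my-app",',
    '  "version": "1.3.0",',
    '  "description": "A sample application",',
    '  "license": "MIT",',
    '  "main": "index.js"',
    "}",
]

# lineterm="" because the input lines carry no trailing newlines.
diff = list(difflib.unified_diff(original, modified,
                                 fromfile="a/config.json", tofile="b/config.json",
                                 lineterm=""))
print("\n".join(diff))

# Recompute the hunk header counts from the line prefixes.
body = diff[3:]  # skip the ---, +++ and @@ lines
context = sum(1 for line in body if line.startswith(" "))
removed = sum(1 for line in body if line.startswith("-"))
added = sum(1 for line in body if line.startswith("+"))
print(context + removed, context + added)  # 6 7
```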
Split (side-by-side) diff format
Split diff shows the original and modified files in two columns. This format is easier to read for large changes because you can see the before and after simultaneously:
Original                    │ Modified
─────────────────────────── │ ───────────────────────────
{                           │ {
  "name": "my-app",         │   "name": "my-app",
  "version": "1.2.0",  [-]  │   "version": "1.3.0",  [+]
  "description": "...",     │   "description": "...",
                            │   "license": "MIT",    [+]
  "main": "index.js"        │   "main": "index.js"
}                           │ }
Most code review tools (GitHub, GitLab) let you toggle between unified and split views. Split view is often easier for reviewing heavily edited lines, while unified view is more compact and matches the patch format that tools such as git apply and patch consume.
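A split view is essentially the same diff laid out in two columns. A minimal sketch built on difflib.SequenceMatcher opcodes: split_rows and render are hypothetical helpers, and the gutter markers used here (| changed, < left only, > right only) follow the GNU diff --side-by-side convention rather than the [-]/[+] markers shown above.

```python
import difflib


def split_rows(old: list[str], new: list[str]) -> list[tuple[str, str, str]]:
    """Pair lines for a two-column view as (left, right, marker) rows."""
    rows = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            rows.extend((line, line, " ") for line in old[i1:i2])
            continue
        left, right = old[i1:i2], new[j1:j2]
        # Line up changed blocks row by row; the shorter side gets blank cells.
        for k in range(max(len(left), len(right))):
            has_left, has_right = k < len(left), k < len(right)
            marker = "|" if has_left and has_right else ("<" if has_left else ">")
            rows.append((left[k] if has_left else "",
                         right[k] if has_right else "", marker))
    return rows


def render(rows: list[tuple[str, str, str]], width: int = 24) -> str:
    """Pad the left column to a fixed width and put the marker in the gutter."""
    return "\n".join(f"{left:<{width}} {marker} {right}" for left, right, marker in rows)


print(render(split_rows(['"version": "1.2.0",', '"main": "index.js"'],
                        ['"version": "1.3.0",', '"license": "MIT",', '"main": "index.js"'])))
```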
Reading git diff output
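The git diff command produces unified diffs with some extra metadata in each file header, such as a diff --git a/<path> b/<path> line and an index line with abbreviated blob hashes. Common invocations: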
# Unstaged changes
git diff
# Staged changes
git diff --staged
# Between two commits
git diff abc123 def456
# Specific file
git diff HEAD~1 -- src/app.js
Word-level diffs
Line-level diffs can be noisy when only a small part of a line changed. Git supports word-level highlighting:
git diff --word-diff
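This produces output like:
"version": [-"1.2.0",-]{+"1.3.0",+}
Here [-...-] marks removed text and {+...+} marks added text within the same line. By default Git treats any run of non-whitespace characters as a word, so the trailing comma belongs to the changed word; --word-diff-regex lets you define words differently.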
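To see what the word-level view is doing, here is a rough Python imitation, not Git's implementation: word_diff is a hypothetical helper that splits on whitespace (matching Git's default notion of a word), diffs the two word lists with difflib, and wraps the changes in the same markers. Unlike Git, it collapses runs of whitespace into single spaces.

```python
from difflib import SequenceMatcher


def word_diff(old: str, new: str) -> str:
    """Mark changed whitespace-separated words as [-removed-]{+added+}."""
    a, b = old.split(), new.split()
    pieces = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag == "equal":
            pieces.append(" ".join(a[i1:i2]))
            continue
        piece = ""
        if i2 > i1:  # words only in the old line (delete or replace)
            piece += "[-" + " ".join(a[i1:i2]) + "-]"
        if j2 > j1:  # words only in the new line (insert or replace)
            piece += "{+" + " ".join(b[j1:j2]) + "+}"
        pieces.append(piece)
    return " ".join(pieces)


print(word_diff('"version": "1.2.0",', '"version": "1.3.0",'))
# "version": [-"1.2.0",-]{+"1.3.0",+}
```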
Stat summary
For a high-level overview of changes:
git diff --stat
src/app.js        | 12 ++++++------
src/config.json   |  3 ++-
tests/app.test.js | 25 +++++++++++++++++++++++++
3 files changed, 33 insertions(+), 7 deletions(-)
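The stat numbers come straight from the line prefixes inside each hunk. The sketch below, with hypothetical helper names, tallies them from unified diff text. It uses the hunk header counts to know where each hunk ends, so a removed line that happens to start with -- is not mistaken for a file header. Unlike Git, it does not scale the +/- bar for large changes.

```python
import difflib
import re

# Captures only the line counts; a missing count means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def diff_stat(diff_text: str) -> dict[str, tuple[int, int]]:
    """Return {path: (insertions, deletions)} for a unified diff."""
    counts: dict[str, list[int]] = {}
    path = None
    old_left = new_left = 0  # hunk lines still expected on each side
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                counts[path][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                counts[path][1] += 1
                old_left -= 1
            elif not line.startswith("\\"):  # "\ No newline at end of file" is not a line
                old_left -= 1                # a context line counts on both sides
                new_left -= 1
            continue
        if line.startswith("+++ "):
            # Drop an optional tab-separated timestamp and Git's "b/" prefix.
            path = line[4:].split("\t")[0].removeprefix("b/")
            counts.setdefault(path, [0, 0])
        elif (match := HUNK.match(line)):
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
    return {p: (ins, dels) for p, (ins, dels) in counts.items()}


patch = "\n".join(difflib.unified_diff(["a", "b", "c"], ["a", "B", "c", "d"],
                                       "a/notes.txt", "b/notes.txt", lineterm=""))
for path, (ins, dels) in diff_stat(patch).items():
    print(f"{path} | {ins + dels} {'+' * ins}{'-' * dels}")  # notes.txt | 3 ++-
```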
Practical use cases
Code review
Code review is the most common use of diffs. When reviewing a pull request, focus on:
- What was removed (potential regressions)
- What was added (new logic to understand)
- Context lines (does the change fit its surroundings)
Configuration debugging
When a deployment breaks, comparing the current config with the last known working version, or with another environment that still works, quickly reveals what changed:
diff production.env staging.env
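A plain line diff of .env files also reports keys that merely moved. As an alternative, here is a sketch that compares the files as key/value maps. parse_env and diff_env are hypothetical helpers, the parser assumes simple KEY=VALUE lines (no quoting, escapes, export prefixes, or multi-line values, which real .env loaders may support), and the sample values are made up.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def diff_env(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List keys removed (-), added (+), or changed (~), sorted by key."""
    out = []
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            out.append(f"- {key}={old[key]}")
        elif key not in old:
            out.append(f"+ {key}={new[key]}")
        elif old[key] != new[key]:
            out.append(f"~ {key}: {old[key]} -> {new[key]}")
    return out


working = "DB_HOST=db1\nTIMEOUT=30\nDEBUG=false\n"                  # hypothetical last good config
current = "TIMEOUT=5\nDB_HOST=db1\nCACHE_URL=redis://cache:6379\n"  # hypothetical broken config
for change in diff_env(parse_env(working), parse_env(current)):
    print(change)
```

Here the reordered DB_HOST line is not reported; only the added CACHE_URL, the removed DEBUG, and the changed TIMEOUT show up.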
Database migration verification
Before running a migration, diff the generated SQL against the expected schema change to catch unintended modifications.
Document comparison
Diffs are not limited to code. Comparing legal documents, API specifications, or requirements documents helps track what changed between revisions.
API response debugging
Save an API response, make a change, save the new response, and diff them. This is often the fastest way to understand the impact of a code change on API output.
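One practical refinement is to normalize both responses before diffing, so that key order and whitespace differences do not show up as changes. A minimal sketch using Python's json and difflib modules; json_diff is a hypothetical helper and the sample responses are invented.

```python
import difflib
import json


def json_diff(before: str, after: str) -> list[str]:
    """Unified diff of two JSON documents after normalizing key order and layout."""
    def normalize(text: str) -> list[str]:
        # Sorted keys and fixed indentation, so reordering alone is not a change.
        return json.dumps(json.loads(text), indent=2, sort_keys=True).splitlines()

    return list(difflib.unified_diff(normalize(before), normalize(after),
                                     fromfile="before.json", tofile="after.json",
                                     lineterm=""))


before = '{"id": 7, "status": "active", "tags": ["a"]}'
after = '{"tags": ["a"], "status": "inactive", "id": 7}'
print("\n".join(json_diff(before, after)))
```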
Tips for better diffs
Keep commits small and focused. A diff that changes 5 lines across 2 files is easy to review. A diff that changes 500 lines across 30 files is not.
Use meaningful whitespace settings. Sometimes whitespace changes add noise. Use git diff -w to ignore whitespace or git diff --ignore-blank-lines to skip blank line changes.
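The idea behind ignoring whitespace can be sketched as: compare normalized copies of the lines, but print the original text. The helper below is hypothetical and only approximates git diff -w, which ignores whitespace when deciding whether two lines match.

```python
import difflib


def changed_lines(old: list[str], new: list[str], ignore_all_space: bool = False) -> list[str]:
    """Return only the -/+ lines of a diff, optionally ignoring all whitespace."""
    def key(line: str) -> str:
        # Remove every whitespace character when ignoring whitespace.
        return "".join(line.split()) if ignore_all_space else line

    matcher = difflib.SequenceMatcher(a=[key(line) for line in old],
                                      b=[key(line) for line in new], autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Report the original text, not the normalized comparison key.
            out += ["-" + line for line in old[i1:i2]]
            out += ["+" + line for line in new[j1:j2]]
    return out


old = ["def f():", "    return 1"]
new = ["def f():", "\treturn 1"]
print(changed_lines(old, new))                         # ['-    return 1', '+\treturn 1']
print(changed_lines(old, new, ignore_all_space=True))  # []
```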
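Understand rename detection. Git detects file renames by comparing content similarity, so a moved and lightly edited file can be shown as a clean "file renamed" entry instead of a full deletion and addition. Recent Git versions enable rename detection in git diff by default; -M turns it on explicitly and accepts a similarity threshold, as in git diff -M70% (the default threshold is 50%).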
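The pairing idea can be sketched in Python: treat a deleted file and an added file as a rename when their contents are similar enough. This is only an approximation under stated assumptions: looks_like_rename is a hypothetical helper, difflib's ratio() stands in for Git's similarity index (which Git computes differently), and the 0.5 default merely echoes Git's 50% threshold.

```python
import difflib


def looks_like_rename(old_text: str, new_text: str, threshold: float = 0.5) -> bool:
    """Heuristic: pair a deleted and an added file when their lines are similar."""
    matcher = difflib.SequenceMatcher(a=old_text.splitlines(), b=new_text.splitlines(),
                                      autojunk=False)
    # ratio() = 2 * matching lines / total lines in both files
    return matcher.ratio() >= threshold


old_file = "import os\n\ndef main():\n    print('hi')\n"
new_file = "import os\n\ndef main():\n    print('hello')\n"
print(looks_like_rename(old_file, new_file))  # True (3 of 4 lines match)
```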
Use patience diff for better results. Git's default algorithm (Myers) sometimes matches up unrelated lines, such as lone braces or blank lines, in heavily modified files, which produces confusing output. The patience algorithm often produces more human-readable diffs:
git diff --patience
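The patience algorithm starts by matching lines that occur exactly once in each file, keeps the longest run of those matches that appear in the same order (found with patience sorting, hence the name), and then diffs the gaps between these anchors recursively. The sketch below covers only the anchor step; patience_anchors is a hypothetical helper, and the recursive gap-filling is left out.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(a: list[str], b: list[str]) -> list[tuple[int, int]]:
    """Return (i, j) pairs of lines unique in both a and b, kept in a common order."""
    count_a, count_b = Counter(a), Counter(b)
    index_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate matches in a's order, each paired with its position in b.
    pairs = [(i, index_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in index_b]

    # Longest increasing subsequence of the b-positions via patience sorting:
    # tails[p] is the smallest b-position that ends an increasing run of length p + 1.
    tails = []
    tail_idx = []               # index into pairs for each tails entry
    prev = [None] * len(pairs)  # back-pointers used to rebuild the run
    for k, (_, j) in enumerate(pairs):
        pos = bisect_left(tails, j)
        if pos == len(tails):
            tails.append(j)
            tail_idx.append(k)
        else:
            tails[pos] = j
            tail_idx[pos] = k
        prev[k] = tail_idx[pos - 1] if pos > 0 else None

    # Follow back-pointers from the end of the longest run.
    anchors = []
    k = tail_idx[-1] if tail_idx else None
    while k is not None:
        anchors.append(pairs[k])
        k = prev[k]
    return anchors[::-1]


# Braces repeat, so only the function signatures can serve as anchors.
print(patience_anchors(["a()", "{", "}", "b()", "{", "}"],
                       ["b()", "{", "}", "c()", "{", "}"]))  # [(3, 0)]
```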
Summary
Diffs are a fundamental tool for understanding change. Whether you are reviewing code, debugging a config issue, or tracking document revisions, understanding how diffs work -- the LCS algorithm, unified and split formats, and git's diff options -- makes you more effective at spotting what changed and why.