Understanding Diffs: A Visual Guide to Text Comparison

Diffs are everywhere in software development: git commits, code reviews, deployment configs, database migrations. Yet many developers treat diff output as something to skim rather than understand. This guide explains how diffing works, what the output means, and how to use diffs effectively.

How diffing works: the LCS algorithm

At its core, computing a diff between two texts is about finding the longest common subsequence (LCS). The LCS is the longest sequence of lines that appear in both files in the same order, though not necessarily consecutively.

Given file A and file B, the diff algorithm:

Finds the LCS: the lines that appear in both files, in the same order
Marks lines in A that are not in the LCS as removed (deletions)
Marks lines in B that are not in the LCS as added (insertions)

Consider a simple example:

File A:          File B:
apple            apple
banana           cherry
cherry           cherry pie
date             date

The LCS is apple, cherry, date: "cherry" appears in both files, and it comes after apple and before date in each. "banana" exists only in A, and "cherry pie" exists only in B (it is a different line from "cherry", because lines are compared whole). The diff therefore shows banana removed and cherry pie added.

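The classic LCS algorithm uses dynamic programming with O(n*m) time and space complexity, where n and m are the line counts of the two files. Myers' diff algorithm, the default in Git, improves on this: it runs in O(ND) time, where N is the combined length of both files and D is the number of inserted and deleted lines, so it is fast in the common case where files are mostly similar.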
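The dynamic-programming approach can be sketched in a few lines of Python. This is a minimal illustration of the classic O(n*m) method, not what Git actually runs; the function names lcs_table and line_diff are made up for this example. Applied to the two example files, it keeps apple, cherry, and date as common lines.

```python
def lcs_table(a, b):
    """Build the DP table: table[i][j] is the LCS length of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(a, b):
    """Walk the table and emit (' ', line), ('-', line), or ('+', line) tuples."""
    table = lcs_table(a, b)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append((" ", a[i]))  # line is part of the LCS
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            result.append(("-", a[i]))  # dropping a[i] keeps the LCS intact
            i += 1
        else:
            result.append(("+", b[j]))  # b[j] is not in the LCS
            j += 1
    # Whatever is left on either side is a pure deletion or insertion.
    result.extend(("-", line) for line in a[i:])
    result.extend(("+", line) for line in b[j:])
    return result


file_a = ["apple", "banana", "cherry", "date"]
file_b = ["apple", "cherry", "cherry pie", "date"]
for marker, line in line_diff(file_a, file_b):
    print(marker, line)
```

The table costs n*m cells of memory, which is why production tools prefer algorithms that scale with the number of differences instead.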

Unified diff format

The unified diff format is the most common output you will encounter. It is what git diff produces and what most code review tools display.

--- a/config.json
+++ b/config.json
@@ -1,6 +1,7 @@
 {
   "name": "my-app",
-  "version": "1.2.0",
+  "version": "1.3.0",
   "description": "A sample application",
+  "license": "MIT",
   "main": "index.js"
 }

Here is what each part means:

File headers: --- a/config.json is the original file, +++ b/config.json is the modified file.

Hunk header: @@ -1,6 +1,7 @@ means the hunk starts at line 1 in the original (showing 6 lines) and line 1 in the modified (showing 7 lines).
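To make the hunk header concrete, here is a small Python sketch that pulls the four numbers out of it. The helper name parse_hunk_header is hypothetical. One detail it handles: when a count is omitted (as in "-3" rather than "-3,1"), the range covers exactly one line.

```python
import re

# start[,count] for the original side, then start[,count] for the modified side
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # A missing count means the range covers exactly one line.
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -1,6 +1,7 @@"))  # (1, 6, 1, 7)
```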

Line prefixes:

Lines starting with a space are context (unchanged)
Lines starting with - are removed from the original
Lines starting with + are added in the modified version
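If you want to experiment with the format, Python's standard difflib module can produce a unified diff. The sketch below rebuilds the config.json example and then sorts the output lines by prefix. The variable names are illustrative, and the prefix check is deliberately simple: it would misclassify a removed line whose content itself begins with "--".

```python
import difflib

original = [
    "{",
    '  "name": "my-app",',
    '  "version": "1.2.0",',
    '  "description": "A sample application",',
    '  "main": "index.js"',
    "}",
]
modified = [
    "{",
    '  "name": "my-app",',
    '  "version": "1.3.0",',
    '  "description": "A sample application",',
    '  "license": "MIT",',
    '  "main": "index.js"',
    "}",
]

diff_lines = list(
    difflib.unified_diff(
        original,
        modified,
        fromfile="a/config.json",
        tofile="b/config.json",
        lineterm="",  # the input lines carry no trailing newlines
    )
)
print("\n".join(diff_lines))

# Classify body lines by prefix, skipping the two file-header lines.
added = [ln for ln in diff_lines if ln.startswith("+") and not ln.startswith("+++")]
removed = [ln for ln in diff_lines if ln.startswith("-") and not ln.startswith("---")]
context = [ln for ln in diff_lines if ln.startswith(" ")]
print(len(added), "added,", len(removed), "removed,", len(context), "context")
```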

Split (side-by-side) diff format

Split diff shows the original and modified files in two columns. This format is easier to read for large changes because you can see the before and after simultaneously:

Original                    │ Modified
─────────────────────────── │ ───────────────────────────
{                           │ {
  "name": "my-app",         │   "name": "my-app",
  "version": "1.2.0",  [-]  │   "version": "1.3.0",  [+]
  "description": "...",     │   "description": "...",
                            │   "license": "MIT",    [+]
  "main": "index.js"        │   "main": "index.js"
}                           │ }

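Most code review tools (GitHub, GitLab) let you toggle between unified and split views. Split view tends to be easier for reviewing larger rewrites, while unified is the format that patch tools such as patch and git apply consume, so it is the one to use for copying and applying patches.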
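A split view is the same diff data laid out in two columns. As a rough sketch of how a tool might build the rows, the hypothetical side_by_side function below uses difflib.SequenceMatcher from Python's standard library and pads the shorter side of each changed region with empty cells. The "~" marker for a line changed on both sides is an invention of this example, not a standard.

```python
import difflib


def side_by_side(left_lines, right_lines):
    """Pair up lines from two versions as (left, right, marker) rows.

    marker is ' ' for unchanged, '-' for left-only, '+' for right-only,
    and '~' for a changed line that has a counterpart on the other side.
    """
    rows = []
    matcher = difflib.SequenceMatcher(a=left_lines, b=right_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left = list(left_lines[i1:i2])
        right = list(right_lines[j1:j2])
        if tag == "equal":
            rows.extend((l, r, " ") for l, r in zip(left, right))
            continue
        # Pad the shorter side with None so the two columns stay aligned.
        height = max(len(left), len(right))
        left += [None] * (height - len(left))
        right += [None] * (height - len(right))
        for l, r in zip(left, right):
            if l is not None and r is not None:
                marker = "~"
            elif l is not None:
                marker = "-"
            else:
                marker = "+"
            rows.append((l or "", r or "", marker))
    return rows


old = ["{", '  "version": "1.2.0",', '  "main": "index.js"', "}"]
new = ["{", '  "version": "1.3.0",', '  "license": "MIT",', '  "main": "index.js"', "}"]
rows = side_by_side(old, new)
for left_cell, right_cell, marker in rows:
    print(f"{left_cell:<28}│ {right_cell:<28} {marker}")
```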

Reading git diff output

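The git diff command produces unified diffs with some extra metadata, such as a "diff --git" header and an index line with blob hashes for each file. Common invocations: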

# Unstaged changes
git diff

# Staged changes
git diff --staged

# Between two commits
git diff abc123 def456

# Specific file
git diff HEAD~1 -- src/app.js

Word-level diffs

Line-level diffs can be noisy when only a small part of a line changed. Git supports word-level highlighting:

git diff --word-diff

This produces output like:

"version": [-"1.2.0",-]{+"1.3.0",+}

Where [-...-] shows removed text and {+...+} shows added text within the same line.
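The idea behind word-level diffing is to run the same comparison on words instead of lines. Here is a minimal Python sketch using difflib; the word_diff function is hypothetical and is not how Git implements the feature. It splits on whitespace only, so punctuation attached to a word, such as a trailing comma, counts as part of that word, and runs of whitespace are collapsed to single spaces in the output.

```python
import difflib


def word_diff(old_line, new_line):
    """Mark removed words as [-...-] and added words as {+...+}."""
    old_words = old_line.split()
    new_words = new_line.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(old_words[i1:i2]))
            continue
        piece = ""
        if i2 > i1:  # words present only in the old line
            piece += "[-" + " ".join(old_words[i1:i2]) + "-]"
        if j2 > j1:  # words present only in the new line
            piece += "{+" + " ".join(new_words[j1:j2]) + "+}"
        parts.append(piece)
    return " ".join(parts)


print(word_diff('"version": "1.2.0",', '"version": "1.3.0",'))
```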

Stat summary

For a high-level overview of changes:

git diff --stat

 src/app.js        | 12 ++++++------
 src/config.json   |  3 ++-
 tests/app.test.js | 25 +++++++++++++++++++++++++
 3 files changed, 33 insertions(+), 7 deletions(-)
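The numbers in a stat summary are just per-file counts of + and - lines. The Python sketch below derives them from unified diff text. The diff_stat function is a hypothetical helper, and it assumes plain unified diffs with "+++ " file headers; it uses the hunk header counts to know when it is inside a hunk, so that an added line whose content starts with "++ " is not mistaken for a file header.

```python
import re

HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def diff_stat(diff_text):
    """Return {path: [insertions, deletions]} for a unified diff."""
    stats = {}
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: the first character decides the line's role.
            if line.startswith("+"):
                stats[current][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                stats[current][1] += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line counts toward both sides
                new_left -= 1
            continue
        if line.startswith("+++ "):
            current = line[4:].split("\t")[0]
            stats.setdefault(current, [0, 0])
        else:
            match = HUNK_RE.match(line)
            if match and current is not None:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
    return stats


sample = """\
--- a/config.json
+++ b/config.json
@@ -1,6 +1,7 @@
 {
   "name": "my-app",
-  "version": "1.2.0",
+  "version": "1.3.0",
   "description": "A sample application",
+  "license": "MIT",
   "main": "index.js"
 }
"""
stats = diff_stat(sample)
for path, (insertions, deletions) in stats.items():
    print(f"{path} | {insertions + deletions} (+{insertions}, -{deletions})")
```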

Practical use cases

Code review

The most common use of diffs. When reviewing a pull request, focus on:

What was removed (potential regressions)
What was added (new logic to understand)
Context lines (does the change fit its surroundings)

Configuration debugging

When a deployment breaks, comparing the broken config against a known working one, whether the last good version or another environment's file, quickly reveals what differs:

diff production.env staging.env
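A line-based diff reports reordered keys as changes even when the values are identical. When order does not matter, a key-by-key comparison can be clearer. The following Python sketch assumes simple KEY=VALUE files without quoting or "export" prefixes; the function names and the sample contents are hypothetical.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def compare_env(left, right):
    """Return keys only in left, keys only in right, and keys whose values differ."""
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    changed = sorted(k for k in left.keys() & right.keys() if left[k] != right[k])
    return only_left, only_right, changed


# Hypothetical file contents, inlined so the example runs on its own.
production = parse_env("DB_HOST=db.internal\nCACHE_TTL=300\nDEBUG=false\n")
staging = parse_env("# staging overrides\nDEBUG=true\nDB_HOST=db.internal\nFEATURE_X=on\n")
result = compare_env(production, staging)
print(result)  # (['CACHE_TTL'], ['FEATURE_X'], ['DEBUG'])
```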

Database migration verification

Before running a migration, diff the generated SQL against the expected schema change to catch unintended modifications.

Document comparison

Diffs are not limited to code. Comparing legal documents, API specifications, or requirements documents helps track what changed between revisions.

API response debugging

Save an API response, make a change, save the new response, and diff them. This is often the fastest way to understand the impact of a code change on API output.
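JSON responses often differ in key order or whitespace without differing in meaning, which clutters a raw diff. One way to reduce that noise, sketched below in Python, is to normalize both documents (sorted keys, fixed indentation) before diffing. The diff_json helper, the file labels, and the sample responses are all made up for illustration.

```python
import difflib
import json


def diff_json(before, after):
    """Unified diff of two JSON documents after normalizing their layout."""

    def normalize(text):
        # Sorted keys and fixed indentation remove ordering and formatting noise.
        return json.dumps(json.loads(text), indent=2, sort_keys=True).splitlines()

    return list(
        difflib.unified_diff(
            normalize(before),
            normalize(after),
            fromfile="before.json",
            tofile="after.json",
            lineterm="",
        )
    )


# Hypothetical responses: same data, different key order, one changed value.
before = '{"id": 7, "status": "pending", "items": [1, 2]}'
after = '{"status": "shipped", "items": [1, 2], "id": 7}'
json_diff = diff_json(before, after)
print("\n".join(json_diff))
```

Only the status line shows up as changed; the reordered keys produce no output.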

Tips for better diffs

Keep commits small and focused. A diff that changes 5 lines across 2 files is easy to review. A diff that changes 500 lines across 30 files is not.

Use meaningful whitespace settings. Sometimes whitespace changes add noise. Use git diff -w to ignore whitespace or git diff --ignore-blank-lines to skip blank line changes.

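Understand rename detection. Git detects file renames by comparing content similarity. Recent Git versions do this by default in git diff; use git diff -M to request it explicitly, or pass a threshold such as -M90% to require closer matches. The result is clean "file renamed" output instead of a full deletion and addition.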

Use patience diff for better results. The default diff algorithm sometimes produces confusing output for heavily modified files. The patience algorithm often produces more human-readable diffs:

git diff --patience

Summary

Diffs are a fundamental tool for understanding change. Whether you are reviewing code, debugging a config issue, or tracking document revisions, understanding how diffs work -- the LCS algorithm, unified and split formats, and git's diff options -- makes you more effective at spotting what changed and why.

Try our Diff Viewer to compare any two texts side by side instantly -- right in your browser, no upload required.
