How Browsers Work: From URL to Pixels on Your Screen

The Question That Starts Everything
You open your browser. You type:
https://google.com
You press Enter.
Moments later, often in well under a second, a fully rendered page appears.
But what actually happened?
This article breaks down the journey from URL to pixels in a way that makes sense—no overwhelming specs, just clear understanding.
What Is a Browser, Really?
A browser is not just "a thing that opens websites."
It's actually:
- A rendering engine that turns code into visuals
- A networking client that fetches resources
- A JavaScript runtime that executes code
- A user interface you interact with
Think of it as:
A translator between code (HTML, CSS, JS) and what you see on screen.
Browsers don't just display content—they interpret, process, and render it.
Main Components of a Browser (High-Level)
A browser is made of several cooperating parts:
┌─────────────────────────────────────────────────────┐
│ User Interface │
│ (Address bar, tabs, back/forward buttons, etc.) │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Browser Engine │
│ (Coordinates between UI and rendering) │
└─────────────────────────────────────────────────────┘
↓
┌──────────────────────┬──────────────────────────────┐
│ Rendering Engine │ JavaScript Engine │
│ (Parses HTML/CSS) │ (Executes JS code) │
└──────────────────────┴──────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Networking │
│ (Fetches HTML, CSS, JS, images, etc.) │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ Data Storage │
│ (Cookies, localStorage, cache, etc.) │
└─────────────────────────────────────────────────────┘
Each part has a job. They work together like an assembly line.
The User Interface (What You See and Touch)
The User Interface (UI) includes:
- Address bar
- Back/forward buttons
- Bookmarks
- Tabs
- Refresh button
This is the only part users directly interact with.
Everything else happens behind the scenes.
Browser Engine vs Rendering Engine
These terms sound similar but mean different things.
Browser Engine
- Coordinates between the UI and the rendering engine
- Example: You click "back" → browser engine tells rendering engine to load previous page
Rendering Engine
- Parses HTML and CSS
- Builds the visual representation
- Paints pixels on screen
Examples of rendering engines:
- Blink (Chrome, Edge, Opera)
- WebKit (Safari)
- Gecko (Firefox)
Simple distinction:
Browser engine = conductor
Rendering engine = orchestra
Step 1: Networking — Fetching Resources
When you press Enter after typing a URL:
User types URL
↓
DNS resolution (find IP address)
↓
Establish TCP connection (plus a TLS handshake for https://)
↓
Send HTTP request
↓
Receive HTTP response (HTML file)
↓
Parse HTML → find CSS, JS, image references
↓
Fetch those resources too
The networking layer:
- Makes HTTP/HTTPS requests
- Downloads HTML, CSS, JavaScript, images, fonts
- Handles caching, cookies, redirects
Key point:
The browser doesn't get everything at once. It fetches the HTML first, then requests the CSS, JavaScript, and images that the HTML refers to as it discovers them.
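To make the first few networking steps concrete, here is a minimal sketch in Python that splits a URL into the pieces the networking layer needs and builds the text of a plain HTTP/1.1 request. The function name build_request and the chosen headers are assumptions made for illustration. Real browsers send many more headers and usually speak HTTP/2 or HTTP/3 over TLS, so treat this as a simplified picture only.

```python
from urllib.parse import urlsplit


def build_request(url: str) -> tuple[str, int, str]:
    """Return (host, port, raw HTTP/1.1 request text) for a URL."""
    parts = urlsplit(url)
    # Default ports: 443 for https, 80 for http.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Accept: text/html\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return parts.hostname, port, request


host, port, request = build_request("https://google.com")
print(host, port)
print(request)
# The DNS step would be socket.getaddrinfo(host, port); it is left out
# here so that the sketch runs without network access.
```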
Step 2: HTML Parsing and DOM Creation
The browser receives HTML as text:
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
</head>
<body>
<h1>Hello World</h1>
<p>This is a paragraph.</p>
</body>
</html>
The rendering engine parses this HTML into a tree structure called the DOM (Document Object Model).
HTML → DOM Conversion
HTML Text
↓
┌─────────────────────┐
│ HTML Parser │
└─────────────────────┘
↓
DOM Tree:
        Document
            |
          html
            |
       ┌────┴────┐
     head      body
       |         |
     title   ┌───┴───┐
            h1       p
The DOM is:
- A tree representation of your HTML
- Something JavaScript can manipulate
- The structure browsers use internally
Analogy:
HTML is the blueprint.
DOM is the actual building constructed from it.
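If you want to poke at a DOM without opening a browser, Python's standard library ships a small implementation of the same DOM API in xml.dom.minidom. The sketch below parses the example page and then modifies the tree the way a script would. One assumption to keep in mind: minidom is a strict XML parser, so this only works because the sample markup is well-formed (the DOCTYPE line is left out for simplicity). A real HTML parser is far more forgiving. The outline helper is a name invented for this example.

```python
from xml.dom.minidom import parseString

html = (
    "<html><head><title>Example</title></head>"
    "<body><h1>Hello World</h1><p>This is a paragraph.</p></body></html>"
)
document = parseString(html)


def outline(node, depth=0):
    """Return the element tree as indented lines, one per element."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        depth += 1
    for child in node.childNodes:
        lines.extend(outline(child, depth))
    return lines


print("\n".join(outline(document)))

# Manipulate the tree the way a script would: append a new paragraph.
body = document.getElementsByTagName("body")[0]
new_p = document.createElement("p")
new_p.appendChild(document.createTextNode("Added by a script."))
body.appendChild(new_p)
print(len(body.getElementsByTagName("p")))  # 2
```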
Step 3: CSS Parsing and CSSOM Creation
Similarly, CSS is parsed into the CSSOM (CSS Object Model).
Example CSS:
body {
font-size: 16px;
}
h1 {
color: blue;
}
p {
margin: 10px;
}
CSS → CSSOM Conversion
CSS Text
↓
┌─────────────────────┐
│ CSS Parser │
└─────────────────────┘
↓
CSSOM Tree:
              root
                |
      ┌─────────┼─────────┐
    body        h1        p
 (font-size) (color)  (margin)
The CSSOM is:
- A tree of style rules
- Used to determine how each element should look
Why separate from DOM?
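Because HTML defines structure and CSS defines style. The two arrive as separate resources written in different syntaxes, so the browser parses them independently and only combines them when it builds the render tree.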
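A rough feel for what the CSS parser produces: the sketch below turns the example stylesheet into nested dictionaries, one entry per selector. The function parse_css is hypothetical and handles only flat "selector { property: value; }" rules. A real CSSOM also deals with comments, at-rules, specificity, and the cascade.

```python
import re


def parse_css(css: str) -> dict[str, dict[str, str]]:
    """Parse flat 'selector { prop: value; }' rules into nested dicts."""
    rules: dict[str, dict[str, str]] = {}
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        declarations = rules.setdefault(selector.strip(), {})
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip()] = value.strip()
    return rules


css = """
body { font-size: 16px; }
h1 { color: blue; }
p { margin: 10px; }
"""
cssom = parse_css(css)
print(cssom)
```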
Step 4: Combining DOM and CSSOM → Render Tree
The browser now has:
- DOM (what to display)
- CSSOM (how to display it)
Next step: combine them into a Render Tree.
    DOM              CSSOM
     |                 |
     └────────┬────────┘
              ↓
         Render Tree
(Only visible elements with styles applied)
            html
              |
            body
              |
       ┌──────┴──────┐
      h1             p
  (blue text)  (10px margin)
Important:
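The Render Tree only includes elements that produce visual output.
Elements with display: none are in the DOM but not in the Render Tree, and neither are non-visual elements such as <head> and <script>.
Elements with visibility: hidden are invisible but still take up space, so they stay in the Render Tree.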
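Here is one way the combining step could be sketched. Everything in it is a simplification made for illustration: styles are matched by tag name only, just two properties are treated as inherited, and the aside element is added to the example purely to show a display: none node being dropped. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)


@dataclass
class RenderNode:
    tag: str
    style: dict[str, str]
    children: list["RenderNode"] = field(default_factory=list)


NON_VISUAL = {"head", "title", "script", "meta", "link", "style"}
INHERITED = {"color", "font-size"}  # simplified: only these flow to children


def build_render_tree(node, rules, parent_style=None):
    """Return a RenderNode for `node`, or None if it is not rendered."""
    style = {k: v for k, v in (parent_style or {}).items() if k in INHERITED}
    style.update(rules.get(node.tag, {}))
    if node.tag in NON_VISUAL or style.get("display") == "none":
        return None  # the whole subtree is skipped
    rendered = RenderNode(node.tag, style)
    for child in node.children:
        child_render = build_render_tree(child, rules, style)
        if child_render is not None:
            rendered.children.append(child_render)
    return rendered


dom = Node("html", [
    Node("head", [Node("title")]),
    Node("body", [Node("h1"), Node("p"), Node("aside")]),
])
rules = {
    "body": {"font-size": "16px"},
    "h1": {"color": "blue"},
    "p": {"margin": "10px"},
    "aside": {"display": "none"},
}
tree = build_render_tree(dom, rules)
body = tree.children[0]
print([child.tag for child in body.children])  # ['h1', 'p']
```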
Step 5: Layout (Reflow) — Calculating Positions
Now the browser knows:
- What to display
- What styles to apply
But it doesn't yet know where to place things.
Layout (also called Reflow) calculates:
- Position of each element
- Size of each element
- How elements flow together
Render Tree
↓
┌─────────────────────┐
│ Layout Engine │
│ (Calculate boxes) │
└─────────────────────┘
↓
Box Model with positions:
┌────────────────────────┐
│ <html> │
│ ┌──────────────────┐ │
│ │ <body> │ │
│ │ ┌────────────┐ │ │
│ │ │ <h1> │ │ │
│ │ └────────────┘ │ │
│ │ ┌────────────┐ │ │
│ │ │ <p> │ │ │
│ │ └────────────┘ │ │
│ └──────────────────┘ │
└────────────────────────┘
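The nested boxes above can be computed with a short recursive pass. The sketch below is a toy block layout: children stack vertically, each box takes the full available width minus its margins, and the text heights (32 and 20) are made-up numbers standing in for real text measurement. It ignores margin collapsing, inline content, floats, and everything else a real layout engine handles. The Box class and layout function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    tag: str
    margin: int = 0
    content_height: int = 0  # height of the box's own text (assumed known)
    children: list["Box"] = field(default_factory=list)
    x: int = 0
    y: int = 0
    width: int = 0
    height: int = 0


def layout(box: Box, x: int, y: int, available_width: int) -> int:
    """Position `box` inside its parent; return the vertical space it uses."""
    box.x = x + box.margin
    box.y = y + box.margin
    box.width = available_width - 2 * box.margin
    cursor = box.y + box.content_height
    for child in box.children:
        # Children are stacked one below the other.
        cursor += layout(child, box.x, cursor, box.width)
    box.height = cursor - box.y
    return box.height + 2 * box.margin


h1 = Box("h1", content_height=32)
p = Box("p", margin=10, content_height=20)
body = Box("body", children=[h1, p])
html = Box("html", children=[body])

layout(html, x=0, y=0, available_width=800)
for box in (html, body, h1, p):
    print(box.tag, box.x, box.y, box.width, box.height)
```

Changing the width passed to layout changes every box below it, which is a small-scale picture of why a window resize forces a reflow.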
Step 6: Painting — Drawing Pixels
The browser now knows:
- What to display
- How it should look
- Where to place it
Painting is the process of filling in pixels.
Layout Information
↓
┌─────────────────────┐
│ Paint Engine │
│ (Fill in pixels) │
└─────────────────────┘
↓
Pixel data sent to screen
The browser:
- Converts elements into actual pixels
- Applies colors, borders, shadows, images
- Handles layering (z-index)
Result:
Visual representation ready for display.
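Painting can be pictured as filling cells in a grid, back to front. In this sketch the "screen" is a tiny grid of characters instead of pixels, and each box is a plain dictionary with a position, a size, a z value, and a fill character. All of these names are invented for the example. Boxes with a lower z are painted first, so boxes with a higher z end up on top.

```python
def paint(boxes, width, height):
    """Rasterize rectangles onto a character grid, lowest z first."""
    canvas = [[" "] * width for _ in range(height)]
    for box in sorted(boxes, key=lambda b: b["z"]):
        for row in range(box["y"], min(box["y"] + box["h"], height)):
            for col in range(box["x"], min(box["x"] + box["w"], width)):
                canvas[row][col] = box["fill"]
    return ["".join(row) for row in canvas]


boxes = [
    {"x": 2, "y": 1, "w": 6, "h": 3, "z": 1, "fill": "#"},   # foreground
    {"x": 0, "y": 0, "w": 10, "h": 5, "z": 0, "fill": "."},  # background
]
for line in paint(boxes, 10, 5):
    print(line)
# ..........
# ..######..
# ..######..
# ..######..
# ..........
```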
Step 7: Display — Pixels on Screen
Finally, the painted pixels are sent to the display.
The browser uses:
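- The GPU to rasterize and composite painted content (in modern browsers)
- Separate compositing layers, so that scrolling or animating one layer does not force the others to be repainted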
You see the webpage.
The Complete Flow: URL to Pixels
Here's the full journey:
1. User enters URL
↓
2. DNS lookup → IP address
↓
3. HTTP request → Fetch HTML
↓
4. Parse HTML → Build DOM
↓
5. Fetch CSS → Parse CSS → Build CSSOM
↓
6. Combine DOM + CSSOM → Render Tree
↓
7. Layout (calculate positions)
↓
8. Paint (fill in pixels)
↓
9. Composite layers → Display on screen
On a fast connection, all of this often completes in well under a second.
What Is Parsing? (Simple Explanation)
You might wonder: What does "parsing" actually mean?
Parsing means:
Breaking down text into meaningful structure.
Example: Simple Math Expression
Input (text):
3 + 5 * 2
Parser breaks this down:
Expression Tree:
    +
   / \
  3   *
     / \
    5   2
The parser:
- Reads the text
- Understands the grammar (math rules)
- Builds a tree that represents meaning
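You can watch a real parser do this. Python exposes its own parser through the standard ast module, so the sketch below feeds it the same expression and prints the tree it builds. The describe helper is a name made up for this example. It only handles the two node types that appear here.

```python
import ast

# Parse the text into a tree; .body is the root of the expression.
tree = ast.parse("3 + 5 * 2", mode="eval").body


def describe(node):
    """Render a BinOp/Constant subtree as a nested prefix expression."""
    if isinstance(node, ast.BinOp):
        op = type(node.op).__name__
        return f"{op}({describe(node.left)}, {describe(node.right)})"
    return str(node.value)  # ast.Constant


print(describe(tree))  # Add(3, Mult(5, 2))
```

The multiplication ends up deeper in the tree than the addition, which is how the tree records that it must be evaluated first.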
Same idea with HTML:
Input:
<div><p>Hello</p></div>
Parser output:
div
|
p
|
"Hello"
Parsing = turning text into structure.
Simple Parsing Example: HTML
Let's see how HTML parsing actually works.
Input HTML:
<div id="container">
<h1>Title</h1>
<p class="text">Paragraph</p>
</div>
Parsing Steps:
Step 1: Tokenization
<div id="container"> → START_TAG(div, id="container")
<h1> → START_TAG(h1)
Title → TEXT("Title")
</h1> → END_TAG(h1)
<p class="text"> → START_TAG(p, class="text")
Paragraph → TEXT("Paragraph")
</p> → END_TAG(p)
</div> → END_TAG(div)
Step 2: Tree Construction
       div (id="container")
        |
  ┌─────┴─────┐
  h1          p (class="text")
  |           |
"Title"  "Paragraph"
The parser:
- Tokenizes (breaks into pieces)
- Builds a tree (creates structure)
This tree becomes the DOM.
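The two steps can be sketched in a few lines of Python. This is a deliberately naive version: the tokenizer is a single regular expression and the tree builder is a stack that trusts every tag to be closed properly. The real HTML tokenizer is a large state machine with error recovery, so take this as an illustration of the idea and nothing more. The function names are invented for the example.

```python
import re

TOKEN = re.compile(r"<(/?)(\w+)([^>]*)>|([^<]+)")


def tokenize_html(html):
    """Step 1: split markup into START_TAG, END_TAG and TEXT tokens."""
    tokens = []
    for closing, tag, attrs, text in TOKEN.findall(html):
        if tag and closing:
            tokens.append(("END_TAG", tag))
        elif tag:
            attributes = dict(re.findall(r'(\w+)="([^"]*)"', attrs))
            tokens.append(("START_TAG", tag, attributes))
        elif text.strip():
            tokens.append(("TEXT", text.strip()))
    return tokens


def build_tree(tokens):
    """Step 2: use a stack of open elements to nest nodes correctly."""
    root = {"tag": "#document", "attrs": {}, "children": []}
    stack = [root]
    for token in tokens:
        if token[0] == "START_TAG":
            node = {"tag": token[1], "attrs": token[2], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)  # this element is now open
        elif token[0] == "END_TAG":
            stack.pop()  # close the innermost open element
        else:
            stack[-1]["children"].append(token[1])
    return root


html = '<div id="container">\n<h1>Title</h1>\n<p class="text">Paragraph</p>\n</div>'
tokens = tokenize_html(html)
for token in tokens:
    print(token)

div = build_tree(tokens)["children"][0]
print(div["tag"], [child["tag"] for child in div["children"]])  # div ['h1', 'p']
```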
Visual Diagram: Complete Browser Flow
┌──────────────────────────────────────────────────────────┐
│ USER TYPES URL │
└────────────────────────┬─────────────────────────────────┘
↓
┌────────────────────────────────────────────────────────────┐
│ NETWORKING LAYER │
│ • DNS Resolution │
│ • TCP Connection │
│ • HTTP Request/Response │
│ • Fetch HTML, CSS, JS, Images │
└────────────────────────┬───────────────────────────────────┘
↓
┌───────────────┴───────────────┐
↓ ↓
┌─────────────────┐ ┌──────────────────┐
│ HTML PARSER │ │ CSS PARSER │
│ │ │ │
│ Tokenize HTML │ │ Tokenize CSS │
│ Build Tree │ │ Build Tree │
└────────┬────────┘ └────────┬─────────┘
↓ ↓
┌────────┐ ┌─────────┐
│ DOM │ │ CSSOM │
└────┬───┘ └────┬────┘
└──────────┬──────────────────┘
↓
┌────────────────────┐
│ RENDER TREE │
│ (DOM + CSSOM) │
└──────────┬─────────┘
↓
┌────────────────────┐
│ LAYOUT │
│ (Calculate boxes, │
│ positions, sizes)│
└──────────┬─────────┘
↓
┌────────────────────┐
│ PAINT │
│ (Fill in pixels) │
└──────────┬─────────┘
↓
┌────────────────────┐
│ COMPOSITE │
│ (Layer assembly) │
└──────────┬─────────┘
↓
┌────────────────────┐
│ DISPLAY │
│ (Pixels on screen)│
└────────────────────┘
Render Tree Construction (Detailed)
DOM Tree:                   CSSOM Tree:
      html                         styles
        |                             |
     ┌──┴──┐                    ┌─────┴─────┐
   head   body                body         h1
     |     |                    |           |
   title ┌─┴─┐              font-size     color
        h1   p                16px        blue
                  ↓ COMBINE ↓
Render Tree:
        html
          |
        body (font-size: 16px)
          |
      ┌───┴───┐
     h1       p
(color: blue)
Key insight:
Only visible, styled elements make it to the Render Tree.
What Happens When You Scroll or Resize?
When you interact with a page, the browser might need to:
Reflow (Re-layout)
- When element positions/sizes change
- Example: Window resize, font size change
- Expensive operation
Repaint
- When visual properties change (color, background)
- Less expensive than reflow
Composite
- When layers change (transforms, opacity)
- Cheapest operation (GPU-accelerated)
Performance tip:
Modern web development tries to minimize reflows and maximize compositing.
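As a rough mental model, you can think of each CSS property as belonging to the earliest pipeline stage it forces the browser to rerun. The sketch below encodes that idea with a few example properties. The groupings are a simplification chosen for this example: the exact behavior differs between engines and depends on whether the element has its own compositing layer. Properties the sketch does not know about are conservatively treated as layout changes.

```python
PAINT_PROPS = {"color", "background-color", "box-shadow", "visibility"}
COMPOSITE_PROPS = {"transform", "opacity"}


def stages_triggered(changed_props):
    """Return the pipeline stages that rerun for a set of changed properties."""
    changed = set(changed_props)
    if not changed:
        return []
    if changed <= COMPOSITE_PROPS:
        return ["composite"]
    if changed <= COMPOSITE_PROPS | PAINT_PROPS:
        return ["paint", "composite"]
    # Anything else (width, margin, font-size, ...) is assumed to need layout.
    return ["layout", "paint", "composite"]


print(stages_triggered({"transform"}))         # ['composite']
print(stages_triggered({"color", "opacity"}))  # ['paint', 'composite']
print(stages_triggered({"width"}))             # ['layout', 'paint', 'composite']
```

This is why animating transform is usually smoother than animating width or top: fewer stages have to run on every frame.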
JavaScript's Role in All This
We haven't talked much about JavaScript yet, but here's where it fits:
DOM is built
↓
JavaScript executes
↓
JavaScript can modify DOM
↓
DOM changes trigger re-rendering
↓
Layout → Paint → Display (again)
JavaScript:
- Can manipulate the DOM
- Can change styles (triggering CSSOM updates)
- Can trigger reflows and repaints
- Runs in a separate engine (V8, SpiderMonkey, JavaScriptCore)
Important:
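A classic <script> tag without async or defer pauses HTML parsing until the script has been downloaded and executed, which also delays rendering. This is why script tags can slow down page loads.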
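The difference between a blocking script and a deferred one is easiest to see as a timeline. The following sketch is a toy simulation, not how any engine is implemented: it walks a page as a list of items and records the order in which things happen. The file names analytics.js and app.js are hypothetical, and downloads are ignored to keep the focus on ordering.

```python
def load_order(items):
    """Simulate parse/execute order.

    Each item is ("html", label) or ("script", name, mode),
    where mode is "sync" or "defer".
    """
    events, deferred = [], []
    for item in items:
        if item[0] == "html":
            events.append(f"parse {item[1]}")
        elif item[2] == "defer":
            deferred.append(item[1])  # fetched in parallel, run later
        else:
            events.append(f"pause parser, run {item[1]}")
    events.append("parsing finished")
    # Deferred scripts run in document order once parsing is done.
    events.extend(f"run {name}" for name in deferred)
    return events


page = [
    ("html", "<head>"),
    ("script", "analytics.js", "sync"),
    ("script", "app.js", "defer"),
    ("html", "<body>"),
]
for event in load_order(page):
    print(event)
```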
Parsing Tree Example: Mathematical Expression
Let's revisit parsing with a clearer example.
Input Expression:
(3 + 5) * 2 - 4
Tokenization:
Tokens:
LPAREN: (
NUMBER: 3
PLUS: +
NUMBER: 5
RPAREN: )
MULTIPLY: *
NUMBER: 2
MINUS: -
NUMBER: 4
Parse Tree:
        -
       / \
      *   4
     / \
    +   2
   / \
  3   5
How to Read This Tree:
- Start from the bottom: 3 + 5 = 8
- Go up: 8 * 2 = 16
- Go to the top: 16 - 4 = 12
Result: 12
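HTML and CSS parsers apply the same idea: they turn a flat stream of text into a tree that represents its meaning. (HTML parsers are more forgiving, though, and recover from malformed markup instead of rejecting it.)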
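For the curious, here is a small hand-written parser for this kind of expression. It is a sketch of one common technique, recursive descent, where each grammar rule becomes a function. The tree is stored as nested tuples of the form (operator, left, right), a representation picked for brevity. It handles only non-negative integers, the four basic operators, and parentheses, with no error handling.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize_expr(text):
    """Split the input into number, operator and parenthesis tokens."""
    return re.findall(r"\d+|[()+\-*/]", text)


def parse_expr(tokens):
    """expr := term (('+'|'-') term)*
    term := factor (('*'|'/') factor)*
    factor := NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            advance()  # consume ")"
            return node
        return int(token)

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (advance(), node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (advance(), node, term())
        return node

    return expr()


def evaluate(node):
    """Walk the tree bottom-up, exactly as described above."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


tree = parse_expr(tokenize_expr("(3 + 5) * 2 - 4"))
print(tree)            # ('-', ('*', ('+', 3, 5), 2), 4)
print(evaluate(tree))  # 12
```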
Browser Rendering Engines at a Glance
| Browser | Rendering Engine | JavaScript Engine |
|---|---|---|
| Chrome | Blink | V8 |
| Firefox | Gecko | SpiderMonkey |
| Safari | WebKit | JavaScriptCore |
| Edge (modern) | Blink | V8 |
| Opera | Blink | V8 |
You don't need to memorize this.
Just know: different browsers use different engines, but the concepts are the same.
Common Misconceptions About Browsers
Misconception 1: "The browser just displays HTML"
Reality: The browser parses, interprets, styles, lays out, and renders HTML.
Misconception 2: "The rendering engine runs JavaScript"
Reality: JavaScript runs in a dedicated engine (V8, SpiderMonkey, JavaScriptCore) that is embedded in the browser and talks to the rendering engine through APIs such as the DOM.
Misconception 3: "CSS is just styling"
Reality: CSS affects layout, rendering performance, and even behavior (animations, transitions).
Misconception 4: "Browsers are simple"
Reality: Modern browsers are some of the most complex software ever built.
Why Understanding This Matters
Knowing how browsers work helps you:
As a Developer:
- Write faster websites (avoid unnecessary reflows)
- Debug rendering issues
- Optimize performance (critical rendering path)
- Understand browser DevTools
As a Learner:
- Appreciate the complexity of the web
- Understand web standards better
- Prepare for interviews (this is a common topic)
As a Problem Solver:
- Diagnose why pages load slowly
- Fix cross-browser compatibility issues
- Make informed architectural decisions
The Critical Rendering Path (Bonus Concept)
The Critical Rendering Path is the sequence of steps browsers take to render a page.
HTML → DOM
CSS → CSSOM
DOM + CSSOM → Render Tree
Render Tree → Layout
Layout → Paint
Paint → Composite → Display
Optimization goal:
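Minimize the time until the user first sees meaningful content (measured today with metrics such as First Contentful Paint and Largest Contentful Paint).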
Techniques:
- Inline critical CSS
- Defer non-critical JavaScript
- Optimize images
- Minimize render-blocking resources
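To find candidates for these optimizations, you can scan a page for resources that are likely to block rendering. The sketch below uses Python's built-in html.parser and a simplified rule of thumb: stylesheets block rendering, and external scripts block parsing unless they carry async or defer. It ignores cases such as media queries on stylesheets and type="module" scripts (which are deferred by default). The class name and the file names in the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RenderBlockingFinder(HTMLParser):
    """Collect URLs of stylesheets and scripts that may block rendering."""

    def __init__(self):
        super().__init__()
        self.blocking = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet":
            self.blocking.append(attrs.get("href"))
        elif (tag == "script" and "src" in attrs
              and "defer" not in attrs and "async" not in attrs):
            self.blocking.append(attrs["src"])


page = """
<head>
<link rel="stylesheet" href="main.css">
<script src="analytics.js"></script>
<script src="app.js" defer></script>
<script src="ads.js" async></script>
</head>
"""
finder = RenderBlockingFinder()
finder.feed(page)
print(finder.blocking)  # ['main.css', 'analytics.js']
```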
Key Takeaways
- Browsers are complex systems with multiple cooperating parts
- DOM and CSSOM are tree representations of HTML and CSS
- Parsing means converting text into meaningful structure
- Rendering involves: Layout → Paint → Composite → Display
- Every pixel on screen is the result of this entire process
- JavaScript can modify the DOM, triggering re-rendering
- Understanding this flow makes you a better web developer
You Don't Need to Remember Everything
This is a lot of information.
Good news:
You don't need to memorize every step.
What matters:
- Understanding the flow (URL → DOM → Render Tree → Pixels)
- Knowing why things happen (parsing, layout, paint)
- Having a mental model you can reference
The details will come with practice and experience.
Further Learning
If you want to dive deeper:
- How Browsers Work (classic article by Tali Garsiel)
- MDN Web Docs (rendering, performance)
- Chrome DevTools (Performance tab)
- Web Vitals (Core Web Vitals, FCP, LCP)
- Browser engine source code (Chromium, Gecko)
Final Thoughts
Next time you type a URL and press Enter, remember:
You're not just "opening a website."
You're triggering:
- DNS resolution
- HTTP requests
- HTML parsing
- DOM construction
- CSS parsing
- CSSOM construction
- Render tree building
- Layout calculation
- Pixel painting
- GPU compositing
All in a fraction of a second.
The web is magic—but it's understandable magic.
And now you understand how browsers make it happen.
Happy browsing. Happy building. 🚀