URL to Markdown — How to Convert Any Web Page to Clean Markdown

Web pages are designed for browsers — full of CSS, JavaScript, navigation menus, ad scripts, tracking pixels, and cookie banners. When you need the actual content from a web page in a portable, clean format, markdown is the answer.

Whether you're archiving articles for Obsidian, migrating a WordPress site to a static site generator, preparing web content for an AI knowledge base, or extracting reference material for documentation, converting web pages to markdown strips away the visual noise and gives you pure, structured content.

This guide covers every method — from the simplest (copy-paste into a browser tool) to the most powerful (command-line batch conversion) — so you can choose the right approach for your use case.

Why Convert Web Pages to Markdown?

Saving a web page as HTML gives you everything — including everything you don't want. A typical web page's HTML file includes:

CSS stylesheets (layout, fonts, colors, responsive design rules)
JavaScript (analytics, tracking, interactive features, ad loading)
Navigation menus, sidebars, and footer boilerplate
Cookie consent banners and popup scripts
Social sharing buttons and comment sections
Embedded tracking pixels and third-party scripts

For a 2,000-word article, the actual content might be 10% of the HTML file. The rest is noise.
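To check that estimate against a page you have saved, the following minimal sketch measures how much of an HTML file is visible text. It uses only the Python standard library; the names `VisibleTextExtractor` and `content_ratio` are illustrative. Note that navigation and footer text still count as visible here, so the result is an upper bound on the share taken by the article itself.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that sits outside script, style, and similar tags."""

    SKIPPED_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def content_ratio(html: str) -> float:
    """Return visible-text characters divided by total HTML characters."""
    if not html:
        return 0.0
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse whitespace runs so indentation does not inflate the count.
    visible = " ".join("".join(extractor.parts).split())
    return len(visible) / len(html)
```

Calling `content_ratio(open("page.html", encoding="utf-8").read())` on a saved page (the filename is a placeholder) returns a fraction between 0 and 1.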

Markdown gives you just the content — clean headings, paragraphs, lists, tables, links, and emphasis — in a lightweight, portable, future-proof format.

Common Reasons to Convert Web Pages to Markdown

Personal knowledge management — Save articles, tutorials, and reference material to Obsidian, Logseq, or other PKM tools in a format that's searchable, linkable, and version-controllable
Website migration — Move content from WordPress, Drupal, Squarespace, or any CMS to static site generators like Hugo, Astro, Jekyll, or Eleventy
Content archival — Preserve web content in a future-proof format that doesn't depend on CSS frameworks or JavaScript libraries that may break over time
AI and LLM preparation — Feed clean web content to ChatGPT, Claude, or RAG systems without the token overhead of HTML tags, CSS classes, and JavaScript noise
Documentation creation — Extract technical content from web resources and convert to docs-as-code formats for MkDocs, Docusaurus, or GitBook
Research and analysis — Build datasets from web content for content analysis, competitive research, or academic study

Method 1: Copy and Paste via Craft Markdown (Easiest)

The fastest way to convert a web page to markdown — no installation, no extension, no technical skills required.

Step 1: Get the HTML Content

Option A — Copy the page source:

Navigate to the web page in your browser
Right-click → "View Page Source" or press Ctrl+U (Cmd+Option+U on Mac)
Select all of the source (Ctrl+A, or Cmd+A on Mac)
Copy the HTML source code

Option B — Copy from Developer Tools (more precise):

Right-click on the article content area → "Inspect" (or press F12)
In the Elements panel, find the <article>, <main>, or content <div>
Right-click the element → "Copy" → "Copy outerHTML"
This gives you just the content container, without navigation and footer

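Option B is more work but produces cleaner results because you're selecting only the content portion of the page. It also captures the DOM as rendered by the browser, so it works on JavaScript-heavy pages where the page source is mostly empty.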
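If you have already saved the full page source, the same idea as Option B can be approximated in code. The sketch below pulls out the first <article> or <main> element using only the Python standard library. The class and function names are illustrative, and it assumes reasonably well-formed HTML; pages that wrap their content in a plain <div> would need a different selector.

```python
from html.parser import HTMLParser


class ContainerExtractor(HTMLParser):
    """Re-emits the first <article> or <main> element, tags included."""

    TARGETS = ("article", "main")

    def __init__(self):
        # Keep entities such as &amp; unconverted so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.target = None  # tag name being captured
        self.depth = 0      # nesting depth of that tag name
        self.done = False
        self.chunks = []

    def _capturing(self):
        return self.target is not None and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.target is None and tag in self.TARGETS:
            self.target = tag
        if self.target is not None:
            if tag == self.target:
                self.depth += 1
            self.chunks.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self._capturing():
            self.chunks.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._capturing():
            return
        self.chunks.append(f"</{tag}>")
        if tag == self.target:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self._capturing():
            self.chunks.append(data)

    def handle_entityref(self, name):
        if self._capturing():
            self.chunks.append(f"&{name};")

    def handle_charref(self, name):
        if self._capturing():
            self.chunks.append(f"&#{name};")


def extract_content_container(html: str):
    """Return the container's outer HTML, or None if none was found."""
    parser = ContainerExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks) if parser.done else None
```

The returned string can be pasted into the converter in place of the full page source.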

Step 2: Convert with Craft Markdown

Go to Craft Markdown's HTML to Markdown converter
Paste the HTML content into the input area
The converter instantly produces clean markdown
Preview the result to verify headings, lists, and links converted correctly
Copy to clipboard or download as a .md file

Why this method works well:

Completely private — the HTML never leaves your browser
Strips CSS, JavaScript, and tracking scripts automatically (navigation and ads are left out only if they are not in the HTML you paste)
Preserves headings, lists, tables, links, and emphasis
No installation, no account, no extensions to manage
Works on any device with a browser

Step 3: Clean Up (If Needed)

After conversion, you may want to:

Remove navigation links and menu items that were included in your selection
Delete footer content, copyright notices, and legal boilerplate
Fix image paths — relative URLs (like /images/photo.jpg) won't work outside the original site
Remove tracking parameters from URLs (the ?utm_source=... parts)
Add YAML frontmatter if you're saving to a knowledge base (title, source URL, date)

For most articles and blog posts, the conversion output is clean enough to use immediately. Complex pages with sidebars, multi-column layouts, or heavy JavaScript rendering may need a few minutes of cleanup.
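Removing tracking parameters is the cleanup step that is easiest to automate. The following is a minimal sketch using the Python standard library. The list of parameters treated as tracking (the `utm_` prefix plus `fbclid` and `gclid`) is an assumption you should adjust, and the regular expression only handles ordinary inline links and images, not reference-style links or URLs containing parentheses.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: which parameters count as "tracking" is a judgment call.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}

# Matches the "](target)" or "](target "title")" part of inline links and images.
LINK_TARGET = re.compile(r'(\]\()(\S+?)((?:\s+"[^"]*")?\))')


def strip_tracking(url: str) -> str:
    """Drop tracking query parameters; leave every other part of the URL alone."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [
        (key, value)
        for key, value in pairs
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    if len(kept) == len(pairs):
        return url  # nothing to remove, so avoid re-encoding the query
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_markdown_links(markdown: str) -> str:
    """Apply strip_tracking to every inline link and image target."""
    return LINK_TARGET.sub(
        lambda m: m.group(1) + strip_tracking(m.group(2)) + m.group(3),
        markdown,
    )
```

Remaining query values are re-encoded by `urlencode`, so their escaping may be normalized (for example, a space becomes `+`).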

Method 2: Browser Extensions (Best for Regular Use)

If you convert web pages to markdown frequently, a browser extension saves time by converting the current page with a single click.

MarkDownload (Chrome, Firefox, Edge)

The most popular web-to-markdown extension:

One-click conversion of the current page to a .md file
Automatically downloads the markdown file
Configurable: choose what to include/exclude (images, links, metadata)
Supports custom filename templates
Free and open-source

Best for: Users who frequently save articles and want a one-click workflow.

Obsidian Web Clipper (Chrome, Firefox, Edge, Safari)

Built specifically for Obsidian users:

Converts web pages to markdown and saves directly to your Obsidian vault
Adds metadata automatically — title, URL, date saved, author
Customizable templates for consistent note formatting
Tag support for organization
Works with Obsidian's folder and naming conventions

Best for: Obsidian users building a web-sourced knowledge base.

Joplin Web Clipper (Chrome, Firefox)

For Joplin notebook users:

Clips web pages as markdown into Joplin notebooks
Multiple modes: simplified page, complete page, or selection only
Preserves images by downloading them locally
Tags and notebook selection during clipping

Best for: Joplin users who want integrated web clipping.

Extension Limitations

Browser extensions are convenient but come with trade-offs:

Varying conversion quality — some extensions handle tables and complex formatting poorly
Privacy varies — check whether the extension processes content locally or sends it to a server
Maintenance risk — extensions can become abandoned; check last update date before installing
No batch processing — one page at a time
Browser-specific — most extensions only work in certain browsers

For maximum control over conversion quality and privacy, the copy-paste method via Craft Markdown remains the most reliable approach.

Method 3: Command-Line Tools (For Developers)

When you need to convert many pages or build conversion into an automated workflow, command-line tools offer the most power and flexibility.

Pandoc

The industry-standard document converter handles HTML to markdown conversion:

Convert a local HTML file:

pandoc input.html -f html -t markdown -o output.md

Download and convert a URL in one command:

curl -sL "https://example.com/article" | pandoc -f html -t markdown -o output.md

Batch convert all HTML files in a directory:

# Create the output directory first; pandoc will not create it for you
mkdir -p markdown
for file in pages/*.html; do
  pandoc "$file" -f html -t markdown -o "markdown/$(basename "$file" .html).md"
done
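If your pipeline already runs in Python, the same batch loop can be sketched as a short script. This is a minimal sketch that assumes `pandoc` is installed and on your PATH; the function names and the `pages`/`markdown` directory names are illustrative. Unlike the shell loop, it walks subdirectories and mirrors their structure in the output folder.

```python
import subprocess
from pathlib import Path


def output_path_for(html_path, src_dir, out_dir) -> Path:
    """Map pages/blog/a.html to markdown/blog/a.md, keeping subfolders."""
    return Path(out_dir) / Path(html_path).relative_to(src_dir).with_suffix(".md")


def convert_all(src_dir="pages", out_dir="markdown") -> int:
    """Convert every .html file under src_dir with pandoc; return the count."""
    converted = 0
    for html_path in sorted(Path(src_dir).rglob("*.html")):
        md_path = output_path_for(html_path, src_dir, out_dir)
        md_path.parent.mkdir(parents=True, exist_ok=True)
        # Same flags as the shell example; check=True stops on the first failure.
        subprocess.run(
            ["pandoc", str(html_path), "-f", "html", "-t", "markdown", "-o", str(md_path)],
            check=True,
        )
        converted += 1
    return converted
```

Stopping on the first failure is a deliberate choice for a first run; for large batches you may prefer to catch `subprocess.CalledProcessError` and log the file instead.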

Pros: Extremely powerful, scriptable, handles edge cases well, excellent markdown output quality.

Cons: Requires installation, command-line comfort, and configuration for optimal results.

Turndown (JavaScript)

A JavaScript library for building HTML-to-markdown conversion into your own tools:

const TurndownService = require('turndown');
// Turndown defaults to Setext headings (underlined with ===); ask for ATX (#) style
const turndown = new TurndownService({ headingStyle: 'atx' });

const html = '<h1>Hello</h1><p>This is a <strong>test</strong>.</p>';
const markdown = turndown.turndown(html);
console.log(markdown);
// # Hello
//
// This is a **test**.

Turndown is actually what many browser-based converters (including several extensions) use under the hood. It's lightweight, configurable, and handles most HTML structures well.

html2text (Python)

For Python developers who want to convert HTML in data pipelines:

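import html2text
import requests

# Fetch the page; fail fast on HTTP errors or a hung connection
response = requests.get("https://example.com/article", timeout=30)
response.raise_for_status()

converter = html2text.HTML2Text()
converter.ignore_links = False   # keep hyperlinks
converter.ignore_images = False  # keep image references
converter.body_width = 0         # don't wrap lines

markdown = converter.handle(response.text)
print(markdown)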

Best for: Python developers building web scraping or content processing pipelines.

Readability + Turndown (Node.js — Best for Articles)

For extracting just the article content from cluttered web pages, combine Mozilla's Readability (which extracts the main content) with Turndown (which converts to markdown):

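const { Readability } = require('@mozilla/readability');
const { JSDOM } = require('jsdom');
const TurndownService = require('turndown');

// Requires Node.js 18+ for the built-in fetch
async function urlToMarkdown(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  const html = await response.text();
  // Passing the URL lets jsdom resolve relative links
  const dom = new JSDOM(html, { url });
  // parse() returns null when no article content is found
  const article = new Readability(dom.window.document).parse();
  if (!article) throw new Error(`No article content found at ${url}`);

  const turndown = new TurndownService();
  return turndown.turndown(article.content);
}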

This approach produces the cleanest results because Readability strips navigation, sidebars, and ads before Turndown converts the remaining article content.

Use Case Guides

Migrating WordPress to a Static Site Generator

WordPress stores content as HTML in a database. To migrate to Hugo, Astro, Jekyll, Eleventy, or another static site generator, you need that content as markdown files.

Workflow:

Export from WordPress — Go to Tools → Export → All content. This produces an XML file with all your posts and pages.
Extract post HTML — Each post's content is stored as HTML within the XML export. You can extract these manually or use a migration tool.
Convert HTML to markdown — For each post, convert the HTML content to markdown using Craft Markdown (for a few posts) or Pandoc (for hundreds of posts).
Add frontmatter — Each markdown file needs YAML frontmatter with the post title, date, categories, tags, and slug.
Place files — Put the .md files in your static site's content directory (e.g., content/posts/ for Hugo).
Fix internal links — Update internal links to use the new URL structure.
Migrate media — Download images and other media files separately and update image paths in the markdown.

Tips for WordPress migration:

WordPress shortcodes ([gallery], [embed], etc.) need to be removed or converted to standard HTML/markdown equivalents
Tools like the "WordPress to Hugo Exporter" plugin or the "wordpress-export-to-markdown" command-line script automate much of this process
Start with a small batch (10 posts) to verify the conversion quality before doing the full migration
Budget time for cleanup — automated conversion gets you 80-90% of the way there
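The "Extract post HTML" step of the workflow above can be sketched with the Python standard library. This is a minimal sketch, not a full migration tool. It assumes the export uses the WXR 1.2 namespace (`http://wordpress.org/export/1.2/`); older exports declare a different version in the `xmlns:wp` attribute at the top of the file, so check yours and adjust. The function name and the returned dictionary keys are illustrative.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    # Assumption: WXR 1.2. Match this to the xmlns:wp value in your export.
    "wp": "http://wordpress.org/export/1.2/",
}


def extract_posts(wxr_xml: str, post_types=("post", "page")) -> list:
    """Return title, slug, date, and HTML body for each post in the export."""
    root = ET.fromstring(wxr_xml)
    posts = []
    for item in root.iter("item"):
        post_type = item.findtext("wp:post_type", default="", namespaces=NAMESPACES)
        if post_type not in post_types:
            continue  # skip attachments, menu items, revisions, and so on
        posts.append(
            {
                "title": item.findtext("title", default=""),
                "slug": item.findtext("wp:post_name", default="", namespaces=NAMESPACES),
                "date": item.findtext("wp:post_date", default="", namespaces=NAMESPACES),
                "html": item.findtext("content:encoded", default="", namespaces=NAMESPACES),
            }
        )
    return posts
```

Each `html` value can then be passed to whichever converter you chose, and the other fields feed the frontmatter step. Shortcodes are left untouched in the extracted HTML.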

Building an Obsidian Knowledge Base from Web Content

Obsidian's power comes from linking ideas across notes. Converting web content to markdown and adding it to your vault creates a growing, interconnected knowledge base.

Workflow:

Find valuable content — Articles, documentation, tutorials, reference material
Convert to markdown — Use the copy-paste method via Craft Markdown or a browser extension like MarkDownload
Add frontmatter metadata:

---
source: https://example.com/article
author: Jane Smith
date_saved: 2026-02-22
tags: [ai, rag, markdown]
---

Save to your vault — Use a consistent folder structure (/web-clips/, /references/, or topic-based folders)
Add internal links — Connect the clipped content to your existing notes using [[wikilinks]]
Tag for retrieval — Add tags that match your existing taxonomy
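If you clip pages with a script, the frontmatter step in the workflow above can be generated rather than typed. The sketch below builds the same fields as the example frontmatter. The function name is illustrative, and quoting values with `json.dumps` is one simple way to get valid YAML double-quoted strings without a YAML library, not the only way.

```python
import json
from datetime import date


def build_frontmatter(source: str, author: str, tags, saved=None) -> str:
    """Return a YAML frontmatter block with source, author, date_saved, and tags."""
    saved = saved or date.today()

    def quote(value: str) -> str:
        # A JSON string is also a valid YAML double-quoted scalar.
        return json.dumps(value, ensure_ascii=False)

    lines = [
        "---",
        f"source: {quote(source)}",
        f"author: {quote(author)}",
        f"date_saved: {saved.isoformat()}",
        f"tags: [{', '.join(quote(tag) for tag in tags)}]",
        "---",
    ]
    return "\n".join(lines) + "\n"
```

Prepend the returned string to the converted markdown before saving the note to your vault.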

Best practices:

Always include the source URL so you can verify or revisit the original
Add a one-sentence summary at the top in your own words — this helps with recall and search
Don't clip everything — be selective about what earns a place in your vault
Review and link new clips to existing notes within 24 hours

Preparing Web Content for AI and RAG Systems

When building an AI knowledge base from web content, the conversion quality directly impacts retrieval accuracy and response quality.

Workflow:

Identify target content — Documentation pages, knowledge base articles, FAQ pages, product information
Convert HTML to markdown — Strip all CSS, JavaScript, navigation, and ads. Keep only the article content.
Clean the output — Remove boilerplate, fix broken formatting, verify table structure
Chunk by headings — Split the markdown at ## or ### boundaries for optimal RAG retrieval
Generate embeddings — Process each chunk through your embedding model
Store in vector database — Chroma, Pinecone, Weaviate, Qdrant, or pgvector
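The "Chunk by headings" step above can be sketched in a few lines of Python. This is a minimal sketch under simple assumptions: it splits only at `##` and `###` headings, keeps each heading with its chunk, and ignores heading-like lines inside fenced code blocks using a basic open/close toggle. The function name and the chunk dictionary shape are illustrative; real pipelines often also cap chunk size.

```python
import re

HEADING = re.compile(r"^(#{2,3})\s+(.+?)\s*$")
FENCE = re.compile(r"^\s*(`{3,}|~{3,})")


def chunk_by_headings(markdown: str) -> list:
    """Split markdown into chunks that each start at a ## or ### heading."""
    chunks = []
    heading, lines, in_fence = None, [], False

    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence  # entering or leaving a code block
        match = None if in_fence else HEADING.match(line)
        if match:
            # A new section starts: store what was collected so far.
            text = "\n".join(lines).strip()
            if text:
                chunks.append({"heading": heading, "text": text})
            heading, lines = match.group(2), [line]
        else:
            lines.append(line)

    text = "\n".join(lines).strip()
    if text:
        chunks.append({"heading": heading, "text": text})
    return chunks
```

Content before the first `##` heading (typically the title and introduction) becomes a chunk whose `heading` is `None`.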

Why markdown is better than raw HTML for RAG:

25-75% fewer tokens — no CSS, JavaScript, or HTML tag overhead
Cleaner embeddings — content-focused text produces more accurate vector representations
Better retrieval — heading-based chunks match user queries more accurately than arbitrary HTML splits
Consistent format — all web sources become the same clean structure in your knowledge base

For a deep dive on RAG document preparation, see our PDF to Markdown for RAG Systems guide.

Tips for Better Web Page to Markdown Conversion

Select specific content, not the entire page. If you copy the full page source, you'll get navigation, sidebars, footers, and ads mixed into your markdown. Focus on the <article> or <main> element for cleaner results.
Check table conversion carefully. Complex HTML tables with colspan, rowspan, or nested tables may not convert perfectly. Verify tables and fix formatting if needed.
Handle images deliberately. Markdown references images by URL. If the original page uses relative paths (/images/photo.jpg), those won't work in your local markdown file. Either convert to absolute URLs or download the images locally and update the paths.
Remove tracking parameters from URLs. Links often include ?utm_source=newsletter&utm_medium=email tracking parameters. Strip these for cleaner, shorter links.
Add source metadata. Always record where the content came from — the original URL, author, and date. This is essential for attribution, fact-checking, and finding the original if you need to re-verify.
Test readability. Preview the markdown in a renderer (GitHub, Obsidian, VS Code, or any markdown preview tool) to make sure it reads well as a standalone document without the original page's visual design.
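The tip about relative image paths can be automated if you know the URL the page came from. The following is a minimal sketch using `urllib.parse.urljoin` from the Python standard library. The function name is illustrative, and the regular expression covers ordinary inline links and images only, not reference-style links or URLs containing parentheses.

```python
import re
from urllib.parse import urljoin

# Matches the "](target)" or "](target "title")" part of inline links and images.
LINK_TARGET = re.compile(r'(\]\()(\S+?)((?:\s+"[^"]*")?\))')


def absolutize_links(markdown: str, base_url: str) -> str:
    """Resolve relative link and image targets against the source page URL."""
    return LINK_TARGET.sub(
        # urljoin leaves targets that are already absolute unchanged.
        lambda m: m.group(1) + urljoin(base_url, m.group(2)) + m.group(3),
        markdown,
    )
```

Pass the original page address as `base_url`, for example `absolutize_links(text, "https://example.com/blog/post")`.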

Frequently Asked Questions

Can I convert a URL directly without copying HTML?

Craft Markdown's HTML converter accepts pasted HTML content — you copy the source and paste it in. For direct URL-to-markdown conversion without copying, browser extensions like MarkDownload are the easiest option. Command-line tools like Pandoc with curl can also fetch and convert URLs in one step.

How do I handle pages behind a login?

Log in to the site in your browser, then use the copy-and-paste method (view source or inspect element) or a browser extension to convert the authenticated page. The extension or copy-paste approach works because your browser already has the session. Command-line tools require passing cookies or session tokens, which is more complex.

What about JavaScript-rendered pages (SPAs)?

Pages built with React, Vue, Angular, or other JavaScript frameworks render content dynamically. The raw HTML source may be empty or contain only a loading spinner, so View Page Source will not help. For these pages: copy the content element's outerHTML from Developer Tools (the Elements panel shows the DOM after your browser has executed the JavaScript), use a browser extension (which sees the rendered DOM), or use Puppeteer/Playwright in your code to render the page before conversion.

Can I batch convert an entire website to markdown?

For bulk website conversion, combine wget or curl to download pages with Pandoc for batch conversion. For WordPress sites, the export-and-convert approach is more reliable than scraping. For documentation sites, many static site generators have export tools that produce markdown directly.

Is web scraping legal?

Converting publicly available web pages for personal use, research, or reference is generally acceptable. Republishing converted content may violate copyright. Always respect robots.txt, terms of service, and copyright when converting web content at scale. When in doubt, link to the original rather than republishing.

How do I preserve images from web pages?

Markdown references images via URL — if the original site goes down, the images break. To preserve them locally: download images to a folder alongside your markdown file, then update the image paths in the markdown to point to your local copies. Browser extensions like MarkDownload often handle image downloading automatically.
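The download-and-repoint approach can be sketched as two small functions: one rewrites the markdown and records which files are needed, the other fetches them. This is a minimal sketch using the Python standard library. The function names, the `images` folder, and the numbered filename scheme are illustrative choices, and only absolute http(s) image URLs are rewritten.

```python
import posixpath
import re
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit

# Matches inline images: ![alt](target) or ![alt](target "title")
IMAGE = re.compile(r'(!\[[^\]]*\]\()(\S+?)((?:\s+"[^"]*")?\))')


def localize_images(markdown: str, folder: str = "images"):
    """Point remote images at local paths; return (new_markdown, downloads)."""
    downloads = {}  # remote URL -> local relative path

    def rewrite(match):
        url = match.group(2)
        if urlsplit(url).scheme not in ("http", "https"):
            return match.group(0)  # already local or relative: leave as-is
        if url not in downloads:
            name = posixpath.basename(urlsplit(url).path) or "image"
            # A numeric prefix avoids clashes between files with the same name.
            downloads[url] = f"{folder}/{len(downloads) + 1:03d}-{name}"
        return match.group(1) + downloads[url] + match.group(3)

    return IMAGE.sub(rewrite, markdown), downloads


def download_images(downloads: dict, root: str = ".") -> None:
    """Fetch each recorded image into its local path under root."""
    for url, relative_path in downloads.items():
        target = Path(root) / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(target))
```

Run `localize_images` on the converted text, save the rewritten markdown, then call `download_images` with the returned mapping from the folder where the markdown file lives.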

What's the best method for non-technical users?

The copy-and-paste approach via Craft Markdown requires zero technical skills — just copy the page source and paste it. If you convert pages frequently, install the MarkDownload browser extension for one-click conversion. Both approaches are free and require no setup.
