
llm.energy - Extract Documentation for AI Agents

Extract llms.txt documentation and install.md instructions from any website for AI agents, LLMs, and automation workflows.


πŸ“– Overview

llm.energy is a web application and MCP server that fetches, parses, and organizes documentation from websites implementing the llms.txt and install.md standards. It transforms raw documentation into structured, agent-ready formats optimized for large language models, AI assistants, and developer tooling.

| Standard | Description | Learn More |
| --- | --- | --- |
| llms.txt | Machine-readable documentation for AI systems | llmstxt.org |
| install.md | LLM-executable installation instructions | installmd.org |
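For orientation, an llms.txt file is itself plain Markdown: per the llmstxt.org convention, an H1 title, a blockquote summary, and H2 sections listing links. A minimal, illustrative example (all names and URLs below are placeholders):

```markdown
# Example Project

> One-line summary of what the project does, written for LLM consumers.

## Docs

- [Quickstart](https://docs.example.com/quickstart.md): Getting started guide
- [API Reference](https://docs.example.com/api.md): Full endpoint reference

## Optional

- [Changelog](https://docs.example.com/changelog.md): Release history
```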

✨ Key Features

| Feature | Description |
| --- | --- |
| πŸ” Smart Detection | Automatically finds llms.txt, llms-full.txt, and install.md |
| πŸ“„ Organized Output | Splits content into individual markdown files by section |
| πŸ€– Agent-Ready | Includes AGENT-GUIDE.md optimized for AI assistants |
| πŸ“¦ Multiple Formats | Export as Markdown, JSON, YAML, or ZIP archive |
| πŸ”Œ MCP Server | Integrate with Claude Desktop, Cursor, and more |
| ⚑ Batch Processing | Process multiple URLs simultaneously |
| πŸ“š Site Directory | Browse 19+ curated llms.txt-enabled websites |
| ✏️ llms.txt Generator | Create your own llms.txt files with a guided wizard |
| πŸš€ install.md Generator | AI-powered: generate from GitHub repos, docs URLs, or manually |

🎯 Use Cases

- πŸ“ Feed docs to AI coding assistants (Cursor, Windsurf)
- πŸ€– Build context-aware AI agents with up-to-date docs
- πŸ”„ Create documentation pipelines for RAG systems
- πŸ“¦ Aggregate docs from multiple sources automatically
- ✏️ Generate llms.txt/install.md for your own projects
- πŸš€ Auto-generate install.md from any GitHub repo

πŸš€ install.md Generator

Generate LLM-executable installation instructions from any source:

| Mode | Description | Use Case |
| --- | --- | --- |
| From GitHub | Analyze any public GitHub repo and generate install.md | Perfect for creating install.md for existing projects |
| From URL | Extract from any documentation page | Convert existing docs to install.md format |
| Manual | Build from scratch with a guided wizard | Full control over every detail |

How It Works

  1. GitHub Mode: Analyzes README, package.json/pyproject.toml/Cargo.toml, GitHub Actions, and releases
  2. URL Mode: Scrapes documentation pages, detects platforms (Mintlify, Docusaurus, GitBook, etc.)
  3. AI Synthesis: Uses Claude to generate a properly formatted install.md
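The manifest analysis in GitHub mode can be sketched as a mapping from the files found at a repo root to an install command. This is an illustrative approximation, not the project's actual implementation (the function name and mapping are hypothetical):

```typescript
// Map well-known manifest/lockfile names to an install command.
// Lockfiles come first: they identify the package manager more
// precisely than package.json alone.
const MANIFEST_COMMANDS: Array<[string, string]> = [
  ["pnpm-lock.yaml", "pnpm install"],
  ["yarn.lock", "yarn install"],
  ["package-lock.json", "npm install"],
  ["package.json", "npm install"],
  ["pyproject.toml", "pip install ."],
  ["Cargo.toml", "cargo build"],
];

// Given the file names at the repo root, return the first matching command.
function detectInstallCommand(files: string[]): string | undefined {
  const present = new Set(files);
  for (const [manifest, command] of MANIFEST_COMMANDS) {
    if (present.has(manifest)) return command;
  }
  return undefined;
}
```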

Quick Example

```bash
# Generate install.md for any GitHub project
curl -X POST https://llm.energy/api/generate-install \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com/anthropics/anthropic-sdk-python", "type": "github"}'
```

Visit llm.energy/install-generator to use the web interface.


πŸ“š Documentation

Full documentation is available at llm.energy/docs


πŸš€ Installation

Web Application

Visit llm.energy to use the hosted version.

Local Development

```bash
# Clone the repository
git clone https://github.com/nirholas/extract-llms-docs.git
cd extract-llms-docs

# Install dependencies
pnpm install

# Start development server
pnpm dev
```

🌐 The application runs on http://localhost:3001

MCP Server


Add to your MCP client configuration (Claude Desktop, Cursor, etc.):

```json
{
  "mcpServers": {
    "llm-energy": {
      "command": "npx",
      "args": ["-y", "@llm-energy/mcp-server"]
    }
  }
}
```

See MCP Server Documentation for detailed setup.


πŸ”Œ API Reference

POST /api/extract - Extract documentation from a URL

```bash
curl -X POST https://llm.energy/api/extract \
  -H "Content-Type: application/json" \
  -d '{"url": "docs.anthropic.com"}'
```

The response includes parsed sections, metadata, and download URLs.
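Since the exact response schema is not documented here, a client can defensively narrow the parsed JSON before using it. A sketch with an assumed shape (the field names are illustrative, not the actual schema):

```typescript
// Illustrative response shape for /api/extract; the real schema may differ.
interface ExtractResponse {
  url: string;
  sections: Array<{ title: string; content: string }>;
  downloadUrls?: Record<string, string>; // format -> download URL
}

// Narrow an unknown parsed-JSON value before touching its fields.
function isExtractResponse(value: unknown): value is ExtractResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.url === "string" && Array.isArray(v.sections);
}
```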

POST /api/validate - Check if a URL has llms.txt support

```bash
curl -X POST https://llm.energy/api/validate \
  -H "Content-Type: application/json" \
  -d '{"url": "docs.example.com"}'
```

POST /api/batch - Process multiple URLs

```bash
curl -X POST https://llm.energy/api/batch \
  -H "Content-Type: application/json" \
  -d '{"urls": ["docs.anthropic.com", "docs.stripe.com"]}'
```

GET /api/download - Download in various formats

```bash
# Formats: markdown, json, yaml, zip
curl "https://llm.energy/api/download?url=docs.anthropic.com&format=zip"
```

POST /api/generate-install - AI-generate install.md from a GitHub repo or docs URL

```bash
# Generate from a GitHub repository
curl -X POST https://llm.energy/api/generate-install \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com/anthropics/anthropic-sdk-python", "type": "github"}'

# Generate from a documentation URL
curl -X POST https://llm.energy/api/generate-install \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.anthropic.com/en/docs/quickstart", "type": "docs"}'

# Analyze only (preview mode)
curl "https://llm.energy/api/generate-install?url=https://github.com/anthropics/anthropic-sdk-python&type=github"
```

πŸ“– Full API Reference β†’


πŸ€– MCP Server Tools

The MCP server exposes the following tools for AI agents:

| Tool | Description |
| --- | --- |
| extract_docs | Extract documentation from a URL with llms.txt support |
| validate_url | Check if a URL has llms.txt available |
| verify_llms_txt | Verify llms.txt exists and get file info |
| discover_documentation_urls | Find documentation URLs for a domain |
| list_sites | Get directory of known llms.txt-enabled sites |
| search_sites | Search the site directory by category or keyword |
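For agents speaking MCP directly, a tool invocation is a JSON-RPC tools/call request. A sketch of calling extract_docs (the argument name url is assumed from the tool's description above):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "extract_docs",
    "arguments": { "url": "docs.anthropic.com" }
  }
}
```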

πŸ—οΈ Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        llm.energy                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚
β”‚  β”‚   Web App   β”‚  β”‚  REST API   β”‚  β”‚ MCP Server  β”‚               β”‚
β”‚  β”‚  (Next.js)  β”‚  β”‚  /api/*     β”‚  β”‚   (stdio)   β”‚               β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜               β”‚
β”‚         β”‚                β”‚                β”‚                      β”‚
β”‚         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β”‚
β”‚                          β”‚                                       β”‚
β”‚                   β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”                                β”‚
β”‚                   β”‚    Core     β”‚                                β”‚
β”‚                   β”‚  - Parser   β”‚                                β”‚
β”‚                   β”‚  - Extractorβ”‚                                β”‚
β”‚                   β”‚  - Cache    β”‚                                β”‚
β”‚                   β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜                                β”‚
β”‚                          β”‚                                       β”‚
β”‚         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                      β”‚
β”‚         β–Ό                β–Ό                β–Ό                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚
β”‚  β”‚  llms.txt   β”‚  β”‚ install.md  β”‚  β”‚   Sitemap   β”‚               β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜               β”‚
β”‚                                                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
πŸ“ Project Structure
extract-llms-docs/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ app/              # Next.js pages and API routes
β”‚   β”‚   β”œβ”€β”€ api/          # REST API endpoints
β”‚   β”‚   β”‚   β”œβ”€β”€ extract/  # Documentation extraction API
β”‚   β”‚   β”‚   β”œβ”€β”€ generate-install/  # AI-powered install.md generation
β”‚   β”‚   β”‚   └── ...       # Other API endpoints
β”‚   β”‚   β”œβ”€β”€ extract/      # Extraction interface
β”‚   β”‚   β”œβ”€β”€ batch/        # Batch processing page
β”‚   β”‚   β”œβ”€β”€ directory/    # Site directory browser
β”‚   β”‚   β”œβ”€β”€ generator/    # llms.txt generator wizard
β”‚   β”‚   └── install-generator/  # install.md generator with tabs
β”‚   β”œβ”€β”€ components/       # React UI components
β”‚   β”‚   └── install-generator/  # GitHubTab, UrlTab, Preview
β”‚   β”œβ”€β”€ lib/              # Core utilities
β”‚   β”‚   β”œβ”€β”€ github-analyzer.ts  # GitHub repo analysis
β”‚   β”‚   β”œβ”€β”€ docs-analyzer.ts    # Documentation URL scraping
β”‚   β”‚   └── ...           # Parser, extractor, cache
β”‚   β”œβ”€β”€ hooks/            # React hooks
β”‚   └── types/            # TypeScript definitions
β”œβ”€β”€ packages/core/        # Shared parser and types
β”œβ”€β”€ mcp-server/           # MCP server package
└── docs-site/            # MkDocs documentation source

πŸ› οΈ Technology Stack

Built with Next.js, TypeScript, Tailwind CSS, Framer Motion, the MCP SDK, and Vitest.


βš™οΈ Configuration

Environment variables (optional):

```bash
RATE_LIMIT_REQUESTS=100     # Max requests per window
RATE_LIMIT_WINDOW_MS=60000  # Window duration in ms
CACHE_TTL=3600              # Cache time-to-live in seconds
ADMIN_KEY=your-secret-key   # Admin API key (required for cache management)
```
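Since every variable is optional, loading this config reduces to "parse if present, else use the documented default". A minimal sketch (the helper name is illustrative, not the project's actual code):

```typescript
// Read a numeric environment variable, falling back to a default when the
// variable is unset, empty, or not a valid number.
function envNumber(
  env: Record<string, string | undefined>,
  key: string,
  fallback: number,
): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// In the app, read from process.env with the documented defaults:
const cacheTtl = envNumber(process.env, "CACHE_TTL", 3600);
```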

πŸ’» Development

```bash
pnpm dev           # πŸš€ Start development server
pnpm build         # πŸ“¦ Production build
pnpm start         # ▢️  Start production server
pnpm test          # πŸ§ͺ Run tests (163 tests)
pnpm test:coverage # πŸ“Š Tests with coverage report
pnpm typecheck     # βœ… TypeScript validation
pnpm lint          # πŸ” ESLint check
```

πŸ”— Related Projects

| Project | Description |
| --- | --- |
| llms.txt Standard | Machine-readable documentation format |
| install.md Standard | LLM-executable installation format |
| Model Context Protocol | Protocol for AI tool integration |
| MCP Servers Directory | Community MCP servers |

🌐 Sites with llms.txt Support

Browse 19+ curated websites with verified llms.txt support at llm.energy/directory

Featured sites include:

  • πŸ€– Anthropic Documentation
  • ⚑ Vercel Documentation
  • πŸ’³ Stripe API Reference
  • πŸ—„οΈ Supabase Docs
  • πŸ“˜ Mintlify Documentation
  • ...and more!

🀝 Contributing

Contributions are welcome! Please follow these steps:

  1. 🍴 Fork the repository
  2. 🌿 Create a feature branch (git checkout -b feature/amazing-feature)
  3. πŸ’Ύ Commit your changes (git commit -m 'Add amazing feature')
  4. πŸ“€ Push to the branch (git push origin feature/amazing-feature)
  5. πŸ”€ Open a Pull Request

Adding a site to the directory? Edit src/data/sites.ts and submit a PR.
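The exact entry shape lives in src/data/sites.ts; the interface below is a hypothetical sketch of what such an entry might look like, so check the real type in that file before opening a PR:

```typescript
// Hypothetical entry shape; verify against the actual type in src/data/sites.ts.
interface SiteEntry {
  name: string;
  url: string;
  llmsTxtUrl: string;
  category: string;
}

const exampleSite: SiteEntry = {
  name: "Example Docs",
  url: "https://docs.example.com",
  llmsTxtUrl: "https://docs.example.com/llms.txt",
  category: "developer-tools",
};
```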


πŸ“„ License

MIT License - see LICENSE for details


πŸ”— Links

🌐 Website β€’ πŸ“š Documentation β€’ πŸ’» GitHub β€’ πŸ“¦ NPM


πŸ‘€ Author

Built by nich - https://x.com/nichxbt

Made with ⚑ for the AI community


Keywords: llms.txt, llms-full.txt, documentation extraction, AI documentation, LLM context, Model Context Protocol, MCP server, AI agents, documentation parser, markdown extraction, API documentation, developer tools, AI coding assistant, RAG, retrieval augmented generation, context injection, AI-friendly documentation, vibe coding, cursor, windsurf, claude, chatgpt, copilot