
# extract-llms-docs

> io.github.nirholas/extract-llms-docs

Extract llms.txt from any docs site - Mintlify, Docusaurus, and GitBook parser.

Documentation: llm.energy

Extract llms.txt documentation and install.md instructions from any website for AI agents, LLMs, and automation workflows.
## Overview
llm.energy is a web application and MCP server that fetches, parses, and organizes documentation from websites implementing the llms.txt and install.md standards. It transforms raw documentation into structured, agent-ready formats optimized for large language models, AI assistants, and developer tooling.
| Standard | Description | Learn More |
|---|---|---|
| llms.txt | Machine-readable documentation for AI systems | llmstxt.org |
| install.md | LLM-executable installation instructions | installmd.org |
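For reference, an llms.txt file follows the shape defined at llmstxt.org: an H1 project name, a blockquote summary, and H2 sections containing annotated link lists. A minimal example (the URLs are placeholders):

```markdown
# Example Project

> One-sentence summary of what the project does.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Install and run in five minutes
- [API Reference](https://example.com/docs/api.md): Full endpoint documentation

## Optional

- [Changelog](https://example.com/changelog.md): Release history
```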
## Key Features
| Feature | Description |
|---|---|
| Smart Detection | Automatically finds llms.txt, llms-full.txt, and install.md |
| Organized Output | Splits content into individual markdown files by section |
| Agent-Ready | Includes AGENT-GUIDE.md optimized for AI assistants |
| Multiple Formats | Export as Markdown, JSON, YAML, or ZIP archive |
| MCP Server | Integrate with Claude Desktop, Cursor, and more |
| Batch Processing | Process multiple URLs simultaneously |
| Site Directory | Browse 19+ curated llms.txt-enabled websites |
| llms.txt Generator | Create your own llms.txt files with a guided wizard |
| install.md Generator | AI-powered: generate from GitHub repos, docs URLs, or manually |
## Use Cases

- Feed docs to AI coding assistants (Cursor, Windsurf)
- Build context-aware AI agents with up-to-date docs
- Create documentation pipelines for RAG systems
- Aggregate docs from multiple sources automatically
- Generate llms.txt/install.md for your own projects
- Auto-generate install.md from any GitHub repo
## install.md Generator
Generate LLM-executable installation instructions from any source:
| Mode | Description | Use Case |
|---|---|---|
| From GitHub | Analyze any public GitHub repo and generate install.md | Perfect for creating install.md for existing projects |
| From URL | Extract from any documentation page | Convert existing docs to install.md format |
| Manual | Build from scratch with guided wizard | Full control over every detail |
### How It Works
- GitHub Mode: Analyzes README, package.json/pyproject.toml/Cargo.toml, GitHub Actions, and releases
- URL Mode: Scrapes documentation pages, detects platforms (Mintlify, Docusaurus, GitBook, etc.)
- AI Synthesis: Uses Claude to generate a properly formatted install.md
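The platform detection step in URL Mode can be sketched as a simple heuristic over the fetched HTML. This is an illustrative version only, not the project's actual `docs-analyzer.ts`; the marker strings and the `detectPlatform` helper are assumptions:

```typescript
// Hypothetical sketch of docs-platform detection (not the real implementation).
// Each platform tends to leave recognizable markers in its generated HTML.
type Platform = "mintlify" | "docusaurus" | "gitbook" | "unknown";

function detectPlatform(html: string): Platform {
  const h = html.toLowerCase();
  // e.g. <meta name="generator" content="Docusaurus v3.1">
  if (h.includes("docusaurus")) return "docusaurus";
  // GitBook pages reference gitbook asset hosts
  if (h.includes("gitbook")) return "gitbook";
  // Mintlify pages reference mintlify CDN assets
  if (h.includes("mintlify")) return "mintlify";
  return "unknown";
}
```

A real detector would inspect specific meta tags and asset URLs rather than substring matches, but the shape is the same: fetch, inspect, dispatch to a platform-specific parser.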
### Quick Example

```shell
# Generate install.md for any GitHub project
curl -X POST https://llm.energy/api/generate-install \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com/anthropics/anthropic-sdk-python", "type": "github"}'
```

Visit llm.energy/install-generator to use the web interface.
## Documentation

Full documentation is available at llm.energy/docs.
## Installation

### Web Application

Visit llm.energy to use the hosted version.

### Local Development

```shell
# Clone the repository
git clone https://github.com/nirholas/extract-llms-docs.git
cd extract-llms-docs

# Install dependencies
pnpm install

# Start development server
pnpm dev
```

The application runs on http://localhost:3001.
### MCP Server
Add to your MCP client configuration (Claude Desktop, Cursor, etc.):
```json
{
  "mcpServers": {
    "llm-energy": {
      "command": "npx",
      "args": ["-y", "@llm-energy/mcp-server"]
    }
  }
}
```
See MCP Server Documentation for detailed setup.
## API Reference

### POST /api/extract

Extract documentation from a URL:

```shell
curl -X POST https://llm.energy/api/extract \
  -H "Content-Type: application/json" \
  -d '{"url": "docs.anthropic.com"}'
```

The response includes parsed sections, metadata, and download URLs.
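The exact response schema is not documented here; a hypothetical shape, consistent with that description (all field names are assumptions), might look like:

```json
{
  "url": "docs.anthropic.com",
  "sections": [
    { "title": "Quickstart", "file": "quickstart.md" }
  ],
  "metadata": { "source": "llms.txt", "fetchedAt": "2025-01-01T00:00:00Z" },
  "downloads": {
    "zip": "https://llm.energy/api/download?url=docs.anthropic.com&format=zip"
  }
}
```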
### POST /api/validate

Check whether a URL has llms.txt support:

```shell
curl -X POST https://llm.energy/api/validate \
  -H "Content-Type: application/json" \
  -d '{"url": "docs.example.com"}'
```
### POST /api/batch

Process multiple URLs in one request:

```shell
curl -X POST https://llm.energy/api/batch \
  -H "Content-Type: application/json" \
  -d '{"urls": ["docs.anthropic.com", "docs.stripe.com"]}'
```
### GET /api/download

Download in various formats (markdown, json, yaml, zip):

```shell
curl "https://llm.energy/api/download?url=docs.anthropic.com&format=zip"
```
### POST /api/generate-install

AI-generate install.md from a GitHub repository or a documentation URL:

```shell
# Generate from a GitHub repository
curl -X POST https://llm.energy/api/generate-install \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com/anthropics/anthropic-sdk-python", "type": "github"}'

# Generate from a documentation URL
curl -X POST https://llm.energy/api/generate-install \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.anthropic.com/en/docs/quickstart", "type": "docs"}'

# Analyze only (preview mode)
curl "https://llm.energy/api/generate-install?url=https://github.com/anthropics/anthropic-sdk-python&type=github"
```
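Calling the endpoint from code is a plain JSON POST. A minimal TypeScript sketch, assuming Node 18+ for the global `fetch`; the `buildRequest` and `generateInstall` helpers are illustrative, not part of any published SDK:

```typescript
// Hypothetical helper for calling the generate-install endpoint.
type SourceType = "github" | "docs";

interface RequestOptions {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildRequest(url: string, type: SourceType): RequestOptions {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url, type }),
  };
}

async function generateInstall(url: string, type: SourceType): Promise<string> {
  const res = await fetch(
    "https://llm.energy/api/generate-install",
    buildRequest(url, type),
  );
  if (!res.ok) throw new Error(`generate-install failed: ${res.status}`);
  return res.text(); // the generated install.md content
}
```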
## MCP Server Tools

The MCP server exposes the following tools for AI agents:

| Tool | Description |
|---|---|
| extract_docs | Extract documentation from a URL with llms.txt support |
| validate_url | Check if a URL has llms.txt available |
| verify_llms_txt | Verify llms.txt exists and get file info |
| discover_documentation_urls | Find documentation URLs for a domain |
| list_sites | Get directory of known llms.txt-enabled sites |
| search_sites | Search the site directory by category or keyword |
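Over MCP, a client invokes these tools with a standard `tools/call` JSON-RPC request. An illustrative call to `extract_docs` (the `url` argument name is an assumption based on the tool description):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "extract_docs",
    "arguments": { "url": "docs.anthropic.com" }
  }
}
```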
## Architecture

```
                          llm.energy
 ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
 │   Web App   │    │  REST API   │    │ MCP Server  │
 │  (Next.js)  │    │   /api/*    │    │   (stdio)   │
 └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
        └──────────────────┼──────────────────┘
                    ┌──────┴──────┐
                    │    Core     │
                    │ - Parser    │
                    │ - Extractor │
                    │ - Cache     │
                    └──────┬──────┘
        ┌──────────────────┼──────────────────┐
        ▼                  ▼                  ▼
 ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
 │  llms.txt   │    │ install.md  │    │   Sitemap   │
 └─────────────┘    └─────────────┘    └─────────────┘
```
## Project Structure

```
extract-llms-docs/
├── src/
│   ├── app/                    # Next.js pages and API routes
│   │   ├── api/                # REST API endpoints
│   │   │   ├── extract/        # Documentation extraction API
│   │   │   ├── generate-install/ # AI-powered install.md generation
│   │   │   └── ...             # Other API endpoints
│   │   ├── extract/            # Extraction interface
│   │   ├── batch/              # Batch processing page
│   │   ├── directory/          # Site directory browser
│   │   ├── generator/          # llms.txt generator wizard
│   │   └── install-generator/  # install.md generator with tabs
│   ├── components/             # React UI components
│   │   └── install-generator/  # GitHubTab, UrlTab, Preview
│   ├── lib/                    # Core utilities
│   │   ├── github-analyzer.ts  # GitHub repo analysis
│   │   ├── docs-analyzer.ts    # Documentation URL scraping
│   │   └── ...                 # Parser, extractor, cache
│   ├── hooks/                  # React hooks
│   └── types/                  # TypeScript definitions
├── packages/core/              # Shared parser and types
├── mcp-server/                 # MCP server package
└── docs-site/                  # MkDocs documentation source
```
## Technology Stack
## Configuration

Environment variables (optional):

```shell
RATE_LIMIT_REQUESTS=100    # Max requests per window
RATE_LIMIT_WINDOW_MS=60000 # Window duration in ms
CACHE_TTL=3600             # Cache time-to-live in seconds
ADMIN_KEY=your-secret-key  # Admin API key (required for cache management)
```
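These variables would typically be read once at startup with the documented defaults applied. A sketch of such a loader (illustrative, not the project's actual config code; the `AppConfig` shape is an assumption):

```typescript
// Hypothetical config loader applying the documented defaults.
interface AppConfig {
  rateLimitRequests: number; // max requests per window
  rateLimitWindowMs: number; // window duration in ms
  cacheTtl: number;          // cache TTL in seconds
  adminKey?: string;         // only needed for cache-management endpoints
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    rateLimitRequests: Number(env.RATE_LIMIT_REQUESTS ?? 100),
    rateLimitWindowMs: Number(env.RATE_LIMIT_WINDOW_MS ?? 60000),
    cacheTtl: Number(env.CACHE_TTL ?? 3600),
    adminKey: env.ADMIN_KEY,
  };
}

// Usage: const config = loadConfig(process.env);
```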
## Development

```shell
pnpm dev           # Start development server
pnpm build         # Production build
pnpm start         # Start production server
pnpm test          # Run tests (163 tests)
pnpm test:coverage # Tests with coverage report
pnpm typecheck     # TypeScript validation
pnpm lint          # ESLint check
```
## Related Projects
| Project | Description |
|---|---|
| llms.txt Standard | Machine-readable documentation format |
| install.md Standard | LLM-executable installation format |
| Model Context Protocol | Protocol for AI tool integration |
| MCP Servers Directory | Community MCP servers |
## Sites with llms.txt Support

Browse 19+ curated websites with verified llms.txt support at llm.energy/directory.
Featured sites include:
- Anthropic Documentation
- Vercel Documentation
- Stripe API Reference
- Supabase Docs
- Mintlify Documentation
- ...and more!
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

Adding a site to the directory? Edit `src/data/sites.ts` and submit a PR.
## License

MIT License - see LICENSE for details.
## Links

Website • Documentation • GitHub • NPM
## Author

Built by nich ([@nichxbt](https://x.com/nichxbt))

Made with ⚡ for the AI community.