AI Tools
v1.15.0
active

arjunkmrm-scrapermcp_el

ai.smithery/arjunkmrm-scrapermcp_el

Extract and parse web pages into clean HTML, links, or Markdown. Handle dynamic, complex, or block…


Thordata

Thordata MCP Server

Built on a proxy network spanning 195+ countries, Thordata MCP breaks through web data barriers and delivers clean, structured, real-time data to AI models.



📖 Overview

The ScraperMCP server bridges AI and the web: it provides one-click access to websites worldwide, renders JavaScript in real time, bypasses anti-crawling mechanisms, and outputs AI-ready structured data.

🛠️ MCP Tools

Thordata MCP acquires data through two channels, the unlocker and regular proxies, and returns results in Markdown, HTML, or Links format.

Web Scraper API Tool

Thordata MCP provides the parse_with_ai_selectors tool, which uses the Thordata Web Scraper API to scrape websites intelligently.

✅ Prerequisites

Before deployment, make sure you have:

  • A Thordata Web Scraper API account: visit Thordata to obtain your username and password.

📦 Configuration

Environment Variables

The Thordata MCP server supports the following environment variables:

| Name                    | Description            | Default Value |
|-------------------------|------------------------|---------------|
| UNLOCKER_PROXY_LOGIN    | Unlocker username      |               |
| UNLOCKER_PROXY_PASSWORD | Unlocker password      |               |
| UNLOCKER_PROXY_URL      | Unlocker proxy address |               |
| DEFAULT_PROXY_LOGIN     | Regular proxy username |               |
| DEFAULT_PROXY_PASSWORD  | Regular proxy password |               |
| DEFAULT_PROXY_URL       | Regular proxy address  |               |
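
As an illustration of how these variables might be combined, the sketch below builds one proxy URL per channel. It is not taken from Scraper.py: the function name `build_proxy_url` is hypothetical, and it assumes that each `*_PROXY_URL` value is either a bare `host:port` or a full URL, which the table above does not specify.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def build_proxy_url(prefix: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return 'scheme://user:pass@host' for one channel, or None if it is not configured.

    prefix is "UNLOCKER" or "DEFAULT", matching the variable names in the table.
    Assumption: <prefix>_PROXY_URL is a bare 'host:port' or a full URL.
    """
    login = env.get(f"{prefix}_PROXY_LOGIN")
    password = env.get(f"{prefix}_PROXY_PASSWORD")
    address = env.get(f"{prefix}_PROXY_URL")
    if not (login and password and address):
        return None

    # Split off an explicit scheme if present; otherwise assume plain HTTP.
    scheme, sep, host = address.partition("://")
    if not sep:
        scheme, host = "http", address

    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    return f"{scheme}://{quote(login, safe='')}:{quote(password, safe='')}@{host}"


if __name__ == "__main__":
    for channel in ("UNLOCKER", "DEFAULT"):
        url = build_proxy_url(channel)
        print(channel, "configured" if url else "not configured")
```

Returning `None` for a partially configured channel lets a caller fall back to the other channel instead of failing at request time.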

Configuration with uv

  • Install the uv package manager:

    # macOS and Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    

    Or:

    # Windows
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    
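  • Use the following configuration, replacing <absolute folder path> with the absolute path of the folder that contains Scraper.py (for example, D:\\ScraperMcp; backslashes must be doubled in JSON):

    {
      "mcpServers": {
        "Scraper": {
          "command": "uv",
          "args": [
            "--directory",
            "<absolute folder path>",
            "run",
            "Scraper.py"
          ]
        }
      }
    }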
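
Hand-editing JSON makes it easy to drop a brace or forget to escape a Windows path. As a minimal sketch, the helper below merges the server entry into an existing configuration dictionary and lets `json.dumps` handle the escaping. The function name `add_scraper_server` and the example path are hypothetical; only the `command` and `args` values come from the configuration above.

```python
import json


def add_scraper_server(config: dict, directory: str, name: str = "Scraper") -> dict:
    """Return a copy of config with the uv-based server entry added under mcpServers."""
    servers = dict(config.get("mcpServers", {}))  # keep any servers already configured
    servers[name] = {
        "command": "uv",
        "args": ["--directory", directory, "run", "Scraper.py"],
    }
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    # Hypothetical location; use the folder that contains Scraper.py.
    merged = add_scraper_server({}, r"D:\ScraperMcp")
    # json.dumps doubles the backslash, producing valid JSON for Windows paths.
    print(json.dumps(merged, indent=2))
```

The printed output can be pasted into the client's configuration file, or the function can be applied to the parsed contents of an existing file before writing it back.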


### Startup Command

    fastmcp run Scraper.py:mcp

### 🖥️ Manual Setup Guide

#### Claude Desktop Configuration
1. Open the Claude Desktop application
2. Navigate to **Settings → Developer → Edit Configuration**
3. Add the above configuration to the `claude_desktop_config.json` file

#### Cursor AI Configuration
1. Open the Cursor editor
2. Navigate to **Settings → Cursor Settings → MCP**
3. Click **Add New Global MCP Server**
4. Enter the server parameters from the configuration above

#### Cline Configuration
1. Open Cline settings
2. Navigate to **MCP Server Settings → Installed**
3. Click **Configure MCP Server**
4. Enter the server parameters from the configuration above


## 🛡️ License

Distributed as open source under the MIT License; see the [LICENSE](LICENSE) file for details.

---

## About Thordata

Thordata is a market-leading web intelligence collection platform that adheres to high standards of business ethics and compliance and helps enterprises worldwide uncover data-driven business insights.

<div align="center">
<sub>
Made by <a href="https://www.thordata.com/">Thordata</a>. If this MCP server saves you time, please consider giving it a ⭐.
</sub>
</div>

## ✨ Core Features

<details>
<summary><strong>Global Website Content Scraping</strong></summary>
<br>

- Supports data extraction from any URL, including complex single-page applications
- Complete JavaScript rendering capability, ensuring perfect presentation of dynamic content
- Flexible rendering mode selection: full JS rendering, pure HTML, or no rendering

</details>

<details>
<summary><strong>Intelligent AI Data Preprocessing</strong></summary>
<br>

- Automated HTML cleaning and conversion to highly readable Markdown
- Intelligent extraction of valid and usable links, optimizing data structure
- Native HTML format support, maintaining data integrity

</details>
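
The link-extraction step can be pictured with a short standard-library sketch. This is an illustration of the general technique only, not Thordata's implementation: the names `LinkCollector` and `extract_links` are hypothetical, and treating a link as "valid and usable" when it resolves to an `http` or `https` URL is an assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect unique absolute http(s) links from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Skip non-web schemes (mailto:, javascript:, ...) and duplicates.
        if urlparse(absolute).scheme in ("http", "https") and absolute not in self.links:
            self.links.append(absolute)


def extract_links(html: str, base_url: str) -> List[str]:
    """Return the links found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    page = '<a href="/docs">Docs</a> <a href="mailto:hi@example.com">Mail</a>'
    print(extract_links(page, "https://example.com/index.html"))
```

Resolving against the page URL before filtering means relative links are kept, while `mailto:` and similar schemes are dropped.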

<details>
<summary><strong>Global Network Barrier-Free Access</strong></summary>
<br>

- Efficiently bypasses complex anti-crawling protection systems
- Stable scraping of high-difficulty website content
- Automatic rotation across an IP pool spanning 195+ countries, bypassing geographic restrictions

</details>

<details>
<summary><strong>Cross-Platform Flexible Deployment</strong></summary>
<br>

- Customizable rendering and parsing parameter configuration
- Seamless integration with AI models and analysis tools
- Full support for macOS, Windows, and Linux systems

</details>

---

## Why Choose Thordata MCP?&nbsp;🕸️ ➜ 📦 ➜ 🤖

Just tell the LLM *"Summarize the latest discussions about MCP on Hacker News"* and get precise answers immediately.  
MCP (Model Context Protocol) handles all the tedious steps for you:

| Thordata MCP Core Value                                           | Benefits for You                           |
|-------------------------------------------------------------------|-------------------------------------------|
| **Thordata global proxy network intelligently bypasses anti-bot detection** | Ensures access availability and identity anonymity |
| **One-click data acquisition solution**                           | Easily handles complex single-page applications |
| **Multi-format output support (Markdown/HTML/Links)**             | Precisely matches your data requirements |