
mcp-datahub
io.github.txn2/mcp-datahub
MCP server for DataHub data catalogs. Discover datasets, explore lineage, and access metadata.
txn2/mcp-datahub
An MCP server and composable Go library that connects AI assistants to DataHub metadata catalogs. Search datasets, explore schemas, trace lineage, and access glossary terms and domains.
Two Ways to Use
1. Standalone MCP Server
Install and connect to Claude Desktop, Cursor, or any MCP client:
Claude Desktop (Easiest) - Download the .mcpb bundle from releases and double-click to install:
- macOS Apple Silicon: mcp-datahub_X.X.X_darwin_arm64.mcpb
- macOS Intel: mcp-datahub_X.X.X_darwin_amd64.mcpb
- Windows: mcp-datahub_X.X.X_windows_amd64.mcpb
Other Installation Methods:
# Homebrew (macOS)
brew install txn2/tap/mcp-datahub
# Go install
go install github.com/txn2/mcp-datahub/cmd/mcp-datahub@latest
Manual Claude Desktop Configuration (if not using MCPB):
{
  "mcpServers": {
    "datahub": {
      "command": "/opt/homebrew/bin/mcp-datahub",
      "env": {
        "DATAHUB_URL": "https://datahub.example.com",
        "DATAHUB_TOKEN": "your_token"
      }
    }
  }
}
Multi-Server Configuration
Connect to multiple DataHub instances simultaneously:
# Primary server
export DATAHUB_URL=https://prod.datahub.example.com/api/graphql
export DATAHUB_TOKEN=prod-token
export DATAHUB_CONNECTION_NAME=prod
# Additional servers (JSON)
export DATAHUB_ADDITIONAL_SERVERS='{"staging":{"url":"https://staging.datahub.example.com/api/graphql","token":"staging-token"}}'
Use datahub_list_connections to discover available connections, then pass the connection parameter to any tool.
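To make the shape of DATAHUB_ADDITIONAL_SERVERS concrete, here is a minimal Go sketch that decodes a value like the one above and lists the resulting connection names. It is not the server's actual parsing code: the ServerConfig type and both helper functions are hypothetical, and only the "url" and "token" keys come from the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// ServerConfig is a hypothetical type for this sketch. Only the "url" and
// "token" keys are taken from the README example.
type ServerConfig struct {
	URL   string `json:"url"`
	Token string `json:"token"`
}

// parseAdditionalServers decodes a JSON object keyed by connection name.
// An empty string is treated as "no additional servers".
func parseAdditionalServers(raw string) (map[string]ServerConfig, error) {
	servers := map[string]ServerConfig{}
	if raw == "" {
		return servers, nil
	}
	if err := json.Unmarshal([]byte(raw), &servers); err != nil {
		return nil, fmt.Errorf("parse DATAHUB_ADDITIONAL_SERVERS: %w", err)
	}
	for name, s := range servers {
		if s.URL == "" {
			return nil, fmt.Errorf("connection %q has no url", name)
		}
	}
	return servers, nil
}

// connectionNames returns the primary connection name followed by the
// additional names in sorted order.
func connectionNames(primary string, servers map[string]ServerConfig) []string {
	names := make([]string, 0, len(servers))
	for name := range servers {
		names = append(names, name)
	}
	sort.Strings(names)
	return append([]string{primary}, names...)
}

func main() {
	raw := `{"staging":{"url":"https://staging.datahub.example.com/api/graphql","token":"staging-token"}}`
	servers, err := parseAdditionalServers(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(connectionNames("prod", servers)) // prints [prod staging]
}
```

Validating the JSON before starting the server is a cheap way to catch quoting mistakes in the shell export.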
2. Composable Go Library
Import into your own MCP server for custom authentication, tenant isolation, and audit logging:
import (
	"log"

	"github.com/txn2/mcp-datahub/pkg/client"
	"github.com/txn2/mcp-datahub/pkg/tools"
)
// Create a client from the DATAHUB_* environment variables
datahubClient, err := client.NewFromEnv()
if err != nil {
	log.Fatalf("create DataHub client: %v", err)
}
defer datahubClient.Close()
// Register all DataHub tools with your MCP server
toolkit := tools.NewToolkit(datahubClient, tools.Config{})
toolkit.RegisterAll(yourMCPServer)
See the library documentation for middleware, selective tool registration, and enterprise patterns.
Combining with mcp-trino
Build a unified data platform MCP server by combining DataHub metadata with Trino query execution:
import (
	"log"

	datahubClient "github.com/txn2/mcp-datahub/pkg/client"
	datahubTools "github.com/txn2/mcp-datahub/pkg/tools"
	trinoClient "github.com/txn2/mcp-trino/pkg/client"
	trinoTools "github.com/txn2/mcp-trino/pkg/tools"
)
// Add DataHub tools (search, lineage, schema, glossary)
dh, err := datahubClient.NewFromEnv()
if err != nil {
	log.Fatalf("create DataHub client: %v", err)
}
datahubTools.NewToolkit(dh, datahubTools.Config{}).RegisterAll(server)
// Add Trino tools (query execution, catalog browsing)
tr, err := trinoClient.NewFromEnv()
if err != nil {
	log.Fatalf("create Trino client: %v", err)
}
trinoTools.NewToolkit(tr, trinoTools.Config{}).RegisterAll(server)
// AI assistants can now:
// - Search DataHub for tables -> Get schema -> Query via Trino
// - Explore lineage -> Understand data flow -> Run validation queries
See txn2/mcp-trino for the companion library.
Available Tools
| Tool | Description |
|---|---|
| datahub_search | Search for datasets, dashboards, pipelines by query and entity type |
| datahub_get_entity | Get entity metadata by URN (description, owners, tags, domain) |
| datahub_get_schema | Get dataset schema with field types and descriptions |
| datahub_get_lineage | Get upstream/downstream data lineage |
| datahub_get_queries | Get SQL queries associated with a dataset |
| datahub_get_glossary_term | Get glossary term definition and properties |
| datahub_list_tags | List available tags in the catalog |
| datahub_list_domains | List data domains |
| datahub_list_data_products | List data products |
| datahub_get_data_product | Get data product details (owners, domain, properties) |
| datahub_list_connections | List configured DataHub server connections (multi-server mode) |
See the tools reference for detailed documentation.
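Several of these tools identify entities by URN. As a rough illustration, the sketch below builds and splits a dataset URN in the form DataHub commonly uses, `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. The helpers datasetURN and parseDatasetURN are hypothetical names for this example and are not part of mcp-datahub; other entity types use different URN shapes that this sketch does not cover.

```go
package main

import (
	"fmt"
	"strings"
)

// datasetURN builds a dataset URN from a platform (e.g. "hive"), a dataset
// name, and an environment (e.g. "PROD"). Hypothetical helper for this sketch.
func datasetURN(platform, name, env string) string {
	return fmt.Sprintf("urn:li:dataset:(urn:li:dataPlatform:%s,%s,%s)", platform, name, env)
}

// parseDatasetURN splits a dataset URN into platform, name, and environment.
// The platform ends at the first comma and the environment starts after the
// last comma, so commas inside the dataset name are preserved.
func parseDatasetURN(urn string) (platform, name, env string, err error) {
	const prefix = "urn:li:dataset:(urn:li:dataPlatform:"
	if !strings.HasPrefix(urn, prefix) || !strings.HasSuffix(urn, ")") {
		return "", "", "", fmt.Errorf("not a dataset URN: %q", urn)
	}
	body := strings.TrimSuffix(strings.TrimPrefix(urn, prefix), ")")
	first := strings.Index(body, ",")
	last := strings.LastIndex(body, ",")
	if first < 0 || first == last {
		return "", "", "", fmt.Errorf("malformed dataset URN: %q", urn)
	}
	return body[:first], body[first+1 : last], body[last+1:], nil
}

func main() {
	urn := datasetURN("hive", "SampleHiveDataset", "PROD")
	fmt.Println(urn)

	platform, name, env, err := parseDatasetURN(urn)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(platform, name, env)
}
```

In practice an assistant would usually obtain URNs from datahub_search results rather than constructing them by hand.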
Configuration
| Variable | Description | Default |
|---|---|---|
| DATAHUB_URL | DataHub GraphQL API URL | (required) |
| DATAHUB_TOKEN | API token | (required) |
| DATAHUB_TIMEOUT | Request timeout (seconds) | 30 |
| DATAHUB_DEFAULT_LIMIT | Default search limit | 10 |
| DATAHUB_MAX_LIMIT | Maximum limit | 100 |
| DATAHUB_CONNECTION_NAME | Display name for primary connection | datahub |
| DATAHUB_ADDITIONAL_SERVERS | JSON map of additional servers | (optional) |
See configuration reference for all options.
Development
make build # Build binary
make test # Run tests with race detection
make lint # Run golangci-lint
make security # Run gosec and govulncheck
make coverage # Generate coverage report
make verify # Run tidy, lint, and test
make help # Show all targets
Related Projects
- txn2/mcp-trino (docs) - Composable MCP toolkit for Trino query execution
- DataHub - The open-source metadata platform
Contributing
See CONTRIBUTING.md for guidelines.
License
Open source by Craig Johnston, sponsored by Deasil Works, Inc.