v0.1.1

mcp-datahub

io.github.txn2/mcp-datahub

MCP server for DataHub data catalogs. Discover datasets, explore lineage, and access metadata.


txn2/mcp-datahub


An MCP server and composable Go library that connects AI assistants to DataHub metadata catalogs. Search datasets, explore schemas, trace lineage, and access glossary terms and domains.


Two Ways to Use

1. Standalone MCP Server

Install and connect to Claude Desktop, Cursor, or any MCP client:

Claude Desktop (easiest): download the .mcpb bundle for your platform from the releases page and double-click it to install:

  • macOS Apple Silicon: mcp-datahub_X.X.X_darwin_arm64.mcpb
  • macOS Intel: mcp-datahub_X.X.X_darwin_amd64.mcpb
  • Windows: mcp-datahub_X.X.X_windows_amd64.mcpb

Other Installation Methods:

# Homebrew (macOS)
brew install txn2/tap/mcp-datahub

# Go install
go install github.com/txn2/mcp-datahub/cmd/mcp-datahub@latest

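Manual Claude Desktop Configuration (if not using MCPB). Set "command" to the path of your installed binary; /opt/homebrew/bin is Homebrew's install location on Apple Silicon Macs: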

{
  "mcpServers": {
    "datahub": {
      "command": "/opt/homebrew/bin/mcp-datahub",
      "env": {
        "DATAHUB_URL": "https://datahub.example.com",
        "DATAHUB_TOKEN": "your_token"
      }
    }
  }
}

Multi-Server Configuration

Connect to multiple DataHub instances simultaneously:

# Primary server
export DATAHUB_URL=https://prod.datahub.example.com/api/graphql
export DATAHUB_TOKEN=prod-token
export DATAHUB_CONNECTION_NAME=prod

# Additional servers (JSON)
export DATAHUB_ADDITIONAL_SERVERS='{"staging":{"url":"https://staging.datahub.example.com/api/graphql","token":"staging-token"}}'

Use datahub_list_connections to discover the available connections (prod and staging in this example), then pass a connection name as the connection parameter to any other tool.
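The DATAHUB_ADDITIONAL_SERVERS value is a JSON object keyed by connection name, where each entry holds a url and a token. The Go sketch below shows one way to decode and validate that shape. The names serverConfig and parseAdditionalServers are hypothetical; this is an illustration of the format, not mcp-datahub's own parsing code.

```go
// Hypothetical sketch: decoding the DATAHUB_ADDITIONAL_SERVERS format.
// Type and function names are illustrative, not the mcp-datahub API.
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// serverConfig mirrors one entry of the JSON map: {"url": ..., "token": ...}.
type serverConfig struct {
	URL   string `json:"url"`
	Token string `json:"token"`
}

// parseAdditionalServers decodes the map of connection name -> server config
// and rejects entries that have no URL.
func parseAdditionalServers(raw string) (map[string]serverConfig, error) {
	servers := map[string]serverConfig{}
	if raw == "" {
		return servers, nil // variable unset: no additional servers
	}
	if err := json.Unmarshal([]byte(raw), &servers); err != nil {
		return nil, fmt.Errorf("invalid DATAHUB_ADDITIONAL_SERVERS: %w", err)
	}
	for name, s := range servers {
		if s.URL == "" {
			return nil, fmt.Errorf("server %q: missing url", name)
		}
	}
	return servers, nil
}

func main() {
	raw := `{"staging":{"url":"https://staging.datahub.example.com/api/graphql","token":"staging-token"}}`
	servers, err := parseAdditionalServers(raw)
	if err != nil {
		panic(err)
	}
	// Sort connection names so the output order is deterministic.
	names := make([]string, 0, len(servers))
	for name := range servers {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s -> %s\n", name, servers[name].URL)
	}
}
```

Validating up front means a typo in the JSON fails at startup rather than on the first tool call that targets that connection.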

2. Composable Go Library

Import into your own MCP server for custom authentication, tenant isolation, and audit logging:

import (
    "log"

    "github.com/txn2/mcp-datahub/pkg/client"
    "github.com/txn2/mcp-datahub/pkg/tools"
)

// Create a DataHub client from the DATAHUB_* environment variables
datahubClient, err := client.NewFromEnv()
if err != nil {
    log.Fatalf("datahub client: %v", err)
}
defer datahubClient.Close()

// Register all DataHub tools with your MCP server
toolkit := tools.NewToolkit(datahubClient, tools.Config{})
toolkit.RegisterAll(yourMCPServer)

See the library documentation for middleware, selective tool registration, and enterprise patterns.

Combining with mcp-trino

Build a unified data platform MCP server by combining DataHub metadata with Trino query execution:

import (
    "log"

    datahubClient "github.com/txn2/mcp-datahub/pkg/client"
    datahubTools "github.com/txn2/mcp-datahub/pkg/tools"
    trinoClient "github.com/txn2/mcp-trino/pkg/client"
    trinoTools "github.com/txn2/mcp-trino/pkg/tools"
)

// Add DataHub tools (search, lineage, schema, glossary)
dh, err := datahubClient.NewFromEnv()
if err != nil {
    log.Fatalf("datahub client: %v", err)
}
defer dh.Close()
datahubTools.NewToolkit(dh, datahubTools.Config{}).RegisterAll(server)

// Add Trino tools (query execution, catalog browsing)
tr, err := trinoClient.NewFromEnv()
if err != nil {
    log.Fatalf("trino client: %v", err)
}
trinoTools.NewToolkit(tr, trinoTools.Config{}).RegisterAll(server)

// AI assistants can now:
// - Search DataHub for tables -> Get schema -> Query via Trino
// - Explore lineage -> Understand data flow -> Run validation queries

See txn2/mcp-trino for the companion library.

Available Tools

Tool                        Description
datahub_search              Search for datasets, dashboards, pipelines by query and entity type
datahub_get_entity          Get entity metadata by URN (description, owners, tags, domain)
datahub_get_schema          Get dataset schema with field types and descriptions
datahub_get_lineage         Get upstream/downstream data lineage
datahub_get_queries         Get SQL queries associated with a dataset
datahub_get_glossary_term   Get glossary term definition and properties
datahub_list_tags           List available tags in the catalog
datahub_list_domains        List data domains
datahub_list_data_products  List data products
datahub_get_data_product    Get data product details (owners, domain, properties)
datahub_list_connections    List configured DataHub server connections (multi-server mode)

See the tools reference for detailed documentation.
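MCP clients invoke these tools with a JSON-RPC 2.0 "tools/call" request whose params carry the tool name and its arguments. The Go sketch below builds such a request for datahub_search against the staging connection. The query argument name and the helper names (rpcRequest, buildToolCall) are assumptions for illustration; check the tools reference for each tool's actual parameters.

```go
// Hypothetical sketch: encoding an MCP tools/call request by hand.
// The "query" argument name is assumed; "connection" selects a configured server.
package main

import (
	"encoding/json"
	"fmt"
)

// toolCallParams is the params object of a tools/call request.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

// buildToolCall encodes a tools/call request for the named tool.
func buildToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	})
}

func main() {
	req, err := buildToolCall(1, "datahub_search", map[string]any{
		"query":      "orders",  // assumed argument name
		"connection": "staging", // connection name from DATAHUB_ADDITIONAL_SERVERS
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))
}
```

In practice an MCP client library builds this envelope for you; the sketch only makes the shape of a tool call, including the connection argument, concrete.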

Configuration

Variable                    Description                          Default
DATAHUB_URL                 DataHub GraphQL API URL              (required)
DATAHUB_TOKEN               API token                            (required)
DATAHUB_TIMEOUT             Request timeout (seconds)            30
DATAHUB_DEFAULT_LIMIT       Default search limit                 10
DATAHUB_MAX_LIMIT           Maximum limit                        100
DATAHUB_CONNECTION_NAME     Display name for primary connection  datahub
DATAHUB_ADDITIONAL_SERVERS  JSON map of additional servers       (optional)

See the configuration reference for all options.

Development

make build     # Build binary
make test      # Run tests with race detection
make lint      # Run golangci-lint
make security  # Run gosec and govulncheck
make coverage  # Generate coverage report
make verify    # Run tidy, lint, and test
make help      # Show all targets


Contributing

See CONTRIBUTING.md for guidelines.

License

Apache License 2.0


Open source by Craig Johnston, sponsored by Deasil Works, Inc.

MCPB bundles for v0.1.1:

  • macOS Apple Silicon: https://github.com/txn2/mcp-datahub/releases/download/v0.1.1/mcp-datahub_0.1.1_darwin_arm64.mcpb
  • macOS Intel: https://github.com/txn2/mcp-datahub/releases/download/v0.1.1/mcp-datahub_0.1.1_darwin_amd64.mcpb
  • Windows: https://github.com/txn2/mcp-datahub/releases/download/v0.1.1/mcp-datahub_0.1.1_windows_amd64.mcpb