
Search & Data Extraction
v1.0.1
active
source-library
io.github.Embassy-of-the-Free-Mind/source-library
Search rare historical texts with OCR, translations & DOI citations.
Documentation
Source Library v2
A Next.js application for digitizing and translating historical texts. Built for the Embassy of the Free Mind.
Stack
- Framework: Next.js 14 (App Router)
- Database: MongoDB Atlas
- AI: Google Gemini for OCR and translation
- Storage: Vercel Blob for images
- Deployment: Vercel
Getting Started
npm install
npm run dev
Architecture
Image System
All page images go through /api/image for consistent sizing and cropping:
| Tier | Size | Quality | Use Case |
|---|---|---|---|
| Thumbnail | 400px | 70% | Grid views, page navigation |
| Display | 1200px | 80% | Main reading view |
| Full | 2400px | 90% | Magnifier, fullscreen |
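The tier table can be sketched as a typed lookup. This is a minimal illustration, not the project's actual code: the names `ImageTier`, `IMAGE_TIERS`, and `buildImageUrl` are hypothetical, the query parameters `url`, `w`, and `q` are assumed rather than taken from the real `/api/image` route, and the size is assumed to mean pixel width.

```typescript
// Tier values come from the table above; everything else is illustrative.
type ImageTier = "thumbnail" | "display" | "full";

interface TierSpec {
  widthPx: number; // target size in pixels (assumed here to be the width)
  quality: number; // encoder quality as a percentage, 0-100
}

const IMAGE_TIERS: Record<ImageTier, TierSpec> = {
  thumbnail: { widthPx: 400, quality: 70 }, // grid views, page navigation
  display: { widthPx: 1200, quality: 80 }, // main reading view
  full: { widthPx: 2400, quality: 90 }, // magnifier, fullscreen
};

// Hypothetical helper: builds a request URL for /api/image.
// The parameter names (url, w, q) are assumptions, not the route's documented API.
function buildImageUrl(sourceUrl: string, tier: ImageTier): string {
  const spec = IMAGE_TIERS[tier];
  return (
    "/api/image?url=" + encodeURIComponent(sourceUrl) +
    "&w=" + spec.widthPx +
    "&q=" + spec.quality
  );
}
```

Keeping the tiers in one table means components ask for a tier by name instead of hard-coding pixel sizes, so a change to a tier only has to be made in one place.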
Split Pages
Books with two-page spreads can be split. Each page stores:
- crop.xStart and crop.xEnd (0-1000 scale)
- cropped_photo (optional pre-generated Vercel Blob URL)
Cropping happens on demand via Sharp. The OCR step crops the image inline automatically and saves the result for future use.
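To make the 0-1000 scale concrete, here is a small sketch that converts a stored crop into a pixel rectangle. It assumes that 0 is the left edge and 1000 the right edge of the full image, and that the crop spans the full height. The names `PageCrop`, `PixelRegion`, and `cropToPixelRegion` are hypothetical. The returned shape matches the `{ left, top, width, height }` object that Sharp's `extract()` accepts.

```typescript
// Crop as stored on a page: horizontal bounds on a 0-1000 scale.
interface PageCrop {
  xStart: number;
  xEnd: number;
}

// Pixel rectangle in the shape Sharp's extract() accepts.
interface PixelRegion {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Hypothetical helper: assumes the scale is relative to the full image width
// and that the crop always covers the full image height.
function cropToPixelRegion(
  crop: PageCrop,
  imageWidth: number,
  imageHeight: number
): PixelRegion {
  if (crop.xStart < 0 || crop.xEnd > 1000 || crop.xStart >= crop.xEnd) {
    throw new RangeError("crop must satisfy 0 <= xStart < xEnd <= 1000");
  }
  // Round both edges first so adjacent halves of a spread share a boundary.
  const left = Math.round((crop.xStart / 1000) * imageWidth);
  const right = Math.round((crop.xEnd / 1000) * imageWidth);
  return { left, top: 0, width: right - left, height: imageHeight };
}
```

For a 3000 x 2000 px spread, a crop of `{ xStart: 0, xEnd: 500 }` yields the left half, and `{ xStart: 500, xEnd: 1000 }` yields the right half.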
Processing Pipeline
- Import - Upload images or import from Internet Archive
- Split - Detect and split two-page spreads (ML or manual)
- OCR - Extract text using Gemini Vision
- Translate - Translate to English using Gemini
- Summarize - Generate summaries and key themes
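The stage order above can be modeled as an ordered list with a helper that picks the next stage to run. This is only a sketch: the stage identifiers and the `nextStage` function are hypothetical, and the real application may track progress differently (for example per page rather than per book).

```typescript
// Stage names mirror the list above; the identifiers themselves are assumptions.
const PIPELINE_STAGES = ["import", "split", "ocr", "translate", "summarize"] as const;
type PipelineStage = (typeof PIPELINE_STAGES)[number];

// Returns the first stage, in pipeline order, that has not been completed,
// or null when every stage is done.
function nextStage(completed: readonly PipelineStage[]): PipelineStage | null {
  for (const stage of PIPELINE_STAGES) {
    if (completed.indexOf(stage) === -1) {
      return stage;
    }
  }
  return null;
}
```

For example, `nextStage(["import", "split"])` returns `"ocr"`.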
Key Directories
src/
├── app/
│ ├── api/ # API routes
│ ├── book/ # Book pages (detail, read, pipeline)
│ └── page.tsx # Homepage
├── components/ # React components
└── lib/ # Utilities (mongodb, ai, types)
NPM
@source-library/mcp-server
Install Command
npm install @source-library/mcp-server