Embedding Service
Chunks text, generates vector embeddings, and optionally translates content to English. Runs both an HTTP API for on-demand operations and a RabbitMQ worker for async processing.
- Tech: Python, FastAPI, SQLAlchemy, pgvector, RabbitMQ (aio-pika)
- Port: 4005
- Auth: JWT, API Key, Public
- Database: Shared document database (writes to chunks and embeddings tables)
Endpoints
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/v1/chunk | Public | Chunk text using a specified method |
| GET | /api/v1/chunk/methods | Public | List available chunking methods |
| POST | /api/v1/embed/single | Public | Generate embedding for a single text |
| POST | /api/v1/embed | Public | Batch embed multiple texts (max 100) |
| GET | /api/v1/embed/model | Public | Get current embedding model info |
| GET | /api/v1/embed/{document_id} | Public | Get embedding result status by document |
| POST | /api/v1/process | Public | Full pipeline: chunk + enrich + embed in one call |
| GET | /api/v1/translation/health | Public | Translation service status |
| POST | /api/v1/translation/translate | JWT | Translate single text to English |
| POST | /api/v1/translation/translate-batch | JWT | Translate multiple texts to English |
| GET | /health | Public | Health check |
| GET | /healthz | Public | Kubernetes liveness probe |
| GET | /readyz | Public | Kubernetes readiness probe (checks Azure KV, RabbitMQ) |
| GET | /metrics | Public | Prometheus metrics |
POST /api/v1/chunk
Chunks input text without generating embeddings.
Chunking Methods:
| Method | Description |
|---|---|
| langchain_recursive | Recursive character text splitter (LangChain) |
| langchain_text_splitter | Character-based text splitter (LangChain) |
| semchunk | Semantic chunking |
| fixed_size | Fixed character count chunks |
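As an illustration, the fixed_size method can be sketched as a simple character-window splitter (the function name, default sizes, and overlap behaviour here are assumptions for illustration, not the service's actual implementation):

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows with optional overlap."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # advance by this many characters per window
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Overlap keeps a little shared context between adjacent chunks, which tends to improve retrieval quality at the cost of some duplicated storage.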
POST /api/v1/embed/single
Generates a vector embedding for a single text string. Used by rag-service to embed queries.
Request:

```json
{
  "text": "What is kubernetes?"
}
```

Response:

```json
{
  "embedding": [0.0123, -0.0456, ...],
  "model": "text-embedding-3-small",
  "dimensions": 1024
}
```
POST /api/v1/process
Full pipeline endpoint. Takes parsed text, chunks it, optionally translates chunks, generates embeddings, and stores everything in the database.
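The pipeline's control flow can be sketched with stubbed chunking, translation, and embedding steps (the names and data shapes below are assumptions for illustration, not the service's actual code):

```python
from dataclasses import dataclass

@dataclass
class ProcessedChunk:
    text: str
    embedding: list[float]
    translated: bool = False

def process_document(text: str, chunker, embedder, translator=None) -> list[ProcessedChunk]:
    """Chunk -> optionally translate -> embed, mirroring POST /api/v1/process."""
    results = []
    for chunk in chunker(text):
        translated = False
        if translator is not None:
            chunk, translated = translator(chunk)
        results.append(ProcessedChunk(chunk, embedder(chunk), translated))
    return results
```

In the real service the embedder would call OpenAI/Azure OpenAI and the results would be persisted to the chunks and embeddings tables rather than returned in memory.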
Worker
Separate process (worker.py) that:
- Consumes from the embedding_jobs RabbitMQ queue
- Chunks the parsed text
- Optionally translates non-English chunks to English
- Generates embeddings via OpenAI/Azure OpenAI
- Stores chunks and embeddings in PostgreSQL (pgvector)
- Publishes results to the embedding_results queue
- Publishes progress to the embedding_progress queue
Features:
- DLQ for failed messages
- Retry with exponential backoff
- Configurable prefetch count
- Messages formatted for NestJS compatibility
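The retry-with-backoff and DLQ behaviour can be expressed as a small policy (the base delay, cap, and retry limit here are illustrative values, not the service's configuration):

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2^attempt seconds, capped. attempt is zero-based."""
    return min(base * (2 ** attempt), cap)

def should_dead_letter(attempt: int, max_retries: int = 5) -> bool:
    """After max_retries failed attempts, route the message to the DLQ instead of requeueing."""
    return attempt >= max_retries
```

The worker would apply this policy per message: on failure, requeue with the computed delay until the retry limit is hit, then let RabbitMQ's dead-letter exchange capture the message for inspection.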
Embedding Model
Uses OpenAI or Azure OpenAI embedding API. Model and dimensions configurable via environment variables. Default: text-embedding-3-small with 1024 dimensions.
Embeddings stored in PostgreSQL using the pgvector extension with HNSW index for fast cosine similarity search.
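pgvector's cosine distance operator (`<=>`), which the HNSW index accelerates, computes 1 minus the cosine similarity of two vectors. A pure-Python equivalent, for intuition:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as defined by pgvector's <=> operator: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

Identical directions give distance 0, orthogonal vectors give 1, so an `ORDER BY embedding <=> query_vector LIMIT k` query returns the k most similar chunks first.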
Inter-Service Communication
| Target | Protocol | Purpose |
|---|---|---|
| document-service | RabbitMQ | Publish embedding_results and embedding_progress |
| OpenAI / Azure OpenAI | HTTPS | Generate embeddings |
| PostgreSQL (shared document database) | SQL | Store chunks and embeddings |