Embedding Service

Chunks text, generates vector embeddings, and optionally translates content to English. Runs both an HTTP API for on-demand operations and a RabbitMQ worker for async processing.

  • Tech: Python, FastAPI, SQLAlchemy, pgvector, RabbitMQ (aio-pika)
  • Port: 4005
  • Auth: JWT, API Key, Public
  • Database: Shared document database (writes to chunks and embeddings tables)

Endpoints

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| POST | /api/v1/chunk | Public | Chunk text using a specified method |
| GET | /api/v1/chunk/methods | Public | List available chunking methods |
| POST | /api/v1/embed/single | Public | Generate an embedding for a single text |
| POST | /api/v1/embed | Public | Batch-embed multiple texts (max 100) |
| GET | /api/v1/embed/model | Public | Get current embedding model info |
| GET | /api/v1/embed/{document_id} | Public | Get embedding result status by document |
| POST | /api/v1/process | Public | Full pipeline: chunk + enrich + embed in one call |
| GET | /api/v1/translation/health | Public | Translation service status |
| POST | /api/v1/translation/translate | JWT | Translate a single text to English |
| POST | /api/v1/translation/translate-batch | JWT | Translate multiple texts to English |
| GET | /health | Public | Health check |
| GET | /healthz | Public | Kubernetes liveness probe |
| GET | /readyz | Public | Kubernetes readiness probe (checks Azure KV, RabbitMQ) |
| GET | /metrics | Public | Prometheus metrics |

POST /api/v1/chunk

Chunks input text without generating embeddings.

Chunking Methods:

| Method | Description |
| --- | --- |
| langchain_recursive | Recursive character text splitter (LangChain) |
| langchain_text_splitter | Character-based text splitter (LangChain) |
| semchunk | Semantic chunking |
| fixed_size | Fixed character-count chunks |
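
For illustration, a minimal sketch of what the fixed_size method does, assuming plain character-count splitting with a configurable overlap. The function name, defaults, and overlap behavior are hypothetical, not the service's actual implementation:

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Hypothetical sketch of character-count chunking with overlap;
    # the service's real fixed_size method may differ in details.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```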

POST /api/v1/embed/single

Generates a vector embedding for a single text string. Used by rag-service to embed queries.

Request:

{
  "text": "What is kubernetes?"
}

Response:

{
  "embedding": [0.0123, -0.0456, ...],
  "model": "text-embedding-3-small",
  "dimensions": 1024
}

POST /api/v1/process

Full pipeline endpoint. Takes parsed text, chunks it, optionally translates chunks, generates embeddings, and stores everything in the database.

Worker

Separate process (worker.py) that:

  1. Consumes from embedding_jobs RabbitMQ queue
  2. Chunks the parsed text
  3. Optionally translates non-English chunks to English
  4. Generates embeddings via OpenAI/Azure OpenAI
  5. Stores chunks and embeddings in PostgreSQL (pgvector)
  6. Publishes results to embedding_results queue
  7. Publishes progress to embedding_progress queue
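
The per-message handling above could be sketched as one function that turns a job payload into a result payload plus progress updates. The field names (documentId, chunks, status, percent) are assumptions, not the service's actual message schema; the real worker consumes jobs via aio-pika and publishes these payloads back to RabbitMQ:

```python
import json

def handle_job(raw: bytes, embed) -> tuple[dict, list[dict]]:
    # Hypothetical sketch of the worker's per-message handling.
    # `embed` stands in for the OpenAI/Azure OpenAI embedding call.
    job = json.loads(raw)
    chunks = job["chunks"]
    # One progress message per chunk, ending at 100%.
    progress = [
        {"documentId": job["documentId"],
         "percent": round(100 * (i + 1) / len(chunks))}
        for i in range(len(chunks))
    ]
    result = {
        "documentId": job["documentId"],
        "status": "completed",
        "embeddings": embed(chunks),
    }
    return result, progress
```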

Features:

  • DLQ for failed messages
  • Retry with exponential backoff
  • Configurable prefetch count
  • Messages formatted for NestJS compatibility
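
The retry behavior can be sketched as a capped exponential delay; the base and cap values here are illustrative, not the worker's actual configuration:

```python
def retry_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    # Hypothetical exponential backoff: 1s, 2s, 4s, ... capped at 60s.
    return min(cap, base * (2 ** attempt))
```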

Embedding Model

Uses OpenAI or Azure OpenAI embedding API. Model and dimensions configurable via environment variables. Default: text-embedding-3-small with 1024 dimensions.
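
The relevant environment variables might look like the following; the variable names are assumptions, so check the service's actual configuration:

```
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1024
```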

Embeddings stored in PostgreSQL using the pgvector extension with HNSW index for fast cosine similarity search.
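
Cosine similarity, the metric the HNSW index accelerates, reduces to a normalized dot product. A plain-Python sketch for reference (pgvector's `<=>` operator returns cosine *distance*, i.e. 1 minus this value):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product of a and b divided by the product of their magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```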

Inter-Service Communication

| Target | Protocol | Purpose |
| --- | --- | --- |
| document-service | RabbitMQ | Publish embedding_results and embedding_progress |
| OpenAI / Azure OpenAI | HTTPS | Generate embeddings |
| PostgreSQL (shared document database) | SQL | Store chunks and embeddings |