Document Service

Orchestrates the full document lifecycle: upload to blob storage, trigger parsing and embedding via RabbitMQ, track processing status, serve downloads, manage folders, and stream real-time status updates via SSE.

  • Tech: NestJS 11, TypeORM, PostgreSQL, RabbitMQ, Azure Blob/S3
  • Port: 4000
  • Auth: JWT, API Key, Public
  • Database: Document database shared with parser-service, embedding-service, and rag-service

Document Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/v1/documents/init | JWT | Initialize a new document record (PENDING_UPLOAD) |
| POST | /api/v1/documents/upload | JWT, API Key | Stream-upload file to blob storage |
| POST | /api/v1/documents/process | JWT | Trigger parsing + embedding pipeline |
| GET | /api/v1/documents/tree | JWT | Get full document/folder tree for current user |
| GET | /api/v1/documents/bulk?ids= | JWT, API Key | Get multiple documents by IDs |
| GET | /api/v1/documents/stream?documentIds= | JWT | SSE stream for real-time processing status |
| POST | /api/v1/documents/accessible | JWT | Filter document IDs to only accessible ones |
| GET | /api/v1/documents/:documentId | JWT | Get single document |
| GET | /api/v1/documents/:documentId/status | JWT | Get processing status |
| GET | /api/v1/documents/:documentId/sas-token | API Key | Generate SAS token for blob storage |
| GET | /api/v1/documents/:documentId/download | JWT | Download original file |
| GET | /api/v1/documents/:documentId/base64 | Public | Get document as base64 |
| PUT | /api/v1/documents/:documentId | JWT | Update document metadata |
| DELETE | /api/v1/documents/:documentId | JWT | Soft-delete |
| PUT | /api/v1/documents/bulk | JWT | Bulk update |
| DELETE | /api/v1/documents/bulk | JWT | Bulk soft-delete |

Document Status Flow

PENDING_UPLOAD --> UPLOADED --> PROCESSING --> PROCESSED
                                    |
                                    v
                                  FAILED
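
The flow can be encoded as a small transition table so invalid status changes are rejected early. This is only a sketch: the status names come from the diagram, while `ALLOWED_TRANSITIONS`, `canTransition`, and the assumption that FAILED is entered from PROCESSING are illustrative and not taken from the service's code.

```typescript
// Status names are taken from the flow diagram.
type DocumentStatus =
  | "PENDING_UPLOAD"
  | "UPLOADED"
  | "PROCESSING"
  | "PROCESSED"
  | "FAILED";

// Assumption: FAILED is reachable only from PROCESSING, and both
// PROCESSED and FAILED are terminal. The real service may differ.
const ALLOWED_TRANSITIONS: Record<DocumentStatus, readonly DocumentStatus[]> = {
  PENDING_UPLOAD: ["UPLOADED"],
  UPLOADED: ["PROCESSING"],
  PROCESSING: ["PROCESSED", "FAILED"],
  PROCESSED: [],
  FAILED: [],
};

// Returns true when moving from `from` to `to` follows the table above.
function canTransition(from: DocumentStatus, to: DocumentStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```

A caller would check `canTransition(current, next)` before persisting a status change.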

POST /api/v1/documents/process

Triggers the processing pipeline for an uploaded document:

  1. Publishes a job to the parsing_jobs RabbitMQ queue (or falls back to HTTP POST /parser/parse if RabbitMQ is unavailable)
  2. Parser service worker processes the document and publishes results to parsing_results
  3. On successful parsing, document-service triggers auto-summarization via completion-service
  4. Document-service publishes an embedding job to embedding_jobs
  5. Embedding service worker processes and publishes results to embedding_results
  6. Document status changes to PROCESSED
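
Step 1 (queue first, HTTP as a fallback) can be sketched as follows. The queue name `parsing_jobs` and the path `/parser/parse` come from the steps above; `ParsingJob`, `Publish`, `HttpPost`, and `dispatchParsingJob` are hypothetical names, and the transports are injected so the sketch does not assume any particular RabbitMQ or HTTP client.

```typescript
// Hypothetical job shape; the real payload also carries file content and options.
interface ParsingJob {
  documentId: string;
  parserMethod: string;
}

// Injected transports (assumptions): one publishes to a queue, one POSTs over HTTP.
type Publish = (queue: string, job: ParsingJob) => Promise<void>;
type HttpPost = (path: string, job: ParsingJob) => Promise<void>;

// Tries RabbitMQ first and falls back to HTTP if publishing throws.
// Simplification: any publish error is treated as "RabbitMQ unavailable".
async function dispatchParsingJob(
  job: ParsingJob,
  publish: Publish,
  httpPost: HttpPost,
): Promise<"rabbitmq" | "http"> {
  try {
    await publish("parsing_jobs", job);
    return "rabbitmq";
  } catch {
    await httpPost("/parser/parse", job);
    return "http";
  }
}
```

Returning the transport that was used makes it easy to log or count how often the fallback path is taken.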

GET /api/v1/documents/stream

Server-Sent Events (SSE) endpoint. The client subscribes by passing document IDs in the documentIds query parameter and receives real-time updates as parsing and embedding progress.
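
Because the endpoint requires a JWT, a client may need to read the stream with `fetch` and an `Authorization` header rather than the browser `EventSource` API, which cannot set custom headers. That means parsing SSE frames by hand. The sketch below shows only the parsing step; `parseSseChunk` is a hypothetical helper, and the shape of each event's payload is not specified here, so the data is returned as raw strings.

```typescript
interface SseParseResult {
  events: string[]; // data payloads of complete events, in order
  rest: string;     // trailing partial frame to prepend to the next chunk
}

// Splits buffered SSE text into complete events (frames end with a blank line).
// Only `data:` fields are kept; `event:`, `id:`, and comment lines are ignored.
function parseSseChunk(buffer: string): SseParseResult {
  const normalized = buffer.replace(/\r\n/g, "\n");
  const frames = normalized.split("\n\n");
  // The last element is either empty or an incomplete frame.
  const rest = frames.pop() ?? "";
  const events = frames
    .map((frame) =>
      frame
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        // Drop the field name and the single optional space after the colon.
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n"),
    )
    .filter((data) => data.length > 0);
  return { events, rest };
}
```

A reader loop would append each decoded chunk to `rest`, call `parseSseChunk`, handle the returned `events`, and carry the new `rest` forward.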

Folder Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/v1/folders | JWT | Create folder |
| GET | /api/v1/folders | JWT | List folders (filter by parentId, folderType) |
| GET | /api/v1/folders/roots | JWT | Get root-level folders |
| GET | /api/v1/folders/bulk?ids= | -- | Get folders by IDs |
| POST | /api/v1/folders/bulk | JWT | Bulk create |
| DELETE | /api/v1/folders/bulk | -- | Bulk soft-delete |
| DELETE | /api/v1/folders/bulk/hard | -- | Bulk hard-delete |
| PUT | /api/v1/folders/bulk | JWT | Bulk update |
| GET | /api/v1/folders/:id | -- | Get folder |
| GET | /api/v1/folders/:parentId/children | -- | Get folder children |
| PUT | /api/v1/folders/:id | -- | Update folder |
| DELETE | /api/v1/folders/:id | -- | Soft-delete |

Folder types: document, agent, default.

Chunk Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /api/v1/chunks/:documentId/content-at-index/:chunkIndex | JWT | Get specific chunk content |

Export Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/v1/export | -- | Export document as PDF or DOCX (returns binary stream) |

Uses Puppeteer for PDF generation and the docx library for DOCX generation.

Parsing Technique Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /api/v1/parsing-techniques | JWT | List all techniques |
| GET | /api/v1/parsing-techniques/enabled | JWT | List enabled only |
| GET | /api/v1/parsing-techniques/:id | JWT | Get by ID |
| POST | /api/v1/parsing-techniques/by-ids | JWT | Get by IDs |

RabbitMQ (Producer and Consumer)

Produces

| Queue | Trigger | Payload |
|---|---|---|
| parsing_jobs | POST /documents/process | Document ID, file content, parser method, options |
| embedding_jobs | After parsing completes | Document ID, parsed text, chunking config |

Consumes

| Queue | Handler | Action |
|---|---|---|
| {prefix}-document-upload | | Processes document upload messages |
| parsing_results | | Updates document, stores chunks, triggers embedding |
| parsing_progress | | Updates SSE stream |
| embedding_results | | Marks document as PROCESSED |
| embedding_progress | | Updates SSE stream |
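
One way to picture the consumer side is a routing table from queue name to handler. The queue names below are the ones listed above; `buildHandlers`, `route`, and the handler bodies are hypothetical placeholders that only return a description of the action, since the actual handlers are not documented here.

```typescript
// Placeholder handler type: takes a message payload, returns a description
// of the action it would perform. Real handlers would do the work instead.
type ResultHandler = (payload: unknown) => string;

// Builds the queue-to-handler table. `prefix` fills in the {prefix} part
// of the upload queue name; its actual value is deployment-specific.
function buildHandlers(prefix: string): Map<string, ResultHandler> {
  return new Map<string, ResultHandler>([
    [`${prefix}-document-upload`, () => "process document upload message"],
    ["parsing_results", () => "update document, store chunks, trigger embedding"],
    ["parsing_progress", () => "update SSE stream"],
    ["embedding_results", () => "mark document as PROCESSED"],
    ["embedding_progress", () => "update SSE stream"],
  ]);
}

// Dispatches a message to the handler registered for its queue.
function route(
  handlers: Map<string, ResultHandler>,
  queue: string,
  payload: unknown,
): string {
  const handler = handlers.get(queue);
  if (!handler) {
    throw new Error(`No handler registered for queue "${queue}"`);
  }
  return handler(payload);
}
```

Failing loudly on an unknown queue is a design choice for the sketch; a production consumer might instead reject or dead-letter the message.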

Inter-Service Communication

| Target | Protocol | Purpose |
|---|---|---|
| parser-service | RabbitMQ (primary), HTTP (fallback) | Document parsing |
| embedding-service | RabbitMQ | Embedding generation |
| completion-service | HTTP | Document summarization |
| user-service | HTTP | Access control (shared-with-me, bulk users, check-access) |
| admin-base-ms | HTTP | Org parsing technique settings |