Parser Overview

Before a document can be embedded or chunked, it must first go through a parsing stage. In our system, parsers read the uploaded document (PDF or Word format) and extract its textual and visual content into a structured format.

What is a Parser?

A parser in our platform is the first step in the document processing pipeline. It takes raw documents and converts them into a clean, machine-readable format, which enables downstream operations such as embedding, chunking, translation, and Q&A generation.

Supported Formats

PDF (.pdf)
Word Documents (.doc, .docx)
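
As a minimal sketch of how a client might check a file against the formats listed above before uploading it: the extensions come from this page, while the names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical and not part of the platform's API.

```python
from pathlib import Path

# Extensions taken from the Supported Formats list above.
# The constant and function names are illustrative assumptions.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx"}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension matches a supported format."""
    # Compare case-insensitively so "REPORT.PDF" is accepted as well.
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only inspects the file name; it says nothing about whether the file contents are valid.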

How It Works

Upload your document – drag & drop or select from your file system.
Choose a parser – select one of the available parsing strategies depending on your document type or goal.
The parser extracts content – the file is converted into structured text (and optionally metadata, tables, images, etc.).
Post-processing options – after parsing, you can:
- Generate embeddings
- Split content into chunks
- Translate to English
- Generate questions and answers
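
The flow above (parse first, then post-process) can be sketched roughly as follows. Everything here is a toy stand-in under stated assumptions: `ParsedDocument`, `toy_text_parser`, `split_into_chunks`, and `process` are hypothetical names, the "parser" simply decodes bytes as UTF-8, and chunking uses a fixed character count, which may differ from what the platform actually does.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ParsedDocument:
    """Hypothetical structured output of a parser: text plus optional metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def toy_text_parser(raw: bytes) -> ParsedDocument:
    # Stand-in for a real parser: decodes bytes as UTF-8.
    # Real PDF or Word parsing is far more involved.
    return ParsedDocument(text=raw.decode("utf-8", errors="replace"))


def split_into_chunks(doc: ParsedDocument, max_chars: int = 40) -> list[str]:
    # Assumed post-processing step: fixed-size character chunks.
    return [doc.text[i:i + max_chars] for i in range(0, len(doc.text), max_chars)]


def process(raw: bytes,
            parser: Callable[[bytes], ParsedDocument],
            max_chars: int = 40) -> list[str]:
    # Step 1: parse the raw upload into a structured document.
    doc = parser(raw)
    # Step 2: apply a post-processing option (here, chunking).
    return split_into_chunks(doc, max_chars)
```

The point of the sketch is the ordering: post-processing steps such as chunking operate on the parser's structured output, never on the raw file.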

Available Parsers

Parser	Description
Text	Basic raw text extraction from documents. Fast, minimal formatting.
Text & Image	Extracts text and image references for mixed content files.
Marker	Advanced parser that supports tables, forms, images, equations, and chunking-ready output.
Tika	Apache Tika-based parser for extracting text and metadata from a wide variety of file formats.
Semantic	Parser that focuses on semantic structure and logical separation of content.
Flex	Smart parser that auto-selects strategy based on file type and content complexity.
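
To make the table easier to apply, here is a rough sketch of choosing a parser by hand from a few document traits. The parser names come from the table above; the `Parser` enum, the `suggest_parser` function, and the rule order are illustrative assumptions only. This is not how the Flex parser selects a strategy, which this page does not describe.

```python
from enum import Enum


class Parser(Enum):
    """Parser names as listed in the table above."""
    TEXT = "Text"
    TEXT_AND_IMAGE = "Text & Image"
    MARKER = "Marker"
    TIKA = "Tika"
    SEMANTIC = "Semantic"
    FLEX = "Flex"


def suggest_parser(has_images: bool,
                   has_tables_or_equations: bool,
                   needs_metadata: bool) -> Parser:
    # Hypothetical rule of thumb derived from the table descriptions.
    if has_tables_or_equations:
        return Parser.MARKER          # supports tables, forms, equations
    if has_images:
        return Parser.TEXT_AND_IMAGE  # text plus image references
    if needs_metadata:
        return Parser.TIKA            # text and metadata extraction
    return Parser.TEXT                # fast, minimal formatting
```

For example, `suggest_parser(has_images=True, has_tables_or_equations=False, needs_metadata=False)` returns `Parser.TEXT_AND_IMAGE`.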
