
Nougat Processor Service (Python)

This document describes the FastAPI application serving as a microservice for converting PDF documents to Markdown using Nougat OCR.

Overview

The application exposes an HTTP API that runs Nougat on an uploaded PDF file and returns the extracted text as Markdown. Other services in the ChainAlign ecosystem can call it whenever they need text extracted from a PDF document.

Key Functionality:

  • Receiving a PDF file.
  • Processing the PDF file with Nougat.
  • Returning the extracted Markdown text.
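The three steps above can be sketched as one framework-agnostic function. This is a minimal sketch and not the service's actual code. `convert_pdf_to_markdown` and the injected `run_nougat` callable are hypothetical stand-ins for however the service really invokes Nougat. The `markdown` response key is also an assumption, because this page does not specify the response schema.

```python
import tempfile
from pathlib import Path
from typing import Callable


def convert_pdf_to_markdown(pdf_bytes: bytes, run_nougat: Callable[[Path], str]) -> dict:
    """Receive PDF bytes, process them with Nougat, and return the Markdown.

    `run_nougat` is a hypothetical callable that takes the path of a PDF on
    disk and returns the extracted Markdown text.
    """
    # Step 1: receive the PDF and write it to a temporary file, because the
    # converter is assumed to read its input from disk.
    with tempfile.TemporaryDirectory() as tmp_dir:
        pdf_path = Path(tmp_dir) / "input.pdf"
        pdf_path.write_bytes(pdf_bytes)
        # Step 2: process the file with Nougat (delegated to the injected callable).
        markdown = run_nougat(pdf_path)
    # The temporary directory and the PDF inside it are deleted at this point.
    # Step 3: return the Markdown in a JSON-serializable dict ("markdown" key is assumed).
    return {"markdown": markdown}
```

Injecting the converter keeps the pipeline testable without loading the Nougat model. A stub such as `lambda path: "# Title"` is enough to exercise it.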

Technology Stack:

  • FastAPI: For creating the web server and API endpoints.
  • Nougat-OCR: The core library for PDF processing.
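The stack lists Nougat-OCR as the core library but does not say how the service calls it. One common option is the `nougat` command-line tool installed by the `nougat-ocr` package. It is invoked as `nougat path/to/file.pdf -o output_directory` and writes one Mathpix Markdown (`.mmd`) file per input PDF. The sketch below assumes that CLI route. The helper names are hypothetical, and the service may use Nougat's Python API instead.

```python
import subprocess
from pathlib import Path


def build_nougat_command(pdf_path: Path, out_dir: Path) -> list[str]:
    # Mirrors the CLI form: nougat path/to/file.pdf -o output_directory
    return ["nougat", str(pdf_path), "-o", str(out_dir)]


def read_nougat_output(pdf_path: Path, out_dir: Path) -> str:
    # Nougat names its output after the input file, with an .mmd suffix.
    return (out_dir / pdf_path.with_suffix(".mmd").name).read_text(encoding="utf-8")


def run_nougat_cli(pdf_path: Path, out_dir: Path) -> str:
    # check=True raises CalledProcessError on a non-zero exit code, which the
    # API layer can translate into a 500 Internal Server Error.
    subprocess.run(build_nougat_command(pdf_path, out_dir), check=True, capture_output=True)
    return read_nougat_output(pdf_path, out_dir)
```

Splitting command construction from output reading lets both be tested without a GPU or the Nougat model installed. Only `run_nougat_cli` actually launches the tool.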

API Endpoints

POST /process-pdf

Processes a PDF file and returns the extracted Markdown text.

Request Body:

A PDF file.
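Callers upload the PDF in the request body. The following client-side sketch uses only the standard library and makes several assumptions that this page does not confirm. It assumes the endpoint accepts a `multipart/form-data` upload in a form field named `file`, and that the service listens on the hypothetical base URL `http://localhost:8000`. `build_process_pdf_request` is also a hypothetical helper name.

```python
import urllib.request
import uuid


def build_process_pdf_request(pdf_bytes: bytes, filename: str,
                              base_url: str = "http://localhost:8000",
                              field_name: str = "file") -> urllib.request.Request:
    # Assumed: the endpoint takes a multipart/form-data upload in a field named "file".
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/process-pdf",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON body with `json.load`. Note that `urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses, so callers should catch it.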

Responses:

  • 200 OK: A JSON object with the extracted Markdown text.
  • 400 Bad Request: If no file is uploaded.
  • 500 Internal Server Error: If Nougat fails to process the PDF.
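The three documented responses map onto a small decision function. This minimal sketch assumes a framework-agnostic handler that returns a `(status, body)` pair. `handle_process_pdf` and `run_nougat` are hypothetical names, and the `markdown` key is assumed. It also treats an empty upload the same as a missing one, which is a design choice this page does not state.

```python
from typing import Callable, Optional


def handle_process_pdf(pdf_bytes: Optional[bytes],
                       run_nougat: Callable[[bytes], str]) -> tuple[int, dict]:
    # 400 Bad Request: no file (or an empty file) was uploaded.
    if not pdf_bytes:
        return 400, {"detail": "No file uploaded."}
    try:
        markdown = run_nougat(pdf_bytes)
    except Exception as exc:
        # 500 Internal Server Error: any exception raised during Nougat processing.
        return 500, {"detail": f"Nougat processing failed: {exc}"}
    # 200 OK: a JSON object carrying the extracted Markdown ("markdown" key is assumed).
    return 200, {"markdown": markdown}
```

Inside a FastAPI route, the two error branches would typically raise `fastapi.HTTPException(status_code=..., detail=...)` instead. FastAPI serializes that exception as a JSON body of the form `{"detail": ...}`.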