Content Aware Chunking Service (Python)

This document describes the FastAPI application serving as a microservice for content-aware chunking of text.

Overview

This FastAPI application provides a small API that chunks text using LangChain's RecursiveCharacterTextSplitter. It is designed as a stateless service that other services in the ChainAlign ecosystem can call.

Key Functionalities Exposed:

  • Receiving a block of text.
  • Splitting the text into chunks of a specified size, with a 10% overlap between consecutive chunks.
  • Returning the chunks as a list of strings.
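
To make the size-plus-overlap behaviour concrete, here is a minimal sketch in plain Python. It is not the service's implementation: it uses a simple fixed-width sliding window instead of RecursiveCharacterTextSplitter, so it ignores paragraph, sentence, and word boundaries. The function name `chunk_with_overlap` is hypothetical, and rounding the overlap down to `chunk_size // 10` is an assumption.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int) -> List[str]:
    """Split text into fixed-size character windows with a 10% overlap.

    Simplified stand-in for illustration only: a recursive splitter would
    also try to break on paragraph, sentence, or word boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Assumption: the overlap is 10% of chunk_size, rounded down.
    overlap = chunk_size // 10
    # Each window starts this many characters after the previous one.
    step = chunk_size - overlap

    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With a chunk size of 10, the overlap is one character, so the last character of each chunk is repeated as the first character of the next one.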

Technology Stack:

  • FastAPI: For creating the web server and API endpoints.
  • LangChain: Provides the RecursiveCharacterTextSplitter used to split the text.

API Endpoints

POST /chunk

Receives a block of text and returns a list of chunks.

Request Body:

{
  "text": "The text content to be chunked.",
  "chunk_size": 1000
}

Responses:

  • 200 OK: A JSON array of strings, where each string is a chunk of the original text.
  • 422 Unprocessable Entity: If the request body is invalid.
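
A caller might use the endpoint roughly as follows. This is a sketch using only the Python standard library. The base URL `http://localhost:8000` and the helper names `build_chunk_request` and `request_chunks` are assumptions for illustration; the path, method, and body fields come from the description above.

```python
import json
import urllib.request
from typing import List

# Assumption: the service is reachable locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_chunk_request(
    text: str, chunk_size: int, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST /chunk request carrying the documented JSON body."""
    payload = json.dumps({"text": text, "chunk_size": chunk_size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chunk",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_chunks(
    text: str, chunk_size: int, base_url: str = BASE_URL
) -> List[str]:
    """Send the request and return the list of chunks from a 200 response.

    urllib raises urllib.error.HTTPError for error statuses such as 422.
    """
    request = build_chunk_request(text, chunk_size, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `request_chunks("some long text", 1000)` against a running instance should return the JSON array of strings described under 200 OK. A body that fails validation, for example one without the `text` field, would surface as an `HTTPError` with status 422.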