Content-Aware Chunking Service (Python)
This document describes the FastAPI application serving as a microservice for content-aware chunking of text.
Overview
This FastAPI application provides a simple API to chunk text using LangChain's RecursiveCharacterTextSplitter. It is designed as a stateless service that other services in the ChainAlign ecosystem can call.
Key Functionalities Exposed:
- Receiving a block of text.
- Splitting the text into chunks of a specified size with a 10% overlap.
- Returning the chunks as a list of strings.
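The overlap arithmetic can be illustrated with a minimal sketch. The function below is a plain sliding-window splitter, not LangChain's RecursiveCharacterTextSplitter, which additionally prefers to break on separators such as paragraph and line boundaries. The name `sliding_window_chunks`, the `overlap_percent` parameter, and rounding the overlap down are assumptions made for illustration only.

```python
def sliding_window_chunks(text: str, chunk_size: int, overlap_percent: int = 10) -> list[str]:
    """Split text into fixed-size chunks whose neighbours share an overlap.

    Simplified stand-in for illustration; the service itself is described as
    using LangChain's RecursiveCharacterTextSplitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Assumption: the overlap is 10% of chunk_size, rounded down.
    overlap = chunk_size * overlap_percent // 100
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10)
```

With a chunk size of 10 the overlap is one character, so each chunk starts with the last character of the previous one.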
Technology Stack:
- FastAPI: For creating the web server and API endpoints.
- LangChain: The core library that provides the RecursiveCharacterTextSplitter.
API Endpoints
POST /chunk
Receives a block of text and returns a list of chunks.
Request Body:
{
  "text": "The text content to be chunked.",
  "chunk_size": 1000
}
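A caller might build this request as in the following sketch, which uses only the Python standard library. The base URL `http://localhost:8000` and the helper name `build_chunk_request` are hypothetical; the path, method, and body fields come from the description above.

```python
import json
import urllib.request


def build_chunk_request(base_url: str, text: str, chunk_size: int) -> urllib.request.Request:
    """Build (but do not send) a POST /chunk request with a JSON body."""
    payload = json.dumps({"text": text, "chunk_size": chunk_size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chunk",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and port; adjust to wherever the service is deployed.
request = build_chunk_request(
    "http://localhost:8000",
    "The text content to be chunked.",
    1000,
)
# To send it: urllib.request.urlopen(request)
```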
Responses:
- 200 OK: A JSON array of strings, where each string is a chunk of the original text.
- 422 Unprocessable Entity: If the request body is invalid.
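On the calling side, the two documented responses could be handled roughly as follows. This is a sketch under assumptions: the helper `parse_chunk_response` and the exception `InvalidChunkRequest` are hypothetical names, and any status other than 200 or 422 is simply treated as unexpected.

```python
import json


class InvalidChunkRequest(ValueError):
    """Raised when the service rejects the request body (HTTP 422)."""


def parse_chunk_response(status: int, body: bytes) -> list[str]:
    """Turn a raw /chunk response into a list of chunk strings."""
    if status == 422:
        # The body is kept as the error message so the caller can inspect it.
        raise InvalidChunkRequest(body.decode("utf-8", errors="replace"))
    if status != 200:
        raise RuntimeError(f"unexpected status code: {status}")

    chunks = json.loads(body)
    # A 200 response is documented as a JSON array of strings; verify that shape.
    if not isinstance(chunks, list) or not all(isinstance(c, str) for c in chunks):
        raise ValueError("expected a JSON array of strings")
    return chunks


example = parse_chunk_response(200, b'["first chunk", "second chunk"]')
```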