FSD: Dual-Engine Hybrid Search System
Version: 2.2 Date: October 15, 2025 Status: Final Draft Author: Pramod Prasanth
1.0 Overview
1.1. Purpose
This document specifies the functional requirements for ChainAlign's Dual-Engine Hybrid Search System. This system is designed to provide a single, unified search experience for the user, powered by two distinct backend engines: a PostgreSQL database for deep, structured analytical queries, and a Typesense index for high-speed, typo-tolerant textual search.
1.2. Core Architecture
The architecture is centered around a Search Orchestrator Service. This service acts as an intelligent router that receives all queries, analyzes their intent, and dispatches them to the appropriate engine(s). To handle potentially long-running analytical queries, the orchestrator will operate asynchronously, providing intermediate results and status updates to the user in real-time via WebSockets.
2.0 Goals and Objectives
- Provide a single, responsive search interface that gives users immediate feedback.
- Leverage PostgreSQL for powerful, accurate queries on structured analytical data (e.g., filtering by
WAPEorBias). - Leverage Typesense for high-speed, typo-tolerant search on unstructured text (e.g., RAG document content and planner notes).
- Ensure the Typesense text index remains synchronized with the PostgreSQL source of truth with minimal latency.
3.0 Proposed Architecture
3.1. Component Roles
| Engine | Primary Role | Data Stored / Indexed | Search Strengths |
|---|---|---|---|
| PostgreSQL | Analytical Engine / Source of Truth | All structured data: forecast_evaluation, sku_context_metadata (metrics, dates, IDs). | Complex joins, real-time filtering on analytical metrics, transactional integrity. |
| Typesense | Textual Search Index / Speed Layer | Unstructured text fields: content from RAG documents, planner notes, descriptive metadata. | Typo-tolerance, faceted search, sub-50ms latency, relevance ranking. |
3.2. Architectural Diagram
4.0 Functional Requirements
FR-1: Search Orchestrator Service (SO)
-
FR-1.1 (Query Intent Analysis): The SO must parse the request body to determine if the search is purely textual (
q), purely analytical (filters), or a hybrid combination. -
FR-1.2 (Asynchronous Execution): For any query involving PostgreSQL or complex merging, the orchestrator must immediately return a 202 Accepted response with a unique
search_idand execute the query as a background job. -
FR-1.3 (Result Merging Strategy): The orchestrator must support dynamic merging based on an optional
merge_strategyparameter (defaulting toINTERSECTION).- INTERSECTION (AND): Results must satisfy both textual search criteria (from Typesense) AND structured filter criteria (from PostgreSQL). The orchestrator will query both engines, find the intersection of the resulting primary keys, and then fetch the final data.
- UNION (OR): Results that satisfy either Typesense OR PostgreSQL criteria are included.
- Ranking Strategy: The default ranking for a UNION is to show all results with a Typesense relevance score first, ordered by that score. These are followed by unique results from the PostgreSQL query, ordered by a default field (e.g.,
last_updated desc). - De-duplication: The orchestrator is responsible for de-duplicating results based on their primary key.
- Ranking Strategy: The default ranking for a UNION is to show all results with a Typesense relevance score first, ordered by that score. These are followed by unique results from the PostgreSQL query, ordered by a default field (e.g.,
-
FR-1.4 (Normalization): The SO must normalize all search results into a consistent list of primary keys (e.g.,
sku_id,document_id) before fetching the final presentable data from PostgreSQL. -
FR-1.5 (Tenant Isolation): The SO must enforce tenant isolation at all stages. It is responsible for retrieving the
tenant_idfrom the user's session and injecting it as a mandatory filter into every query sent to both PostgreSQL and Typesense. A query that cannot be scoped to a single tenant must be rejected.
FR-2: Real-time Status and Result Streaming
- FR-2.1 (WebSocket Communication): The SO must utilize the WebSocket service to stream status and results to the client, using the
search_idfor session identification. - FR-2.2 (Intermediate Results): For hybrid queries, the orchestrator must prioritize pushing the fast results from Typesense to the UI first, providing a rapid initial response.
- FR-2.3 (Final Results): Once all analytical processing and merging is complete, the orchestrator must push a final
COMPLETEevent with the full, merged and ranked result set.
FR-3: Data Synchronization
- FR-3.1 (Mechanism): A continuous, asynchronous data synchronization pipeline must be implemented using PostgreSQL triggers and a background worker (
pgmqis preferred for transactional integrity). - FR-3.2 (Max Latency - NFR): The maximum acceptable latency from a PostgreSQL commit to index update in Typesense must be under 5 seconds for critical tables (Planner Notes, RAG metadata).
- FR-3.3 (Fallback): In the event the synchronization worker is down, the system must degrade gracefully (e.g., fall back to searching PostgreSQL FTS for the keyword, or serve a warning that text search may be stale).
FR-4: Search Engine Scope
- FR-4.1 (Typesense Scope): The Typesense index will be restricted to indexing textual fields only and will not duplicate structured metric values that change frequently.
- FR-4.2 (PostgreSQL Scope): PostgreSQL must use optimized indexing (
B-treeon metrics,pg_trgmor FTS on secondary textual columns) to handle complex filtering quickly.
5.0 Unified Search API
- FR-5.1 (Endpoint): A single endpoint,
POST /api/search, will handle all user-facing search interactions. - FR-5.2 (Request Body Contract):
{
"q": "string | null",
"filters": "object | null",
"merge_strategy": "string",
"pagination": {
"page": 1,
"limit": 25
},
"sorting": {
"sortBy": "relevance",
"order": "desc"
}
}
* `q` (`string | null`): The raw text query for Typesense.
* `filters` (`object | null`): Structured filters for PostgreSQL (e.g., `{ wape_gt: 15, region: 'ATL' }`).
* `merge_strategy` (`string`): Optional, `'INTERSECTION'` (default) or `'UNION'`.
* `pagination` (`object`): Optional. Defines the page and number of results to return. Defaults to `{ "page": 1, "limit": 25 }`.
* `sorting` (`object`): Optional. Defines the final sort order of the merged results. E.g., `{ "sortBy": "wape", "order": "desc" }`. Default sort is by relevance. The available `sortBy` fields will depend on the data context.
- FR-5.3 (Response): The immediate response will be
202 Acceptedwith asearch_id. The full results are delivered via WebSocket.
6.0 Phased Implementation Plan
- Phase 1: Foundation & Synchronization.
- Deploy Typesense and define initial text schemas.
- Implement the
pgmq-based data synchronization pipeline and measure latency against FR-3.2. - Implement PostgreSQL-only analytical queries (using FTS/
pg_trgmas a backup for text in case of TypeSense failure).
- Phase 2: Orchestration & Streaming.
- Build the Search Orchestrator (SO) with intent analysis and asynchronous job logic.
- Integrate the SO with the WebSocket service to push status and partial results (FR-2.2).
- Phase 3: Merging & Advanced UI.
- Implement the full result merging (Intersection/Union) logic (FR-1.3).
- Develop the UI to gracefully handle the streaming results and display partial/loading states.
7.0 Non-Functional Requirements
- Performance: Text-only queries (Typesense) should have a P95 latency of <50ms. Combined queries should have a P95 latency of <500ms.
- Consistency: The data synchronization lag between PostgreSQL and Typesense must be under 5 seconds.
- Security: All search queries must adhere to tenant isolation and role-based access controls, as specified in FR-1.5. The Search Orchestrator is the primary enforcement point for this requirement.
- Reliability: The system must include a fallback mechanism in case the data synchronization worker fails.