AI Quality Feedback (AQF) Loop
- Author: ChainAlign Engineering
- Status: ✅ Implemented
- Date: November 17, 2025
- Feature Owner: Product
- Priority: High
- Implementation Date: November 17, 2025
1. Executive Summary
This document outlines the AI Quality Feedback (AQF) Loop, an internal system for capturing real-time user sentiment on AI-generated insights. This is not a generic "feedback" tool; it is a core mechanism for generating a labeled dataset to enable Reinforcement Learning from Human Feedback (RLHF).
Its purpose is to directly correlate user satisfaction (e.g., "4 - Excellent") with specific AI reasoning chains, allowing us to programmatically fine-tune models and improve the "Judgment OS."
Why We Built This (Buy vs. Build Decision)
Decision: Build Internally
Third-party tools like Pendo, Hotjar, or Sprig are excellent for generic product analytics, but they create "black boxes" that prevent us from:
- Linking feedback to AI state: We need to know which specific `reasoningId` (CoT chain) a user rated
- Creating training datasets: External tools don't export data in formats suitable for model fine-tuning
- Real-time model adaptation: We need programmatic access to feedback for our RLHF pipeline
By building internally, we maintain 100% control over the data and can create a high-quality labeled dataset: (reasoning_chain, user_score).
2. Goals & Objectives
Primary Goal
Create a high-quality, labeled dataset of (reasoning_chain, user_score) pairs for RLHF training.
Objectives
- Capture User Sentiment: Record a user's sentiment (1-4 score) in the context of a specific AI interaction
- Non-Intrusive UX: Provide a lightweight UI inspired by Claude Code's feedback mechanism
- Analytics & Insights: Feed data into analytics to identify high/low-performing AI prompts and models
- Auto-Flagging: Automatically flag low-rated content (score ≤ 2) for human review
Non-Goals
- ❌ This is not a "Contact Support" or bug-reporting feature
- ❌ This is not a generic NPS survey
- ❌ This is not for gathering text-based feedback (use the existing `feedbackRoutes.js` for that)
3. Architectural Overview
Build vs. Buy: Internal Implementation
Architecture Pattern: Three-Layer Pattern (Routes → Services → DAL)
┌──────────────────────────────────────────────────────────────
│  Frontend (React)
├──────────────────────────────────────────────────────────────
│  FeedbackContext (Global State)
│  └── MicroFeedback Component (shadcn/ui Dialog)
│      ├── 1: Bad (ThumbsDown)
│      ├── 2: Fine (Meh)
│      ├── 3: Good (ThumbsUp)
│      └── 4: Excellent (Star)
└──────────────────────────────────────────────────────────────
                 │
                 ▼  POST /api/ai-feedback/rating
┌──────────────────────────────────────────────────────────────
│  Backend (Node.js/Express)
├──────────────────────────────────────────────────────────────
│  Routes: aiFeedbackRoutes.js
│  ├── POST /api/ai-feedback/rating
│  ├── GET  /api/ai-feedback/analytics?days=30
│  └── GET  /api/ai-feedback/reasoning/:reasoningId
│
│  Service: AiFeedbackService.js
│  ├── recordFeedback(user, score, context)
│  ├── flagReasoningForReview(reasoningId, score)  [async]
│  ├── getFeedbackAnalytics(user, daysAgo)
│  └── getFeedbackForReasoning(user, reasoningId)
│
│  Repository: AiFeedbackRatingsRepository.js
│  └── Extends BaseRepository (tenant-scoped)
└──────────────────────────────────────────────────────────────
                 │
                 ▼
┌──────────────────────────────────────────────────────────────
│  Database (PostgreSQL + pgvector)
├──────────────────────────────────────────────────────────────
│  Table: ai_feedback_ratings
│  ├── id (UUID)
│  ├── tenant_id (UUID, FK)
│  ├── user_id (UUID, FK)
│  ├── score (SMALLINT, 1-4)
│  ├── context (JSONB)  ← The key to RLHF
│  └── created_at (TIMESTAMPTZ)
└──────────────────────────────────────────────────────────────
4. Functional Requirements
FR1: UI Component (MicroFeedback.jsx)
Location: frontend/src/components/MicroFeedback.jsx
Requirements:
- ✅ Non-blocking modal using the shadcn/ui `Dialog` component
- ✅ Simple question: "How was this insight?"
- ✅ Four clickable options with icons and colors:
  - 1 (Bad) - ThumbsDown icon, red color, "Not helpful"
  - 2 (Fine) - Meh icon, yellow color, "Somewhat helpful"
  - 3 (Good) - ThumbsUp icon, blue color, "Helpful"
  - 4 (Excellent) - Star icon, green color, "Very helpful"
- ✅ Clicking an option fires the API request and dismisses the modal
- ✅ Visual feedback during submission (loading state)
Implementation:
<MicroFeedback
isOpen={isOpen}
onClose={closeFeedback}
onSubmit={submitFeedback}
context={currentContext}
/>
FR2: Triggering Logic
V1 (Implemented - Manual Trigger):
import { useFeedback } from '../context/FeedbackContext.jsx';
const MyComponent = () => {
const { showFeedback } = useFeedback();
return (
<button onClick={() => showFeedback({
page: '/dashboard/sop',
componentId: 'critical_insight_001',
reasoningId: 'cot_uuid_abc123',
})}>
How was this insight?
</button>
);
};
V2 (Future - Proactive Trigger):
- Trigger after user engagement (e.g., "user expands CoT panel for >5 seconds")
- Random sampling (e.g., "show to 10% of users on each insight")
- Smart timing (e.g., "after user completes a decision workflow")
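A possible shape for the V2 triggers above, combining the engagement timer and random sampling. This is an illustrative sketch only: `useProactiveFeedback`, `ENGAGEMENT_MS`, and `SAMPLE_RATE` are hypothetical names; only `useFeedback`/`showFeedback` exist today.

```jsx
// Hypothetical V2 hook (not implemented): show the feedback modal after sustained
// engagement with a CoT panel, sampled to a fraction of users.
import { useEffect } from 'react';
import { useFeedback } from '../context/FeedbackContext.jsx';

const ENGAGEMENT_MS = 5000; // "expands CoT panel for >5 seconds"
const SAMPLE_RATE = 0.1;    // "show to 10% of users on each insight"

export function useProactiveFeedback({ isExpanded, page, componentId, reasoningId }) {
  const { showFeedback } = useFeedback();

  useEffect(() => {
    // Sample per expansion; skip most users to avoid feedback fatigue.
    if (!isExpanded || Math.random() >= SAMPLE_RATE) return undefined;

    const timer = setTimeout(
      () => showFeedback({ page, componentId, reasoningId }),
      ENGAGEMENT_MS
    );
    return () => clearTimeout(timer); // cancel if the panel collapses early
  }, [isExpanded, page, componentId, reasoningId, showFeedback]);
}
```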
FR3: Backend Endpoint
Endpoint: POST /api/ai-feedback/rating
Authentication: Bearer Token (Firebase JWT via verifyToken middleware)
Request Schema (Zod Validation):
{
score: number (1-4, required),
context: {
page?: string,
componentId?: string,
reasoningId?: string,
insightId?: string,
decisionId?: string,
scenarioId?: string,
}
}
Response:
{
"message": "Feedback received. Thank you!",
"feedbackId": "uuid-here"
}
Status Code: 202 Accepted (async processing)
Implementation Details:
- ✅ Protected by `verifyToken` middleware
- ✅ Zod schema validation
- ✅ Tenant-scoped data storage
- ✅ Async background processing for low-rated feedback
- ✅ Non-blocking response (< 100ms p99)
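For illustration, the validation and handler described above could look roughly like the sketch below. The Zod schema mirrors the request schema; the middleware import path and the shape of `AiFeedbackService.recordFeedback`'s return value are assumptions, so treat this as a sketch of `aiFeedbackRoutes.js`, not its actual contents.

```js
// Sketch only — the real aiFeedbackRoutes.js may differ in structure and paths.
import express from 'express';
import { z } from 'zod';
import { verifyToken } from '../middleware/auth.js';            // assumed middleware path
import AiFeedbackService from '../services/AiFeedbackService.js';

const router = express.Router();

const ratingSchema = z.object({
  score: z.number().int().min(1).max(4),
  context: z
    .object({
      page: z.string().optional(),
      componentId: z.string().optional(),
      reasoningId: z.string().optional(),
      insightId: z.string().optional(),
      decisionId: z.string().optional(),
      scenarioId: z.string().optional(),
    })
    .optional(),
});

router.post('/rating', verifyToken, async (req, res) => {
  const parsed = ratingSchema.safeParse(req.body);
  if (!parsed.success) {
    return res.status(400).json({ error: parsed.error.flatten() });
  }

  // Persist, then respond 202 immediately; flagging happens in the background.
  const feedback = await AiFeedbackService.recordFeedback(
    req.user,
    parsed.data.score,
    parsed.data.context
  );
  return res
    .status(202)
    .json({ message: 'Feedback received. Thank you!', feedbackId: feedback.id });
});

export default router;
```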
FR4: Contextual Payload (The "Data Link")
Critical Requirement: The frontend MUST send a context object to link the score to specific AI outputs.
Example Payload:
{
"score": 4,
"context": {
"page": "/dashboard/sop",
"componentId": "critical_insight_001",
"reasoningId": "cot_uuid_abc123", // โ The KEY link for RLHF
"insightId": "insight_456",
"decisionId": "decision_789"
}
}
Context Fields:
- `page`: Current page URL or identifier
- `componentId`: UI component that triggered feedback
- `reasoningId`: CRITICAL - UUID of the CoT reasoning chain
- `insightId`: UUID of the insight being rated
- `decisionId`: UUID of the decision being rated
- `scenarioId`: UUID of the scenario being rated
Why This Matters:
Without reasoningId, we cannot:
- Identify which AI prompt generated the output
- Fine-tune models based on user feedback
- A/B test different reasoning approaches
- Build a labeled dataset for RLHF
5. Non-Functional Requirements (NFRs)
NFR-1: Performance
- ✅ API Response Time: < 100ms (p99)
- ✅ Non-Blocking: Async background processing for flagging
- ✅ Database Indexing: GIN index on the JSONB `context` field
NFR-2: Multi-Tenancy
- ✅ All data is tenant-scoped via `BaseRepository`
- ✅ Queries automatically filtered by `tenant_id`
- ✅ No cross-tenant data leakage
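As an illustration of this tenant-scoping pattern, a repository built on `BaseRepository` might look like the sketch below. The `BaseRepository` constructor signature and the Knex-style query builder are assumptions; the real `AiFeedbackRatingsRepository.js` may differ.

```js
// Illustrative sketch — not the actual AiFeedbackRatingsRepository.js.
import BaseRepository from './BaseRepository.js';

class AiFeedbackRatingsRepository extends BaseRepository {
  constructor(db) {
    super(db, 'ai_feedback_ratings'); // assumed (db, tableName) signature
  }

  // Every query filters on tenant_id, so ratings never leak across tenants.
  async findRecentByTenant(tenantId, daysAgo = 30) {
    const since = new Date(Date.now() - daysAgo * 24 * 60 * 60 * 1000);
    return this.db('ai_feedback_ratings')
      .where({ tenant_id: tenantId })
      .andWhere('created_at', '>', since)
      .orderBy('created_at', 'desc');
  }
}

export default AiFeedbackRatingsRepository;
```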
NFR-3: Data Retention
- ✅ Feedback stored indefinitely for RLHF training
- Phase 2: Implement data retention policies (e.g., 2 years)
NFR-4: GDPR Compliance (Phase 2 - Not Yet Implemented)
Status: ⚠️ To Be Implemented
Requirements:
- This feature constitutes "profiling" under GDPR as it tracks user behavior to adapt the service
- Must be disabled by default for all users
- Can only be enabled if user has given explicit consent via Consent Management System
- Must check `ConsentService.checkConsent(userId, 'profiling')` before triggering the modal
- Users must be able to withdraw consent and request data deletion
Implementation Plan:
- Integrate with Consent Management System (Phase 2.1 of Unified Compliance Plan)
- Add consent check in `FeedbackContext.showFeedback()`
- Add data deletion endpoint for GDPR Right to Erasure
- Add consent banner for "AI Quality Improvement" purpose
Current Workaround:
- Feature is opt-in (user must click to trigger)
- No automatic/proactive triggering without consent
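A minimal sketch of the planned Phase 2 consent gate, assuming `ConsentService.checkConsent(userId, 'profiling')` resolves to a boolean as described above; the wrapper name, import path, and wiring are hypothetical.

```js
// Phase 2 sketch (not implemented): never trigger the modal without profiling consent.
import ConsentService from '../services/ConsentService.js'; // assumed path

export async function showFeedbackWithConsent(userId, context, showFeedback) {
  const hasConsent = await ConsentService.checkConsent(userId, 'profiling');
  if (!hasConsent) {
    return; // GDPR: no profiling UI for users who have not opted in
  }
  showFeedback(context);
}
```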
6. Database Schema
Table: `ai_feedback_ratings`
Migration: 20251117000001_create_ai_feedback_ratings_table.cjs
CREATE TABLE "public"."ai_feedback_ratings" (
"id" UUID PRIMARY KEY DEFAULT gen_random_uuid(),
"tenant_id" UUID NOT NULL REFERENCES "public"."tenants"("id") ON DELETE CASCADE,
"user_id" UUID NOT NULL REFERENCES "public"."users"("id") ON DELETE CASCADE,
"score" SMALLINT NOT NULL CHECK (score >= 1 AND score <= 4),
"context" JSONB,
"created_at" TIMESTAMPTZ DEFAULT now()
);
-- Indexes for performance
CREATE INDEX "idx_feedback_user" ON "public"."ai_feedback_ratings" ("user_id");
CREATE INDEX "idx_feedback_tenant" ON "public"."ai_feedback_ratings" ("tenant_id");
CREATE INDEX "idx_feedback_created_at" ON "public"."ai_feedback_ratings" ("created_at");
-- GIN index for JSONB queries
CREATE INDEX "idx_feedback_context_gin" ON "public"."ai_feedback_ratings" USING GIN ("context");
-- Comment for clarity
COMMENT ON COLUMN "public"."ai_feedback_ratings"."context" IS
'Stores the UI/AI state when feedback was given. E.g., { "page": "/dashboard", "componentId": "critical_insight_001", "reasoningId": "cot_uuid_abc123" }';
Key Design Decisions:
- JSONB for Context: Flexible schema for different feedback types
- GIN Index: Fast JSONB containment lookups on `context` (e.g., all ratings for a given `reasoningId`)
- Cascade Deletion: Feedback deleted if user/tenant is deleted
- Check Constraint: Ensures score is always 1-4
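One note on the GIN index: a plain `context->>'reasoningId' = ...` comparison is not served by a default jsonb GIN index, but the containment operator `@>` is. A Knex-style query written to use the index might look like this (a sketch; the actual repository query may differ):

```js
// Sketch: look up all ratings for one reasoning chain in a way the GIN index can serve.
const rows = await db('ai_feedback_ratings')
  .where('tenant_id', tenantId)
  .whereRaw('context @> ?::jsonb', [JSON.stringify({ reasoningId })])
  .select('score', 'context', 'created_at')
  .orderBy('created_at', 'desc');
```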
7. API Specification
Endpoint 1: Submit Feedback Rating
Method: POST
Path: /api/ai-feedback/rating
Auth: Bearer Token (Firebase JWT)
Request Body:
{
"score": 4,
"context": {
"page": "/dashboard/sop",
"componentId": "critical_insight_001",
"reasoningId": "cot_uuid_abc123",
"insightId": "insight_456"
}
}
Response (202 Accepted):
{
"message": "Feedback received. Thank you!",
"feedbackId": "550e8400-e29b-41d4-a716-446655440000"
}
Error Responses:
- 400 Bad Request: Invalid score or missing required fields
- 401 Unauthorized: Missing or invalid auth token
- 500 Internal Server Error: Server error
Endpoint 2: Get Feedback Analytics
Method: GET
Path: /api/ai-feedback/analytics?days=30
Auth: Bearer Token
Query Parameters:
- `days` (optional): Number of days to look back (default: 30)
Response (200 OK):
{
"totalCount": 127,
"avgScore": 3.45,
"npsScore": 68.5,
"distribution": [
{ "score": 1, "count": 5 },
{ "score": 2, "count": 12 },
{ "score": 3, "count": 45 },
{ "score": 4, "count": 65 }
],
"promoters": 65,
"passives": 45,
"detractors": 17
}
Metrics Explained:
- `avgScore`: Average rating (1.0 - 4.0)
- `npsScore`: Net Promoter Score-like metric (% promoters - % detractors)
- `promoters`: Users who rated 4 (Excellent)
- `passives`: Users who rated 3 (Good)
- `detractors`: Users who rated 1-2 (Bad/Fine)
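To make the metric concrete, the NPS-like score can be derived from the distribution as follows (a sketch of the formula; the real aggregation in `AiFeedbackService.getFeedbackAnalytics` runs in SQL):

```js
// Formula sketch: npsScore = % promoters (score 4) minus % detractors (score 1-2).
function computeNpsScore(distribution) {
  const total = distribution.reduce((sum, d) => sum + d.count, 0);
  if (total === 0) return 0;

  const promoters = distribution.find((d) => d.score === 4)?.count ?? 0;
  const detractors = distribution
    .filter((d) => d.score <= 2)
    .reduce((sum, d) => sum + d.count, 0);

  return ((promoters - detractors) / total) * 100;
}
```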
Endpoint 3: Get Feedback for Reasoning Chain
Method: GET
Path: /api/ai-feedback/reasoning/:reasoningId
Auth: Bearer Token
Response (200 OK):
{
"reasoningId": "cot_uuid_abc123",
"feedbackCount": 8,
"avgScore": "3.75",
"feedback": [
{
"id": "uuid-1",
"score": 4,
"context": { "page": "/dashboard", "componentId": "insight_001" },
"created_at": "2025-11-17T10:30:00Z"
},
{
"id": "uuid-2",
"score": 3,
"context": { "page": "/dashboard", "componentId": "insight_001" },
"created_at": "2025-11-17T11:15:00Z"
}
]
}
8. User Experience Flow
Manual Trigger Flow (V1 - Implemented)
1. User views an AI-generated insight
   └─> ReasoningPanel renders with "How was this insight?" button
2. User clicks "How was this insight?"
   └─> showFeedback({ reasoningId, componentId, page }) triggered
3. MicroFeedback modal appears
   └─> User sees 4 options: Bad, Fine, Good, Excellent
4. User clicks "Excellent" (score: 4)
   └─> POST /api/ai-feedback/rating { score: 4, context: {...} }
5. Modal shows "Submitting feedback..."
   └─> API responds 202 Accepted
6. Modal closes after 500ms
   └─> User returns to their workflow (uninterrupted)
7. [Background] AiFeedbackService processes the feedback
   ├─> If score ≤ 2: Flag reasoning chain for review
   └─> Update analytics dashboard
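Step 7 is the only part of the flow that does work after the response has been sent. A sketch of how it might be structured is shown below; `repository`, `logger`, and a module-level `flagReasoningForReview` are assumed names, and the actual `AiFeedbackService.js` may differ.

```js
// Sketch of recordFeedback: persist, respond fast, flag low scores in the background.
// repository, logger, and flagReasoningForReview are assumed to be defined in the service.
export async function recordFeedback(user, score, context) {
  const row = await repository.create({
    tenant_id: user.tenantId,
    user_id: user.id,
    score,
    context,
  });

  // Fire-and-forget so flagging never delays the 202 response to the client.
  if (score <= 2 && context?.reasoningId) {
    flagReasoningForReview(context.reasoningId, score).catch((err) =>
      logger.error('Failed to flag reasoning chain for review', err)
    );
  }

  return row;
}
```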
Proactive Trigger Flow (V2 - Future)
1. User expands CoT reasoning panel
   └─> Timer starts: 5 seconds
2. After 5 seconds of engagement
   └─> showFeedback() automatically triggered
3. MicroFeedback modal appears (same as above)
   └─> User rates or dismisses
4. User consent check
   └─> if (!ConsentService.checkConsent(userId, 'profiling'))
       └─> Do NOT trigger (GDPR compliance)
9. Analytics & RLHF Use Cases
Use Case 1: Fine-Tuning AI Models
Goal: Improve AI reasoning quality using RLHF
Process:
- Export feedback data: `SELECT context->>'reasoningId' AS reasoning_id, AVG(score) FROM ai_feedback_ratings GROUP BY context->>'reasoningId'`
- Identify high-performing reasoning chains (avg score ≥ 3.5)
- Extract prompts and parameters from the `reasoning_bank` table
- Use as positive examples for model fine-tuning
- Identify low-performing chains (avg score ≤ 2.0)
- Analyze failure patterns and adjust prompts
Dataset Format:
[
{
"reasoning_chain": "...",
"prompt": "...",
"parameters": {...},
"avg_score": 3.8,
"feedback_count": 15
}
]
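A hypothetical export step that produces that dataset by joining ratings with the `reasoning_bank` table (Knex-flavoured sketch; the `reasoning_chain`, `prompt`, and `parameters` column names come from the format above, and the join key is an assumption):

```js
// Sketch: build (reasoning_chain, avg_score) training rows from feedback + reasoning_bank.
const dataset = await db('ai_feedback_ratings as f')
  .joinRaw("JOIN reasoning_bank AS r ON r.id = (f.context->>'reasoningId')::uuid")
  .groupBy('r.id', 'r.reasoning_chain', 'r.prompt', 'r.parameters')
  .havingRaw('COUNT(*) >= ?', [5]) // ignore chains with too little feedback
  .select(
    'r.reasoning_chain',
    'r.prompt',
    'r.parameters',
    db.raw('AVG(f.score)::float AS avg_score'),
    db.raw('COUNT(*)::int AS feedback_count')
  );
```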
Use Case 2: Identifying Weak Spots
Query: Which types of insights are performing poorly?
SELECT
context->>'componentId' AS component,
AVG(score) AS avg_score,
COUNT(*) AS feedback_count
FROM ai_feedback_ratings
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY context->>'componentId'
ORDER BY avg_score ASC;
Example Output:
component              | avg_score | feedback_count
-----------------------|-----------|---------------
critical_insight_001   | 2.1       | 45
bottleneck_analysis    | 3.8       | 30
scenario_suggestion    | 3.5       | 22
Action: Improve "critical_insight_001" logic
Use Case 3: A/B Testing AI Models
Scenario: Testing two different reasoning approaches
Setup:
- 50% of users get Model A (prompt_v1)
- 50% of users get Model B (prompt_v2)
Query:
SELECT
context->>'modelVersion' AS model,
AVG(score) AS avg_score,
COUNT(*) AS feedback_count
FROM ai_feedback_ratings
WHERE context->>'experimentId' = 'cot_ab_test_001'
GROUP BY context->>'modelVersion';
Result:
model     | avg_score | feedback_count
----------|-----------|---------------
prompt_v1 | 3.2       | 50
prompt_v2 | 3.8       | 48
Decision: Roll out prompt_v2 to all users
Use Case 4: NPS Tracking Over Time
Goal: Track overall AI satisfaction as a KPI
Metric: NPS Score = % Promoters - % Detractors (i.e., the share of promoters minus the share of detractors, expressed as a percentage)
Implementation:
const analytics = await AiFeedbackService.getFeedbackAnalytics(user, 30);
console.log(`NPS Score: ${analytics.npsScore}%`);
// NPS Score: 68.5%
Benchmark:
- NPS > 70: Excellent
- NPS 50-70: Good
- NPS 30-50: Needs improvement
- NPS < 30: Critical issue
10. Implementation Status
✅ Completed (Phase 1)
Backend:
- Database migration (`ai_feedback_ratings` table)
- Repository layer (`AiFeedbackRatingsRepository.js`)
- Service layer (`AiFeedbackService.js`)
- API routes (`aiFeedbackRoutes.js`)
- Auto-flagging for low-rated feedback
- Analytics endpoints
- Tenant scoping
Frontend:
- MicroFeedback component (shadcn/ui)
- FeedbackContext (global state)
- useFeedback() hook
- Integration into App.jsx
Documentation:
- Implementation guide (`docs/MICRO_FEEDBACK_SYSTEM.md`)
- Functional specification (this document)
Planned (Phase 2)
GDPR Compliance:
- Integrate with Consent Management System
- Add consent check before triggering modal
- Implement data deletion endpoint
- Add consent banner for "AI Quality Improvement"
Proactive Triggering:
- Smart trigger based on user engagement
- Random sampling (10% of users)
- Cooldown period (don't spam users)
Advanced Analytics:
- Feedback trends dashboard
- Correlation analysis (feedback vs. user retention)
- Automated model retraining based on feedback
Review Queue:
- Admin interface for reviewing low-rated feedback
- Automated alerts for score < 2
- Integration with Linear for task creation
11. Testing Strategy
Unit Tests
# Backend service tests
npm test --workspace=backend -- AiFeedbackService.test.js
# Repository tests
npm test --workspace=backend -- AiFeedbackRatingsRepository.test.js
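An illustrative shape for one of those service tests. The `describe/it/expect` syntax is shared by Jest and Vitest, but the dependency-injected constructor shown here is an assumption about how `AiFeedbackService.js` is built; adapt the setup to the real module.

```js
// Sketch of a unit test for the auto-flagging rule (score <= 2); assumes a DI-style
// constructor, which the real AiFeedbackService may not use — adjust to match.
import AiFeedbackService from '../src/services/AiFeedbackService.js';

describe('AiFeedbackService.recordFeedback', () => {
  it('flags the reasoning chain when the score is 2 or lower', async () => {
    const flagged = [];
    const service = new AiFeedbackService({
      repository: { create: async (row) => ({ id: 'feedback-1', ...row }) },
      flagReasoningForReview: async (reasoningId, score) => {
        flagged.push({ reasoningId, score });
      },
    });

    await service.recordFeedback(
      { id: 'user-1', tenantId: 'tenant-1' },
      2,
      { reasoningId: 'cot_uuid_abc123' }
    );

    expect(flagged).toEqual([{ reasoningId: 'cot_uuid_abc123', score: 2 }]);
  });
});
```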
Integration Tests
# API endpoint tests
npm test --workspace=backend -- aiFeedbackRoutes.test.js
Manual Testing Checklist
- Run migration: `npx knex migrate:latest --env development`
- Submit feedback via UI modal
- Verify data in database: `SELECT * FROM ai_feedback_ratings;`
- Test analytics endpoint: `GET /api/ai-feedback/analytics?days=7`
- Test reasoning endpoint: `GET /api/ai-feedback/reasoning/:id`
- Verify low-score flagging in logs (score ≤ 2)
- Test tenant isolation (create feedback from 2 different tenants)
12. References
Related Documentation
- Micro-Feedback System Implementation Guide
- RLHF Best Practices
- BaseRepository Pattern
- GDPR Compliance Plan
Related Code
- Backend:
  - `backend/src/routes/aiFeedbackRoutes.js`
  - `backend/src/services/AiFeedbackService.js`
  - `backend/src/dal/AiFeedbackRatingsRepository.js`
  - `backend/migrations/20251117000001_create_ai_feedback_ratings_table.cjs`
- Frontend:
  - `frontend/src/components/MicroFeedback.jsx`
  - `frontend/src/context/FeedbackContext.jsx`
  - `frontend/src/App.jsx`
13. Success Metrics
KPIs (30-day rolling window)
- Participation Rate: % of users who submit at least 1 feedback
  - Target: > 20%
- Average NPS Score: (% Promoters - % Detractors)
  - Target: > 60
- Low-Score Flag Rate: % of feedback with score ≤ 2
  - Target: < 10%
- Feedback Coverage: % of AI insights that receive feedback
  - Target: > 30%
- Time to Improvement: Days from low-score flag to prompt fix
  - Target: < 7 days
14. Security Considerations
Authentication & Authorization
- ✅ All endpoints protected by Firebase JWT
- ✅ Tenant-scoped queries (no cross-tenant access)
- ✅ User must be authenticated to submit feedback
Data Privacy
- ✅ No PII stored in the `context` field (only IDs and metadata)
- ⚠️ GDPR consent required before triggering (Phase 2)
- ✅ Feedback is user-specific (can be deleted on user request)
Rate Limiting
- Phase 2: Implement rate limiting (max 10 feedback/user/minute)
- Phase 2: Add cooldown period (1 feedback per insight per user)
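A plausible Phase 2 implementation of the first bullet using the `express-rate-limit` package (a sketch; the limiter values mirror the targets above, and `router`, `verifyToken`, and `submitFeedbackHandler` are assumed to exist in `aiFeedbackRoutes.js`):

```js
// Phase 2 sketch: throttle feedback submissions per client.
import rateLimit from 'express-rate-limit';

const feedbackLimiter = rateLimit({
  windowMs: 60 * 1000, // 1-minute window
  max: 10,             // max 10 submissions per window (per-user keying would need a custom keyGenerator)
  standardHeaders: true,
  legacyHeaders: false,
});

router.post('/rating', verifyToken, feedbackLimiter, submitFeedbackHandler);
```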
15. Changelog
| Date | Version | Author | Changes |
|---|---|---|---|
| 2025-11-17 | 1.0 | ChainAlign Engineering | Initial implementation and documentation |
| 2025-11-17 | 1.1 | ChainAlign Engineering | Updated FSD based on actual implementation |
16. Approval & Sign-Off
| Role | Name | Status | Date |
|---|---|---|---|
| Engineering Lead | TBD | ⏳ Pending | - |
| Product Owner | TBD | ⏳ Pending | - |
| Legal/Compliance | TBD | ⏳ Pending | - |
| Security | TBD | ⏳ Pending | - |
End of Document