CSV Onboarding FDD
Summary of docs/architecture/CSVOnboarding.md
The document outlines a design for an intuitive and user-friendly CSV data entry and mapping interface for ChainAlign, focusing on a "Three-Step Intelligent Mapping Flow." The core idea is to allow users to upload CSV files and intelligently map their column headers to ChainAlign's database fields, especially for "messy" or non-preformatted files.
Key features and steps include:
- Upload Zone: A clean drag-and-drop area for CSV file uploads, with background AI/ML analysis for column identification and mapping suggestions.
- Mapping Interface: This is the central component, presenting CSV columns one at a time as an interactive card. Each card features a searchable dropdown (Combobox) pre-selected with the AI's best guess, along with a confidence score. Users confirm or adjust the mapping for the current column before proceeding to the next, creating a guided, sequential flow. The document specifically recommends using shadcn/ui's Combobox for implementation due to its accessibility and styling flexibility with Tailwind CSS.
- Validation & Import: A final screen showing a preview of the first few rows with ChainAlign headers applied. It highlights any obvious data errors and provides a clear "Import Data" button to initiate background processing.
Key Enhancements:
- Save Mapping as a Template: Allows users to save successful mappings for future uploads, streamlining repetitive tasks.
- Smart Date Format Detection: Automatically detects and allows correction of date formats.
- Real-time Feedback: Uses animations and loading indicators for a responsive user experience.
The document emphasizes making the process feel like a guided conversation, leveraging AI suggestions while giving users full control for validation and correction.
Detailed Functional Design Document: CSV Data Onboarding
1. Introduction
This document details the functional design for ChainAlign's CSV Data Onboarding feature, focusing on an intelligent, user-friendly interface for mapping external CSV file headers to internal ChainAlign database fields. The goal is to provide a smooth and intuitive experience for users importing their data, minimizing manual effort while ensuring data accuracy.
2. User Stories
- As a user, I want to easily upload my CSV files, regardless of their formatting, so that I can quickly bring my data into ChainAlign.
- As a user, I want the system to intelligently suggest mappings between my CSV headers and ChainAlign fields, so I don't have to manually map every column.
- As a user, I want to easily correct any incorrect mapping suggestions from the system, so I maintain control over my data.
- As a user, I want to preview my data before final import, so I can catch any errors or misinterpretations.
- As a user, I want to save my mapping configurations as templates, so I can reuse them for future uploads of similar files.
- As a user, I want clear feedback and progress indicators throughout the upload and mapping process.
3. Functional Flow: The Three-Step Intelligent Mapping
The CSV data onboarding process will consist of three distinct steps: Upload, Map Headers, and Validate & Import.
3.1. Step 1: The Upload Zone
3.1.1. Description: This is the initial screen where users will initiate the data import process by uploading their CSV file. The design prioritizes simplicity and ease of use.
3.1.2. UI Elements:
- Top-Level Step Progress: A horizontal stepper component showing "Upload → Map Headers → Validate & Import", with "Upload" highlighted as the current step.
- Drag-and-Drop Area: A prominent, visually distinct area (e.g., a dashed border box) labeled "Drag and drop your CSV file here" or "Click to upload."
- File Input Button: A secondary option, "Browse files," for users who prefer traditional file selection.
- Progress Indicator: A progress bar or spinner to show file upload status.
- Status Messages: Text messages indicating the current state (e.g., "Uploading...", "Analyzing your file...", "Identifying columns...", "Generating mapping suggestions...").
3.1.3. User Interaction:
- File Selection: User drags a CSV file onto the designated area or clicks "Browse files" to select one.
- Upload Initiation: Upload begins immediately upon file drop or selection.
3.1.4. System Response:
- File Upload: The system receives the CSV file.
- Progress Feedback: The UI displays upload progress.
- Background Analysis: Once uploaded, the backend's AI/ML model asynchronously analyzes the CSV file to:
- Extract column headers.
- Infer data types for each column.
- Generate mapping suggestions to ChainAlign database fields based on header names, data patterns, and potentially historical user mappings.
- Calculate a confidence score for each mapping suggestion. (Note: For robust asynchronous processing, the frontend will need to implement a mechanism like WebSockets or polling to receive real-time status updates on the analysis progress and potential failures, rather than relying on a single immediate success response.)
- Transition: Upon completion of analysis, the system automatically transitions the user to Step 2: Map Headers.
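The type-inference part of the background analysis could be approximated with simple rules. The sketch below is a rule-based stand-in, not the AI/ML model the design calls for; the function name `inferColumnType`, the set of types, and the two recognized date shapes are assumptions made for illustration.

```typescript
// Hypothetical set of inferred types; the design does not enumerate them.
type InferredType = "number" | "boolean" | "date" | "text";

// Infers a column type from a handful of sample cell values.
// A column gets a specific type only if every non-empty sample fits it.
function inferColumnType(samples: string[]): InferredType {
  const values = samples.map((s) => s.trim()).filter((s) => s.length > 0);
  if (values.length === 0) return "text"; // nothing to go on

  const isNumber = (v: string): boolean => /^-?\d+(\.\d+)?$/.test(v);
  const isBoolean = (v: string): boolean => /^(true|false)$/i.test(v);
  // Only two date shapes are recognized here: YYYY-MM-DD and N/N/YYYY.
  const isDate = (v: string): boolean =>
    /^\d{4}-\d{2}-\d{2}$/.test(v) || /^\d{1,2}\/\d{1,2}\/\d{4}$/.test(v);

  if (values.every(isNumber)) return "number";
  if (values.every(isBoolean)) return "boolean";
  if (values.every(isDate)) return "date";
  return "text"; // mixed or unrecognized content falls back to text
}
```

A single non-conforming sample downgrades the column to text, which keeps the rule conservative: a mismatch is surfaced in Step 2 rather than hidden.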
3.1.5. Error Handling:
- Invalid File Type: If a non-CSV file is uploaded, display an error message: "Invalid file type. Please upload a CSV file."
- Upload Failure: If the upload fails (e.g., network error), display "Upload failed. Please try again."
- File Size Limit: If the file exceeds a predefined size limit, display "File too large. Maximum allowed size is [X]MB."
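The client-side checks for the first and third errors above might look like the following minimal sketch. The `UploadCandidate` shape, the `validateUpload` name, and the 25 MB default are assumptions; the design leaves the actual limit as [X]MB. Checking only the file extension is also a simplification.

```typescript
// Hypothetical default; the design does not fix the size limit.
const MAX_UPLOAD_MB = 25;

interface UploadCandidate {
  name: string;      // file name as reported by the browser
  sizeBytes: number; // file size in bytes
}

// Returns the user-facing error message, or null when the file may be uploaded.
function validateUpload(file: UploadCandidate, maxMb: number = MAX_UPLOAD_MB): string | null {
  // Extension check only; the content is not inspected in this sketch.
  if (!/\.csv$/i.test(file.name)) {
    return "Invalid file type. Please upload a CSV file.";
  }
  if (file.sizeBytes > maxMb * 1024 * 1024) {
    return "File too large. Maximum allowed size is " + maxMb + "MB.";
  }
  return null;
}
```

Running these checks before the upload starts avoids sending a file the backend would reject anyway; the backend should still enforce the same rules.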
3.2. Step 2: The Mapping Interface (Core Design)
3.2.1. Description: This is the most critical screen where users review and adjust the AI's suggested column mappings. To prevent cognitive overload and guide the user effectively, the interface will present CSV columns one at a time as an interactive card. Users will confirm or adjust the mapping for the current column before proceeding to the next, creating a guided, sequential flow. Columns will be prioritized based on AI confidence levels.
3.2.2. UI Elements:
- Top-Level Step Progress: A horizontal stepper component showing "Upload → Map Headers → Validate & Import", with "Map Headers" highlighted as the current step.
- Column Queue Summary: A small, persistent display (e.g., at the top or in a sidebar) showing the total number of columns, how many are mapped, how many are ignored, and how many remain in the queue.
- Single Column Card: The central focus of the screen, displaying details for the current CSV column being reviewed.
- CSV Header Display: Prominently displays the exact header name from the user's uploaded CSV file.
- Sample Data Preview: A small, inline preview of the first few data points from that CSV column to provide context.
- "Map to ChainAlign Field" Combobox: An interactive searchable dropdown (Combobox) embedded directly within the card.
- Pre-selection: Each Combobox will be pre-selected with the AI's best guess for the corresponding ChainAlign database field.
- Confidence Indicator: A visual cue (e.g., a small colored dot, an icon, or a percentage) indicating the AI's confidence in its suggestion, with color coding:
- Green: High confidence match (90%+)
- Yellow: Medium confidence match (60-89%)
- Red: Low confidence or unmapped
- Gray: Ignored columns
- Search Functionality: Typing into the Combobox will filter the list of available ChainAlign fields in real-time.
- Dropdown List: Contains all available ChainAlign database fields.
- "Ignore this column" Option: An explicit option within the Combobox dropdown to allow users to exclude specific CSV columns from the import.
- Data Type & Validation Status: An indicator showing the inferred data type for the CSV column and its validation status against the mapped ChainAlign field (e.g., "Text -> Number (Mismatch)"). This will highlight if the CSV column's data type is incompatible with the mapped ChainAlign field.
- Contextual Help (Tooltip/Info Icon): An info icon or tooltip next to the Combobox or ChainAlign field name that, when hovered/clicked, displays a brief description of what the ChainAlign field represents.
- Confirmation Mechanism: For each column card, a clear action to confirm the mapping and move to the next column.
- Primary Action: A prominent "Confirm Mapping" button.
- Alternative Confirmation (e.g., Swipe-to-Confirm/Long-Press): For high-confidence matches, a quicker interaction like a swipe gesture or a long-press on the confirm button could be offered to speed up the process.
- Navigation/Action Buttons for Current Card:
- "Skip for now" / "Next Column": Allows users to defer a decision or move to the next column in the queue without confirming the current one.
- "Ignore Column": Explicit button to mark the current column as ignored.
- Smart Suggestions Panel (Sidebar/Collapsible Section): A dedicated area (e.g., a sidebar or collapsible section) that provides "Suggestions for unmapped columns" that are not currently in the main card view. This panel will show likely ChainAlign fields for columns the AI couldn't confidently match, allowing users to drag-and-drop or click to apply suggestions to the current card.
- Batch Actions (Contextual): Options to perform actions on multiple columns, accessible from the column queue summary or a dedicated menu (e.g., "Ignore all columns containing 'internal_'", "Apply template to similar columns").
- Overall Completion Progress: A dynamic progress bar showing the percentage of required ChainAlign fields that have been successfully mapped.
- "Continue to Validate" Button: A primary action button to proceed to Step 3, enabled only when mapping requirements are met.
- Column Analysis Summary: A panel providing a statistical overview of the dataset, including counts for high confidence, medium confidence, low confidence/needs attention, and ignored columns. This gives users immediate context about their dataset and progress.
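The color coding of the Confidence Indicator described above can be expressed as a small pure function. This is a sketch under the assumption that confidence arrives as a 0 to 1 fraction, as in the suggest-mapping example payload; `ColumnState` and `confidenceColor` are illustrative names.

```typescript
type ConfidenceColor = "green" | "yellow" | "red" | "gray";

interface ColumnState {
  confidence: number | null; // 0..1, or null when the AI produced no suggestion
  ignored: boolean;          // true once the user chose "Ignore this column"
}

// Maps a column's state to the indicator color from the design.
function confidenceColor(col: ColumnState): ConfidenceColor {
  if (col.ignored) return "gray";             // ignored columns
  if (col.confidence === null) return "red";  // unmapped
  if (col.confidence >= 0.9) return "green";  // high confidence (90%+)
  if (col.confidence >= 0.6) return "yellow"; // medium confidence (60-89%)
  return "red";                               // low confidence
}
```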
3.2.3. User Interaction:
- Sequential Review: User is presented with one CSV column card at a time, prioritized by AI confidence (e.g., high confidence first, then medium, then low/unmapped).
- Confirm Mapping: User reviews the AI's pre-selected mapping, confidence score, and data type validation status. If correct, they use the confirmation mechanism (e.g., click "Confirm Mapping").
- Correct Mapping: If a suggestion is incorrect, the user clicks the Combobox, types to search for the correct ChainAlign field, and selects it from the filtered list, then confirms.
- Ignore Column: User clicks "Ignore Column" button for any CSV columns they do not wish to import.
- Skip Column: User clicks "Skip for now" to move the current column to the end of its confidence-based queue.
- Utilize Smart Suggestions: User can interact with the "Smart Suggestions Panel" to quickly map unmapped columns.
- Apply Batch Actions: User can select multiple columns and apply a batch action (e.g., ignore).
- Proceed to Validation: User clicks "Continue to Validate" once all mandatory mappings are confirmed and data type issues are resolved.
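One way to model the sequential review and the "Skip for now" behavior is a queue ordered by confidence tier. The sketch below assumes the same 90% and 60% thresholds as the confidence indicator and interprets "end of its confidence-based queue" as "after the remaining columns of the same tier"; all names are hypothetical.

```typescript
type Tier = "high" | "medium" | "low";

interface QueuedColumn {
  header: string;
  confidence: number | null; // 0..1, null when unmapped
}

function tierOf(col: QueuedColumn): Tier {
  if (col.confidence !== null && col.confidence >= 0.9) return "high";
  if (col.confidence !== null && col.confidence >= 0.6) return "medium";
  return "low";
}

const TIER_ORDER: Tier[] = ["high", "medium", "low"];

// Orders columns high, then medium, then low, keeping CSV order within a tier.
function buildQueue(columns: QueuedColumn[]): QueuedColumn[] {
  const queue: QueuedColumn[] = [];
  for (const tier of TIER_ORDER) {
    for (const col of columns) {
      if (tierOf(col) === tier) queue.push(col);
    }
  }
  return queue;
}

// Moves the column at the head of the queue behind the remaining columns of its tier.
// If it is the only column left in its tier, it stays at the head.
function skipCurrent(queue: QueuedColumn[]): QueuedColumn[] {
  if (queue.length === 0) return queue;
  const current = queue[0];
  const rest = queue.slice(1);
  let insertAt = 0;
  while (insertAt < rest.length && tierOf(rest[insertAt]) === tierOf(current)) {
    insertAt++;
  }
  return rest.slice(0, insertAt).concat([current], rest.slice(insertAt));
}
```

Both functions return new arrays rather than mutating their input, which fits React state updates.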
3.2.4. System Response:
- Dynamic Column Loading: The system loads the next prioritized column into the single column card view after a mapping is confirmed, skipped, or ignored.
- Dynamic Filtering: As the user types in a Combobox, the system filters the list of available ChainAlign fields.
- Mapping Update: The system updates the internal mapping configuration based on user selections.
- Real-time Data Type Validation: As a mapping is selected, the system immediately validates the CSV column's inferred data type against the ChainAlign field's expected data type. Any mismatches are highlighted within the column card.
- Validation (Pre-emptive): The system performs basic validation (e.g., ensuring all mandatory ChainAlign fields are mapped). A single ChainAlign field cannot be mapped to multiple CSV columns; this will be treated as an error.
- Progress Update: The "Column Queue Summary" and "Overall Completion Progress" bar update dynamically with each user action.
- Transition: Upon clicking "Continue to Validate," the system processes the confirmed mappings and prepares a data preview, then transitions to Step 3.
3.2.5. Error Handling:
- Unmapped Mandatory Fields: If a mandatory ChainAlign field remains unmapped and the user attempts to proceed to Step 3, display a warning/error: "Please map all mandatory fields before proceeding." The system will guide the user to the unmapped mandatory fields.
- Conflicting Mappings: If a ChainAlign field is mapped to two different CSV columns, display an error: "A ChainAlign field cannot be mapped to multiple CSV columns. Please resolve the conflicting mappings." This ensures data integrity and avoids ambiguity.
- Data Type Mismatch: If a CSV column's data type is incompatible with the mapped ChainAlign field, display a clear error within the column card (e.g., "Cannot map Text to Number. Please select a compatible field or ignore this column."). This prevents confirmation of the mapping until resolved or ignored.
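The three checks above could be combined into one validation pass that runs before "Continue to Validate" is enabled. This is a minimal sketch: `ColumnMapping`, `FieldSpec`, and `validateMappings` are hypothetical names, and treating any difference between the inferred and expected type as a mismatch is an assumption (a real rule set may allow some conversions).

```typescript
interface ColumnMapping {
  csvHeader: string;
  chainAlignField: string | null; // null when the column is ignored or unmapped
  csvType: string;                // inferred type, e.g. "Text" or "Number"
}

interface FieldSpec {
  name: string;
  type: string;       // expected type of the ChainAlign field
  mandatory: boolean;
}

// Returns the list of error messages; an empty list means the user may proceed.
function validateMappings(mappings: ColumnMapping[], fields: FieldSpec[]): string[] {
  const errors: string[] = [];

  // Count how many CSV columns point at each ChainAlign field.
  const usage: { [field: string]: number } = {};
  for (const m of mappings) {
    if (m.chainAlignField !== null) {
      usage[m.chainAlignField] = (usage[m.chainAlignField] || 0) + 1;
    }
  }

  if (fields.some((f) => f.mandatory && !usage[f.name])) {
    errors.push("Please map all mandatory fields before proceeding.");
  }
  if (Object.keys(usage).some((name) => usage[name] > 1)) {
    errors.push("A ChainAlign field cannot be mapped to multiple CSV columns. Please resolve the conflicting mappings.");
  }
  for (const m of mappings) {
    const spec = fields.filter((f) => f.name === m.chainAlignField)[0];
    if (spec && spec.type !== m.csvType) {
      errors.push("Cannot map " + m.csvType + " to " + spec.type + ". Please select a compatible field or ignore this column.");
    }
  }
  return errors;
}
```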
3.3. Step 3: Validation & Import
3.3.1. Description: This final screen provides a preview of the data with the applied mappings, allowing the user a last chance to verify before initiating the import.
3.3.2. UI Elements:
- Top-Level Step Progress: A horizontal stepper component showing "Upload → Map Headers → Validate & Import", with "Validate & Import" highlighted as the current step.
- Data Preview Table: Displays the first 5-10 rows of the user's data, but with the ChainAlign field headers applied based on the confirmed mappings. (Note: For very large CSV files, this preview will be generated by streaming the first N rows to avoid memory issues.)
- Error Highlighting: Any rows or cells with detected data errors (e.g., type mismatches, format issues) will be highlighted (e.g., red background) with a clear tooltip or inline message explaining the error.
- "Import Data" Button: A prominent, typically green, button to trigger the final data import.
- "Back" Button: Allows the user to return to Step 2 to adjust mappings.
- Import Progress/Status: After clicking "Import Data," a message or small modal indicating "Importing data in the background..." or "Import complete."
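The Data Preview Table amounts to renaming the columns of the first few rows according to the confirmed mappings. A minimal sketch, assuming rows are already parsed into objects keyed by CSV header (`CsvRow`, `HeaderMapping`, and `buildPreview` are illustrative names):

```typescript
type CsvRow = { [header: string]: string };
// CSV header -> ChainAlign field; ignored columns are simply absent.
type HeaderMapping = { [csvHeader: string]: string };

// Returns up to maxRows rows, re-keyed by ChainAlign field name.
function buildPreview(rows: CsvRow[], mapping: HeaderMapping, maxRows: number = 10): CsvRow[] {
  return rows.slice(0, maxRows).map((row) => {
    const previewRow: CsvRow = {};
    for (const csvHeader of Object.keys(mapping)) {
      const value = row[csvHeader];
      // Missing cells are shown as empty strings in the preview.
      previewRow[mapping[csvHeader]] = value !== undefined ? value : "";
    }
    return previewRow;
  });
}
```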
3.3.3. User Interaction:
- Review Preview: User visually inspects the data preview for correctness.
- Correct Errors (Optional): If errors are highlighted, the user may choose to go "Back" to Step 2 to adjust mappings, or they may proceed, understanding that erroneous rows might be skipped or flagged post-import.
- Initiate Import: User clicks "Import Data."
3.3.4. System Response:
- Final Data Transformation: The system applies the confirmed mappings and performs any necessary data type conversions or transformations.
- Data Validation: A more comprehensive validation pass is performed on the entire dataset.
- Background Import: The data import process is initiated in the background, allowing the user to continue using the application.
- Notification: Upon completion (success or failure) of the background import, the user receives an in-app notification (e.g., a toast message) and potentially an email notification for longer-running imports. The in-app notification will provide a link to a detailed import log or status page.
- Transition: The UI may transition to a dashboard or a success screen.
3.3.5. Error Handling:
- Data Validation Errors: If significant data validation errors are found during the full import, the system should:
- Log the errors in detail.
- Notify the user of the errors and provide options (e.g., download a report of failed rows, re-upload, or access an inline editor for common corrections like date format adjustments or simple find/replace operations).
- Default Behavior: Import valid rows and skip invalid ones. A comprehensive error report detailing skipped rows and reasons will be made available for download.
- Import Failure: If the import process itself fails, notify the user and provide troubleshooting guidance or a way to contact support.
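The default behavior (import valid rows, skip invalid ones, report the reasons) can be sketched as a single partitioning pass. The names below are hypothetical, and the validator is passed in because the design does not specify the validation rules.

```typescript
interface RowError {
  rowNumber: number; // 1-based position of the data row in the file
  reason: string;
}

interface ImportResult<T> {
  imported: T[];       // rows that passed validation
  skipped: RowError[]; // material for the downloadable error report
}

// Splits rows into importable rows and a report of skipped rows.
// validate returns null for a valid row, or the reason it is invalid.
function importValidRows<T>(rows: T[], validate: (row: T) => string | null): ImportResult<T> {
  const imported: T[] = [];
  const skipped: RowError[] = [];
  rows.forEach((row, index) => {
    const reason = validate(row);
    if (reason === null) {
      imported.push(row);
    } else {
      skipped.push({ rowNumber: index + 1, reason: reason });
    }
  });
  return { imported: imported, skipped: skipped };
}
```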
4. Key Enhancements
4.1. Save Mapping as a Template
- Description: After a successful import, the system will prompt the user to save the current mapping configuration as a named template.
- User Interaction: A modal or banner appears post-import: "Would you like to save this mapping for future uploads?" with input for a template name and "Save" / "No Thanks" buttons.
- System Response: If saved, the mapping configuration (CSV headers to ChainAlign fields) is stored in the user's profile. The next time a file with similar CSV headers (using fuzzy matching and similarity scoring) is uploaded, the system will suggest applying the saved template, pre-filling all mappings. The user will have the option to accept or refine the suggested template application.
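Template suggestion depends on scoring how similar a new file's headers are to a saved template's headers. The sketch below uses Jaccard similarity over normalized header names as a simple stand-in for the fuzzy matching mentioned above; the normalization rule, the 0.8 threshold, and the function names are assumptions.

```typescript
interface MappingTemplate {
  templateId: string;
  templateName: string;
  mappings: { [csvHeader: string]: string };
}

// Lowercases and strips everything except letters and digits,
// so "Product_SKU" and "product sku" compare as equal.
function normalizeHeader(header: string): string {
  return header.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Jaccard similarity: shared headers divided by all distinct headers.
function headerSimilarity(a: string[], b: string[]): number {
  const na = a.map(normalizeHeader);
  const nb = b.map(normalizeHeader);
  const union = na.concat(nb).filter((h, i, all) => all.indexOf(h) === i);
  if (union.length === 0) return 0;
  const shared = union.filter((h) => na.indexOf(h) !== -1 && nb.indexOf(h) !== -1);
  return shared.length / union.length;
}

// Returns the best-scoring template at or above the threshold, or null.
function suggestTemplate(
  headers: string[],
  templates: MappingTemplate[],
  threshold: number = 0.8 // hypothetical cut-off
): MappingTemplate | null {
  let best: MappingTemplate | null = null;
  let bestScore = 0;
  for (const template of templates) {
    const score = headerSimilarity(headers, Object.keys(template.mappings));
    if (score >= threshold && score > bestScore) {
      best = template;
      bestScore = score;
    }
  }
  return best;
}
```

This only catches differences in case, spacing, and punctuation. Tolerating typos or abbreviations would need a string-distance measure on top.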
4.2. Smart Date Format Detection
- Description: The system will attempt to automatically detect the date format within date columns.
- System Response: During Step 2 (Mapping Interface), for columns mapped to a date/time ChainAlign field, a small indicator or dropdown will appear next to the Combobox, showing the detected format (e.g., "Detected: MM/DD/YYYY").
- User Interaction: The user can click this indicator to select an alternative date format from a predefined list if the detection is incorrect, thereby overriding the system's suggestion.
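Date format detection can be approached by testing each candidate format against all sample values and keeping the formats that fit. The sketch below assumes three candidate formats and checks only numeric ranges, not calendar validity; the names are hypothetical. When more than one format fits (for example "01/02/2025"), all are returned so the UI can offer the override described above.

```typescript
type DateFormat = "YYYY-MM-DD" | "MM/DD/YYYY" | "DD/MM/YYYY";

// Candidate list in order of preference; an assumption, not from the design.
const CANDIDATE_FORMATS: DateFormat[] = ["YYYY-MM-DD", "MM/DD/YYYY", "DD/MM/YYYY"];

// Checks shape and month/day ranges only (it would accept 02/31/2025).
function fitsFormat(value: string, format: DateFormat): boolean {
  let month: number;
  let day: number;
  if (format === "YYYY-MM-DD") {
    const iso = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
    if (!iso) return false;
    month = Number(iso[2]);
    day = Number(iso[3]);
  } else {
    const slash = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(value);
    if (!slash) return false;
    const first = Number(slash[1]);
    const second = Number(slash[2]);
    month = format === "MM/DD/YYYY" ? first : second;
    day = format === "MM/DD/YYYY" ? second : first;
  }
  return month >= 1 && month <= 12 && day >= 1 && day <= 31;
}

// Returns every candidate format that fits all non-empty samples.
// The first entry can be shown as "Detected: ..."; the rest are alternatives.
function detectDateFormats(samples: string[]): DateFormat[] {
  const values = samples.map((s) => s.trim()).filter((s) => s.length > 0);
  if (values.length === 0) return [];
  return CANDIDATE_FORMATS.filter((format) => values.every((v) => fitsFormat(v, format)));
}
```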
4.3. Real-time Feedback
- Description: Use subtle animations, loading indicators, and toast notifications to make the application feel responsive and intelligent throughout the process.
- Implementation: Apply loading spinners during file analysis, progress bars during upload, and success/error toasts for import completion.
5. Technical Considerations (UI/UX Component)
- Combobox Implementation: The shadcn/ui Combobox component is the recommended choice for the "Map to ChainAlign Field" dropdowns in Step 2.
- Installation: npx shadcn-ui@latest add combobox
- Integration: The component will be integrated into the React frontend, receiving the list of available ChainAlign fields as props.
- Styling: Leverage Tailwind CSS for full styling control to match ChainAlign's brand identity.
- Backend API: Dedicated API endpoints will be required for:
- File upload.
- Receiving CSV data for AI analysis and mapping suggestion generation.
- Receiving confirmed mappings for data transformation and import.
- Saving and retrieving mapping templates.
API Endpoint Definitions:
- POST /api/csv/templates: Saves a new mapping template.
- Request Body:
{
  "templateName": "Monthly Sales Report",
  "mappings": {
    "Product_SKU": "product_id",
    "Q1_Forecast": "demand_forecast",
    "Cost_Per_Unit": "cost_per_unit"
  }
}
- Response (201 - Created):
{
  "templateId": "template-uuid-12345",
  "message": "Template 'Monthly Sales Report' saved successfully."
}
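The template payloads above could be typed on the frontend and checked at runtime before sending or after receiving. This is a sketch: the interface and function names are hypothetical, and the rule that `templateName` must be non-empty is an assumption.

```typescript
interface SaveTemplateRequest {
  templateName: string;
  mappings: { [csvHeader: string]: string }; // CSV header -> ChainAlign field
}

interface SaveTemplateResponse {
  templateId: string;
  message: string;
}

// Runtime guard for untrusted JSON (e.g. the parsed request body).
function isSaveTemplateRequest(value: unknown): value is SaveTemplateRequest {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { templateName?: unknown; mappings?: unknown };
  if (typeof candidate.templateName !== "string" || candidate.templateName.trim() === "") {
    return false;
  }
  const mappings = candidate.mappings;
  if (typeof mappings !== "object" || mappings === null || Array.isArray(mappings)) {
    return false;
  }
  const entries = mappings as { [key: string]: unknown };
  // Every mapping target must be a field name (string).
  return Object.keys(entries).every((key) => typeof entries[key] === "string");
}
```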
- GET /api/csv/templates: Retrieves all saved mapping templates for the user/organization.
- Response (200 - OK):
[
{
"templateId": "template-uuid-12345",
"templateName": "Monthly Sales Report",
"createdAt": "2025-09-19T10:00:00Z",
"mappings": {
"Product_SKU": "product_id",
"Q1_Forecast": "demand_forecast"
}
},
{
"templateId": "template-uuid-67890",
"templateName": "Weekly Inventory Upload",
"createdAt": "2025-09-18T14:30:00Z",
"mappings": {
"item_code": "product_id",
"on_hand_qty": "inventory_level"
}
}
]
- Asynchronous Processing & Real-time Status: For operations like file upload and background analysis, a robust mechanism (e.g., WebSockets or polling) will be implemented to provide real-time status updates to the frontend, ensuring users are informed of progress and any potential failures.
- API Payloads (Example for suggest-mapping):
POST /api/csv/suggest-mapping Request:
{
"fileId": "unique-upload-id-123",
"sampleData": [
{"col1": "val1", "col2": "valA"},
{"col1": "val2", "col2": "valB"}
]
}
POST /api/csv/suggest-mapping Response:
{
"suggestions": [
{"csvHeader": "col1", "chainAlignField": "product_sku", "confidence": 0.95},
{"csvHeader": "col2", "chainAlignField": "demand_forecast", "confidence": 0.88}
]
}
(Note: Full API documentation with all endpoints and payloads should be maintained separately, but key examples are included here for context.)
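On the frontend, the suggest-mapping response has to be turned into the initial Combobox pre-selection for every CSV column, including columns the backend returned no suggestion for. A minimal sketch, with types derived from the example payload and hypothetical names (`Preselection`, `toPreselections`):

```typescript
interface MappingSuggestion {
  csvHeader: string;
  chainAlignField: string;
  confidence: number; // 0..1, as in the example response
}

interface SuggestMappingResponse {
  suggestions: MappingSuggestion[];
}

interface Preselection {
  chainAlignField: string | null; // null means the column starts unmapped
  confidence: number | null;
}

// Builds the initial per-column state for the mapping cards.
function toPreselections(
  headers: string[],
  response: SuggestMappingResponse
): { [csvHeader: string]: Preselection } {
  const result: { [csvHeader: string]: Preselection } = {};
  for (const header of headers) {
    const match = response.suggestions.filter((s) => s.csvHeader === header)[0];
    result[header] = match
      ? { chainAlignField: match.chainAlignField, confidence: match.confidence }
      : { chainAlignField: null, confidence: null };
  }
  return result;
}
```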
- Data Validation: Backend validation will be crucial for data integrity, including type checking, format validation, and business rule enforcement.
- Memory Considerations for Large Files: For large CSV files, the system will employ streaming techniques for initial parsing and data preview generation (e.g., showing only the first N rows) to prevent memory exhaustion on both frontend and backend. Full file processing will occur in the background during the import phase.
- Iterative Refinement & Rollback Strategy: The system will allow users to easily return to previous steps (e.g., from Step 3 to Step 2) to adjust mappings or correct errors without re-uploading the entire file. For imports that have already occurred, a clear rollback mechanism (e.g., an "Undo Last Import" option for a limited time, or a detailed import log with options to delete imported batches) will be considered to support iterative refinement and error correction.
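The streaming approach described under Memory Considerations can be illustrated with a collector that consumes the file chunk by chunk and stops as soon as it has enough lines for the preview. This is a sketch only: `PreviewCollector` is a hypothetical name, and it splits on newlines without handling quoted fields that contain line breaks, which a real CSV parser must do.

```typescript
// Collects the first maxLines lines from a stream of text chunks.
class PreviewCollector {
  readonly lines: string[] = [];
  private buffer = "";
  private readonly maxLines: number;

  constructor(maxLines: number) {
    this.maxLines = maxLines;
  }

  isDone(): boolean {
    return this.lines.length >= this.maxLines;
  }

  // Feeds one chunk; returns true once enough lines were collected,
  // at which point the caller can stop reading the file.
  push(chunk: string): boolean {
    if (this.isDone()) return true;
    this.buffer += chunk;
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1 && !this.isDone()) {
      // Strip a trailing carriage return so CRLF files work too.
      this.lines.push(this.buffer.slice(0, newline).replace(/\r$/, ""));
      this.buffer = this.buffer.slice(newline + 1);
      newline = this.buffer.indexOf("\n");
    }
    return this.isDone();
  }
}
```

Only the unfinished tail of the current chunk is kept in memory, so the cost of building the preview does not grow with file size. A final line without a trailing newline is not emitted in this sketch.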
6. MVP Simplification Strategy
For the initial release, the following simplifications are strongly recommended to accelerate development while still delivering core value and a smooth user experience:
- Reduce to Two Steps (Strong Recommendation): Combine the "Map Headers" and "Validate & Import" steps into a single, iterative step with a live preview. As users adjust mappings in the single-card-flow, the preview updates in real-time, significantly reducing navigation overhead and providing immediate feedback. This aligns with the enhanced UX recommendations.
- Focus on Common Patterns: Initially, optimize the AI's mapping suggestions and data validation for the most common S&OP data patterns (e.g., products, forecasts, inventory levels) relevant to ChainAlign's primary use cases. This allows for a more focused and accurate AI model in the early stages.
- Smart Defaults for Standard Fields: Pre-configure field mappings for commonly encountered column names (e.g., "Product ID", "SKU", "Forecast", "Date"). The system should automatically detect and suggest these standard mappings with high confidence.
- Phased Rollout of Advanced Features: Features like advanced batch actions, inline error correction editors (beyond basic date format adjustments), and highly sophisticated template management can be introduced in subsequent iterations.
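The "Smart Defaults for Standard Fields" item could start as a plain lookup table consulted before (or instead of) the AI model. The sketch below is illustrative: the alias table is an assumption, `product_id` and `demand_forecast` are taken from the example payloads, and `period_date` is a hypothetical field name.

```typescript
// Normalized header -> ChainAlign field. Entries are illustrative only.
const STANDARD_FIELD_ALIASES: { [normalizedHeader: string]: string } = {
  productid: "product_id",
  sku: "product_id",
  productsku: "product_id",
  forecast: "demand_forecast",
  date: "period_date", // hypothetical field name
};

// Returns the default ChainAlign field for a well-known header, or null.
function defaultMappingFor(header: string): string | null {
  // Ignore case, spaces, and punctuation: "Product ID" -> "productid".
  const key = header.toLowerCase().replace(/[^a-z0-9]/g, "");
  return Object.prototype.hasOwnProperty.call(STANDARD_FIELD_ALIASES, key)
    ? STANDARD_FIELD_ALIASES[key]
    : null;
}
```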