Splitting Light
Changelog for the Splitting Light profile. This section covers rule-based splitting, deterministic behavior, fast heuristics, and zero-dependency processing options.
What’s Included
- Rule-Based Splitting: Configurable heuristics without ML model dependencies
- Deterministic Behavior: Consistent, reproducible splitting results
- Fast Heuristics: Sub-second processing for standard document formats
- Zero Dependencies: Self-contained processing without external model calls
Recent Updates
2024-12-04 — Bank Statement Preset
Released a preconfigured rule-based splitter for bank statements. Detects transaction boundaries, date ranges, and account sections without invoking a model.
- Impact: Latency
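To show how a transaction boundary can be found without a model, here is a minimal sketch. It assumes that a statement line opening with a date (ISO `2024-11-03` or `03/11/2024`) starts a new transaction, and that undated lines continue the previous one. The pattern and the `split_transactions` name are hypothetical and are not the preset's actual rules. The sketch also leaves out date-range and account-section detection.

```python
import re
from typing import List

# Hypothetical rule: a transaction starts with a date at the beginning of the line.
DATE_AT_START = re.compile(r"^\s*(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b")

def split_transactions(lines: List[str]) -> List[List[str]]:
    """Group statement lines into transactions; a leading date opens a new one."""
    transactions: List[List[str]] = []
    for line in lines:
        if DATE_AT_START.match(line):
            transactions.append([line])  # new transaction boundary
        elif transactions:
            transactions[-1].append(line)  # continuation, e.g. a wrapped description
        # Lines before the first dated line (statement headers) are skipped here.
    return transactions
```

For `["Account 1234", "2024-11-01 Coffee -3.50", "  Card ending 9876", "2024-11-02 Salary +2000.00"]`, the function returns two transactions, and the first one keeps its wrapped card line.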
2024-11-20 — YAML Configuration Support
Added inline YAML configuration for defining custom split rules during file upload. Supports page-based, keyword-based, and visual separator rules.
- Impact: UX
2024-11-08 — Deterministic Mode v2
Enhanced deterministic mode so that identical inputs produce byte-identical output. Required for audit compliance and pipeline reproducibility.
- Impact: Compliance
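A pipeline can check the byte-identical guarantee by hashing the output of repeated runs and comparing the digests. The sketch below assumes the split result arrives as a JSON-serializable object and that `run` stands in for whatever call invokes the splitter. Both names are hypothetical. When you have the raw output bytes, hash those directly. The canonical JSON encoding (sorted keys, fixed separators) is only there so that parsed objects with the same content produce the same digest.

```python
import hashlib
import json
from typing import Any, Callable

def canonical_digest(split_result: Any) -> str:
    """SHA-256 of a canonical JSON encoding: sorted keys, fixed separators, UTF-8."""
    encoded = json.dumps(
        split_result, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def assert_reproducible(run: Callable[[bytes], Any], document: bytes, attempts: int = 3) -> str:
    """Run the (hypothetical) splitter several times; fail if any digest differs."""
    digests = {canonical_digest(run(document)) for _ in range(attempts)}
    if len(digests) != 1:
        raise AssertionError(f"non-deterministic output: {len(digests)} distinct digests")
    return digests.pop()
```

You can store the returned digest alongside the input's own hash as an audit record.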
2024-10-26 — Page Range Splitting
Added simple page range splitting for known document structures. Supports fixed ranges, repeating patterns, and offset-based definitions.
- Impact: UX
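The three page-range styles can be sketched as follows. This assumes 1-indexed, inclusive ranges and treats "offset" as a number of leading pages to skip, such as a cover sheet. The function names are hypothetical. Keeping a short final chunk as its own range is a design choice made for this sketch, and the product may handle a remainder differently.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive, 1-indexed page numbers

def fixed_ranges(total_pages: int, ranges: List[Range]) -> List[Range]:
    """Validate explicit ranges such as [(1, 2), (3, 5)] against the page count."""
    for start, end in ranges:
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"range {start}-{end} outside 1-{total_pages}")
    return list(ranges)

def repeating_ranges(total_pages: int, pages_per_doc: int, offset: int = 0) -> List[Range]:
    """Cut pages into chunks of pages_per_doc after skipping `offset` leading pages.
    A short final chunk is kept as its own range."""
    if pages_per_doc < 1 or offset < 0:
        raise ValueError("pages_per_doc must be >= 1 and offset must be >= 0")
    ranges: List[Range] = []
    start = offset + 1
    while start <= total_pages:
        end = min(start + pages_per_doc - 1, total_pages)
        ranges.append((start, end))
        start = end + 1
    return ranges
```

For example, `repeating_ranges(7, 2, offset=1)` returns `[(2, 3), (4, 5), (6, 7)]`, and `repeating_ranges(5, 2)` returns `[(1, 2), (3, 4), (5, 5)]`.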
2024-10-12 — Region Tagging Integration
Enabled region tagging so that light splitters assign section metadata during workflow processing, giving downstream extractors the context of each section.
- Impact: Accuracy
2024-09-28 — Keyword Boundary Detection
Added keyword-based splitting using configurable trigger phrases. Supports case-insensitive matching and proximity-based grouping.
- Impact: Accuracy
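Below is a minimal sketch of keyword boundaries with case-insensitive matching. "Proximity-based grouping" is interpreted here as merging trigger hits that fall within `window` lines of the previous hit, so a heading followed closely by a repeated phrase yields one boundary rather than two. That interpretation, the default window, and all names are assumptions, not the documented behavior.

```python
from typing import List, Optional, Sequence

def keyword_boundaries(lines: Sequence[str], triggers: Sequence[str], window: int = 3) -> List[int]:
    """Return line indices where a new section starts.
    Matching uses str.casefold(); a hit within `window` lines of the previous hit
    is grouped with it, so only the first hit of a cluster becomes a boundary."""
    folded = [t.casefold() for t in triggers]
    boundaries: List[int] = []
    last_hit: Optional[int] = None
    for i, line in enumerate(lines):
        text = line.casefold()
        if any(t in text for t in folded):
            if last_hit is None or i - last_hit > window:
                boundaries.append(i)
            last_hit = i  # clusters can chain: each hit extends the window
    return boundaries

def split_at(lines: Sequence[str], boundaries: List[int]) -> List[List[str]]:
    """Cut lines into sections; text before the first boundary forms its own section."""
    starts = boundaries if boundaries and boundaries[0] == 0 else [0] + boundaries
    ends = starts[1:] + [len(lines)]
    return [list(lines[a:b]) for a, b in zip(starts, ends)]
```

With the trigger `"invoice"`, the lines `"INVOICE"` and `"Invoice No. 17"` at indices 0 and 1 collapse into a single boundary at 0.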
2024-09-14 — Performance Optimization
Reduced average splitting time from 180ms to 45ms for single-page documents. Multi-page documents scale linearly with page count.
- Impact: Latency
2024-08-30 — Visual Separator Detection
Added detection for horizontal rules, page breaks, and whitespace gaps as section boundaries. No OCR required for layout-based splits.
- Impact: Reliability
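This sketch covers only the whitespace-gap part of layout-based splitting. Horizontal rules and page breaks need other inputs. It assumes that each text line's vertical extent is already available as `(top, bottom)`, for example from a PDF's embedded text layer, so no OCR is involved. A gap counts as a boundary when it exceeds `factor` times the median line height. The threshold rule and all names are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Box = Tuple[float, float]  # (top, bottom) of one text line, top < bottom

def whitespace_breaks(boxes: List[Box], factor: float = 2.0) -> List[int]:
    """Indices (in top-to-bottom order) of lines that start a new section because
    the vertical gap above them exceeds factor * median line height."""
    if len(boxes) < 2:
        return []
    ordered = sorted(boxes)  # sort by top edge
    line_height = median(bottom - top for top, bottom in ordered)
    breaks: List[int] = []
    for i in range(1, len(ordered)):
        gap = ordered[i][0] - ordered[i - 1][1]  # space between consecutive lines
        if gap > factor * line_height:
            breaks.append(i)
    return breaks
```

Scaling the threshold by the median line height, rather than using a fixed distance, keeps the rule stable across font sizes.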
Compatibility Notes
- YAML configuration requires API v2.0 or later
- Deterministic mode adds ~10ms processing overhead
- Rule presets available for: invoices, bank statements, contracts, receipts
Roadmap (Next Quarter)
- Visual template designer in Console for custom rules
- Conditional splitting based on classification results
- CSV export of split metadata for analytics