Splitting Light

Changelog for the Splitting Light profile, covering rule-based splitting, deterministic behavior, fast heuristics, and zero-dependency processing options.


What’s Included

  • Rule-Based Splitting: Configurable heuristics without ML model dependencies
  • Deterministic Behavior: Consistent, reproducible splitting results
  • Fast Heuristics: Sub-second processing for standard document formats
  • Zero Dependencies: Self-contained processing without external model calls

Recent Updates

2024-12-04 — Bank Statement Preset

Released a preconfigured rule-based splitter for bank statements. It detects transaction boundaries, date ranges, and account sections without model invocation.

  • Impact: Latency

2024-11-20 — YAML Configuration Support

Added inline YAML configuration for defining custom split rules during file upload. Supports page-based, keyword-based, and visual separator rules.

  • Impact: UX

2024-11-08 — Deterministic Mode v2

Enhanced deterministic mode to ensure byte-identical output for identical inputs. This guarantee is required for audit compliance and pipeline reproducibility.

  • Impact: Compliance
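
One way a pipeline could check the byte-identical guarantee on its own side is to hash a canonical serialization of the split result and compare digests across runs. The sketch below is illustrative only: the section dictionaries, `canonical_bytes`, and `digest` are hypothetical names, not part of the product API, and the actual output format of the splitter is not specified here.

```python
import hashlib
import json
from typing import Any, Dict, List


def canonical_bytes(sections: List[Dict[str, Any]]) -> bytes:
    """Serialize split metadata so that equal content yields equal bytes.

    Sorting keys and fixing the separators removes the two usual sources
    of byte-level drift in JSON output (key order and whitespace).
    """
    text = json.dumps(sections, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8")


def digest(sections: List[Dict[str, Any]]) -> str:
    """Return a SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(sections)).hexdigest()


# Hypothetical split results from two runs over the same input;
# only the key order differs.
run_a = [{"start": 1, "end": 3, "label": "summary"}]
run_b = [{"label": "summary", "end": 3, "start": 1}]

same = digest(run_a) == digest(run_b)
```

Storing the digest next to each split result gives an audit trail a cheap equality check without keeping every output on hand.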

2024-10-26 — Page Range Splitting

Added simple page range splitting for known document structures. Supports fixed ranges, repeating patterns, and offset-based definitions.

  • Impact: UX
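
The three range styles named in this entry can be pictured with a small sketch. It assumes 1-indexed, inclusive page numbers and treats an offset as a block of leading pages kept as its own section; the function names `split_fixed` and `split_repeating` are hypothetical and do not reflect the product's configuration syntax.

```python
from typing import List, Tuple


def split_fixed(ranges: List[Tuple[int, int]]) -> List[List[int]]:
    """Fixed ranges: each (start, end) pair is 1-indexed and inclusive."""
    return [list(range(start, end + 1)) for start, end in ranges]


def split_repeating(total_pages: int, every: int, offset: int = 0) -> List[List[int]]:
    """Repeating pattern with an optional offset.

    The first `offset` pages (for example a cover sheet) form one section;
    the remaining pages are cut into chunks of `every` pages each.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    pages = list(range(1, total_pages + 1))
    sections: List[List[int]] = []
    if pages[:offset]:
        sections.append(pages[:offset])
    body = pages[offset:]
    for i in range(0, len(body), every):
        sections.append(body[i:i + every])
    return sections


fixed = split_fixed([(1, 2), (3, 5)])
repeating = split_repeating(total_pages=7, every=3, offset=1)
```

Here `fixed` holds two sections (pages 1 to 2 and 3 to 5), and `repeating` holds a one-page cover section followed by two three-page sections.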

2024-10-12 — Region Tagging Integration

Enabled region tagging so light splitters assign metadata to sections during workflow processing. Downstream extractors receive section context.

  • Impact: Accuracy

2024-09-28 — Keyword Boundary Detection

Added keyword-based splitting using configurable trigger phrases. Supports case-insensitive matching and proximity-based grouping.

  • Impact: Accuracy
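
As a rough illustration of keyword-based boundaries, the sketch below matches trigger phrases case-insensitively and treats hits that fall within a few lines of each other as one boundary. That reading of "proximity-based grouping" is an assumption, as are the names `find_boundaries`, `split_on_keywords`, and the `proximity` parameter; the input is taken to be a plain list of text lines.

```python
import re
from typing import List


def find_boundaries(lines: List[str], triggers: List[str], proximity: int = 2) -> List[int]:
    """Return line indexes where a new section starts.

    A trigger hit within `proximity` lines of the previous hit is grouped
    with it instead of opening another section.
    """
    if not triggers:
        raise ValueError("at least one trigger phrase is required")
    pattern = re.compile("|".join(re.escape(t) for t in triggers), re.IGNORECASE)
    boundaries: List[int] = []
    last_hit = None
    for i, line in enumerate(lines):
        if pattern.search(line):
            if last_hit is None or i - last_hit > proximity:
                boundaries.append(i)
            last_hit = i
    return boundaries


def split_on_keywords(lines: List[str], triggers: List[str], proximity: int = 2) -> List[List[str]]:
    """Cut `lines` into sections at the detected boundaries."""
    if not lines:
        return []
    starts = find_boundaries(lines, triggers, proximity)
    if not starts or starts[0] != 0:
        starts = [0] + starts  # keep any text before the first trigger
    ends = starts[1:] + [len(lines)]
    return [lines[s:e] for s, e in zip(starts, ends)]


doc = [
    "Cover",
    "INVOICE 001",
    "Invoice date: 2024-09-28",
    "item",
    "item",
    "Statement of Account",
    "row",
]
sections = split_on_keywords(doc, ["invoice", "statement of account"])
```

In this example the two adjacent "invoice" lines open a single section rather than two, because the second hit is within the proximity window of the first.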

2024-09-14 — Performance Optimization

Reduced average splitting time from 180 ms to 45 ms for single-page documents. For multi-page documents, splitting time scales linearly with page count.

  • Impact: Latency

2024-08-30 — Visual Separator Detection

Added detection for horizontal rules, page breaks, and whitespace gaps as section boundaries. No OCR required for layout-based splits.

  • Impact: Reliability
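
Whitespace-gap detection can be sketched without OCR by scanning a per-row "ink" profile of the page, that is, the fraction of dark pixels in each pixel row. The sketch assumes such a profile has already been computed from the page image; `find_whitespace_gaps`, `min_gap`, and `blank_threshold` are hypothetical names, and the thresholds are illustrative rather than the values the splitter uses.

```python
from typing import List, Tuple


def find_whitespace_gaps(
    row_ink: List[float],
    min_gap: int = 3,
    blank_threshold: float = 0.0,
) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each blank band between content.

    A row counts as blank when its ink fraction is at or below
    `blank_threshold`. Bands shorter than `min_gap` rows are treated as
    ordinary line spacing. Blank runs touching the top or bottom edge are
    page margins, not separators, so they are skipped.
    """
    gaps: List[Tuple[int, int]] = []
    run_start = None
    for i, ink in enumerate(row_ink):
        if ink <= blank_threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and run_start > 0 and i - run_start >= min_gap:
                gaps.append((run_start, i - 1))
            run_start = None
    # A run still open here reaches the bottom edge and is ignored.
    return gaps


profile = [0, 0, 0, 0, 0.5, 0.4, 0, 0, 0, 0.3, 0, 0.2, 0, 0, 0, 0]
gaps = find_whitespace_gaps(profile, min_gap=3)
```

With this profile, only the three blank rows between the first and second content bands qualify; the top margin, the single blank row, and the bottom margin are all ignored.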

Compatibility Notes

  • YAML configuration requires API v2.0 or later
  • Deterministic mode adds roughly 10 ms of processing overhead
  • Rule presets available for: invoices, bank statements, contracts, receipts

Roadmap (Next Quarter)

  • Visual template designer in Console for custom rules
  • Conditional splitting based on classification results
  • CSV export of split metadata for analytics