Expose internal ICLabel and ASR metrics in QC command output#239

Open
google-labs-jules[bot] wants to merge 1 commit into develop from jules/feat-enhanced-qc-manifest-js0-0a4df4c7-5929-42aa-8bb6-6c70f03f7ccf

Context & Rationale

Automated validation pipelines and AI agents currently rely on brittle, manually replicated thresholds because the CLI exposes only raw metrics (such as SNR and variance). External scripts must therefore hardcode internal logic, such as the 0.9 probability threshold for ICLabel, to decide whether a dataset is "clean."

This PR enhances the qc manifest by exposing internal quality indicators that are already computed during the cleaning phase. By providing direct access to these metrics, we reduce the complexity of autonomous validation pipelines and provide researchers with greater transparency into the signal quality auditing process.

Key Changes

1. Enhanced ICLabel Reporting

Added a new helper _iclabel_metrics to qc.py that extracts the mean probability for each of the 7 ICLabel classes (Brain plus the artifact and "Other" classes). These values are now structured within metrics["ica"]["iclabel"].
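
As a rough illustration of the averaging step, here is a minimal sketch, not the code in this PR. The function name iclabel_mean_probabilities, the input layout (one row of 7 probabilities per component), and the output key names are assumptions made for the example; only the seven class names and their order come from ICLabel itself.

```python
from statistics import fmean

# The seven ICLabel classes, in the order ICLabel reports them.
ICLABEL_CLASSES = (
    "Brain", "Muscle", "Eye", "Heart", "Line Noise", "Channel Noise", "Other",
)


def iclabel_mean_probabilities(classifications):
    """Hypothetical stand-in for _iclabel_metrics.

    `classifications` is assumed to be an iterable of rows, one row per
    independent component, each holding 7 class probabilities.
    Returns a dict mapping class name to the mean probability, or to None
    when there are no components to average.
    """
    rows = [list(row) for row in classifications]
    if not rows:
        return {name: None for name in ICLABEL_CLASSES}
    if any(len(row) != len(ICLABEL_CLASSES) for row in rows):
        raise ValueError("expected one probability per ICLabel class in every row")
    # Average each class column across all components.
    return {
        name: fmean(row[i] for row in rows)
        for i, name in enumerate(ICLABEL_CLASSES)
    }
```

Returning None for an empty decomposition, rather than 0.0, keeps "no components" distinguishable from "no probability mass" in the manifest.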

2. ASR & RMS Transparency

Modified clean_channels.py and clean_windows.py to attach internal computed values to the EEG['etc'] structure. This includes:

  • Channel metrics: noisiness and znoise (ASR noisiness Z-scores).
  • Window metrics: w_rms (RMS power) and wz (temporal Z-scores).

3. Structured Manifest Integration

Implemented _asr_metrics in qc.py to safely merge these internal values into the machine-readable output under metrics["data_quality"]["asr"], tolerating datasets where the values are absent.
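
The merge can be pictured roughly as follows. This is a sketch under stated assumptions: the sub-keys "clean_channels" and "clean_windows" under EEG['etc'], the grouping of fields, and the helper names to_list, asr_metrics, and merge_asr_metrics are all hypothetical and may not match the PR. Only the field names (noisiness, znoise, w_rms, wz) and the target path metrics["data_quality"]["asr"] come from the description above.

```python
# Hypothetical layout: which fields each cleaning stage stores in EEG['etc'].
ASR_FIELDS = {
    "clean_channels": ("noisiness", "znoise"),
    "clean_windows": ("w_rms", "wz"),
}


def to_list(values):
    """Convert array-likes (anything with .tolist()) or iterables to lists."""
    if hasattr(values, "tolist"):
        return values.tolist()
    return list(values)


def asr_metrics(EEG):
    """Collect whichever ASR values are present; skip missing ones quietly."""
    etc = EEG.get("etc") or {}
    collected = {}
    for stage, fields in ASR_FIELDS.items():
        stage_data = etc.get(stage)
        if not isinstance(stage_data, dict):
            continue  # stage was not run, or stored nothing usable
        present = {
            field: to_list(stage_data[field])
            for field in fields
            if stage_data.get(field) is not None
        }
        if present:
            collected[stage] = present
    return collected


def merge_asr_metrics(metrics, EEG):
    """Add the ASR section without touching existing data_quality entries."""
    metrics.setdefault("data_quality", {})["asr"] = asr_metrics(EEG)
    return metrics
```

Using setdefault means existing raw metrics under data_quality are left in place, which is the backward-compatibility property the PR calls out.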

4. Machine-Readable Formatting

To ensure JSON compliance and compatibility with downstream agentic tools:

  • All NumPy arrays are converted via .tolist().
  • The existing _as_list and _float_or_none utilities are reused to keep data types consistent.
  • All existing raw metrics are preserved to ensure backward compatibility.
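
To show why this conversion matters, the sketch below makes array-like values safe for strict JSON. The helpers float_or_none and as_json_list are illustrative stand-ins written for this example; the real _float_or_none and _as_list in qc.py may behave differently. Mapping NaN and infinity to None is an assumption, chosen because strict JSON has no representation for them. The standard-library array type is used only because it exposes the same .tolist() method as NumPy arrays.

```python
import json
import math
from array import array


def float_or_none(value):
    """Return a finite float, or None for NaN, infinity, or non-numeric input."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def as_json_list(values):
    """Recursively convert an array-like into nested lists of float-or-None."""
    if values is None:
        return None
    if hasattr(values, "tolist"):
        values = values.tolist()  # same call NumPy arrays support
    if not isinstance(values, (list, tuple)):
        return float_or_none(values)  # scalar input
    return [
        as_json_list(v) if isinstance(v, (list, tuple)) else float_or_none(v)
        for v in values
    ]


# Example: a NaN entry becomes null instead of the non-standard NaN token.
scores = array("d", [1.5, float("nan")])
payload = json.dumps({"znoise": as_json_list(scores)}, allow_nan=False)
```

Passing allow_nan=False makes json.dumps raise on any non-finite value that slips through, so an invalid manifest fails at write time rather than in a downstream parser.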

Success Criteria

  • The qc command output contains a structured section for all 7 ICLabel class probabilities.
  • ASR noisiness Z-scores are reported for both channels and temporal windows.
  • JSON output remains valid and parsable by existing automation scripts.
  • No performance regressions, as metrics are retrieved from existing internal computations.
