Open
209 commits
e8dd955
Start of LC Collection classes
kheal Aug 12, 2024
6b264e9
Improve handling of molecular formula parsing
kheal Aug 12, 2024
113aeb9
Merge peak shape metrics into collection
kheal Aug 12, 2024
e8fe790
Merge master into lcms_dev
kheal Aug 13, 2024
08447b6
Add validation steps to lcms collection parser
kheal Aug 14, 2024
150ad60
Add LCMSCollection class
kheal Aug 14, 2024
8e0de4d
Add calculations for lcms collections to find consensus mass features
kheal Aug 15, 2024
5fead06
Start retention time alignment functionality
kheal Aug 15, 2024
04d3a5b
Add work towards aligning mass features
kheal Aug 16, 2024
60202c2
Add lcms collection attributes, plotting functions
kheal Aug 19, 2024
2885357
Merge branch 'peak_shape_metrics' into lcms_dev_collection
kheal Aug 19, 2024
020501b
Add functionality for alignment within an LCMS collection
kheal Aug 19, 2024
8753e4f
Continue to add functionality for getting consensus mass features
kheal Aug 20, 2024
db5f8f1
Merge branch 'peak_shape_metrics' into lcms_dev_collection
kheal Aug 20, 2024
c4f3d08
Start building consensus mass feature functionality
kheal Aug 20, 2024
e33d330
Begin adding more dimensions to agglomerative_clustering, move to KDt…
kheal Aug 26, 2024
5d3e9fc
Add memory saving and multiprocess loading
kheal Sep 12, 2024
f483f34
Add partitioning to consensus mass feature step
kheal Sep 12, 2024
bdcd4f8
Add calculations for getting consensus mass features
kheal Sep 17, 2024
aab0306
Start to add functionality for accept reject rt alignment
kheal Sep 17, 2024
3e10066
Start to add functionality for accept reject rt alignment
kheal Sep 17, 2024
384e28f
Merge branch 'peak_shape_metrics' into lcms_dev_collection
kheal Sep 17, 2024
ab96d02
Add small mods on workflow
kheal Sep 18, 2024
430ddc9
Merge peak_shape_metrics into lcms_dev_collection
kheal Sep 18, 2024
e5240a6
Add remove unprocessed data from parser
kheal Sep 18, 2024
f1fede2
Merge branch 'peak_shape_metrics' into lcms_dev_collection
kheal Sep 18, 2024
818133c
Merge branch 'peak_shape_metrics' into lcms_dev_collection
kheal Sep 18, 2024
a94ae13
Start to add parameters for LCMSCollections class
kheal Sep 19, 2024
24424e8
Add anchoring technique to lcms collection alignment
kheal Sep 19, 2024
7ad7a39
Add alignment acceptable parameters to lcms collection
kheal Sep 19, 2024
d30ce86
Add documentation for lcms collection settings
kheal Sep 20, 2024
ea8a0ab
Merge branch 'peak_shape_metrics' into lcms_dev_collection
kheal Sep 20, 2024
8c6acb4
Add option for dropping isotopologues from downstream for lcms collec…
kheal Sep 20, 2024
7c6fff7
Change mass feature clustering to roll up approach
kheal Sep 23, 2024
af005c1
Merge branch 'peak_shape_metrics' into lcms_dev_collection
kheal Sep 23, 2024
e1baaa5
Merge branch 'peak_shape_metrics' into lcms_dev_collection
kheal Sep 24, 2024
bf2afdc
Add functionality for multi-batch alignment to lcms collection
kheal Sep 25, 2024
8536c34
Make partioning relative or not
kheal Sep 26, 2024
419eaa3
Add method for summarizing clusters
kheal Sep 26, 2024
13f948a
Merge peak_shape metrics into lcms_dev_collection
kheal Sep 26, 2024
7315895
Add ability to add basic chromotography to lcmscollection
kheal Sep 27, 2024
4d1002f
Add functionality for merging appropriate consensus mass features aft…
kheal Sep 30, 2024
6cdb2ca
Clean up lc_calc module
kheal Sep 30, 2024
79bcf9b
Merge master into lcms_dev_collection
kheal Nov 18, 2024
50033a5
Merge branch 'master' into lcms_dev_collection
kheal Feb 12, 2025
72f3b0a
Clean up collection branch for distribution
kheal Feb 13, 2025
83e3133
Prepare collection for distribution
kheal Feb 27, 2025
cd98ecb
remove dask, remove extra dataframe step, block runtime warnings, set…
Mar 13, 2025
b04dba7
added 2 basic plots, needs complex plot
Apr 14, 2025
d3f1303
temp working files for complex plot
Apr 14, 2025
0e56d84
added consensus plot
Apr 14, 2025
ae844e1
replacing workflow file
Apr 14, 2025
135a37b
adding to docstrings, change mass feat to mz feat
Apr 16, 2025
3259172
modifying cluster_summary_dataframe
Apr 16, 2025
11f6dee
Merge master into lcms_dev_collection
kheal Apr 17, 2025
1994778
Move cluster_summary_df to be a property
kheal Apr 17, 2025
f367493
update show_all
Apr 17, 2025
42ab4a9
moved plotting functions after consensus features are added
Apr 17, 2025
64e3df5
Merge branch 'lcms_dev_collection_consensusplot' into 'lcms_dev_colle…
Apr 17, 2025
9cde135
Merge master into feature branch
kheal Apr 30, 2025
4f1efba
Merge main into lcms collection branch
kheal May 27, 2025
a10deaa
scaled distances used for clustering
Jun 5, 2025
2ef770d
Add scaling to clustering distance matrix, generate figure to evaluat…
Jun 19, 2025
1df9713
Merge branch 'ldc_distfigs' into 'lcms_dev_collection'
Jun 19, 2025
ca02ea0
Cluster outlier frequency plot
Jul 14, 2025
bba86ed
Merge branch 'ldc_distfigs' into 'lcms_dev_collection'
Jul 14, 2025
2289c22
Convert spectra_parser to a property within MassSpectraBase
kheal Jul 18, 2025
6e8bcef
Fix raw path file finder, add test to check original parser functiona…
kheal Jul 18, 2025
5b4abdf
Fix lcms spectra parser re-instantiation with example
kheal Jul 28, 2025
4156015
Add examplar of new parser capability
kheal Jul 28, 2025
f3d2083
Add examplar of new parser capability
kheal Jul 28, 2025
fe756e0
Merge master into features, v3.8
kheal Jul 29, 2025
28a26a1
Merge branch 'lcms_dev_collection_parser_fix' into 'lcms_dev_collection'
kheal Jul 29, 2025
901fd71
Add methods and demo for loading and dropping spectra data within a c…
kheal Jul 29, 2025
4782d73
Merge branch 'lcms_dev_collection_raw_data_handle' into 'lcms_dev_col…
kheal Jul 30, 2025
e480866
Remove chromatography from Collection instantiation
kheal Aug 4, 2025
441becf
Merge branch 'ldc_remove_chroma' into 'lcms_dev_collection'
kheal Aug 4, 2025
f2e8fb9
WIP exporter, importer for LCMSCOllection
kheal Aug 13, 2025
754d76e
Basic export/import functionality for hdf5 collection working
kheal Aug 13, 2025
b9acb23
Inducing mass features
Sep 2, 2025
268a3fa
Merge branch 'ldc_distfigs' into 'lcms_dev_collection'
Sep 2, 2025
acd17ac
Merge master into branch
kheal Sep 11, 2025
fd85bd0
Fix on integrations
kheal Sep 16, 2025
96446f8
Modify integration calculation for speed
kheal Sep 16, 2025
2976366
Merge ldc into feature
kheal Sep 16, 2025
f748046
Add functionality for saving attribute of retention time alignment
kheal Sep 16, 2025
7414a3d
Add functionality for saving retention time alignment on collection,
kheal Sep 16, 2025
af6c278
Add saving and loading of retention time to export/import classes for…
kheal Sep 16, 2025
0ec4dcf
Add code for saving / loading cluster assignments
kheal Sep 16, 2025
2e567a6
Refactor to_hdf function for LCMSExport class to deal with different …
kheal Sep 17, 2025
2e89224
Add to test for lcms_metabolomics for import / export
kheal Sep 17, 2025
0ac073f
WIP exporter for collection
kheal Sep 18, 2025
fa44efc
initiating new branch for mr
Oct 2, 2025
364f8d6
initiating new branch for mr
Oct 2, 2025
6644b4d
Allow induced feature search to run in parallel
Oct 2, 2025
322c7d9
Merge branch 'ldc_parallel_mf_search' into 'lcms_dev_collection'
Oct 2, 2025
1b40c78
Merge remote-tracking branch 'origin/lcms_dev_collection' into ldc_pi…
Oct 2, 2025
dbcb6cc
first pass at pivot tables
Oct 6, 2025
b4b79cf
update docstring for mass_features_to_df
Oct 6, 2025
8044893
Merge remote-tracking branch 'origin/lcms_dev_collection' into ldc_co…
Oct 8, 2025
5e67090
working through issues processing on 1 core
Oct 10, 2025
894474e
mostly working
Oct 13, 2025
19a78f3
found the last bug, should be ready
Oct 13, 2025
46e8dbb
remove debugging linews
Oct 13, 2025
32e47a3
remove more debugging lines
Oct 13, 2025
3b41b16
Merge branch 'ldc_pivottables' of code.emsl.pnl.gov:mass-spectrometry…
Oct 13, 2025
99be90f
updating workflow with pivot tables
Oct 13, 2025
2d4e9d9
Merge branch 'master' into lcms_dev_collection
kheal Oct 14, 2025
c63d922
Fix peak metric check and filtering after merge conflicts
kheal Oct 14, 2025
e291fe8
Fix test fixture
kheal Oct 14, 2025
50329f3
Merge branch 'lcms_dev_collection' of code.emsl.pnl.gov:mass-spectrom…
Oct 14, 2025
0a9812a
Merge ldc into branch, resolve conflicts
kheal Oct 14, 2025
9d74a89
Fix test fixture
kheal Oct 14, 2025
2874419
first pass at consensus report, fix bug with merge
Oct 15, 2025
2360c81
update mean mode for consensus report
Oct 16, 2025
ed2c172
update workflow, small fix, and beginning of cluster_dictionary brain…
Oct 16, 2025
4ebd6a5
Merge master into ldc fork
kheal Dec 2, 2025
8cdacec
Merge fork into feature
kheal Dec 2, 2025
4e8d4b0
Add error handling and functionality for moving raw data files
kheal Dec 2, 2025
7ff3988
Add functionality for anchor features by relative intensity
kheal Dec 2, 2025
1713785
Improve plot_cluster_outlier_frequency plot to use vectorized
kheal Dec 2, 2025
ad53b7c
Optimize fill_missing_cluster_features step
kheal Dec 3, 2025
8ba6292
Encapsulate the minimum samples per cluster parameter
kheal Dec 3, 2025
00888bb
Optimize a bit more and clean up docstrings
kheal Dec 3, 2025
8782e3a
Merge branch 'ldc_consensusreport' into 'lcms_dev_collection'
kheal Dec 3, 2025
654e862
Fix merge conflicts between fork and feature
kheal Dec 3, 2025
8c614c3
Add docstrings to exporter classes
kheal Dec 3, 2025
bb53f91
Add updating raw data file locations to collection exporter
kheal Dec 4, 2025
886d4a8
Add functionality for re-adding clusters
kheal Dec 4, 2025
e680c83
Add parameter export/import functionality for the collection class
kheal Jan 16, 2026
34da91f
Add functionality for saving and loading gap filled features
kheal Jan 16, 2026
b13d7aa
Update lipidomics collection test script
kheal Jan 16, 2026
79a2d97
Fix induced mass features re-loading and add auto-manifest creation o…
kheal Jan 16, 2026
47d81fa
Add resource manager for hdf5 mass spectra reader
kheal Jan 16, 2026
fc481f9
Merge branch 'ldc_collection_export_import' into 'lcms_dev_collection'
kheal Jan 16, 2026
bbb3f60
Add function and encapsulated params for getting representative sampl…
kheal Jan 16, 2026
08ffa4f
Add start of re-adding mass features and associating MS2s
kheal Jan 17, 2026
e4fd39c
Add piped operations for post-consensus feature processing
kheal Jan 17, 2026
d75b15b
Add functionality for post-consensus molecular formula search
kheal Jan 19, 2026
e0b9a68
Add functionality for performing MS2 spectral search on consensus mas…
kheal Jan 19, 2026
d9d2611
Add MS2 searching to collection framework
kheal Jan 20, 2026
47e63fd
Add EIC extraction to sample operations, clean up clustering bugs
kheal Jan 20, 2026
ff2cd05
Add ms2 scan association to find features optionally.
kheal Jan 20, 2026
1245b53
Prioritize features with MS2
kheal Jan 20, 2026
c841b81
Made plotting work for cluster id
kheal Jan 21, 2026
cfedd67
Fix test fixture and add plotting to example functionality
kheal Jan 21, 2026
c8f5ced
Fix test fixtures
kheal Jan 21, 2026
bb3e884
Merge branch 'ldc_visulaizations' into 'lcms_dev_collection'
kheal Jan 21, 2026
d665015
Add targeted search functionality
kheal Jan 23, 2026
4d94e7c
Add test for targeted search functionality
kheal Jan 23, 2026
b67809e
Update test assertions for future proofing
kheal Jan 23, 2026
b6a53d0
Merge branch 'targeted_search_functionality' into 'lcms_dev_collection'
kheal Jan 23, 2026
1eaef4d
Lots of improvements to EIC data management, consolodating plotting f…
kheal Jan 26, 2026
37eca90
Save / reload EICs complete
kheal Jan 26, 2026
b9933c2
Add functionality for saving mass spectra to mass features when savin…
kheal Jan 27, 2026
8a22e43
Add collection_consensus_report functionality to use representative m…
kheal Jan 27, 2026
c70c530
Add annotation report for collection
kheal Jan 27, 2026
1d6c754
Add reporting functions for collection level LCMS analysis
kheal Jan 27, 2026
27ee771
Add tests and adjust functionality for aligning with no mass feature …
kheal Jan 27, 2026
cf1117c
Test bug in alignment with perfect matches
kheal Jan 27, 2026
7e1e63e
Fix clustering logic for exact matches
kheal Jan 27, 2026
a44dda4
Add working tests for collection, leave TODO to fix remaining
kheal Jan 27, 2026
14dcb76
Merge lcms_collection_report into monet_mods
kheal Jan 29, 2026
ef6bdf6
Merge branch 'lcms_collection_report' into 'lcms_dev_collection'
kheal Jan 29, 2026
564754b
Add flexibility with database interface for molecular identifier
kheal Jan 29, 2026
561d305
Add MS2 mirror plot and associated test
kheal Jan 29, 2026
020480a
Fix gap filling test
kheal Jan 30, 2026
122c7fe
Fix pivot table creation with no mass feature samples
kheal Jan 30, 2026
4f0a845
Add test for collection-level MS2 annotation application and reporting
kheal Jan 30, 2026
cdc4a1b
Fix collection file mover test
kheal Jan 30, 2026
33ece32
Add error handling for no MS1 annotations, add test for molecular for…
kheal Jan 30, 2026
14c7b73
Final clean up for tests for mass spectra collection
kheal Jan 30, 2026
9611e83
Merge main into feature
kheal Jan 30, 2026
c067006
Small bug fixes on mirror plot
kheal Jan 31, 2026
4e6ed24
Add LCMS Collection notebook and fix mirror plot issue
kheal Jan 31, 2026
95c7956
Add targeted search example
kheal Jan 31, 2026
d7015b6
Modify ms2 mirror plotting to accept a list of spectral libraries
kheal Feb 4, 2026
8fb03c2
Merge master into feature
kheal Feb 9, 2026
3d427e9
Merge master into feature
kheal Feb 9, 2026
c6d8938
Merge branch 'monet_mods' into 'lcms_dev_collection'
kheal Feb 9, 2026
a0a6dde
Add [M]+ to ion type dictionary
kheal Feb 19, 2026
0d62037
Better handle scenarios with no gaps to fill for collection processing
kheal Feb 25, 2026
b7c4144
Improve handling of LCMSCollections parameters
kheal Feb 25, 2026
020eb73
Add parameter to be able to do multiple rounds of finding mass features
kheal Feb 26, 2026
f531a39
Add parameter to be able to do multiple rounds of finding mass features
kheal Feb 26, 2026
c4e0ba1
Better handling low number of peaks for alignment
kheal Feb 26, 2026
9371c00
Skip autogeneration of manifest if it already exists
kheal Feb 26, 2026
81740cc
Add option for accumulation of MS2 search results for sequential sear…
kheal Feb 26, 2026
5bb0b22
Add time range filtering on LCMS parsers and use in gap filling
kheal Feb 26, 2026
cdb7637
Better handle empty MS1 annotation reports
kheal Feb 26, 2026
7298d64
Better deal with accumulating mass features and MS2 search results
kheal Feb 27, 2026
b0c2d49
Add handling for multiple calls of process_consensus_features for seq…
kheal Feb 27, 2026
6c030c2
Add and process MS1 for gap-filled samples
kheal Mar 1, 2026
d086be4
Fix tqdm for multiprocessing in process_consensus_mass_features
kheal Mar 1, 2026
15fd1fb
Add better handling for exporting with mixed TF confidence scores
kheal Mar 2, 2026
8982c0d
Handle ms2 scans that are outside the bounds of integration
kheal Mar 3, 2026
696831d
Merge branch 'lcms_dev_collection' into 'corems_4_0'
corilo Mar 16, 2026
91755d7
Clean up exports for collection-level
kheal Apr 1, 2026
a0cd5ea
Improve tests by setting scope of fixture and add defensive getitem f…
kheal Apr 1, 2026
d3d3c08
Update conftest fixture's scope
kheal Apr 1, 2026
ac09042
Add decoding when reading back in spectral search results
kheal Apr 1, 2026
1f2f339
Add parameter to not update individual lcms objects
kheal Apr 1, 2026
e925659
Better handle skipping adding eics to fix notebook plotting
kheal Apr 1, 2026
a308df6
Small changes to notebook for fix
kheal Apr 1, 2026
133219e
Merge branch 'monet_bug_fixes_corems40' into 'corems_4_0'
kheal Apr 7, 2026
78cc829
Refactor alignment of mass features for more robustness
kheal Apr 30, 2026
86bbca0
Merge master into feature
kheal Apr 30, 2026
88f6d00
Merge branch 'corems4_0_alignment_refactor' into 'corems_4_0'
kheal Apr 30, 2026
ffd1d86
error with how mag lab predator files are read
deweycw Jun 12, 2026
2 changes: 1 addition & 1 deletion conftest.py
@@ -83,7 +83,7 @@ def bruker_transient(ftms_file_location):
    return bruker_transient


-@pytest.fixture
+@pytest.fixture(scope="module")
def lcms_obj():
    """Returns an LCMS object for the tests"""
    file_raw = (
196 changes: 196 additions & 0 deletions corems/chroma_peak/calc/subset.py
@@ -0,0 +1,196 @@
# This file contains functions for subsetting dataframes that contain mass feature data.
# This is based on the deimos package, found here: https://github.com/pnnl/deimos/blob/master/deimos/subset.py with some modifications.

import multiprocessing as mp
from functools import partial

import numpy as np
import pandas as pd

class MultiSamplePartitions:
    '''
    Generator object that will lazily build and return each partition constructed
    from multiple samples.

    Attributes
    ----------
    features : :obj:`~pandas.DataFrame`
        Input feature coordinates and intensities.
    split_on : str
        Dimension to partition the data.
    size : int
        Target partition size.
    tol : float
        Largest allowed distance between unique `split_on` observations.
    relative : bool
        If `True`, `tol` is interpreted as a relative tolerance.
    n_partitions : int
        Number of partitions in the data.

    '''

    def __init__(self,
                 features,
                 split_on: str = 'mz',
                 size: int = 500,
                 tol: float = 25E-6,
                 relative: bool = False):
        '''
        Initialize a :obj:`MultiSamplePartitions` instance.

        Parameters
        ----------
        features : :obj:`~pandas.DataFrame`
            Input feature coordinates and intensities.
        split_on : str
            Dimension to partition the data.
        size : int
            Target partition size.
        tol : float
            Largest allowed distance between unique `split_on` observations.
        relative : bool
            If `True`, `tol` is interpreted as a relative tolerance.

        '''
        if not isinstance(split_on, str):
            raise TypeError(f"Expected 'split_on' to be a string, got {type(split_on).__name__}")
        if not isinstance(size, int):
            raise TypeError(f"Expected 'size' to be an integer, got {type(size).__name__}")
        if not isinstance(tol, float):
            raise TypeError(f"Expected 'tol' to be a float, got {type(tol).__name__}")
        if not isinstance(relative, bool):
            raise TypeError(f"Expected 'relative' to be a boolean, got {type(relative).__name__}")

        self.features = features
        self.split_on = split_on
        self.size = size
        self.tol = tol
        self.relative = relative

        self._compute_splits()

    def _compute_splits(self):
        '''
        Determines data splits for partitioning.

        '''

        self.counter = 0

        # Number of rows at each unique `split_on` value, in ascending order
        idx = self.features.groupby(by=self.split_on).size().sort_index()

        counts = idx.values
        idx = idx.index

        # Gaps between consecutive unique values. If relative, each gap is
        # expressed as a fraction of the lower value so it compares to `tol` directly.
        if self.relative:
            dxs = np.diff(idx) / idx[:-1]
        else:
            dxs = np.diff(idx)

        # Greedily grow the current bin while it stays within the target size,
        # or while the next value lies within `tol` of the previous one.
        bins = []
        current_count = counts[0]
        current_bin = [idx[0]]
        self._counts = []

        for i, dx in zip(range(1, len(idx)), dxs):
            if (current_count + counts[i] <= self.size) or (dx <= self.tol):
                current_bin.append(idx[i])
                current_count += counts[i]

            else:
                bins.append(np.array(current_bin))
                self._counts.append(current_count)

                current_bin = [idx[i]]
                current_count = counts[i]

        # Add last unadded bin
        bins.append(np.array(current_bin))
        self._counts.append(current_count)

        self.bounds = np.array([[x.min(), x.max()] for x in bins])

        # Number of partitions in the data
        self.n_partitions = len(bins)

    def __iter__(self):
        return self

    def __next__(self):
        if self.counter < len(self.bounds):
            q = '({} >= {}) & ({} <= {})'.format(self.split_on,
                                                 self.bounds[self.counter][0],
                                                 self.split_on,
                                                 self.bounds[self.counter][1])

            subset = self.features.query(q)

            self.counter += 1
            # Partitions with fewer than two rows are yielded as None
            if len(subset.index) > 1:
                return subset
            else:
                return None

        raise StopIteration

    def map(self, func, processes=1, **kwargs):
        '''
        Maps `func` to each partition, then returns the combined result.

        Parameters
        ----------
        func : function
            Function to apply to partitions. It must tolerate `None`, which is
            passed in place of partitions with fewer than two rows.
        processes : int
            Number of parallel processes. If less than 2, a serial mapping is
            applied.
        kwargs
            Keyword arguments passed to `func`.

        Returns
        -------
        :obj:`~pandas.DataFrame`
            Combined result of `func` applied to partitions.

        '''

        # Serial
        if processes < 2:
            result = [func(x, **kwargs) for x in self]

        # Parallel
        else:
            with mp.Pool(processes=processes) as p:
                result = list(p.imap(partial(func, **kwargs), self))

        # Add partition index
        for i in range(len(result)):
            if result[i] is not None:
                result[i]['partition_idx'] = i

        # Combine partitions
        return pd.concat(result, ignore_index=True)
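
The binning rule in `_compute_splits` can be tried out on its own. The sketch below is a pure-Python re-statement of that rule, not the class itself: `greedy_bins` and its toy inputs are hypothetical names introduced only for illustration, and it takes pre-sorted unique values with their row counts instead of a DataFrame.

```python
def greedy_bins(values, counts, size=500, tol=25e-6, relative=False):
    """Return [lower, upper] bounds per bin (hypothetical stand-in, not the CoreMS API).

    `values` are sorted unique coordinates; `counts` are the rows at each value.
    """
    bounds = []
    lo = hi = values[0]
    current = counts[0]
    for prev, val, n in zip(values, values[1:], counts[1:]):
        # Gap to the previous value, optionally as a fraction of that value
        gap = (val - prev) / prev if relative else val - prev
        # Same rule as the loop in _compute_splits: extend the bin while it
        # stays within the target size, or while the gap is within tolerance
        if current + n <= size or gap <= tol:
            hi = val
            current += n
        else:
            bounds.append([lo, hi])
            lo = hi = val
            current = n
    # Close the last open bin
    bounds.append([lo, hi])
    return bounds


values = [100.0, 100.001, 100.002, 250.0, 250.5]
counts = [2, 2, 2, 2, 2]
bounds = greedy_bins(values, counts, size=4, tol=0.01)
print(bounds)
```

With these toy inputs the third value joins the first bin even though the bin already exceeds `size`, because its gap is within `tol`. This is why `size` is a target rather than a hard cap.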

def multi_sample_partition(features, split_on='mz', size=500, tol=25E-6, relative=True):
    '''
    Partitions data along a given dimension. For use with features across
    multiple samples, e.g. in alignment.

    Parameters
    ----------
    features : :obj:`~pandas.DataFrame`
        Input feature coordinates and intensities.
    split_on : str
        Dimension to partition the data.
    size : int
        Target partition size.
    tol : float
        Largest allowed distance between unique `split_on` observations.
    relative : bool
        If `True`, the `tol` parameter is interpreted as a relative tolerance.
        Note that this wrapper defaults to `True`, whereas the
        :obj:`MultiSamplePartitions` constructor defaults to `False`.

    Returns
    -------
    :obj:`MultiSamplePartitions`
        A generator object that will lazily build and return each partition.

    '''

    return MultiSamplePartitions(features, split_on, size, tol, relative)
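
Because `__next__` yields `None` for partitions with fewer than two rows and `map` forwards every yielded item to `func`, any function passed to `map` has to cope with `None`. One possible way to handle that, sketched here with plain lists standing in for DataFrame partitions, is a small guard decorator. `skip_empty` and `count_rows` are hypothetical names used only for illustration and are not part of this diff.

```python
from functools import wraps


def skip_empty(func):
    """Wrap `func` so it returns None when handed an empty (None) partition."""
    @wraps(func)
    def wrapper(partition, **kwargs):
        if partition is None:
            return None
        return func(partition, **kwargs)
    return wrapper


@skip_empty
def count_rows(partition, scale=1):
    # Placeholder for real per-partition work such as clustering
    return len(partition) * scale


# Plain lists stand in for the DataFrame partitions yielded by the iterator
partitions = [[1, 2, 3], None, [4, 5]]
results = [count_rows(p, scale=2) for p in partitions]
kept = [r for r in results if r is not None]
print(results, kept)
```

Returning `None` for skipped partitions lines up with the `is not None` check that `map` applies before tagging each result with `partition_idx`.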