Add optional reference FASTA for non-SARS-CoV-2 genomes #614
jeromekelleher wants to merge 1 commit into
Make the reference genome an optional input so inference can run on genomes other than SARS-CoV-2, as a non-breaking change: all new parameters default to the built-in SARS-CoV-2 reference.

- import-alignments gains a --reference option, sizing and labelling the dataset from the supplied FASTA (Dataset.new gains sequence_length and contig_id kwargs).
- infer reads an optional reference_fasta config key, threaded through initial_ts; genome length and identity metadata are derived from the FASTA, and the length is checked against the dataset contig length.
- match_path_ts takes the sequence length from the working tree sequence rather than the hardcoded constant.
- Non-SARS-CoV-2 runs use a generic time-zero epoch for the reference.
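To make the second bullet concrete, here is a minimal sketch of deriving the contig identifier and sequence length from a single-record FASTA and checking that length against the dataset. The helper names `read_single_fasta` and `check_reference_length` are hypothetical and are not taken from this PR; the sketch uses only the standard library and assumes the reference file holds exactly one record.

```python
import io


def read_single_fasta(handle):
    """Return (contig_id, sequence) from a FASTA stream with exactly one record."""
    contig_id = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if contig_id is not None:
                raise ValueError("expected exactly one FASTA record")
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header has no identifier")
            # The identifier is the first whitespace-delimited header token.
            contig_id = fields[0]
        else:
            if contig_id is None:
                raise ValueError("sequence data found before a FASTA header")
            chunks.append(line.upper())
    if contig_id is None:
        raise ValueError("no FASTA record found")
    return contig_id, "".join(chunks)


def check_reference_length(sequence, dataset_contig_length):
    """Fail early if the reference and the dataset disagree on genome length."""
    if len(sequence) != dataset_contig_length:
        raise ValueError(
            f"reference length {len(sequence)} does not match "
            f"dataset contig length {dataset_contig_length}"
        )
    return len(sequence)


# Usage with an in-memory toy reference (hypothetical identifier).
contig_id, sequence = read_single_fasta(io.StringIO(">toy_ref tiny genome\nACGT\nacg\n"))
sequence_length = check_reference_length(sequence, dataset_contig_length=7)
```

Raising on a mismatch before any inference starts keeps the failure close to its cause, rather than surfacing later as an out-of-range site position.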
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #614 +/- ##
==========================================
+ Coverage 88.07% 88.18% +0.11%
==========================================
Files 12 12
Lines 4134 4165 +31
Branches 585 593 +8
==========================================
+ Hits 3641 3673 +32
Misses 365 365
+ Partials 128 127 -1
WIP: I need to look through this properly.
jeromekelleher left a comment
Looks OK, will change the API slightly.
- def initial_ts(problematic_sites=None):
+ def initial_ts(problematic_sites=None, reference_fasta=None):
Make this generic, for example
def initial_ts(*, reference, genbank_id, reference_date, problematic_sites=None):
and update the existing call sites to pass the existing constants. Require a reference date: add another top-level config key that is mandatory whenever reference_fasta is supplied.
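A rough sketch of what this suggestion could look like, under stated assumptions: `resolve_reference_config` and `initial_ts_sketch` are hypothetical stand-ins written for illustration, the config is treated as a plain dict, and `reference_date` is assumed to be the name of the new top-level key. The real `initial_ts` builds a tree sequence; the stand-in only returns the values it was given so the keyword-only calling convention can be shown.

```python
import datetime


def resolve_reference_config(config):
    """Return reference settings, or None to fall back to the built-in defaults.

    Assumption: reference_date becomes mandatory only when reference_fasta is set.
    """
    fasta = config.get("reference_fasta")
    if fasta is None:
        return None
    date = config.get("reference_date")
    if date is None:
        raise ValueError("reference_date is required when reference_fasta is supplied")
    if isinstance(date, str):
        # Accept ISO 8601 strings such as "2020-01-01"; raises ValueError if malformed.
        date = datetime.date.fromisoformat(date)
    return {"reference_fasta": fasta, "reference_date": date}


def initial_ts_sketch(*, reference, genbank_id, reference_date, problematic_sites=None):
    """Hypothetical stand-in showing the keyword-only signature from the review."""
    return {
        "sequence_length": len(reference),
        "genbank_id": genbank_id,
        "reference_date": reference_date,
        "problematic_sites": list(problematic_sites or []),
    }


settings = resolve_reference_config(
    {"reference_fasta": "toy_ref.fasta", "reference_date": "2020-01-01"}
)
summary = initial_ts_sketch(
    reference="ACGTACG",
    genbank_id="toy_ref",
    reference_date=settings["reference_date"],
)
```

The bare `*` forces every caller to name its arguments, so a reference sequence and an identifier cannot be swapped silently at a call site.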
@pytest.mark.parametrize(
    ["num_samples", "chunk_size"],
    [
Add some tests in this file that run a few days of inference over a very small genome with predictable mutations, and explicitly check the results.
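One possible way to build the input for such a test is sketched below. It only generates the toy data: one sample per day, each carrying one more known mutation than the previous day's sample. It does not call the inference code, and `make_daily_alignments` and the dict layout are hypothetical, not part of the existing test suite. Positions are 0-based offsets into the reference string.

```python
import datetime

BASES = "ACGT"


def next_base(base):
    """Deterministic substitution: A -> C -> G -> T -> A."""
    return BASES[(BASES.index(base) + 1) % len(BASES)]


def make_daily_alignments(reference, num_days, start_date=datetime.date(2020, 1, 1)):
    """Return one sample per day; the day-k sample is mutated at positions 0..k."""
    if num_days > len(reference):
        raise ValueError("need at least one reference position per day")
    samples = []
    current = list(reference)
    for day in range(num_days):
        # Mutations accumulate, so each sample should attach below the previous one.
        current[day] = next_base(current[day])
        samples.append(
            {
                "id": f"sample_{day}",
                "date": (start_date + datetime.timedelta(days=day)).isoformat(),
                "sequence": "".join(current),
                "new_mutation": (day, reference[day], current[day]),
            }
        )
    return samples


samples = make_daily_alignments("ACGTACGT", num_days=3)
```

Because every mutation is known in advance, a test can assert the exact site positions and derived states on the inferred tree sequence instead of only checking counts.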
Closes #609