Skip to content

Prolif class implementation#88

Open
talagayev wants to merge 17 commits into
mainfrom
prolif_class_implementation
Open

Prolif class implementation#88
talagayev wants to merge 17 commits into
mainfrom
prolif_class_implementation

Conversation

@talagayev
Copy link
Copy Markdown
Collaborator

First draft of the ProLIF implementation into OpenFE Analysis.

Changes made:

  • creation of ProLIFAnalysis class for calculation of interactions using ProLIF
  • Addition of first tests in test_prolif.py, which mainly look into if interactions were calculated sucessfully and protein/ligand conversion from MDAnalysis into ProLIF worked sucessfully.

Currently that is the first draft and PR is to see that the tests work etc.

Some notes:

  • Currently this is just calculating and returning the results in pd.DataFrame format, since that is easy to see if it worked. We can see what the best way is to present the results, be it 2D visualization of the interactions, dataframes or some other format.
  • As was mentioned by Irfan, the recognition and conversion of protein in MDAnalysis is sometimes tricky, which is the case here. So the code currently needs to guess bonds, which it does here:
    universe.select_atoms("protein").guess_bonds(vdwradii=vdwradii)
    But for the vdwradii it wants some values for Cl, Br etc. that we need to provide, so we need to see what defaults we select there.
  • For interactions we need to check what the default option should be, if default == all interactions or some interactions etc. and if the water-bridge interactions are default and if so, what water-bridge interaction order we make default. the current one is 3, which means it can be something like:
    Ligand -- Water -- Water -- Water -- Protein
  • Same goes with ProLIFAnalysis.run() there we can also look into what the defaults have to be, how many cores and if the interactions should be atom based (Ligand with Protein atom) or residue based (Ligand with Protein residue).

This is the first draft, will be updated during further development :)

PS. A small thing I also noticed, since I tested it on different PCs with one being with Windows currently somehow python 3.14 has module loading Issues with rdkit and thus I had to downgrade there to python 3.13.

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 18, 2026

Codecov Report

❌ Patch coverage is 72.22222% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.15%. Comparing base (c1bdddc) to head (15e8620).

Files with missing lines Patch % Lines
src/openfe_analysis/prolif.py 72.22% 30 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #88      +/-   ##
==========================================
- Coverage   98.05%   93.15%   -4.90%     
==========================================
  Files           9       10       +1     
  Lines         462      570     +108     
==========================================
+ Hits          453      531      +78     
- Misses          9       39      +30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

import prolif as plf


class ProLIFAnalysis:
Copy link
Copy Markdown
Member

@IAlibay IAlibay Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you subclass AnalysisBase here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup should be doable, will look into it :)

Wanted to wait and see how RMSD will look like to use an identical structure and have this initial PR to have bulletpoints to discuss tomorrow during the meeting and also have an easier overview on what was changed etc. :)

@jthorton
Copy link
Copy Markdown
Contributor

I am not sure if this should be handled in the analysis class or in a protocol before running this class but there is an issue when doing this kind of analysis on trajectories from our hybrid topology protocol, where the connectivity of the end state ligands and identities of the atoms can get confused see this example on end stateB when looking at torsions. Should we add an option to provide and rdkit molecule for the ligand and use this to correct the ligand atom group automatically or should we have users fix this beforehand? This will be needed in the torsion analysis as well so a general function which could fix an atom group might be a good idea?

Comment thread src/openfe_analysis/prolif.py Outdated

# Currently adding here but maybe as args?
self.protein_ag = self.universe.select_atoms(
"protein and byres around 12 group ligand",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree I think it would be great to have these (also for water below) as args.

Comment thread src/openfe_analysis/prolif.py Outdated
and "WaterBridge" in fp_interactions
):
if self.water_ag.n_atoms == 0:
raise ValueError("WaterBridge selected but water selection is empty.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead raise a warning, or what would happen if there are individual frames that don't have water molecules?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes warining would be better in that case, since that is quite a possible scenario, best approach would be truly warning + remove the water_bridge interaction so that it doesn't run when there are no waters, since I would need to check, but I think currently it would possibly still try to run it 🤔

Comment thread src/openfe_analysis/prolif.py Outdated
Comment on lines +62 to +69
vdwradii = {
"Cl": 1.75,
"CL": 1.75,
"Br": 1.85,
"BR": 1.85,
"Na": 2.27,
"NA": 2.27,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

@hannahbaumann hannahbaumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @talagayev , this looks good! The main part I'm not sure about is the run function, maybe we can discuss that tomorrow!

Comment thread src/openfe_analysis/prolif.py Outdated
self.ligand_ag.guess_bonds(vdwradii=vdwradii)

# Water: only if you care about water-mediated interactions
if guess_bonds:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should always be true since the outer loop goes through the same. Or did you mean to put something else here?

Comment thread src/openfe_analysis/prolif.py Outdated
Comment thread src/openfe_analysis/prolif.py Outdated
Comment thread src/openfe_analysis/prolif.py
# Cover case of false interaction
missing = [i for i in interactions if i not in available]
if missing:
raise ValueError(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like ProLIF would also raise an error for missing interactions, so this may not be necessary.
_check_valid_interactions

Comment thread src/openfe_analysis/prolif.py Outdated
Comment on lines +144 to +147
if fp_interactions is None:
self.fp = plf.Fingerprint(parameters=self._parameters)
elif len(fp_interactions) == 0:
self.fp = plf.Fingerprint(parameters=self._parameters)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we could also use a if fp_interactions: to handle both cases, None and empty list.

self.ifp_df = None

# --- Guess bonds once on stable selections so RDKit/ProLIF can detect HBonds ---
if guess_bonds:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot happening in init here, maybe it would make sense to put some of these into private functions (e.g. to guess bonds, to build the fingerprints) so that they are easier to test separately and for a better overview of the different things that are going on.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, can split it up a little bit, since it is quite crowded currently

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talagayev i created a small utils function for the guessing of bonds here :

as part of this PR #92
Please let me know what you think about it!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hannahbaumann looks good to me, will adjust the code to import it from there :)

@hannahbaumann hannahbaumann self-assigned this May 21, 2026
This was linked to issues May 21, 2026
Comment thread environment.yml
- openff-units
- pip
- tqdm
- prolif
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add the dependency to the pyproject.toml?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes can do :)

guess_bonds: bool = True,
vdwradii: Optional[Dict[str, float]] = None,
) -> None:
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the other analysis classes we ended up going with the doc string style of putting the doc string at the function level instead of having it in init. Could you maybe change that here as well?

show_interaction_data=show_interaction_data,
)

plot_2d = plot_lignetwork
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could also rename the function above to plot_2d or similar and remove the alias here?

"""
2D ProLIF ligand-network visualization.
"""
if not self.ifp:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be a small helper function since the check is done for all three plots.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True true :)

Copy link
Copy Markdown
Contributor

@hannahbaumann hannahbaumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @talagayev , this looks good! Just left a few more comments, please let me know if you have any questions!

Transform fingerprint results to pd.DataFrame.
"""
df = self.fp.to_dataframe(**kwargs)
self.ifp_df = df
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's necessary to store it here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm seing the dataframe and with the dataframe being also kinda a quite utils style representation of results I am thinking if it makes sense if we would have other cases where we would want to show results in dataframes just as an utils function accesible for all other functions 🤔

@talagayev
Copy link
Copy Markdown
Collaborator Author

Thanks @talagayev , this looks good! Just left a few more comments, please let me know if you have any questions!

Yes probably a short one. Didn't have time to push the commits that I worked on yesterday, but for the plotting I wanted to ask about the plot_3d and plot_lignetwork. With the plot_barcode being quite generic I rewrote it a little bit to be possible to be used with other inputs and moved it to plotting.py, but the plot_lignetwork is quite ProLIF unique so should we leave it in prolif.py and for plot_3d currently it does uses again a function directly from ProLIF, but overall if we will use py3Dmol to visualize in 3D in other functions I can also rewrite it to be more generic to be accesible for other functions 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

prolif hook Protein-ligand contact analysis

4 participants