Protein-Ligand Interaction Clustering
During the protein-ligand simulations, both protein and ligand undergoes for conformational change, and therefore interaction between them is also evolve and changes during the simulations. Sometime, protein and ligand conformation changes such that interaction remains constant. Capturing similarities in protein-ligand interactions while both are undergoing for changes is challenging. Therefore, clustering of protein-ligand interactions will provide collection of complex conformations that are grouped together by the interactions.
This tutorial provides the steps with examples to perform conformational clustering besed on the protein-ligand interactions. This method is employed in this publication. For more details about the theory of the method, please follow the method section of this publication.
Instructions
Tutorial files: The tutorial files can be downloaded from here.
Extract the files:
tar -zxvf protein-ligand-interaction.tar.gzGo to directory:
cd protein-ligand-interactionCopy the Jupyter Notebook: This notebook is available in the GitHub repo. Download and copy it from the github.
Final result

Required Tools
GROMACS
gmx_clusterByFeatures
Steps
Modification of ligand topology file
Calculation of protein-ligand reciprocal-distance-matrix trajectory
PCA of the reciprocal-distance-matrix trajectory
Calculations of projection of first five PCs on reciprocal-distance-matrix trajectory
Pre-Clustering scan for Cluster-Metrics
Clustering with pre-determined number of clusters
Analysis
MM/PBSA Analysis
1. Modification of ligand topology file
In general, whole ligand is considered as a single residue. At first, ligand’s atoms are grouped as several residues and these are marked in the topology file as shown below in example. Idea is to divide ligand into virtual residues.

Above right side, ligand is colored by the assigned virtual residues.
Notes:
The atoms in a residue should be continuous and its ID and Name should be unique.
Do not reorder the atoms in topology after simulation is performed because same order is present in trajectory.
If required, always reorder atoms during ligand topology generation at the very begining of the simulation setup. In this way, atoms in generated trajectory will have same order as in topology and tpr files.
After, modifying the topology, rerun gmx grompp to regenerate tpr file with ligand having virtual residues. Also, create new index files to include make new groups depending on the requirements.
2. Calculation of protein-ligand reciprocal-distance-matrix trajectory
Now, we calculate reciprocal-distance matrix using distmat sub-command. Since, ligand is now made-up of virtual residues, reciprocal-distance-matrix between ligand and protein will contain reciprocal of minimum-distance between ligand’s virtual residues and protein’s residues.
Notes:
-power -1option is used to calulate 1/(minimum-distance) in place of minimum-distance.Selected index group:
1is protein and12is ligand (Other)
[1]:
%%bash
echo 1 12 | gmx_clusterByFeatures distmat -f inputs/trajectory.xtc -s inputs/complex_res_segments.tpr \
-gx 1 -power -1 -var var_lig_protein.dat -cmap cmap_lig_protein.dat -pca pca.xtc
:-) GROMACS - gmx_clusterByFeatures distmat, 2025.0-dev-20250210-6949615-local (-:
Executable: gmx_clusterByFeatures distmat
Data prefix: /project/external/gmx_installed
Working dir: /home/raj/workspace/gmx_clusterByFeatrues/tutorials/protein-ligand-interaction
Command line:
'gmx_clusterByFeatures distmat' -f inputs/trajectory.xtc -s inputs/complex_res_segments.tpr -gx 1 -power -1 -var var_lig_protein.dat -cmap cmap_lig_protein.dat -pca pca.xtc
:-) gmx_clusterByFeatures distmat (-:
Author: Rajendra Kumar
Copyright (C) 2018-2019 Rajendra Kumar
gmx_clusterByFeatures is a free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
gmx_clusterByFeatures is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with gmx_clusterByFeatures. If not, see <http://www.gnu.org/licenses/>.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Reading file inputs/complex_res_segments.tpr, VERSION 5.1.2 (single precision)
Reading file inputs/complex_res_segments.tpr, VERSION 5.1.2 (single precision)
Select first group:
Group 0 ( System) has 8393 elements
Group 1 ( Protein) has 8325 elements
Group 2 ( Protein-H) has 4217 elements
Group 3 ( C-alpha) has 542 elements
Group 4 ( Backbone) has 1626 elements
Group 5 ( MainChain) has 2169 elements
Group 6 ( MainChain+Cb) has 2660 elements
Group 7 ( MainChain+H) has 2668 elements
Group 8 ( SideChain) has 5657 elements
Group 9 ( SideChain-H) has 2048 elements
Group 10 ( Prot-Masses) has 8325 elements
Group 11 ( non-Protein) has 68 elements
Group 12 ( Other) has 68 elements
Group 13 ( L1) has 4 elements
Group 14 ( L2) has 3 elements
Group 15 ( L3) has 8 elements
Group 16 ( L4) has 8 elements
Group 17 ( L5) has 14 elements
Group 18 ( L6) has 2 elements
Group 19 ( L7) has 3 elements
Group 20 ( L8) has 1 elements
Group 21 ( L9) has 10 elements
Group 22 ( L10) has 10 elements
Group 23 ( L11) has 1 elements
Group 24 ( L12) has 4 elements
Select a group: Select second group:
Group 0 ( System) has 8393 elements
Group 1 ( Protein) has 8325 elements
Group 2 ( Protein-H) has 4217 elements
Group 3 ( C-alpha) has 542 elements
Group 4 ( Backbone) has 1626 elements
Group 5 ( MainChain) has 2169 elements
Group 6 ( MainChain+Cb) has 2660 elements
Group 7 ( MainChain+H) has 2668 elements
Group 8 ( SideChain) has 5657 elements
Group 9 ( SideChain-H) has 2048 elements
Group 10 ( Prot-Masses) has 8325 elements
Group 11 ( non-Protein) has 68 elements
Group 12 ( Other) has 68 elements
Group 13 ( L1) has 4 elements
Group 14 ( L2) has 3 elements
Group 15 ( L3) has 8 elements
Group 16 ( L4) has 8 elements
Group 17 ( L5) has 14 elements
Group 18 ( L6) has 2 elements
Group 19 ( L7) has 3 elements
Group 20 ( L8) has 1 elements
Group 21 ( L9) has 10 elements
Group 22 ( L10) has 10 elements
Group 23 ( L11) has 1 elements
Group 24 ( L12) has 4 elements
Select a group: There are 542 residues with 8325 atoms in first group
There are 12 residues with 68 atoms in second group
Reading frame 22000 time 440000.000
GROMACS reminds you: "Chemistry: It tends to be a messy science." (Gunnar von Heijne, former chair of the Nobel Committee for chemistry)
Selected 1: 'Protein'
Selected 12: 'Other'
Number of distance-matrix elements for PCA trajectory: 6505
Number of distance-matrix coordinates in PCA trajectory: 2169
3. PCA of the reciprocal-distance-matrix trajectory
Now, we will use pca.xtc and pca_dummy.pdb generated in the above command as input files to GROMACS tool gmx covar to perform PCA. This step will calculate covariance matrix, eigenvectors and eigenvalues. By default, the eigenvectors are written in eigenvec.trr while eigenvalues are written in eigenval.xvg files.
-nofit, -nomwa and -nopbc options are used because xtc file does not contain cartesian-coordinates and these option has no meanings in the case of the distance-matrix trajectory.
[2]:
%%bash
echo 0 0 | gmx covar -f pca.xtc -s pca_dummy.pdb -nofit -nomwa -nopbc
:-) GROMACS - gmx covar, 2025.2 (-:
Executable: /opt/gromacs-2025/bin/gmx
Data prefix: /opt/gromacs-2025
Working dir: /home/raj/workspace/gmx_clusterByFeatrues/tutorials/protein-ligand-interaction
Command line:
gmx covar -f pca.xtc -s pca_dummy.pdb -nofit -nomwa -nopbc
Group 0 ( System) has 2169 elements
Group 1 ( Protein) has 2169 elements
Group 2 ( Protein-H) has 1102 elements
Group 3 ( C-alpha) has 140 elements
Group 4 ( Backbone) has 419 elements
Group 5 ( MainChain) has 558 elements
Group 6 ( MainChain+Cb) has 682 elements
Group 7 ( MainChain+H) has 683 elements
Group 8 ( SideChain) has 1486 elements
Group 9 ( SideChain-H) has 544 elements
Select a group: Calculating the average structure ...
Reading frame 22000 time 440000.000
Constructing covariance matrix (6507x6507) ...
Reading frame 22000 time 440000.000
Read 22501 frames
Trace of the covariance matrix: 140.004 (nm^2)
Diagonalizing ...
Sum of the eigenvalues: 140.004 (nm^2)
Writing eigenvalues to eigenval.xvg
Writing average structure & eigenvectors 1--6507 to eigenvec.trr
Wrote the log to covar.log
GROMACS reminds you: "Can I have everything louder than everything else?" (Deep Purple)
WARNING: Masses and atomic (Van der Waals) radii will be guessed
based on residue and atom names, since they could not be
definitively assigned from the information in your input
files. These guessed numbers might deviate from the mass
and radius of the atom type. Please check the output
files if necessary. Note, that this functionality may
be removed in a future GROMACS version. Please, consider
using another file format for your input.
Choose a group for the covariance analysis
Selected 0: 'System'
4. Projection of first five PCs on reciprocal-distance-matrix trajectory
We will use eigenvectors written in eigenvec.trr, pca.xtc and pca_dummy.pdb as input files to GROMACS tool gmx anaeig to calculate projection of first 5 eigenvectors on distance-matrix trajectory. These projections will be written into proj.xvg file.
[3]:
%%bash
echo 0 0 | gmx anaeig -f pca.xtc -s pca_dummy.pdb -first 1 -last 5 -proj
:-) GROMACS - gmx anaeig, 2025.2 (-:
Executable: /opt/gromacs-2025/bin/gmx
Data prefix: /opt/gromacs-2025
Working dir: /home/raj/workspace/gmx_clusterByFeatrues/tutorials/protein-ligand-interaction
Command line:
gmx anaeig -f pca.xtc -s pca_dummy.pdb -first 1 -last 5 -proj
trr version: GMX_trn_file (single precision)
Eigenvectors in eigenvec.trr were determined without fitting
Read non mass weighted average/minimum structure with 2169 atoms from eigenvec.trr
Read 6507 eigenvectors (for 2169 atoms)
WARNING: If there are molecules in the input trajectory file
that are broken across periodic boundaries, they
cannot be made whole (or treated as whole) without
you providing a run input file.
Group 0 ( System) has 2169 elements
Group 1 ( Protein) has 2169 elements
Group 2 ( Protein-H) has 1102 elements
Group 3 ( C-alpha) has 140 elements
Group 4 ( Backbone) has 419 elements
Group 5 ( MainChain) has 558 elements
Group 6 ( MainChain+Cb) has 682 elements
Group 7 ( MainChain+H) has 683 elements
Group 8 ( SideChain) has 1486 elements
Group 9 ( SideChain-H) has 544 elements
Select a group: 5 eigenvectors selected for output: 1 2 3 4 5
Reading frame 22000 time 440000.000
GROMACS reminds you: "Mathematics is a game played according to certain rules with meaningless marks on paper." (David Hilbert)
Select an index group of 2169 elements that corresponds to the eigenvectors
Selected 0: 'System'
5. Pre-Clustering scan for Cluster-Metrics
Before performing final clutering, we would first generate cluster-mterics that can be used to make a decision on the number of clusters. One of the drawback of K-Means clustering is that the number of clusters should be known beforehand. Although, gmx_clusterByFeatures implements several cluster metrics and also automatated way to decide
number of clusters, here, we first calculate cluster-metrics and final clustering can be performed later.
Following command will perform the clustering of conformations using first 5 PCs projection. Explanation of options are as follows:
-method kmeans: Use K-Means clustering algorithm-ncluster 15: K-Means clustering will be performed for 1 upto 15 clusters times each time. Finally, based on-ssrchangeoption, final number of clusters will be automatically selected.-cmetric ssr-sst: Use Elbow method to decide final number of clusters. It is not for final purpose.-nfeature 5: Take 5 feature fromfeat proj.xvginput file. Here it is projection of first 5 eigenvectors on the trajectory.-ssrchange 1.0: Threshold (percentage) of change in SSR/SST ratio in Elbow method to automatically decide the number of clusters.-g clusters_scan.log: Log file containing clustering information and cluster-mterics.
index group order
First index group - Output group of atoms in the central structures and clustered trajectories
Second index group - Group of atoms to calculate RMSD between central conformations of clusters as RMSD matrix, which is dumped in the log file with
-goption. Here, it is Ligand.Third index group - Used for Superimposition by least-square fitting. ONLY used in separate clustered trajectories to superimpose conformations on the central structure. Here, it is protein’s C-alpha atoms.
Content of the output ``-g clusters_scan.log`` file
It contains the command summary, and for each input cluster-numbers, number of frames in each clusters. At the end it dumps the Cluster Metrics Summary, which is important for deciding final number of clusters.
[4]:
%%bash
echo 0 12 26 | gmx_clusterByFeatures cluster -f inputs/trajectory.xtc -s inputs/complex_res_segments.tpr -n inputs/index.ndx \
-method kmeans -feat proj.xvg -nfeature 10 -cmetric ssr-sst -ncluster 15 -ssrchange 1.0 \
-g clusters_scan.log
:-) GROMACS - gmx_clusterByFeatures cluster, 2025.0-dev-20250210-6949615-local (-:
Executable: gmx_clusterByFeatures cluster
Data prefix: /project/external/gmx_installed
Working dir: /home/raj/workspace/gmx_clusterByFeatrues/tutorials/protein-ligand-interaction
Command line:
'gmx_clusterByFeatures cluster' -f inputs/trajectory.xtc -s inputs/complex_res_segments.tpr -n inputs/index.ndx -method kmeans -feat proj.xvg -nfeature 10 -cmetric ssr-sst -ncluster 15 -ssrchange 1.0 -g clusters_scan.log
:-) gmx_clusterByFeatures cluster (-:
Author: Rajendra Kumar
Copyright (C) 2018-2019 Rajendra Kumar
gmx_clusterByFeatures is a free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
gmx_clusterByFeatures is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with gmx_clusterByFeatures. If not, see <http://www.gnu.org/licenses/>.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Reading file inputs/complex_res_segments.tpr, VERSION 5.1.2 (single precision)
Reading file inputs/complex_res_segments.tpr, VERSION 5.1.2 (single precision)
Group 0 ( System) has 8393 elements
Group 1 ( Protein) has 8325 elements
Group 2 ( Protein-H) has 4217 elements
Group 3 ( C-alpha) has 542 elements
Group 4 ( Backbone) has 1626 elements
Group 5 ( MainChain) has 2169 elements
Group 6 ( MainChain+Cb) has 2660 elements
Group 7 ( MainChain+H) has 2668 elements
Group 8 ( SideChain) has 5657 elements
Group 9 ( SideChain-H) has 2048 elements
Group 10 ( Prot-Masses) has 8325 elements
Group 11 ( non-Protein) has 68 elements
Group 12 ( Other) has 68 elements
Group 13 ( L1) has 4 elements
Group 14 ( L2) has 3 elements
Group 15 ( L3) has 8 elements
Group 16 ( L4) has 8 elements
Group 17 ( L5) has 14 elements
Group 18 ( L6) has 2 elements
Group 19 ( L7) has 3 elements
Group 20 ( L8) has 1 elements
Group 21 ( L9) has 10 elements
Group 22 ( L10) has 10 elements
Group 23 ( L11) has 1 elements
Group 24 ( L12) has 4 elements
Group 25 ( Protein_Other) has 8393 elements
Group 26 (C-alpha_Other_&_!H*) has 574 elements
Select a group: Group 0 ( System) has 8393 elements
Group 1 ( Protein) has 8325 elements
Group 2 ( Protein-H) has 4217 elements
Group 3 ( C-alpha) has 542 elements
Group 4 ( Backbone) has 1626 elements
Group 5 ( MainChain) has 2169 elements
Group 6 ( MainChain+Cb) has 2660 elements
Group 7 ( MainChain+H) has 2668 elements
Group 8 ( SideChain) has 5657 elements
Group 9 ( SideChain-H) has 2048 elements
Group 10 ( Prot-Masses) has 8325 elements
Group 11 ( non-Protein) has 68 elements
Group 12 ( Other) has 68 elements
Group 13 ( L1) has 4 elements
Group 14 ( L2) has 3 elements
Group 15 ( L3) has 8 elements
Group 16 ( L4) has 8 elements
Group 17 ( L5) has 14 elements
Group 18 ( L6) has 2 elements
Group 19 ( L7) has 3 elements
Group 20 ( L8) has 1 elements
Group 21 ( L9) has 10 elements
Group 22 ( L10) has 10 elements
Group 23 ( L11) has 1 elements
Group 24 ( L12) has 4 elements
Group 25 ( Protein_Other) has 8393 elements
Group 26 (C-alpha_Other_&_!H*) has 574 elements
Select a group: Group 0 ( System) has 8393 elements
Group 1 ( Protein) has 8325 elements
Group 2 ( Protein-H) has 4217 elements
Group 3 ( C-alpha) has 542 elements
Group 4 ( Backbone) has 1626 elements
Group 5 ( MainChain) has 2169 elements
Group 6 ( MainChain+Cb) has 2660 elements
Group 7 ( MainChain+H) has 2668 elements
Group 8 ( SideChain) has 5657 elements
Group 9 ( SideChain-H) has 2048 elements
Group 10 ( Prot-Masses) has 8325 elements
Group 11 ( non-Protein) has 68 elements
Group 12 ( Other) has 68 elements
Group 13 ( L1) has 4 elements
Group 14 ( L2) has 3 elements
Group 15 ( L3) has 8 elements
Group 16 ( L4) has 8 elements
Group 17 ( L5) has 14 elements
Group 18 ( L6) has 2 elements
Group 19 ( L7) has 3 elements
Group 20 ( L8) has 1 elements
Group 21 ( L9) has 10 elements
Group 22 ( L10) has 10 elements
Group 23 ( L11) has 1 elements
Group 24 ( L12) has 4 elements
Group 25 ( Protein_Other) has 8393 elements
Group 26 (C-alpha_Other_&_!H*) has 574 elements
Reading frame 4 time 80.000
=======================
Cluster Log output
=======================
Command:
=======================
'gmx_clusterByFeatures cluster' -f inputs/trajectory.xtc -s inputs/complex_res_segments.tpr -n inputs/index.ndx -method kmeans -feat proj.xvg -nfeature 10 -cmetric ssr-sst -ncluster 15 -ssrchange 1.0 -g clusters_scan.log
=======================
Choose a group for the output:
Selected 0: 'System'
Choose a group for clustering/RMSD calculation:
Selected 12: 'Other'
Choose a group for fitting or superposition:
Selected 26: 'C-alpha_Other_&_!H*'
Input Trajectory dt = 20 ps
###########################################
########## NUMBER OF CLUSTERS : 1 ########
###########################################
===========================
Cluster-ID TotalFrames
1 22501
===========================
###########################################
########## NUMBER OF CLUSTERS : 2 ########
###########################################
===========================
Cluster-ID TotalFrames
1 15000
2 7501
===========================
###########################################
########## NUMBER OF CLUSTERS : 3 ########
###########################################
===========================
Cluster-ID TotalFrames
1 7510
2 7501
3 7490
===========================
###########################################
########## NUMBER OF CLUSTERS : 4 ########
###########################################
===========================
Cluster-ID TotalFrames
1 7501
2 7484
3 5309
4 2207
===========================
###########################################
########## NUMBER OF CLUSTERS : 5 ########
###########################################
===========================
Cluster-ID TotalFrames
1 7501
2 7485
3 3872
4 2205
5 1438
===========================
###########################################
########## NUMBER OF CLUSTERS : 6 ########
###########################################
===========================
Cluster-ID TotalFrames
1 7485
2 5829
3 3872
4 2205
5 1672
6 1438
===========================
###########################################
########## NUMBER OF CLUSTERS : 7 ########
###########################################
===========================
Cluster-ID TotalFrames
1 5829
2 5075
3 3872
4 2415
5 2200
6 1672
7 1438
===========================
###########################################
########## NUMBER OF CLUSTERS : 8 ########
###########################################
===========================
Cluster-ID TotalFrames
1 5830
2 5082
3 2408
4 2386
5 2199
6 1671
7 1491
8 1434
===========================
###########################################
########## NUMBER OF CLUSTERS : 9 ########
###########################################
===========================
Cluster-ID TotalFrames
1 5083
2 4605
3 2407
4 2398
5 2199
6 1552
7 1479
8 1434
9 1344
===========================
###########################################
########## NUMBER OF CLUSTERS : 10 ########
###########################################
===========================
Cluster-ID TotalFrames
1 4900
2 4615
3 2387
4 2195
5 2188
6 1542
7 1489
8 1434
9 1344
10 407
===========================
###########################################
########## NUMBER OF CLUSTERS : 11 ########
###########################################
===========================
Cluster-ID TotalFrames
1 4692
2 4290
3 2387
4 2188
5 2177
6 1489
7 1473
8 1434
9 1336
10 803
11 232
===========================
###########################################
########## NUMBER OF CLUSTERS : 12 ########
###########################################
===========================
Cluster-ID TotalFrames
1 4692
2 4290
3 2189
4 2177
5 1682
6 1473
7 1434
8 1336
9 1118
10 1075
11 803
12 232
===========================
###########################################
########## NUMBER OF CLUSTERS : 13 ########
###########################################
===========================
Cluster-ID TotalFrames
1 3029
2 2944
3 2332
4 2271
5 2189
6 1866
7 1682
8 1434
9 1250
10 1118
11 1075
12 951
13 360
===========================
###########################################
########## NUMBER OF CLUSTERS : 14 ########
###########################################
===========================
Cluster-ID TotalFrames
1 4290
2 2969
3 2357
4 2188
5 2177
6 1432
7 1384
8 1240
9 1092
10 1082
11 935
12 803
13 320
14 232
===========================
###########################################
########## NUMBER OF CLUSTERS : 15 ########
###########################################
===========================
Cluster-ID TotalFrames
1 4339
2 3051
3 2367
4 2248
5 2143
6 1394
7 1250
8 1153
9 952
10 924
11 790
12 787
13 537
14 333
15 233
===========================
===========================================================================================================
Cluster Metrics Summary
-----------------------------------------------------------------------------------------------------------
Clusters SSR/SST D(SSR/SST) (Psuedo)F-stat. Silhouette-score Davies-bouldin-score
2 50.45 50.455 22911.836 0.530 0.729
3 84.67 34.215 62129.371 0.689 0.503
4 89.04 4.374 60944.555 0.652 0.591
5 92.08 3.039 65406.023 0.701 0.477
6 94.15 2.069 72417.812 0.644 0.538
7 95.13 0.974 73154.625 0.533 0.738
8 95.52 0.395 68511.797 0.487 0.908
9 95.76 0.237 63453.414 0.407 1.049
10 96.10 0.345 61606.973 0.399 1.024
11 96.28 0.182 58261.570 0.374 1.043
12 96.48 0.201 56103.879 0.371 1.070
13 96.64 0.152 53840.742 0.327 1.200
14 96.80 0.165 52342.699 0.351 1.065
15 96.91 0.111 50402.191 0.302 1.157
===========================================================================================================
#####################################
Final number of cluster selected: 6
#####################################
Calculating central structure for cluster-6 ...
Reading frame 5 time 188780.000
GROMACS reminds you: "No, no, you're not thinking, you're just being logical." (Niels Bohr)
===========================================
Cluster-ID Central Frame Total Frames
1 18119 7485
2 3515 5829
3 10774 3872
4 12874 2205
5 833 1672
6 9439 1438
===========================================
Extracting coordinates of the central structure...
Calculating RMSD between central structures...
=====================================
Central structurs - RMSD matrix
=====================================
c1 c2 c3 c4 c5 c6
0.000 0.414 0.512 0.479 0.458 0.536
0.414 0.000 0.605 0.624 0.214 0.600
0.512 0.605 0.000 0.253 0.587 0.221
0.479 0.624 0.253 0.000 0.587 0.266
0.458 0.214 0.587 0.587 0.000 0.583
0.536 0.600 0.221 0.266 0.583 0.000
=====================================
6. Clustering with pre-determined number of clusters
As can be seen above in Cluster-Metrics table obtained in clusters_scan.log file, best scores (both Silhouette and Davies-Bouldin) were obtained for 5 clusters. Therefore, now we will do final clustering with 5 clusters and extract trajectories of these clusters.
Explanation of options are as follows:
-method kmeans: Use K-Means clustering algorithm-ncluster 5: Five clusters will be generated using K-Means clustering.-cmetric prior: No cluster-mterics is used, number of clusters is already known.-nfeature 5: Take 5 feature fromfeat proj.xvginput file. Here it is projection of first 5 eigenvectors on the trajectory.cluster-5: Cluster log file-cpdb clustered_trajs/central.pdb: Central structures of each cluster as PDB file-fout clustered_trajs/cluster.xtc: Trajectory file of each cluster-outframe 1000: Only first 1000 frame after sorting to be written in cluster trajectory file-sort features: Sort the conformation in each cluster trajectory file based on the distance between central structure and current conformation in feature sub-space.
This command could take a long time to execute!
This command could take a long time to execute because it is writing output trajectory file for each cluster sorted by distance in feature-space. Therefore, it needs to read input trajectory back-and-forth many time to extract the conformations in sorted manner. XTC format is fast for back-and-forth reading, and it still could take long time to dump the output trajectories.
[5]:
%%bash
mkdir clustered_trajs
echo 0 12 26 | gmx_clusterByFeatures cluster -f inputs/trajectory.xtc -s inputs/complex_res_segments.tpr -n inputs/index.ndx \
-feat proj.xvg -nfeature 5 -method kmeans -cmetric prior -ncluster 5 -g cluster-5 \
-cpdb clustered_trajs/central.pdb -fout clustered_trajs/cluster.xtc -outframe 1000 \
-plot pca-cluster-5.png -sort features
:-) GROMACS - gmx_clusterByFeatures cluster, 2025.0-dev-20250210-6949615-local (-:
Executable: gmx_clusterByFeatures cluster
Data prefix: /project/external/gmx_installed
Working dir: /home/raj/workspace/gmx_clusterByFeatrues/tutorials/protein-ligand-interaction
Command line:
'gmx_clusterByFeatures cluster' -f inputs/trajectory.xtc -s inputs/complex_res_segments.tpr -n inputs/index.ndx -feat proj.xvg -nfeature 5 -method kmeans -cmetric prior -ncluster 5 -g cluster-5 -cpdb clustered_trajs/central.pdb -fout clustered_trajs/cluster.xtc -outframe 1000 -plot pca-cluster-5.png -sort features
:-) gmx_clusterByFeatures cluster (-:
Author: Rajendra Kumar
Copyright (C) 2018-2019 Rajendra Kumar
gmx_clusterByFeatures is a free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
gmx_clusterByFeatures is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with gmx_clusterByFeatures. If not, see <http://www.gnu.org/licenses/>.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Reading file inputs/complex_res_segments.tpr, VERSION 5.1.2 (single precision)
Reading file inputs/complex_res_segments.tpr, VERSION 5.1.2 (single precision)
Group 0 ( System) has 8393 elements
Group 1 ( Protein) has 8325 elements
Group 2 ( Protein-H) has 4217 elements
Group 3 ( C-alpha) has 542 elements
Group 4 ( Backbone) has 1626 elements
Group 5 ( MainChain) has 2169 elements
Group 6 ( MainChain+Cb) has 2660 elements
Group 7 ( MainChain+H) has 2668 elements
Group 8 ( SideChain) has 5657 elements
Group 9 ( SideChain-H) has 2048 elements
Group 10 ( Prot-Masses) has 8325 elements
Group 11 ( non-Protein) has 68 elements
Group 12 ( Other) has 68 elements
Group 13 ( L1) has 4 elements
Group 14 ( L2) has 3 elements
Group 15 ( L3) has 8 elements
Group 16 ( L4) has 8 elements
Group 17 ( L5) has 14 elements
Group 18 ( L6) has 2 elements
Group 19 ( L7) has 3 elements
Group 20 ( L8) has 1 elements
Group 21 ( L9) has 10 elements
Group 22 ( L10) has 10 elements
Group 23 ( L11) has 1 elements
Group 24 ( L12) has 4 elements
Group 25 ( Protein_Other) has 8393 elements
Group 26 (C-alpha_Other_&_!H*) has 574 elements
Select a group: Group 0 ( System) has 8393 elements
Group 1 ( Protein) has 8325 elements
Group 2 ( Protein-H) has 4217 elements
Group 3 ( C-alpha) has 542 elements
Group 4 ( Backbone) has 1626 elements
Group 5 ( MainChain) has 2169 elements
Group 6 ( MainChain+Cb) has 2660 elements
Group 7 ( MainChain+H) has 2668 elements
Group 8 ( SideChain) has 5657 elements
Group 9 ( SideChain-H) has 2048 elements
Group 10 ( Prot-Masses) has 8325 elements
Group 11 ( non-Protein) has 68 elements
Group 12 ( Other) has 68 elements
Group 13 ( L1) has 4 elements
Group 14 ( L2) has 3 elements
Group 15 ( L3) has 8 elements
Group 16 ( L4) has 8 elements
Group 17 ( L5) has 14 elements
Group 18 ( L6) has 2 elements
Group 19 ( L7) has 3 elements
Group 20 ( L8) has 1 elements
Group 21 ( L9) has 10 elements
Group 22 ( L10) has 10 elements
Group 23 ( L11) has 1 elements
Group 24 ( L12) has 4 elements
Group 25 ( Protein_Other) has 8393 elements
Group 26 (C-alpha_Other_&_!H*) has 574 elements
Select a group: Group 0 ( System) has 8393 elements
Group 1 ( Protein) has 8325 elements
Group 2 ( Protein-H) has 4217 elements
Group 3 ( C-alpha) has 542 elements
Group 4 ( Backbone) has 1626 elements
Group 5 ( MainChain) has 2169 elements
Group 6 ( MainChain+Cb) has 2660 elements
Group 7 ( MainChain+H) has 2668 elements
Group 8 ( SideChain) has 5657 elements
Group 9 ( SideChain-H) has 2048 elements
Group 10 ( Prot-Masses) has 8325 elements
Group 11 ( non-Protein) has 68 elements
Group 12 ( Other) has 68 elements
Group 13 ( L1) has 4 elements
Group 14 ( L2) has 3 elements
Group 15 ( L3) has 8 elements
Group 16 ( L4) has 8 elements
Group 17 ( L5) has 14 elements
Group 18 ( L6) has 2 elements
Group 19 ( L7) has 3 elements
Group 20 ( L8) has 1 elements
Group 21 ( L9) has 10 elements
Group 22 ( L10) has 10 elements
Group 23 ( L11) has 1 elements
Group 24 ( L12) has 4 elements
Group 25 ( Protein_Other) has 8393 elements
Group 26 (C-alpha_Other_&_!H*) has 574 elements
Reading frame 4 time 80.000
=======================
Cluster Log output
=======================
Command:
=======================
'gmx_clusterByFeatures cluster' -f inputs/trajectory.xtc -s inputs/complex_res_segments.tpr -n inputs/index.ndx -feat proj.xvg -nfeature 5 -method kmeans -cmetric prior -ncluster 5 -g cluster-5 -cpdb clustered_trajs/central.pdb -fout clustered_trajs/cluster.xtc -outframe 1000 -plot pca-cluster-5.png -sort features
=======================
Choose a group for the output:
Selected 0: 'System'
Choose a group for clustering/RMSD calculation:
Selected 12: 'Other'
Choose a group for fitting or superposition:
Selected 26: 'C-alpha_Other_&_!H*'
Input Trajectory dt = 20 ps
###########################################
########## NUMBER OF CLUSTERS : 5 ########
###########################################
===========================
Cluster-ID TotalFrames
1 7501
2 7485
3 3872
4 2205
5 1438
===========================
#####################################
Final number of cluster selected: 5
#####################################
Calculating central structure for cluster-5 ...
Reading frame 4 time 188780.000 <string>:127: MatplotlibDeprecationWarning: The non_interactive_bk attribute was deprecated in Matplotlib 3.9 and will be removed in 3.11. Use ``matplotlib.backends.backend_registry.list_builtin(matplotlib.backends.BackendFilter.NON_INTERACTIVE)`` instead.
Back Off! I just backed up clid.xvg to ./#clid.xvg.1#
Reading frame 4000 time 188780.000
GROMACS reminds you: "Those people who think they know everything are a great annoyance to those of us who do." (Isaac Asimov)
===========================================
Cluster-ID Central Frame Total Frames
1 2786 7501
2 18119 7485
3 10774 3872
4 12874 2205
5 9439 1438
===========================================
Extracting coordinates of the central structure...
Calculating RMSD between central structures...
=====================================
Central structurs - RMSD matrix
=====================================
c1 c2 c3 c4 c5
0.000 0.418 0.603 0.619 0.599
0.418 0.000 0.512 0.479 0.536
0.603 0.512 0.000 0.253 0.221
0.619 0.479 0.253 0.000 0.266
0.599 0.536 0.221 0.266 0.000
=====================================
Writing central structure to pdb-files...
Writing trajectory for each cluster...
7. Analysis
Now, we will perform following analysis on obtained clusters:
Comparison of RMSDs within and between the clusters: It will show the qualitative measure of separation of conformations of clusters in term of RMSD.
Plotting PC vs PC cluster-wise. In fact, this is already plotted in the above obtained
pca-cluster-5.pngfile. However, we will focus on first three PCs to demonstrate the distribution of conformation in PC space.Cluster-ID with time: We will plot cluster-id as a function of time to analyze, how conformation is changing between clusters as a function of time.
At first, we will load Python modules and define some functions as follows:
[2]:
import re
import sys
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import csv
[3]:
def read_xvg(filename):
''' Read any XVG file and return the data as 2D array where data is row-wise with respect to time.
'''
fin = open(filename, 'r')
data = []
for line in fin:
line = line.rstrip().lstrip()
if not line:
continue
if re.search('^#|^@', line) is not None:
continue
temp = re.split('\s+', line)
data.append(list(map(float, temp)))
data = np.asarray(data)
return data.T
1a. Calculation of RMSDs within and between the clusters
At first, we need to calculate RMSDs of complex within and between the clusters using gmx rms command as follows.
Note: The complex structure is already superimposed when separated cluster-trajectory were written in the previous step, therefore, we are not performing fitting in RMSD calculations below,
Note: Remove %%capture --no-stdout and %%capture --no-stderr to populate all the output generated from gmx rms commands.
[8]:
%%capture --no-stdout
%%capture --no-stderr
%%script bash
# make directory for rmsd files
mkdir clustered-rmsd
for i in `seq 1 5`
do
for j in `seq 1 5`
do
echo 26 | gmx rms -f clustered_trajs/cluster_c${j}.xtc -s clustered_trajs/central_c${i}.pdb -n inputs/index.ndx -o clustered-rmsd/c${i}_c${j} -nopbc -fit none
done
done
1b. Comparison of ligand RMSDs within and between the clusters
We will use Python to plot all the obtained RMSDs above. Conformations in cluster trajectory is sorted by distance in feature-space and therefore, RMSD will be randomly fluctuate. RMSDs within the same cluster is expected to be lowest.
[9]:
mpl.rcParams['font.size']=12
fig = plt.figure(figsize=(12,16))
fig.subplots_adjust(top=0.98, bottom=0.05, hspace=0.25)
for i in range(1,6):
ax = fig.add_subplot(6, 2, i)
for j in range(1,6):
filename = 'clustered-rmsd/c{0}_c{1}.xvg'.format(i, j)
label = "{0} vs {1}".format(i, j)
data = read_xvg(filename) # read file
if i==j:
ax.scatter(data[0]/1000, data[1], label=label, lw=0, s=2)
else:
ax.scatter(data[0]/1000, data[1], label=label, lw=0, s=0.5)
ax.set_ylim(0, 0.5)
ax.set_xlabel('Time (ns)')
ax.set_ylabel('RMSD (nm)')
plt.legend(loc='upper center', ncol=4, markerscale=10, borderaxespad=0.1, columnspacing=1, handlelength=1, handletextpad=0.4)
plt.savefig('rmsd-comparison.png', dpi=300)
plt.show()
3. Plotting PC vs PC cluster-wise
It will be done in two steps:
An input file will be prepared containing information about feature searial and their labels.
gmx_clusterByFeatures featuresplotwill be used to generate the plot.
[1]:
%%bash
# First step - preparation of input file
echo "1,2,PC-1,PC-2" > features-label.txt
echo "2,3,PC-2,PC-3" >> features-label.txt
echo "1,3,PC-1,PC-3" >> features-label.txt
echo "1,4,PC-1,PC-4" >> features-label.txt
cat features-label.txt
# Second step - plotting
gmx_clusterByFeatures featuresplot -i features-label.txt -feat proj.xvg -clid clid.xvg -lcols 6 -o protein-ligand-interaction-PCs-vs-PCs.png
gmx_clusterByFeatures featuresplot -i features-label.txt -feat proj.xvg -clog cluster-5.log -hist -bins 20 -lcols 6 -o protein-ligand-interaction-PCs-vs-PCs-hist.png
1,2,PC-1,PC-2
2,3,PC-2,PC-3
1,3,PC-1,PC-3
1,4,PC-1,PC-4
Conformations are highlighted in the feature sub-space:

Histogram of conformations with central structures in the feature sub-space:

3. Cluster-ID with time
We will use clid.xvg obtained from the cluster subcommand to plot both cluster-id and also highlight the occurance of the given cluster.
[11]:
mpl.rcParams['font.size']=12
data=read_xvg('clid.xvg')
fig = plt.figure(figsize=(8,6))
fig.subplots_adjust(right=0.85)
ax = fig.add_subplot(111)
ax.scatter(data[0]/1000, data[1], marker='x', color='k')
ax.set_ylabel('Cluster')
ax.set_xlabel('Time (ns)')
plt.savefig('clid.png', dpi=300)
plt.show()
8. MM/PBSA Analysis
There are advantages of performing MM/PBSA after the clustering:
Speed-up in MM/PBSA calculations - In this particular case, if we have used whole trajectory of 450 ns with a time-difference of 100ps, a total of 4500 frames will be used for the calcualtions. However, if we restrict MM/PBSA calculations to central structure and near-by conformations in feature-space of each cluster, we could use only few frames for MM/PBSA calculations. For example, if we use only 10 frames per cluster, in total only 50 frames will be considered for the calculations. Therefore, we will gain 90 times speed-up without losing the accuracy in the final average binding energy (shown below).
By calculating residues contribution to Binding Energy, we could easily discriminate residues that are forming interactions between diffrent clusters with their interaction strength as energy value.
1. MM/PBSA calculations
Now, we will perform MM/PBSA calculations using g_mmpbsa. This will highlight binding energy difference between clusters and also difference in interacting residues that are contributing to the binding.
At first, MM/PBSA calculation is performed as follows. For each cluster, only 10 frames will be used with a virtual time-diffrence of 100 ps for the MM/PBSA calculations. This setup starts calculation with central structure and slowly moves away from it in feature-space and picks 9 more structure from the same cluster.
Options for MM/PBSA calculations
-dt 100- consider frame at 100ps time-difference-e 1000- consider frames upto 1000ps-unit1 1- First atoms-group for binding energy - here it is protein-unit2 12- Second atoms-group for binding energy - here it is ligand-pdie 1- Dielectric constant during MM Electrostatic energy calculation-ddc- Enable distance-depenedent dielectric constant during MM Electrostatic calculations. It means that farther the atoms from each other, larger the decrease in electrostatic interaction. It particulalry reduces the interaction energy of charged ligand with charged protein-residues that are far apart in the complex, thereby highlighting contribution of charged residues that are only nearby to the ligand. Therefore, it is particulalry useful for protein and charges ligand where residues contribution to binding need to be studied. It does not affect MM van der Waals interaction. This option should not be used when comparing with experimental binding energy or for final binding energy as the results are not yet validated.-mmeEnable MM calculations-pbsaEnable PBSA calcualtions-decompEnable binding energy decomposition over residues to calculate residues contribution to binding energy-silentDo not display output from APBS
This command could take a long time to execute!
Also following command will utilize all the CPU cores to speed-up the calculations
Note: Remove %%capture --no-stdout and %%capture --no-stderr to populate all the output generated from gmx rms commands.
[1]:
%%capture --no-stderr
%%capture --no-stdout
%%script bash
mkdir mmpbsa
for i in `seq 1 5`
do
g_mmpbsa run -s inputs/complex_res_segments.tpr -f clustered_trajs/cluster_c${i}.xtc -n inputs/index.ndx \
-i inputs/pbsa.mdp -mm mmpbsa/energy_MM_c${i}.xvg -pol mmpbsa/polar_c${i}.xvg \
-apol mmpbsa/apolar_c${i}.xvg -mmcon mmpbsa/residues_MM_c${i}.dat \
-pcon mmpbsa/residues_polar_c${i}.dat -apcon mmpbsa/residues_apolar_c${i}.dat \
-o mmpbsa/binding_energy_c${i}.xvg -os mmpbsa/energy_summary_c${i}.csv \
-ores mmpbsa/residues_energy_summary_c${i}.csv -opdb mmpbsa/energy_c${i}.pdb \
-dt 100 -e 1000 -unit1 1 -unit2 12 -pdie 1 -ddc -mme -pbsa -decomp -silent
done
2. Extract and plot binding-energy cluster-wise
We will extract the binding energy from the output CSV file and plot the binding energy for all the clusters. We will also calculate weighted average binding energy (using cluster population) representing final binding energy from full trajectory.
For comparison, we also plot binding energy obtained from full trajectory with a time-difference of 500ps (900 frames). The output files of these calculations are provided in mmpbsa-full folder.
[51]:
# Population fraction of each cluster, manually calculated from cluster log file
population_weights = [0.333362962, 0.332651882, 0.172081241, 0.097995645, 0.063908271]
# First we extract both average binding energy and its standard deviation for each cluster from respective CSV file
average, stddev = [], []
for i in range(1, 6):
with open(f'mmpbsa/energy_summary_c{i}.csv') as csvfile:
reader = csv.DictReader(csvfile, quotechar='"')
for row in reader:
if row['Energy'] == 'Total':
average.append(float(row['Average']))
stddev.append(float(row['Standard-Deviation']))
# Now calculate weighted avearge and combined standard-deviation
weighted_average = sum([av*wt for av, wt in zip(average, population_weights)])
combined_stddev = 0
for sd, wt in zip(stddev, population_weights): # here we combine SD one-by-one (https://www.statstodo.com/CombineMeansSDs.php)
if combined_stddev == 0:
combined_stddev = sd
else:
combined_stddev = (combined_stddev**2 + sd**2)**0.5
# Plotting the results
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.bar(np.arange(2, 7), average, yerr = stddev, error_kw={'elinewidth':5})
ax.bar([0, 1], [-132.386, weighted_average], yerr = [45, combined_stddev], error_kw={'elinewidth':5}) # plot energy from full trajectory and weighted average energy from clusters
ax.set_ylabel('Energy (kJ/mol)')
ax.set_xticks(np.arange(0, 7))
ax.set_xticklabels(['Full\nTrajectory', 'Weighted\nAverage\nClusters'] + [f'Cluster-{cid}' for cid in list(np.arange(1, 6))])
ax.tick_params(bottom=False, labelbottom=False, top=True, labeltop=True)
plt.savefig('mmpbsa-binding-energy.png', dpi=300)
plt.show()
3. Extract and plot protein residues contribution to the binding
Here, we extracts residues contribution towards binding that have at least -4 kJ/Mol of energy. This analysis highights the differences in interaction residue-wise between the clusters.
[31]:
# Population fraction of each cluster, manually calculated from cluster log file
population_weights = [0.333362962, 0.332651882, 0.172081241, 0.097995645, 0.063908271]
# Read CSV files and extract the energy
residues, average_clusters, stddev_clusters = [], [], []
for i in range(1, 6):
average, stddev = [], []
with open(f'mmpbsa/residues_energy_summary_c{i}.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
average.append(float(row['total']))
stddev.append(float(row['total-stddev']))
if i == 1:
residues.append(row['Residue'])
average_clusters.append(average)
stddev_clusters.append(stddev)
# Calculate weighted-average and add data as first column
weighted_average = np.average(average_clusters, axis=0, weights=population_weights)
average_clusters = [weighted_average, *average_clusters]
average_clusters = np.array(average_clusters)
# First, only consider protein residues, and
# filter out residues that have equal/less than -4 kJ/mol energy in at least one cluster
average_clusters_only_protein = average_clusters[1:,:-12] # there are 12 ligand residues, so discarded last 12
min_values_by_clusters = np.amin(average_clusters_only_protein, axis=0) # Extract minimum energy values among all clusters
indices_residues_energy_threshold = np.nonzero(min_values_by_clusters < -4) # Determine the indices of residues that are under the thershold value
filtered_residues = np.asarray(residues)[indices_residues_energy_threshold] # Extract the residues that are under the thershold value
## Now plot the energies (adapted from here: https://matplotlib.org/stable/gallery/lines_bars_and_markers/barchart.html#sphx-glr-gallery-lines-bars-and-markers-barchart-py)
x = np.arange(len(filtered_residues)) # the label locations
width = 0.1 # the width of the bars
multiplier = 0
fig = plt.figure(figsize=(12,6))
ax = fig.add_subplot(111)
ax.grid()
for values in average_clusters[:,indices_residues_energy_threshold[0]]:
offset = width * multiplier
if multiplier == 0:
label = 'All'
else:
label = f'C{multiplier}'
rects = ax.bar(x + offset, values, width, label=label, zorder=10)
multiplier += 1
ax.set_ylabel('Energy (kJ/mol)')
ax.tick_params(bottom=False, labelbottom=False, top=True, labeltop=True)
ax.set_xticks(x + 0.25, filtered_residues, rotation=90)
ax.legend(loc='lower right', ncols=6)
plt.savefig('mmpbsa-residues-binding-energy.png', dpi=300)
plt.show()
4. Extract and plot ligand residues contribution to the binding
We have divided ligand as collection of virtual-residues. It means, their contribution towards binding energy is also calculated automatically during MM/PBSA calculation. Here, we extracts ligand residues contribution towards binding. This analysis highights the differences in interaction residue-wise between the clusters. This also highlights which part of the ligand is favorable or unfavorable for the binding.
[34]:
average_clusters_only_ligand = average_clusters[:,-12:]
filtered_residues = np.asarray(residues)[-12:]
x = np.arange(len(filtered_residues)) # the label locations
width = 0.1 # the width of the bars
multiplier = 0
fig = plt.figure(figsize=(12,6))
ax = fig.add_subplot(111)
ax.grid()
for values in average_clusters_only_ligand:
offset = width * multiplier
if multiplier == 0:
label = 'All'
else:
label = f'C{multiplier}'
rects = ax.bar(x + offset, values, width, label=label)
multiplier += 1
ax.set_ylabel('Energy (kJ/mol)')
ax.tick_params(bottom=False, labelbottom=False, top=True, labeltop=True)
ax.set_xticks(x + 0.25, filtered_residues, rotation=90)
ax.legend(loc='lower right', ncols=6)
plt.savefig('mmpbsa-ligand-residues-binding-energy.png', dpi=300)
plt.show()