JavaScript Menu, DHTML Menu Powered By Milonic

Molecular Modeling Practical

This tutorial introduces the student to the practice of Molecular Dynamics (MD) simulations of proteins. The protocol used is a suitable starting point for investigation of proteins, provided that the system does not contain non-standard groups. At the end of the tutorial, the student should know the steps involved in setting up and running a simulation, including some reflection on the choices made at different stages. Besides, the student should know how to perform quality assurance checks on the simulation results and have a feel for methods of analysis to retrieve information.

Structural analysis: properties derived from configurations

When asked for a selection choose "Protein" if no selection is specifically stated or does not follow logically from the text.

To get rid off the noise, please use the 'Running Average' method in 'Data->Transformation' to smooth your graphs with xmgrace

Having assured that the simulation has converged to an equilibrium state, it is time to do some real analysis. Analysis of simulation data can be divided into several categories. The first category comprises of the interpretation of single configurations according to some function to obtain a value, or a number of values, for each time point. The RMSD and the radius of gyration are examples. Such properties can be called configurational, dependent or instantaneous properties. Next to that, one can analyze processes in the time domain, e.g. through averaging, as (auto)correlations or fluctutations. In this section, a number of common analyses will be performed, each of which yields a time series of values directly derived from the trajectory (the coordinates over time). The questions may refer to the screen output from the program or to graphs.

Solvent accessible surface area

One property which can be of interest is the surface area of the protein which is accessible to solvent, commonly referred to as the solvent accessible surface (SAS) or the solvent accessible surface area (SASA). This can be further divided into a hydrophilic SAS and a hydrophobic SAS. In addition, the SAS can be used together with some empirical parameters to obtain an estimate for the free energy of solvation. All four of these parameters are calculated by the program g_sas. This program also allows to calculate the average SAS over time per residue and/or per atom. Issue the following command, specifying Protein both for the group to calculate the SAS for and for the output group, and have a look at the output files.

g_sas -f traj_nojump.xtc -s topol.tpr -o solvent-accessible-surface.xvg -oa atomic-sas.xvg -or residue-sas.xvg

==Q== Focusing on the loop1 (residues 53-62), which residues are the most accessible to the solvent?

Hydrogen bonds

Another property which can be informative is the number of hydrogen bonds, either internally (Protein - Protein) or between the protein and the surrounding solvent. The presence or not of a hydrogen bond is inferred from the distance between a donor-H - acceptor pair and the donor - H - acceptor angle. To calculate the hydrogen bonds issue the following commands, and have a look at the output files using xmgrace:

echo 1 1 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-intra-protein.xvg

echo 1 12 | g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-protein-water.xvg

==Q== Discuss the relation between the number of hydrogen bonds for both cases and the fluctuations in each.

Specific hydrogen bonds can be investigated using an index file that contains the numbers of the atoms to be included. The plot of the RMSF obtained during the first part of the analysis and the b-factors showed high values for the loops 2 and 3 (around the helix 2). The information we got from experiments shows also that the loop 1 may play a role in the different behaviour displayed by UbcH6 and UbcH8. Have a look at the hydrogen bonding involing these loops. The first command will create 3 new entries in the menu for the groups selection, one per loop. The second command is a generic command to run g_hbond with an index file. You may want to look at the hydrogen bonds within each loop, at the hydrogen bonds between loop1 and loop2 for example, at the hydrogen bonds between a specific loop and the rest of the protein (you will need to modify the index file for this purpose, ask for some help) or with the water, etc ...

echo "r 53-62\nname 16 loop1\nr 87-94\nname 17 loop2\nr 110-117\nname 18 loop3\nq" | make_ndx -f confout.gro -o my_index.ndx

g_hbond -f traj_nojump.xtc -s topol.tpr -num hydrogen-bonds-loop.xvg -n my_index.ndx

==Q== What can you say about the stability of the hydrogen bonds for these loop regions?

==Q== On the hydrogen bonding basis, which loop is the more, respectively less, stable? Did you detect H-bonding between loops?

Salt bridges

Besides hydrogen bonds, proteins also often form salt bridges between oppositely charged residues. These can have an important stabilizing effect on the structure of the protein, in particular when they are embedded in a hydrophobic environment, such as the core of the protein. But saltbridges can also be observed at the exposed surface of the proteins, often important then to mediate recognition processes. The existence of salt bridges can be investigated with g_saltbr. When requested (through the flag -sep) this program will generate an output file for every pair of oppositely charged residues which at some point in the trajectory are within a certain cut off distance from each other (here 0.5 nm, specified through the option -t). This gives a large number of files, so it is best to make a separate directory for running this analysis. Execute the following commands:

mkdir saltbridge

cd saltbridge

g_saltbr -f ../traj_nojump.xtc -s ../topol.tpr -t 0.5 -sep

Just to make things a bit clearer, remove the files involving sodium and chloride ions:

rm -f *CL-* *NA+* \#*

Now come back to your working directory

cd ../

Secondary structure

Among the most common parameters to judge protein structure is the assignment of secondary structure elements, such as α-helices and β-sheets. One such assignment is provided by dssp. Follow the instructions on how to load it for your session.

export DSSP=/home/p162691/PROGRAMS/dssp

Gromacs provides an interface to dssp, to allow the calculation of secondary structure for each frame of a trajectory.

do_dssp -f traj_nojump.xtc -s topol.tpr -o secondary-structure.xpm -sc secondary-structure.xvg -dt 10

The file secondary-structure.xvg contains a time series, listing the numbers of residues associated with each type of secondary structure per frame. More detailed information is in the .xpm file, which gives a color coded assignment of the secondary structure per residue over time. The .xpm file can be viewed with e.g. the Gimp, but some useful metadata can be added with the gromacs tool xpm2ps and the result can be viewed with gview or a similar program.

xpm2ps -f secondary-structure.xpm -o secondary-structure.eps

gv secondary-structure.eps

==Q== Discuss some of the changes in the secondary structure, if any.

==Q== Compare the stability of the secondary structures in the different proteins.

Ramachandran (phi/psi) plots

Phi, psi and omega angles, and the ramachandran plot

The phi and psi torsion angles of the protein backbone are two parameters that are useful to gain insight into the structural properties of a protein. The plot of phi against psi is called a Ramachandran plot and certain regions of this plot are characteristic of secondary structure elements or of amino acids. Other regions are considered forbidden (inaccessible). Projection of the phi/psi angles over time may give insight in structural transitions. These angles can be calculated by the program g_rama, albeit in a bit of a crude way, namely put all together in a single file. To investigate single residues, the linux progam 'grep' can be used to select them out of the plot.

g_rama -f traj_nojump.xtc -s topol.tpr -o ramachandran.xvg

This file contains all phi/psi angles for all residues. To extract the angles for a single residue, e.g. LYS-60, type:

grep LYSH-60 ramachandran.xvg | awk '{print $1 " " $2'} > rama-LYSH-60.xvg

Do this to extract the following ramachandran plots, and view them using xmgrace. In the structures deposited in the PDB, residues 82-86 were classified helical. Check if it is still true for your simulation?

==Q== What can you say about the conformation of these residues, based on the ramachandran plots (see the plot given above)?

Analysis of dynamics and time-averaged properties

Root mean square deviations again

The root mean square deviation (RMSD) was already calculated to check the convergence of the simulation. But it can also be used for further analysis. The RMSD is a measure relating two structures. If we calculate the RMSD for each combination of structures in a trajectory file, then we can see if there are groups of structures belonging together, sharing structural features. Such structures will have lower RMSD values within the group and higher RMSD values with other structures. Representing the RMSD values in a matrix, this also allows identifying transitions.

To build an RMSD matrix, g_rms is called with two trajectories. In our case, we will first use the same trajectory for both. We will only take every second frame in the trajectory file. Select the group "Mainchain+Cb".

==Q== What is interesting by choosing the group "Mainchain+Cb" for this analysis?

g_rms -s topol.tpr -f traj_nojump.xtc -f2 traj_nojump.xtc -m rmsd-matrix.xpm -dt 10

The resulting matrix is in grayscale. To make it more clear, we are going to apply a rainbow gradient.

xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps -rainbow blue

gv rmsd-matrix.eps

==Q== How many transitions do you see?

It is also really interesting to compare your trajectory with the ones of your colleagues.

Cluster analysis

On the basis of the distances between structures, the RMSD, the these can also be assigned to a set of clusters that reflects the range of conformations accessible and the relative weight of each of these. This can be done using a clustering algorithm, of which several are implemented in g_cluster. This program generates a number of output files. Check the programs help file to see what each means. Then run the program. Note that we already calculated the RMSD matrix and can use it as input to g_cluster.

g_cluster -h

echo 6 6 | g_cluster -s topol.tpr -f traj_nojump.xtc -dm rmsd-matrix.xpm -dist rmsd-distribution.xvg -o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg -clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.1 -method gromos -dt 10

==Q== How many clusters were found and what were the sizes of the largest two?

Change the cutoff value to obtain a decent number of clusters (10 < Nclusters < 100).

Open the file clusters.pdb with pymol and compare the structures from the first two clusters.

disable all

split_states clusters

delete clusters

/for i in range(3,100): cmd.delete( "clusters_%04d" %i )


show cartoon

util.cbam clusters_0002

align clusters_0001 and ss h, clusters_0002 and ss h

==Q== Are there notable differences between the two structures?

Distance RMSD

One drawback of using the RMSD for comparing structures is that it involves least squares fitting, which will influence the results. But the structure of a protein can also be represented by the set of interatomic distances. These can be used to obtain a measure of comparison that does not depend on fitting: the distance RMSD (dRMSD). This can be obtained using the program g_rmsdist.

g_rmsdist -s topol.tpr -f traj_nojump.xtc -o distance-rmsd.xvg

==Q== At what time and value does the dRMSD converge and how does this graph compare to the standard RMSD?

The end

his concludes our practical. Thank you for following it!

This tutorial is to a large extent derived from a previous work of Dr. Tserk A. Wassenaar.