JavaScript Menu, DHTML Menu Powered By Milonic

Molecular Modeling Practical

This tutorial introduces the student to the practice of Molecular Dynamics (MD) simulations of peptides. The protocol used is a suitable starting point for investigation of peptides, provided that the system does not contain non-standard groups. At the end of the tutorial, the student should know the steps involved in setting up and running a simulation, including some reflection on the choices made at different stages. Besides, the student should know how to perform quality assurance checks on the simulation results and have a feel for methods of analysis to retrieve information.

Analysis of data from molecular dynamics simulation

When the simulation has finished, it is time to continue with the analysis of the data. The analysis comprises of three stages. First, it is necessary to perform some standard checks to assess the quality of the simulations. If the results from these analyses show that the simulations are fine, then the analyses required to answer the research question asked can be performed per simulation. Finally, the results from different simulations can be combined.

NOTE: The actual names of files depend on the choices made during the preparation/run. Here we assume that the default names have been used, which means that there are the following files:

  • topol.tpr: The run input file containing a complete description of the system at the start of the simulation
  • traj.trr: The full precision trajectory containing the positions, velocities and forces over time
  • traj_comp.xtc: A light weight trajectory, containing only coordinates in low precision (0.001 nm)
  • ener.edr: Energy related parameters over time
  • md.log: A file containing information about the simulation.

As a side note, many of the analysis tools produce graphs in the form of .xvg files. these files can be viewed with the program xmgr or xmgrace, but they can also be viewed in a terminal using the python script xvg2ascii.py.

Contents

  1. Integrity of results
  2. Visualization of results
  3. Quality assurance
  4. Structural analysis
  5. Ensembles, dynamics and time-averaged properties

A brief check of results

Before all else, it has to be verified that the simulations finished properly. There are many things that may cause simulations to crash, especially problems with the force field or insufficient equilibration of the system. To check whether the simulation finished properly, use the gromacs program check:

gmx check -f traj_comp.xtc

You can see how long the simulation has run for. Although the simulation may not have finished, the analysis tools will work with the available frames, so you can practise the analysis and re-run the tools later when the simulation is complete, or on other simulation results provided.

==Q== How many frames are in the trajectory file and what is the time resolution? ( T )

Another important source of information about the simulation is the log file. At the end of the file md.log, or md0.log in case the simulation ran on multiple processors, statistics about the simulation are given. This includes information about the use of CPU and memory resources and about the simulation time. Have a look at the end of the log file. (Note that this information is only available when the simulation is finished.) If you use 'less' you may skip all the way to the end of the file by pressing 'G' (shift-g)

==Q== How long did the simulation run in real time (hours), what was the simulation speed (ns/day) and how many years would the simulation take to reach a second? ( T )

Visualization of results

Now it's time for the fun part. Although much of the analysis will come down to extracting graphs from the trajectories, molecular dynamics is of course foremost about the motions in the system. Time to have a look at the trajectory.

First have a look with the trajectory viewer supplied with gromacs, gmx view. Although rudimentary and visually less appealing than other viewers, it has the advantage that it draws bonds based on the information in the topology file. Other viewers may infer bonds from distances, which can cause bonds that are considered too long not te be drawn or will add bonds between atoms that come too close together. This is a common source of misinterpretation of simulation results.

In classical simulations it is impossible for bonds to form or break.

Use gmx view to load the topology and trajectory:

gmx view -f traj_comp.xtc

Have a look at the menus and try out the main options. First, select what part of the system to visualize. You can change this at any later time, by pressing the menu "Display-Filter...". To play the movie, press the menu "Display-Animate", then release. In the window that appears below, you can take single steps, go quickly through all frames, rewind to the beginning, and stop. The view is controlled by the options on the right. For each of these options right- or left-click the mouse to change the view. To exit the program, press the menu "File-Quit" and release.

==Q== What happens if the peptide diffuses over the boundary of the box?

For nicer visualization, we will now extract frames at every 10 ps from the trajectory (-dt 10), leaving out the water (select Protein when asked for a selection). Moreover, we will remove the translation and rotation of the peptide as a whole, thereby focusing on the internal structure. For these things there's the Swiss army knife gromacs tool trjconv, with its 1001 possible combinations of options. We use it to write a multi-model pdb file, to allow visualization in Pymol or VMD.

When asked for a selection choose "Protein" if no selection is specifically stated or does not follow logically from the text.

For a multi-chain protein, the chains may not be held together when using the standard fitting procedure. A robust manner is a two-step fitting:

gmx trjconv -f traj_comp.xtc -o cluster.xtc -pbc cluster -dt 10

Select protein for clusering and protein for output. Then:

gmx trjconv -f cluster.xtc -o peptide.pdb -fit rot+trans

When prompted to select groups, select the backbone for the fitting and the protein for the output.

We will first have a quick look at the peptide using VMD.

vmd peptide.pdb

After all frames have loaded, you can show the frames by clicking the forward button in the main window. You will see the dynamics of the peptide.

The default view is a stick representation. You can change the representation (or actually view multiple representations) through opening the menu "Graphics - Representation". Pick, for example, ResID for Coloring Method and "New Cartoon" for Drawing Method.

To get a view of the conformational space visited, all frames can be shown at the same time. We want to show all of the atoms again. First, create a second representation, by clicking "Create Rep"; then choose "Licorice" as the Drawing Method and "Name" as the Coloring Method. Now, click on this representation in the menu. To show all frames at the same time, click on "Trajectory" tab in the Graphical Representations menu. Instead of the word "now (near the bottom), type 1:11, which will show the first 11 frames, so if you want to see all of them, figure out how many frames there are... For a good view, you may need to decrease the thickness of the bonds a bit - you can do that from the "Draw style" tab.

Quality assurance

After a first visual inspection of the trajectory, it's time to perform some more thorough checks regarding quality of the simulations. This quality assurance (QA) involves tests for the convergence of thermodynamic parameters, such as the temperature, the pressure, the potential and the kinetic energy. More commonly phrased, QA tries to assess whether an equilibrium was reached. Convergence is also checked in terms of the structure, through the root mean square deviation (RMSD) against the starting structure and the average structure. Next to that, it has to be checked that there have not been interactions between adjacent periodic images, as such interactions could lead to unphysical effects. A final QA test involves the calculation of the root mean square fluctuations of atoms, which can be compared to crystallographic b-factors.

Convergence of energy terms

We start off with the extraction of some thermodynamic parameters from the energy file. The following properties will be investigated: temperature, pressure, potential energy, kinetic energy, unit cell volume, density, and the box dimensions. Most of these have been checked already for some of the preparation steps. Energy analysis is performed with the tool gmx energy. This program reads in the energy file, which is produced during the simulation and has the extension .edr. gmx energy will ask for a term to extract from the energy file and will produce a graph from that. Give the following command to run gmx energy:

gmx energy -f ener.edr -o temperature.xvg

This will give a list of the energy and related terms which are stored in the .edr energy file. In our case, there are probably more than 60 terms in the energy file, each of which may be extracted and graphed. The first set of entries correspond to different energy terms in the force field. Also note the group of entries which list the energy terms split over the groups Protein and Non-Protein, including the interactions between the two. To extract the temperature, you can type "Temperature" and press Enter twice, or enter the number for the temperature from the list, then enter 0 and press Enter. Have a look at the graph with the program xmgrace and see how the temperature fluctuates around the value specified (300 K).

xmgrace -nxy temperature.xvg

==Q== What is the average temperature of the system? ( T )

Have a look also at the kinetic and potential energies and see how the values converge. Also, check the volume. If the values have not converged, this indicates that the simulation has not yet reached thermal equilibrium and that it should be extended before doing further analysis. Moreover, the period over which equilibration takes place should not be included in the analysis. Here, for the sake of simplicity, we will neglect these considerations and use the results from the simulations as they are.

==Q== Estimate the plateau values for the kinetic and potential energies and the volume. ( T )

The equilibration of some terms takes longer than that of other terms. In particular, the temperature readily converges to its equilibrium value, whereas the interaction between different parts of the system may take much longer. In general, you would need to investigate the properties of interest for their convergence.

Minimum distances between periodic images

One of the most important things to check in terms of quality assurance is that there have not been direct interactions between periodic images. Since periodic images are identical, such interactions are unphysical self-interactions. Imagine that a peptide with a dipole would have direct interactions. Then the attraction between the two ends of the same molecule over the periodic boundary would affect and invalidate the results relating to the native behaviour of the peptide. To verify that such interactions have not taken place, we calculate the minimal distance between periodic images at each time. This is done using mindist.

gmx mindist -f traj_comp.xtc -s topol.tpr -od minimal-periodic-distance.xvg -pi

==Q== What was the minimal distance between periodic images and at what time did that occur? ( T )

Not only direct interactions are of concern, but indirect effects, mediated through the water can also pose a problem. For instance, the peptide can cause water ordening in the first four shells of water. That corresponds to about one nanometer. Ideally, the minimal distance should therefore not be less than two nanometer.

Root mean square fluctuations

Next to inspection of energies and such, convergence of the simulation towards equilibrium should also be inspected from the relaxation of the structure. Usually, such relaxation is only considered in terms of the Euclidean distance from the structure to a reference structure, e.g. the crystal structure. This distance is termed the root mean square deviation (RMSD). However, it is recommended also to investigate the relaxation towards the average structure, i.e. the RMSD with respect to the average. The reasons for this will be set out in the next paragraph. But to calculate the RMSD against the average structure, requires first to obtain the average. This structure can be obtained as a side product from calculation of the root mean square fluctuations (RMSF).

The RMSF captures, for each atom, the fluctuation about its average position. This gives insight into the flexibility of regions of the peptide. The RMSF (and the average structure) are calculated with gmx rmsf. We are most interested in the fluctuations on a per residue basis, which is controlled by the flag -res.

gmx rmsf -f traj_comp.xtc -s topol.tpr -o rmsf-per-residue.xvg -ox average.pdb -res

Have a look at the graph of the RMSF with xmgrace and identify the flexible and rigid regions.

==Q== Are there regions that are clearly more flexible than others? Can you identify these regions also by looking at the superimposed structures? ( T )

Convergence of RMSD

As the calculation of the RMSF also gave the average structure, we can now commence with calculating the RMSD. The RMSD is commonly used as an indicator of convergence of the structure towards an equilibrium state. As mentioned above, the RMSD is merely a distance measure and is most meaningful for low values. The RMSD is calculated using the program gmx rms. First calculate the RMSD for all peptide atoms, using the starting structure as a reference:

gmx rms -f traj_comp.xtc -s topol.tpr -o rmsd-all-atom-vs-start.xvg

==Q== At what time and value does the RMSD reach a plateau? ( T )

Now calculate the RMSD again, but only select the backbone atoms:

gmx rms -f traj_comp.xtc -s topol.tpr -o rmsd-backbone-vs-start.xvg

This time the RMSD settles at a lower value, which is the result of excluding the, often flexible, side chain atoms. In both cases the RMSD increases to a plateau value. This means that the structure of the peptide reaches a certain distance from the reference structure and then keeps that distance more or less. However, with increasing distance, the amount of conformations available also increases. This means that, although the RMSD has reached a plateau value, the structure may still be progressing towards its equilibrium state. For this reason, it is advisable also to check the convergence towards the average structure.

echo 1 | gmx trjconv -f traj_comp.xtc -s topol.tpr -o peptide.xtc

gmx rms -f peptide.xtc -s average.pdb -o rmsd-all-atom-vs-average.xvg

gmx rms -f peptide.xtc -s average.pdb -o rmsd-backbone-vs-average.xvg

Compare the resulting graphs to the previous ones. Note at which point the RMSD values level off.

==Q== Briefly discuss two differences between the graphs against the starting structure and against the average structure. Which is a better measure for convergence?

Convergence of radius of gyration

As a final part of the QA, we calculate the radius of gyration. This measure gives an indication of the shape of the molecule at each time. The radius of gyration compares to the experimentally obtainable hydrodynamic radius. It can be calculated using gmx gyrate. This program will also give the individual components, which correspond to the eigenvalues of the matrix of inertia. This means that the first individual component corresponds to the longest axis of the molecule, while the last corresponds to the smallest. In effect, the three axes give a global indication of the shape of the molecule. Issue the command

gmx gyrate -f traj_comp.xtc -s topol.tpr -p -o radius-of-gyration.xvg

Have a look at the radius of gyration and the individual components and note how each of these progress to an equilibrium value.

==Q== At what time and value does the radius of gyration converge? ( T )

Structural analysis: properties derived from configurations

At this point we have ended the first part of the analysis, involving a visual inspection and quality assurance checks. Now time has come to dig in a bit deeper and have a look at what's going on inside the peptide. This second part of the analysis covers structural properties that can be calculated from peptide configurations, such as the number of hydrogen bonds and the presence of salt bridges.

When asked for a selection choose "Protein" if no selection is specifically stated or does not follow logically from the text.

Having assured that the simulation has converged to an equilibrium state, it is time to do some real analysis. Analysis of simulation data can be divided into several categories. The first category comprises of the interpretation of single configurations according to some function to obtain a value, or a number of values, for each time point. The RMSD and the radius of gyration are examples. Such properties can be called configurational, dependent or instantaneous properties. Next to that, one can analyze processes in the time domain, e.g. through averaging, as (auto)correlations or fluctutations, and across simulations. In this section, a number of common analyses will be performed, each of which yields a time series of values directly derived from the trajectory (the coordinates over time). The questions may refer to the screen output from the program or to graphs.

Hydrogen bonds

A property which can be informative is the number of hydrogen bonds, either internally (Protein - Protein) or between the peptide and the surrounding solvent. The presence or not of a hydrogen bond is inferred from the distance between a donor-H - acceptor pair and the donor - H - acceptor angle. To calculate the hydrogen bonds issue the following commands, and have a look at the output files using xmgrace:

gmx hbond -f traj_comp.xtc -s topol.tpr -num hydrogen-bonds-intra-peptide.xvg

gmx hbond -f traj_comp.xtc -s topol.tpr -num hydrogen-bonds-peptide-water.xvg

==Q== Estimate the number of hydrogen bonds within the peptide and between peptide and solvent. ( T )

==Q== Discuss the relation between the number of hydrogen bonds for both cases and the fluctuations in each.

Salt bridges

Besides hydrogen bonds, peptides also often form salt bridges between oppositely charged residues. These can have an important stabilizing effect on the structure of the peptide, in particular when they are embedded in a hydrophobic environment, such as the core of the peptide. The existence of salt bridges can be investigated with the gromacs program saltbr. When requested (through the flag -sep) this program will generate an output file for every pair of oppositely charged residues which at some point in the trajectory are within a certain cut off distance from each other (here 0.5 nm, specified through the option -t). This gives a large number of files, so it is best to make a separate directory for running this analysis. Execute the following commands:

mkdir saltbridge

cd saltbridge

gmx saltbr -f ../peptide.xtc -s ../protein-EM-vacuum.tpr -t 0.5 -sep

Just to make things a bit clearer, remove the files involving sodium and chloride ions:

Be careful using the asterisk in Linux. rm * will delete everything in the directory (or folder) you are currently in!

rm *CL* *NA* \#*

Investigate the salt bridge(s) found in your simulations. Which residues are involved? Can you find the salt bridge(s) in the visualized structures?

==Q== What can you say about the interactions, stability and structural role of the salt bridge(s) in the Trp-cage?

If you are located in the 'saltbridge' directory, go back to the directory above that. You can do this by typing:

cd ..

Secondary structure

Among the most common parameters to judge peptide structure is the assignment of secondary structure elements, such as α-helices and β-sheets. One such assignment is provided by dssp. This program, which is not part of the gromacs distribution, but can be obtained at the CMBI, Radboud University, and is also available here. Gromacs does provide an interface to dssp, to allow the calculation of secondary structure for each frame of a trajectory. Be CAREFUL here, because the earlier versions of 2020 fail in this analysis. It was fixed from 2020.3. If that version is not available to you, find an alternative to perform the analysis, for example using the 'Timeline' in VMD (under the menu Extensions->Analysis).

With gromacs, first set an environment variable DSSP, which points to the location of the program. Then run do_dssp as follows:

export DSSP=your_dssp_exectuable

On the FSE computers at Groningen, you can use a version downloaded by one of your TAs:

export DSSP=/home/p162691/PROGRAMS/dssp-v2

You should also use a different version of GROMACS for the next part to work properly. You need to take the following steps (NOTE also that some commands later on are slightly changed). First, you need to change your version of GROMACS.

It is recommended that you load this other version in a separate terminal. You then need to take care that you use this version only for a few cases: (1) DSSP analysis (directly below) (2) converting xpm files to eps files

source /home/p162691/PROGRAMS/gromacs-5.1/bin/GMXRC

Next, you need to make a .tpr file for this version. Because this version is older, it is not able to work with the 2020 .tpr file.

gmx grompp -p protein.top -c protein-NPT.gro -f md.mdp -o v5.tpr

Now do the DSSP analysis (if you are using the older GROMACS version use the second command!!):

gmx do_dssp -f traj_comp.xtc -s topol.tpr -ver 2 -o secondary-structure.xpm -sc secondary-structure.xvg -dt 10

gmx do_dssp -f traj_comp.xtc -s v5.tpr -ver 2 -o secondary-structure.xpm -sc secondary-structure.xvg -dt 10

The file secondary-structure.xvg contains a time series, listing the numbers of residues associated with each type of secondary structure per frame. More detailed information is in the .xpm file, which gives a color coded assignment of the secondary structure per residue over time. The .xpm file can be viewed with e.g. the Gimp , but some useful metadata can be added with the gromacs tool xpm2ps and the result can be viewed with an appropriate program.

The next command does not work with the 2020 version!

gmx xpm2ps -f secondary-structure.xpm -o secondary-structure.eps

You can view the .eps file by clicking on it in a file browser - the system will find the right tool to show it.

==Q== Discuss some of the changes in the secondary structure, if any.

Ramachandran (phi/psi) plots

Phi, psi and omega angles, and the ramachandran plot

The phi and psi torsion angles of the peptide backbone are two parameters that are useful to gain insight into the structural properties of a peptide. The plot of phi against psi is called a Ramachandran plot and certain regions of this plot are characteristic of secondary structure elements or of amino acids. Other regions are considered forbidden (inaccessible). Projection of the phi/psi angles over time may give insight in structural transitions. These angles can be calculated by the program rama, albeit in a bit of a crude way, namely put all together in a single file.

If you'd wish to see what GROMACS can do, type the following:

gmx rama -f traj_comp.xtc -s topol.tpr -o ramachandran.xvg

xmgrace ramachandran.xvg

Analysis of dynamics and time-averaged properties

Root mean square deviations again

The root mean square deviation (RMSD) was already calculated to check the convergence of the simulation. But it can also be used for further analysis. The RMSD is a measure relating two structures. If we calculate the RMSD for each combination of structures in a trajectory file, then we can see if there are groups of structures belonging together, sharing structural features. Such structures will have lower RMSD values within the group and higher RMSD values with other structures. Representing the RMSD values in a matrix, this also allows identifying transitions.

To build an RMSD matrix, the gromacs tool rms is called with two trajectories. In our case, the same trajectory will be used for both.

gmx rms -s topol.tpr -f traj_comp.xtc -f2 traj_comp.xtc -m rmsd-matrix.xpm -dt 10

The next command does not work with the 2020 version!

gmx xpm2ps -f rmsd-matrix.xpm -o rmsd-matrix.eps

View the file rmsd-matrix.eps. How many transitions can you distinguish?

Cluster analysis

On the basis of the distances between structures, the RMSD, the these can also be assigned to a set of clusters that reflects the range of conformations accessible and the relative weight of each of these. This can be done using a clustering algorithm, of which several are implemented in gmx cluster. This program generates a number of output files. Check the programs help file to see what each means. Then run the program.

gmx cluster -h

gmx cluster -s topol.tpr -f traj_comp.xtc -dist rmsd-distribution.xvg -o clusters.xpm -sz cluster-sizes.xvg -tr cluster-transitions.xpm -ntr cluster-transitions.xvg -clid cluster-id-over-time.xvg -cl clusters.pdb -cutoff 0.25 -method gromos -dt 10

==Q== How many clusters were found and what were the sizes of the largest two? ( T )

Open the file clusters.pdb with VMD and compare the structures from the first two clusters.

==Q== What are the most notable differences between the first two structures?

Distance RMSD

One drawback of using the RMSD for comparing structures is that it involves least squares fitting, which will influence the results. But the structure of a peptide can also be represented by the set of interatomic distances. These can be used to obtain a measure of comparison that does not depend on fitting: the distance RMSD (dRMSD). This can be obtained using the program gmx rmsdist.

gmx rmsdist -s topol.tpr -f traj_comp.xtc -o distance-rmsd.xvg

==Q== At what time and value does the dRMSD converge and how does this graph compare to the standard RMSD?

Analysis of interatomic distances, NOE's from simulations

The program rmsdist used above can also be used for further distance analysis. In particular, it may be useful to look at the mean distances between atoms and the fluctuations in these distances to get an idea of the structure and stability. Calculate the matrices containing the mean distances and distance fluctuations for each pair of atoms in the peptide.

gmx rmsdist -s topol.tpr -f traj_comp.xtc -rms rmsdist.xpm -mean rmsmean.xpm -dt 10

==Q== Briefly explain the images: rmsmean in terms of structure and rmsdist in terms of flexibility/stability. Recall the information from earlier analysis and viewing the structure.

These distances are also important from an experimental point of view. Bounds on distances between atoms can be derived from NMR experiments, exploiting the Nuclear Overhauser Effect (NOE), and are the main driving force between NMR structure calculations. The NOE signals that should be expected for a peptide if the model is correct, can be calculated from a MD simulation. These are intimately linked to the distances, in particular to the r-3 and r-6 weighted distances. These can also be calculated using gmx rmsdist.

gmx rmsdist -s topol.tpr -f traj_comp.xtc -nmr3 nmr3.xpm -nmr6 nmr6.xpm -noe noe.dat -dt 10

Have a look at the data and form an opinion regarding the information contained and its relation to NMR data (see figure below, taken from Neidigh et al.). Note that the numbering in the experimental paper is different from the one in the simulation (the central Trp in the experimental paper is residue number 25; in our simulation it is residue number 6).

==Q== Which are the residues that will most probably lead to NOE signals? How do the simulation results compare to the experimental data? ( T )

Experimental NOE-data taken from Neidigh et al.

There is much more analysis that can be done, but we will leave it here for now. However, if you want to analyze a simulation that has run for a bit longer (50 ns), you can download it 50ns.xtc. Perhaps the amount of water and ions is slightly different from your own set-up and then you need to make a .tpr file to match, so the .top file is 50ns.top.