JavaScript Menu, DHTML Menu Powered By Milonic

Molecular Modeling Practical Nerd Page

This tutorial introduces the student to the practice of Molecular Dynamics (MD) simulations of proteins. The protocol used is a suitable starting point for investigation of proteins, provided that the system does not contain non-standard groups. At the end of the tutorial, the student should know the steps involved in setting up and running a simulation, including some reflection on the choices made at different stages. Besides, the student should know how to perform quality assurance checks on the simulation results and have a feel for methods of analysis to retrieve information.

Install Gromacs with MPI support

For those who know how to get around in Linux, it may be worth installing Gromacs yourself. In fact, you might want to install it from source. It's really one of the few programs that will compile with little else than ./configure and make.

But then again, it does pay off to install a library for doing Fourier transformations. So, first get FFTW and install that:

wget ftp://ftp.fftw.org/pub/fftw/fftw-3.2.tar.gz

tar xvzf fftw-3.2.tar.gz

cd fftw-3.2 && ./configure --enable-float --prefix=/home/YOU/fftw && make && make install

Now it's time to get Gromacs. For this tutorial, we use version 3.3.1. Feel free to (also) install another (newer) version, but be aware that some steps in the tutorial might need to be changed a bit then.

wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-3.3.1.tar.gz

tar xvzf gromacs-3.3.1.tar.gz

cd gromacs-3.3.1

Time for business! Set the environment variables CPPFLAGS and LDFLAGS to point to the FFTW directories. Then compile Gromacs. Setting the environment variables in the following way will work for bash/sh as well as for (t)csh, but since you went to this page, you'll know which part of the command to take.

setenv CPPFLAGS -I/home/YOU/fftw/include || export CPPFLAGS=-I/home/YOU/fftw/include

setenv LDFLAGS -L/home/YOU/fftw/lib || export LDFLAGS=-L/home/YOU/fftw/lib

./configure --prefix=/home/YOU/GMX331 && make && make install

Well, there you have it! But then again, this doesn't give you MPI support yet. To achieve that, you'll also have to fetch an MPI library and compiler.

wget http://www.lam-mpi.org/download/files/lam-7.1.4.tar.bz2

tar xvjf lam-7.1.4.tar.bz2

cd lam-7.1.4 && ./configure --prefix=/home/YOU/lam714 && make && make install

That should've done the trick. If it didn't, you'll probably have to resolve some dependencies. Assuming you got this one right (in the end), let's compile gromacs' mdrun with MPI support. Go to the gromacs directory and type

make clean

setenv MPICC /home/YOU/lam714/bin/mpicc || export MPICC=/home/YOU/lam714/bin/mpicc

./configure --prefix=/home/YOU/GMX331 --enable-mpi --program-suffix="_mpi"

make mdrun && make install-mdrun

So, that should do it. Now you ought to be able to run your simulations on all two/four/eight or dedicate your home-made network to something else than FPSs. You can run a simulation on four processors/cores with

grompp -f parameters.mdp -c structure.pdb -p topology.top -o run.tpr -np 4

/home/YOU/lam714/lamboot

/home/YOU/lam714/bin/mpirun -np 4 mdrun_mpi -np 4 -s run.tpr

A bit of SED trickery

Sometimes it's just convenient to modify text files without actually opening them. Especially when writing scripts to automate tasks, including setting up simulations, it will frequently occur that you have to modify the contents of some file. Now, if you know what to change and how to change it, there's a good chance you can do it with just one or a few lines of code. Changing a single parameter in the simulation parameter file is one of these cases. For instance, we can change the pbc setting from 'none' to 'xyz' in the file minim.mdp using

sed -i /^pbc/s/no/xyz/ minim.mdp

This calls the program 'sed', the "stream editor", which is a powerful tool to handle text files or streams and offers many options for editing automatically. In the present case, the flag -i calls sed in interactive mode, which means that the file(s) on the command line are changed, rather than giving the edited stream as output to stdout. The next part tells sed what to do:
/^pbc/ is an address specifier. It tells sed to look for lines starting with the string 'pbc'. On those lines, the following will be executed:
s/none/xyz/ substitutes the part matching the first regexp, simply 'no' in this case, with the string specified in the second part, 'xyz'.

More SED trickery

The previous sed trick showed how to replace a word in a line starting with "pbc". sed can also be used to operate on blocks of text, by using two address specifiers; one for start and one for stop. The objective is simple: print the part of the file that starts with "Total NODE time". To indicate the end of the file, the special address specifier "$" is used. The command for printing is "p", so that makes the sed line '/Total NODE time/,$p'. But the common behaviour of sed is to print every line. To diasable that, the command line option -n is used:

sed -ne '/^Total NODE time'/,$p' md0.log

As a side note, we could also want a couple of lines following a line containing a specific piece of text. For instance, we are mainly interested in the performance. This is summarized in the four lines following "NODE (s)   Real (s)      (%)". To get these lines and print them, we seek the line containing "NODE (s)", print it (p) and read in the next line (n) and print it, repeating the last two commands three times. To group a set of commands to be executed at one address, use curly braces and semicolons:

sed -ne '/NODE (s)/{p;n;p;n;p;n;p;n;p}' md0.log

Pretty pictures with POV-Ray

The built-in raytracer of Pymol does a fair job producing images. In fact, it is now widely used for generating images for publications. In most cases the general settings are kept default and with a bit of training one will easily recognize which images were created with Pymol, VMD or molmol. Okay, but in most cases not standing out in terms of graphics. To make more from an image, it pays off to export the file to a format that can be read with a ray-tracing engine such as POV-Ray. One might argue that it makes no sense to do so, but an attractive image may draw attention to your work, especially if it stands out in one sense or the other.

For the following POV-Ray should be installed on your computer. Try the command 'povray' and if it doesn't show you a page of help, but rather "command not found", first install it. For other reasons than this tutorial to install POV-Ray, have a look at the sample page

Now download the Pymol script make_pov.py and start Pymol, loading your protein. Then, load the script using:

run make_pov.py

Set up the scene just the way you want it, changing colours, representation, etc. A lot of settings relating to the view can be changed in POV-Ray, but it doesn't allow you to view the results as quickly, since you have to ray-trace before you can get it on the screen. Also, colours can be changed in POV-Ray, but requires a bit of work (using sed). Best is to use Pymol to set everything up as nicely as you can and use POV-Ray for the polishing. If you're satisfied with the scene, export it to POV-Ray format using the command "make_pov" that was added to Pymol by the script, and then quit.

make_pov myprion.pov

quit

Now open the file myprion.pov in your favourite editor. It will probably look something like this:

camera {direction<0.0,0.0,  -2.835>
 location <0.0 , 0.0 , 0.0>
 right 2*x up y 
 }
#default { finish{phong   -1.000 ambient    0.500 diffuse    0.450 
phong_size 13.750000}}
light_source{<4000.0001,4000.0001,9978.9431>  rgb<1.0,1.0,1.0>}
plane{z , -182.9770 
 pigment{color rgb<0.0000,0.0000,0.0000>}
 finish{phong 0 specular 0 diffuse 0 ambient 1.0}}


// Uncomment the following lines if you have the pymolmacro.inc include file and want to use it.
/*
#include "pymolmacro.inc"
PYMOL_VIEW(   -0.48392,   -0.48639,   -0.72749,
               0.49048,   -0.83922,    0.23483,
	      -0.72474,   -0.24319,    0.64468,
	      -0.00008,   -0.00003,  -52.02350,
	      35.34558,   -4.81726,   11.94539,
	      21.05691,   82.97701,    0.00000 )
*/

#include "myprion.inc"

Without further editing, this file can be raytraced as a 1000x500 pixel, 8 bits/colour PNG image file using:

povray +Imyprion.pov +Onormal.png +FN8 +W700 +H350 +Q9 +A0.3

The option +Q sets the quality (9 is full) and +A sets the anti-aliasing. This gives the following image:

File rendered using standard settings

Cute, but Pymol does a better job than that. One thing that is not so nice is the sharpness of the shadows. This is due to the use of a single light source, like what you get in space with the light from the sun. On earth, light is scattered, making shadows look softer, just as if the light was coming from an area rather than a point. In POV-Ray we can mimick this by adding an array of lights, rather than a single light source. Note that this, jsut like any of the subsequent steps, increases the complexity of the scene, and consequently increases the rendering time. Replace the line specifying the light source with

light_source
{
  <40,40,98>
  rgb 1
  area_light <40,0,0>,<0,40,0>,5,5
  adaptive 1
  jitter
}

This gives the following image. Notice the softer shadows.

File rendered with area_light for softer 
shadows

Better, but the colours still look a bit dull. I like a glossy, shining plastic like appearance. Actually, I like to use that as the default appearance of surfaces. Replace the current #default statement with the following one and render again.

#default { finish {
        ambient .15
        diffuse .5
        specular 1
        roughness .001
        reflection {
          .5
          metallic
        }}}

File rendered with glossy finish

Now you can see the side chains reflecting in the surface of the helices :) But it should be a bit brighter. To make the colours more vivid, we can increase the intensity of the light. In the light_source statement, the color of the light is specified as 1, which is shorthand for <1,1,1>: 100% of red, green and blue. Let's just set it to two, increasing the light intensity twofold.

File rendered with glossy finish, but 
light intensity increased to 200%, giving brighter colours

Let's take a step further. One characteristic of photographic images is that they have focal blurring. With POV-Ray, we can add this effect to our image. Change the camera statement so that it reads:

camera {direction<0.0,0.0,  -2.835>
  location <0.0 , 0.0 , 0.0>
  right 2*x up y 
  focal_point < 0.0, 0.0, -100 >
  aperture 40
  blur_samples 20
}

The rendering will now take a bit longer, but this does add a dimension to the picture. You may have to tune the focal point and the aperture. An aperture of 40 is bit overdone, but it does illustrate the effect.

File rendered with focal blur

Focal blur also really helps making grayscale images stand out. Here's an example from my thesis, showing hemoglobin in stereo view.

Hemoglobin rendered in grayscale with 
focal blur

Debugging g_rms

This applies to Gromacs version 3.3.1. If we run g_rms with a reference structure that contains fewer atoms than the trajectory, we get a segmentation fault. Needless to say on this page that that is due to an attempt to access a part of memory that is forbidden. It seems there is something wrong with counters for iteration. One cause could be that the number of atoms from the trajectory is used where the number of atoms from the reference should be. Since the former is higher than the lower, that could lead to an iterator going out of array bounds. Let's have a look at the source code of g_rms. Probably this was put in /home/you/gromacs-3.3.1/src/tools. The file g_rms.c is only a wrapper and for the real part of the code, we have to have gmx_rms.c. Open this file in your favourite editor. The function called from main in g_rms.c is "gmx_rms", and its definition starts at line 87:

int gmx_rms (int argc,char *argv[])

The first part is, in sequence, the description of the tool, handling of command line options, not including file I/O, declaration of variables and handling of file I/O settings. After that the real body of the function starts. To my opinion, g_rms is one of the least accessible tools, with little structuring. But, tracking and fixing this bug is relatively easy. First go to line 302, where you will find the statement:

bTop=read_tps_conf(ftp2fn(efTPS,NFILE,fnm),buf,&top,&xp,NULL,box,TRUE);

This reads the structure file that is extracted from the file name options (ftp2fn(efTPS,NFILE,fnm)). It stores the topology information in top, the coordinates in xp, and the unit cell definition in box. In this case, it does not read velocities from the reference (NULL). If the file is a run input file, the topology also includes bond information and other things, and the function read_tps_conf returns TRUE.

The top is a topology struct that is declared in GMXDIR/include/types/topology.h, and contains all the available information about the atoms:

typedef struct {
  char            **name;       /* Name of the topology         */
  t_idef        idef;           /* The interaction function definition  */
  t_atoms       atoms;          /* The atoms                    */
  t_atomtypes   atomtypes;      /* Atomtype properties          */
  t_block       blocks[ebNR];   /* The blocks                   */
  t_symtab      symtab;         /* The symbol table             */
} t_topology;

So this is where the reference structure gets put. When we use a .pdb or other coordinate file for reference, we actually only have the atoms. These are stored in another struct: t_atoms. This one's declared in GMXDIR/include/types/atoms.h:

typedef struct {
  int           nr;             /* Nr of atoms                          */
  t_atom        *atom;          /* Array of atoms (dim: nr)             */
                                /* The following entries will not       */
                                /* allways be used (nres==0)            */
  char          ***atomname;    /* Array of pointers to atom name       */
                                /* use: (*(atomname[i]))                */
  char          ***atomtype;    /* Array of pointers to atom types      */
                                /* use: (*(atomtype[i]))                */
  char          ***atomtypeB;   /* Array of pointers to B atom types    */
                                /* use: (*(atomtypeB[i]))               */
  int           nres;           /* Nr of residue names                  */
  char          ***resname;     /* Array of pointers to residue names   */
                                /* use: (*(resname[i]))                 */
  int           ngrpname;       /* Number of groupnames                 */
  char          ***grpname;     /* Names of the groups                  */
  t_block       excl;           /* Exclusions                           */
  t_grps        grps[egcNR];    /* Groups of things                     */
  t_pdbinfo     *pdbinfo;       /* PDB Information, such as aniso. Bfac */
} t_atoms;

Although there's quite a bit of it, we're probably looking for the number of atoms, set as nr. So, the number of atoms is the member nr from the member atoms from the variable top: top.atoms.nr

Now go back to the file gmx_rms.c and go to line 395. This is the start of processing the trajectory. In the next line information is read from the first frame from the trajectory, including the number of atoms that is returned and put in natoms:

natoms=read_first_x(&status,opt2fn("-f",NFILE,fnm),&t,&x,box);

Now, if in the following natoms is used in stead of top.atoms.nr in iteration, that could well cause an idnex to go out of bounds, thus giving a segfault. You may go searching for the place(s) where this happens. But a simple trick is setting natoms to the maximum number of atoms in the reference structure. Add the next line after the one in which the first frame is read:

natoms=min(natoms,top.atoms.nr);

Now save the file and exit the editor. Make the fixed program:

make g_rms