This tutorial introduces the student to the practice of Molecular Dynamics (MD) simulations of proteins. The protocol used is a suitable starting point for investigation of proteins, provided that the system does not contain non-standard groups. At the end of the tutorial, the student should know the steps involved in setting up and running a simulation, including some reflection on the choices made at different stages. Besides, the student should know how to perform quality assurance checks on the simulation results and have a feel for methods of analysis to retrieve information.
For those who know how to get around in Linux, it may be worth installing Gromacs yourself. In fact, you might want to install it from source. It's really one of the few programs that will compile with little else than ./configure and make.
But then again, it does pay off to install a library for doing Fourier transformations. So, first get FFTW and install that:
wget ftp://ftp.fftw.org/pub/fftw/fftw-3.2.tar.gz
tar xvzf fftw-3.2.tar.gz
cd fftw-3.2 && ./configure --enable-float --prefix=/home/YOU/fftw && make && make install
Now it's time to get Gromacs. For this tutorial, we use version 3.3.1. Feel free to (also) install another (newer) version, but be aware that some steps in the tutorial might need to be changed a bit then.
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-3.3.1.tar.gz
tar xvzf gromacs-3.3.1.tar.gz
cd gromacs-3.3.1
Time for business! Set the environment variables CPPFLAGS and LDFLAGS to point to the FFTW directories. Then compile Gromacs. Setting the environment variables in the following way will work for bash/sh as well as for (t)csh, but since you went to this page, you'll know which part of the command to take.
setenv CPPFLAGS -I/home/YOU/fftw/include || export CPPFLAGS=-I/home/YOU/fftw/include
setenv LDFLAGS -L/home/YOU/fftw/lib || export LDFLAGS=-L/home/YOU/fftw/lib
./configure --prefix=/home/YOU/GMX331 && make && make install
Well, there you have it! But then again, this doesn't give you MPI support yet. To achieve that, you'll also have to fetch an MPI library and compiler.
wget http://www.lam-mpi.org/download/files/lam-7.1.4.tar.bz2
tar xvjf lam-7.1.4.tar.bz2
cd lam-7.1.4 && ./configure --prefix=/home/YOU/lam714 && make && make install
That should've done the trick. If it didn't, you'll probably have to resolve some dependencies. Assuming you got this one right (in the end), let's compile gromacs' mdrun with MPI support. Go to the gromacs directory and type
make clean
setenv MPICC /home/YOU/lam714/bin/mpicc || export MPICC=/home/YOU/lam714/bin/mpicc
./configure --prefix=/home/YOU/GMX331 --enable-mpi --program-suffix="_mpi"
make mdrun && make install-mdrun
So, that should do it. Now you ought to be able to run your simulations on all two/four/eight or dedicate your home-made network to something else than FPSs. You can run a simulation on four processors/cores with
grompp -f parameters.mdp -c structure.pdb -p topology.top -o run.tpr -np 4
/home/YOU/lam714/lamboot
/home/YOU/lam714/bin/mpirun -np 4 mdrun_mpi -np 4 -s run.tpr
Sometimes it's just convenient to modify text files without actually opening them. Especially when writing scripts to automate tasks, including setting up simulations, it will frequently occur that you have to modify the contents of some file. Now, if you know what to change and how to change it, there's a good chance you can do it with just one or a few lines of code. Changing a single parameter in the simulation parameter file is one of these cases. For instance, we can change the pbc setting from 'none' to 'xyz' in the file minim.mdp using
sed -i /^pbc/s/no/xyz/ minim.mdp
This calls the program 'sed', the "stream editor", which is a powerful
tool to handle text files or streams and offers many options for editing
automatically. In the present case, the flag -i calls sed in interactive
mode, which means that the file(s) on the command line are changed, rather
than giving the edited stream as output to stdout. The next part tells sed
what to do:
/^pbc/ is an address specifier. It tells sed to look for lines starting
with the string 'pbc'. On those lines, the following will be
executed:
s/none/xyz/ substitutes the part matching the first regexp, simply 'no'
in this case, with the string specified in the second part, 'xyz'.
The previous sed trick showed how to replace a word in a line starting with "pbc". sed can also be used to operate on blocks of text, by using two address specifiers; one for start and one for stop. The objective is simple: print the part of the file that starts with "Total NODE time". To indicate the end of the file, the special address specifier "$" is used. The command for printing is "p", so that makes the sed line '/Total NODE time/,$p'. But the common behaviour of sed is to print every line. To diasable that, the command line option -n is used:
sed -ne '/^Total NODE time'/,$p' md0.log
As a side note, we could also want a couple of lines following a line containing a specific piece of text. For instance, we are mainly interested in the performance. This is summarized in the four lines following "NODE (s) Real (s) (%)". To get these lines and print them, we seek the line containing "NODE (s)", print it (p) and read in the next line (n) and print it, repeating the last two commands three times. To group a set of commands to be executed at one address, use curly braces and semicolons:
sed -ne '/NODE (s)/{p;n;p;n;p;n;p;n;p}' md0.log
The built-in raytracer of Pymol does a fair job producing images. In fact, it is now widely used for generating images for publications. In most cases the general settings are kept default and with a bit of training one will easily recognize which images were created with Pymol, VMD or molmol. Okay, but in most cases not standing out in terms of graphics. To make more from an image, it pays off to export the file to a format that can be read with a ray-tracing engine such as POV-Ray. One might argue that it makes no sense to do so, but an attractive image may draw attention to your work, especially if it stands out in one sense or the other.
For the following POV-Ray should be installed on your computer. Try the command 'povray' and if it doesn't show you a page of help, but rather "command not found", first install it. For other reasons than this tutorial to install POV-Ray, have a look at the sample page
Now download the Pymol script make_pov.py and start Pymol, loading your protein. Then, load the script using:
run make_pov.py
Set up the scene just the way you want it, changing colours, representation, etc. A lot of settings relating to the view can be changed in POV-Ray, but it doesn't allow you to view the results as quickly, since you have to ray-trace before you can get it on the screen. Also, colours can be changed in POV-Ray, but requires a bit of work (using sed). Best is to use Pymol to set everything up as nicely as you can and use POV-Ray for the polishing. If you're satisfied with the scene, export it to POV-Ray format using the command "make_pov" that was added to Pymol by the script, and then quit.
make_pov myprion.pov
quit
Now open the file myprion.pov in your favourite editor. It will probably look something like this:
camera {direction<0.0,0.0, -2.835> location <0.0 , 0.0 , 0.0> right 2*x up y } #default { finish{phong -1.000 ambient 0.500 diffuse 0.450 phong_size 13.750000}} light_source{<4000.0001,4000.0001,9978.9431> rgb<1.0,1.0,1.0>} plane{z , -182.9770 pigment{color rgb<0.0000,0.0000,0.0000>} finish{phong 0 specular 0 diffuse 0 ambient 1.0}} // Uncomment the following lines if you have the pymolmacro.inc include file and want to use it. /* #include "pymolmacro.inc" PYMOL_VIEW( -0.48392, -0.48639, -0.72749, 0.49048, -0.83922, 0.23483, -0.72474, -0.24319, 0.64468, -0.00008, -0.00003, -52.02350, 35.34558, -4.81726, 11.94539, 21.05691, 82.97701, 0.00000 ) */ #include "myprion.inc"
Without further editing, this file can be raytraced as a 1000x500 pixel, 8 bits/colour PNG image file using:
povray +Imyprion.pov +Onormal.png +FN8 +W700 +H350 +Q9 +A0.3
The option +Q sets the quality (9 is full) and +A sets the anti-aliasing. This gives the following image:
Cute, but Pymol does a better job than that. One thing that is not so nice is the sharpness of the shadows. This is due to the use of a single light source, like what you get in space with the light from the sun. On earth, light is scattered, making shadows look softer, just as if the light was coming from an area rather than a point. In POV-Ray we can mimick this by adding an array of lights, rather than a single light source. Note that this, jsut like any of the subsequent steps, increases the complexity of the scene, and consequently increases the rendering time. Replace the line specifying the light source with
light_source { <40,40,98> rgb 1 area_light <40,0,0>,<0,40,0>,5,5 adaptive 1 jitter }
This gives the following image. Notice the softer shadows.
Better, but the colours still look a bit dull. I like a glossy, shining plastic like appearance. Actually, I like to use that as the default appearance of surfaces. Replace the current #default statement with the following one and render again.
#default { finish { ambient .15 diffuse .5 specular 1 roughness .001 reflection { .5 metallic }}}
Now you can see the side chains reflecting in the surface of the helices :) But it should be a bit brighter. To make the colours more vivid, we can increase the intensity of the light. In the light_source statement, the color of the light is specified as 1, which is shorthand for <1,1,1>: 100% of red, green and blue. Let's just set it to two, increasing the light intensity twofold.
Let's take a step further. One characteristic of photographic images is that they have focal blurring. With POV-Ray, we can add this effect to our image. Change the camera statement so that it reads:
camera {direction<0.0,0.0, -2.835> location <0.0 , 0.0 , 0.0> right 2*x up y focal_point < 0.0, 0.0, -100 > aperture 40 blur_samples 20 }
The rendering will now take a bit longer, but this does add a dimension to the picture. You may have to tune the focal point and the aperture. An aperture of 40 is bit overdone, but it does illustrate the effect.
Focal blur also really helps making grayscale images stand out. Here's an example from my thesis, showing hemoglobin in stereo view.
This applies to Gromacs version 3.3.1. If we run g_rms with a reference structure that contains fewer atoms than the trajectory, we get a segmentation fault. Needless to say on this page that that is due to an attempt to access a part of memory that is forbidden. It seems there is something wrong with counters for iteration. One cause could be that the number of atoms from the trajectory is used where the number of atoms from the reference should be. Since the former is higher than the lower, that could lead to an iterator going out of array bounds. Let's have a look at the source code of g_rms. Probably this was put in /home/you/gromacs-3.3.1/src/tools. The file g_rms.c is only a wrapper and for the real part of the code, we have to have gmx_rms.c. Open this file in your favourite editor. The function called from main in g_rms.c is "gmx_rms", and its definition starts at line 87:
int gmx_rms (int argc,char *argv[])
The first part is, in sequence, the description of the tool, handling of command line options, not including file I/O, declaration of variables and handling of file I/O settings. After that the real body of the function starts. To my opinion, g_rms is one of the least accessible tools, with little structuring. But, tracking and fixing this bug is relatively easy. First go to line 302, where you will find the statement:
bTop=read_tps_conf(ftp2fn(efTPS,NFILE,fnm),buf,&top,&xp,NULL,box,TRUE);
This reads the structure file that is extracted from the file name options (ftp2fn(efTPS,NFILE,fnm)). It stores the topology information in top, the coordinates in xp, and the unit cell definition in box. In this case, it does not read velocities from the reference (NULL). If the file is a run input file, the topology also includes bond information and other things, and the function read_tps_conf returns TRUE.
The top is a topology struct that is declared in GMXDIR/include/types/topology.h, and contains all the available information about the atoms:
typedef struct { char **name; /* Name of the topology */ t_idef idef; /* The interaction function definition */ t_atoms atoms; /* The atoms */ t_atomtypes atomtypes; /* Atomtype properties */ t_block blocks[ebNR]; /* The blocks */ t_symtab symtab; /* The symbol table */ } t_topology;
So this is where the reference structure gets put. When we use a .pdb or other coordinate file for reference, we actually only have the atoms. These are stored in another struct: t_atoms. This one's declared in GMXDIR/include/types/atoms.h:
typedef struct { int nr; /* Nr of atoms */ t_atom *atom; /* Array of atoms (dim: nr) */ /* The following entries will not */ /* allways be used (nres==0) */ char ***atomname; /* Array of pointers to atom name */ /* use: (*(atomname[i])) */ char ***atomtype; /* Array of pointers to atom types */ /* use: (*(atomtype[i])) */ char ***atomtypeB; /* Array of pointers to B atom types */ /* use: (*(atomtypeB[i])) */ int nres; /* Nr of residue names */ char ***resname; /* Array of pointers to residue names */ /* use: (*(resname[i])) */ int ngrpname; /* Number of groupnames */ char ***grpname; /* Names of the groups */ t_block excl; /* Exclusions */ t_grps grps[egcNR]; /* Groups of things */ t_pdbinfo *pdbinfo; /* PDB Information, such as aniso. Bfac */ } t_atoms;
Although there's quite a bit of it, we're probably looking for the number of atoms, set as nr. So, the number of atoms is the member nr from the member atoms from the variable top: top.atoms.nr
Now go back to the file gmx_rms.c and go to line 395. This is the start of processing the trajectory. In the next line information is read from the first frame from the trajectory, including the number of atoms that is returned and put in natoms:
natoms=read_first_x(&status,opt2fn("-f",NFILE,fnm),&t,&x,box);
Now, if in the following natoms is used in stead of top.atoms.nr in iteration, that could well cause an idnex to go out of bounds, thus giving a segfault. You may go searching for the place(s) where this happens. But a simple trick is setting natoms to the maximum number of atoms in the reference structure. Add the next line after the one in which the first frame is read:
natoms=min(natoms,top.atoms.nr);
Now save the file and exit the editor. Make the fixed program:
make g_rms