Parametrizing a new molecule based on known fragments

In this part of the tutorial, we will discuss how to build a Martini topology for a molecule that consists of known fragments, but was not previously described. The model will then be used in a self-assembly simulation to assess its behavior. Refinement of the model is not attempted. If you feel the need for refinement, consult the second part of this tutorial (where the procedure is described for another molecule), or similar refinement procedures that are part of the lipid and polymer tutorials, respectively.

Suppose you got all excited about a class of molecules that are asymmetric bola-amphiphiles, and more in particular the types of molecules shown in Figure 1 below, taken from Masuda and Shimizu, Langmuir 20, 5969 (2004). These molecules have hydrophilic groups at both ends of the chain, but they are distinctly different in size, one being a relatively large glucose ring, the other a relative small carboxylic acid. These molecules self-assemble in tubes whose diameter depends on the length of the linker, as described by Masuda and Shimizu. You recognize chemical groups that are not too far from standard lipid and/or protein components, and so it is quite likely that the building blocks themselves should be available in the Martini force field. Indeed, Martini does also treat proteins and sugars and glycolipid simulations and parameters have been published.


Figure 1: asymmetric bola lipid family, a large, glucose-based hydrophilic group connected by a hydrocarbon chain to a small hydrophilic group

Unfortunately, it appears the molecule you are interested in, is not a glycolipid as published but has a different connectivity. You need to implement this molecule yourself! However, you may use existing building blocks as much as possible. Below, we explain how to go about this. 

The first step is therefore to split the molecule in reasonable building blocks. This is sometimes referred to as a mapping (from atomistic to coarse-grained representations). If you consider the chemical nature of the molecular fragments and the existing Martini models for glycolipids and proteins, you may come up with an 8-bead model for the molecule with a linker of 12 C-atoms (n=12 in Figure 1). It may be useful to make a drawing showing which atoms you group into which bead, showing the bead type, as in e.g. the Martini glycolipid paper cited above. A reasonable mapping based on the glycolipid GCER is shown in Figure 2. GCER is representative of both glucosyl- and galactosylceramide; the structures can be found in more detail also on the lipid library website. Although GCER has an amide group, it is connected via an ether linker in GCER, whereas it is directly attached to the sugar ring in our lipid, so we will need to come up with different bond and angle parameters, but the mapping scheme seems reasonable.


Figure 2: Published glycolipid GCER and possible (partial) mapping of asymmetric bola lipid family

Let us call the new molecule GDAL (for glucose-dodecane-acid-lipid). You can probably think of a much better name, but this is the name that will be assumed in the following...

You will need to write a topology file for the molecule. You may start from scratch, or you may take an existing file and edit it. A good candidate may be the GCER lipid. GCER itself is just the name of the head group linked to a ceramide; if we look in the Martini lipidome, we find a ceramide lipid, DPGS, with two tails, that may well be useful as a template. Some part of the .itp file is shown here: 

[ moleculetype ]
DPGS               1
[ atoms ]
 1          P4    1    DPGS     C1     1         0        72.0000
 2          P1    1    DPGS     C2     2         0        72.0000
 3          P1    1    DPGS     C3     3         0        72.0000
 4          P1    1    DPGS    AM1     4         0        72.0000
 5          P5    1    DPGS    AM2     5         0        72.0000
 6          C3    1    DPGS    D1A     6         0        72.0000 ; corrected (C3 instead of C1), oct 2013
 7          C1    1    DPGS    C2A     7         0        72.0000
 8          C1    1    DPGS    C3A     8         0        72.0000
 9          C1    1    DPGS    C4A     9         0        72.0000
10          C1    1    DPGS    C1B    10         0        72.0000
11          C1    1    DPGS    C2B    11         0        72.0000
12          C1    1    DPGS    C3B    12         0        72.0000
13          C1    1    DPGS    C4B    13         0        72.0000

Based on other lipid mappings and the connectivity lower down in the file, and the drawing in Figure 2, beads 1-3 are the glucose ring particles, bead 4 is the connection to the glycerol parent of the lipid, bead 5 is the amide, and the other beads are the tails. We should connect bead 5 to bead 3 instead of bead 4 as is done in DPGS, then keep the tail connected to bead 5, which are beads 10-13. Based on 12 CH2-groups, the original bead 13 will be the carboxylic acid end bead. This means all assignments can stand, except that for bead 13, which instead of C1 should probably best be P3. This assignment is based on the Martini 2007 paper, where P3 is representative of acetic acid. So, the first step is to copy the DPGS file, rename DPGS to GDAL, remove beads 4-9, change the type of bead 13 from C1 to P3 and renumber the remaining beads. Before renumbering, perhaps it is best to remove all lines in the [ bonds ], [ angles ], and [ dihedrals ] sections that involve the original beads 4-9. Now the renumbering can be done by changing all entries 5 to 4, 10 to 5, etc. Automatically, the bonded parameters are preserved. There is one thing we need to fix, and that is the bonded parameters around the old bead 5, the new bead 4. The ring bead-amide distance, and ring bead-amide-hydrocarbon angle should be given reasonable values. What are these? The moiety itself is fairly rigid, so the angle around the amide is likely something like 180 degrees, with reasonably high force constant. The ring bead-amide distance may be close to the backbone-backbone distance in proteins. You can look these up in the Martini publications.


Now you have a topology. You can download also the coordinates for DPGS, remove beads 4-9 and use those coordinates for a minimization in vacuum. This coordinate file can then be used to partly fill an empty simulation volume with randomly placed molecules. Water may be added, 10% of the water may be made into anti-freeze water, and a self-assembly simulation may be run as in the lipid tutorial. Follow the protocol shown there.

If you've done all that, you may want to have a look at how one of us did it by downloading this archive. 200 molecules certainly like to form a bilayer, but it does not appear all that bent, maybe because the different head groups not separated on the different sides of the amphiphile layer as proposed by Masuda and Shimizu. This may again be due to the limited number of molecules, the periodic boundary conditions and/or some of the parameters, or much longer relaxation times being required to allow the lipids to flip-flop. This is where the fun starts! You are free to change interaction strengths one level up or down, which means you could test different bead assignments. It may be useful to check whether a single molecule in water has the correct flexibility by running an atomistic simulation and performing the mapping. A similar procedure is shown below for toluene and elsewhere in the lipid and polymer tutorials. Have fun!       


Parametrizing a new molecule based on atomistic simulations

In this part of the tutorial I'll explain how to create a Martini topology for a molecule that has not been parametrized before. The procedure is highly dependent on the chemical structure of the molecule and what (experimental) data is available for your molecule. In this example we use a small molecule, toluene, for which an atomistic topology and water-octanol partitioning data (LogP) is available (2.73 from The tutorial assumes you have general experience with the Martini force field (if not, try the other tutorials on this webpage), experience with Gromacs (if not, check the Gromacs-pages and thThe final step should say: "World Victory!"e tutorials mentioned there) and experience with a text editor.

1) Generate atomistic reference data. For most reliable parametrization, atomistically simulate the molecule in solution. Martini has mostly been based on Gromos, but you can use any force field you like. If no topology exists you could use the ATB or the GAFF force field to construct one (this might be quite some work as well). Alternatively, if other people have published results with atomistic resolution for the molecule in question, you may rely on this data as reference material.

The choice of solvent is of importance: The molecule might behave differently in aqueous or apolar solvents and the CG model might not be able capture both equally well. Choose a solvent comparable to the environment where the molecule will spend most of its time. If both environments are important, you may choose to compromise and get the best of two worlds.

For toluene you can run the simulations using (click on the files to download):

grompp -f gromos.mdp -c gromos.gro -p -o gromos
mdrun -v -deffnm gromos

This lets you do a simulation of a single toluene molecule in decane as a solvent. Notice that the 1 ns simulation time used in the example is enough because toluene is such a simple molecule.

2) Choose an atom-to-bead mapping. This is the heart of "Coarse Graining" and relies on experience, chemical knowledge and trial-and-error. My favorite method: Draw your molecule on paper, copy it 10 times and start drawing and trying: It's just like primary school. There are no hard rules but here are some tips you could use:

- You have to use the beads in Table 3 from the original Martini paper. Find chemical sub-structures in your molecule that match the example fragments in that table. Don't worry about bonds/angles/dihedrals between the beads yet.

- Split the molecule in blocks of aproximately 4 heavy atoms. For planar or ring structures or when 4 atoms simply don't fit use S-beads. Remember that S-beads only have a small radius compared to standard beads and will thus have the same partitioning free energy (between water and octanol) as normal beads. For example one SC3 bead in P4 solvent (water) will behave exactly as a C3 would.

- Don't use partial charges. Except for fully charged molecules, Martini beads are neutral (exceptions could be made for strongly delocalized charges as found in metal ligands, for instance). For charges use the 'Q'-type beads.

- Compare to existing Martini topologies. Your molecule might contain chemical structures that do not appear in Table 3. But maybe someone else already tackled that problem, for example in proteins, sugars, etc.

- Respect the symmetry of the molecule, and use the same beads for the same chemical groups within one molecule.

3) Coarse-grain target atomistic data. Using your just created mapping, reverse transform the simulation you did at 1) to CG resolution. There are many ways to do this: you can use the reverse transformation tool or use However, in my opinion the simplest method is to create a Gromacs index file where every index group stands for a bead and contains the mapped atom numbers. For toluene run:

seq 0 2 | g_traj -f gromos.xtc -s gromos.gro -oxt mapped.xtc -n mapping.ndx -com -ng 3

where 3 is the number of CG beads in your molecule and the 'seq 0 2' avoids having to type all index groups. Inspect the file mapping.ndx to see how the atoms of toluene are grouped into the CG beads.

4) Generate target CG distributions. Go back to the drawing table and identify between which beads there should be bonded interactions (bonds, constraints, angles, proper and improper dihedrals). Measure the length/angle of these interactions in you mapped-to-CG atomistic simulations from step 3 (mapped.xtc). To do this, create a index file with a directive for "[bonds]" containing (multiple) pairs of CG beads, "[angles]" containing triples and "[dihedrals]" quartets. Use g_bond for the bonds, g_angle for angles and dihedrals:

g_bond -f mapped.xtc -n bonded.ndx -noaver -d
g_angle -f mapped.xtc -n bonded.ndx -ov angles.xvg -type angle -all

From the generated files, create histograms (in the g_angle output the first dataset is the average and can be discarded). You can use g_analyze. e.g. for the angles:

g_analyze -f angles.xvg -dist angledistr.xvg -bw 1.

In some cases it might be sufficient to calculate the average and standard deviation for all the distributions, however the distributions might be bimodal and it is always wise to inspect the distributions! Using xmgrace for the angle distributions, type:

xmgrace -nxy -dist angledistr.xvg

A word of warning is in order here. You may see strange bond distributions if your mapped molecule is split by the periodic boundary conditions. To remedy this, you can first convert your atomistic trajectory to make molecules whole (trjconv -pbc whole) and then do the mapping. Another thing to beware of is that the g_bond command shown above does give average and standard deviation of the separate bonds in the list in the index file in the output file bonds.log, but only the average value is plotted in the file distance.xvg. To get separate distributions for the different bonds, you either need to run g_bond for each bond separately, or follow the procedure shown below for the CG trajectory. Once you have a CG .tpr file, you can run the CG procedure also on the mapped trajectory and have a direct comparison of bond length distributions. 

5) Create the CG itp file. This has to be done by hand, although some copy-pasting from existing itp files might help to get the format right. The different directives you (probably) need are:

- "[ moleculetype ]": One line containing the molecule name and a number of exclusions. For Martini the standard is 1.
- "[ atoms ]" One line per atom, containing atomnumber, beadtype, residuenumber, residuename, atomname, chargegroup, charge. Residue number and residue name will be all the same for small molecules. Atomname can be freely chosen. In Martini every bead has its own charge group. Charge can be 0,1 or -1. Mass is not specified and taken from the bead definition.
- "[ bonds ]" One line per bond, containing atom 1, atom 2, function, average length, force constant. The functional form for Martini is 1. Bond length can be set to the average length obtained in step 4. The force constant should be adjusted to the width of the obtained histograms: small width means high force constant and visa versa.
- "[ constraints ]", "[ angles ]", "[ dihedrals ]", see the gromacs manual for the right format.

6) Create CG simulation. Extract one frame from your mapped trajectory and solvate it in the same solvent as the atomistic simulation. Create a top file (include the general martini itp files and the newly created molecule itp and add the right number of molecules (solute and solvent)). Create a Martini mdp file (see examples on the website). Simulate the initial form of the CG molecule (you probably will have to energy minimize and maybe relax for a few thousand steps at a smaller timestep (e.g. 5fs). You also need itp, itp and itp. It may be wise to create a separate directory for the CG system.

grompp -f em.mdp -c martini.gro -p -o em
mdrun -v -deffnm em
grompp -f relax.mdp -c em.gro -p -o relax
mdrun -v -deffnm relax
grompp -f md.mdp -c relax.gro -p -o md
mdrun -v -deffnm md

In this simulation measure the bonded interactions similar to what you did in step 4 for the mapped trajectory. You can use the same index file.

g_bond -f md.xtc -s md.tpr -n bonded.ndx -noaverdist -d bonds_martini.xvg
g_angle -f md.xtc -n bonded.ndx -ov angles_martini.xvg -type angle -all

 Note the difference on the g_bond command compared to the treatment of the mapped trajectory. Here, you add a directive for the tpr file (-s md.tpr) and set -noaverdist.


7) Optimize CG bonded parameters. In step 6 two things might happen: The simulation finishes and you obtain distributions or the simulation crashes. In the first case, you can compare the distributions to the ones obtained in step 4 and go back to step 5 to adjust the parameters in the itp to obtain better distributions. Too wide distributions mean you'll need higher force constants, averages being off means you need to shift, etc. If the simulation crashes you will also have to go back to step 5, now with the aim to make the molecule more stable. This often means lowering force constants, removing uncessesary dihedrals or changing constraints to bonds or vice versa. Some common source of instability are:

- If a (proper) dihedral is defined over 4 atoms and the angle between 3 atoms is very wide (close to 180 degrees) this may lead to heavy fluctuations and crashes. You'll might have to remove the dihedral or use special types of dihedrals.
- In ring like structures beads might be connected in triangles. If constraints are used between those beads, LINCS might not be able to solve them.
- Three beads, a,b and c, are linearly connected with short, weak bonds, bead b will be excluded from a and c. If a and c strongly attrack each other, a or c might start overlapping with b.

Repeat until you're happy with the result, then move to the last step.

8) Compare to experimental results. Partitioning free energies (we are currently working on a tutorial explaining how to efficiently calculate those for Martini molecules) are particularly good, but for pure liquids also densities can be used or any other experimental number you can lay your hands on.