This tutorial introduces the student to the practice of Molecular Dynamics (MD) simulations of proteins. The protocol used is a suitable starting point for investigation of proteins, provided that the system does not contain non-standard groups. At the end of the tutorial, the student should know the steps involved in setting up and running a simulation, including some reflection on the choices made at different stages. Besides, the student should know how to perform quality assurance checks on the simulation results and have a feel for methods of analysis to retrieve information.
The aim of this tutorial is to investigate differences in the conformation and dynamics of two human Ubiquitin-Conjugating enzymes (E2) and some mutants. At the end of the tutorial the student should be able to:
For this tutorial students are expected to team up in groups of four. Each group will perform simulations on four E2 proteins, thus each member of each group chooses one protein as the subject of the tutorial. At the end of the tutorial, the results from all four simulations will be combined.
Each group of students should write a report of no more than eight pages, reflecting on the purpose of the work and the results obtained. In addition, specific questions are given to guide you through the tutorial and to help you with the report. A description of the report, with a summary of the questions asked during the practical, will be given on the report page. In the text, questions and assignments are indicated by grey boxes, like the following:
Write down your name and those of the members of your group
Commands are given in white on a blue background. These have to be typed carefully, since the shell (the program parsing the commands) is case-sensitive. A common error is typing a 0 (digit zero) instead of an O (capital letter O), or an l (lower-case letter l) instead of a 1 (digit one), or vice versa. You might want to copy-paste the commands, which is as simple as selecting them with the mouse and pressing the middle mouse button at the spot where the command should be entered. Now first try the following commands:
whoami
This lists your current user name. Make sure you're not logged in as "root".
ls
This gives a listing of the things that are in the directory where you are. Use this if you encounter errors like "file not found".
pwd
This command shows you the full path of the directory where you are.
Mind that copy-pasting does not relieve you from reading the text! You can't run the tutorial without the instructions around the commands. It is naively assumed that anyone following the tutorial intends to learn something. In some cases you might be reminded to read carefully, by a comment like the following:
Ubiquitin-Conjugating enzymes, also known as E2 enzymes, perform the second step in the ubiquitination pathway. This reaction refers to the post-translational modification of a protein by the covalent attachment of one or more ubiquitin monomers. The most prominent function of ubiquitin is labeling proteins for proteasomal degradation. Besides this function, ubiquitination also controls the stability, function, and intracellular trafficking of a wide variety of proteins.
Ubiquitination is an ATP-dependent process that involves the action of at least three enzymes: a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), and a ubiquitin ligase (E3), which work sequentially in a cascade. Mutations linked to this pathway can cause human diseases such as cancer, Parkinson's disease, and cardiovascular diseases. The transfer of the ubiquitin tag to a substrate is highly specific and relies mainly on the E2-E3 interactions.
Analysis of the human genome reveals the presence of 37 functional E2s and more than 700 E3 enzymes. This E3 abundance implies that a single E2 must interact with multiple E3s. The family of ubiquitin-conjugating enzymes is characterized by the presence of a highly conserved ubiquitin-conjugating (UBC) domain. The interaction surface with the E3 recognition domains involves three structural elements (namely helix1, loop1 and loop2).
We propose here to investigate the influence of a single, conserved mutation in loop1 on the behaviour of two different E2 proteins by molecular dynamics simulations. Four simulations will be performed, on the two wild-type proteins and two specific mutants, and the results will be compared to each other.
Classical molecular dynamics simulations use Newton's equations of motion to calculate trajectories of particles, starting from a defined configuration. For each particle in the system, the total force acting on it is calculated from the interactions with other particles, as described by the force field. The force divided by the mass of the particle gives the acceleration, which, together with the prior position and velocity, determines what the new position will be after a small time step. The high spatial and temporal resolution makes molecular dynamics simulations useful for testing models based on experimental data, for understanding the principles underlying function, and for formulating new hypotheses. Unfortunately, both the system sizes and the time scales that can be simulated are limited.
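To make the update rule concrete, here is a minimal sketch of one integration scheme (velocity Verlet) applied to a single particle on a harmonic spring. This is a toy illustration, not the integrator or force field used later in the tutorial: the spring "force field", the function names, and the unit-free parameters are all assumptions chosen for the example.

```python
def harmonic_force(x, k=1.0):
    """Toy 'force field': one harmonic spring, F = -k * x (hypothetical)."""
    return -k * x


def velocity_verlet_step(x, v, mass, dt, force):
    """Advance position x and velocity v by one small time step dt."""
    a = force(x) / mass                      # acceleration = force / mass
    x_new = x + v * dt + 0.5 * a * dt * dt   # new position from old x, v, a
    a_new = force(x_new) / mass              # force at the new position
    v_new = v + 0.5 * (a + a_new) * dt       # velocity from averaged acceleration
    return x_new, v_new


def total_energy(x, v, mass, k=1.0):
    """Kinetic plus potential energy; should stay nearly constant."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def run(n_steps=1000, dt=0.01):
    """Integrate the toy system from x=1, v=0 and return the final state."""
    x, v, mass = 1.0, 0.0, 1.0
    for _ in range(n_steps):
        x, v = velocity_verlet_step(x, v, mass, dt, harmonic_force)
    return x, v


if __name__ == "__main__":
    x, v = run()
    print("final position:", x)
    print("final velocity:", v)
    print("total energy:  ", total_energy(x, v, 1.0))
```

A real simulation repeats the same step for every atom, with forces that depend on all other atoms. Checking that the total energy stays nearly constant is a simple sanity check of the kind used to judge whether the time step is small enough.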
This tutorial uses Gromacs (http://www.gromacs.org/) for performing and analysing molecular dynamics simulations. Gromacs is a suite of programs which is freely available under the GNU GPL (General Public License). The programs have a command-line interface, which means that each step involves typing the name of the program and a number of arguments. Note that the commands are case sensitive and each command has to be typed exactly as in the tutorial. More information about Gromacs as well as the manual can be found on the web site.
Since the programs have a command-line interface, there is no escape from using a terminal. Although it is possible to run Gromacs under Windows in a DOS terminal, there are several benefits attached to using Linux, which is the choice for this tutorial. For some students the transfer from Windows to Linux will form an obstacle, as they are very used to the interface Windows offers. It is important to note that Linux is not intended to be a free clone of Windows. It is a powerful, highly customizable operating system, which allows one to get much more performance from a computer. The transfer from Windows to Linux is sometimes described as switching from a motorcycle to a car. To start using the Linux terminal, it is necessary to know the most basic commands (ls, cd, mkdir, cp, mv, rm, more). Some more information about Linux/Unix can be found here and here. You can also refer to the reference card (cheat sheet) given during the tutorial class.
To set up your Linux system environment properly, copy the .cshrc and .alias_csh files from the course directory to your home directory with the following commands:
cp /home/projects/molmod/.cshrc .
cp /home/projects/molmod/.alias_csh .
You will then need to make sure that you are using the correct shell (csh).
If you're not in the right shell, changing is as easy as:
Finally, you can check if GROMACS is correctly installed using the g_luck command. If you are lucky, you will get a quote. Enjoy!
By default, you start with a Bourne Again shell (bash). Remember that you MUST switch to a C-shell (csh) or TC-shell (tcsh) to load the environment variables. This has to be done for every new session!
Before anything else, starting structures have to be obtained. These can be retrieved from the Protein Data Bank, which is a repository for three-dimensional structures of proteins. To start the tutorial, download the structures with IDs 1y6l and 3bzh from the database. All the students in a group must download these starting structures and compare them.
Check whether the downloaded structure is in fact a Ubiquitin-Conjugating protein. Don't mistake 1y6l (ending in the lower-case letter l) for 1y61 (ending in the digit one)!
For each structure, write down the method used to solve the structure and its resolution.
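The method and resolution are recorded in the header of a PDB file, in the standard EXPDTA and REMARK 2 records. As a rough sketch of how they could be pulled out programmatically, the function below scans the text of a PDB file for those two records. The function name is hypothetical, and the sample headers are made up for illustration; they are not taken from 1Y6L or 3BZH.

```python
import re


def method_and_resolution(pdb_text):
    """Return (method, resolution in Angstrom or None) from PDB header text."""
    method = None
    resolution = None
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            # The experimental technique starts at column 11 of the record.
            method = line[10:].strip()
        elif line.startswith("REMARK   2") and "RESOLUTION." in line:
            # Entries without a resolution (e.g. NMR) say "NOT APPLICABLE".
            match = re.search(r"([0-9]+\.[0-9]+)\s+ANGSTROMS", line)
            if match:
                resolution = float(match.group(1))
    return method, resolution


# Made-up header lines for illustration only (not a real PDB entry).
SAMPLE_XRAY = "\n".join([
    "EXPDTA    X-RAY DIFFRACTION",
    "REMARK   2",
    "REMARK   2 RESOLUTION.    2.00 ANGSTROMS.",
])

SAMPLE_NMR = "\n".join([
    "EXPDTA    SOLUTION NMR",
    "REMARK   2",
    "REMARK   2 RESOLUTION. NOT APPLICABLE.",
])

if __name__ == "__main__":
    print(method_and_resolution(SAMPLE_XRAY))
    print(method_and_resolution(SAMPLE_NMR))
```

To use it on a downloaded file, read the file into a string first, for example with open("1y6l.pdb").read(), assuming that is the name the file was saved under. It remains worthwhile to open the file and read the header yourself, since it holds much more than these two fields.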
Now first have a look at the structure in a molecular viewer. The following instructions are for Pymol, which should be available on your machine. Load the structures in Pymol using
Now Pymol should start and a window should appear showing the structures in line representation. The models are listed on the right side of the main window and can be removed from view by clicking on the name. Next to each model name are menus which allow changing the representation. Try to show the structures as cartoons and color each chain with rainbow colors from N- to C-terminus. For those inclined to use a keyboard, which is strongly encouraged, the above can also be achieved by typing in the window:
hide everything, all
show cartoon, all
spectrum count, rainbow, 1Y6L
To get a better view of the structural homology, fit each structure onto 3BZH. This can be done using the command 'align'. To align structure 1Y6L onto 3BZH type:
align 1Y6L, 3BZH
Note the similarities and dissimilarities between the different models. To see the differences better, it may be necessary to give each model a separate color again. Try zooming in on regions which are different and look at specific residues. You can change the representation of parts of a molecule by right-clicking on the chain. If you like the image you have, you can further improve it by typing 'ray' and save the resulting picture using 'png filename' to have a lasting memory of this tutorial.
Save an image of the aligned structures for the report
Now exit Pymol using the command 'quit'. As you may have noticed, all the information necessary to draw the structures is in the respective .pdb files. Have a look at the file using the command 'less' and try to understand the file format. 'less' is somewhat like 'more', but allows more control. The space bar and 'b' scroll forward and backward respectively, and with 'g' and 'G', you can go all the way to the top or to the end.
The PDB file contains a lot of information regarding the protein, the experimental methods used to determine it, conditions, etc. It also contains a listing of each atom with the Cartesian coordinates. Note that there is no information in the file regarding bonding, whereas Pymol, as most molecular viewers do, did draw bonds between atoms. These bonds were inferred from the interatomic distances.
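The idea of inferring bonds from distances can be sketched in a few lines. The example below reads the fixed-column ATOM records of a PDB file and marks two atoms as bonded when they are closer than a cutoff. The 1.9 Angstrom cutoff and the function names are assumptions made for illustration; this is not the rule Pymol actually applies, and a single cutoff would, for instance, miss longer bonds involving sulfur. The four sample atoms are an idealized glycine backbone with made-up coordinates.

```python
import math


def parse_atoms(pdb_text):
    """Extract atom records from PDB text using the fixed column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),     # columns 13-16: atom name
                "resname": line[17:20].strip(),  # columns 18-20: residue name
                "chain": line[21],               # column 22: chain identifier
                "resseq": int(line[22:26]),      # columns 23-26: residue number
                "xyz": (float(line[30:38]),      # columns 31-38: x in Angstrom
                        float(line[38:46]),      # columns 39-46: y
                        float(line[46:54])),     # columns 47-54: z
            })
    return atoms


def infer_bonds(atoms, cutoff=1.9):
    """Return index pairs of atoms closer than cutoff (hypothetical value)."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i]["xyz"], atoms[j]["xyz"]) <= cutoff:
                bonds.append((i, j))
    return bonds


# Idealized glycine backbone with made-up coordinates (illustration only).
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   GLY A   1       0.000   0.000   0.000  1.00  0.00           N",
    "ATOM      2  CA  GLY A   1       1.458   0.000   0.000  1.00  0.00           C",
    "ATOM      3  C   GLY A   1       2.009   1.420   0.000  1.00  0.00           C",
    "ATOM      4  O   GLY A   1       1.251   2.390   0.000  1.00  0.00           O",
])

if __name__ == "__main__":
    atoms = parse_atoms(SAMPLE_PDB)
    for i, j in infer_bonds(atoms):
        print(atoms[i]["name"], "-", atoms[j]["name"])
```

The double loop compares every pair of atoms, which is fine for a handful of atoms but slow for a whole protein; real viewers restrict the search to nearby atoms. Note also that the parser relies on exact column positions, which is why PDB files should not be edited carelessly.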
To avoid problems with missing atoms, please download the four starting structures (PDB files) from Blackboard!
What are the differences between the original PDB files (3BZH and 1Y6L) and E2Awt and E2Bwt?
Compare the wild-type files with their mutants: where is the substitution? Look also at the differences between E2Awt and E2Bwt.
To get started, make a directory for each of the structures. As results may be combined in the end, use a self-explanatory and unique directory name by combining the structure name (E2Awt, E2Bwt, E2Amut, or E2Bmut) with a group or personal identifier (e.g. group number or student name). Copy the structure file into the directory and change into that directory. Student Pietje Puk, performing simulations on the wild-type form of E2A, could, for example, type:
mkdir E2Awt_pietje_puk
cp E2Awt.pdb E2Awt_pietje_puk/
cd E2Awt_pietje_puk
Now it's time to start with the real Molecular Dynamics part. Remember to fill in the right filenames at every stage. In particular, the tutorial simply refers to "protein.pdb" and "protein-EM-solvated.gro", as generic names, which should be replaced with names specific to your protein (e.g. "E2Awt.pdb" and "E2Awt-EM-solvated.gro").
HINT: Rename your protein structure file to "protein.pdb" and you can simply copy-paste all commands.
Be sure to read carefully and to check at each step whether it was successful. Read the output! In case a program gives an error message, it is usually self-explanatory. Check file formats and program output to understand the processes at each step. Most of the files are readable, except for files ending in .tpr, .xtc, .trr and .edr.
If you feel ready, click here to proceed.