simulation works on just one node
- visvaldas
- Topic Author
- Visitor
I am embarrassed to ask this question, but for some reason I can only run MD on one node (the -nt 1 option).
If I run the system on, say, the 2 nodes I have on my workstation (mdrun -deffnm test), I get this error:
....
Reading file test.tpr, VERSION 4.6.3 (single precision)
Using 2 MPI threads
starting mdrun 'Protein in INSANE! Membrane UpperLeaflet>DOPC:DOPE=1:1 LowerLeaflet>DOPC:DOPE=1:1'
2000000 steps, 40000.0 ps.
Step 10, time 0.2 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 0.000461, max 0.013153 (between atoms 9594 and 9593)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
11112 11111 31.1 0.1400 0.1418 0.1400
11113 11111 37.8 0.1400 0.1390 0.1400
11868 11867 38.3 0.1400 0.1391 0.1400
11869 11867 37.3 0.1400 0.1402 0.1400
9594 9593 31.5 0.1400 0.1418 0.1400
9595 9593 42.5 0.1400 0.1397 0.1400
Step 11, time 0.22 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 0.000330, max 0.008240 (between atoms 11113 and 11111)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
11112 11111 33.0 0.1418 0.1410 0.1400
11113 11111 36.1 0.1390 0.1388 0.1400
9595 9593 38.0 0.1397 0.1399 0.1400
Step 12, time 0.24 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 0.000319, max 0.009542 (between atoms 11113 and 11111)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
11112 11111 33.1 0.1410 0.1405 0.1400
11113 11111 32.7 0.1388 0.1387 0.1400
11791 11789 33.4 0.1400 0.1400 0.1400
Step 13, time 0.26 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 0.000400, max 0.015757 (between atoms 11113 and 11111)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
11112 11111 38.6 0.1405 0.1405 0.1400
11113 11111 34.6 0.1387 0.1378 0.1400
11791 11789 30.4 0.1400 0.1400 0.1400
Step 14, time 0.28 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 0.000473, max 0.020890 (between atoms 11113 and 11111)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
11112 11111 42.6 0.1405 0.1407 0.1400
11113 11111 33.4 0.1378 0.1371 0.1400
Step 15, time 0.3 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 0.018713, max 1.049266 (between atoms 12897 and 12896)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
12897 12896 87.3 0.1400 0.2869 0.1400
12898 12896 40.0 0.1400 0.1229 0.1400
Step 15, time 0.3 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 0.069969, max 3.847600 (between atoms 6316 and 6314)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
6315 6314 90.0 0.1400 0.3249 0.1400
6316 6314 90.0 0.1400 0.6787 0.1400
11112 11111 44.0 0.1407 0.1402 0.1400
Wrote pdb files with previous and current coordinates
Wrote pdb files with previous and current coordinates
Step 16, time 0.32 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 6.422812, max 276.930695 (between atoms 5536 and 5534)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
5535 5534 110.0 0.1400 27.9168 0.1400
5536 5534 99.3 0.1400 38.9103 0.1400
12897 12896 90.0 0.2869 5.9952 0.1400
12898 12896 90.0 0.1229 16.5193 0.1400
Step 16, time 0.32 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 948.488403, max 28138.222656 (between atoms 6316 and 6314)
bonds that rotated more than 30 degrees:
atom 1 atom 2 angle previous, current, constraint length
6315 6314 89.8 0.3249 3899.5447 0.1400
6316 6314 90.0 0.6787 3939.4915 0.1400
11379 11378 90.1 0.1400 3858.9084 0.1400
11380 11378 90.2 0.1400 3740.9395 0.1400
11112 11111 40.2 0.1402 0.1394 0.1400
Wrote pdb files with previous and current coordinates
Wrote pdb files with previous and current coordinates
Segmentation fault
I used mdp files from the tutorials; in the case above, the elnedyn force field with PW.
Actually, Martini has never worked for me on more than one node, even for much simpler systems (except perhaps for energy minimization runs). What am I doing wrong, given that parallelization seems to work fine for everybody else?
Best regards,
Visvaldas
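To see at which step the constraints actually diverge, the LINCS warnings above can be summarized with a short script. This is only a sketch: the function names `parse_lincs_warnings` and `first_blowup` and the threshold of 1.0 are illustrative assumptions, and the regular expressions are written against the warning layout shown in this post, not against any official log specification.

```python
import re

# Header line, e.g. "Step 10, time 0.2 (ps) LINCS WARNING"
HEADER = re.compile(r"Step\s+(\d+),\s+time\s+([\d.]+)\s+\(ps\)\s+LINCS WARNING")
# Summary line, e.g. "rms 0.000461, max 0.013153 (between atoms 9594 and 9593)"
SUMMARY = re.compile(
    r"rms\s+([\d.]+),\s+max\s+([\d.]+)\s+\(between atoms (\d+) and (\d+)\)"
)


def parse_lincs_warnings(log_text):
    """Return a list of (step, rms, max_dev, atom_a, atom_b) tuples."""
    warnings = []
    step = None
    for line in log_text.splitlines():
        header = HEADER.search(line)
        if header:
            step = int(header.group(1))
            continue
        summary = SUMMARY.search(line)
        if summary and step is not None:
            rms, max_dev = float(summary.group(1)), float(summary.group(2))
            atoms = int(summary.group(3)), int(summary.group(4))
            warnings.append((step, rms, max_dev, *atoms))
    return warnings


def first_blowup(warnings, threshold=1.0):
    """Return the first warning whose max relative deviation exceeds threshold.

    The default threshold is an arbitrary illustrative choice.
    """
    for warning in warnings:
        if warning[2] > threshold:
            return warning
    return None


if __name__ == "__main__":
    sample = """\
Step 14, time 0.28 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 0.000473, max 0.020890 (between atoms 11113 and 11111)
Step 15, time 0.3 (ps) LINCS WARNING
relative constraint deviation after LINCS:
rms 0.018713, max 1.049266 (between atoms 12897 and 12896)
"""
    print(first_blowup(parse_lincs_warnings(sample)))
```

Applied to the output above, this would point at step 15 (atoms 12897 and 12896) as the first place where the maximum relative deviation exceeds 1, which narrows down which part of the system to inspect.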
- Clement
- Offline
- Admin
The simplest solution, then: run more equilibration steps! For instance, with position restraints on the backbone of your protein, a smaller time step, etc.
To fix the parallelization problem (related to the previous lack of equilibration), run the first 10000 steps of your MD on one node, and then switch to the parallel code; it usually does the trick (and doesn't affect the simulation much since one never really cares about the first steps of a production run...).
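The two-stage procedure described above can be sketched as a pair of command lines. The helper below only builds the argument lists and does not run anything. It assumes the GROMACS 4.6-style mdrun flags `-deffnm`, `-nt`, `-nsteps` and `-cpi`, and that the checkpoint is written as `<deffnm>.cpt`; the function name `staged_mdrun_commands` is hypothetical, so check the flags against `mdrun -h` for your installation.

```python
import shlex


def staged_mdrun_commands(deffnm, serial_steps=10000, parallel_threads=2):
    """Build argv lists for a serial warm-up and a parallel continuation.

    Assumes GROMACS 4.6-style mdrun flags (-deffnm, -nt, -nsteps, -cpi).
    """
    # Stage 1: run the first steps on a single thread.
    warmup = ["mdrun", "-deffnm", deffnm, "-nt", "1",
              "-nsteps", str(serial_steps)]
    # Stage 2: continue from the checkpoint written by stage 1, in parallel.
    resume = ["mdrun", "-deffnm", deffnm, "-nt", str(parallel_threads),
              "-cpi", f"{deffnm}.cpt"]
    return warmup, resume


if __name__ == "__main__":
    for argv in staged_mdrun_commands("test"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

For the run in this thread, this prints `mdrun -deffnm test -nt 1 -nsteps 10000` followed by `mdrun -deffnm test -nt 2 -cpi test.cpt`.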
- visvaldas
- Topic Author
- Visitor
Anyway, as you suggested, I took a finished MD run which had run for 250 ns on one node, extended it to 300 ns, and started the continuation on two nodes... It crashed with errors similar to those above, 10 steps from the start. :(
Best,
Vis
- Clement
- Offline
- Admin
- Posts: 211
- visvaldas
- Topic Author
- Visitor
- Clement
- Offline
- Admin
- Posts: 211
visvaldas wrote: The thing is if LINCS does not work on two nodes, why is it working on one node? To me this is almost some communication between nodes problem...
LINCS shouldn't have any problem running on multiple nodes. And the errors GROMACS returns for communication issues don't look like that.
Could you send me your .tpr file? I'll try to reproduce the error.
- visvaldas
- Topic Author
- Visitor
- Clement
- Offline
- Admin
- Posts: 211
Looking at your system, you need to build a box of at least 15 x 15 x 12 nm, but you could go slightly larger. And since it's a bilayer system embedding a protein, you should make the box orthorhombic (only three dimensions on the last line of your PDB).
I didn't spend time to check the parameters of your simulation. Make sure they're alright for what you want to do...
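A quick way to check both points (box shape and minimum size) is to inspect the box line of the structure file. The sketch below assumes the structure is in GROMACS .gro format, where the last line holds either three box lengths (rectangular box) or nine values (triclinic box, with the six off-diagonal components after the three lengths). The function names are hypothetical, and the 15 x 15 x 12 nm minimum is simply the figure quoted above.

```python
def parse_gro_box(box_line):
    """Split the last line of a .gro file into (lengths, off_diagonals)."""
    values = [float(v) for v in box_line.split()]
    if len(values) == 3:
        # Rectangular box: only the three diagonal lengths are given.
        return tuple(values), (0.0,) * 6
    if len(values) == 9:
        # Triclinic box: three lengths followed by six off-diagonal terms.
        return tuple(values[:3]), tuple(values[3:])
    raise ValueError(f"expected 3 or 9 box values, got {len(values)}")


def check_box(box_line, minimum=(15.0, 15.0, 12.0)):
    """Return (is_rectangular, is_large_enough) for a .gro box line.

    `minimum` is in nm and defaults to the size suggested in this thread.
    """
    lengths, off_diagonals = parse_gro_box(box_line)
    is_rectangular = all(v == 0.0 for v in off_diagonals)
    is_large_enough = all(l >= m for l, m in zip(lengths, minimum))
    return is_rectangular, is_large_enough


if __name__ == "__main__":
    # Hypothetical box lines, for illustration only.
    print(check_box("15.00000 15.00000 12.00000"))
    print(check_box("10.0 10.0 10.0 0.0 0.0 5.0 0.0 0.0 0.0"))
```

The first example line passes both checks; the second is flagged as non-rectangular and too small.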
- visvaldas
- Topic Author
- Visitor
Just out of curiosity, why do I have to use an orthorhombic box (I guess this would correspond to the "rectangular" option in insane.py)? Why can't I use a hexagonal or rhombic dodecahedron unit cell for my system: what's special about placing the protein in a membrane on a tetragonal lattice as opposed to a honeycomb pattern? Surely the latter could save some resources?
Best regards,
Visvaldas
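As a rough illustration of the saving in question: for the same distance d between periodic images in the membrane plane, a square cell has area d^2, while a hexagonal cell (box vectors of length d at 60 degrees) has area (sqrt(3)/2) d^2. The sketch below just evaluates this geometry; the function name and the 15 nm value are illustrative assumptions, and it says nothing about whether a given setup tool or mdp file handles the hexagonal cell correctly.

```python
import math


def lateral_cell_areas(image_distance):
    """Return (square_area, hexagonal_area) for the same image distance.

    Both cells keep periodic images `image_distance` apart in the plane.
    """
    square = image_distance ** 2
    # Parallelogram spanned by two vectors of equal length at 60 degrees.
    hexagonal = math.sqrt(3) / 2 * image_distance ** 2
    return square, hexagonal


if __name__ == "__main__":
    square, hexagonal = lateral_cell_areas(15.0)  # nm, illustrative value
    saving = 1.0 - hexagonal / square
    print(f"square: {square:.1f} nm^2, hexagonal: {hexagonal:.1f} nm^2, "
          f"saving: {saving:.1%}")
```

The lateral area, and hence roughly the number of lipid and solvent particles at fixed box height, drops by about 13% with the hexagonal cell.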
- Clement
- Offline
- Admin
- Posts: 211
But it's just a suggestion... ;-)