Time series data for water molecules from its dcd file

Question

I am trying to make a file which contain time series data of water molecules from dcd file. Is it possible to generate this data using any of MDAnalysis module or function? Or is there any python script to generate this file?

I need to generate this file containing two columns (one with z coordinates of water molecules and 2nd with respective timesteps) by using DCD file as input.

score 0 · Accepted Answer · answered Mar 04 '20 at 23:16

0

You can get the (z, t) time series in a variety of ways but I am showing the most basic one here. I assume that you also have a PSF topology file in addition to your DCD trajectory file (but really, any topology and trajectory file format will work in MDAnalysis). I also assume that the water oxygen atoms are named "OW".

I am actually not clear how you want your "z, t" data structure to look like. If you have N water molecules, then you will have N z-coordinates per time step so I don't know how this makes sense as "two columns", assuming that you want each "row" to be a different time step. Instead I will use the following data structure: the final output will be an array with shape (T, N+1) where T is the number of time steps in the trajectory and N is the number of waters. Each row of the array contains [t, z1, z2, ..., zN], i.e., time and z-coordinate of water i.

import numpy as np
import MDAnalysis as mda

u = mda.Universe(PSF, DCD)
water_oxygens = u.select_atoms("name OW")

# pre-allocate the array for the data
data = np.zeros((u.trajectory.n_frames, water_oxygens.n_atoms + 1))

for i, ts in enumerate(u.trajectory):
   data[i, 0] = ts.time                          # store current time
   data[i, 1:] = water_oxygens.positions[:, 2]   # extract all z-coordinates

# now data contains your timeseries and you can work with it
# (or export it using np.savetxt()

For a introduction to MDAnalysis see the User Guide which also has a quickstart guide that explains selections and trajectory iteration.

For further questions ask on the MDAnalysis Google group where you typically get the fastest answers.

answered Mar 04 '20 at 23:16

orbeckst

3,189
1
17
14

Thank you so much orbeckst for your detailed answer. But I needed to change your sent mdanalysis script accordingly. As I already extracted water pdb and dcd file from the whole system using msanalysis. And loaded dcd and pdb files to universe and got my output. Here is the complete script I used: – iqra khan Mar 05 '20 at 09:49
import MDAnalysis as mda from MDAnalysis.analysis import hole from MDAnalysis.analysis.hole import HOLEtraj import numpy as np u= mda.Universe("ho.pdb","ho.dcd") #water #molecules pdd and trjaectory data is #loaded to universe data= np.zeros((u.trajectory.n_frames, len(u.residues)+1)) ts1=u.trajectory.ts for i, ts in enumerate(u.trajectory): data[i,0]= ts.time data[i,1:]= ts1.positions[:, 2] – iqra khan Mar 05 '20 at 09:57
data[0:3][0:5] #here is the extracted output: array([[ 0. , 24.3392067 , 23.94203758, ..., 40.12068939, 21.93716621, -30.37293243], [ 1.00000003, 24.3392067 , 23.94203758, ..., 40.12068939, 21.93716621, -30.37293243], [ 2.00000007, 24.3392067 , 23.94203758, ..., 40.12068939, 21.93716621, -30.37293243]]) now I am actually confused about timestep data. In first column it is showing the frame numbers not timesteps in ps(picoseconds). Isn't there any difference between timesteps and frames? – iqra khan Mar 05 '20 at 10:02
What is the timestep that you expect to see? Your output seems to indicate that it is 1 ps. Whatever `ts.time` shows is what MDAnalysis gets from your DCD file. What code produced the DCD file? You can also loop through your trajectory and `print(ts.frame, ts.time)` to see what you have. If you don't get what you expect, file an issue at https://github.com/MDAnalysis/mdanalysis/issues – orbeckst Mar 06 '20 at 18:43
Dcd file is produced from simulation results using NAMD. I dont know my timestep for simulation was 2fs and simulation time is 150 ns. But timesteps in ps generated by this script is as same as frame numbers. That is why i am asking is it something we need to calculate ourselves? – iqra khan Mar 09 '20 at 12:26
`ts.frame` gives the frame number, `ts.time` the time of the snapshot in ps. You should know how often you save your trajectory data and so you should know what you expect to see for `ts.time`. For instance, if you happen to save every 500 steps with an integrator timestep of 2 fs then you would have a frame ever ps. – orbeckst Mar 10 '20 at 15:50
By the way, you can also answer your own questions and then you can properly format the code. – orbeckst Mar 10 '20 at 15:51
If my answer solved your problem then accept it as the answer so that future readers understand that this is a solution. If you found a different solution then post it and accept it. Either way, if you have a working answer, make sure that it appears here on S/O so that this question is not left open without an accepted answer. – orbeckst Mar 13 '20 at 17:18
Hello orbeckst, wish you good health. Your answer helped alot, but I want to add one column containing water residue ID. for this, what should I add in the script? – iqra khan Jul 06 '20 at 19:33
Ask on the user list, please. https://groups.google.com/group/mdnalysis-discussion Thanks. – orbeckst Jul 07 '20 at 21:21

Time series data for water molecules from its dcd file

1 Answers1