Combining multiple queries with numpy masks

Question

What I am trying to do is plot two rows out of a file that looks like this:

number          pair        atom       count         shift      error
 1            ALA ALA       CA         7624           1.35           0.13
 1            ALA ALA       HA         7494          19.67          11.44
38            ARG LYS       CA         3395          35.32           9.52
38            ARG LYS       HA         3217           1.19           0.38
38            ARG LYS       CB         3061           0.54           1.47
39            ARG MET       CA         1115          35.62          13.08
39            ARG MET       HA         1018           1.93           0.20
39            ARG MET       CB          976           1.80           0.34
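The masks below assume `data` is a NumPy structured array with one field per column. Here is a minimal sketch of one way to build it. It assumes the file looks exactly like the sample: whitespace-separated, one header line, and a two-word `pair` column. `load_table` and the inline copy of the table are illustrative, not part of the original question.

```python
import numpy as np

# Illustrative copy of the table above, as it would appear in the file.
TEXT = """\
number          pair        atom       count         shift      error
 1            ALA ALA       CA         7624           1.35           0.13
 1            ALA ALA       HA         7494          19.67          11.44
38            ARG LYS       CA         3395          35.32           9.52
38            ARG LYS       HA         3217           1.19           0.38
38            ARG LYS       CB         3061           0.54           1.47
39            ARG MET       CA         1115          35.62          13.08
39            ARG MET       HA         1018           1.93           0.20
39            ARG MET       CB          976           1.80           0.34
"""

# Field names follow the header. "pair" holds two words (e.g. "ALA ALA"),
# so it is rebuilt from two whitespace-separated tokens.
DTYPE = [("number", int), ("pair", "U7"), ("atom", "U2"),
         ("count", int), ("shift", float), ("error", float)]


def load_table(text):
    """Hypothetical helper: parse the sample table into a structured array."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        f = line.split()
        rows.append((int(f[0]), f[1] + " " + f[2], f[3],
                     int(f[4]), float(f[5]), float(f[6])))
    return np.array(rows, dtype=DTYPE)


data = load_table(TEXT)
print(data["atom"] == "CB")
```

Splitting by hand keeps the two-word `pair` column intact. A plain whitespace-based loader would see seven tokens per data row but only six header names.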

What I want to do is plot the rows that contain atom CA against the rows that contain atom CB, using their shift values. So basically I want to do:

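import numpy as np
import matplotlib.pyplot as plt

# data: structured array with the columns shown above (loading not shown)
atomtypemask_ca = data['atom'] == 'CA'
xaxis = np.array(data['shift'][atomtypemask_ca])
aa, atom = data['aa'][atomtypemask_ca], data['atom'][atomtypemask_ca]

atomtypemask_cb = data['atom'] == 'CB'
yaxis = np.array(data['shift'][atomtypemask_cb])

# breaks when a residue has CA but no CB: xaxis and yaxis differ in length
plt.plot(xaxis, yaxis)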

What is ruining my day is that some residues don't have a CB entry. How can I plot this, ignoring entries that have only one of the two atom values set? I could of course program it explicitly, but I think this should be possible using masks, which would produce cleaner code.
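To see why the straightforward version fails, here is a reduced copy of the sample data. It keeps only the `atom` and `shift` fields as a structured array, and the reduction is just for illustration. The two masks select different numbers of rows.

```python
import numpy as np

# Only the atom and shift columns from the sample table.
data = np.array(
    [("CA", 1.35), ("HA", 19.67), ("CA", 35.32), ("HA", 1.19),
     ("CB", 0.54), ("CA", 35.62), ("HA", 1.93), ("CB", 1.80)],
    dtype=[("atom", "U2"), ("shift", float)],
)

xaxis = data["shift"][data["atom"] == "CA"]
yaxis = data["shift"][data["atom"] == "CB"]

# Residue 1 has a CA but no CB, so the arrays have different lengths
# and cannot be paired point by point for plotting.
print(len(xaxis), len(yaxis))  # 3 2
```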

By "rows" in the first sentence you mean "columns", right? What do you actually want to plot? Let me check whether I understood you: Atom 1 has only CA and not CB, so you do not want to include atom 1 in your plot, right? Atoms 38 and 39 are "valid" and should be included, yes? – Dr. Jan-Philip Gehrcke, May 22 '12 at 10:07
Atom 1 has no CB atom and therefore should not be plotted. Atoms 38 and 39 do have CB and should be plotted. – tarrasch, May 22 '12 at 11:00

Avaris · Accepted Answer · 2012-05-22T21:15:26.660

I'm guessing the first column is the residue number. Use that. I don't know your data structure or what shift refers to, but you should be able to do something like this:

In : residues
Out: array([ 1,  1, 38, 38, 38, 39, 39, 39])

In : atom
Out: 
array(['CA', 'HA', 'CA', 'HA', 'CB', 'CA', 'HA', 'CB'], 
      dtype='|S2')

In : shift
Out: array([ 1.35, 19.67, 35.32,  1.19,  0.54, 35.62,  1.93,  1.8 ])

# rows with name 'CB'
In : cb = atom=='CB'

# rows with name 'CA' _and_ a residue that also has a 'CB' row
In : ca = numpy.logical_and(numpy.in1d(residues, residues[cb]), atom=='CA')
# or if in1d is not available
# ca = numpy.logical_and([(residue in residues[cb]) for residue in residues], atom=='CA')

In : shift[ca]
Out: array([35.32, 35.62])

In : shift[cb]
Out: array([0.54, 1.8 ])
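The masks above drop CA rows whose residue has no CB, but not the reverse. A residue with a CB and no CA would make the two arrays unequal again. Below is a sketch of a symmetric variant that keeps only residues having both atoms. It uses the shift values from the question's table and `np.isin`, the newer spelling of `in1d` (available since NumPy 1.13). It also assumes at most one CA and one CB per residue and rows grouped by residue, so that the two selections line up pairwise.

```python
import numpy as np

# Columns from the question's table.
residues = np.array([1, 1, 38, 38, 38, 39, 39, 39])
atom = np.array(["CA", "HA", "CA", "HA", "CB", "CA", "HA", "CB"])
shift = np.array([1.35, 19.67, 35.32, 1.19, 0.54, 35.62, 1.93, 1.80])

ca = atom == "CA"
cb = atom == "CB"

# Sorted residue numbers that have both a CA row and a CB row.
both = np.intersect1d(residues[ca], residues[cb])

# Each mask keeps one atom type, restricted to residues in `both`.
ca_rows = ca & np.isin(residues, both)
cb_rows = cb & np.isin(residues, both)

x = shift[ca_rows]  # CA shifts of residues 38 and 39
y = shift[cb_rows]  # CB shifts of the same residues, same order
print(both, x, y)
```

`x` and `y` can then be passed to `plt.plot` (or `plt.scatter`) directly, since they are guaranteed to have the same length.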

edited May 22 '12 at 21:15

answered May 22 '12 at 10:24

Avaris

Nice answer, but in1d is too new for my NumPy version. I will accept it if no better answer comes up. – tarrasch May 22 '12 at 12:31
@tarrasch: you can replace that part with a list comprehension: `[(residue in residues[cb]) for residue in residues]`. It'll be slower for large arrays but will work. – Avaris May 22 '12 at 12:36
Can you also integrate the alternative expression into your code for the next person trying this? Accepted and upvoted. – tarrasch May 22 '12 at 12:54
Actually, I can't seem to get this to work using the list comprehension. Can you show me how to do it? – tarrasch May 22 '12 at 15:16
OK, now I got it :) It was just that my data structure was wrong. Thanks! – tarrasch May 22 '12 at 16:24
@tarrasch: I've edited and included that part for completeness sake. Cheers :). – Avaris May 22 '12 at 21:16
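On NumPy versions without `in1d`, broadcasting gives a vectorized alternative to the list comprehension. This sketch compares every residue against every CB residue in one boolean matrix and checks that the result matches the list-comprehension fallback. The matrix has `len(residues) * cb.sum()` entries, so it trades memory for speed and suits moderate array sizes.

```python
import numpy as np

residues = np.array([1, 1, 38, 38, 38, 39, 39, 39])
atom = np.array(["CA", "HA", "CA", "HA", "CB", "CA", "HA", "CB"])

cb = atom == "CB"

# (8, 1) == (2,) broadcasts to an (8, 2) boolean matrix; any(axis=1) is True
# where the residue appears among the CB residues, i.e. what in1d computes.
has_cb = (residues[:, np.newaxis] == residues[cb]).any(axis=1)
ca = np.logical_and(has_cb, atom == "CA")

# The list-comprehension fallback from the answer, for comparison.
fallback = np.logical_and([(r in residues[cb]) for r in residues], atom == "CA")
print(ca, (ca == fallback).all())
```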