
I'm a bit stumped about what logic to use to match a list against a CSV file/list containing values. My idea was to use a for loop to simply iterate through the CSV and match it:

for j in range(len(data)): 
    if STR_list[j] in data[j]: 
        print(data[j])

But this doesn't actually print out the matches I want. Here is what the data and STR_list values look like when printed (before the for loop above):

print(STR_list):
['AGATC', '4', 'AATG', '1', 'TATC', '5']  

print(data)
[OrderedDict([('name', 'Alice'), ('AGATC', '2'), ('AATG', '8'), ('TATC', '3')]), OrderedDict([('name', 'Bob'), ('AGATC', '4'), ('AATG', '1'), ('TATC', '5')]), OrderedDict([('name', 'Charlie'), ('AGATC', '3'), ('AATG', '2'), ('TATC', '5')])]

So in this case, the row with 'Bob' would be a match, since the values line up. Should I be using regex for this, or am I right in thinking a for loop could be used?

Edit: Here is how I open the CSV (so it seems like it's a list after all?)

import csv

with open('file.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    data = list(reader)
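To see what `data` actually holds, here is a minimal sketch that feeds `csv.DictReader` an in-memory copy of the CSV through `io.StringIO` instead of `file.csv`. The CSV text is an assumption reconstructed from the printed output above. Each row comes back as a dict keyed by the header (an `OrderedDict` before Python 3.8, a plain `dict` from 3.8 on), and `list(reader)` simply collects those rows into an ordinary list.

```python
import csv
import io

# Assumed CSV contents, reconstructed from the printed `data` in the question.
CSV_TEXT = """name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
"""

# io.StringIO stands in for open('file.csv', newline='') so the example runs on its own.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
data = list(reader)

print(type(data))        # the outer container is a list
print(type(data[0]))     # each row is a dict (an OrderedDict before Python 3.8)
print(data[1]['name'])   # Bob: look a column up by its header name
```

So `data` is a list, and each element of it is a dict-like row, which is why it still prints as `OrderedDict(...)` on older Python versions.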
  • what should the expected output be? Could you display that for us? – M Z Aug 13 '20 at 14:36
  • I think my code (the for loop) is badly written for its intended use, though. It should match the strings after 'name', 'Bob', so ultimately I want to get the name 'Bob' in this example, since he would be the correct match (if that makes sense). – herrvogelberg Aug 13 '20 at 14:45
  • I think the best way is to update your OP with the expected output – M Z Aug 13 '20 at 14:46
  • iterate through your list of search terms two at a time. The first is the name, the second is the value. Look for the tuple (name, value) in target, where target is an entry in data. – Kenny Ostrom Aug 13 '20 at 15:24
  • incidentally, I have a working solution with a lookup dict based on collections.namedtuple, although it would be better with a defaultdict(set) for each field. – Kenny Ostrom Aug 13 '20 at 15:26
  • Trying to wrap my head around this. I'm brand new to all of this, so very little makes sense to me so far. I'll read up on dicts and tuples; I keep running into errors. – herrvogelberg Aug 13 '20 at 15:48
  • The same data set was discussed [here](https://stackoverflow.com/questions/62855413/how-can-i-loop-through-this-dictionary-instead-of-hardcoding-the-keys/62855607#62855607) – Chris Charley Aug 14 '20 at 05:12
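Kenny Ostrom's first suggestion can be sketched like this: walk `STR_list` two items at a time, treat each pair as a `(column, value)` tuple, and keep the rows whose `.items()` contain every such tuple. The sample `data` is copied from the question's printout (plain dicts behave the same as `OrderedDict`s here), and the helper name `find_matches` is hypothetical.

```python
# Sample rows copied from the question's printout.
data = [
    {'name': 'Alice', 'AGATC': '2', 'AATG': '8', 'TATC': '3'},
    {'name': 'Bob', 'AGATC': '4', 'AATG': '1', 'TATC': '5'},
    {'name': 'Charlie', 'AGATC': '3', 'AATG': '2', 'TATC': '5'},
]
STR_list = ['AGATC', '4', 'AATG', '1', 'TATC', '5']


def find_matches(rows, terms):
    """Hypothetical helper: return the names of rows containing every (column, value) pair."""
    # terms[0::2] are the column names and terms[1::2] the values; zip pairs them two at a time.
    pairs = list(zip(terms[0::2], terms[1::2]))
    return [row['name'] for row in rows
            if all(pair in row.items() for pair in pairs)]


print(find_matches(data, STR_list))  # ['Bob']
```

Because it returns a list, this also handles the case where several people match, or nobody does.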

2 Answers

if STR_list[j] in data[j]: 

This line should be tabbed over to be within the for loop. Assuming that was a copy/paste error:

STR_list[j] isn't looking at the entire STR_list, only the j-th item in it (so when you look at data[0], you're comparing against 'AGATC'; when you look at data[1], you're comparing against '4', etc.).
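A quick way to see what the original loop is really testing: `x in some_dict` checks only the dict's keys, never its values. With the sample data below (copied from the question), row 0 is tested for the key `'AGATC'`, row 1 for the key `'4'`, and row 2 for the key `'AATG'`, so Alice and Charlie would be printed and Bob would not.

```python
# Sample rows copied from the question's printout.
data = [
    {'name': 'Alice', 'AGATC': '2', 'AATG': '8', 'TATC': '3'},
    {'name': 'Bob', 'AGATC': '4', 'AATG': '1', 'TATC': '5'},
    {'name': 'Charlie', 'AGATC': '3', 'AATG': '2', 'TATC': '5'},
]
STR_list = ['AGATC', '4', 'AATG', '1', 'TATC', '5']

# Re-run the question's test, but record which names it would have printed.
hits = [data[j]['name'] for j in range(len(data)) if STR_list[j] in data[j]]
print(hits)                      # ['Alice', 'Charlie']

# `in` on a dict checks keys only; values need .values() (or .items() for pairs).
print('4' in data[1])            # False: '4' is not a key of Bob's row
print('4' in data[1].values())   # True: it is one of Bob's values
```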

What you want is to take all of STR_list and check whether it matches the 2nd, 3rd, and 4th positions of each row in data.

Additionally, STR_list needs to be formatted the same way data is, so you would want a list of tuples there (or an OrderedDict; I'm not familiar with that data type, so I don't know whether that's entirely what makes up data).
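One way to give `STR_list` the same shape as a row of `data` is to pair its even-indexed items (column names) with its odd-indexed items (values) and build a dict from those pairs. This is a minimal sketch assuming `STR_list` always alternates name, value, name, value; the variable name `STR_dict` is hypothetical.

```python
STR_list = ['AGATC', '4', 'AATG', '1', 'TATC', '5']

# Slicing with a step of 2 splits the flat list into names and values.
names = STR_list[0::2]    # ['AGATC', 'AATG', 'TATC']
values = STR_list[1::2]   # ['4', '1', '5']

# zip() pairs them into tuples; dict() turns the tuples into key/value entries.
STR_dict = dict(zip(names, values))
print(STR_dict)                # {'AGATC': '4', 'AATG': '1', 'TATC': '5'}
print(list(STR_dict.items()))  # [('AGATC', '4'), ('AATG', '1'), ('TATC', '5')]
```

If you specifically want an `OrderedDict`, `collections.OrderedDict(zip(names, values))` works the same way; on Python 3.7+ a plain dict already keeps insertion order.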

Really, what you want to look for is the equivalent of:

if [('AGATC', '2'), ('AATG', '8'), ('TATC', '3')] in a subset of OrderedDict([('name', 'Alice'), ('AGATC', '2'), ('AATG', '8'), ('TATC', '3')])
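The "subset" test in that pseudocode can be written directly in Python, because a dict's `.items()` view behaves like a set of `(key, value)` pairs and supports `<=` (is-subset-of). A minimal sketch, assuming `STR_list` has already been turned into a dict of column to value; the rows are copied from the question.

```python
# Sample rows copied from the question's printout.
data = [
    {'name': 'Alice', 'AGATC': '2', 'AATG': '8', 'TATC': '3'},
    {'name': 'Bob', 'AGATC': '4', 'AATG': '1', 'TATC': '5'},
    {'name': 'Charlie', 'AGATC': '3', 'AATG': '2', 'TATC': '5'},
]
# The question's STR_list, already paired up into column -> value entries.
STR_dict = {'AGATC': '4', 'AATG': '1', 'TATC': '5'}

# A row matches when every (column, value) pair of STR_dict also appears in it;
# the extra ('name', ...) pair in each row does not prevent a match.
matches = [row['name'] for row in data if STR_dict.items() <= row.items()]
print(matches)  # ['Bob']
```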

I realize I'm not giving you exactly the code you need, but I hope I'm explaining it so that you can understand and figure it out for yourself.

samsonjm
  • Thanks for this! I'm trying to wrap my head around it. I have now been able to turn my STR_list into an OrderedDict as well, so the outputs look like this: print(STR_odict): OrderedDict([('AGATC', '4'), ('AATG', '1'), ('TATC', '5')]) print(data): [OrderedDict([('name', 'Alice'), ('AGATC', '2'), ('AATG', '8'), ('TATC', '3')]), OrderedDict([('name', 'Bob'), ('AGATC', '4'), ('AATG', '1'), ('TATC', '5')]), OrderedDict([('name', 'Charlie'), ('AGATC', '3'), ('AATG', '2'), ('TATC', '5')])] I'm getting closer. – herrvogelberg Aug 13 '20 at 15:24
  • Have you also changed your for loop? What is your for loop code and the output now? – samsonjm Aug 13 '20 at 15:31
  • Still trying to work it out...it seems my data value is in fact a list and not an OrderedDict. It's very confusing. – herrvogelberg Aug 13 '20 at 15:42
  • This should mean that data is a list of lists of tuples, which I'll represent like this -> [[(,),(,),(,),(,)],[(,),(,),(,),(,)]]. Your for loop is iterating over the outer list, giving you a list of tuples that you are checking on each iteration -> [(,),(,),(,),(,)]. That means you want your STR_list to also be a list of tuples -> [(,),(,),(,)]. But this is a shorter list of tuples than your data[j] list, so you'll want to look at a subset of data[j]. – samsonjm Aug 13 '20 at 15:57
  • I (think I) understand. I think what confuses me is how to access the subset in data. data[0] prints this: OrderedDict([('name', 'Alice'), ('AGATC', '2'), ('AATG', '8'), ('TATC', '3')]) STR_list[0] prints this: ('AGATC', '4') I don't understand how to isolate the different elements in data[0], for example. I also don't understand why it carried "OrderedDict" with it when it was made into a list, because although the two look the same, they're not: one is a list and one isn't. Frustrating. – herrvogelberg Aug 13 '20 at 16:59
  • data[0] is taking the first item in the list of lists. If you do data[j][0] you should get ('name', 'Alice') for the first iteration, then the other names for the next ones. – samsonjm Aug 13 '20 at 18:01
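Because each row from `csv.DictReader` is a dict keyed by column name rather than a list, positional indexing like `data[j][0]` actually raises `KeyError: 0`. Below is a minimal sketch of the usual ways to reach individual fields, using one row copied from the question; converting the row to a list of pairs is only needed if you really want positions.

```python
# One row as csv.DictReader produces it (copied from the question's printout).
row = {'name': 'Alice', 'AGATC': '2', 'AATG': '8', 'TATC': '3'}

print(row['name'])    # Alice: look fields up by header name
print(row['AGATC'])   # 2

# If positions are really needed, turn the row into a list of (key, value) tuples first.
pairs = list(row.items())
print(pairs[0])       # ('name', 'Alice')
print(pairs[1:])      # [('AGATC', '2'), ('AATG', '8'), ('TATC', '3')]

try:
    row[0]            # there is no key 0 in the row, so this fails
except KeyError as err:
    print('KeyError:', err)   # KeyError: 0
```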
for j in range(len(data)):
    # Flatten the row into one list: ['name', 'Alice', 'AGATC', '2', ...]
    flattened_data = [x for item in data[j].items() for x in item]
    # Check that STR_list equals flattened_data minus its first two elements ('name' and the actual name)
    if sum(1 for x1, x2 in zip(STR_list, flattened_data[2:]) if x1 == x2) == len(STR_list):
        # The person's name is at index 1 of the flattened list
        print(flattened_data[1])

Basically, you need to learn how to compare two lists and how to select a specific element from a list.
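Here is the same flatten-and-compare idea as a self-contained sketch, with the sample data copied from the question so it runs on its own. The helper name `flatten_row` is hypothetical, and comparing the slice `flattened[2:]` to `STR_list` with `==` is a simplification of the element-by-element count in the answer's loop: it also requires both lists to have the same length. Like the original, it assumes the CSV columns appear in the same order as the names in `STR_list`.

```python
# Sample rows copied from the question's printout.
data = [
    {'name': 'Alice', 'AGATC': '2', 'AATG': '8', 'TATC': '3'},
    {'name': 'Bob', 'AGATC': '4', 'AATG': '1', 'TATC': '5'},
    {'name': 'Charlie', 'AGATC': '3', 'AATG': '2', 'TATC': '5'},
]
STR_list = ['AGATC', '4', 'AATG', '1', 'TATC', '5']


def flatten_row(row):
    """Hypothetical helper: turn {'name': 'Bob', 'AGATC': '4'} into ['name', 'Bob', 'AGATC', '4']."""
    return [x for item in row.items() for x in item]


matches = []
for row in data:
    flattened = flatten_row(row)
    # flattened[0] is the literal 'name' and flattened[1] the person's name;
    # the STR names and counts start at index 2.
    if flattened[2:] == STR_list:
        matches.append(flattened[1])

print(matches)  # ['Bob']
```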

GDS
  • I am trying to; this is my first time ever using Python (and I only started coding 4 weeks ago). I should have added my CSV-opening code, though, because it seems my data variable is already a list, meaning it has no .items(): with open('file.csv') as csvfile: reader = csv.DictReader(csvfile) data = list(reader) – herrvogelberg Aug 13 '20 at 15:42