
I have two lists, let's call them list1 and list2.

list1 is the main list which contains all the values of my data.

list2 contains certain values that have to be removed from list1. (You can say it is sort of like a sublist of list1)

len(list1) = 13357
len(list2) = 1751

So to remove values of list2 from list1 I do:

new_list = [x for x in list1 if x not in list2]

So what you would expect is for new_list to have a length:

len(new_list) = len(list1) - len(list2) = 13357 - 1751 = 11606

However, my new_list has a length of 11584!

How is this possible?

EDIT: I obtained my list2 from list1 using a calculation. I also checked for similar values using set(list1) & set(list2)

list2 is actually a quadrant of list1; this is a follow-up to THIS QUESTION, which I asked previously. From that link you can see that I have RA and DEC coordinates for my data. In this example, assume list1 is my RA.

So once I have my quadrants, I apply:

for a, b in zip(sliceno, list1):
    if a == 0:
        list2.append(b)  # append the value from list1, not the slice number
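Since the slice numbers are already paired with list1 by position, the complement can be selected the same way instead of by value. The following is a minimal sketch with made-up stand-in data (the values of `list1` and `sliceno` here are hypothetical, not the real RA data); it assumes `sliceno[i]` is the quadrant label of `list1[i]`.

```python
# Hypothetical stand-in data: sliceno[i] labels the quadrant of list1[i].
list1 = [10.5, 20.25, 10.5, 30.75, 40.0]
sliceno = [0, 1, 2, 0, 3]

# Values in quadrant 0 (what list2 is meant to hold).
list2 = [b for a, b in zip(sliceno, list1) if a == 0]

# Everything else, chosen by position rather than by value.
new_list = [b for a, b in zip(sliceno, list1) if a != 0]

# Value-based filtering, for comparison.
by_value = [x for x in list1 if x not in list2]

print(len(list1), len(list2), len(new_list), len(by_value))  # 5 2 3 2
```

Here 10.5 occurs twice in `list1` but only one occurrence lies in quadrant 0. Filtering by position keeps the other occurrence, so the lengths add up; filtering by value drops both.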

I know this is a follow-up of my previous question, so please bear with the complications!

Srivatsan
  • Does your list have duplicate values? I would try `len(set(list1))` – The6thSense Jul 15 '15 at 11:33
  • Well, they are floats correct up to 13 decimal places, so I assume not – Srivatsan Jul 15 '15 at 11:34
  • Float comparison can be tricky at times. Especially if the values in the two lists are results of calculations, they might differ by ever so little, so that list2 is not a true subset of list1 but contains entries which are similar but not identical to entries in list1 – planetmaker Jul 15 '15 at 11:36
  • Please show more of your code. What calculation produced `list2`? – SuperBiasedMan Jul 15 '15 at 11:40
  • @planetmaker: Do you think that if I decrease the decimal places to say 3 or 4 I may get the correct result? – Srivatsan Jul 15 '15 at 11:40
  • Please add more information about your data and add some instance to your question! – Mazdak Jul 15 '15 at 11:44
  • What about the set difference, `set(list1) - set(list2)`? Are its length and content the same as new_list? – Viktor Mellgren Jul 15 '15 at 11:46
  • You can't "increase the decimal places". You need to take a big step back and understand how floating-point arithmetic works. – roippi Jul 15 '15 at 11:48
  • @Vixen: ok so here are the results. `len(set(list1)) = 13313` and `len(set(list2)) = 1747` and `len(set(list1) - set(list2)) = 11566` – Srivatsan Jul 15 '15 at 12:00
  • So it seems like you have duplicates in both lists – Viktor Mellgren Jul 15 '15 at 12:17
  • @Vixen: Can I remove the duplicates or is there any other way to get my `new_list` ?? – Srivatsan Jul 15 '15 at 12:25
  • I believe the easiest way is to just take the set difference, since creating a set automatically removes duplicates. If you need your result as a list, then just use `list(theSetDifference)` – Viktor Mellgren Jul 15 '15 at 12:45
  • @Vixen: By set difference you mean: `new_list = [x for x in set(list1) if x not in set(list2)] ` – Srivatsan Jul 15 '15 at 12:46
  • `new_list = set(list1) - set(list2)` should contain the same elements as `[x for x in set(list1) if x not in set(list2)]` – Viktor Mellgren Jul 15 '15 at 13:26
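The two approaches discussed in the comments give different lengths whenever duplicates survive the filter, which matches the 11584 versus 11566 reported above. A small sketch on made-up values (not the real data) shows the difference:

```python
# Hypothetical values; 2.5 and 4.5 are duplicated in list1.
list1 = [1.5, 2.5, 2.5, 3.5, 4.5, 4.5]
list2 = [1.5, 3.5]

# The comprehension keeps duplicates and the original order.
by_membership = [x for x in list1 if x not in list2]

# The set difference keeps each remaining value once, in no particular order.
by_set_difference = set(list1) - set(list2)

print(by_membership)              # [2.5, 2.5, 4.5, 4.5]
print(sorted(by_set_difference))  # [2.5, 4.5]
```

Which one is appropriate depends on whether repeated values in the result carry meaning; the set difference silently discards them.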

1 Answer


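This is almost certainly caused by repeated values: 11606 - 11584 = 22 more elements were removed than list2 contains. `x not in list2` drops every element of list1 whose value appears anywhere in list2, so a value that occurs more often in list1 than in list2 is removed more times than you counted. (Because list2 is a subset of list1, its repeated values also exist in list1.)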
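A small sketch on made-up values reproduces the effect. It also shows one possible way to remove exactly one occurrence per entry of list2, using `collections.Counter` from the standard library; that second part is an illustrative alternative, not something taken from the question.

```python
from collections import Counter

# Hypothetical values; 2.5 occurs twice in list1 but only once in list2.
list1 = [1.5, 2.5, 2.5, 3.5, 4.5]
list2 = [2.5, 3.5]

# Membership filtering removes BOTH occurrences of 2.5.
new_list = [x for x in list1 if x not in list2]
print(len(list1) - len(list2), len(new_list))  # 3 2

# Alternative: remove only as many occurrences as list2 contains.
remaining = Counter(list2)  # how many copies of each value still need removing
multiset_result = []
for x in list1:
    if remaining[x] > 0:
        remaining[x] -= 1  # consume one copy from list2
    else:
        multiset_result.append(x)

print(multiset_result)  # [1.5, 2.5, 4.5]
```

With the counting version, the result length is always `len(list1) - len(list2)` as long as every entry of list2 really occurs in list1 with exactly equal value.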

juanmajmjr