
Input

datas2 = [[("01/01/2011", 1), ("02/02/2011", "No"), ("03/03/2011", 11)],
[("01/01/2011", 2), ("03/03/2011", 22), ("22/22/2222", "no")],
[("01/01/2011", 3), ("03/03/2011", 33), ("22/22/2222", "333")]]

Intended Output

[("01/01/2011", 1, 2, 3), ("03/03/2011", 11, 22, 33)]

[Update]

I was asked for real data and more examples (the messy earlier code is in the edit history):

A                       B                       C
09.05.2011;1.561        12.04.2011;14.59        12.04.2011;1.5
10.05.2011;1.572        13.04.2011;14.50        13.04.2011;1.5    
11.05.2011;1.603        14.04.2011;14.56        14.04.2011;1.5    
12.05.2011;1.566        15.04.2011;14.54        15.04.2011;1.5    
13.05.2011;1.563        18.04.2011;14.54        18.04.2011;1.5    
16.05.2011;1.537        19.04.2011;14.52        19.04.2011;1.5    
17.05.2011;1.528        20.04.2011;14.53        20.04.2011;1.5    
18.05.2011;1.543        21.04.2011;14.59        21.04.2011;1.5    
19.05.2011;1.537        26.04.2011;14.65        26.04.2011;1.6    
20.05.2011;1.502        27.04.2011;14.68        27.04.2011;1.6    
23.05.2011;1.503        28.04.2011;14.66        28.04.2011;1.6    
24.05.2011;1.483        29.04.2011;14.62        29.04.2011;1.6    
25.05.2011;1.457        02.05.2011;14.65        02.05.2011;1.6    
26.05.2011;1.491        03.05.2011;14.63        03.05.2011;1.6    
27.05.2011;1.509        04.05.2011;14.54        04.05.2011;1.5    
30.05.2011;1.496        05.05.2011;14.57        05.05.2011;1.5    
31.05.2011;1.503        06.05.2011;14.57        06.05.2011;1.5    
01.06.2011;1.509        09.05.2011;14.61        09.05.2011;1.6    
03.06.2011;1.412        10.05.2011;14.66        10.05.2011;1.6    
06.06.2011;1.380        11.05.2011;14.71        11.05.2011;1.7    
07.06.2011;1.379        12.05.2011;14.71        12.05.2011;1.7    
08.06.2011;1.372        13.05.2011;14.70        13.05.2011;1.7    
09.06.2011;1.366        16.05.2011;14.75        16.05.2011;1.7    
10.06.2011;1.405        17.05.2011;14.69        17.05.2011;1.6    
13.06.2011;1.400        18.05.2011;14.65        18.05.2011;1.6    
14.06.2011;1.414        19.05.2011;14.69        19.05.2011;1.6 
  • If I unpacked A and B, it would contain all values.
  • If I unpacked A, B and C, it would contain:

    [ ["09.05.2011", 1.561, 14.61, 1.6],
      ["10.05.2011", 1.572, 14.66, 1.6],
      ["11.05.2011", 1.603, 14.71, 1.7],
      ["12.05.2011", 1.566, 14.71, 1.7],
      ["13.05.2011", 1.563, 14.70, 1.7],
      ["16.05.2011", 1.537, 14.75, 1.7],
      ["17.05.2011", 1.528, 14.69, 1.6],
      ["18.05.2011", 1.543, 14.65, 1.6],
      ["19.05.2011", 1.537, 14.69, 1.6] ]

So every date must have as many values as there are files, i.e. columns A, B, C, ...

hhh
  • @JBernardo: not in this case, I just want to get the tuple unpacking working. Validation would be another question. – hhh Oct 05 '11 at 02:26
  • Ok, so let me see if I have this right. You want, for each date that is found in every list, a tuple consisting of (a) the date, and (b) the associated values from each tuple in each list? – Karl Knechtel Oct 05 '11 at 03:57
  • @KarlKnechtel: I want only those dates that have a value in every data set. So if a date is missing from one set, it will not be included. – hhh Oct 05 '11 at 04:11
  • @KarlKnechtel: I was not that far off from the answers, just a few lines short: `final_result = list(result.items()) newResults = defaultdict(list) for (date, vals) in final_result: if len(vals) == size: newResults[date].append(vals) print newResults` So this case is solved; I am still investigating a shorter way of doing this... – hhh Oct 05 '11 at 04:35
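
The filter described in the last comment (keep a date only if it collected one value per input list) can be sketched roughly as below. The helper name `merge_complete_dates` is hypothetical, and the sketch assumes each date appears at most once per list and that only `int` and `float` values count as numbers.

```python
from collections import defaultdict

datas2 = [
    [("01/01/2011", 1), ("02/02/2011", "No"), ("03/03/2011", 11)],
    [("01/01/2011", 2), ("03/03/2011", 22), ("22/22/2222", "no")],
    [("01/01/2011", 3), ("03/03/2011", 33), ("22/22/2222", "333")],
]


def merge_complete_dates(datasets):
    """Hypothetical helper: group values by date, keep only complete dates."""
    merged = defaultdict(list)
    for dataset in datasets:
        for date, value in dataset:
            # Assumption: strings such as "No" or "333" are not wanted.
            if isinstance(value, (int, float)):
                merged[date].append(value)
    size = len(datasets)
    # A date is complete when every dataset contributed one value to it.
    return {date: vals for date, vals in merged.items() if len(vals) == size}


complete = merge_complete_dates(datas2)

# Two short lists taken from columns A and B: 20.05.2011 exists only in A.
partial = merge_complete_dates([
    [("09.05.2011", 1.561), ("20.05.2011", 1.502)],
    [("09.05.2011", 14.61)],
])
```

With the sample input both surviving dates carry three values, and in the second call the date that is missing from one list is dropped.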

2 Answers

from collections import defaultdict
import itertools

d = defaultdict(list)
for i,j in itertools.chain.from_iterable(datas2):
    if not isinstance(j, str):
        d[i].append(j)

and d will be a dict like:

{'01/01/2011': [1, 2, 3], '03/03/2011': [11, 22, 33]}

So you can format it later as tuples with d.items()
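
A minimal sketch of that formatting step, assuming `d` already holds the dict shown above (the name `as_tuples` is only illustrative):

```python
# Assumption: d is the grouped result shown above.
d = {"01/01/2011": [1, 2, 3], "03/03/2011": [11, 22, 33]}

# Prepend each date to its values to get one flat tuple per date.
as_tuples = [(date,) + tuple(values) for date, values in d.items()]
print(as_tuples)
```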

Note that the date "22/22/2222" isn't validated here, but it is quite easy to do that inside the for loop.
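
One possible way to add that check is sketched below. It assumes the dates are in day/month/year order (the sample data does not settle this) and uses a hypothetical helper `is_valid_date` built on `datetime.strptime`, which raises `ValueError` for impossible dates.

```python
from collections import defaultdict
from datetime import datetime
import itertools

datas2 = [
    [("01/01/2011", 1), ("02/02/2011", "No"), ("03/03/2011", 11)],
    [("01/01/2011", 2), ("03/03/2011", 22), ("22/22/2222", "no")],
    [("01/01/2011", 3), ("03/03/2011", 33), ("22/22/2222", "333")],
]


def is_valid_date(text, fmt="%d/%m/%Y"):
    """Hypothetical helper: True if text parses as a real calendar date."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        # e.g. "22/22/2222": there is no month 22
        return False


d = defaultdict(list)
for date, value in itertools.chain.from_iterable(datas2):
    # Same loop as above, with the date check added in front.
    if is_valid_date(date) and not isinstance(value, str):
        d[date].append(value)
```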

JBernardo
  • His example data set included '333' and his output didn't include that, so I figured he just wanted to screen out anything that wasn't a number. – steveha Oct 05 '11 at 02:39
  • @steveha I assumed he wanted to screen out every date that isn't common to each data-set. Question is incredibly under-specified. – Karl Knechtel Oct 05 '11 at 03:58
  • But isn't a dict a poor choice because it is not ordered? It is slow to go through it with large data... – hhh Oct 05 '11 at 04:55
  • I don't know what you mean by "It is slow to go through it with large data" because a `dict` is a fast way to handle cases like this; a `dict` uses a hash table to look up keys. If you need the output to be ordered, you can sort the output list. Or you could use `OrderedDict()`. – steveha Oct 05 '11 at 20:34
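
The sorting suggestion in the last comment could look roughly like this. It is only a sketch: the day/month/year format string is an assumption, and `date_key` is a hypothetical helper. Sorting on the parsed date rather than on the raw string matters because day-first strings do not sort chronologically.

```python
from datetime import datetime

# Assumption: d is the grouped result, in arbitrary key order.
d = {"03/03/2011": [11, 22, 33], "01/01/2011": [1, 2, 3]}


def date_key(item, fmt="%d/%m/%Y"):
    """Hypothetical sort key: parse the date part of a (date, values) pair."""
    date_text, _values = item
    return datetime.strptime(date_text, fmt)


# sorted() returns a new list of (date, values) pairs in chronological order.
ordered = sorted(d.items(), key=date_key)
```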

This code is written to work equally well on Python 2.x or Python 3.x. I tested it with Python 2.7 and Python 3.2.

from collections import defaultdict

datas2 = [
    [("01/01/2011", 1), ("02/02/2011", "No"), ("03/03/2011", 11)],
    [("01/01/2011", 2), ("03/03/2011", 22), ("22/22/2222", "no")],
    [("01/01/2011", 3), ("03/03/2011", 33), ("22/22/2222", "333")]
]


def want_value(val):
    """return true if val is a value we want to keep"""
    try:
        # detect numbers by trying to add to 0
        0 + val
        # no exception means it is a number and we want it
        return True
    except TypeError:
        # exception means it is the wrong type (a string or whatever)
        return False

result = defaultdict(list)

for lst in datas2:
    for date, val in lst:
        if want_value(val):
            result[date].append(val)

final_result = list(result.items())
print(final_result)
steveha
  • Well, usually when I want to make something work with numbers, I just force the value to a number with `int(val)` or `float(val)`. In this case, he had an example string of `'333'` and he didn't want that included. I think it is generally considered Pythonic to use exceptions to sort wheat from chaff like this, and generally considered a bit icky to check types with `isinstance()`. This function will screen out `'no'`, `'No'`, `'333'`, `None`, object instances, and anything else that might show up in there, while passing anything that acts like a number. What would you recommend I do? – steveha Oct 05 '11 at 02:44
  • That's why I don't like it... You're blocking everything and it may be hard to debug later. But it's just my opinion – JBernardo Oct 05 '11 at 02:46
  • @JBernardo At least he split the validation out into a separate method, which is good practice - for example it makes it trivial for you to change the validation logic to reject only 'No', 'no', 'nO', and 'NoWay!' if you decide that's what you want, and to not use Exceptions, if they offend you. Note that the original question gives *no* guidance as to what values should be legal or illegal, and how illegal values should be handled. – Peter Oct 05 '11 at 02:47
  • I don't think OP works with so many formats! The question should have more explanation and real data. BTW, I like exceptions (a lot), but it doesn't feel right for me to use them like that – JBernardo Oct 05 '11 at 02:49
  • +1 for clarity. I would name a validator method is_value() instead of want_value(), but that's just me. – Peter Oct 05 '11 at 02:54
  • @JBernardo, I am not sure how it can be hard to debug this case. If you are seeing values you don't want, the `want_value()` function is a pretty obvious place to look. The basic approach is EAFP: http://en.wikipedia.org/wiki/Python_syntax_and_semantics#Exceptions But anyway @hhh is welcome to replace my function with something else that accepts exactly what he wants, if my function accepts too much. – steveha Oct 05 '11 at 03:13
  • Thank you! I got the idea how to do it now. I used totally wrong datatype. – hhh Oct 05 '11 at 04:27