Python3 Data Impute

Question

This came up in a job interview that I bombed.

    Remove all rows where at least half of the entries are negative
    Fill in remaining negative values in each column with the 
    average for that column, excluding invalid entries

Input: [[5],
     [3],
     [1.0, 2.0, 10.0],
     [-1.0, -99.0, 0],
     [-1.0, 4.0, 0],
     [3.0, -6.0, -0.1],
     [1.0, -0.31, 6.0]
    ]

Output: new mean rounded to one decimal place

Not sure where to start

score 1 · Accepted Answer · answered Jul 27 '19 at 22:52

Assuming there are 3 columns and each remaining column has at most one negative value, which gets replaced by the average of the valid (non-negative) entries in that column.

Then:

inp = [[5],[3],[1.0,2.0,10.0],[-1.0,-99.0,0],[-1.0,4.0,0],[3.0,-6.0,-0.1],[1.0,-0.31, 6.0]]
cols = 0  # save num of cols for later
for l in inp[:]:  # iterate over a copy because rows are removed from inp below
    neg = 0  # count negative
    for n in l:
        if n < 0:
            neg += 1  # update
    if len(l) > cols:  # save num of cols (len() also counts zeros)
        cols = len(l)
    if 2 * neg >= len(l):  # remove list where at least half of the entries are negative
        inp.remove(l)

for i in range(cols):  # loop through cols
    neg_rows = []  # rows whose value in this column is negative
    entries = 0  # for calculating the average
    summ = 0  # for calculating the average
    for c in inp:
        if i >= len(c):  # this row has no such column
            continue
        if c[i] < 0:
            neg_rows.append(c)  # remember the row that needs fixing
        else:
            summ += c[i]
            entries += 1
    if entries:  # skip columns that have no valid values
        for c in neg_rows:
            c[i] = round(summ / entries, 1)  # replace negative value with average

print(inp)

Result:

[[5],
 [3],
 [1.0, 2.0, 10.0],
 [2.5, 4.0, 0],
 [1.0, 3.0, 6.0]]

I believe this is what they were looking for, hope this helps.

Thanks, it's working correctly. But I cannot calculate the mean from the final array. I converted it to a numpy array and ran numpy.mean(inp). I get **TypeError: unsupported operand type(s) for /: 'list' and 'int'**. — Stringer, Jul 28 '19 at 00:38
numpy.mean(...) is trying to divide a list by an int, which is not possible. This happens because the rows have different lengths, so numpy cannot build a regular 2-D numeric array from them. Check out this post, it might help: https://stackoverflow.com/questions/10058227/calculating-mean-of-arrays-with-different-lengths — ugatah, Jul 28 '19 at 01:03
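
For what it's worth, here is a minimal sketch of one way to get a single mean out of the ragged result without numpy. It assumes "new mean" in the question means the mean of all values left after imputation, which the question does not state explicitly. The function name `overall_mean` is made up for illustration.

```python
from itertools import chain
from statistics import mean


def overall_mean(rows):
    """Mean of every value in a list of lists whose rows may differ in length."""
    flat = list(chain.from_iterable(rows))  # flatten the ragged rows into one list
    return round(mean(flat), 1)  # one decimal place, as the question asks


# The result printed by the answer above
result = [[5], [3], [1.0, 2.0, 10.0], [2.5, 4.0, 0], [1.0, 3.0, 6.0]]
print(overall_mean(result))  # 3.4 (37.5 / 11 = 3.409...)
```

Flattening first sidesteps the different row lengths, since the mean is then taken over a plain list of numbers.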
