I have a pandas dataframe like this (simplified):
import pandas as pd

data = {'old': [['these', 'are', 'old', 'tokens'],
                ['here', 'are', 'some', 'more', 'old']],
        'new': [['and', 'these', 'are', 'new'],
                ['see', 'the', 'difference', 'between', 'them']]}
example_df = pd.DataFrame(data=data).astype(str)
So the dataframe looks like this:
new
0 ['and', 'these', 'are', 'new']
1 ['see', 'the', 'difference', 'between', 'them']
old
0 ['these', 'are', 'old', 'tokens']
1 ['here', 'are', 'some', 'more', 'old']
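One thing worth checking in the simplified example: because of `.astype(str)`, each cell holds the string form of a list rather than an actual list. The sketch below only illustrates that point; the `ast.literal_eval` round trip is an assumption added for illustration and is not part of the real code.

```python
import ast
import pandas as pd

data = {'old': [['these', 'are', 'old', 'tokens'],
                ['here', 'are', 'some', 'more', 'old']],
        'new': [['and', 'these', 'are', 'new'],
                ['see', 'the', 'difference', 'between', 'them']]}
example_df = pd.DataFrame(data=data).astype(str)

# After astype(str), the cell is the text "['these', 'are', ...]", not a list.
cell = example_df.loc[0, 'old']
print(type(cell))

# If real lists are needed, the string can be parsed back (illustrative only).
tokens = ast.literal_eval(cell)
print(tokens)
```

If the real dataframe already stores lists in `old_tokens` and `new_tokens`, this step does not apply.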
In my real df, there are 968 rows (this becomes relevant below).
I am performing a comparison function (for semantic analysis), again simplified:
def analysis(first_token_list, second_token_list):
    synonymset1 = somefunction(first_token_list)   # specifics don't matter, this works fine
    synonymset2 = somefunction(second_token_list)  # specifics don't matter, this works fine
    best_score_list = []
    for synset in synonymset1:
        # keep only the similarities that are defined (path_similarity can return None)
        similaritylist = [synset.path_similarity(ss) for ss in synonymset2 if synset.path_similarity(ss) is not None]
        if not similaritylist:
            continue
        best_score = max(similaritylist)
        if best_score is not None:
            best_score_list.append(best_score)
    print(best_score_list)
    return best_score_list
For added clarity, the function before the loop returns a list of synsets (from wordnet) for each token list, like so:
[Synset('old.v.01'), Synset('token.n.01')]
When I call the below,
notnull_df['maxsim_OtN'] = notnull_df.apply(
    lambda row: maxsim.word_similarity(row['old_tokens'], row['new_tokens']), axis=1)
I see the lists being generated by the print(), but then I get an error about mismatched shapes:
Traceback (most recent call last):
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4637, in create_block_manager_from_arrays
blocks = form_blocks(arrays, names, axes)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4701, in form_blocks
float_blocks = _multi_blockify(float_items)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4778, in _multi_blockify
values, placement = _stack_arrays(list(tup_block), dtype)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4823, in _stack_arrays
stacked[i] = _asarray_compat(arr)
ValueError: could not broadcast input array from shape (6) into shape (5)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "semsim_calculation.py", line 133, in <module>
notnull_df['maxsim_OtN'] = notnull_df.apply(lambda row: maxsim.word_similarity(row['old_tokens'], row['new_tokens']), axis=1)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 4990, in _apply_standard
result = self._constructor(data=results, index=index)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 330, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 461, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 6173, in _arrays_to_mgr
return create_block_manager_from_arrays(arrays, arr_names, axes)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4642, in create_block_manager_from_arrays
construction_error(len(arrays), arrays[0].shape, axes, e)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4608, in construction_error
passed, implied))
ValueError: Shape of passed values is (968, 5), indices imply (968, 11)
Can anyone explain why this is happening? The print() actually does show me that the list of values ([0.25, 0.5, 0.07692307692307693]) is being generated, but it's not returning that list (a similar question was asked but not resolved in this question).
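For reference, the traceback shows apply() passing the per-row results to the DataFrame constructor, which seems to be where lists of different lengths stop fitting a single shape. A minimal sketch of one way to sidestep that, assuming the goal is simply one list per cell: build the column with a list comprehension instead of apply(). Here `fake_similarity` is a hypothetical stand-in for `maxsim.word_similarity`, and the two-row frame is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for maxsim.word_similarity: it returns a list whose
# length differs from row to row, as best_score_list does.
def fake_similarity(old_tokens, new_tokens):
    n = min(len(old_tokens), len(new_tokens))
    return [1.0 / (i + 1) for i in range(n)]

df = pd.DataFrame({
    'old_tokens': [['these', 'are', 'old', 'tokens'],
                   ['here', 'are', 'some', 'more', 'old']],
    'new_tokens': [['and', 'these', 'are', 'new'],
                   ['see', 'the', 'difference', 'between', 'them']],
})

# Compute one list per row in plain Python, so pandas never tries to
# stack the variable-length results into a 2-D block.
scores = [fake_similarity(old, new)
          for old, new in zip(df['old_tokens'], df['new_tokens'])]

# Wrap in an object-dtype Series so each cell holds a whole list.
df['maxsim_OtN'] = pd.Series(scores, index=df.index, dtype=object)
print(df['maxsim_OtN'])
```

Whether this matches the behaviour wanted in the real script depends on the pandas version in use and on what `maxsim.word_similarity` actually returns, so treat it as a starting point rather than a confirmed fix.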