I am trying to tokenize the first sentence of Wikipedia articles in order to find 'is a' patterns; building n-grams of the tokens and the leftover text would be the next step. For example, "Wellington is a town in the UK." becomes "town is a attr_root in the country." I would then find common patterns using n-grams.
For this I need to replace string values in a string column using other string columns in the dataframe. In Pandas I can do this using
df['Test'] = df.apply(lambda x: x['Name'].replace(x['Rep'], x['Sub']), axis=1)
but I cannot find the equivalent vaex method. This issue led me to believe that this should be possible in vaex, based on Maarten Breddels' example code; however, when trying it I get the error below.
import pandas as pd
import vaex

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Rep": ["Braund", "Henry", "Miss."],
        "Sub": ["<surname>", "<name>", "<title>"],
    }
)
dfv = vaex.from_pandas(df)

def func(x, y, z):
    return x.replace(y, z)

dfv['Test'] = dfv.apply(func, arguments=[df.Name.astype('str'), df.Rep.astype('str'), df.Sub.astype('str')])
This gives:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\AppData\Roaming\Python\Python37\site-packages\vaex\dataframe.py", line 455, in apply
    arguments = _ensure_strings_from_expressions(arguments)
  File "C:\Users\User\AppData\Roaming\Python\Python37\site-packages\vaex\utils.py", line 780, in _ensure_strings_from_expressions
    return [_ensure_strings_from_expressions(k) for k in expressions]
  File "C:\Users\User\AppData\Roaming\Python\Python37\site-packages\vaex\utils.py", line 780, in <listcomp>
    return [_ensure_strings_from_expressions(k) for k in expressions]
  File "C:\Users\User\AppData\Roaming\Python\Python37\site-packages\vaex\utils.py", line 782, in _ensure_strings_from_expressions
    return _ensure_string_from_expression(expressions)
  File "C:\Users\User\AppData\Roaming\Python\Python37\site-packages\vaex\utils.py", line 775, in _ensure_string_from_expression
    raise ValueError('%r is not of string or Expression type, but %r' % (expression, type(expression)))
ValueError: 0 Braund, Mr. Owen Harris
1 Allen, Mr. William Henry
2 Bonnell, Miss. Elizabeth
Name: Name, dtype: object is not of string or Expression type, but <class 'pandas.core.series.Series'>
How can I accomplish this in vaex?
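
To make the expected result concrete, the row-wise replacement can be sketched in plain Python on the same three rows, with no pandas or vaex involved. The helper name `replace_in_row` and the list variables are made up for illustration; this only shows the values the `Test` column should contain, not how vaex would compute them.

```python
# Sample data copied from the example in the question.
names = [
    "Braund, Mr. Owen Harris",
    "Allen, Mr. William Henry",
    "Bonnell, Miss. Elizabeth",
]
reps = ["Braund", "Henry", "Miss."]
subs = ["<surname>", "<name>", "<title>"]


def replace_in_row(name, rep, sub):
    """Replace every occurrence of `rep` in `name` with `sub` (hypothetical helper)."""
    return name.replace(rep, sub)


# Apply the replacement row by row, pairing up the three columns.
expected = [replace_in_row(n, r, s) for n, r, s in zip(names, reps, subs)]

for value in expected:
    print(value)
# <surname>, Mr. Owen Harris
# Allen, Mr. William <name>
# Bonnell, <title> Elizabeth
```

Each row uses its own `Rep` and `Sub` values, which is why a single column-wide replacement with one fixed pattern is not enough here.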