Since my question was up-voted, I guess it is still interesting to some people. Having learned quite a bit of Python since then, let me answer it myself; maybe it will be helpful to other users.
First, let us import the required packages:
import pandas as pd
from dfply import *
from os.path import basename, dirname, join
and create the required pandas DataFrame:
resultstatsDF = pd.DataFrame({'file': ['/home/user/this/file1.png', '/home/user/that/file2.png']})
which is
                        file
0  /home/user/this/file1.png
1  /home/user/that/file2.png
We see that the original attempt still raises an error (although the message has changed as dfply has continued to develop):
resultstatsDF.reset_index() >> \
mutate(dirfile = join(basename(dirname(X.file)), basename(X.file)))
TypeError: index returned non-int (type Intention)
The reason is that mutate works on whole series (X.file stands for the entire file column), whereas the os.path functions basename, dirname and join expect a single path string. Here we can use pandas.Series.apply, which calls a function on each element of a series.
However, we also need a custom function that we can apply to each element of the series file.
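To see the difference between "whole series" and "each element" on its own, without dfply, here is a minimal pandas-only sketch. The Series `files` is just illustrative data (the same two paths as above), and plain `basename` stands in for the custom function:

```python
import pandas as pd
from os.path import basename

# Illustrative data: the same two paths as in the DataFrame above.
files = pd.Series(['/home/user/this/file1.png', '/home/user/that/file2.png'])

# basename expects one path string, so handing it the whole Series fails.
try:
    basename(files)
except TypeError as err:
    print('whole Series:', err)

# Series.apply calls basename once per element and returns a new Series.
names = files.apply(basename)
print(names.tolist())  # ['file1.png', 'file2.png']
```

Passing the Series directly raises a TypeError, while apply hands basename one string at a time.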
Putting everything together, we end up with the code
def extract_last_dir_plus_filename(series_element):
    # e.g. '/home/user/this/file1.png' -> 'this/file1.png'
    return join(basename(dirname(series_element)), basename(series_element))
resultstatsDF.reset_index() >> \
mutate(dirfile = X.file.apply(extract_last_dir_plus_filename))
which outputs
   index                       file         dirfile
0      0  /home/user/this/file1.png  this/file1.png
1      1  /home/user/that/file2.png  that/file2.png
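Because extract_last_dir_plus_filename works on a single string, it can also be checked directly, without pandas or dfply. The sketch below is a variant that assumes `posixpath` instead of `os.path`, so the result always uses '/' as separator (on Linux and macOS `os.path` is `posixpath`, so the behaviour is identical there; on Windows `os.path.join` would insert a backslash):

```python
import posixpath

def extract_last_dir_plus_filename(series_element):
    # Variant of the helper above: posixpath always joins with '/',
    # independent of the operating system the code runs on.
    last_dir = posixpath.basename(posixpath.dirname(series_element))
    return posixpath.join(last_dir, posixpath.basename(series_element))

print(extract_last_dir_plus_filename('/home/user/this/file1.png'))  # this/file1.png
# A bare file name has no directory part, so only the file name remains.
print(extract_last_dir_plus_filename('file1.png'))                  # file1.png
```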
Without dfply's mutate, we could alternatively write
resultstatsDF['dirfile'] = resultstatsDF.file.apply(extract_last_dir_plus_filename)
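A different technique, not the one used above, avoids the custom function altogether by using pandas' vectorised string methods. This is only a sketch and assumes every path uses '/' as separator:

```python
import pandas as pd

resultstatsDF = pd.DataFrame({'file': ['/home/user/this/file1.png',
                                       '/home/user/that/file2.png']})

# Split each path on '/', keep the last two pieces (last directory and file
# name) and join them back with '/', all via the .str accessor.
resultstatsDF['dirfile'] = (resultstatsDF['file']
                            .str.split('/')
                            .str[-2:]
                            .str.join('/'))
print(resultstatsDF['dirfile'].tolist())  # ['this/file1.png', 'that/file2.png']
```

One difference to extract_last_dir_plus_filename: for a file directly under the root, such as '/file1.png', this version yields '/file1.png', whereas the os.path helper returns 'file1.png'.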