It is difficult to describe this on a heading but given these two DataFrames:
import pandas as pd
import numpy as np
import re
df1 = pd.DataFrame({
'url': [
'http://google.com/car',
'http://google.com/moto',
'http://google.com/moto-bike'
], 'value': [3, 4, 6]})
url value
http://google.com/car 3
http://google.com/moto 4
http://google.com/moto-bike 6
df2 = pd.DataFrame({'name': ['car','moto','bus']})
name
0 car
1 moto
2 bus
I want to see how many times the name on df2
appears on the url
for df1
, and have sort of managed with:
df2['instances'] = pd.Series([df1.url.str.contains(fr'\D{w}\D', regex=True) \
.sum() for w in df2.name.tolist()])
For some reason car has zero instances cause there is only one.
name instances
0 car 0
1 moto 2
2 bus 0
What I would like to be able to do is to have another column that sums the value
column of all matches of df1
, so it looks like this:
name instances value_total
0 car 1 3
1 moto 2 10
2 bus 0 0
Any help on the right direction would be greatly appreciated, thanks!