could you please help me out with this:
I have a dataframe (df1
) that has index of all articles published in the website's CMS. There's a column for current URL
and a column of original URLs in case they were changed after publication (column name Origin
):
URL | Origin | ArticleID | Author | Category | Cost |
---|---|---|---|---|---|
https://example.com/article1 | https://example.com/article | 001 | AuthorName | Politics | 120 USD |
https://example.com/article2 | https://example.com/article2 | 002 | AuthorName | Finance | 68 USD |
Next I have an huge dataframe (df2
)with web analytics export for a timeframe. It has a date, just 1 column for URL and number of pageviews.
PageviewDate | URL | Pageviews |
---|---|---|
2019-01-01 | https://example.com/article | 224544 |
2019-01-01 | https://example.com/article1 | 656565 |
How do I left join this with first dataframe but matching on either URL
= URL
OR Origin
= URL
So that end result would look like this:
PageviewDate | Pageviews | ArticleID | Author | Category |
---|---|---|---|---|
2019-01-01 | 881109 | 001 | AuthorName | Politics |
i.e 881109
is the result of adding up 224544
and 656565
that both related to the same article
I guess what I'm looking for is the equivalent of SQL syntax like:
LEFT JOIN ...`enter code here`
ON URL = URL
OR Origin = URL```