0

I have run into a data problem that I sense many have encountered. I Currently have a data set which contains transaction information. Based on the Transaction_Number I will find how long each person involved in transaction spent on their part.

The final output looks like this

 Transaction_Number        Created On             Created_By      Time_Diff

  27327920232            2014-12-08 03:29:08      jake            NaN
  27323232336            2017-04-28 18:39:03       James           0
  27323232336            2017-04-28 18:26:23      Tony            760.0

Now the problem that I am facing is that I am missing one column. Which I solved by querying from the DB and pulled a field named "Product_Number". I am joining the two tables on Transaction_Number. Here is where the problem arises the second data frame name D has the Transaction_Number listed only once matching the Product_Number. The master data frame has multiple occurrences of the Transactions. Therefore when I run the following code my result doesn't make sense. Outputting only 4 matches based on 12K+ possibilities.

         X                       D                        D
   Transaction_Number     Transaction_Number          Product_Number
     27327920232            27327920232                Cw-xs-re-89
     27327920232            72732332323                pw-ts-sf-89
     27327920232            32338033383                Cw-ns-tn-86

    results = pd.merge(X, D, how='inner', on= "Transaction_Number")

What can I do to find the correct matchings. I have also attempted a left join, any type of tips will help. - Thanks

CodeNoob
  • 9
  • 4
  • Didn't you just ask this same question? Why do the first transaction numbers have dashes in front of them? – Ted Petrou Sep 15 '17 at 22:36
  • *What can I do to find the correct matchings*? Only you can tell us this as `merge` is working as needed. Which *Product_Number* from the multiple *Transactions* to pull in? First? Last? – Parfait Sep 16 '17 at 03:10

1 Answers1

1

Are you looking for outer merge?

results = pd.merge(X, D, how='outer', on= "Transaction_Number")