
I am working on a credit card transaction dataset (https://github.com/IBM/TabFormer/tree/main/data/credit_card) and want to build a fraud detection engine in which the tabular data is transformed into a graph. I am using this article as my main outline for the approach: https://developer.nvidia.com/blog/optimizing-fraud-detection-in-financial-services-with-graph-neural-networks-and-nvidia-gpus/.

So far, I have done the preprocessing, and the dataset now contains 20 numerical features. As the article suggests, I have stored the bulk of the data on the edges between nodes, leaving the nodes featureless apart from their distinct IDs. Moreover, as far as I can tell, these transactions have only one relationship type: "Credit card purchases from Merchant". I have some questions about the article's suggestions:

  1. The more I look into R-GCN (and GCN, for that matter), the more it seems that these models use node features rather than edge features. If so, wouldn't it be ineffective to learn node embeddings and do node classification as the article suggests, since the nodes themselves carry no information and their IDs say nothing about whether a transaction is fraudulent?
  2. Does R-GCN provide any significant advantage over GCN in this case, given that there is only one type of relationship?
  3. The article also suggests using link prediction as part of the approach, but I do not understand how it helps with detecting fraudulent transactions.
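
To make question 1 concrete, here is a minimal sketch of how I understand the setup. Everything in it is an assumption for illustration: the toy transactions, the two edge features (the real data has 20), and the helper names `build_graph` and `gcn_style_layer` are hypothetical and not taken from the article. The layer is a simplified mean aggregation with self-loops and no learned weights, not the exact GCN normalization.

```python
# Toy transactions: (card_id, merchant_id, edge_features, is_fraud).
# All values are made up; the real dataset has 20 numerical features per edge.
transactions = [
    ("card_0", "merch_A", [12.5, 0.0], 0),
    ("card_0", "merch_B", [980.0, 1.0], 1),
    ("card_1", "merch_A", [33.1, 0.0], 0),
]


def build_graph(rows):
    """Map node names to integer IDs and collect edges with their features."""
    node_ids = {}
    edges, edge_feats, labels = [], [], []
    for card, merchant, feats, label in rows:
        u = node_ids.setdefault(card, len(node_ids))
        v = node_ids.setdefault(merchant, len(node_ids))
        edges.append((u, v))
        edge_feats.append(feats)
        labels.append(label)
    return node_ids, edges, edge_feats, labels


def gcn_style_layer(num_nodes, edges, node_feats):
    """Average each node's features with its neighbours' (self-loop included).

    Edge features are deliberately not a parameter: a plain GCN-style
    layer only propagates node features along the graph structure.
    """
    neighbours = {i: [i] for i in range(num_nodes)}  # start with self-loops
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(node_feats[0])
    out = []
    for i in range(num_nodes):
        nbrs = neighbours[i]
        out.append([sum(node_feats[j][d] for j in nbrs) / len(nbrs)
                    for d in range(dim)])
    return out


node_ids, edges, edge_feats, labels = build_graph(transactions)

# Featureless nodes: every node gets the same constant feature.
node_feats = [[1.0] for _ in node_ids]
h = gcn_style_layer(len(node_ids), edges, node_feats)
print(h)  # every node ends up with the same embedding
```

With constant node features, every node gets an identical embedding, and `edge_feats` and `labels` are never read by the layer. That is exactly the situation I am worried about.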

I am having a pretty hard time understanding this article, and the method for that matter, and I would really appreciate some help. Thank you all!

Hai Nguyen