How to add malicious features for classification in Weka as a data set

Question

I am working on a project to detect and classify malicious content using the Weka data mining tool. I have developed an algorithm, but I don't know how or where to add the malicious features of JavaScript, HTML, or URLs.

For example: if a URL contains triple slashes (///), it is classified as malicious. Similarly, I have other features on which my algorithm will perform classification.

If anyone knows how to do this, please reply.

Thanks in advance.

score 1 · Answer 1 · answered Feb 06 '13 at 19:39

This question is more about feature extraction, that is, finding domain features for your project. Weka normally works with features that have already been prepared. So your question is not really about Weka, but about how to find and use features for your project.

I cannot help with HTML and JavaScript, but for URL classification the following articles may help.
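To make "feature extraction" concrete, here is a minimal sketch in Python that turns a URL into a few numeric features. Only the triple-slash check comes from the question; the function name `extract_url_features` and the other three features are assumptions chosen for illustration, not a recommended feature set.

```python
def extract_url_features(url: str) -> dict:
    """Map one URL string to a dict of numeric features (illustrative only)."""
    return {
        # Feature from the question: 1 if the URL contains "///", else 0.
        "has_triple_slash": int("///" in url),
        # The remaining features are hypothetical examples.
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
    }


if __name__ == "__main__":
    print(extract_url_features("http:///evil.example/a1"))
```

Each URL becomes one row of numbers, which is the kind of input a Weka classifier expects.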

Kan M-Y and Thi HON (2005), "Fast webpage classification using URL features", In Proceedings of the 14th ACM international conference on Information and knowledge management. New York, NY, USA, pp. 325-326. ACM.

Ma J, Saul LK, Savage S and Voelker GM (2009), "Beyond blacklists: learning to detect malicious web sites from suspicious URLs", In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA, pp. 1245-1254. ACM.

Thanks for your response, Atilla. I want to use those malicious features in Weka for classification. Since I am new to Weka, I am not sure whether I have to write code or whether I just have to make an ARFF dataset based on these malicious and benign features. If you have any idea, please share it. — Vai, Feb 06 '13 at 19:54
You have to write code that produces an ARFF file containing these malicious and benign features. After that you can use Weka's algorithms on it. — Atilla Ozgur, Feb 06 '13 at 20:05
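
As a rough illustration of the "write code to produce the ARFF file" step, the sketch below builds ARFF text from already-computed feature rows. The helper name `to_arff`, the relation name, the attribute names, and the two sample rows are all assumptions made up for this example; only the `@relation` / `@attribute` / `@data` layout is the standard ARFF structure.

```python
def to_arff(relation, attributes, rows, class_values):
    """Build ARFF text from (feature_dict, label) rows.

    All features are written as numeric attributes; the class is a
    nominal attribute placed last.
    """
    lines = [f"@relation {relation}", ""]
    for name in attributes:
        lines.append(f"@attribute {name} numeric")
    lines.append("@attribute class {" + ",".join(class_values) + "}")
    lines.append("")
    lines.append("@data")
    for features, label in rows:
        values = [str(features[name]) for name in attributes]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical, hand-made rows: (features, label).
rows = [
    ({"has_triple_slash": 1, "url_length": 23}, "malicious"),
    ({"has_triple_slash": 0, "url_length": 24}, "benign"),
]

arff_text = to_arff(
    "url_features",
    ["has_triple_slash", "url_length"],
    rows,
    ["malicious", "benign"],
)
print(arff_text)
```

Saving this text to a file with an `.arff` extension should give something that can be opened in Weka, where the last attribute is used as the class by default.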

score -1 · Answer 2 · answered Apr 24 '20 at 08:47

We created a dataset of Windows API call sequences of metamorphic malware. In our research, we mapped the families reported by each software product onto 8 main malware families: Trojan, Backdoor, Downloader, Worms, Spyware, Adware, Dropper, Virus. https://github.com/ocatak/malware_api_class
