
I have a 7z archive, downloaded from practicalsecurityanalytics.com, that contains malware and benign files totalling 117GB uncompressed. The compressed archive is 43.8GB, which is large, and I do not want to extract the whole archive at once.

Is there a way to extract only a few selected files? The selected files are not sequential, so I can't really rely on a GUI to select individual files.

File details:
Samples: 201,549
Legitimate: 86,812
Malicious: 114,737
Compressed Size: 43.8GB
Uncompressed Size: 117GB

There is a CSV file called samples.csv that shows which files are malware and which are not, along with the entropy of each file.

The file is encrypted so it asks for a password every time I want to extract something.

I am working on Linux.

pr1sm8

2 Answers


A quick way I found to extract specific files is to first put all the file names into a text file, like this:

228161
213960
200290
210832
230546
257545
....

Then prefix each name with its path inside the archive, using any method (I used a Python script to do it quickly), and save the result to a file, here f1.txt:

pe-machine-learning-dataset/samples/228161
pe-machine-learning-dataset/samples/213960
pe-machine-learning-dataset/samples/200290
pe-machine-learning-dataset/samples/210832
pe-machine-learning-dataset/samples/230546
pe-machine-learning-dataset/samples/257545
....
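
The prefixing step can be done with a few lines of Python. This is a minimal sketch, not the script I originally used: the function names are hypothetical, and the prefix is assumed to be the one shown in the listing above.

```python
from pathlib import Path

# Assumed prefix, copied from the listing above; adjust it if the
# archive layout differs.
PREFIX = "pe-machine-learning-dataset/samples/"


def build_archive_paths(names, prefix=PREFIX):
    """Return one archive-internal path per non-empty name."""
    return [prefix + name.strip() for name in names if name.strip()]


def write_list_file(names_file, list_file, prefix=PREFIX):
    """Read bare file names (one per line) and write the prefixed paths."""
    names = Path(names_file).read_text().splitlines()
    paths = build_archive_paths(names, prefix)
    Path(list_file).write_text("\n".join(paths) + "\n")
    return paths
```

Calling write_list_file("names.txt", "f1.txt") would produce f1.txt, where names.txt is a placeholder for whatever file holds the bare names.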

Then run:

7z e foo.7z -o"path to save the files" $(cat f1.txt)
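
Since the archive is encrypted, the same extraction can also be driven from a script that asks for the password only once. This is a rough sketch, assuming a 7z binary on the PATH and the f1.txt file from above. It swaps $(cat f1.txt) for 7-Zip's @listfile syntax, so it is a variation on the command above rather than the same call. The function names are hypothetical.

```python
import getpass
import subprocess


def build_extract_command(archive, out_dir, list_file, password, exe="7z"):
    """Build one 7z call that extracts every path named in list_file."""
    return [
        exe, "e", archive,
        f"-o{out_dir}",    # -o takes the directory with no space after the switch
        f"-p{password}",   # -p takes the password with no space after the switch
        f"@{list_file}",   # @file makes 7z read the file names from a list file
    ]


def extract_listed(archive, out_dir, list_file, exe="7z"):
    """Prompt for the password once, then run a single extraction."""
    password = getpass.getpass("Archive password: ")
    cmd = build_extract_command(archive, out_dir, list_file, password, exe)
    # Passing a list (no shell) avoids quoting problems with odd characters.
    subprocess.run(cmd, check=True)
```

A list file also avoids the shell's argument-length limit when many files are selected. Note that a password passed as an argument is visible in the process list while 7z runs.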
pr1sm8

The 7zip command line can extract only specific files from an archive by name:

  1. Install the 7zip program on Linux (e.g., on Ubuntu: sudo apt-get install 7zip)
  2. Run 7zz e yourarchive.7z thefileyouwanttoextract -parchivepassword (the password follows -p directly, with no space)

You may have to run it in a loop driven by your CSV if you only want some of the files.
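
Such a loop might look roughly like the sketch below. The layout of samples.csv is not shown in the question, so the column names id and label are hypothetical, as are the function names. The samples/ prefix is an assumption taken from the other answer.

```python
import csv
import io

# Assumed location of the samples inside the archive.
PREFIX = "pe-machine-learning-dataset/samples/"


def select_ids(csv_text, wanted_label, id_col="id", label_col="label"):
    """Return the ids of rows whose label equals wanted_label.

    id_col and label_col are hypothetical column names; adjust them to
    match the real header row of samples.csv.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[id_col] for row in reader if row[label_col] == wanted_label]


def commands_for(ids, archive, password, prefix=PREFIX, exe="7zz"):
    """Build one 7zz call per file, each carrying -p so nothing prompts."""
    return [[exe, "e", archive, prefix + i, f"-p{password}"] for i in ids]
```

Each command list can then be passed to subprocess.run(cmd, check=True). Every call reopens the archive, so for many files a single call that names all of them is likely to be faster.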

reepy
  • Sorry for not mentioning that the file is encrypted. This approach works, but on every iteration it asks me to enter the password in the terminal – pr1sm8 Aug 23 '22 at 10:09
  • I've updated it to include the password option. – reepy Aug 23 '22 at 10:17