Hi, I am working on a project that deals with a large amount of data. I have a text file of around 2 GB containing key-value pairs, and each key has multiple values. I need to extract all the keys into a different file, as I need the keys for testing a particular feature.
The format of the file is:
:k: k1 :v: {XYZ:{id:"k1",score:0e0,tags:null},ABC:[{XYZ:{id:"k1",score:0e0,tags:null},PQR:[{id:"ID1",score:71.85e0,tags:[{color:"DARK"},{Type:"S1"},{color:"BLACK"}]},MetaData:{RuleId:"R3",Score:66.26327129015809e0,Quality:"GOOD"}},{XYZ:{id:"k1",score:0e0,tags:null},PQR:[..(same as above format)..],MetaData:{RuleId:"R3",Score:65.8234565409752e0,Quality:"GOOD"}} ::
// the same pattern repeats on each new line with a different key
When I search for ":k: " in the file using Ctrl+F, only these keys get highlighted, so I think this pattern appears nowhere in the file except at the start of a line.
There are thousands of keys like these, one ":k: " line per key, and I want all of them (k1, k2, ...) extracted into a separate file for testing. How can I do this?
Python is also fine for me. I could use regular expressions in Python, or maybe the "sed" shell command. Please help me understand how I can use these to extract the keys.
Can someone help me write a shell/Python script for this? I know it's probably trivial, but I'm a novice at this kind of data processing.
I'm also focused on keeping the run time reasonable, since the data is very large.
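Here is a rough Python sketch of what I was thinking: read the file line by line so the whole 2 GB is never loaded into memory, and pull the key out with a regex anchored to the start of the line. The file names input.txt and keys.txt are just placeholders, and I'm not sure the regex covers every edge case in my data:

import re

# Assumption: every record sits on its own line and starts with ":k: <key> :v: ..."
key_pattern = re.compile(r'^:k:\s+(\S+)\s+:v:')

# input.txt / keys.txt are placeholder names for my actual files
with open('input.txt', 'r', encoding='utf-8', errors='replace') as src, \
     open('keys.txt', 'w', encoding='utf-8') as dst:
    for line in src:                      # lazy iteration, so the 2 GB file is streamed
        match = key_pattern.match(line)   # only matches at the start of the line
        if match:
            dst.write(match.group(1) + '\n')

Would something like this be reasonable performance-wise for a 2 GB file, or would a sed/grep one-liner be noticeably faster?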