I'm working on an Awk/Gawk script that parses a file, populating a multidimensional array from each line. The first column is a period-delimited string, with each value being the array key for the next level. The 2nd column is the value.
Here's an example of what the content being parsed looks like:
$ echo -e "personal.name.first\t= John\npersonal.name.last\t= Doe\npersonal.other.dob\t= 05/07/87\npersonal.contact.phone\t= 602123456\npersonal.contact.email\t= john.doe@idk\nemployment.jobs.1\t= Company One\nemployment.jobs.2\t= Company Two\nemployment.jobs.3\t= Company Three"
personal.name.first = John
personal.name.last = Doe
personal.other.dob = 05/07/87
personal.contact.phone = 602123456
personal.contact.email = john.doe@idk
employment.jobs.1 = Company One
employment.jobs.2 = Company Two
employment.jobs.3 = Company Three
After being parsed, I'm expecting it to have the same structure as:
data["personal"]["name"]["first"] = "John"
data["personal"]["name"]["last"] = "Doe"
data["personal"]["other"]["dob"] = "05/07/87"
data["personal"]["contact"]["phone"] = "602123456"
data["personal"]["contact"]["email"] = "john.doe@idk"
data["employment"]["jobs"]["1"] = "Company One"
data["employment"]["jobs"]["2"] = "Company Two"
data["employment"]["jobs"]["3"] = "Company Three"
The part that I'm stuck on is how to dynamically populate the keys while structuring the multidimensional array.
I found this SO thread that covers a similar issue, which was resolved by using the SUBSEP variable. At first that seemed like it would work as I needed, but after some testing, it looks like arr["foo", "bar"] = "baz" doesn't get treated like a real array, the way arr["foo"]["bar"] = "baz" would. An example of what I mean is the inability to count the values at any level of the array: arr["foo", "bar"] = "baz"; print length(arr["foo"]) would simply print 0 (zero).
I also found this SO thread, which helps a little and possibly points me in the right direction.
This snippet from that thread:
BEGIN {
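    x = SUBSEP
    a = "Red" x "Green" x "Blue"
    b = "Yellow" x "Cyan" x "Purple"
    # Assign a dummy element so Colors[1] and Colors[2] exist as subarrays
    Colors[1][0] = ""
    Colors[2][0] = ""
    # split() clears each subarray, then fills it with indices 1..3
    split(a, Colors[1], x)
    split(b, Colors[2], x)
    print Colors[2][3]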
}
is pretty close, but the problem I'm having now is that the keys (e.g. Red, Green, etc.) need to be specified dynamically, and there could be one or more keys.
Basically, how can I take the a_keys and b_keys strings, split them by ., and populate the a and b variables as multidimensional arrays?
BEGIN {
    x = SUBSEP
    # How can I take these strings...
    a_keys = "Red.Green.Blue"
    b_keys = "Yellow.Cyan.Purple"
    # ...and populate the array, just as this does:
    a = "Red" x "Green" x "Blue"
    b = "Yellow" x "Cyan" x "Purple"
    Colors[1][0] = ""
    Colors[2][0] = ""
    split(a, Colors[1], x)
    split(b, Colors[2], x)
    print Colors[2][3]
}
Any help would be appreciated, thanks!