
I'm setting up a Rails "import from CSV" task, and the departments data (in the DB) stores the hierarchy as a path. I want to convert it to an adjacency list.

What I have:

ID, NAME, PATH
---------
1,Valve,000
2,Steam,000.000
3,Sales,000.000.000
4,Developers,000.000.112
7,Designers,000.000.112.000
8,Game Designers,000.000.112.000.000
9,UI Designers,000.000.112.000.002
10,Web Designers,000.000.112.000.001
11,3D Designers,000.000.112.000.003
12,Accounting managers,000.000.114.000
13,Accounting topmanagers,000.000.114.000.000

What I want:

ID, NAME, PATH, PARENT_ID
---------
1,Valve,000, nil
2,Steam,000.000, 1
3,Sales,000.000.000, 2
4,Developers,000.000.112, 2
7,Designers,000.000.112.000, 4
8,Game Designers,000.000.112.000.000, 7
9,UI Designers,000.000.112.000.002, 7
10,Web Designers,000.000.112.000.001, 7
11,3D Designers,000.000.112.000.003, 7
12,Accounting managers,000.000.114.000, 322
13,Accounting topmanagers,000.000.114.000.000, 12
Norman Edance

1 Answer

The data appears to describe a directed tree, except that the accounting managers row,

'12,Accounting managers,000.000.114.000'

seems to have no boss. I've therefore added

'14,Accounting big cheese,000.000.114'

Here's the data.

data =<<-_
ID, NAME, PATH
---------
1,Valve,000
2,Steam,000.000
3,Sales,000.000.000
4,Developers,000.000.112
7,Designers,000.000.112.000
8,Game Designers,000.000.112.000.000
9,UI Designers,000.000.112.000.002
10,Web Designers,000.000.112.000.001
11,3D Designers,000.000.112.000.003
14,Accounting big cheese,000.000.114
12,Accounting managers,000.000.114.000
13,Accounting topmanagers,000.000.114.000.000
_

We can use split("\n") to convert this string to an array of lines, and then determine the parentage of each node as follows.

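# r1 and r2 are the header and separator lines; rest holds the data rows.
r1, r2, *rest = data.split("\n")
str = [
  r1,
  r2,
  rest.map do |s|
    # Path of s without its last ".ddd" segment (nil for a root).
    parent_match = s[/(?:\d{3}\.)*\d{3}(?=\.\d{3})/]
    # The parent is the row whose own path equals parent_match.
    parent = rest.find { |ss| parent_match == ss[/(?:\d{3}\.)*\d{3}/] }
    parent.nil? ? "#{s}, nil" : "#{s}, #{ parent[/\d+/] }"
  end
].join("\n")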

puts str 
ID, NAME, PATH
---------
1,Valve,000, nil
2,Steam,000.000, 1
3,Sales,000.000.000, 2
4,Developers,000.000.112, 2
7,Designers,000.000.112.000, 4
8,Game Designers,000.000.112.000.000, 7
9,UI Designers,000.000.112.000.002, 7
10,Web Designers,000.000.112.000.001, 7
11,3D Designers,000.000.112.000.003, 7
14,Accounting big cheese,000.000.114, 2
12,Accounting managers,000.000.114.000, 14
13,Accounting topmanagers,000.000.114.000.000, 12
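
The find call above rescans the rows once per row, which is fine for a handful of departments. For a larger import, a hash keyed by path avoids the repeated scan. The following is only a minimal sketch of that idea, not part of the approach above: the method name add_parent_ids is hypothetical, and it assumes every row has exactly three comma-separated fields with no commas inside the name.

```ruby
# Hypothetical helper: appends a parent ID (or "nil") to each "id,name,path" row.
# Assumes no commas inside names and that the path is the last field.
def add_parent_ids(rows)
  # Map each path to the ID of the row that owns it.
  id_by_path = rows.to_h do |row|
    id, _name, path = row.split(",")
    [path, id]
  end

  rows.map do |row|
    path = row.split(",").last
    # The parent's path is the row's path minus its final ".ddd" segment.
    parent_path = path.split(".")[0...-1].join(".")
    "#{row}, #{id_by_path[parent_path] || 'nil'}"
  end
end

rows = [
  "1,Valve,000",
  "2,Steam,000.000",
  "4,Developers,000.000.112",
  "12,Accounting managers,000.000.114.000"
]
puts add_parent_ids(rows)
```

Splitting on commas also sidesteps any dependence on how many digits an ID has, at the cost of the no-commas-in-names assumption.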

In map's block, suppose

s = '8,Game Designers,000.000.112.000.000'

then

parent_match = s[/(?:\d{3}\.)*\d{3}(?=\.\d{3})/]
  #=> "000.000.112.000" 

parent_match is the string of all the period-separated triples of digits in s, except for the last period and the last triple. The regular expression reads, "match zero or more groups of 3 digits followed by a period, followed by 3 digits, provided this match is immediately followed by a period and 3 digits", (?=\.\d{3}) being a positive lookahead.

We then loop through rest looking for an element that ends with parent_match:

parent = rest.find { |ss| parent_match == ss[/(?:\d{3}\.)*\d{3}/] }
  #=> "7,Designers,000.000.112.000"

The regex /(?:\d{3}\.)*\d{3}/ reads, "match zero or more groups of 3 digits followed by a period, followed by 3 digits".
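
As a quick check of the two patterns, the sketch below applies them to a few rows. The constant names PARENT_PATH and OWN_PATH are introduced here for illustration only. One caveat worth noting: neither pattern is anchored to the path column, so the second one picks up the ID instead of the path as soon as an ID has three or more digits (the sample data has none, and the "100,Tools" row is made up to show this).

```ruby
# Illustrative names, not from the answer's code.
PARENT_PATH = /(?:\d{3}\.)*\d{3}(?=\.\d{3})/ # path minus its last segment
OWN_PATH    = /(?:\d{3}\.)*\d{3}/            # first run of dotted triples

p "8,Game Designers,000.000.112.000.000"[PARENT_PATH] # "000.000.112.000"
p "7,Designers,000.000.112.000"[OWN_PATH]             # "000.000.112.000"
p "1,Valve,000"[PARENT_PATH]                          # nil: a root has no parent path

# Made-up row: a three-digit ID is matched before the path is reached.
p "100,Tools,000.000"[OWN_PATH]                       # "100"
```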

In the next line:

parent.nil?
  #=> false

so the block returns

"#{s}, #{ parent[/\d+/] }" 
  #=> "8,Game Designers,000.000.112.000.000, 7"

parent[/\d+/] merely extracts the digit character(s) at the beginning of parent.

Had I not added the line

14,Accounting big cheese,000.000.114

the following line ('12,Accounting ...') would have ended with ', nil'.
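
Before importing, it may help to list the rows whose parent path belongs to no row, since those are the ones that end with ', nil' without being the root. Here is a minimal sketch, assuming three-field rows with no commas in the names; orphan_rows is a hypothetical name.

```ruby
require "set"

# Hypothetical helper: returns non-root rows whose parent path is not
# the path of any row. Assumes "id,name,path" with no commas in names.
def orphan_rows(rows)
  paths = rows.map { |row| row.split(",").last }.to_set
  rows.select do |row|
    segments = row.split(",").last.split(".")
    next false if segments.size == 1 # a single segment is a root
    !paths.include?(segments[0...-1].join("."))
  end
end

rows = [
  "1,Valve,000",
  "2,Steam,000.000",
  "12,Accounting managers,000.000.114.000",
  "13,Accounting topmanagers,000.000.114.000.000"
]
p orphan_rows(rows) # ["12,Accounting managers,000.000.114.000"]
```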

Cary Swoveland