How to separate data in SQL table

Question

I have a .csv file with data records, each line containing departments of organisations. I would like to import them into a MySQL database and show them in a web application. A sample abstract of data records looks like this:

| Company A | Department aaa |
| Company A | Department bbb |
| Company A | Department ccc |
| Company B | Department ddd |
| Company B | Department eee |
| Company C | Department fff |
| Company C | Department ggg |
| Company C | Department hhh |
| Company D | Department iii |
| Company E | Department jjj |
| Company E | Department kkk |

Of course I want to show each company once with its related departments attached, which makes it look this way, more or less:

| Company A | Department aaa |
|           | Department bbb |
|           | Department ccc |
| Company B | Department ddd |
|           | Department eee |
| Company C | Department fff |
|           | Department ggg |
|           | Department hhh |
| Company D | Department iii |
| Company E | Department jjj |
|           | Department kkk |

My question is, what is the best way to concatenate these data records? Would you do it using the appropriate programing language (ie PHP, Ruby, Perl, whatever) or would you do it in the database using SQL query? If SQL is the choice, would you do it using stored procedures, temp tables, or even at the import interface? What would it look like if it was SQL? Thank you!

Are you asking how to import the file into an existing database/application or is this a question discussing the design of a solution around the CSV files you haven't designed yet? If it is the latter then IMHO it's not a question for SO. — Nope, Nov 10 '17 at 16:45
MySQL does not support CTEs (`WITH RECURSIVE` clause) which would make this query relatively simple. PostgreSQL and SQLite3, for example have the following solution: http://sqlfiddle.com/#!15/69fff/3/0 — tonypdmtr, Nov 10 '17 at 17:46
@tonypdmtr, MySQL 8.0.1 and later supports recursive CTE's, for what it's worth. But 8.0 is still in Release Candidate status currently. — Bill Karwin, Nov 10 '17 at 19:34

score 1 · Accepted Answer · answered Nov 10 '17 at 16:37

You must store the associated company with its department on every row in the database. Since there is no such thing as implicit "order" to rows in the database, there's no way for a row to treat its company as the same as on the row "before" (there is no meaning to "before").

It's not worth the trouble to blank out the redundant company name on rows of a query result set. Just let them appear on every row even though it looks redundant.

Then in your application, as you are presenting the data, you do have an opportunity to process them in order as you display them. So you'd do something like this (pseudocode):

query 'SELECT Company, Department FROM MyTable ORDER BY Company, Department'

prev_company = ''

while row = fetch
do
    if row.company == prev_company
        display ''
    else
        display row.company
        prev_company = row.company

    display row.department
done

Why the downvote? This is actually how most applications do this, in my experience. — Bill Karwin, Nov 10 '17 at 16:55

score 0 · Answer 2 · answered Nov 13 '17 at 08:30

While Bill's solution will certainly work, my bias is to create separate database entities (and by extension, separate tables) anytime there is a one to many relationship like the one you describe.

So I would create a company table and a department table and have the department reference rows in the company table by a foreign key company_id column.

TABLE: company
id
name

TABLE: department
id
company_id
name

Then you could select all department records, join against the company table, and order by company_id... then you can post-process the query results into whatever data structure you need.

There are many reasons I prefer joins over lumping data into a single table; it feels more flexible and it mimics a tree structure of interrelated models. But the biggest reason is this:

You aren't duplicating data. If a company's name changes, you only have to update one records, instead of all records pertaining to that company. 2) It's more flexible. If you later

score -1 · Answer 3 · answered Nov 10 '17 at 16:26

I would do it in SQL, it appears to be a regular GROUP BY type query so I don't see any need to jump through hoops in server code when the database can do it for you. Either run the raw query or use an ORM if you prefer. Either way the SQL should be something like:

SELECT organisation, department FROM table GROUP BY organisation;

unless all the department names are unique, then you can just select everything and order by the organisation name

How to separate data in SQL table

3 Answers3

Linked