1

I have a data file where every row value is shown as a new column entry. I want to convert this or find a logic to make this file loadable to a database. Below is sample of how the data is in the file.

The file is huge. It has more than >7000 columns. I have tried loading/importing this to a table but it exceeds max column limit in all the tools

+--------+-----------+----------+----------+----------+------------+------------+------------+------------+
| emplid | status_0  | status_1 | status_2 | status_3 | location_0 | location_1 | location_2 | location_3 |
+--------+-----------+----------+----------+----------+------------+------------+------------+------------+
| 1234   | Submitted | Reviewed | Approved | Accepted |            | California | Michigan   |            |
+--------+-----------+----------+----------+----------+------------+------------+------------+------------+
| 4568   | Submitted | Reviewed | Denied   |          | Texas      | Utah       | Illinois   | NewYork    |
+--------+-----------+----------+----------+----------+------------+------------+------------+------------+

+--------+-----------+------------+
| emplid | status    | location   |
+--------+-----------+------------+
| 1234   | Submitted |            |
+--------+-----------+------------+
| 1234   | Reviewed  | California |
+--------+-----------+------------+
| 1234   | Approved  | Michigan   |
+--------+-----------+------------+
| 1234   | Accepted  |            |
+--------+-----------+------------+
| 4568   | Submitted | Texas      |
+--------+-----------+------------+
| 4568   | Reviewed  | Utah       |
+--------+-----------+------------+
| 4568   | Denied    | Illinois   |
+--------+-----------+------------+
| 4568   |           | Newyork    |
+--------+-----------+------------+
harsha87
  • 55
  • 6

2 Answers2

0

What tool can you load your data file into? If you could load it into any SQL compliant database, you could use a SQL Query such as:

INSERT INTO master_status_table
(SELECT emplid, status_0, location_0) as x)
GO

INSERT INTO master_status_table
(SELECT emplid, status_1, location_1) as y)
GO

INSERT INTO master_status_table
(SELECT emplid, status_2, location_2) as z)
GO

But, it sounds like you can't get it into a database in the first place. So, you could try to load it into an EXCEL spreadsheet. Then, in the spreadsheet, let's say, your original data is in one sheet called 'original data', then you would create another sheet called, say, 'status 0' and in that sheet, you would have a formula to display the contents of the same row of data from column 'A' of 'original data' (employee id), and columns 'B', and 'F' for status and location. Then sheet 'status 1' would have the same but columns 'C' and 'G' for status and location, etc. Once you have all of your 'status' sheets, you can export each one as a .CSV file and import that directly into your master_status_table.

I know this will still be a very manual process, but it should be possible.

Michael Dougan
  • 1,698
  • 1
  • 9
  • 13
  • Thanks for your reply @Michael_Dougan. I have tried using the pandas module in python for data reshaping and that has fixed most of the problem. – harsha87 Jul 25 '19 at 10:07
0

If the data are in a CSV format, you could try normalizing the data with un-xtab.py (https://pypi.org/project/un-xtab/). un-xtab imports the data into SQLite, which should accommodate more than 7,000 columns. Documentation is in the doc subdirectory of the Bitbucket repository at https://bitbucket.org/rdnielsen/un-xtab/src/default/.

rd_nielsen
  • 2,407
  • 2
  • 11
  • 18
  • Thanks for your reply @rd_nielsen. I have tried using the pandas module in python for data reshaping and that has fixed most of the problem. – harsha87 Jul 25 '19 at 10:07