1

We routinely receive .CSV files from a contractor that consist of tables exported out of Revit. From a data perspective, they are less than useful so I am looking for a way to ingest these files and generate a properly laid out table.

Layout of files received

These come to us in individual files and—as you will note—the column headings are not consistent between tables (but follow either of these two formats).

LEVEL 2 - CLG CONNECTIONS TAKE-OFF
ITEM NO. COMMENTS
RCA225/54 684
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF
ITEM LENGTH NO.
362S162-54 1' - 10" 30
362S162-54 1' - 11 3/4" 4
362S162-54 2' - 5 3/4" 4
362S162-54 2' - 11 15/16" 14
362S162-54 3' - 7 1/4" 20
362S162-54 4' - 11 1/4" 28

Desired Outcome

I am looking for a method to import the .csv files (ideally just dumping them sequentially into one sheet for ease and speed) and reformat them into a usable layout.

LOCATION ITEM LENGTH NO. COMMENTS
LEVEL 2 - CLG CONNECTIONS TAKE-OFF RCA225/54 684
LEVEL 2 - INTERIOR WALLS CONNECTIONS TAKE-OFF RCA223-54 166
LEVEL 2 - INTERIOR WALLS CONNECTIONS TAKE-OFF Steel connector - WBAC162 360
LEVEL 2 - INTERIOR WALLS CONNECTIONS TAKE-OFF TIE PLATE - TP35 184
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 1' - 10" 30
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 1' - 11 3/4" 4
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 2' - 5 3/4" 4
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 2' - 11 15/16" 14
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 3' - 7 1/4" 20
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 4' - 11 1/4" 28

This scenario in general is one I encounter regularly from office staff laying out spreadsheets where their focus is presentation rather than ease of data extraction. Does anyone have a good method of "converting" from one to the other?

I've been trying to utilize regexmatch and/or query along with counta, split/join, and indirect to somehow extract a list of LOCATIONS and then count the number of rows beneath it to then extract that number of rows, but (a) I'm not sure this is a good solution and (b) I'm not getting the results I need.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
eikelben
  • 13
  • 2
  • look at Utilities.parseCsv() it can turn csv into a 2 d array which then can be stuck anywhere into a spreadsheet – Cooper Jan 11 '23 at 19:34

3 Answers3

1

Having those two scenarios, you may be able to sort it out with some auxiliary columns (Yes, you may avoid them, but calculations and formulas would be much longer!)

First Auxiliary Column in E, replicates the Level name with SCAN:

=SCAN(,A:A,LAMBDA(a,s,IF(REGEXMATCH(s,"^(?i)Level"),s,a)))

enter image description here

With SCAN again you can do a second auxiliary column to find scenario 1 and 2 considering your two tables:

=SCAN(,B:B,LAMBDA(a,s,IFS(s="","",s="LENGTH",1,s="NO.",2,s=s,a)))

enter image description here

And you'll be able to filter columns and re-arrange them with curly brackets:

={FILTER({E:E,A:D},F:F=1,A:A<>"ITEM");
FILTER({E:E,A:A,D:D,B:C},F:F=2,A:A<>"ITEM")}

If you want you can sort it through any column:

=SORT({FILTER({E:E,A:D},F:F=1,A:A<>"ITEM");
FILTER({E:E,A:A,D:D,B:C},F:F=2,A:A<>"ITEM")},1,1)

enter image description here enter image description here

Martín
  • 7,849
  • 2
  • 3
  • 13
  • This solution is the least demanding on the user and will allow me to set up a sheet in a spreadsheet in which the project manager can simply import/paste the received .csv files into with all of the other calculations being done on adjacent sheets (thereby allowing me to be hands off). @martín thanks for this solution. `SCAN` and `LAMBDA` are functions that are new to me, so I'll look into those. – eikelben Jan 13 '23 at 15:54
  • I'm glad it was useful! In this case, LAMBDA is an auxiliary for SCAN to name the two variables. Good luck! – Martín Jan 13 '23 at 16:10
  • Yeah I've been digging into it this morning. I currently have a convoluted way to translate the length dimensions shown above into decimal feet which is then compared against the nominal length for the material type to return the quantity needed to buy. I'm thinking this could be done more easily though the use of these new-to-me functions, `scan`/`map`/`lambda`. – eikelben Jan 13 '23 at 20:09
1

You may be best off by implementing this in Apps Script, but if the number of CSV files stays the same, it looks doable with a Google Sheets formula as well.

You are saying that the CSV files are imported individually. If you place each file in a Google Sheets tab of its own, such as Perimetral and CLG, you can use iferror() to copy the Location name from cell A1 on each tab to every row in the result, and to put blanks in columns where data is missing on a particular tab, like this:

=arrayformula( 
  query( 
    { 
      "Location", Perimetral!A2:C2, "Comments"; 
      iferror(Perimetral!A3:A/0, Perimetral!A1), Perimetral!A3:C, iferror(Perimetral!A3:A/0); 
      iferror(CLG!A3:A/0, CLG!A1), CLG!A3:A, iferror(CLG!A3:A/0), CLG!B3:B, CLG!C3:C 
    }, 
    "where Col2 is not null", 1 
  ) 
)

The above formula requires fine-tuning to make columns fall into the right slots. To avoid that, look up column positions dynamically with filter(), like this:

=arrayformula( 
  lambda( 
    headers, table1, table2, 
    query( 
      { 
        headers; 
        map( 
          headers, 
          lambda( 
            columnName, 
           iferror( 
               filter( 
                { iferror(offset(table1, 0, 0, rows(table1), 1) / 0, offset(table1, -2, 0, 1, 1)), table1 }, 
                columnName 
                = 
                { "Location", offset(table1, -1, 0, 1, columns(table1)) } 
              )
            ) 
          ) 
        ); 
        map( 
          headers, 
          lambda( 
            columnName, 
           iferror( 
               filter( 
                { iferror(offset(table2, 0, 0, rows(table2), 1) / 0, offset(table2, -2, 0, 1, 1)), table2 }, 
                columnName 
                = 
                { "Location", offset(table2, -1, 0, 1, columns(table2)) } 
              )
            ) 
          ) 
        ) 
      }, 
      "where Col2 is not null", 1 
    ) 
  )( 
    { "Location", "ITEM", "LENGTH", "NO.", "Comments" }, CLG!A3:Z, Perimetral!A3:Z 
  ) 
)

This latter formula looks more complex but it should actually be pretty easy to make it work with more tables by adding tables at the end, and putting in more copies of the map() block, as required.

The results look like this:

Location ITEM LENGTH NO. Comments
LEVEL 2 - CLG CONNECTIONS TAKE-OFF RCA225/54 684 COMMENTS
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 1' - 10" 30
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 1' - 11 3/4" 4
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 2' - 5 3/4" 4
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 2' - 11 15/16" 14
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 3' - 7 1/4" 20
LEVEL 2 - PERIMETRAL WALLS FRAMING TAKE-OFF 362S162-54 4' - 11 1/4" 28
doubleunary
  • 13,842
  • 3
  • 18
  • 51
  • This solution worked just as well as the one proposed by @Martín. However, since the files are received from am external source, we have no control over their naming conventions and while I would expect for them to remain consistent, there is the potential that they would differ between projects, etc., which would require updating the formula (or at very least spot checking it). Regardless, thank you for your time and effort! – eikelben Jan 13 '23 at 16:00
  • It is unclear what you are referring to with "naming conventions". If it refers to column names, you can collect them automatically by replacing the `{ "Location", "ITEM", ... }` array with `unique()`. If it refers to file names, well, the formula does not care what they are. If it refers to tab names, use `Sheet1`, `Sheet2` etc. instead of descriptive names, and keep using the same sheet names repeatedly. – doubleunary Jan 13 '23 at 16:06
  • The issue is regarding the latter – yes using generic sheet names like you suggest (`sheet1`, `sheet2`) would solve the issue of having to edit the formula each time. I think the only other downside is the fact that users generally duplicate and rename files. Having to clear out potentially 30+ sheets and reimport new data at least introduces the potential for the wrong data to be imported. The current process will be to combine the .csv files in terminal `cat *.csv > combined-files.csv` then import that single file. New projects then only consist of clearing that one sheet. – eikelben Jan 13 '23 at 20:06
  • Please [edit](https://stackoverflow.com/posts/75087752/edit) the question to add all relevant detail there instead of comments. – doubleunary Jan 13 '23 at 22:24
0

Another answer for your consideration. This one reads the CSV directly from the files. Note that the first file should conform to the formats you show with the second and every other file having the same format and headers.

class FileType {
  constructor (csv) {
    this.type = csv.shift()[0];
    this.columnHeaders = {};
    this.numColumnHeaders = 0;
    this.addColumnHeaders(csv.shift());
    this.values = csv;
  }
  addColumnHeaders(headers) {
    headers.forEach( (item,index) => {
        this.columnHeaders[item] = index;
        this.numColumnHeaders = index;
      }
    );
  }
  getValues(masterHeaders) {
    let values = this.values.map( row => {
        let results = [];
        for( key in masterHeaders ) {
          let index = this.columnHeaders[key];
          if( index !== undefined ) {
            results[masterHeaders[key]] = row[index];
           }
        }
        results.unshift(this.type)
        return results;
      }
    );
    return values;
  }
  toString() {
    console.log(this.type);
    console.log(this.columnHeaders);
    console.log(this.values);
  }
}

function combineCsvFiles() {
  try {
    let files = ["13hf..............","1CtL...................."];
    let types = [];
    files.forEach( fileId => {
        let file = DriveApp.getFileById(fileId);
        let csv = Utilities.parseCsv(file.getBlob().getDataAsString());
        types.push( new FileType(csv) );
      }
    );

    // combine headers
    // first file has comments header
    // every other file after the first has the same headers
    for( key in types[0].columnHeaders ) {
      if( types[1].columnHeaders[key] ===  undefined ) {
        types[1].numColumnHeaders = types[1].numColumnHeaders+1;
        types[1].columnHeaders[key] = types[1].numColumnHeaders;
      }
    }
    types.forEach( type => type.toString() );

    // now combine values arrays
    let masterHeaders = types[1].columnHeaders;
    let results = [[]];
    for( key in masterHeaders ) {
      results[0][masterHeaders[key]] = key;
    }
    results[0].unshift("LOCATION");
    types.forEach( type => {
        results = results.concat(type.getValues(masterHeaders));
      }
    );

    let sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1");
    sheet.getRange(1,1,results.length,results[0].length).setValues(results);
  }
  catch(err) {
    console.log("Error in combineCsvFiles: "+err)
  }
}

Results

enter image description here

References

TheWizEd
  • 7,517
  • 2
  • 11
  • 19
  • Admittedly this level of scripting is above my knowledge level. Would you be so kind as to briefly explain how I can implement it? I understand that I need to define some variables in the code, but am not certain of how to do so. – eikelben Jan 13 '23 at 16:14
  • The only variable is the file ids as I've shown in `let files = ["13hf..............","1CtL...................."];` Go to you drive and right click and "Get link" then "Copy link" and add it to the array. Remove some of the text `https://drive.google.com/file/d/`"1CtL................................."`/view?usp=share_link` – TheWizEd Jan 13 '23 at 19:42
  • Thanks for the clarification. Turns out I had that part right so something else—not sure what—was responsible for the `Unexpected error while getting the method or property getFileById on object DriveApp`. Regardless, it works flawlessly now. – eikelben Jan 13 '23 at 21:02