1

I have a csv file, with a layout as follows:

Easting Northing Elevation_OD Chainage FC Profile Reg_ID
245932.499 200226.982 5.464 148.997 CE N/A 8c12.11_021
245932.460 200225.448 5.214 150.530 S N/A 8c12.11_021
245932.485 200224.993 5.111 153.222 S N/A 8c12.1_001

The spreadsheet has ~55,000 rows and I need to be able to split the CSV based on specific research units - the '8c12.11' specifies a particular unit and the '_021' specifies a particular transect line and can be ignored. There are numerous units within this file and the character length of the units can vary e.g. '8c12.1' and I'm uncertain as to how to compensate for this alteration. Essentially the split needs to be able to identify the correct unit:

8c12.11 - The characters prior to the transect line (e.g _021)

and then compile all of the associated rows that correspond to this unit, into a new table.

Adriaan
  • 17,741
  • 7
  • 42
  • 75
Adam Johns
  • 79
  • 5

3 Answers3

2

If your data is in data, you can loop over all rows. In each row, call a strsplit() on the seventh column, splitting on the underscore. Then, either collect all unique units, or simply verify it against your desired unit.

Here's a quick mock-up of a code that does the latter:

idx_keep = zeros(size(data,1),1); % Initialise output indices
for ii = 1:size(data,1)
    B = strsplit(data(ii, 7), '_');  % Split the string
    if strcmp(B{1}, '8c12.11')
        idx_keep(ii) = 1;  % keep those indices that match
    end
end

my_unit = data(idx_keep,:);  % Now contains only the rows with unit == 8c12.11

In case you also want to check for specific transect lines, also check B{2} == '021'.

Adriaan
  • 17,741
  • 7
  • 42
  • 75
2

The easiest way to do this is using extractBefore

data.Unit = extractBefore( data.Reg_ID, '_' );
Wolfie
  • 27,562
  • 7
  • 28
  • 55
1
data = 
readtable('Input/Wales_Interim_23/Pembrokeshire_23_Finished.csv');
unit_str = string(data.Reg_ID);
split = cellfun(@(x) strsplit(x, '_'), unit_str, 'UniformOutput', 
false);
split_unit_cell = cell2table(split);
split_unit = splitvars(split_unit_cell)
unit = table2array(split_unit(:,1));

data.Unit = unit;
Ids = unique(data.Unit);
NID = numel(Ids);

This allows me to identify the amount of unique units within the CSV file, after splitting the unit name from the Reg_Id column.

Adriaan
  • 17,741
  • 7
  • 42
  • 75
Adam Johns
  • 79
  • 5