I have been working on a script to concatenate multiple csv files into a single, large csv. The csv's contain names of folders and their respective sizes, in a 2-column setup with the format "Size, Projectname"
Example of a single csv file:
49747851728,ODIN
32872934580,_WORK
9721820722,LIBRARY
4855839655,BASELIGHT
1035732096,ARCHIVE
907756578,USERS
123685100,ENV
3682821,SHOTGUN
1879186,SALT
361558,SOFTWARE
486,VFX
128,DNA
For my current test I have 25 similar files, with different numbers in the first column.
I am trying to get this script to do the following:
- Read each csv file
- For each Project it sees, scan the outputfile if that Project was already printed to the file. If not, print the Projectname
- For each file, for each Project, if the Project was found, print the Size to the output csv.
However, I need the Projects to all be on textline 1, comma separated, so I can use this outputfile as input for a javascript graph. The Sizes should be added in the column below their projectname.
My current script:
csv_folder=$(echo "$1" | sed 's/^[ \t]*//;s/\/[ \t]*$//')
csv_allfiles="$csv_folder/*.csv"
csv_outputfile=$csv_folder.csv
echo -n "" > $csv_outputfile
for csv_inputfile in $csv_allfiles; do
while read line && [[ $line != "" ]]; do
projectname=$(echo $line | sed 's/^\([^,]*\),//')
projectfound1=$(cat $csv_outputfile | grep -w $projectname)
if [[ ! $projectfound1 ]]; then
textline=1
sed "${textline}s/$/${projectname}, /" >> $csv_outputfile
for csv_foundfile in $csv_allfiles; do
textline=$(echo $textline + 1 | bc )
projectfound2=$(cat $csv_foundfile | grep -w $projectname)
projectdata=$(echo $projectfound2 | sed 's/\,.*$//')
if [[ $projectfound2 ]]; then
sed "${textline}s/$/$projectdata, /" >> $csv_outputfile
fi
done
fi
done < $csv_inputfile
done
My current script finds the right information (projectname, projectdata) and if I just 'echo' those variables, it prints the correct data to a file. However, with echo it only prints in a long list per project. I want it to 'jump back' to line 1 and print the new project at the end of the current line, then run the loop to print data at the end of each next line.
I was thinking this should be possible with sed or awk. sed should have a way of inserting text to a specific line with
sed '{n}s/search/replace/'
where {n} is the line to insert to
awk should be able to do the same thing with something like
awk -v l2="$textline" -v d="$projectdata" 'NR == l2 {print d} {print}' >> $csv_outputfile
However, while replacing the sed commands in the script with
echo $projectname
echo $projectdata
spit out the correct information (so I know my variables are filled correctly) the sed and awk commands tend to spit out the entire contents of their current inputcsv; not just the line that I want them to.
Pastebin outputs per variant of writing to file
- https://pastebin.com/XwxiAqvT - sed output
- https://pastebin.com/xfLU6wri - echo, plain output (single column)
- https://pastebin.com/wP3BhgY8 - echo, detailed output per variable
- https://pastebin.com/5wiuq53n - desired output
As you see, the sed output tends to paste the whole contents of inputcsv, making the loop stop after one iteration. (since it finds the other Projects after one loop)
So my question is one of these;
- How do I make sed / awk behave the way I want it to; i.e. print only the info in my var to the current textline, instead of the whole input csv. Is sed capable of this, printing just one line of variable? Or
- Should I output the variables through 'echo' into a temp file, then loop over the temp file to make sed sort the lines the way I want them to? (Bear in mind that more .csv files will be added in the future, I can't just make it loop x times to sort the info)
- Is there a way to echo/print text to a specific text line without using sed or awk? Is there a printf option I'm missing? Other thoughts?
Any help would be very much appreciated.