I have file1.csv that looks like this:
99495
123346
1132013
and file2.csv that looks like this:
99495,141
99495,138
123346,1
Both files have been sorted using:
sort -t, -k1n file1_unsorted.csv > file1.csv
sort -t, -k1n file2_unsorted.csv > file2.csv
I tried sorting using -k1g also, but with no difference in final result.
When I join the two on the first field of both CSVs,
join -t, -o 2.1 2.2 -1 1 -2 1 file1.csv file2.csv > joined.csv
I get the errors:
join: file1.csv:2: is not sorted: 123346
join: file2.csv:3: is not sorted: 123346,1
To me it seems like a problem of the 'sort' program sorting by field whereas the 'join' program trying to join while checking the sort on the first digit. Since 1 is smaller than 9, it says that the numbers are not sorted. I am guessing that padding the numbers with 0 will work but I don't want to do that. I want to be able to get join to recognize that sorting has been done by the whole field.
Please note that when I tried the above minimum working examples, the join did occur even though sort -c throws errors. But when I tried it on my (large) actual datasets, I get the errors I post above.
I am using version 8.28 of both join and sort on Ubuntu WSL on Windows 10.