
Sorry to post such a rudimentary question, but I'm getting confused by all the different tutorials and examples (and slashes and hyphens and back-ticks oh my) so I figured I would get someone's experienced input.

I have a comma-separated .csv file with several hundred lines that look like this:

abcd-3096,62#,,100,,,25,,75,3,

and each line should be formatted like so:

{name: 'abcd-3096', weight : 62, some-field1: null, class: 100, some-field2: null, some-field3: null, unit-weight : 25, some-field4 : null, capacity : 75,   }

I know awk or sed is the usual tool for this kind of replacement, and I'm more than fine with doing the formatting in several commands.

I don't expect anyone to format the whole line for me, but I'm hoping someone can show me how to prepend some text to a column. I can't seem to find a reliable explanation of the command anywhere online.

Csteele5

1 Answer

You can use negated character classes like [^,] for this:

sed -r 's/^([^,]*),([^,]*),([^,]*)/{ name: "\1", weight: "\2", somefield1: "\3" }/' file.csv

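The example uses only 3 groups for simplicity, but you get the idea: add one more ,([^,]*) to the pattern for each additional column. Note that sed back-references only go from \1 to \9.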
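Since the question also mentions awk, here is a minimal sketch of the same idea extended to all nine columns of the sample line. The key names are taken from the question's target format. The function name csv_to_records, printing empty fields as null, and stripping the trailing # from the weight are assumptions made for illustration, not something the question confirms.

```shell
# Hypothetical helper: reads CSV lines on stdin, prints one record per line.
csv_to_records() {
    awk -F',' '
    # Return null for an empty field, a quoted string if "quoted" is set,
    # or the bare value otherwise.
    function val(s, quoted) {
        if (s == "") return "null"
        return quoted ? ("\047" s "\047") : s
    }
    {
        sub(/#$/, "", $2)   # assumption: "62#" should become 62
        printf "{ name: %s, weight: %s, some-field1: %s, class: %s, some-field2: %s, some-field3: %s, unit-weight: %s, some-field4: %s, capacity: %s }\n",
            val($1, 1), val($2), val($3), val($4), val($5), val($6), val($7), val($8), val($9)
    }'
}
```

It would be called as csv_to_records < file.csv. Like the sed version, it assumes that no field contains a comma.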

If your system does not support sed -r (extended regex syntax), you need to use \(group\) instead of (group):

sed 's/^\([^,]*\),\([^,]*\),\([^,]*\)/{ name: "\1", weight: "\2", somefield1: "\3" }/' file.csv

In case you don't need to use sed, you can also use bash directly:

while IFS=',' read -r name weight somefield1 class somefield2 somefield3 unitweight somefield4 capacity rest
do
    # IFS=',' applies only to read, so each line is split on commas
    echo -e "{ name: \"$name\", weight: \"$weight\", somefield1: \"$somefield1\","
    echo -e " class: \"$class\", somefield2: \"$somefield2\", somefield3: \"$somefield3\","
    echo -e " unitweight: \"$unitweight\", somefield4: \"$somefield4\", capacity: \"$capacity\" }"
done < file.csv
IFS=$' \t\n'

(taken from this answer by koola)
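If the empty columns should come out as null, as in the question's target format, shell parameter expansion can handle that inside the loop. The following is only a sketch under assumptions: the function name print_records is made up, it covers just the first four columns, and it assumes the trailing # on the weight should be dropped.

```shell
# Hypothetical helper: reads CSV lines on stdin, prints one record per line.
print_records() {
    while IFS=',' read -r name weight somefield1 class rest; do
        weight=${weight%#}    # assumption: drop a trailing "#" from the weight
        # ${var:-null} expands to "null" when the field is empty
        printf "{ name: '%s', weight: %s, somefield1: %s, class: %s }\n" \
            "$name" "${weight:-null}" "${somefield1:-null}" "${class:-null}"
    done
}
```

Usage would be print_records < file.csv. printf is used instead of echo -e so that backslashes in the data are printed as they are.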

Community
Leon Adler
  • (All of these solutions assume that you have no commas in your data, as you stated in your comment on your question.) – Leon Adler Oct 20 '15 at 23:56
  • This is an excellent answer. For your first example, it looks like you are trying to negate the same thing 3 times. Is that in compensation for the triple comma part of my data? – Csteele5 Oct 21 '15 at 00:12
  • `([^,]*)` means *"capture 0 or more characters that are not a comma"*. So for 2 values the pattern is `([^,]*),([^,]*)`, matching "a value, then a comma, then a value". For each additional group, you'd add `,([^,]*)`. – Leon Adler Oct 21 '15 at 02:18
  • So if I wanted to target just a single column using sed, how would I write the following pseudocode: `sed -r /regex field#/'string' field#/`? – Csteele5 Oct 21 '15 at 13:26
  • `sed -r 's/([^,]*).*/{ "name": "\1" }/'`. The basic sed replace syntax is `sed 's/pattern/replacement/'`. For regular expressions specifically I can recommend a read of http://www.regular-expressions.info/ – Leon Adler Oct 22 '15 at 14:16
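For the single-column case raised in the comments, one possible approach is to skip a fixed number of "field plus comma" groups and then rewrite the next field. The sketch below prepends the text "class: " to the fourth column. The function names and the choice of column are illustrative assumptions. It uses sed -E, which GNU sed accepts as an equivalent of -r and which BSD sed also supports; the awk variant is shown for comparison.

```shell
# Hypothetical helpers: both read CSV lines on stdin.

# sed: ([^,]*,){3} skips the first three columns, \3 is the fourth one.
prefix_col4_sed() {
    sed -E 's/^(([^,]*,){3})([^,]*)/\1class: \3/'
}

# awk: assign to $4 directly; OFS keeps the commas when the line is rebuilt.
prefix_col4_awk() {
    awk -F',' -v OFS=',' '{ $4 = "class: " $4; print }'
}
```

To target a different column, change the {3} count in the sed version or the field number in the awk version. Both assume that no field contains a comma.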