-1

I have a CSV file in this format:

#Time,CPU,Data
x,0,a
x,1,b
y,0,c
y,1,d

I want to transform it into this

#Time,CPU 0 Data,CPU 1 Data
x,a,b
y,c,d

But I don't know the number of CPU cores there will be in a system (represented by the CPU column). I also have multiple columns of data (not just the singular data column).

How would I go about doing this?

Example input

# hostname,interval,timestamp,CPU,%user,%nice,%system,%iowait,%steal,%idle
hostname,600,2018-07-24 00:10:01 UTC,-1,5.19,0,1.52,0.09,0.13,93.07
hostname,600,2018-07-24 00:10:01 UTC,0,5.37,0,1.58,0.15,0.15,92.76
hostname,600,2018-07-24 00:10:01 UTC,1,8.36,0,1.75,0.08,0.1,89.7
hostname,600,2018-07-24 00:10:01 UTC,2,3.87,0,1.38,0.07,0.12,94.55
hostname,600,2018-07-24 00:10:01 UTC,3,3.16,0,1.36,0.05,0.14,95.29
hostname,600,2018-07-24 00:20:01 UTC,-1,5.13,0,1.52,0.08,0.13,93.15
hostname,600,2018-07-24 00:20:01 UTC,0,4.38,0,1.54,0.13,0.15,93.8
hostname,600,2018-07-24 00:20:01 UTC,1,5.23,0,1.49,0.07,0.11,93.09
hostname,600,2018-07-24 00:20:01 UTC,2,5.26,0,1.53,0.07,0.12,93.03
hostname,600,2018-07-24 00:20:01 UTC,3,5.64,0,1.52,0.04,0.12,92.68

This would be the output for this file: (CPU -1 turns into CPU ALL)(also the key value is just the timestamp (the hostname and interval stay constant)

# hostname,interval,timestamp,CPU ALL %user,CPU ALL %nice,CPU ALL %system,CPU ALL %iowait,CPU ALL %steal,CPU ALL %idle,CPU 0 %user,CPU 0 %nice,CPU 0 %system,CPU 0 %iowait,CPU 0 %steal,CPU 0 %idle,CPU 1 %user,CPU 1 %nice,CPU 1 %system,CPU 1 %iowait,CPU 1 %steal,CPU 1 %idle,CPU 2 %user,CPU 2 %nice,CPU 2 %system,CPU 2 %iowait,CPU 2 %steal,CPU 2 %idle,CPU 3 %user,CPU 3 %nice,CPU 3 %system,CPU 3 %iowait,CPU 3 %steal,CPU 3 %idle
hostname,600,2018-07-24 00:10:01 UTC,5.19,0,1.52,0.09,0.13,93.07,5.37,0,1.58,0.15,0.15,92.76,8.36,0,1.75,0.08,0.1,89.7,3.87,0,1.38,0.07,0.12,94.55,3.16,0,1.36,0.05,0.14,95.29
hostname,600,2018-07-24 00:20:01 UTC,5.13,0,1.52,0.08,0.13,93.15,4.38,0,1.54,0.13,0.15,93.8,5.23,0,1.49,0.07,0.11,93.09,5.26,0,1.53,0.07,0.12,93.03,5.64,0,1.52,0.04,0.12,92.68
Community
  • 1
  • 1
STemma
  • 21
  • 4
  • @HåkonHægland I accidentally mistyped the example. my bad. – STemma Jul 29 '18 at 18:22
  • 1
    @Ed Morton I will add the output of the csv now – STemma Jul 29 '18 at 18:23
  • I've removed the spurious newlines from your "example output" but I've had to guess. Please check to make sure it's correct. You should take care with what you post, as a single character can make all the difference to whether a program works or not. – Borodin Jul 29 '18 at 19:47
  • 1
    @STemma: You are lucky to have any responses to this at all. Posting a requirement without any evidence of trying to solve this yourself, and spamming all the scripting languages you know in the tags, is very poor form. *Stack Overflow* isn't an ordinary help forum, or really a forum at all. You should think of your question as if it were a new page on *Wikpedia*, and the content should be unique and well-researched. Getting other people to do your work for you isn't part of our purpose. – Borodin Jul 29 '18 at 19:57
  • 1
    **To everyone commenting:** please stop with the personal back-and-forth in the comments, and please don’t engage in blind upvoting *or* downvoting. – elixenide Jul 29 '18 at 20:40
  • I'll make sure to keep that in mind next time I post here. The new output is good too. – STemma Jul 29 '18 at 23:31
  • So is the key value that you want per-CPU data for REALLY just the timestamp or is it the combination of hostname+interval+timestamp? If so, edit your question to make that clear. If not, edit your example to show what the output would look like given different hostnames and/or intervals for a given timestamp. – Ed Morton Jul 30 '18 at 00:27
  • @EdMorton I have clarified the question with information about the key. – STemma Jul 30 '18 at 13:17
  • OK, let me know if you have any questions on how to adapt the answer I posted to produce the new output you want. – Ed Morton Jul 30 '18 at 13:19
  • @EdMorton In general, no. Users are allowed to downvote as many answers as they like, even if it's all of them. Also, please be aware that there are multiple users who are prone to mass-downvoting, not just one, and I don't see anything in this particular Q&A to indicate who may have done it here. – elixenide Jul 30 '18 at 13:42

2 Answers2

1

Your question isn't clear and doesn't contain the expected output for your posted larger/presumably more realistic sample CSV so idk what output you were hoping for but this will show you the right approach at least:

$ cat tst.awk
BEGIN{
    FS = OFS = ","
}
NR==1 {
    for (i=1; i<=NF; i++) {
        fldName2nmbr[$i] = i
    }
    tsFldNmbr  = fldName2nmbr["timestamp"]
    cpuFldNmbr = fldName2nmbr["CPU"]
    next
}
{
    tsVal  = $tsFldNmbr
    cpuVal = $cpuFldNmbr

    if ( !(seenTs[tsVal]++) ) {
        tsVal2nmbr[tsVal]  = ++numTss
        tsNmbr2val[numTss] = tsVal
    }

    if ( !(seenCpu[cpuVal]++) ) {
        cpuVal2nmbr[cpuVal]  = ++numCpus
        cpuNmbr2val[numCpus] = cpuVal
    }

    tsNmbr  = tsVal2nmbr[tsVal]
    cpuNmbr = cpuVal2nmbr[cpuVal]

    cpuData = ""
    for (i=1; i<=NF; i++) {
        if ( (i != tsFldNmbr) && (i != cpuFldNmbr) ) {
            cpuData = (cpuData == "" ? "" : cpuData OFS) $i
        }
    }
    data[tsNmbr,cpuNmbr] = cpuData
}
END {
    printf "%s", "timestamp"
    for (cpuNmbr=1; cpuNmbr<=numCpus; cpuNmbr++) {
        printf "%sCPU %s Data", OFS, cpuNmbr2val[cpuNmbr]
    }
    print ""

    for (tsNmbr=1; tsNmbr<=numTss; tsNmbr++) {
        printf "%s", tsNmbr2val[tsNmbr]
        for (cpuNmbr=1; cpuNmbr<=numCpus; cpuNmbr++) {
            printf "%s\"%s\"", OFS, data[tsNmbr,cpuNmbr]
        }
        print ""
    }
}

.

$ awk -f tst.awk file
timestamp,CPU -1 Data,CPU 0 Data,CPU 1 Data,CPU 2 Data,CPU 3 Data
2018-07-24 00:10:01 UTC,"hostname,600,5.19,0,1.52,0.09,0.13,93.07","hostname,600,5.37,0,1.58,0.15,0.15,92.76","hostname,600,8.36,0,1.75,0.08,0.1,89.7","hostname,600,3.87,0,1.38,0.07,0.12,94.55","hostname,600,3.16,0,1.36,0.05,0.14,95.29"
2018-07-24 00:20:01 UTC,"hostname,600,5.13,0,1.52,0.08,0.13,93.15","hostname,600,4.38,0,1.54,0.13,0.15,93.8","hostname,600,5.23,0,1.49,0.07,0.11,93.09","hostname,600,5.26,0,1.53,0.07,0.12,93.03","hostname,600,5.64,0,1.52,0.04,0.12,92.68"

I put the per-CPU data within double quotes so you could import it to Excel or similar without worrying about the commas between the sub-fields.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
  • 1
    I should be able to work with this. It's my fault the question was unclear, so I will make the question clearer, but thank you for your response. – STemma Jul 29 '18 at 18:24
-1

If we assume that the CSV input file is sorted according to increasing timestamps, you could try something like this:

use feature qw(say);
use strict;
use warnings;

my $fn = 'log.csv';
open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
my $header = <$fh>;
my %info;
my @times;
while ( my $line = <$fh> ) {
    chomp $line;
    my ( $time, $cpu, $data ) = split ",", $line;
    push @times, $time if !exists $info{$time};
    push @{ $info{$time} }, $data;
}
close $fh;

for my $time (@times) {
    say join ",", $time, @{ $info{$time} };
}

Output:

x,a,b
y,c,d
Håkon Hægland
  • 39,012
  • 21
  • 81
  • 174