How to loop through an output and format the result into a new report?

Question

I'm working on a script that processes the output of a configuration file so that it yields useful data.

The output is in the format:

[header]
attribute1 = integer
attribute2 = integer
[header]
attribute1 = integer
attribute2 = integer
...

There can be an unknown number of stanzas, each with the same two attributes (with unknown integer values) but a different header.

So far I have only managed to count the stanzas, intending to use that count as a loop counter. I am not sure how to loop through the output, leave each header unchanged, and replace the two attributes with a single new attribute holding the sum of their integer values, like this:

[header]
new_attribute = integer
[header]
new_attribute = integer

I have looked into the command read but I am not sure how to generate the report I want with it.

John Kugelman · Answer 1 · 2019-06-03T22:38:09.297

1

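# Read three lines per iteration: the header, then the two attribute lines.
# Each attribute line is split on whitespace into name, "=", and value.
while read header &&
      read name1 equal value1 &&
      read name2 equal value2
do
    echo "$header"
    echo "new_attribute = $((value1 + value2))"   # integer sum of both values
done < input.cfg > output.cfg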

This code assumes that the input is in exactly the prescribed format. It doesn't handle erroneous input robustly: malformed lines, missing lines, unexpected backslashes, etc.
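To illustrate what more defensive handling could involve, here is a minimal sketch in Python rather than shell. The function name `sum_stanzas` and both regular expressions are assumptions made for illustration, not part of the answer above. It tolerates blank lines but raises an error on a malformed line or a stanza that does not have exactly two attributes.

```python
import re

HEADER_RE = re.compile(r"^\[[^\]]+\]$")        # e.g. [header]
ATTR_RE = re.compile(r"^\w+\s*=\s*(-?\d+)$")   # e.g. attribute1 = 10


def sum_stanzas(lines, new_name="new_attribute"):
    """Return the report lines; raise ValueError on malformed input."""
    header, values = None, []
    out = []

    def flush():
        # Emit the finished stanza, insisting on exactly two attributes.
        if len(values) != 2:
            raise ValueError(f"{header}: expected 2 attributes, got {len(values)}")
        return [header, f"{new_name} = {sum(values)}"]

    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue                           # tolerate blank lines
        if HEADER_RE.match(line):
            if header is not None:
                out.extend(flush())            # close the previous stanza
            header, values = line, []
            continue
        match = ATTR_RE.match(line)
        if match is None or header is None:
            raise ValueError(f"line {lineno}: unexpected input {line!r}")
        values.append(int(match.group(1)))

    if header is not None:
        out.extend(flush())                    # close the last stanza
    return out
```

It could be called as `sum_stanzas(open("input.cfg"))`, writing the returned lines to the output file one per line.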

edited Jun 03 '19 at 22:38

answered Jun 03 '19 at 19:54

John Kugelman

Mr. Kugelman, what are your thoughts on [this exegesis of shell-based text processing](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice)? I have been thoroughly shamed for applying the shell to such problems, and I hope to learn your perspective. – vintnes Jun 03 '19 at 21:52
My own view: Stéphane Chazelas knows whereof he speaks, and makes a compelling argument as to why text-processing in shell should be done *carefully* (and indeed, this answer's code *does* have numerous bugs, which the mini-essay on Unix & Linux Stack Exchange points out). OTOH, following the best practices set out in [BashFAQ #1](http://mywiki.wooledge.org/BashFAQ/001) addresses those bugs, so it's not hopeless, but simply a case where a great deal of care is called for. – Charles Duffy Jun 03 '19 at 21:55
I don't subscribe to any blanket rule that forbids shell scripting for correctness or efficiency reasons. Most of the tasks I deal with involve very small text files; I hardly ever care about efficiency when parsing short config files, for instance. Awk and especially sed are, frankly, ugly tools. I use them because they're available but I have no fondness for them. Sed isn't fun to use. I care a lot about correctness and efficiency but I also care about readability and familiarity, and the shell does well on the latter two measures. – John Kugelman Jun 03 '19 at 22:28
In other words, don't throw the baby out with the bath water. – John Kugelman Jun 03 '19 at 22:36

vintnes · Answer 2 · 2019-06-03T21:53:36.623

1

Please don't use the shell to process text files in bulk; it's slow and insecure. My favorite text processing tool is Awk, which you can learn more about with man awk.

In awk, NR is the number of the current record, which by default is the line number. % is the "modulo" (remainder) operator, so if we know there are only three kinds of records, always repeating in the same order, we can write the desired script very bluntly.

Try awk '{print NR%3, $0}' file to see the structure.

awk -F ' = ' '                                # Field Separator = space, equals, space
  NR%3 == 1 {print $0}                        # print header
  NR%3 == 2 {i=$2}                            # save second field as i
  NR%3 == 0 {print "new_attribute" FS i+$2}   # print string, field separator, and sum
' file
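For readers who don't know awk, the same position-based idea can be sketched in Python. Here `enumerate(..., start=1)` plays the role of NR, and the function name `sum_by_position` is hypothetical. Like the awk script, it assumes every stanza is exactly three lines with no blank lines in between; unlike awk, `int()` raises an error on a non-numeric value instead of treating it as 0.

```python
def sum_by_position(lines, new_name="new_attribute"):
    """Classify each line by its position modulo 3, as the awk script does."""
    out = []
    first = 0
    for nr, line in enumerate(lines, start=1):   # nr plays the role of awk's NR
        line = line.rstrip("\n")
        n = nr % 3                               # computed once per line
        if n == 1:
            out.append(line)                     # header: copy unchanged
        elif n == 2:
            first = int(line.split(" = ")[1])    # first attribute: save its value
        else:
            second = int(line.split(" = ")[1])   # n == 0: second attribute
            out.append(f"{new_name} = {first + second}")
    return out
```

Passing an open file, as in `sum_by_position(open("file"))`, returns the report as a list of lines.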

edited Jun 03 '19 at 21:53

answered Jun 03 '19 at 21:49

vintnes

Slow, I'll grant you (typically even a well-written script in ksh93, perhaps the fastest POSIX-compliant shell available, is a full base-10 order of magnitude behind awk), but "insecure"? Only if done badly. – Charles Duffy Jun 03 '19 at 21:52
Consider doing `{n=(NR%3)}` up front and then `n==1`, etc. for the existing blocks of code rather than having to re-calculate `NR%3` for every block. – Ed Morton Jun 04 '19 at 04:56

score 1 · Answer 3 · answered Jun 03 '19 at 23:45

Using a purpose-built library is much more robust, especially compared to relying on lines appearing contiguously.

Here is a short script written in Python. It would be trivial to add tests for particular sections and attributes to ignore or pass through unchanged.

Using the input file test.ini:

$ cat test.ini
[header1]
attribute1 = 10
attribute2 = 12
[header2]
attribute1 = 23
attribute2 = 25

and the script ini.py:

$ cat ini.py
#!/usr/bin/python3
import configparser
config = configparser.ConfigParser()
new_config = configparser.ConfigParser()

new_key = 'new_attribute'

config.read('test.ini')

for section in config.sections():
    val = 0

    for key in config[section]:
        val += int(config[section][key])

    new_config[section] = {}
    new_config[section][new_key] = str(val)

with open('new.ini', 'w') as configfile:
    new_config.write(configfile)

the result is new.ini:

$ cat new.ini 
[header1]
new_attribute = 22

[header2]
new_attribute = 48

The script favors Mapping Protocol Access and thus requires Python 3.2 or greater. I'm not using getint() because it seems to be classified as part of the legacy API.

Note that ConfigParser.read() closes the input file for you.
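One way the filtering mentioned earlier (ignoring particular sections, passing particular attributes through unchanged) might look is sketched below. The names `SKIP_SECTIONS`, `PASS_THROUGH`, and `transform`, as well as the example section and key they contain, are hypothetical. `read_string()` and an in-memory buffer are used only so the sketch runs without files on disk.

```python
import configparser
import io

SKIP_SECTIONS = {"header2"}   # hypothetical: sections left out of the report
PASS_THROUGH = {"comment"}    # hypothetical: keys copied as-is, not summed


def transform(text, new_key="new_attribute"):
    """Return the transformed INI text for the given INI text."""
    config = configparser.ConfigParser()
    config.read_string(text)
    new_config = configparser.ConfigParser()

    for section in config.sections():
        if section in SKIP_SECTIONS:
            continue                              # ignore this section entirely
        new_config[section] = {}
        total = 0
        for key, value in config[section].items():
            if key in PASS_THROUGH:
                new_config[section][key] = value  # pass through unchanged
            else:
                total += int(value)               # everything else is summed
        new_config[section][new_key] = str(total)

    buffer = io.StringIO()
    new_config.write(buffer)
    return buffer.getvalue()
```

Note that ConfigParser lowercases keys by default, so the entries in `PASS_THROUGH` need to be lowercase.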

answered Jun 03 '19 at 23:45

Dennis Williamson

@TimBiegeleisen: When I passed 100K, I received an email asking for my address, which I supplied, but I never received anything. – Dennis Williamson Jun 04 '19 at 16:33
Seriously? I live in Singapore, and when I filled out that form I did receive some stuff, a T-shirt, coffee mug, and a few other trinkets. You (or maybe we) should raise a complaint about this I think. – Tim Biegeleisen Jun 04 '19 at 16:34
@TimBiegeleisen: I'm in the US so it's not like it fell in the ocean (assuming the origin was NY). It would have been nice to have, but I didn't get too worked up about not getting it. Although a few months later I sent a follow-up email to find out if it went missing. I never received a reply. – Dennis Williamson Jun 04 '19 at 16:43
There is a [post on the Meta site](https://meta.stackexchange.com/questions/329051/2019-do-six-figure-reputation-users-on-non-so-sites-still-get-swag) discussing the problem. Apparently SO is in the process of switching swag vendors, so it might take a while before we receive anything. – Tim Biegeleisen Jun 04 '19 at 23:55
@TimBiegeleisen: Thanks for finding that and letting me know. – Dennis Williamson Jun 05 '19 at 00:16
