Script to Parse and Change Numbers

Question

I am working with numbers a lot when editing a particular type of file, and it's mostly tedious work. The file has a format like this:

damagebase = 8.834
    "abc_foo.odf" 3.77
    "def_bar.odf" 3.77
    "ghi_baz.odf" 3.77
    "jkl_blah.odf" 4.05
    ...

What would you recommend for writing a script that parses this and lets me programmatically change each number?

Language: I use C#, some F# (noob), and Lua. If you suggest regexes, could you provide specific ones, as I am not familiar with them?

More detail please. You have two different line formats there. Does the 'damagebase' format recur in the file at all, or is it a header of some sort? Do you want to programmatically change *all* the numbers or just the non-header ones? How complicated is the method to change the number? Does it involve a simple addition or is it more complicated, like applying a running average? – Jherico, Jun 22 '09 at 23:18
The damagebase line is a header, and I want to just be able to apply a multiplier to the number on each line (including the header). – RCIX, Jun 22 '09 at 23:25
I appreciate that you feel more comfortable with a C# answer, but generally speaking, you really should learn enough Perl or awk so that you don't have to resort to C# for these kinds of problems. – anthony, Jun 23 '09 at 02:55
It's the regexes that I have a problem with. "Did I use the right number batching character here? how do I match this? eek! it ate my string..." :D – RCIX, Jun 23 '09 at 05:18

anthony · Answer 1 · 2009-06-22T23:46:37.263

4

Perl is pretty good for stuff like this. Here's a Perl script that will do what you want.

#!/usr/bin/env perl

$multiplier = 2.0;

while (<>)
{
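    # Header lines ("name = value") carry the number in field 2, the others in field 1.
    $n = /=/ ? 2 : 1;
    @tokens = split;
    $tokens[$n] *= $multiplier;

    # split drops the leading whitespace, so re-indent the non-header lines.
    print "\t" if not /=/;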
    print join(' ', @tokens) . "\n";
}

Usage:

./file.pl input_file > output_file
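
For anyone who would rather not pick up Perl, the same split-and-multiply idea can be sketched in Python. This is only an illustration and not part of the original answer: the function name scale_line is made up here, the "g" format (6 significant digits) is an arbitrary choice, and it assumes the header is written with spaces around the "=" as in the question.

```python
def scale_line(line, multiplier):
    """Return line with its last whitespace-separated field multiplied."""
    tokens = line.split()
    try:
        value = float(tokens[-1]) * multiplier
    except (IndexError, ValueError):
        return line  # blank line or no trailing number: leave it untouched
    tokens[-1] = format(value, "g")  # "g" keeps 6 significant digits
    # Assumption: header lines have "=" as a separate token; others get a tab.
    indent = "" if "=" in tokens else "\t"
    return indent + " ".join(tokens)


sample = [
    "damagebase = 8.834",
    '\t"abc_foo.odf" 3.77',
    '\t"jkl_blah.odf" 4.05',
]
for line in sample:
    print(scale_line(line, 2.0))
```

Like the Perl version, this rebuilds each line from its tokens, so any original spacing between fields is normalised to a single space.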

edited Jun 22 '09 at 23:46

answered Jun 22 '09 at 23:39

Perl is really better for these things. – nik Jun 23 '09 at 02:24
To show how this can be easily scripted I have added an AWK form in another answer. – nik Jun 23 '09 at 02:31

score 2 · Answer 2 · 2009-06-23T00:55:03.630

If that's really all you want to do, use awk:

awk '{$NF *= 2.5 ; print }' < input_file > output_file

EDITED: All right, if you want to keep the whitespace as you describe, this should work (although it's getting inelegant).

awk '{$NF *= 2.5} /^\"/{print "\t" $0} !/^\"/{print}' < input_file > output_file
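
Since the question asked for a specific regex, here is a hedged Python sketch of another way to keep the whitespace: rewrite only the number at the end of each line and leave everything before it alone. The pattern and the name scale_text are assumptions made for this example, not something from the answer above, and the sketch assumes plain "\n" line endings and simple decimal numbers (no exponents).

```python
import re

# Assumed pattern: an optionally signed decimal number, preceded by a space
# or tab, sitting at the end of a line (trailing spaces/tabs are dropped).
TRAILING_NUMBER = re.compile(r"(?<=[ \t])(-?\d*\.?\d+)[ \t]*$", re.MULTILINE)


def scale_text(text, multiplier):
    """Multiply the trailing number on every line, keeping the indentation."""
    def replace(match):
        return format(float(match.group(1)) * multiplier, "g")
    return TRAILING_NUMBER.sub(replace, text)


text = 'damagebase = 8.834\n\t"abc_foo.odf" 3.77\n\t"jkl_blah.odf" 4.05\n'
print(scale_text(text, 2.0), end="")
```

Because only the matched number is replaced, the leading tabs survive without having to be printed back in afterwards.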

edited Jun 23 '09 at 00:55

answered Jun 23 '09 at 00:08

This eats the leading tabs on non-header lines. – anthony Jun 23 '09 at 00:37
Um, I haven't really heard of awk. What is it? – RCIX Jun 23 '09 at 00:42
OK, you have crazy awk skills. I would vote this +100 if I could. – anthony Jun 23 '09 at 04:53

nik · Answer 3 · 2009-06-23T02:37:45.937

1

You can use AWK like this (note how easily the formatting is converted for the purpose):

sed 's/damagebase =/damagebase=/g' input.txt |\
    awk '{printf "\t%s %s\n",$1,3.1*$2}' |\
    sed 's/.*damagebase=/damagebase =/g'

I am multiplying the 2nd column by 3.1 in this sample script.
To preserve your formatting, the printf format string starts with a TAB, and the two sed commands translate the header to and from a form (damagebase=) that AWK can treat as a single first field.

edited Jun 23 '09 at 02:37

answered Jun 23 '09 at 02:30

Ah, there is already another AWK answer. Did not notice it. Yet, you can see the varied ways scripts can be written for your purpose. – nik Jun 23 '09 at 02:33

Greg Bacon · Accepted Answer · 2009-06-23T02:58:47.610

You can match runs of non-whitespace and punt to Double.Parse:

int multiplier = 3;

string input =
  "damagebase = 8.834\n" +
  "  \"abc.odf\" 3.77\n" +
  "  \"def.odf\" 3.77\n" +
  "  \"ghi.odf\" .77\n" +
  "  \"jkl.odf\" -4.05\n" +
  "  \"mno.odf\" 5\n";

Regex r = new Regex(@"^(\w+)\s*=\s*(\S+)" +
                    @"(?:\s+""([^""]+)""\s+(\S+))+",
                    RegexOptions.Compiled | RegexOptions.Multiline);

Match m = r.Match(input);
if (m.Success) {
  double header = Double.Parse(m.Groups[2].Value);
  Console.WriteLine("{0} = {1}", m.Groups[1].Value,
                                 header * multiplier);

  CaptureCollection files = m.Groups[3].Captures;
  CaptureCollection nums  = m.Groups[4].Captures;
  for (int i = 0; i < files.Count; i++) {
    double val = Double.Parse(nums[i].Value);
    Console.WriteLine(@"  ""{0}"" {1}", files[i].Value,
                                        val * multiplier);
  }
}
else
  Console.WriteLine("no match");

Running it gives

damagebase = 26.502
  "abc.odf" 11.31
  "def.odf" 11.31
  "ghi.odf" 2.31
  "jkl.odf" -12.15
  "mno.odf" 15
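
The Multiline option in the pattern above decides where "^" is allowed to match, which matters as soon as the header is not the very first thing in the input. A small Python sketch of the same behaviour, using Python's re.MULTILINE as the counterpart of RegexOptions.Multiline and a made-up input that starts with a blank line:

```python
import re

HEADER = r"^(\w+)\s*=\s*(\S+)"
text = "\ndamagebase = 8.098\n"  # a blank line precedes the header

without_flag = re.search(HEADER, text)
with_flag = re.search(HEADER, text, re.MULTILINE)

print(without_flag)            # None: "^" only matches at the very start
print(with_flag.group(1, 2))   # ('damagebase', '8.098')
```

Without the flag, "^" can only match at position 0, where the blank line sits, so the search fails; with it, "^" also matches right after each newline.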

RCIX · Answer 5 · 2009-06-23T04:40:13.933

I tried

static void Main(string[] args)
{
    Console.WriteLine("Please enter the multiplier:");
    string stringMult = Console.ReadLine();
    int multiplier;
    Int32.TryParse(stringMult, out multiplier);
    StreamReader sr = new StreamReader(@"C:\Users\[obscured]\Desktop\Fleetops Mod\Data To Process.txt", true);
    string input = sr.ReadToEnd();
    sr.Close();
    StreamWriter sw = new StreamWriter(@"C:\Users\[obscured]\Desktop\Fleetops Mod\Data To Process.txt", false);
    Regex r = new Regex(@"^(\w+)\s*=\s*(\S+)" +
            @"(?:\s+""([^""]+)""\s+(\S+))+",
            RegexOptions.Compiled | RegexOptions.Multiline);
    Match m = r.Match(input);
    if (m.Success) {
        double header = Double.Parse(m.Groups[2].Value);
        sw.WriteLine("{0} = {1}", m.Groups[1].Value,
                         header * multiplier);
        CaptureCollection files = m.Groups[3].Captures;
        CaptureCollection nums  = m.Groups[4].Captures;
        for (int i = 0; i < files.Count; i++) {
            double val = Double.Parse(nums[i].Value);
            sw.WriteLine(@"  ""{0}"" {1}", files[i].Value,
                                val * multiplier);
        }
    }
    else
        Console.WriteLine("no match");
    sw.Close();
    Console.WriteLine("Done!");
    Console.ReadKey();
}

(thanks gbacon) and it comes back with "no match" even when I put in the right data. Why does it do this? Here's the test data:

damagebase = 8.098
    "bor_adaptor_03.odf" 3.77
    "bor_adaptor_13.odf" 3.77
    "bor_adaptor_23.odf" 3.77
    "bor_adaptor_33.odf" 4.05
    "bor_adaptor_R3.odf" 3.77
    "bor_adaptor_T3.odf" 3.77
    "bor_cube_BHHHMM.odf" 6.48
    "bor_cube_BRHHHM.odf" 4.52
    "bor_cube_BRHHMM.odf" 6.48
    "bor_cube_BTHHHM.odf" 4.52
    "bor_cube_BTHHMM.odf" 6.48
    "bor_cube_BTRHHM.odf" 4.52
    "bor_cube_BTRHMM.odf" 6.48
    "bor_cube_BTTHHM.odf" 4.52
    "bor_cube_BTTHMM.odf" 6.48
    "bor_cube_BTTRHM.odf" 4.52
    "bor_cube_BTTRMM.odf" 6.48
    "bor_cube_BTTTHM.odf" 4.52
    "bor_cube_BTTTMM.odf" 6.48
    "bor_cube_BTTTRM.odf" 4.52
    "bor_cube_RHHHMM.odf" 6.48
    "bor_cube_THHHMM.odf" 6.48
    "bor_cube_TRHHHM.odf" 4.52
    "bor_cube_TRHHMM.odf" 6.48
    "bor_cube_TTHHHM.odf" 4.52
    "bor_cube_TTHHMM.odf" 6.48
    "bor_cube_TTRHHM.odf" 4.52
    "bor_cube_TTRHMM.odf" 6.48
    "bor_cube_TTTHHM.odf" 4.52
    "bor_cube_TTTHMM.odf" 6.48
    "bor_cube_TTTRHM.odf" 4.52
    "bor_cube_TTTRMM.odf" 6.48
    "dom_battle_cruiserY2r6.odf" 4.123
    "dom_battle_cruiserYr6.odf" 4.123
    "dom_battle_cruiserZ2r6.odf" 4.123
    "dom_battle_cruiserZr6.odf" 4.123
    "dom_battle_cruiser_fed2r6.odf" 4.123
    "dom_battle_cruiser_fedr6.odf" 4.123
    "dom_defenderr4.odf" 7.775
    "dom_defenderr5.odf" 7.452
    "dom_defenderr6.odf" 3.793
    "dom_dreadnought_borr4.odf" 3.77
    "dom_dreadnought_borr5.odf" 3.77
    "dom_dreadnought_borr6.odf" 3.77
    "dom_dreadnought_fedr4.odf" 3.77
    "dom_dreadnought_fedr5.odf" 3.77
    "dom_dreadnought_fedr6.odf" 3.77
    "dom_dreadnought_klir4.odf" 3.77
    "dom_dreadnought_klir5.odf" 3.77
    "dom_dreadnought_klir6.odf" 3.77
    "dom_dreadnought_romr4.odf" 3.77
    "dom_dreadnought_romr5.odf" 3.77
    "dom_dreadnought_romr6.odf" 3.77
    "dom_intercept_destr4.odf" 5.346
    "dom_intercept_destr5.odf" 2.673
    "dom_intercept_destr6.odf" 2.673
    "dom_intercept_dest_romr4.odf" 5.346
    "dom_intercept_dest_romr5.odf" 2.673
    "dom_intercept_dest_romr6.odf" 2.673
    "fed_ambassadorMr6.odf" 5.67
    "fed_ambassadorr6.odf" 5.67
    "fed_intrepidYr6.odf" 5.67
    "fed_intrepidZr6.odf" 5.67
    "fed_intrepid_borr6.odf" 5.67
    "fed_mirandaii.odf" 5.905
    "fed_mirandaiiM.odf" 5.905
    "fed_mirandaiiMr2.odf" 5.905
    "fed_mirandaiiMr3.odf" 5.905
    "fed_mirandaiiMr4.odf" 5.905
    "fed_mirandaiiMr5.odf" 5.905
    "fed_mirandaiiMr6.odf" 5.905
    "fed_mirandaiir2.odf" 5.905
    "fed_mirandaiir3.odf" 5.905
    "fed_mirandaiir4.odf" 5.905
    "fed_mirandaiir5.odf" 5.905
    "fed_mirandaiir6.odf" 5.905
    "fed_monsoonr4.odf" 4.782
    "fed_monsoonr5.odf" 2.31
    "fed_monsoonr6.odf" 3.726
    "fed_monsoonZr4.odf" 4.782
    "fed_monsoonZr5.odf" 2.31
    "fed_monsoonZr6.odf" 3.726
    "fed_monsoon_bor.odf" 4.52
    "fed_monsoon_borr2.odf" 4.52
    "fed_monsoon_borr3.odf" 4.52
    "fed_monsoon_borr4.odf" 6.32
    "fed_monsoon_borr5.odf" 3.315
    "fed_monsoon_borr6.odf" 2.916
    "fed_monsoon_klir4.odf" 4.782
    "fed_monsoon_klir5.odf" 2.31
    "fed_monsoon_klir6.odf" 3.726
    "fed_sovereignr4.odf" 6.69
    "fed_sovereignr5.odf" 5.51
    "fed_sovereignr6.odf" 5.51
    "fed_sovereignYr4.odf" 6.69
    "fed_sovereignYr5.odf" 5.51
    "fed_sovereignYr6.odf" 5.51
    "kli_brelr4.odf" 7.452
    "kli_brelr5.odf" 6.69
    "kli_brelr6.odf" 6.69
    "kli_brelZr4.odf" 7.452
    "kli_brelZr5.odf" 6.69
    "kli_brelZr6.odf" 6.69
    "kli_brel_borr4.odf" 7.452
    "kli_brel_borr5.odf" 6.69
    "kli_brel_borr6.odf" 6.69
    "kli_brel_romr4.odf" 7.452
    "kli_brel_romr5.odf" 6.69
    "kli_brel_romr6.odf" 6.69
    "kli_edjenr4.odf" 7.452
    "kli_edjenr5.odf" 6.69
    "kli_edjenr6.odf" 6.69
    "kli_kvortr6.odf" 6.69
    "kli_kvortZr6.odf" 6.69
    "kli_kvort_fedr6.odf" 6.69
    "rom_generix_dreadr4.odf" 7.723
    "rom_generix_dreadr5.odf" 7.21
    "rom_generix_dreadr6.odf" 7.21
    "rom_generix_dreadYr4.odf" 7.723
    "rom_generix_dreadYr5.odf" 7.21
    "rom_generix_dreadYr6.odf" 7.21
    "rom_generix_dread_klir4.odf" 7.723
    "rom_generix_dread_klir5.odf" 7.21
    "rom_generix_dread_klir6.odf" 7.21

My theory is that because the whitespace preceding each non-header line is a tab (and it won't show up that way here), the regex doesn't match. In case you're wondering, the whitespace IS important.

Add RegexOptions.Multiline to the call to the Regex constructor. The same revision is in my answer. – Greg Bacon, Jun 23 '09 at 02:58
It ran fine for me on the first try, but I was able to reproduce the behavior you saw by adding a blank line before the header line. In regular expressions, ^ means "beginning of string," but with the Multiline option, it can match at the beginning of any line. Also note that \s matches whitespace, which includes spaces and tabs. – Greg Bacon, Jun 23 '09 at 03:05
Now it runs, but every number in the output is set to zero. I'll futz around with it a little to see if I can get it working... is there anything different that would cause this? – RCIX, Jun 23 '09 at 04:38
I figured out why. Somehow the multiplier is getting set to zero. Why is this? I don't see anything wrong with the input number I'm giving it... – RCIX, Jun 23 '09 at 04:45
I got it. It was a matter of fixing the input type. Thanks for the help! – RCIX, Jun 23 '09 at 04:55
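
The thread does not say which multiplier was typed, but one plausible reading is a fractional value such as 2.5. Int32.TryParse rejects that, returns false, and leaves the out parameter at 0, and the code above ignores the return value, so every product becomes 0. The Python sketch below only mimics that pattern; try_parse_int and try_parse_float are hypothetical helpers written for this example, not real library calls.

```python
def try_parse_int(text):
    """Mimic the shape of Int32.TryParse: (success, value), with 0 on failure."""
    try:
        return True, int(text)
    except ValueError:
        return False, 0


def try_parse_float(text):
    """Same idea for a floating-point multiplier."""
    try:
        return True, float(text)
    except ValueError:
        return False, 0.0


print(try_parse_int("2.5"))    # (False, 0)
print(try_parse_float("2.5"))  # (True, 2.5)
```

Checking the success flag before using the value would have surfaced the problem immediately instead of silently multiplying by zero.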
