I have a file with many types of data record which I need to parse into structs.

I'd be grateful to learn of an idiomatic way -- if it exists -- of filling structs by record type. Something like Python's namedtuple(*fields) constructor.

package main

import (
    "fmt"
    "strconv"
    "strings"
)

type X interface{}

type HDR struct {
    typer, a string
    b        int
}

type BDY struct {
    typer, c string
    d        int
    e        string
}

var lines string = `HDR~two~5
BDY~four~6~five`

func sn(s string) int {
    i, _ := strconv.Atoi(s)
    return i
}

func main() {
    sl := strings.Split(lines, "\n")
    for _, l := range sl {
        fields := strings.Split(l, "~")
        var r X
        switch fields[0] {
        case "HDR":
            r = HDR{fields[0], fields[1], sn(fields[2])} // 1
        case "BDY":
            r = BDY{fields[0], fields[1], sn(fields[2]), fields[3]} // 2
        }
        fmt.Printf("%T : %v\n", r, r)
    }
}

I'm specifically interested to learn whether the lines marked // 1 and // 2 can be conveniently replaced by generic code, perhaps some sort of decoder which allows the struct itself to handle type conversion.

  • Using reflection will make your code much harder to read. The reflect package is for rare use cases, mostly to implement things like the csv library that you pointed to. Struct tags are read with the reflect package in that case. You do not want to use it in reality though; why would you complicate things like that? As I said, your code is easy to understand: break up the lines, split at `~` and parse the thing. Straightforward, easy to understand, easy to change, and it has no dependencies. The problem with the csv library is that it does not make the code shorter or better to read. ... – gonutz Oct 22 '19 at 08:20
  • ... Looking at the csv library I see a zillion .go files, adding this complexity for such an easy problem will not benefit you. – gonutz Oct 22 '19 at 08:21
  • Thanks for your comments, @gonutz. I'm not sure what style you recommend (you seem to use single char variable names yourself!). Also you haven't qualified why you believe reflection is a bad idea. As previously mentioned in my edited comment, it would be good to know whether decoding (eg using github.com/jszwec/csvutil) is a better route? It's also not clear why you are proscribing complexity per se -- or perhaps that is just a "note to self"? In which case, thanks for sharing. Otherwise, I think you just hit a "nul points" for your comment -- on the Wogan scale! – rorycl Oct 22 '19 at 10:50
  • I don't know what that means. Please read my other comments for the qualification of why reflection is not the way to go. – gonutz Oct 22 '19 at 13:27
  • Trying to rewrite your code into an answer I realize that I do not understand what the code is supposed to do. What is the purpose of having these structs in the same interface{} form? Do you want to put them into an array? Why not treat them separately anyway? You seem to have a header and a body type. Why would you even want to treat them as the same thing anyway? – gonutz Oct 22 '19 at 13:45
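
For reference, the straightforward non-reflection style that the comments above argue for might look roughly like the sketch below: one small constructor per record type, with explicit error handling. The parseHDR and parseBDY names are illustrative only; they build on the HDR and BDY types from the question and are not part of the original post.

// parseHDR builds an HDR from a ~-split record, reporting bad input.
func parseHDR(fields []string) (HDR, error) {
    if len(fields) != 3 {
        return HDR{}, fmt.Errorf("HDR: want 3 fields, got %d", len(fields))
    }
    b, err := strconv.Atoi(fields[2])
    if err != nil {
        return HDR{}, fmt.Errorf("HDR: bad int %q: %v", fields[2], err)
    }
    return HDR{typer: fields[0], a: fields[1], b: b}, nil
}

// parseBDY builds a BDY from a ~-split record, reporting bad input.
func parseBDY(fields []string) (BDY, error) {
    if len(fields) != 4 {
        return BDY{}, fmt.Errorf("BDY: want 4 fields, got %d", len(fields))
    }
    d, err := strconv.Atoi(fields[2])
    if err != nil {
        return BDY{}, fmt.Errorf("BDY: bad int %q: %v", fields[2], err)
    }
    return BDY{typer: fields[0], c: fields[1], d: d, e: fields[3]}, nil
}

The switch in main would then call parseHDR or parseBDY instead of building the literals marked // 1 and // 2 directly.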

1 Answer

Use the reflect package to programmatically set fields.

A field must be exported to be set by the reflect package. Export the names by uppercasing the first rune in the name:

type HDR struct {
    Typer, A string
    B        int
}

type BDY struct {
    Typer, C string
    D        int
    E        string
}

Create a map of names to the type associated with the name:

var types = map[string]reflect.Type{
    "HDR": reflect.TypeOf((*HDR)(nil)).Elem(),
    "BDY": reflect.TypeOf((*BDY)(nil)).Elem(),
}

For each line, create a value of the type using the types map:

for _, l := range strings.Split(lines, "\n") {
    fields := strings.Split(l, "~")
    t := types[fields[0]]
    v := reflect.New(t).Elem()
    ...
}

Loop over the fields in the line. Get the field value, convert the string to the kind of the field value and set the field value:

    for i, f := range fields {
        fv := v.Field(i)
        switch fv.Type().Kind() {
        case reflect.String:
            fv.SetString(f)
        case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
            n, _ := strconv.ParseInt(f, 10, fv.Type().Bits())
            fv.SetInt(n)
        }
    }

This is a basic outline of the approach. Error handling is notably missing: the application will panic if the type name is not one of the types in the types map; it ignores the error returned from parsing the integer; it will panic if there are more fields in the data than the struct has; it does not report an error when it encounters an unsupported field kind; and more. (One possible way to add this error handling is sketched after the Playground link below.)

Run it on the Go Playground.
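
For illustration, here is a minimal sketch of how the loop above might be factored into a function that reports these errors instead of panicking or ignoring them. The decode name and the error messages are mine, not part of the original answer; the function assumes the types map shown earlier.

// decode creates and fills a struct value for the record named by fields[0].
// It returns an error for unknown types, bad integers, extra fields and
// unsupported field kinds rather than panicking.
func decode(fields []string) (interface{}, error) {
    t, ok := types[fields[0]]
    if !ok {
        return nil, fmt.Errorf("unknown record type %q", fields[0])
    }
    v := reflect.New(t).Elem()
    if len(fields) > v.NumField() {
        return nil, fmt.Errorf("%s: %d fields but struct has only %d", fields[0], len(fields), v.NumField())
    }
    for i, f := range fields {
        fv := v.Field(i)
        switch fv.Kind() {
        case reflect.String:
            fv.SetString(f)
        case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
            n, err := strconv.ParseInt(f, 10, fv.Type().Bits())
            if err != nil {
                return nil, fmt.Errorf("%s field %d: %v", fields[0], i, err)
            }
            fv.SetInt(n)
        default:
            return nil, fmt.Errorf("%s field %d: unsupported kind %s", fields[0], i, fv.Kind())
        }
    }
    return v.Interface(), nil
}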

  • Do NOT use the reflect package! Program it in a readable manner and you will be thankful for it in the future. – gonutz Oct 22 '19 at 06:53
  • This is a terrible approach to take. It is way too complicated. I program in Go and the original question had a solution in Go. Why do I want to program in this strange reflection style which I have no idea how to use? All this boilerplate code is completely unnecessary to solve the problem. I do not know if this covers all the cases, Int8, Int16 etc. Then what about the Uints? This is craziness. – gonutz Oct 22 '19 at 08:27
  • Your code is longer, harder to read and uses a very uncommon package. I have to know a lot of details that are irrelevant to the actual problem. – gonutz Oct 22 '19 at 08:34
  • @gonutz SO is not the right venue to push your personal opinions. The question was about how to fill structs programmatically and this answer answers that. It uses the same approach as many other packages in the standard library, most notably `encoding/json`, which is used by many without complaining about "strange reflection style". Whether this approach is the correct one for the OP's problem is for them to decide. – mkopriva Oct 22 '19 at 08:53
  • @mkopriva SO is for aiding programmers. I am helping make code better by giving advice based on my experience. Even though my comments might seem ranty, they are justified by solid arguments. – gonutz Oct 22 '19 at 09:51
  • @mkopriva Concerning your comment about the json package, yes this is a package widely used. USED, not implemented. It is in the standard library and was written and tested very thoroughly by people with intricate knowledge about the reflect package. One of Rob Pike's proverbs is "Reflection is never clear." See https://go-proverbs.github.io/ ... – gonutz Oct 22 '19 at 09:53
  • ... and it is really not. Using the json package is good, it makes code nicer. Writing reflection-based code is not, it makes things harder to read. – gonutz Oct 22 '19 at 09:54
  • @gonutz please share with us your answer to the question without using reflection. – colm.anseo Oct 22 '19 at 12:22
  • Cerise: Thanks for the very clear answer and elegant code. Do you believe this type of solution is worth turning into a custom decoder package? Am I right in thinking that deriving the source of decoding errors can be quite tricky through reflection, so putting it into a specific decoder package with lots of tests might be worthwhile? – rorycl Oct 22 '19 at 13:55
  • @rorycl do you know all the types up front? Because if you do not, then your solution simply won't work and reflection is probably the only thing that can help you here (unless you favour code generation, which is also an option). If, however, you know all the possible types up front and there aren't too many of them and they are easy to maintain, then using your approach is probably the wiser choice here. Keep in mind that *how many* is "too many" and *what is and is not* "easy to maintain" will depend on the individual who has to work with that code. – mkopriva Oct 22 '19 at 15:21
  • @rorycl If you have a small number of types with a small number of fields, then the complexity of the reflection approach does not carry its weight. If reflection is warranted, then the decoding logic should be moved to a separate function and tests should be written for that function. The decision to create a package for that function and tests depends on context that you have not shared here and may be opinion based. – Charlie Tumahai Oct 22 '19 at 15:24
  • @rorycl If you have control over the data format, then you should at least consider using [CSV](https://en.wikipedia.org/wiki/Comma-separated_values). The standard [encoding/csv](https://godoc.org/encoding/csv) package parses CSV and there are third party packages built around the encoding/csv package. – Charlie Tumahai Oct 22 '19 at 15:29
  • @CeriseLimón: thanks for the great advice. I'll have a play with some of the encoding/csv packages such as https://github.com/jszwec/csvutil. As I have many different structs and many fields of many types in each, I'll certainly move the structs and "filler" logic to a well-tested package. Thanks again. – rorycl Oct 22 '19 at 20:50
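
Following up on the encoding/csv suggestion in the comments above, here is a minimal sketch of reading the ~-delimited sample with the standard library reader. Setting Comma to '~' and FieldsPerRecord to -1 is my assumption about how the data might be wired up; it is not part of the original discussion.

package main

import (
    "encoding/csv"
    "fmt"
    "strings"
)

func main() {
    // Read the sample records with encoding/csv, using '~' as the delimiter.
    // FieldsPerRecord = -1 allows a different field count per record type.
    r := csv.NewReader(strings.NewReader("HDR~two~5\nBDY~four~6~five"))
    r.Comma = '~'
    r.FieldsPerRecord = -1
    records, err := r.ReadAll()
    if err != nil {
        fmt.Println("read error:", err)
        return
    }
    for _, fields := range records {
        // fields[0] names the record type; from here the fields can be
        // dispatched to a per-type parser or a reflection-based decoder.
        fmt.Println(fields)
    }
}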