
Is there a mechanism to apply a standard set of checks to detect and then transform a String to the detected type, using one of Jackson's standard text related libs (csv, json, or even jackson-core)? I can imagine using it along with a label associated with that value (CSV header, for example) to do something sorta like the following:

JavaTypeAndValue typeAndValue = StringToJavaType.fromValue(Object x, String label);  
typeAndValue.type() // FQN of Java type, maybe
typeAndValue.label() // where label might be a column header value, for example
typeAndValue.value() // returns Object  of typeAndValue.type()

A set of 'extractors' would be required to apply the transform. The consumer of the class would have to be aware of the 'ambiguity' of the 'Object' return type, but would still be able to consume and use the information, given its purpose.

The example I'm currently thinking about involves constructing SQL DDL or DML, like a CREATE Table statement using the information from a List derived from evaluating a row from a csv file.
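
As a rough sketch of that use case, assuming a type has already been inferred for each column, a CREATE TABLE statement could be assembled along these lines. The class name DdlSketch, the sqlType mapping, and the SQL type names are all assumptions made for illustration, not part of any library.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DdlSketch {

    // Hypothetical mapping from an inferred Java type to a SQL column type.
    static String sqlType(Class<?> type) {
        if (type == Integer.class) return "INT";
        if (type == Long.class) return "BIGINT";
        if (type == Double.class || type == Float.class) return "DOUBLE";
        if (type == BigDecimal.class) return "DECIMAL";
        if (type == Boolean.class) return "BOOLEAN";
        return "TEXT"; // fallback for String and anything unrecognized
    }

    // Builds the statement from column labels (e.g. CSV headers) and their inferred types.
    static String createTable(String table, Map<String, Class<?>> columns) {
        StringJoiner cols = new StringJoiner(", ", "CREATE TABLE " + table + " (", ")");
        columns.forEach((name, type) -> cols.add(name + " " + sqlType(type)));
        return cols.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in header order.
        Map<String, Class<?>> columns = new LinkedHashMap<>();
        columns.put("id", Long.class);
        columns.put("active", Boolean.class);
        columns.put("name", String.class);
        System.out.println(createTable("sample", columns));
        // prints: CREATE TABLE sample (id BIGINT, active BOOLEAN, name TEXT)
    }
}
```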

After more digging, hoping to find something out there, I wrote the start of what I had in mind.

Please keep in mind that my intention here isn't to present something 'complete', as I'm sure there are several things missing here, edge cases not addressed, etc.

The parse(List<Map<String, String>> rows, List<String> headers) signature comes from the idea that the input could be a sample of rows from a CSV file read in with Jackson, for example.

Again, this isn't complete, so I'm not looking to pick at everything that's wrong with the following. The question isn't 'how would we write this?', it's 'is anyone familiar with something that exists that does something like the following?'.

import gms.labs.cassandra.sandbox.extractors.Extractor;
import gms.labs.cassandra.sandbox.extractors.Extractors;
import lombok.Builder;
import lombok.Getter;
import lombok.Setter;
import lombok.experimental.Accessors;

@Accessors(fluent=true, chain=true)
public class TypeAndValue
{

    @Builder
    TypeAndValue(Class<?> type, String rawValue){
        this.type = type;
        this.rawValue = rawValue;
        this.label = DEFAULT_LABEL;
    }

    @Getter
    final Class<?> type;

    @Getter
    final String rawValue;

    @Setter
    @Getter
    String label;

    public Object value(){
        return Extractors.extractorFor(this).value(rawValue);
    }

    static final String DEFAULT_LABEL = "NONE";

}

A simple parser. The parse method assumes a List<Map<String,String>> of rows, as produced by a CSVReader.

import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.math.NumberUtils;

import java.util.*;
import java.util.function.BiFunction;

public class JavaTypeParser
{
public static final List<TypeAndValue> parse(List<Map<String, String>> rows, List<String> headers)
{
    List<TypeAndValue> typesAndVals = new ArrayList<TypeAndValue>();
    for (Map<String, String> row : rows) {
        for (String header : headers) {
            String val = row.get(header);
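            // Checks run in priority order: null, boolean, number, then String as the catch-all.
            // orElseGet keeps the later checks lazy, so isBoolean is never called with a null value.
            TypeAndValue typeAndValue =
                    isNull(val).orElseGet(() -> isBoolean(val).orElseGet(() -> isNumber(val).orElseGet(
                            () -> _typeAndValue.apply(String.class, val).get())));
            typesAndVals.add(typeAndValue.label(header));
        }
    }
    return typesAndVals;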
}

public static Optional<TypeAndValue> isNumber(String val)
{
    if (!NumberUtils.isCreatable(val)) {
        return Optional.empty();
    } else {
        return _typeAndValue.apply(NumberUtils.createNumber(val).getClass(), val);
    }
}

public static Optional<TypeAndValue> isBoolean(String val)
{
    boolean bool = (val.equalsIgnoreCase("true") || val.equalsIgnoreCase("false"));
    if (bool) {
        return _typeAndValue.apply(Boolean.class, val);
    } else {
        return Optional.empty();
    }
}

public static Optional<TypeAndValue> isNull(String val){
    if(Objects.isNull(val) || val.equals("null")){
        return _typeAndValue.apply(ObjectUtils.Null.class,val);
    }
    else{
        return Optional.empty();
    }
}

static final BiFunction<Class<?>, String, Optional<TypeAndValue>> _typeAndValue = (type, value) -> Optional.of(
        TypeAndValue.builder().type(type).rawValue(value).build());

}

Extractors. Just an example of how the 'extractors' for the values (contained in strings) might be registered somewhere for lookup. They could be referenced any number of other ways, too.

import gms.labs.cassandra.sandbox.TypeAndValue;
import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.math.NumberUtils;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

public class Extractors
{

private static final List<Class> NUMS = Arrays.asList(
        BigInteger.class,
        BigDecimal.class,
        Long.class,
        Integer.class,
        Double.class,
        Float.class);

public static final Extractor<?> extractorFor(TypeAndValue typeAndValue)
{
    if (NUMS.contains(typeAndValue.type())) {
        return (Extractor<Number>) value -> NumberUtils.createNumber(value);
    } else if(typeAndValue.type().equals(Boolean.class)) {
        return  (Extractor<Boolean>) value -> Boolean.valueOf(value);
    } else if(typeAndValue.type().equals(ObjectUtils.Null.class)) {
        return  (Extractor<ObjectUtils.Null>) value -> null; // or should we just return the raw value? Some frameworks coerce to null.
    } else if(typeAndValue.type().equals(String.class)) {
        return  (Extractor<String>) value -> typeAndValue.rawValue(); // a String needs no conversion, so return the raw value
    }
    else{
        throw new RuntimeException("unsupported");
    }
}
}
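
The Extractor interface imported above is not shown. A minimal sketch, assuming it is a single-method functional interface whose value method converts the raw String (which is what the lambdas and the call Extractors.extractorFor(this).value(rawValue) imply). The ExtractorDemo wrapper class is only there to make the sketch runnable.

```java
import java.math.BigInteger;

public class ExtractorDemo {

    // Assumed shape of the Extractor type: one abstract method, so lambdas can implement it.
    @FunctionalInterface
    interface Extractor<T> {
        T value(String rawValue);
    }

    public static void main(String[] args) {
        // Method references work as extractors because the interface has a single method.
        Extractor<Boolean> bool = Boolean::valueOf;
        Extractor<BigInteger> big = BigInteger::new;

        System.out.println(bool.value("FaLse")); // prints: false
        System.out.println(big.value("123").getClass().getSimpleName()); // prints: BigInteger
    }
}
```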

I ran this from within the JavaTypeParser class, for reference.

public static void main(String[] args)
{

    Optional<TypeAndValue> num = isNumber("-1230980980980980980980980980980988009808989080989809890808098292");
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass());  // BigInteger
    });
    num = isNumber("-123098098097987");
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass()); // Long
    });
    num = isNumber("-123098.098097987"); // Double
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass());
    });
    num = isNumber("-123009809890898.0980979098098908080987"); // BigDecimal
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass());
    });

    Optional<TypeAndValue> bool = isBoolean("FaLse");
    bool.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass()); // Boolean
    });

    Optional<TypeAndValue> nulll = isNull("null");
    nulll.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        //System.out.println(typeAndVal.value().getClass());  would throw null pointer exception
        System.out.println(typeAndVal.type()); // ObjectUtils.Null (from apache commons lang3)
    });

}
Gary Sharpe
  • I'm not sure I clearly understand what you want to do. – Giorgi Tsiklauri Sep 22 '20 at 18:35
  • Hi @GiorgiTsiklauri. I was looking for something that would do relatively simple type inference (e.g. int, String, boolean, double, etc.). The 'label' parameter is just a means of associating that value's key, but this could just look like StringToJavaType.fromValue(Object x); I imagine there has to be something tucked inside of Jackson to do this, but I haven't found it yet. – Gary Sharpe Sep 22 '20 at 18:57
  • Type inference is not done by the programmer; it's an intrinsic mechanism of the JVM. If I'm not mistaken, you are running into a compile-time constraint and trying to circumvent it somehow, which is not possible, and should not be. – Giorgi Tsiklauri Sep 22 '20 at 19:00
  • I think there are plenty of use cases where we would like to look at a String and infer its 'type'. An 'extractor' of sorts could be used to attempt a transform on the String, based on the needs of the moment, or somewhat generally acceptable rules. I'm not trying to circumvent the compiler. String -> transform -> 'something else'. The JavaTypeAndValue I had in mind is just meant to store the information about what type was 'inferred', and the expected 'value' based on whatever rules are applied by the 'transformer/extractor'. – Gary Sharpe Sep 22 '20 at 19:12
  • Apache Commons, for example, has `StringUtils.isNumeric(s)` and `StringUtils.isAlpha(s)`. I imagine there is a collection of these tucked somewhere that applies a series of 'checks' to determine an 'optimal' type for the given string. – Gary Sharpe Sep 22 '20 at 19:14
  • I think we mean different things by type inference. Have a look at [this](https://docs.oracle.com/javase/tutorial/java/generics/genTypeInference.html#:~:text=Type%20inference%20is%20a%20Java,that%20make%20the%20invocation%20applicable.), which states: "*Type inference is a Java compiler's ability to look at each method invocation and corresponding declaration to determine the type argument.*" – Giorgi Tsiklauri Sep 22 '20 at 19:20
  • Yeah, I see what you mean, and you're right. I'm referring to the process as 'Type inference', but in a more general, conceptual way. I see how that creates confusion, though. – Gary Sharpe Sep 22 '20 at 19:40
  • If you could edit your question and provide a minimal reproducible example, or an example of *how* you want it to be working, maybe I can try to help a bit better. – Giorgi Tsiklauri Sep 22 '20 at 19:42
  • Are you aware of `instanceof`? I still can't clearly understand your question, as you haven't provided the clear code.. but if you want to accept a String, and then cast it to other type, after some if-checks, then `instanceof` can be useful for you. But in most cases, it will throw ClassCastException. Your question is still unclear. Please make it as simple as possible. – Giorgi Tsiklauri Sep 22 '20 at 19:56
  • What has Jackson to do with your point? provide that as an example as well. Please, read how to create [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). Write exemplifying code instead of the user-story. – Giorgi Tsiklauri Sep 22 '20 at 19:59
  • Yes, I realize that a series of checks could be done for any given value. I'm familiar with a lot of ways this could be done, but not looking to write out all of the rules that might be used to detect and then transform a String myself, hence why I ask if someone is aware of something that already exists to do all of these checks and contains corresponding transform logic. It might be easier for me to look for and reference examples of where I've seen similar things done in other libraries. – Gary Sharpe Sep 22 '20 at 20:02
  • @GarySharpe just did an update of the response, if it can help – rascio Sep 27 '20 at 15:25

3 Answers

3

I don't know of any library that does this, and I have never seen anything work this way on an open set of possible types.

For a closed set of types (you know all the possible output types), the easiest way would be to have the class FQN, or an alias for it, written in the string itself (from your description I couldn't tell whether you control how the string is written).

Otherwise I think there is no way around writing all the checks yourself.

Furthermore, it will be delicate, because of edge cases like the following.

Suppose the strings hold JSON-serialized values: how would you differentiate between a plain String value like Hello World and a Date written in some ISO format (e.g. 2020-09-22)? To do it you need to give the checks a priority: first test whether the value is a date (with a regex, say), then fall through to the next check, with the plain string check last.
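
A minimal sketch of such a prioritized check, assuming only ISO-8601 dates need to be recognized. It uses java.time.LocalDate.parse rather than a regex, and the PriorityCheck class and detect method are hypothetical names for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class PriorityCheck {

    // Hypothetical helper: returns a LocalDate when the text is an ISO-8601 date,
    // otherwise falls back to the raw String.
    static Object detect(String raw) {
        try {
            return LocalDate.parse(raw); // higher priority: ISO date such as 2020-09-22
        } catch (DateTimeParseException notADate) {
            return raw; // lowest priority: treat the value as plain text
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("2020-09-22").getClass().getSimpleName());  // prints: LocalDate
        System.out.println(detect("Hello World").getClass().getSimpleName()); // prints: String
    }
}
```

If the order were reversed and the String check ran first, every value would match it and the date check would never be reached.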

What if you have two objects:

class Person {
   String name;
   String surname;
}

class Employee {
   String name;
   String surname;
   Integer salary;
}

And you receive a serialized value of the second type, but with a null salary (either an explicit null or the property missing completely). It is then indistinguishable from a value of the first type.

How can you tell the difference between a set and a list?

I don't know whether what you intend is that dynamic, or whether you already know all the possible deserializable types; some more details in the question would help.

UPDATE

Just saw the code, and now it is clearer. If you know all the possible output types, that is the way to go.
The only change I would make is to abstract the extraction process, so that adding new types is easier.
A small change is enough, something like:

interface Extractor {
    Boolean match(String value);
    Object extract(String value);
}

Then you can define an extractor per type:

class NumberExtractor implements Extractor {
    public Boolean match(String val) {
        return NumberUtils.isCreatable(val);
    }
    public Object extract(String value) {
        return NumberUtils.createNumber(value);
    }
}
class StringExtractor implements Extractor {
    public Boolean match(String s) {
        return true; //<-- catch all
    }
    public Object extract(String value) {
        return value;
    }
}

And then register them and automate the checks:

public class JavaTypeParser {
  // Order matters: the first matching extractor wins, so the catch-all StringExtractor goes last.
  private static final List<Extractor> EXTRACTORS = List.of(
      new NullExtractor(),
      new BooleanExtractor(),
      new NumberExtractor(),
      new StringExtractor()
  );

  public static final List<TypeAndValue> parse(List<Map<String, String>> rows, List<String> headers) {
    List<TypeAndValue> typesAndVals = new ArrayList<TypeAndValue>();
    for (Map<String, String> row : rows) {
        for (String header : headers) {
            String val = row.get(header);

            typesAndVals.add(extract(header, val));
        }
    }
    return typesAndVals;
  }

  public static final TypeAndValue extract(String header, String value) {
       for (Extractor e : EXTRACTORS) {
           if (e.match(value)) {
               Object v = e.extract(value);
               return TypeAndValue.builder()
                         .label(header)
                         .value(v) //<-- you can put the real value here, and remove the type field
                         .build();
           }
       }
       throw new IllegalStateException("Can't find an extractor for: " + header + " | " + value);
  }
}
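
The NullExtractor and BooleanExtractor registered above are not shown. A minimal sketch, assuming they apply the same rules as the isNull and isBoolean checks in the question. The Extractor interface and StringExtractor are repeated, and everything is wrapped in a hypothetical ExtractorSketch class, only so that the sketch compiles on its own.

```java
import java.util.List;

public class ExtractorSketch {

    // Same shape as the Extractor interface above, repeated so the sketch is self-contained.
    interface Extractor {
        Boolean match(String value);
        Object extract(String value);
    }

    // Assumed rule (from the question's isNull): a missing value or the literal "null".
    static class NullExtractor implements Extractor {
        public Boolean match(String value) {
            return value == null || value.equals("null");
        }
        public Object extract(String value) {
            return null;
        }
    }

    // Assumed rule (from the question's isBoolean): "true" or "false", ignoring case.
    static class BooleanExtractor implements Extractor {
        public Boolean match(String value) {
            return "true".equalsIgnoreCase(value) || "false".equalsIgnoreCase(value);
        }
        public Object extract(String value) {
            return Boolean.valueOf(value);
        }
    }

    static class StringExtractor implements Extractor {
        public Boolean match(String value) {
            return true; // catch-all
        }
        public Object extract(String value) {
            return value;
        }
    }

    // NullExtractor must come first so the later extractors never see a null value.
    static final List<Extractor> EXTRACTORS =
            List.of(new NullExtractor(), new BooleanExtractor(), new StringExtractor());

    static Object extract(String value) {
        for (Extractor e : EXTRACTORS) {
            if (e.match(value)) {
                return e.extract(value);
            }
        }
        throw new IllegalStateException("Can't find an extractor for: " + value);
    }

    public static void main(String[] args) {
        System.out.println(extract("FaLse")); // prints: false
        System.out.println(extract("null"));  // prints: null
        System.out.println(extract("hello")); // prints: hello
    }
}
```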

To parse CSV I would suggest https://commons.apache.org/proper/commons-csv, as CSV parsing can run into nasty issues.

rascio
2

What you are actually trying to do is write a parser. You translate a fragment into a parse tree. The parse tree captures the type as well as the value. For hierarchical types like arrays and objects, each tree node contains child nodes.

One of the most commonly used parser generators (albeit a bit overkill for your use case) is ANTLR, for which a ready-made JSON grammar is available.

I recommend taking the time to digest all the concepts involved. Even though it might seem overkill initially, it quickly pays off when you need any kind of extension. Changing a grammar is relatively easy, whereas the generated code is quite complex. Additionally, parser generators check your grammar and report logic errors.

Of course, if you are limiting yourself to just parsing CSV or JSON (and not both at the same time), you should rather use the parser of an existing library. For example, Jackson has ObjectMapper.readTree to get the parse tree. You could also use ObjectMapper.readValue(<fragment>, Object.class) to simply get the canonical Java classes.

Arvid Heise
0

Try this:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.Iterator;
import java.util.Map;

String j = // json string;

JsonFactory jsonFactory = new JsonFactory();
ObjectMapper jsonMapper = new ObjectMapper(jsonFactory);
JsonNode jsonRootNode = jsonMapper.readTree(j);
// fields() iterates over the name/value pairs of a JSON object node
Iterator<Map.Entry<String,JsonNode>> jsonIterator = jsonRootNode.fields();

while (jsonIterator.hasNext()) {
    Map.Entry<String,JsonNode> jsonField = jsonIterator.next();
    String k = jsonField.getKey();
    String v = jsonField.getValue().toString();
    // jsonField.getValue().getNodeType() reports the detected JSON type (NUMBER, BOOLEAN, STRING, NULL, ...)
    ...

}
oat