I have JSON records like:
{
"name":"someone",
"job":"doctor",
"etc":"etc"
}
Every JSON has a different value for "job", such as doctor, pilot, driver, watchman, etc. I want to separate each JSON based on the "job" value and store it in a different location, like /home/doctor, /home/pilot, /home/driver, etc.
I have tried the SplitStream function to do this, but I have to specify those values in the match condition:
public class MyFlinkJob {
    private static JsonParser jsonParser = new JsonParser();
    private static String key_1 = "doctor";
    private static String key_2 = "driver";
    private static String key_3 = "pilot";
    private static String key_default = "default";

    public static void main(String[] args) throws Exception {
        Properties prop = new Properties();
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", kafka);
        props.setProperty("group.id", "myjob");

        FlinkKafkaConsumer<String> myConsumer =
                new FlinkKafkaConsumer<>("topic", new SimpleStringSchema(), props);
        DataStream<String> record = env.addSource(myConsumer).rebalance();

        SplitStream<String> split = record.split(new OutputSelector<String>() {
            @Override
            public Iterable<String> select(String val) {
                JsonObject json = (JsonObject) jsonParser.parse(val);
                String jsonValue = CommonFields.getFieldValue(json, "job");
                List<String> output = new ArrayList<String>();
                if (key_1.equalsIgnoreCase(jsonValue)) {
                    output.add("doctor");
                } else if (key_2.equalsIgnoreCase(jsonValue)) {
                    output.add("driver");
                } else if (key_3.equalsIgnoreCase(jsonValue)) {
                    output.add("pilot");
                } else {
                    output.add("default");
                }
                return output;
            }
        });

        DataStream<String> doctor = split.select("doctor");
        DataStream<String> driver = split.select("driver");
        DataStream<String> pilot = split.select("pilot");
        DataStream<String> default1 = split.select("default");

        doctor.addSink(getBucketingSink(batchSize, prop, key_1));
        driver.addSink(getBucketingSink(batchSize, prop, key_2));
        pilot.addSink(getBucketingSink(batchSize, prop, key_3));
        default1.addSink(getBucketingSink(batchSize, prop, key_default));

        env.execute("myjob");
    }

    public static BucketingSink<String> getBucketingSink(Long batchSize, Properties prop, String key) {
        BucketingSink<String> sink = new BucketingSink<String>("hdfs://*/home/" + key);
        Configuration conf = new Configuration();
        conf.set("hadoop.job.ugi", "hdfs");
        sink.setFSConfig(conf);
        sink.setBucketer(new DateTimeBucketer<String>(prop.getProperty("DateTimeBucketer")));
        return sink;
    }
}
Suppose some other value, like "engineer", comes in "job" and I have not specified it in the class; it then goes to the default folder. Is there any way to split those JSON events automatically based on the value of "job", without specifying it, and create a path that contains the value's name, like /home/engineer?
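One direction I have been considering: instead of pre-splitting the stream, send every record through a single sink and derive the bucket path from the record itself (in Flink's BucketingSink this would mean a custom Bucketer whose getBucketPath returns new Path(basePath, job)). Below is a minimal, self-contained sketch of just the path-derivation part; JobBucketPath, the regex-based extraction, and the "default" fallback are all hypothetical names of mine, not Flink API, and a real job would reuse the JsonParser already on the classpath instead of a regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derive a bucket path from the "job" field of a JSON
// record, so no job values need to be hard-coded. In a Flink job this logic
// would sit inside a custom Bucketer<String> passed to sink.setBucketer(...).
public class JobBucketPath {
    // Naive extraction of the "job" value from a flat JSON string.
    private static final Pattern JOB = Pattern.compile("\"job\"\\s*:\\s*\"([^\"]+)\"");

    public static String extractJob(String json) {
        Matcher m = JOB.matcher(json);
        // Records without a "job" field fall back to a "default" bucket.
        return m.find() ? m.group(1) : "default";
    }

    public static String bucketPath(String basePath, String json) {
        return basePath + "/" + extractJob(json);
    }

    public static void main(String[] args) {
        String rec = "{\"name\":\"someone\",\"job\":\"engineer\"}";
        System.out.println(bucketPath("/home", rec)); // prints /home/engineer
    }
}
```

With this approach an unseen value like "engineer" would automatically land in /home/engineer without being listed anywhere, because the path is computed per record rather than per pre-declared stream.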