
I have a JavaRDD object and want to create a new JavaRDD object by taking a substring of each element of the original one. How can I achieve that?

    // Read input_train data
    logger.info("start to read file");
    JavaRDD<String> inputDataRaw = sc.textFile(input_train);

Here, inputDataRaw.first() returns something like "apple1; apple2;" (call it String s1).

I want a JavaRDD in which each line consists of "apple1" only, i.e.:

    String s2 = s1.substring(0, 6);
HappyCoding

3 Answers

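    // Requires: import org.apache.spark.api.java.function.Function;
    JavaRDD<String> inputDataRaw = sc.textFile(input_train);
    // map() applies the function to every line and returns a new RDD
    JavaRDD<String> firstSix = inputDataRaw.map(new Function<String, String>() {
        @Override
        public String call(String arg0) throws Exception {
            return arg0.substring(0, 6);
        }
    });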
None

Below is the simple option. I included the newer JDK8 lambda syntax as well as the older JDK6 compatible syntax:

    JavaRDD<String> inputDataRaw = sc.textFile("file.txt");

    JavaRDD<String> mapped_jdk8 = inputDataRaw.map(s -> s.substring(0, 6));

    JavaRDD<String> mapped_jdk6 = inputDataRaw.map(new Function<String, String>() {
        @Override
        public String call(String s) throws Exception {
            return s.substring(0, 6);
        }
    });
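
One caveat with substring(0, 6): it throws StringIndexOutOfBoundsException for any line shorter than six characters. The following is a minimal sketch of a guarded version, written in plain Java so it runs without Spark. The class name SafePrefix and the helper prefix are hypothetical names chosen for illustration; on an RDD the same helper could presumably be used as inputDataRaw.map(s -> SafePrefix.prefix(s, 6)).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SafePrefix {
    // Hypothetical helper: returns the first n characters of s,
    // or all of s when it is shorter than n.
    static String prefix(String s, int n) {
        return s.substring(0, Math.min(n, s.length()));
    }

    public static void main(String[] args) {
        // A plain list stands in for the RDD so the sketch runs locally.
        List<String> lines = Arrays.asList("apple1; apple2;", "kiwi", "");
        List<String> prefixes = lines.stream()
                .map(s -> prefix(s, 6))
                .collect(Collectors.toList());
        System.out.println(prefixes);
    }
}
```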
clay
  • Can you help with this? https://stackoverflow.com/questions/55332897/how-to-add-new-column-to-datasetrow-using-map-function-on-the-dataset – Pyd Mar 25 '19 at 12:07

I don't think substring is a good way to grab the first item from a line.

substring(0, 6) only works when the first item has a fixed length of six characters.

Instead, split the line on ; (semicolon) and take the element at index 0:

    JavaRDD<String> inputDataRaw = sc.textFile("file.txt");

    JavaRDD<String> mapped_jdk8 = inputDataRaw.map(s -> s.split(";")).map(r -> r[0]);

Note that Java indexes arrays with r[0]; r(0) is Scala syntax. I mostly work in Scala and have not tried this lambda in Java.
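
A small sketch of the split approach as a standalone helper, in plain Java so it can be checked without a Spark cluster. The names FirstField and firstField are hypothetical. It passes a limit of 2 to split because a plain split(";") returns an empty array for a line such as ";", in which case indexing element 0 would throw. The trim() call is an added assumption that surrounding whitespace is unwanted.

```java
class FirstField {
    // Hypothetical helper: returns the text before the first ';',
    // or the whole line when it contains no ';'.
    static String firstField(String line) {
        // A positive limit keeps empty strings, so the array always has
        // at least one element and index 0 is safe.
        String[] parts = line.split(";", 2);
        return parts[0].trim();
    }

    public static void main(String[] args) {
        System.out.println(firstField("apple1; apple2;"));
        System.out.println(firstField("banana42; kiwi;"));
    }
}
```

Since the helper is a static method that takes and returns a String, it should also be usable on the RDD as inputDataRaw.map(FirstField::firstField).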

hnahak