40

So I come from a C background (originally, though I haven't used that language for almost 5 years) and I'm trying to parse some values from a string in Java. In C I would use sscanf. In Java people have told me "use Scanner, or StringTokenizer", but I can't see how to use them to achieve my purpose.

My input string looks like "17-MAR-11 15.52.25.000000000". In C I would do something like:

sscanf(thestring, "%d-%3s-%d %d.%d.%d.%d", &day, month, &year, &hour, &min, &sec, &fracpart);

But in Java, all I can do is things like:

scanner.nextInt();

This doesn't allow me to check the pattern, and for "MAR" I end up having to do things like:

str.substring(3,6);

Horrible! Surely there is a better way?

riffraff
Adam Burley
  • 1
    Is your problem actually parsing a datetime string? Then there could be better options, but you should be looking for `strptime` equivalents rather than scanf – riffraff Dec 08 '11 at 11:26
  • Have you tried using SimpleDateFormat? It has a parse method which throws ParseException on error (the overload taking a ParsePosition returns null instead). – rineez Dec 08 '11 at 12:51

9 Answers

41

The problem is that Java has no out parameters (or pass-by-reference) the way C or C# do.

But there is a better (and more robust) way. Use regular expressions:

Pattern p = Pattern.compile("(\\d+)-(\\p{Alpha}+)-(\\d+) (\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)");
Matcher m = p.matcher("17-MAR-11 15.52.25.000000000");
if (m.matches()) { // a match must be attempted before group() can be called
    day = m.group(1);
    month = m.group(2);
    ....
}

Of course the C code is more concise, but this technique has one advantage: a Pattern specifies the format more precisely than '%s' and '%d'. For example, you can use \d{2} to specify that the day MUST consist of exactly 2 digits.
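To make the idea concrete, here is a minimal sketch of the regex approach that also does the string-to-integer conversion. The class name `TimestampRegex`, its field names, and the choice of named groups are illustrative assumptions, not part of the answer above; the fixed widths (`\d{2}`, three letters for the month) are likewise an assumption about the input format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder class: parses one timestamp such as "17-MAR-11 15.52.25.000000000".
final class TimestampRegex {
    // Compiled once and reused. Named groups play the role of sscanf's output arguments.
    private static final Pattern TIMESTAMP = Pattern.compile(
        "(?<day>\\d{2})-(?<month>\\p{Alpha}{3})-(?<year>\\d{2}) "
        + "(?<hour>\\d{2})\\.(?<min>\\d{2})\\.(?<sec>\\d{2})\\.(?<frac>\\d{1,9})");

    final int day, year, hour, min, sec, frac;
    final String month;

    private TimestampRegex(Matcher m) {
        day = Integer.parseInt(m.group("day"));
        month = m.group("month");
        year = Integer.parseInt(m.group("year"));
        hour = Integer.parseInt(m.group("hour"));
        min = Integer.parseInt(m.group("min"));
        sec = Integer.parseInt(m.group("sec"));
        frac = Integer.parseInt(m.group("frac")); // at most 9 digits, so it fits in an int
    }

    static TimestampRegex parse(String input) {
        Matcher m = TIMESTAMP.matcher(input);
        // matches() requires the whole input to fit the pattern
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + input);
        }
        return new TimestampRegex(m);
    }
}
```

A call such as `TimestampRegex.parse("17-MAR-11 15.52.25.000000000").day` would then give the day as an `int`, and input that does not fit the pattern is rejected with an exception instead of being silently half-parsed.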

korifey
  • 1
    great...it requires me to do the string-integer conversion myself, but this seems like the best solution, and one I hadn't thought of. – Adam Burley Dec 08 '11 at 11:23
  • 1
    Yes, but note one thing here: invoke `Pattern.compile` only once (for example, store the result in a `static final Pattern` field), because compiling a pattern is a comparatively expensive operation – korifey Dec 08 '11 at 11:27
  • 6
    Note also that you could capture everything but the fractional seconds part as a single group and parse it into a Date using SimpleDateFormat("dd-MMM-yy HH.mm.ss"). – ewan.chalmers Dec 08 '11 at 12:05
  • 9
    C does have width specifiers. So you can say "%2d" to specify that at most two digits should be read. Just my $0.02 cents! :) – Kounavi Nov 14 '12 at 22:50
  • Curious: is there an option to do this on an input buffer without allocating intermediate objects? – peterk Apr 22 '13 at 23:25
  • 18
    I think you need to call `m.find()` before you call `m.group()`. – xuhdev Sep 26 '14 at 23:02
  • 1
    Indeed you should first call `m.find()` (as xuhdev mentioned above) or just use `if (m.matches()) {m.group(1); ...}` – Mitrakov Artem Jul 24 '17 at 12:20
29

Here is a solution using scanners:

Scanner scanner = new Scanner("17-MAR-11 15.52.25.000000000");

Scanner dayScanner = new Scanner(scanner.next());
Scanner timeScanner = new Scanner(scanner.next());

dayScanner.useDelimiter("-");
System.out.println("day=" + dayScanner.nextInt());
System.out.println("month=" + dayScanner.next());
System.out.println("year=" + dayScanner.nextInt());

timeScanner.useDelimiter("\\.");
System.out.println("hour=" + timeScanner.nextInt());
System.out.println("min=" + timeScanner.nextInt());
System.out.println("sec=" + timeScanner.nextInt());
System.out.println("fracpart=" + timeScanner.nextInt());
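A single Scanner can also be combined with a regular expression, which is probably the closest the standard library gets to sscanf. The sketch below uses `Scanner.findInLine` followed by `Scanner.match()`; the class name `ScannerMatch` and the method `fields` are made up for illustration. Note that `findInLine` ignores delimiters and searches anywhere in the current line, so it does not by itself verify that the whole line fits the pattern.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;

// Hypothetical helper: extracts the seven fields of the timestamp as strings.
final class ScannerMatch {
    static String[] fields(String input) {
        try (Scanner scanner = new Scanner(input)) {
            // findInLine returns null if the pattern does not occur in the current line
            String found = scanner.findInLine(
                "(\\d+)-(\\p{Alpha}+)-(\\d+) (\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)");
            if (found == null) {
                throw new IllegalArgumentException("Unexpected format: " + input);
            }
            // match() exposes the capture groups of the last successful scanning operation
            MatchResult result = scanner.match();
            String[] out = new String[result.groupCount()];
            for (int i = 1; i <= result.groupCount(); i++) {
                out[i - 1] = result.group(i);
            }
            return out;
        }
    }
}
```

The returned array holds day, month, year, hour, min, sec and the fractional part in that order; numeric fields still need `Integer.parseInt` if integers are wanted.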
dsboger
13

None of these examples were really satisfactory to me so I made my own java sscanf utility:

https://github.com/driedler/java-sscanf/tree/master/src/util/sscanf

Here's an example of parsing a hex string:

String buffer = "my hex string: DEADBEEF\n";
Object output[] = Sscanf.scan(buffer, "my hex string: %X\n", 1);

System.out.println("parse count: " + output.length);
System.out.println("hex str1: " + (Long)output[0]);

// Output:
// parse count: 1
// hex str1: 3735928559
driedler
  • I'm getting an exception while extracting city, state & zip: Invalid number format: 's' is not one of 'diuoxX'. e.g. `String buffer = "[\"WALTER PAYTON HIGH SCHOOL - CHICAGO, IL\",\"60622\"]"; Object output[] = Sscanf.scan(buffer, "[\"%s - %s, %s\",\"%d\"]", 1,2,3,4); System.out.println("parse count: " + output.length); System.out.println("data : " + output[0]+output[1]+output[2]+output[3]);` – MD. Mohiuddin Ahmed Sep 13 '14 at 08:57
3

For "17-MAR-11 15.52.25.000000000":

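String dateString = "17-MAR-11 15.52.25.000000000";
// Locale.ENGLISH so that "MAR" is recognised regardless of the default locale.
// Note: S means milliseconds, so the 9-digit fraction is only read correctly when it is all zeros.
SimpleDateFormat format = new SimpleDateFormat("dd-MMM-yy HH.mm.ss.SSS", Locale.ENGLISH);

try 
{
    Date parsed = format.parse(dateString);
    System.out.println(parsed.toString());
}
catch (ParseException pe)
{
    System.out.println("ERROR: Cannot parse \"" + dateString + "\"");
}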
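If Java 8 or later is available, `java.time` can read the nine-digit fraction as nanoseconds, which `SimpleDateFormat` cannot do because its `S` field is a millisecond count. The following is only a sketch: the class name `TimestampParser` is made up, and it assumes English month abbreviations and that a two-digit year such as `11` means 2011.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

// Hypothetical helper built on java.time (Java 8+).
final class TimestampParser {
    private static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()                         // accept "MAR" as well as "Mar"
        .appendPattern("dd-MMM-yy HH.mm.ss.SSSSSSSSS")  // nine S = nanosecond fraction
        .toFormatter(Locale.ENGLISH);

    static LocalDateTime parse(String text) {
        // Throws DateTimeParseException if the text does not fit the pattern
        return LocalDateTime.parse(text, FORMAT);
    }
}
```

The individual values are then available through getters such as `getDayOfMonth()`, `getHour()` and `getNano()`, so no manual string-to-integer conversion is needed.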
DefenestrationDay
  • I think this answer is too specific to date, while the OP asked for a way to parse values using a generic known pattern – magnum87 May 30 '18 at 08:54
2

This is far from as elegant a solution as one would get using regex, but it ought to work.

public static void stringStuffThing(){
    String x = "17-MAR-11 15.52.25.000000000";
    String y[] = x.split(" ");   // date part and time part

    for(String s : y){
        System.out.println(s);
    }
    String date[] = y[0].split("-");      // day, month, year
    String values[] = y[1].split("\\.");  // hour, min, sec, fraction

    for(String s : date){
        System.out.println(s);
    }
    for(String s : values){
        System.out.println(s);
    }
}
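The three separate splits can also be collapsed into one by splitting on a character class that lists all the separators. This is a sketch with a made-up class name `SplitFields`; like the code above, it does not validate the format, it only cuts the string apart.

```java
// Hypothetical helper: splits the timestamp on '-', '.' and ' ' in a single call.
final class SplitFields {
    static String[] split(String input) {
        // Inside [...] the '.' is literal, and a leading '-' is literal as well
        return input.split("[-. ]");
    }
}
```

For the example input this yields seven strings (day, month, year, hour, min, sec, fraction), which can be passed to `Integer.parseInt` where numbers are needed.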
Zavior
1

2019 answer: Java's Scanner is flexible for reading a wide range of formats. But if your format has simple {%d, %f, %s} fields then you can scan easily with this small class (~90 lines):

import java.util.ArrayList;

/**
 * Basic C-style string formatting and scanning.
 * The format strings can contain %d, %f and %s codes.
 * @author Adam Gawne-Cain
 */
public class CFormat {
    private static boolean accept(char t, char c, int i) {
        if (t == 'd')
            return "0123456789".indexOf(c) >= 0 || i == 0 && c == '-';
        else if (t == 'f')
            return "-0123456789.+Ee".indexOf(c) >= 0;
        else if (t == 's')
            return Character.isLetterOrDigit(c);
        throw new RuntimeException("Unknown format code: " + t);
    }

    /**
     * Returns string formatted like C, or throws exception if anything wrong.
     * @param fmt format specification
     * @param args values to format
     * @return string formatted like C.
     */
    public static String printf(String fmt, Object... args) {
        int a = 0;
        StringBuilder sb = new StringBuilder();
        int n = fmt.length();
        for (int i = 0; i < n; i++) {
            char c = fmt.charAt(i);
            if (c == '%') {
                char t = fmt.charAt(++i);
                if (t == 'd')
                    sb.append(((Number) args[a++]).intValue());
                else if (t == 'f')
                    sb.append(((Number) args[a++]).doubleValue());
                else if (t == 's')
                    sb.append(args[a++]);
                else if (t == '%')
                    sb.append(t);
                else
                    throw new RuntimeException("Unknown format code: " + t);
            } else
                sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Returns scanned values, or throws exception if anything wrong.
     * @param fmt format specification
     * @param str string to scan
     * @return scanned values
     */
    public static Object[] scanf(String fmt, String str) {
        ArrayList<Object> ans = new ArrayList<>();
        int s = 0;
        int ns = str.length();
        int n = fmt.length();
        for (int i = 0; i < n; i++) {
            char c = fmt.charAt(i);
            if (c == '%') {
                char t = fmt.charAt(++i);
                if (t=='%')
                    c=t;
                else {
                    int s0 = s;
                    while ((s == s0 || s < ns) && accept(t, str.charAt(s), s - s0))
                        s++;
                    String sub = str.substring(s0, s);
                    if (t == 'd')
                        ans.add(Integer.parseInt(sub));
                    else if (t == 'f')
                        ans.add(Double.parseDouble(sub));
                    else
                        ans.add(sub);
                    continue;
                }
            }
            if (s >= ns || str.charAt(s++) != c)
                throw new RuntimeException("Input does not match format literal '" + c + "'");
        }
        if (s < ns)
            throw new RuntimeException("Unmatched characters at end of string");
        return ans.toArray();
    }
}

For example, the OP's case can be handled like this:

    // Example of "CFormat.scanf"
    String str = "17-MAR-11 15.52.25.000000000";
    Object[] a = CFormat.scanf("%d-%s-%d %d.%d.%f", str);

    // Pick out scanned fields
    int day = (Integer) a[0];
    String month = (String) a[1];
    int year = (Integer) a[2];
    int hour = (Integer) a[3];
    int min = (Integer) a[4];
    double sec = (Double) a[5];

    // Example of "CFormat.printf"  
    System.out.println(CFormat.printf("Got day=%d month=%s year=%d hour=%d min=%d sec=%f\n", day, month, year, hour, min, sec));
Adam Gawne-Cain
0

Here is a simple implementation of sscanf using Scanner:

public static ArrayList<Object> scan(String s, String fmt)
{ ArrayList<Object> result = new ArrayList<Object>();
  Scanner scanner = new Scanner(s);

  int ind = 0; // s upto ind has been consumed

  for (int i = 0; i < fmt.length(); i++) 
  { char c = fmt.charAt(i); 
    if (c == '%' && i < fmt.length() - 1)
    { char d = fmt.charAt(i+1); 
      if (d == 's') 
      { scanner = new Scanner(s.substring(ind)); 
        try { 
          String v = scanner.next(); 
          ind = ind + v.length(); 
          result.add(v); 
        } 
        catch (Exception _ex) { 
          _ex.printStackTrace(); 
        }  
        i++; 
      }
      else if (d == 'f')
      { String fchars = ""; 
        for (int j = ind; j < s.length(); j++) 
        { char x = s.charAt(j); 
          if (x == '.' || Character.isDigit(x))
          { fchars = fchars + x; } 
          else 
          { break; } 
        } 

        try { 
          double v = Double.parseDouble(fchars); 
          ind = ind + fchars.length(); // advance by the characters actually consumed, e.g. 4 for "3.30"
          result.add(v); 
        } 
        catch (Exception _ex) { 
          _ex.printStackTrace(); 
        }  
        i++;  
      }
      else if (d == 'd') 
      { String inchars = ""; 
        for (int j = ind; j < s.length(); j++) 
        { char x = s.charAt(j); 
          if (Character.isDigit(x))
          { inchars = inchars + x; } 
          else 
          { break; } 
        } 
      
        try { 
          int v = Integer.parseInt(inchars); 
          ind = ind + inchars.length(); // advance by the characters actually consumed, e.g. 3 for "007"
          result.add(v); 
        } 
        catch (Exception _ex) { 
          _ex.printStackTrace(); 
        }  
        i++;  
      }
    } 
    else if (ind < s.length() && s.charAt(ind) == c) 
    { ind++; } 
    else 
    { return result; }

  } 
  return result; 
} 

public static void main(String[] args)
{ ArrayList<Object> res = StringLib.scan("100##3.3::20\n", "%d##%f::%d\n"); 
  System.out.println(res); 
}  
0

Are you familiar with the concept of regular expressions? Java provides you with the ability to use regex by using the Pattern class. Check this one out: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

You can test your String like this:

Matcher matcher = Pattern.compile(yourRegex).matcher(yourString);
matcher.find();

and then use the methods provided by Matcher to work with the text that was matched, if any.
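The choice between `find()` and `matches()` matters here: `matches()` succeeds only if the entire string fits the pattern, while `find()` looks for the next occurrence anywhere in the string. A small sketch of the difference follows; the class name `FindVersusMatches` and the time-only pattern are chosen purely for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of matches() versus find().
final class FindVersusMatches {
    private static final Pattern TIME = Pattern.compile("(\\d{2})\\.(\\d{2})\\.(\\d{2})");

    // True only when the whole string is a time such as "15.52.25"
    static boolean wholeStringIsTime(String s) {
        return TIME.matcher(s).matches();
    }

    // Returns the first embedded time, or null if there is none
    static String firstTime(String s) {
        Matcher m = TIME.matcher(s);
        return m.find() ? m.group() : null;
    }
}
```

For sscanf-style validation of a complete input, `matches()` is usually the safer choice; `find()` is more suitable for pulling a value out of surrounding text.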

rbrito
Nikola Yovchev
-3

System.in.read() is another option.

radbanr