63

I've been using Java interop to parse numbers, e.g.

(. Integer parseInt  numberString)

Is there a more clojuriffic way that would handle both integers and floats, and return Clojure numbers? I'm not especially worried about performance here; I just want to process a bunch of whitespace-delimited numbers in a file and do something with them, in the most straightforward way possible.

So a file might have lines like:

5  10  0.0002
4  12  0.003

And I'd like to be able to transform the lines into vectors of numbers.

Rob Lachlan
  • 22
    As a side note, the way you called Java in your post is the unsugared form. Please prefer the sugared form when calling Java: `(Integer/parseInt number-string)` for static methods, and `(.method obj args)` for instance methods. – Rayne Apr 14 '10 at 21:02
  • 3
    Adding on top of Rayne's "sweet" suggestion, you can also use `(Integer. number-string)` to parse a string into a java.lang.Integer (and similarly for Long, Double, etc.). – ɲeuroburɳ Aug 27 '13 at 20:07
  • 1
    It would seem that the word "easiest" in the title attracted some answers that are easy and unsafe. Please, unless you want to get hacked, use a number parser that can tolerate malicious strings. – David J. Jan 03 '15 at 00:00

10 Answers

69

You can use the edn reader to parse numbers. This has the benefit of giving you floats or Bignums when needed, too.

user> (require '[clojure.edn :as edn])
nil
user> (edn/read-string "0.002")
0.0020

If you want one huge vector of numbers, you could cheat and do this:

user> (let [input "5  10  0.002\n4  12  0.003"]
        (read-string (str "[" input "]")))
[5 10 0.0020 4 12 0.0030]

Kind of hacky though. Or there's re-seq:

user> (let [input "5  10  0.002\n4  12  0.003"]
        (map read-string (re-seq #"[\d.]+" input)))
(5 10 0.0020 4 12 0.0030)

Or one vector per line:

user> (let [input "5  10  0.002\n4  12  0.003"]
        (for [line (line-seq (java.io.BufferedReader.
                              (java.io.StringReader. input)))]
             (vec (map read-string (re-seq #"[\d.]+" line)))))
([5 10 0.0020] [4 12 0.0030])

I'm sure there are other ways.
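One such way, as a minimal sketch: split each line on whitespace and hand each token to edn/read-string. The helper names parse-line and parse-lines are made up for illustration, and the code assumes every token really is a number, since edn/read-string will just as happily return a symbol or keyword for anything else.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

(defn parse-line
  "Hypothetical helper: reads every whitespace-separated token on a line as EDN."
  [line]
  (mapv edn/read-string (str/split (str/trim line) #"\s+")))

(defn parse-lines
  "Hypothetical helper: returns one vector of numbers per line of text."
  [text]
  (mapv parse-line (str/split-lines text)))

(parse-lines "5  10  0.0002\n4  12  0.003")
;=> [[5 10 2.0E-4] [4 12 0.003]]
```

Unlike the re-seq versions, this does not silently skip tokens that fail to match a regex, so stray text in the file shows up in the result instead of disappearing.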

Inaimathi
Brian Carper
  • 1
    this method has the useful benefit of properly parsing rational numbers as well. if you modify the re-seq properly (re-seq #"[\d\/\.]+" input) – Jeremy Wall Jun 24 '10 at 03:14
  • Beware of `read-string`: `(class (read-string "0.5"))` returns `java.lang.Double`, but `(class (read-string ".5"))` returns `clojure.lang.Symbol`. – alexguev Jul 15 '12 at 03:30
  • 8
    In modern times, we have the EDN reader, which would be the right tool for this job. – Charles Duffy Dec 11 '13 at 16:29
  • Re: "If you're very sure that your file contains only numbers". Whether you are sure (or not) matters not! This is parsing; deal with reality instead of making assumptions. When you parse strings, design your program to tolerate the worst! Go with a safe parsing option instead. – David J. Jan 02 '15 at 23:55
  • 22
    This is a horrible practice. `read-string` can *execute* code. This fact cannot be emphasized enough. For examples and a good explanation of how bad it can be, even with `*read-eval*` bound to false, see: https://clojuredocs.org/clojure.core/read – ɲeuroburɳ Jun 17 '15 at 14:26
  • 8
    There's also [clojure.edn/read-string](https://clojuredocs.org/clojure.edn/read-string) which only parses edn format, and won't execute code. – ktsujister Mar 21 '16 at 22:48
  • 2
    Just another "user beware" on this, numbers with a leading zero are treated as octal by the clojure reader, as with many programming langs. – bfabry Nov 04 '16 at 00:59
26

If you want to be safer, you can use Float/parseFloat

user=> (map #(Float/parseFloat (% 0)) (re-seq #"\d+(\.\d+)?" "1 2.2 3.5"))
(1.0 2.2 3.5)
Miki Tebeka
26

Not sure if this is "the easiest way", but I thought it was kind of fun, so... With a reflection hack, you can access just the number-reading part of Clojure's Reader:

(let [m (.getDeclaredMethod clojure.lang.LispReader
                            "matchNumber"
                            (into-array [String]))]
  (.setAccessible m true)
  (defn parse-number [s]
    (.invoke m clojure.lang.LispReader (into-array [s]))))

Then use like so:

user> (parse-number "123")
123
user> (parse-number "123.5")
123.5
user> (parse-number "123/2")
123/2
user> (class (parse-number "123"))
java.lang.Integer
user> (class (parse-number "123.5"))
java.lang.Double
user> (class (parse-number "123/2"))
clojure.lang.Ratio
user> (class (parse-number "123123451451245"))
java.lang.Long
user> (class (parse-number "123123451451245123514236146"))
java.math.BigInteger
user> (parse-number "0x12312345145124")
5120577133367588
user> (parse-number "12312345142as36146") ; note the "as" in the middle
nil

Notice how this does not throw the usual NumberFormatException if something goes wrong; you could add a check for nil and throw it yourself if you want.
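The nil check could look something like the sketch below. parse-or-throw and digits->long are hypothetical names introduced here; digits->long is only a stand-in so the example runs on its own, and any parse function that returns nil on failure could be passed instead.

```clojure
(defn parse-or-throw
  "Hypothetical wrapper: calls parse on s and throws instead of returning nil."
  [parse s]
  (if-some [n (parse s)]
    n
    (throw (NumberFormatException. (str "Invalid number: " s)))))

;; Stand-in parser for illustration only: returns nil unless s is all digits.
(defn digits->long [s]
  (when (re-matches #"\d+" s)
    (Long/parseLong s)))

(parse-or-throw digits->long "123")
;=> 123

(try
  (parse-or-throw digits->long "12as3")
  (catch NumberFormatException e
    (.getMessage e)))
;=> "Invalid number: 12as3"
```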

As for performance, let's have an unscientific microbenchmark (both functions have been "warmed up"; initial runs were slower as usual):

user> (time (dotimes [_ 10000] (parse-number "1234123512435")))
"Elapsed time: 564.58196 msecs"
nil
user> (time (dotimes [_ 10000] (read-string "1234123512435")))
"Elapsed time: 561.425967 msecs"
nil

The obvious disclaimer: clojure.lang.LispReader.matchNumber is a private static method of clojure.lang.LispReader and may be changed or removed at any time.

Michał Marczyk
21

In my opinion, the best and safest approach is one that works for any number and fails (here, by returning nil) when the input isn't a number:

(defn parse-number
  "Reads a number from a string. Returns nil if not a number."
  [s]
  (if (re-find #"^-?\d+\.?\d*$" s)
    (read-string s)))

e.g.

(parse-number "43") ;=> 43
(parse-number "72.02") ;=> 72.02
(parse-number "009.0008") ;=> 9.0008
(parse-number "-92837482734982347.00789") ;=> -9.2837482734982352E16
(parse-number "89blah") ;=> nil
(parse-number "z29") ;=> nil
(parse-number "(exploit-me)") ;=> nil

Works for ints, floats/doubles, bignums, etc. If you wanted to add support for reading other notations, simply augment the regex.

solussd
17

Brian Carper's suggested approach (using read-string) works nicely, but only until you try and parse zero-padded numbers like "010". Observe:

user=> (read-string "010")
8
user=> (read-string "090")
java.lang.RuntimeException: java.lang.NumberFormatException: Invalid number: 090 (NO_SOURCE_FILE:0)

This is because Clojure tries to parse "090" as an octal literal, and 9 is not a valid octal digit!
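If the input is known to be zero-padded decimal, one option (a sketch, not the only fix) is to skip the reader and call the Java parsers directly, since they always read base 10 unless told otherwise:

```clojure
;; Java's parsers ignore the leading zero instead of switching to octal.
(Long/parseLong "010")       ;=> 10
(Long/parseLong "090")       ;=> 90
(Double/parseDouble "090.5") ;=> 90.5
```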

Stathis Sideris
15

Brian Carper's answer is almost correct. Instead of using read-string directly from Clojure's core, use clojure.edn/read-string. It is safe and it will parse anything that you throw at it.

(ns edn-example.core
  (:require [clojure.edn :as edn]))

(edn/read-string "2.7") ; double 2.7
(edn/read-string "2")   ; long 2

Simple, easy, and safe from code execution ;)

carocad
9

Use bigint and bigdec

(bigint "1")
(bigint "010") ; returns 10N as expected
(bigint "111111111111111111111111111111111111111111111111111")
(bigdec "11111.000000000000000000000000000000000000000000001")

Clojure's bigint uses primitives when possible, and it avoids regexps, the problem with octal literals, and the limited range of the other numeric types, which is what causes (Integer. "10000000000") to fail.

(This last thing happened to me and it was quite confusing: I wrapped it into a parse-int function, and afterwards just assumed that parse-int meant "parse a natural integer" not "parse a 32bit integer")
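To make the range difference concrete, here is a small sketch. parse-int-32 is a hypothetical name standing in for the kind of wrapper described above; it returns nil rather than letting the NumberFormatException escape, and note that it also returns nil for input that isn't numeric at all.

```clojure
(defn parse-int-32
  "Hypothetical helper: returns nil when s cannot be parsed as a 32-bit int."
  [s]
  (try
    (Integer/parseInt s)
    (catch NumberFormatException _ nil)))

(parse-int-32 "10")             ;=> 10
(parse-int-32 "10000000000")    ;=> nil, too large for 32 bits
(bigint "10000000000")          ;=> 10000000000N
(class (bigint "10000000000"))  ;=> clojure.lang.BigInt
```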

Community
berdario
9

These are the two best and correct approaches:

Using Java interop:

(Long/parseLong "333")
(Float/parseFloat "333.33")
(Double/parseDouble "333.3333333333332")
(Integer/parseInt "-333")
(Integer/parseUnsignedInt "333")
(BigInteger. "3333333333333333333333333332")
(BigDecimal. "3.3333333333333333333333333332")
(Short/parseShort "400")
(Byte/parseByte "120")

This lets you precisely control the type you want to parse the number into, when that matters to your use case.

Using the Clojure EDN reader:

(require '[clojure.edn :as edn])
(edn/read-string "333")

Unlike using read-string from clojure.core which isn't safe to use on untrusted input, edn/read-string is safe to run on untrusted input such as user input.

This is often more convenient than the Java interop if you don't need specific control of the types. It can parse any number literal that Clojure can parse, such as:

;; Ratios
(edn/read-string "22/7")
;; Hexadecimal
(edn/read-string "0xff")

Full list here: https://www.rubberducking.com/2019/05/clojure-for-non-clojure-programmers.html#numbers

Didier A.
  • This should be the accepted answer. Much more useful, and it advocates fewer bad practices than the top answer. – Jeremy D Jul 10 '20 at 19:25
  • To update my answer, with Clojure 1.11, there are now functions in core for some of these: https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/parse-boolean – Didier A. Apr 23 '22 at 01:59
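As a brief sketch of those core functions, assuming Clojure 1.11 or later: parse-long, parse-double and parse-boolean take a string and return nil, rather than throwing, when the string is not in the expected format.

```clojure
;; Requires Clojure 1.11 or later.
(parse-long "333")       ;=> 333
(parse-double "333.33")  ;=> 333.33
(parse-long "333.33")    ;=> nil, not an integer
(parse-double "abc")     ;=> nil
(parse-boolean "true")   ;=> true
```

They do not guess the type the way the EDN reader does, so for mixed integer and floating-point input you would pick one (for example parse-double for everything) or try parse-long first and fall back to parse-double.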
7

I find that solussd's answer works great for my code. Based on it, here's an enhancement with support for scientific notation. In addition, (.trim s) is added so that extra whitespace is tolerated.

(defn parse-number
  "Reads a number from a string. Returns nil if not a number."
  [s]
  (if (re-find #"^-?\d+\.?\d*([Ee]\+\d+|[Ee]-\d+|[Ee]\d+)?$" (.trim s))
    (read-string s)))

e.g.

(parse-number "  4.841192E-002  ")    ;=> 0.04841192
(parse-number "  4.841192e2 ")    ;=> 484.1192
(parse-number "  4.841192E+003 ")    ;=> 4841.192
(parse-number "  4.841192e.2 ")  ;=> nil
(parse-number "  4.841192E ")  ;=> nil
Kevin Zhu
1
(def mystring "5")
(Float/parseFloat mystring)
Varun J.P
  • Please edit your answer and add some context by explaining how it solves the problem in question, instead of posting a code-only answer. – Pedram Parsian Dec 26 '19 at 11:49