I'm learning Clojure and working on a simple file parsing script.

I have a file in the form of:

pattern1
pattern2
pattern3
pattern1
pattern2
...

where each line has a few values (numbers) that I extract.

If I were to write this in Java, for example, I would do something like:

Map<String, Integer> parse(String line) {
    //using Optional in this toy example, but could be an empty map or something else to
    //signal if first one was actually matched and the values are there
    Optional<Map<String, Integer>> firstMatched = matchFirst(line);
    if (firstMatched.isPresent()) {
        return firstMatched.get();
    }
    //...do the same for 2 remaining patterns
    //...
}

Now what would be an elegant or idiomatic way to do something similar in Clojure?

I guess I can use cond, but since the test expression can't bind its result, I'd have to run each regex twice:

(defn parse
  [line]
  (cond
    (re-find #"pattern-1-regex" line) (re-find...)
    (re-find #"pattern-2-regex" line) (re-find...

I could also use if-let, but that means a lot of nesting even with 3 options. Imagine how that would look with 7 different patterns.

Any suggestions? Obviously the Java solution is imperative and I can return whenever I want, so what would be the Clojure/FP way of dealing with this simple branching?

newman555p

2 Answers

1

I would go with a simple function that returns the first successful match by running some over the seq of patterns:

(defn first-match [patterns]
  (fn [line]
    (some #(re-find % line) patterns)))

This returns a function that tests a line against each pattern in order and returns the first match:

user> (def mat (first-match [#"(asd)" #"(fgh)" #"aaa(.+?)aaa"]))
#'user/mat

user> (mat "aaaxxxaaa")
;;=> ["aaaxxxaaa" "xxx"]

user> (mat "nomatch")
;;=> nil
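
If you also need to know which pattern matched (so you can extract the right values afterwards), one possible variation is to pair each regex with a label and let some return both. This is only a sketch: the name first-labelled-match and the :kind / :match keys are made up for illustration and are not part of the answer above.

```clojure
(defn first-labelled-match
  "Hypothetical helper: takes a seq of [label regex] pairs and returns a
  function of a line. That function returns {:kind label :match result}
  for the first regex that matches, or nil if none does."
  [labelled-patterns]
  (fn [line]
    (some (fn [[label pattern]]
            (when-let [m (re-find pattern line)]
              {:kind label :match m}))
          labelled-patterns)))

;; A vector of pairs (rather than a map) keeps the patterns in a fixed order.
(def labelled-mat
  (first-labelled-match [[:asd #"(asd)"]
                         [:fgh #"(fgh)"]
                         [:aaa #"aaa(.+?)aaa"]]))

(labelled-mat "aaaxxxaaa")
;;=> {:kind :aaa, :match ["aaaxxxaaa" "xxx"]}

(labelled-mat "nomatch")
;;=> nil
```

The caller can then dispatch on :kind (for example with case) to turn the matched groups into numbers.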

Otherwise, you could use a simple macro for that, maybe like this:

(defmacro when-cond [& conds]
  (when (seq conds)
    `(if-let [x# ~(first conds)]
       x#
       (when-cond ~@(rest conds)))))

user> 
(let [line "somethingaaa"]
  (when-cond
    (re-find #"something" line)
    (re-find #"abc(.*?)def" line)))
;;=> "something"

For the preceding example, that would expand to something like this (schematically):

(if-let [x__8043__auto__ (re-find #"something" line)]
  x__8043__auto__
  (if-let [x__8044__auto__ (re-find #"abc(.*?)def" line)]
    x__8044__auto__
    nil))

More examples:

user> 
(let [line "nomatch"]
  (when-cond
    (re-find #"something" line)
    (re-find #"abc(.*?)def" line)))
;;=> nil

user> 
(let [line "abcxxxxxxdef"]
  (when-cond
    (re-find #"something" line)
    (re-find #"abc(.*?)def" line)))
;;=> ["abcxxxxxxdef" "xxxxxx"]
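
For this particular use, the built-in or macro already behaves the same way: it evaluates its arguments left to right and returns the first logically true value, or the value of the last expression (nil here, since re-find returns nil on no match). A minimal sketch of the same examples without a custom macro, where parse-with-or is just an illustrative name:

```clojure
(defn parse-with-or
  "Illustrative only: returns the first successful re-find, or nil."
  [line]
  (or (re-find #"something" line)
      (re-find #"abc(.*?)def" line)))

(parse-with-or "somethingaaa")  ;;=> "something"
(parse-with-or "abcxxxxxxdef")  ;;=> ["abcxxxxxxdef" "xxxxxx"]
(parse-with-or "nomatch")       ;;=> nil
```

Writing the macro by hand is still a useful exercise for seeing how such short-circuiting forms expand into nested if-let.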
leetwinski
  • Nice! The function returning the first match is a really simple way to do this, I feel silly that I didn't think of that :) The one with the macro looks neat too; I'm still learning, so I'll need to dissect it a bit more, i.e. understand macros properly. Thanks a lot! – newman555p Apr 26 '20 at 16:21
1

Given some sample data:

(ns tst.demo.core
  (:use demo.core tupelo.core tupelo.test)
  (:require
    [clojure.string :as str]
    [tupelo.string :as ts]
    [tupelo.parse :as parse]))

(def data-str "
  fred123 1 2 3
  fred456   4 5    6
  wilma12  1.2
  wilma34 3.4
  barney1 1
  barney2 2
  ")

You can then define parse functions for each type of data:

(defn fred-parser
  [line]
  (let [tokens        (str/split line #"\p{Blank}+")
        root          (first tokens)
        details       (rest tokens)
        parsed-root   (re-find #"fred" root) ; name only, without the trailing digits
        parsed-params (mapv parse/parse-long details)
        result        {:root parsed-root :params parsed-params}]
    result))

(defn wilma-parser
  [line]
  (let [tokens        (str/split line #"\p{Blank}+")
        root          (first tokens)
        details       (rest tokens)
        parsed-root   (re-find #"wilma" root) ; name only, without the trailing digits
        parsed-params (mapv parse/parse-double details)
        result        {:root parsed-root :params parsed-params}]
    result))

I would make a map from pattern to parse function:

(def pattern->parser
  {#"fred\d*"  fred-parser
   #"wilma\d*" wilma-parser
   })

and some functions to find the right parser for each line of (cleaned) data:

(defn parse-line
  [line]
  (let [patterns          (keys pattern->parser)
        patterns-matching (filterv ; keep pattern if matches
                            (fn [pat]
                              (ts/contains-match? line pat))
                            patterns)
        num-matches       (count patterns-matching)]
    (cond
      (< 1 num-matches) (throw (ex-info "Too many matching patterns!" {:line line :num-matches num-matches}))
      (zero? num-matches) (prn :no-match-found line)
      :else (let [parser      (get pattern->parser (only patterns-matching))
                  parsed-line (parser line)]
              parsed-line))))

(defn parse-file
  [data]
  (let
    [lines       (filterv #(not (str/blank? %)) ; remove blank lines
                   (mapv str/trim ; remove leading/trailing whitespace
                     (str/split-lines data))) ; split into lines
     parsed-data (mapv parse-line lines)]
    parsed-data))

and a unit test to show it in action:

(dotest
  (is= (parse-file data-str)
    [{:root "fred", :params [1 2 3]}
     {:root "fred", :params [4 5 6]}
     {:root "wilma", :params [1.2]}
     {:root "wilma", :params [3.4]}
     nil
     nil])
  )

Note that unmatched lines return nil. You'll want to either throw an exception for them or at least filter out the nil values. Right now you just get a printed error message:

-------------------------------
   Clojure 1.10.1    Java 14
-------------------------------

Testing tst.demo.core
:no-match-found "barney1 1"
:no-match-found "barney2 2"

Ran 2 tests containing 1 assertions.
0 failures, 0 errors.
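
One way to drop the unmatched lines, sketched here with core functions only: keep works like map but discards nil results. The function parse-line-or-nil below is a simplified, made-up stand-in for parse-line so that the snippet runs without Tupelo; it only recognises fred lines.

```clojure
(require '[clojure.string :as str])

(defn parse-line-or-nil
  "Simplified stand-in for parse-line: returns a map for fred lines
  and nil for anything else."
  [line]
  (let [[root & details] (str/split line #"\s+")]
    (when (re-find #"^fred\d*$" root)
      {:root "fred" :params (mapv #(Long/parseLong %) details)})))

(defn parse-lines-dropping-nils
  "Hypothetical variant of parse-file that silently skips unmatched lines."
  [data]
  (->> (str/split-lines data)
       (map str/trim)              ; remove leading/trailing whitespace
       (remove str/blank?)         ; remove blank lines
       (keep parse-line-or-nil)    ; like map, but skips nil results
       vec))

(parse-lines-dropping-nils "
  fred123 1 2 3
  barney1 1
  fred456   4 5    6
  ")
;;=> [{:root "fred", :params [1 2 3]} {:root "fred", :params [4 5 6]}]
```

Whether to skip unmatched lines like this or to throw (as parse-line already does for multiple matches) depends on how strict the file format is meant to be.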

More documentation here and here.

Alan Thompson
  • Awesome! I've learned a lot from both answers. Even though I'm totally new to Clojure, it's very easy to read this (I guess when you're looking at clean code, that's usually the case with other languages as well). Thanks Alan! – newman555p Apr 26 '20 at 19:01