
I have a directory of JSON files that I want to process using Cascalog. The solution I have right now requires me to remove all newline characters from my JSON files using a bash script. I am looking for a better solution because I sync these files using rsync.

My question is: can I read the contents of a file in Cascalog and return them as one tuple? At present, the function 'lfs-textline' returns one tuple per line in the file, which is why I have to remove the newline characters. Ideally, I want one tuple per file.

(defn textline-parsed [dir]
    (let [source (lfs-textline dir)]
        (<- [?line]
            (source ?line))))
john

1 Answer


Use hfs-wholefile from cascalog.more-taps to do this.

(:require [cascalog.more-taps :as taps])

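(defn- byte-writable-to-str
  "Convert a Hadoop BytesWritable to a string."
  [bw]
  ;; getBytes returns the backing buffer, which can be longer than the
  ;; valid data, so read only the first getLength bytes (UTF-8 assumed).
  [(String. (.getBytes bw) 0 (.getLength bw) "UTF-8")])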
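
The conversion can be checked at a REPL without running a query. This is only a sketch, assuming Hadoop is on the classpath (it normally is in a Cascalog project) and that the files are UTF-8. The helper name bytes-writable->str is hypothetical and used here for illustration only. It reads just the first getLength bytes, because getBytes returns the whole backing buffer, which may be longer than the valid data.

```clojure
(import 'org.apache.hadoop.io.BytesWritable)

;; Hypothetical helper for illustration; not part of Cascalog.
(defn bytes-writable->str [^BytesWritable bw]
  ;; Decode only the valid region of the backing buffer as UTF-8.
  (String. (.getBytes bw) 0 (.getLength bw) "UTF-8"))

;; Wrap the bytes of a small JSON document and convert them back.
(def sample-json "{\"a\": 1}")
(def round-tripped
  (bytes-writable->str (BytesWritable. (.getBytes sample-json "UTF-8"))))
```

Decoding the bytes in one step also avoids calling char on individual bytes, which fails for negative byte values such as those in multi-byte UTF-8 sequences.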

Then use it in a query:

(??<- [?str]
    ((taps/hfs-wholefile path) ?filename ?file-content)
    (byte-writable-to-str ?file-content :> ?str))
pavanred