I have already read a file from HDFS using FileSystem and need to count the number of records in that file. Can you help me count the records, given the code below?

val inputStream:FSDataInputStream = fileSystem.open(dataFile)

val data = IOUtils.toString(inputStream, "UTF-8")

inputStream.close()
Claire
Mahi
  • Possible duplicate of [count number of lines in file - Scala](https://stackoverflow.com/questions/8865551/count-number-of-lines-in-file-scala) – Arnaud Claudel Sep 07 '19 at 17:22

1 Answer

I am assuming that by record count you mean the count of lines.

You can use java.io.BufferedReader to read the input stream line by line, incrementing a counter variable for each line:

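import java.io.BufferedReader
import java.io.InputStreamReader
import org.apache.hadoop.fs.FSDataInputStream

var count = 0
val inputStream: FSDataInputStream = fileSystem.open(dataFile)
// Decode as UTF-8 explicitly instead of relying on the platform default charset
val reader: BufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"))
var line: String = reader.readLine()

// readLine() returns null once the end of the stream is reached
while (line != null) {
    count += 1
    line = reader.readLine()
}
reader.close() // also closes the underlying inputStream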

Alternatively, you can use reader.lines().count() to get the number of lines. Note that this consumes the stream, so you will not be able to read the same input stream again to get the actual lines.
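As a rough sketch of the reader.lines().count() variant: the helper name countLines is made up for illustration, and a ByteArrayInputStream stands in for the stream returned by fileSystem.open(dataFile), so the snippet runs without a Hadoop cluster. It assumes the file is UTF-8 text with one record per line.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

// Hypothetical helper (not part of Hadoop): counts the lines in any InputStream
// and closes the stream when done. FSDataInputStream is an InputStream, so the
// result of fileSystem.open(dataFile) could be passed in directly.
def countLines(in: InputStream): Long = {
  val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
  try reader.lines().count()
  finally reader.close()
}

// Stand-in for fileSystem.open(dataFile): three records, one per line
val sample = new ByteArrayInputStream("a\nb\nc\n".getBytes(StandardCharsets.UTF_8))
val count = countLines(sample)
```

After this call the stream is fully consumed and closed, so the file has to be opened again if the contents are needed as well.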

Kiran Maniya
  • If I use the while loop to count the records, will this line still work afterwards: val data = IOUtils.toString(inputStream, "UTF-8")? After counting the records I want to store the stream as a string, and then write this data variable to an output stream. – Mahi Sep 16 '19 at 17:53
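One possible way to get both the record count and the data from a single pass, sketched under assumptions: read the stream into a string once, then count the lines of that string instead of reading the stream a second time. The helper name readAndCount is hypothetical, scala.io.Source is used here in place of IOUtils.toString so the snippet has no extra dependencies, and a ByteArrayInputStream stands in for fileSystem.open(dataFile).

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import scala.io.Source

// Hypothetical helper: reads the whole stream once and returns the content
// together with its line count, then closes the stream.
def readAndCount(in: InputStream): (String, Int) = {
  try {
    val data = Source.fromInputStream(in, "UTF-8").mkString
    // Count lines from the in-memory string; the stream is not touched again
    (data, data.linesIterator.size)
  } finally in.close()
}

// Stand-in for fileSystem.open(dataFile)
val input = "id,name\n1,x\n2,y\n"
val (data, recordCount) =
  readAndCount(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)))
```

Like the original IOUtils.toString call, this holds the whole file in memory, so it is only reasonable for files that fit comfortably on the heap. The data value can then be written to the output stream as before.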