
I need a little help. I want to see each of the elements of the RDD (rddseparar). The idea is to count the words in a text after removing the special characters, and this is one of the steps to get there.

import re

fileName = "/databricks-datasets/cs100/lab1/data-001/shakespeare.txt"


rdd = sc.textFile(fileName)
# collect()[0] brings only the first line of the file back to the driver
separar = re.split(r"[^A-Za-z\s\d]", rdd.collect()[0])
# flatten the split fragments into individual words
separarPalabras = [word for frase in separar for word in frase.split()]
rddseparar = sc.parallelize(separarPalabras)

print(rddseparar.collect())

When I run the code, I should be able to see each of the elements of rddseparar, but I don't. The Spark execution output is just:

    (2) Spark Jobs
    ['1609']

Why can't I see the elements of rddseparar?

2 Answers


The output is correct, but it only returns one row: ['1609']. This is because you only pass in one row: rdd.collect()[0]. If you want to apply your regex to every row, you could loop through your collect output, or take a more Spark-native route using PySpark functions/UDFs.
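For example, a minimal sketch of the Spark-native route (reusing the fileName and the regex from the question; flatMap is just one way to do it) could look like this:

      import re

      # apply the regex to every line of the file, not just the first one
      rddseparar = (sc.textFile(fileName)
                    .flatMap(lambda line: [word
                                           for frase in re.split(r"[^A-Za-z\s\d]", line)
                                           for word in frase.split()]))
      print(rddseparar.take(20))  # take() avoids pulling the whole file back like collect()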

user2704177

You're not using Spark functionality to calculate the word count. You're just getting the nth row value from the RDD and passing it as an argument to another function.

So you're using the RDD as a plain data structure (an array or list, etc.).

Instead of doing it that way, you can use Spark transformations and actions to calculate the word count directly:

      val results = sc.textFile("/databricks-datasets/cs100/lab1/data-001/shakespeare.txt")
        .flatMap(line => line.split(";"))  // split each line into tokens
        .map(word => (word, 1))            // pair each word with a count of 1
        .reduceByKey(_ + _)                // sum the counts per word
        .collect()

I've used ";" as an example, but you can extend it here to add the list of characters you want to split on.
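Since the question is in PySpark, a rough Python equivalent of the snippet above (a sketch that substitutes the question's regex for the ";" placeholder) might be:

      import re

      results = (sc.textFile("/databricks-datasets/cs100/lab1/data-001/shakespeare.txt")
                 .flatMap(lambda line: re.split(r"[^A-Za-z\s\d]", line))  # drop special characters
                 .flatMap(lambda frase: frase.split())                    # break fragments into words
                 .map(lambda word: (word, 1))                             # pair each word with a count of 1
                 .reduceByKey(lambda a, b: a + b)                         # sum the counts per word
                 .collect())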

shalnarkftw