Highest Voted 'spark-koalas' Questions

1

vote

0 answers

Unclear why I'm getting a TypeError: str object is not callable

I have a Koalas / Pandas-on-Spark dataframe named df. When I try the expression below, I get a TypeError: str object is not callable: df[~(df.time.eq('00:00:00').groupby(df.vehicle_id).transform('sum')>=2)]. When I check the datatypes of both columns I…

asked Jun 22 '22 at 13:57

sampeterson

459
4
16

1

vote

0 answers

Rolling window working in pandas but not in koalas

I have a rolling window computation that works in pandas but not in koalas, and I am wondering why: import pandas as pd import databricks.koalas as ks Timestamp = pd.Timestamp df = pd.DataFrame([[Timestamp('2022-05-18 18:10:50.021831300'),…

pandas rolling-computation spark-koalas

asked Jun 17 '22 at 17:34

Lei

733
1
5
13

1

vote

1 answer

Pandas on Spark 3.2 - NLP.pipe - pd.Series.__iter__() is not implemented

I'm currently trying to migrate some processes from Python to (pandas on) Spark to measure performance. Everything went well until this point: df_info is of type pyspark.pandas; nlp is defined as: nlp = spacy.load('es_core_news_sm',…

python apache-spark pyspark spark-koalas pyspark-pandas

asked Mar 09 '22 at 21:01

Alejandro

519
1
6
32

1

vote

0 answers

How to find memory usage for a koalas dataframe

I am trying to do some memory profiling on an Azure Databricks job. This job uses a Python script that relies heavily on koalas dataframes for analysis. I want to analyze which dataframes or objects are taking up the most memory but koalas and…

python databricks spark-koalas

asked Mar 02 '22 at 22:32

MacMixer13

73
1
8

1

vote

2 answers

Join two dataframes on the values present in a specific column in the name_data dataframe using koalas

I am trying to join the two dataframes shown below on the code column values present in the name_data dataframe. I expect the resulting dataframe to have only the rows from the…

python pandas dataframe databricks spark-koalas

asked Feb 15 '22 at 17:51

Anna

181
1
12

1

vote

1 answer

Azure Databricks - reading tables with koalas

I am quite new to Databricks, and I am trying to do some basic data exploration with koalas. When I log into Databricks, under DATA I see 2 main tabs, DATABASE TABLES and DBFS. I managed to read csv files as koalas dataframes…

sql azure databricks azure-databricks spark-koalas

asked Feb 07 '22 at 15:49

Brigi Szabo

11
1

1

vote

1 answer

PandasNotImplementedError for converted pandas dataframe to Koalas dataframe

I am facing a small issue in my code logic. I am converting a line of code that uses a pandas dataframe to use a Koalas dataframe, and I get the following error during execution. # Error Message PandasNotImplementedError: The…

python pandas dataframe databricks spark-koalas

asked Feb 04 '22 at 17:04

Anna

181
1
12

1

vote

1 answer

pySpark dataframe transformations performance

I recently started working with PySpark (before that I worked with pandas). I want to understand how Spark executes and optimizes transformations on a dataframe. Can I apply transformations one by one, reusing a single dataframe variable? #creating…

apache-spark pyspark apache-spark-sql spark-koalas

asked Dec 31 '21 at 14:04

Ando23

11
1

1

vote

1 answer

Understanding the jars in pyspark

I'm new to Spark and my understanding is this: jars are like bundles of Java code files. Each library that I install that internally uses Spark (or PySpark) has its own jar files that need to be available to both the driver and the executors in order for…

apache-spark pyspark spark-koalas

asked Dec 09 '21 at 10:22

figs_and_nuts

4,870
2
31
56

1

vote

2 answers

min() function doesn't work on koalas.DataFrame columns of date types

I created the following dataframe: import pandas as pd import databricks.koalas as ks df = ks.DataFrame( {'Date1': pd.date_range('20211101', '20211110', freq='1D'), 'Date2': pd.date_range('20201101', '20201110',…

pyspark spark-koalas

asked Nov 29 '21 at 13:55

Eran

844
6
20

1

vote

1 answer

How to use UDFs with pandas on pyspark groupby?

I am struggling to use pandas UDFs on pandas on pyspark. Can you please help me understand how this is to be achieved? Below is my attempt: import pyspark from pyspark.sql import SparkSession from pyspark.sql.functions import pandas_udf from pyspark…

apache-spark pyspark apache-spark-sql spark-koalas

asked Oct 27 '21 at 02:05

figs_and_nuts

4,870
2
31
56

1

vote

0 answers

Does plotting with Koalas using TopN have any statistical meaning?

I was going through the source code of Koalas, trying to get a handle on how they actually achieve plotting large datasets. It turns out that they use either sampling or TopN - selecting a given number of records. I understand the meaning of…

pandas pyspark statistics data-visualization spark-koalas

asked Jun 03 '21 at 10:53

jiawei hu

41
4

1

vote

1 answer

Adding a new column to an existing Koalas Dataframe results in NaN's

I am trying to add a new column to my existing Koalas dataframe. But the values turn into NaN's as soon as the new column is added. I am not sure what's going on here, could anyone give me some pointers? Here's the code: import databricks.koalas as…

python pandas apache-spark pyspark spark-koalas

asked May 23 '21 at 17:35

ShellZero

4,415
12
38
56

1

vote

1 answer

Set NOT NULL columns in koalas to_table

When I create a Delta table I can set some columns to be NOT NULL: CREATE TABLE [db_name.]table_name [(col_name1 col_type1 [NOT NULL], ...)] USING DELTA. Is there any way to set non-null columns with koalas.to_table?

pyspark databricks delta-lake spark-koalas

asked Apr 19 '21 at 15:48

kismsu

1,049
7
22

1

vote

1 answer

How to create a new column using 2 or more conditions in Koalas

I created the column "Turno" in df3 using 3 conditions to classify rows into "Turno_PM", "Turno_AM" or "N/A", but I want to know if there is an easier way to reach the same result, like a for loop with if/elif/else or something like that. Here…

pandas dataframe azure-databricks spark-koalas

asked Feb 10 '21 at 03:41

Francisco Leiva Díaz

33
5
