Highest Voted 'spark-koalas' Questions

1

vote

0 answers

"SparkException: Job aborted" when Koalas writes to Azure blob storage

I am using Koalas (pandas API on Apache Spark) to write a dataframe out to a mounted Azure blob storage. When calling the df.to_csv API, Spark throws an exception and aborts the job. Only a few of the stages seem to fail with the following…

asked Oct 23 '19 at 13:27

bramb

213
2
14

0

votes

0 answers

Why does ps.merge() give a different result from pd.merge()?

I have a dataframe billplan which is a pyspark.pandas dataframe. I have code where I convert it to a pandas dataframe and do a pd.merge: # TS_BILLPLAN : PREV & PREV_PREV attributes billplan = billplan.to_pandas() billplan =…

pandas merge spark-koalas

asked Sep 02 '23 at 10:11

Ee Ann Ng

109
1
8

0

votes

1 answer

Solving a system of multi-variable equations using PySpark on Databricks

Any suggestions, help, or references are most welcome for the problem statement below. I am performing big data analysis on data that is currently stored on Azure. The actual implementation is more complex than the set of equations provided…

python apache-spark pyspark pyspark-pandas spark-koalas

asked Aug 30 '23 at 20:27

lord_mendonca

9
4

0

votes

0 answers

Using Koalas, how do I save to an external table?

I have the code below to save a Koalas dataframe to an ORC table. How do I modify it to save to an EXTERNAL table? df.reset_index().to_orc( f"/corporativo/mydatabase/mytable", mode="overwrite", partition_cols=["year", "month"] )

pyspark-pandas spark-koalas

asked Aug 28 '23 at 20:33

neves

33,186
27
159
192

0

votes

0 answers

How to save a Koalas dataframe to ORC using ZLIB compression?

Koalas dataframes (the pandas API on Spark) have a to_orc method to save in the ORC format. How do I call it so that it saves the data compressed with the ZLIB method?

pyspark spark-koalas

asked Aug 24 '23 at 22:06

neves

33,186
27
159
192

0

votes

0 answers

Can I do a groupby and shift on a Date column using Pandas on Spark API, similar to how I can in Pandas?

In pandas, I can run the following code: contract['PREV_END'] = contract.groupby('SUBSCR_NO').END.shift(1) But using the pandas API on Spark, I get this error: AnalysisException: cannot resolve 'isnan(lag(CON_END, 1, NULL) OVER (PARTITION BY SUBSCR_NO…

group-by spark-koalas

asked Aug 19 '23 at 10:36

Ee Ann Ng

109
1
8
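For reference, the behaviour the question above expects from groupby(...).shift(1) can be sketched in plain Python. This is only an illustration of the semantics, not the pandas or pandas-on-Spark implementation; the helper name groupby_shift and the sample rows are hypothetical, and only the column names SUBSCR_NO and END come from the excerpt.

```python
def groupby_shift(rows, group_key, value_key):
    """Return, for each row, the previous value seen in the same group."""
    last_seen = {}  # most recent value per group
    shifted = []
    for row in rows:
        group = row[group_key]
        # None marks the first row of a group (pandas would show NaN/NaT).
        shifted.append(last_seen.get(group))
        last_seen[group] = row[value_key]
    return shifted


# Hypothetical sample data; only the column names come from the question.
contract = [
    {"SUBSCR_NO": 1, "END": "2023-01-31"},
    {"SUBSCR_NO": 2, "END": "2023-02-15"},
    {"SUBSCR_NO": 1, "END": "2023-06-30"},
]
prev_end = groupby_shift(contract, "SUBSCR_NO", "END")
```

Here the third row receives the END value of the first row because both share SUBSCR_NO 1, while the first row of each subscriber receives None.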

0

votes

0 answers

koalas: does it have PARTITION BY + ROW_COUNT()?

I'm trying to use Koalas to process my dataframes. Does it have rolling window functions over partitions? Something like PARTITION BY and ROW_NUMBER() in Hive or Postgres?

sql pandas apache-spark pyspark spark-koalas

asked Jun 07 '23 at 16:00

Felix

3,351
6
40
68
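For reference, the numbering that ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) produces can be sketched in plain Python. This illustrates the semantics only and is not Koalas API; the function name row_number_over_partition and the sample rows are hypothetical.

```python
from collections import defaultdict


def row_number_over_partition(rows, partition_key, order_key):
    """Return copies of `rows` with a 1-based `row_number` per partition."""
    counters = defaultdict(int)
    numbered = []
    # Sort by partition, then by the ordering column, as the window would.
    for row in sorted(rows, key=lambda r: (r[partition_key], r[order_key])):
        counters[row[partition_key]] += 1
        numbered.append({**row, "row_number": counters[row[partition_key]]})
    return numbered


# Hypothetical sample data.
rows = [
    {"team": "a", "score": 3},
    {"team": "b", "score": 1},
    {"team": "a", "score": 1},
]
result = row_number_over_partition(rows, "team", "score")
```

In pandas, df.groupby(key).cumcount() + 1 gives the same numbering within the existing row order; whether the equivalent call is available in a given Koalas version should be checked against its documentation.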

0

votes

1 answer

Facing issues installing Koalas for Python version 3.8.10 (AttributeError: module 'numpy' has no attribute 'bool')

According to this document: https://koalas.readthedocs.io/en/latest/getting_started/install.html System info: numpy 1.24.3, koalas 1.8.2, pyspark 3.4.0, Python 3.8.10. I am facing an issue when trying to read a CSV file: import databricks.koalas as…

python numpy spark-koalas

asked Jun 03 '23 at 11:48

Sam777

15
6
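As background to the error in the question above: the np.bool alias for the builtin bool was deprecated in NumPy 1.20 and removed in NumPy 1.24, so code that still refers to it fails on numpy 1.24.3. A minimal sketch of a version check follows; the helper name has_legacy_bool_alias is hypothetical, it only parses a version string, and it is meant for 1.x releases (NumPy 2.x defines np.bool again, but as a NumPy scalar type, which this sketch does not model).

```python
def has_legacy_bool_alias(numpy_version: str) -> bool:
    """Return True if this NumPy 1.x version still ships the old `np.bool` alias."""
    # Compare only (major, minor); the alias was removed in 1.24.
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) < (1, 24)


# Versions taken from the question (1.24.3) and one earlier release for contrast.
reported = has_legacy_bool_alias("1.24.3")
earlier = has_legacy_bool_alias("1.23.5")
```

With the versions listed in the question, the check reports that the alias is gone, which is consistent with the AttributeError in the title.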

0

votes

2 answers

The method `pd.groupby.GroupBy.prod()` is not implemented yet

I have a database with two columns: name (str) and probability (float). I am running this command: df[['name','probability']].groupby('name').prod() on a Databricks (runtime 7.3) notebook and df is a pyspark.pandas dataframe. The error I get…

python pandas database databricks spark-koalas

asked Dec 13 '22 at 13:03

Qarolina

1
1

0

votes

1 answer

How to pivot a string column using the pandas API on Spark

I am attempting to convert some code my organization uses from pandas dataframes to pandas API on Spark dataframes. We have run into a problem when we try to convert our pivot functions where pandas api on spark does not allow pivot operations on…

python pyspark spark-koalas

asked Dec 02 '22 at 22:28

MacMixer13

73
1
8

0

votes

0 answers

Series object error in koalas using count vectorizer

I am new to Spark and am trying to run a count vectorizer on a Koalas dataframe, but I am getting an error with this code. Koalas uses the pandas API, so I tried to run this count vectorizer code but got an error - 'Series' object has no attribute…

python pyspark spark-koalas

asked Nov 21 '22 at 13:22

Bibhu Kalyan das

41
1
4

0

votes

0 answers

Koalas: ValueError: not enough values to unpack (expected 3, got 2)

I am doing a simple dfq.head() of a Koalas dataframe but got the error below. I know this is not related to what my data looks like but rather to the versions of the libraries I am using, but I can't figure out the issue. This is my spark…

python pyspark pyarrow spark-koalas

asked Nov 16 '22 at 19:03

heinistic

731
2
8
16

0

votes

2 answers

group by in pandas API on spark

I have a pandas dataframe below, data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'], 'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2], 'Year':…

pandas apache-spark pyspark group-by spark-koalas

asked Nov 11 '22 at 14:47

code_bug

355
1
12

0

votes

1 answer

Pass a python variable to SQL query with koalas

I am using a Databricks notebook and I would like to pass several Python variables to an SQL query using koalas.sql. Here is a simplified example of what I am trying to do. import databricks.koalas as ks query = """ SELECT * FROM…

python sql databricks spark-koalas

asked Oct 26 '22 at 12:53

Qarolina

1
1

0

votes

1 answer

Index position in koalas

I have a Koalas dataframe and I am trying to find out the index value of a specific record, but I keep getting the error "TypeError: 'Int64Index' object is not subscriptable". Below is the code which I tried. kdf = ks.DataFrame({ 'id':[1,2,3,4], …

pandas spark-koalas

asked Sep 19 '22 at 17:01

Nikesh

47
6
