Highest Voted 'spark-koalas' Questions

1 vote · 2 answers

How to get the number of groups in a groupby object in Koalas?

How can I get the number of groups in a groupby object in Koalas? In pandas we can use ngroups, but this attribute is not implemented yet in Koalas. Suppose the groupby object is called dfgroup. Any ideas?

python databricks spark-koalas

asked Dec 25 '20 at 21:01 by Ousen92i
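A minimal workaround sketch, assuming only `ngroups` is missing: the number of groups equals the number of distinct group keys, which is the length of `groupby(...).size()`. It is written against pandas so the result can be checked against `ngroups`; that the same call works on a Koalas DataFrame (where `len()` would trigger a Spark job) is an assumption, and the helper `count_groups` and the `key`/`value` columns are hypothetical.

```python
import pandas as pd


def count_groups(df, keys):
    """Return the number of groups formed by df.groupby(keys).

    Hypothetical helper: size() yields one row per distinct key combination,
    so its length equals what pandas exposes as GroupBy.ngroups.
    """
    return len(df.groupby(keys).size())


# Hypothetical example data.
df = pd.DataFrame({"key": ["a", "b", "a", "c", "b"], "value": [1, 2, 3, 4, 5]})
n = count_groups(df, "key")
print(n)  # 3
```

With several grouping columns, pass a list of names, e.g. `count_groups(df, ["key", "value"])`.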

1 vote · 1 answer

Is there a better solution than dt.weekofyear?

Is there a better solution than df['weekofyear'] = df['date'].dt.weekofyear? The problem with this solution is that, sometimes, the days after the last week of year n but before the first week of year n+1 are counted as week 1 and not as…

python apache-spark pyspark spark-koalas

asked Dec 24 '20 at 10:45 by Ousen92i
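The boundary effect comes from ISO 8601 week numbering: the first days of January can belong to the last ISO week of the previous year, and the last days of December can belong to week 1 of the next. A sketch using only the standard library, assuming the goal is to group by week without mixing years: pair the week number with the ISO year from `date.isocalendar()` instead of the calendar year. The helper `iso_year_week` is hypothetical; in pandas 1.1 and later, `Series.dt.isocalendar()` returns the same `year`/`week` pair, while Koalas support for it is not assumed here.

```python
from datetime import date


def iso_year_week(d):
    """Return (ISO year, ISO week) for a date (hypothetical helper).

    The ISO year can differ from d.year near 1 January, which is why
    grouping on the week number alone mixes weeks from adjacent years.
    """
    iso = d.isocalendar()
    return (iso[0], iso[1])


for d in [date(2019, 12, 30), date(2020, 12, 31), date(2021, 1, 1), date(2021, 1, 4)]:
    print(d, "calendar year", d.year, "-> ISO (year, week)", iso_year_week(d))
```

Here 30 Dec 2019 falls in week 1 of ISO year 2020, and 1 Jan 2021 falls in week 53 of ISO year 2020, so a `(year, week)` key keeps each week in one group.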

1 vote · 1 answer

Hive JDBC connection using PySpark returns column names as row values

I am using PySpark to connect to Hive and fetch some data. The issue is that every row it returns contains the column names as its values. The column names themselves are correct; only the row values are wrong. Here is my…

pyspark hive apache-spark-sql hiveql spark-koalas

asked Dec 17 '20 at 09:51 by Shakir Shakeel

1 vote · 2 answers

How to calculate an average stock price depending on periods

I am trying to calculate the average opening price for a stock over different periods (week, month, year). Here you can see part of my df: [image: My dataframe] (987 rows for the complete df). First, I am trying to calculate the average opening…

python apache-spark pyspark spark-koalas

asked Dec 16 '20 at 19:12 by Ousen92i
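A sketch of per-period averages in pandas, assuming hypothetical `date` and `open` columns and made-up prices: derive a year, month, or week label from each date and average `open` within each label. Koalas offers `groupby(...).mean()` and the `dt.year`/`dt.month` accessors, but whether it supports `dt.to_period` is not assumed, so the weekly line may need another route there.

```python
import pandas as pd

# Hypothetical sample prices; the question's real df has 987 rows.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-02-03", "2021-01-04"]),
    "open": [10.0, 12.0, 20.0, 30.0],
})

# Distinct names keep the resulting index levels readable.
year = df["date"].dt.year.rename("year")
month = df["date"].dt.month.rename("month")

# Average opening price per calendar month and per calendar year.
monthly = df.groupby([year, month])["open"].mean()
yearly = df.groupby(year)["open"].mean()

# Weekly averages: to_period("W") labels each date with its Monday-to-Sunday week.
weekly = df.groupby(df["date"].dt.to_period("W"))["open"].mean()
print(monthly, yearly, weekly, sep="\n\n")
```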

1 vote · 1 answer

PandasNotImplementedError: using nested np.where() on a Koalas DataFrame raises an error

I am converting code written with pandas to Koalas, but I run into an error when using numpy's where: import pandas as pd import numpy as np import databricks.koalas as ks data = {'credit': [123.23, 23423.56, 0, 0], 'debit': [0, 0, 234.21,…

python pandas numpy databricks spark-koalas

asked Dec 16 '20 at 14:59 by Whitewater
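`np.where` works on in-memory NumPy arrays, while a Koalas Series stands for a distributed Spark column, so one common workaround is to express the nested condition with chained `Series.where` calls instead. This sketch runs on pandas; that Koalas' `Series.where` accepts the same arguments is an assumption to verify. The rule (keep a positive credit, otherwise use the negated debit, otherwise 0), the `amount` column, and the last `debit` value are hypothetical, because the original code is truncated.

```python
import pandas as pd

# Hypothetical data shaped like the question's (the original is truncated).
df = pd.DataFrame({"credit": [123.23, 23423.56, 0, 0],
                   "debit": [0, 0, 234.21, 0]})

# Equivalent of np.where(credit > 0, credit, np.where(debit > 0, -debit, 0)):
# where() keeps values where the condition is True and uses `other` elsewhere.
inner = (-df["debit"]).where(df["debit"] > 0, 0)
df["amount"] = df["credit"].where(df["credit"] > 0, inner)
print(df)
```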

1 vote · 1 answer

How to change the value in a Koalas DataFrame based on a condition

I am using Koalas and I want to change the value of a column based on a condition. In pandas I can do that using: import pandas as pd df_test = pd.DataFrame({ 'a': [1,2,3] ,'b': ['one','two','three']}) df_test2 = pd.DataFrame({ 'c':…

pandas pyspark spark-koalas

asked Nov 27 '20 at 13:35 by J.C Guzman
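A sketch with a hypothetical condition (`a > 1`) and replacement value (`"many"`), since the original snippet is truncated: the usual pandas `.loc` assignment next to `Series.mask`, which expresses the same change as a column expression and so maps more directly onto a Spark-backed frame. That Koalas supports the `mask` form on your version is an assumption to check.

```python
import pandas as pd

df_test = pd.DataFrame({"a": [1, 2, 3], "b": ["one", "two", "three"]})

# pandas idiom: in-place assignment through .loc (hypothetical condition a > 1).
df_loc = df_test.copy()
df_loc.loc[df_loc["a"] > 1, "b"] = "many"

# Assignment-free alternative: mask() replaces values where the condition is True.
df_mask = df_test.copy()
df_mask["b"] = df_mask["b"].mask(df_mask["a"] > 1, "many")
print(df_loc, df_mask, sep="\n\n")
```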

1 vote · 1 answer

Sum null values using Koalas

What is a good way to count all null/NaN values in a DataFrame when using Koalas? Stated another way, how can I return a per-column list of total null counts? I am trying to avoid converting the DataFrame to Spark or pandas if…

python dataframe apache-spark data-science spark-koalas

asked Oct 05 '20 at 19:06 by SteveZ
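A minimal sketch of per-column null counts: `isnull()` marks each missing cell `True`, and summing the boolean columns counts them. It is shown on pandas with a hypothetical frame; Koalas provides `DataFrame.isnull()` and `sum()` with the same names, so the expression is expected to stay on Spark without a conversion, but that is worth confirming on your version.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in two of its three columns.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0],
                   "y": [None, None, "c"],
                   "z": [1, 2, 3]})

# True counts as 1 when summed, so this yields one null count per column.
null_counts = df.isnull().sum()
print(null_counts)
```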

1 vote · 2 answers

Ffill and interpolate a Koalas DataFrame

Is it possible to interpolate some columns and ffill others in a Koalas DataFrame, something like this? %%spark -s sparkenv2 kdf = ks.DataFrame({ 'id':[1,2,3,4], 'A': [None, 3, None, None], 'B': [2, 4, None, 3], 'C': [99, None, None,…

apache-spark interpolation missing-data fill spark-koalas

asked Aug 03 '20 at 04:37 by Zeus
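A sketch of a per-column fill plan in pandas, using the `id`, `A`, and `B` columns from the question (the rest is truncated there). Which column gets which method is a hypothetical choice, as is the `fill_plan` dict. Koalas provides `ffill`, but this sketch does not assume it implements `interpolate`. Sorting by `id` first matters on Spark, where row order is not guaranteed and both methods depend on it.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4],
                   "A": [None, 3, None, None],
                   "B": [2, 4, None, 3]})

# Both fills depend on row order, so make it explicit.
df = df.sort_values("id")

# Hypothetical per-column choice: forward-fill A, linearly interpolate B.
fill_plan = {"A": "ffill", "B": "interpolate"}
for col, how in fill_plan.items():
    if how == "ffill":
        df[col] = df[col].ffill()        # carry the last valid value forward
    else:
        df[col] = df[col].interpolate()  # linear between neighbouring values
print(df)
```

The leading `NaN` in `A` stays, because forward-fill has no earlier value to copy.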

1 vote · 1 answer

Koalas applymap moving all data to a single partition

I need to do an element-wise operation on a Koalas DataFrame, and I use the Koalas applymap method for that. On execution, Koalas moves all the data to one partition and then applies the operation. As a result, the performance of the job is very…

python apache-spark pyspark spark-koalas

asked May 29 '20 at 11:19 by Grzegorz

1 vote · 1 answer

Databricks Koalas fails to import a Parquet file

I ran into an error when importing a Parquet file from Azure Data Lake into Databricks. I tried other ways: importing the Parquet file as a Spark DataFrame works, but when I converted the Spark DataFrame to a Koalas DataFrame, I got the same error. I also tried to…

python pandas pyspark databricks spark-koalas

asked Apr 01 '20 at 15:29 by MiRe Y.

1 vote · 0 answers

Configure PySpark standalone to run executors as the submitting user

I had an issue writing a Parquet file using PySpark (Koalas) on a standalone cluster. The error I encountered was java.io.IOException: Could not rename file. I figured out from here that it was because the driver was run by the user, and the executor processes…

python python-3.x pyspark parquet spark-koalas

asked Feb 20 '20 at 16:58 by Matthew Son

1 vote · 1 answer

Unable to import Koalas in a Scala notebook

It seems basic, but nothing I found on the Databricks website works on my side. I have installed the koalas package on my cluster, but when I try to import the package in my Scala notebook, I get an error: command-3313152839336470:1: error: not found:…

scala databricks azure-databricks spark-koalas

asked Feb 11 '20 at 10:38 by Matthieu K

1 vote · 0 answers

Unable to load a JSON file in Koalas, getting a connection refused error

Problem description: I tried to load a JSON file using Koalas, but it throws a connection refused error. Can someone please help me figure out whether I am missing anything here? Package versions: PySpark: '2.4.3', koalas:…

python-3.x pyspark spark-koalas

asked Jan 06 '20 at 07:26 by Naga Budigam

1 vote · 1 answer

PySpark: cannot calculate column-wise standard deviation in a Koalas DataFrame

I have a Koalas DataFrame in PySpark. I want to calculate the column-wise standard deviation. I have tried doing: df2['x_std'] = df2[['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9', 'x_10','x_11', 'x_12']].std(axis = 1) I get the…

python pandas pyspark spark-koalas

asked Nov 07 '19 at 22:42 by K. K.
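A workaround sketch, assuming `std(axis=1)` is the unsupported piece: compute the per-row standard deviation across the columns from plain column arithmetic, which a Spark-backed frame can evaluate column by column. It uses the sample formula (`ddof=1`, the pandas default) and is checked against pandas' own `std(axis=1)`. The three columns and their values are hypothetical stand-ins for `x_1` through `x_12`.

```python
import pandas as pd

# Hypothetical stand-in for the question's x_1 ... x_12 columns (three shown).
df2 = pd.DataFrame({"x_1": [1.0, 2.0], "x_2": [2.0, 2.0], "x_3": [3.0, 8.0]})
cols = ["x_1", "x_2", "x_3"]

# Sample standard deviation per row, built only from column arithmetic:
# mean = sum(x) / n, var = sum((x - mean)^2) / (n - 1), std = sqrt(var).
n = len(cols)
row_mean = sum(df2[c] for c in cols) / n
row_var = sum((df2[c] - row_mean) ** 2 for c in cols) / (n - 1)
df2["x_std"] = row_var ** 0.5
print(df2)
```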

1 vote · 1 answer

Do I need to install Koalas on every node of my Spark cluster or just on the master node?

I discovered Koalas, which brings pandas to Spark, at the Spark+AI Summit. As far as I know, if I need to map a third-party function over a Spark DataFrame, I have to install the package on every node of my Spark cluster. Is this the same for Koalas? Or I…

python pandas apache-spark spark-koalas

asked Oct 28 '19 at 20:48 by Yuan JI
