Questions tagged [spark-koalas]

Koalas is an implementation of the pandas API on top of Apache Spark.

120 questions
0 votes, 2 answers

What is the difference between a spark dataframe and a koalas dataframe?

I am trying to understand the internal workings of Koalas. Every tutorial I have used has presented me with three concepts: Spark DataFrame, internal frame, and Koalas DataFrame. According to my understanding, the Spark DataFrame is the typical…
figs_and_nuts
0 votes, 1 answer

Convert list of dict into DataFrame with Koalas

I've tried to convert a list of dicts into a Databricks' Koalas DataFrame but I keep getting the error message: ArrowInvalid: cannot mix list and non-list, non-null values Pandas works perfectly (with pd.DataFrame(list)) but because of company…
Alex M
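Arrow raises "cannot mix list and non-list, non-null values" when one column receives a list in some rows and a scalar in others. A minimal sketch, assuming that is what happens in this data: normalize the records in plain Python so such a key is always list-valued before building the frame. The helper `normalize_list_columns` and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def normalize_list_columns(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap scalars in a one-element list for every key that is list-valued in any record.

    Hypothetical helper: assumes the ArrowInvalid error comes from keys whose
    values mix lists and non-list scalars across records.
    """
    # Collect the keys that hold a list in at least one record.
    list_keys = {
        key
        for record in records
        for key, value in record.items()
        if isinstance(value, list)
    }
    normalized = []
    for record in records:
        fixed = dict(record)  # copy so the caller's dicts are not mutated
        for key in list_keys:
            value = fixed.get(key)
            # Missing keys and None stay as they are (Arrow accepts nulls); scalars get wrapped.
            if value is not None and not isinstance(value, list):
                fixed[key] = [value]
        normalized.append(fixed)
    return normalized


records = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": "c"}]
print(normalize_list_columns(records))
# [{'id': 1, 'tags': ['a', 'b']}, {'id': 2, 'tags': ['c']}]
```

The normalized list could then be passed to `ks.DataFrame(...)` as before; whether a list-typed column suits the rest of the pipeline is an assumption to check.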
0 votes, 1 answer

Filter index values in koalas Data frame

I am trying to recreate the operation below in Koalas. It works in pandas, but when I try the same in Koalas it throws an error. Operation tried in pandas: df = pd.DataFrame({'foo':['a','b','c','d','e'], 'bar':['1', '2', '3','4','5']}) df1 =…
0 votes, 2 answers

fill NA of a column with elements of another column

I'm in this situation: my df looks like this: A B 0 0.0 2.0 1 3.0 4.0 2 NaN 1.0 3 2.0 NaN 4 NaN 1.0 5 4.8 NaN 6 NaN 1.0 and I want to apply this line of code: df['A'] = df['B'].fillna(df['A']) and I expect a workflow and…
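In pandas semantics, `df['B'].fillna(df['A'])` keeps `B` wherever `B` is present and falls back to `A` only where `B` is missing, so assigning it to `df['A']` overwrites every non-null `A` that has a `B` value. Filling only `A`'s gaps from `B`, as the title describes, would be `df['A'] = df['A'].fillna(df['B'])` in pandas (whether the Koalas version in use accepts a Series as the fill value is worth checking separately). The plain-Python sketch below, with a hypothetical `fillna_from` helper and the sample data from the question, shows the element-wise rule for both orders.

```python
import math
from typing import List


def fillna_from(primary: List[float], fallback: List[float]) -> List[float]:
    """Element-wise analogue of primary.fillna(fallback): keep primary unless it is NaN."""
    return [f if math.isnan(p) else p for p, f in zip(primary, fallback)]


nan = math.nan
col_a = [0.0, 3.0, nan, 2.0, nan, 4.8, nan]
col_b = [2.0, 4.0, 1.0, nan, 1.0, nan, 1.0]

# df['A'] = df['B'].fillna(df['A']): B wins wherever B is present.
print(fillna_from(col_b, col_a))  # [2.0, 4.0, 1.0, 2.0, 1.0, 4.8, 1.0]
# df['A'] = df['A'].fillna(df['B']): only A's gaps are filled.
print(fillna_from(col_a, col_b))  # [0.0, 3.0, 1.0, 2.0, 1.0, 4.8, 1.0]
```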
0 votes, 0 answers

Databricks pyspark AnalysisException after modifying a column

I have a Koalas DataFrame that is a merge of two other DataFrames. Four of its columns were rewritten as the max value of their group on the specified key. It also has a new column whose value is 0 or 1 depending on whether another column is null. t0 =…
0 votes, 1 answer

Column having no values gives the error 'can not infer schema' while reading excel to dataframe using koalas read_excel()

While reading an Excel file as a DataFrame using Databricks Koalas read_excel() with dtype as str, if a column has no values it gives the error "can not infer schema from empty dataset". How can I solve this issue? If I change the dtype to None, it…
Divzz
0 votes, 1 answer

How to add new fields to JSON in Python?

I am a beginner Python programmer. I am using Python 3 and trying to add an element to a list of dictionaries; I want to add a different element to each dictionary in the list. I tried using append(), add() and insert(), but had no luck. Here is my…
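`append()` and `insert()` are list methods and `add()` is a set method; a dict gains a new field by key assignment. A minimal sketch, assuming the data is a list of dicts and the new values come as a parallel list (the names `records`, `new_values` and the key `"status"` are hypothetical), with `json.dumps` serializing the result.

```python
import json

records = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
new_values = ["x", "y", "z"]  # one value per dictionary, in the same order

# Pair each dict with its own value and store it under a new key.
for record, value in zip(records, new_values):
    record["status"] = value

print(json.dumps(records))
# [{"name": "a", "status": "x"}, {"name": "b", "status": "y"}, {"name": "c", "status": "z"}]
```

Note that `zip` silently stops at the shorter list; on Python 3.10+ `zip(records, new_values, strict=True)` raises instead when the lengths differ.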
0 votes, 1 answer

Sample data set in Koalas

I have the code below, which uses a pandas DataFrame. However, when I convert the pandas DataFrame to Koalas and run the same code, I get the error "Function sample currently does not support specifying exact number of items to return. Use frac…
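Since the error points to `frac` rather than an exact count, one workaround is to turn the wanted count into a fraction, oversample slightly, and trim with `head(n)`. A minimal sketch of that arithmetic, assuming the total row count is available (for example from `len(kdf)`, which runs a Spark job); the helper `fraction_for` and its 10% margin are hypothetical choices, and the sample can still come back with fewer than `n` rows because Spark's sampling is probabilistic.

```python
def fraction_for(n: int, total_rows: int, margin: float = 1.1) -> float:
    """Fraction to pass as sample(frac=...) so that roughly n rows come back.

    The margin oversamples a little because sampling returns an approximate,
    not exact, number of rows; the caller trims the result with head(n).
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if n < 0:
        raise ValueError("n must be non-negative")
    return min(1.0, n * margin / total_rows)


# Hypothetical usage with a Koalas DataFrame `kdf` (not executed here):
#   frac = fraction_for(100, len(kdf))
#   subset = kdf.sample(frac=frac, random_state=42).head(100)
frac = fraction_for(100, 10_000)
print(f"frac={frac:.4f}")  # frac=0.0110
```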
0 votes, 0 answers

Select multiple list type columns from Koalas/Pandas Dataframe and construct a new Dataframe

I have a dataframe where each row looks like this: Pandas(Index=0, a=array([0.78420993, 0.61972316, 0.46183716, 0.48915005, 0.77913277, 0.06024269, 0.81624765, 0.88517468, 0.13920925, 0.1065294 ]), b=array([0.77951759, 0.66244447, 0.9437135 ,…
user2476295
0 votes, 1 answer

Spark-Koalas Error: Column assignment doesn't support type tuple

I am unable to assign kdf[c].factorize() to kdf[c]. I tried this but it didn't help: ks.set_option('compute.ops_on_diff_frames', True) kdf[response] = ks.Series(kdf[response].factorize()) ks.reset_option('compute.ops_on_diff_frames') Any help…
Vaibhav
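The "type tuple" error is consistent with `factorize()` returning a pair `(codes, uniques)`, as it does in pandas, so assigning the whole call tries to store a tuple in the column. Unpacking first, e.g. `codes, uniques = kdf[c].factorize()` and then `kdf[c] = codes`, should avoid that error (if Koalas then complains about operations on different frames, the `compute.ops_on_diff_frames` option from the question is the relevant switch). The stdlib sketch below uses a hypothetical `factorize_like` helper that only mimics the shape of that return value; unlike pandas, it gives missing values no special treatment.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def factorize_like(values: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Return (codes, uniques), numbering values in order of first appearance."""
    index: Dict[Hashable, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(index)  # next unused code
        codes.append(index[v])
    uniques = list(index)  # dicts keep insertion order
    return codes, uniques


column = ["b", "a", "b", "c"]
result = factorize_like(column)
print(type(result).__name__)  # tuple: assigning this whole value is what fails
codes, uniques = result       # unpack, then assign only the codes
print(codes, uniques)         # [0, 1, 0, 2] ['b', 'a', 'c']
```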
0 votes, 2 answers

How can I iterate through elements of a Koalas groupby?

I would like to iterate through groups in a dataframe. This is possible in pandas, but when I port this to koalas, I get an error. import databricks.koalas as ks import pandas as pd pdf = pd.DataFrame({'x':range(3), 'y':['a','b','b'],…
Chogg
0 votes, 0 answers

Koalas sort_index increases Spark partitions

I'm new to Koalas, and I was surprised that when I use the methods sort_index() and sort_values(), the number of Spark partitions increases automatically. Example: import databricks.koalas as ks df = ks.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'], …
Devilfire
0 votes, 1 answer

Comparing two koalas dataframes for testing purposes

Pandas has a testing module that includes assert_frame_equal. Does Koalas have anything similar? I am writing tests on a whole set of transformations to Koalas DataFrames. At first, since my test CSV files have only a few (<10) rows, I thought…
Nano Tellez
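One hedged approach is to convert both frames with Koalas' `to_pandas()` and compare them with `pandas.testing.assert_frame_equal`, sorting first because Spark does not guarantee row order. The same order-insensitive idea can also be expressed over plain rows; the `assert_same_rows` helper below is a hypothetical stdlib sketch that compares rows as multisets, so order is ignored but duplicate counts still matter.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

Row = Tuple[Hashable, ...]


def assert_same_rows(left: Iterable[Row], right: Iterable[Row]) -> None:
    """Assert both inputs hold the same rows with the same multiplicities, in any order.

    Caveat: NaN != NaN in Python, so rows containing NaN may fail to match;
    fill or drop NaN before comparing.
    """
    left_counts, right_counts = Counter(left), Counter(right)
    if left_counts != right_counts:
        only_left = left_counts - right_counts    # rows with surplus copies on the left
        only_right = right_counts - left_counts   # rows with surplus copies on the right
        raise AssertionError(f"only in left: {dict(only_left)}, only in right: {dict(only_right)}")


# Hypothetical usage with two Koalas frames (not executed here):
#   assert_same_rows(kdf1.to_pandas().itertuples(index=False),
#                    kdf2.to_pandas().itertuples(index=False))
assert_same_rows([(1, "a"), (2, "b")], [(2, "b"), (1, "a")])  # passes: same rows, different order
```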
0 votes, 2 answers

What is the fastest way to return one row from a big pyspark dataframe or koalas dataframe in databricks?

I have a big DataFrame (20 million rows, 35 columns) in Koalas in a Databricks notebook. I have performed some transform and join (merge) operations on it using Python, such as: mdf.path_info = mdf.path_info.transform(modify_path_info) x =…
0 votes, 2 answers

How to create an empty Koalas DataFrame

I am trying to create an empty Koalas DataFrame using the following command: df = ks.from_pandas(pd.DataFrame(columns=['A', 'B', 'C'])) But I am getting the following error: ValueError: can not infer schema from empty or null dataset I tried the following…
Nikhil Gupta