Questions tagged [py-datatable]

Use this tag for questions related to the `datatable` python library. Consider tagging your questions with [python] as well. Do not use this tag to ask questions about generic "tables of data".

Datatable is a Python library for manipulating two-dimensional data tables (called Frames). It is similar in spirit to Python pandas and R data.table.

108 questions
3 votes, 1 answer

How to aggregate columns of type `dict`

I have a Frame as follows: x = dt.Frame(k = [1, 1, 2], v = [{'a':1, 'b':2}, {'a':3}, {'b':4}]) which looks like this (columns k, v): 1, {'a': 1, 'b': 2}; 1, {'a': 3}; 2, {'b': 4}. What I'm trying to do is to…
R. Zhu
  • 415
  • 4
  • 16
3 votes, 1 answer

How to combine (merge) two datatable Frames in Python

Given two datatable Frames, how can I combine (merge) them into one Frame? dt_f_A is a Frame with columns A_at_1, A_at_2, A_at_3, ..., A_at_m, whose first row begins with v_1 …
ibra
  • 1,164
  • 1
  • 11
  • 26
3 votes, 4 answers

How to correctly convert a datatable of integers (from the Python datatable library) to a pandas DataFrame

I am using Python datatable (https://github.com/h2oai/datatable) to read a CSV file that contains only integer values. After that, I convert the datatable to a pandas DataFrame. At the conversion, the columns that contain only 0/1 are considered as…
ibra
  • 1,164
  • 1
  • 11
  • 26
3 votes, 1 answer

How to lump together factor levels of a string type column into another in pydatatable?

I have a datatable as follows: DT_X = dt.Frame({'variety': ['Caturra', 'Bourbon', 'Typica', 'Catuai', 'Hawaiian Kona', 'Yellow Bourbon', 'Mundo Novo', 'Catimor', 'SL14', 'SL28', 'Pacas', 'Gesha', 'Pacamara', 'SL34', 'Arusha', …
myamulla_ciencia
  • 1,282
  • 1
  • 8
  • 30
3 votes, 2 answers

How to deselect pydatatable columns based on their types?

I have created a datatable as follows: DT_X = dt.Frame({'x':[1,2,3,4,5], 'y':[0.1,0.5,0.9,1.5,4.3], 'z':['a','b','c','d','e'], 'u':[True,False,True,False,False], 'v':[10,20,30,40,50], …
myamulla_ciencia
  • 1,282
  • 1
  • 8
  • 30
3 votes, 1 answer

How to find and mark duplicates in a python datatable

I would like to identify the duplicated rows in a py-datatable by group (and create a helper column C with a bool). It should work along the lines of this: DT = dt.Frame(A=[1, 2, 1, 2, 2, 1], B=list("XXYYYY")) I get -> TypeError: Expected a Frame,…
Zappageck
  • 122
  • 9
3 votes, 1 answer

Converting string column to date format in datatable frame in python

As a simple example: import datatable as dt import pandas as pd from datetime import datetime d_t = dt.Frame(pd.DataFrame({"Date": ["04/05/2020", "04/06/2020"]})) There is only one column, named Date, with two values of type str32. How could I…
3 votes, 3 answers

How to find unique values by group in datatable Frame

I have created a datatable frame as follows, DT_EX = dt.Frame({'cid':[1,2,1,2,3,2,4,2,4,5], 'cust_life_cycle':['Lead','Active','Lead','Active','Inactive','Lead','Active','Lead','Inactive','Lead']}) Here I have three unique…
myamulla_ciencia
  • 1,282
  • 1
  • 8
  • 30
3 votes, 0 answers

Apply an aggregate function to a python datatable column after group by

Is it possible to "apply" a user function to a python datatable after groupby? For example: import datatable as dt from datatable import f, by, sum df = dt.Frame(SYM=['A','A','A','B','B'], xval=[1.1,1.2,2.3,2.4,2.5]) print(df[:, sum(f.xval),…
balaks
  • 215
  • 1
  • 8
3 votes, 1 answer

Apply aggregate function to a datatable column and return value, not datatable

Perhaps a dumb question, but... In R data.table, if I want to get the mean of a column, I can reference a column vector like foo$x and calculate its mean with something like mean(foo$x). I can't figure out how to do this operation with Python…
Ben
  • 20,038
  • 30
  • 112
  • 189
3 votes, 1 answer

Python data.table row filter by regex

What is the data.table for python equivalent of %like%? Short example: dt_foo_bar = dt.Frame({"n": [1, 3], "s": ["foo", "bar"]}) dt_foo_bar[re.match("foo",f.s),:] #works to filter by "foo" I had expected something like this to…
Jed Gore
  • 31
  • 3
3 votes, 3 answers

Analyse a huge CSV file in R/Python and sample X% according to the distribution of the file?

I have a large CSV file (6 GB) and I want to sample 20% of it. This 20% sample should have the same distribution as the original file. For example, take Kaggle's data: https://www.kaggle.com/c/avazu-ctr-prediction/data I thought about chunks but how…
SteveS
  • 3,789
  • 5
  • 30
  • 64
2 votes, 1 answer

How to roll up duplicate observations in pydatatable?

I have a data frame as follows: my_dt = dt.Frame({'last_name':['mallesh','bhavik','jagarini','mallesh','jagarini'], 'first_name':['yamulla','vemulla','yegurla','yamulla','yegurla'], …
myamulla_ciencia
  • 1,282
  • 1
  • 8
  • 30
2 votes, 2 answers

Remove rows which have NA values

I have the following datatable in Python: # A B B_lag_1 B_lag_2 B_lag_3 B_lag_4 #0 0 −0.342855 NA NA NA NA #1 0 …
Shawn Brar
  • 1,346
  • 3
  • 17
2 votes, 2 answers

datatable: process 2 frames

data_df = pd.DataFrame({"AAA": [1, 2, 1, 3], "BBB": [1, 1, 2, 2], "CCC": [2, 1, 3, 1]}) lookup_df = pd.DataFrame({"key": [1,2,3], "value" : ["Alpha", "Beta", "Charlie"]}) data_dt =…
l a s
  • 3,836
  • 10
  • 42
  • 61