Questions tagged [py-datatable]

Use this tag for questions related to the `datatable` Python library. Consider tagging your questions with [python] as well. Do not use this tag to ask questions about generic "tables of data".

`datatable` is a Python library for manipulating two-dimensional data tables (called Frames). It is similar in spirit to pandas in Python and data.table in R.

108 questions
0
votes
1 answer

Convert python datatable column into a list column

How can I convert this Python datatable, `have = dt.Frame(id=[1,1,2,2], val=["a","b","c","d"])`, into this one: `want = dt.Frame(id=[1,2], val=[["a","b"],["c","d"]], types=[dt.Type.int32, dt.Type.obj64])`? Specifically, I am trying to find…
langtang
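For reference, the target shape in the question above can be sketched in plain Python before it is handed to `dt.Frame`. This is a minimal sketch, not a datatable-native grouping: it assumes the `id` and `val` columns are available as ordinary lists, and the names `ids`, `vals`, and `groups` are illustrative only.

```python
from collections import defaultdict

# Columns from the question's `have` frame, as plain Python lists (assumption).
ids = [1, 1, 2, 2]
vals = ["a", "b", "c", "d"]

# Collect the values of each id into one list, keeping first-seen order.
groups = defaultdict(list)
for key, value in zip(ids, vals):
    groups[key].append(value)

want_id = list(groups)            # one entry per distinct id
want_val = list(groups.values())  # one list of values per id
print(want_id, want_val)
```

The two resulting lists correspond to the `id` and `val` arguments of the `want` frame shown in the question.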
0
votes
1 answer

Filter python datatable using `datatable.FExpr` with custom objects

Let's say I populate a Frame, where one column contains some custom object, like this: class CustomObject: def __init__(self,x=None,label=None): self.x=x self.label=label s1 = CustomObject(x="late", label="A") s2 =…
langtang
0
votes
0 answers

Fields are filled in with (unknown) after importing csv using fread in pydatatable

I have a CSV with about 15 columns, which I import using fread from pydatatable: `dt.fread('temp_sample.csv')`. Some of the columns contain empty values (i.e. NaNs) and partial entries, and some other columns are entirely NaN. Here…
myamulla_ciencia
0
votes
2 answers

Data.table in R into datatable in Python translation problem (normalization)

I have this code in R, where I am using data.table, and I intend to translate it into Python with datatable. It creates columns holding the value of each existing column divided by the mean of the total, a kind of normalization. dataset[ ,…
0
votes
1 answer

output format - array missing

I am having a problem displaying all the columns of the array: only 2 columns are displayed instead of 8.
FARAH
0
votes
1 answer

With datatable is there a way to make faster appending than DataFrame?

I know that reading a CSV file with datatable is much faster than with a pandas DataFrame. However, in my case I have several CSV files and I have to append them all, one by one. So I am appending each `pd.read_csv(file)` result to an empty DataFrame. Will it be…
MCPMH
0
votes
0 answers

python datatable groupby and apply a custom function

I have historical data on users, and I would like to fit an ordinary least squares regression to find the trends. My data looks like:
user_id rating item_id date
12 3 19 2010-03-17
13 4 20 2010-03-18
1…
Areza
0
votes
0 answers

Replicate pandas grouped string join

Hi there, I would like to join all strings within a group with Python datatable in order to avoid pandas. Below is the code I am currently using, which I would like to replicate in datatable. Does anyone know how to do it? Thank you very…
peter
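The question's own code is cut off in this excerpt, so the following is only a sketch of the operation being asked about, written in plain Python rather than datatable or pandas. The sample rows, the `", "` separator, and the helper name `join_by_group` are all hypothetical.

```python
def join_by_group(keys, strings, sep=", "):
    """Join the strings that share a key, preserving first-seen key order."""
    grouped = {}
    for key, text in zip(keys, strings):
        grouped.setdefault(key, []).append(text)
    return {key: sep.join(parts) for key, parts in grouped.items()}


# Hypothetical sample data: a group column and a string column.
group_col = ["x", "x", "y"]
text_col = ["red", "blue", "green"]
joined = join_by_group(group_col, text_col)
print(joined)
```

The result maps each group key to one joined string, which is the shape a grouped string join is expected to produce.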
0
votes
1 answer

How to display pydatatable frame output in streamlit web app?

I have a script to display datatable frame output in streamlit app as: import datatable as dt import streamlit as st import pandas as pd st.set_page_config( page_title="pydatatable demo", layout="wide", initial_sidebar_state="expanded") DT =…
myamulla_ciencia
0
votes
1 answer

Python datatable - apply lambda to multiple columns

I am looking to apply a function to multiple columns of a datatable in Python. With R's data.table one would:
# columns to apply function to
x <- c('col_1', 'col_2')
# apply
df[, (x) := lapply(.SD, function(x) as.Date(x, "%Y-%m-%d")),…
Sweepy Dodo
0
votes
0 answers

Load a subset of columns from file into python datatable

I have a huge csv file and I want to load only a small subset of the columns with fread(). In pandas read_csv(), I'd use the usecols argument for this and pass a list of desired columns. How do I do this with a datatable? The documentation hints at…
Toby
0
votes
1 answer

datatable subset with functools and operator

Similar to the example in "py-datatable 'in' operator?", but using another datatable to create the list. It fails on the last step:
import datatable as dt
from datatable import f, by, count
import operator
import functools
DT1 = dt.Frame(A = range(5))
DT2 =…
Rafael
0
votes
1 answer

Git rev-parse HEAD error when doing pip install

When trying to install the Python package datatable, I get the following error:
(venv) PS C:\Users\MART\Documents\Environments\cyber_analytics> pip install --no-cache-dir datatable
Collecting datatable
Downloading datatable-0.11.1.tar.gz (1.0 MB)
…
0
votes
0 answers

Error on loading CSV using fread in pydatatable

I have a CSV containing about 600K observations, and I'm importing it using fread: `DT = dt.fread('C:\\Users\\myamulla\\Desktop\\proyectos_de_py\\7726_analysis\\datasets\\7726export_Jan_23.csv')`. It throws an error as…
myamulla_ciencia
0
votes
1 answer

How to select observations based on Index specified in pydatatable?

I have a datatable:
DT = dt.Frame(A=[1, 3, 2, 1, 4, 2, 1], B=['A','B','C','A','D','B','A'], C=['myamulla','skumar','cary','myamulla','api','skumar','myamulla'])
Out[14]:
   | A  B  C
-- + -- -- --------
 0 | 1  A  …
myamulla_ciencia