Highest Voted 'data-processing' Questions

8 votes, 3 answers

Plotting many lines as a heatmap

I have a large number (~1000) of files from a data logger that I am trying to process. If I wanted to plot the trend from a single one of these log files, I could do it using plot(timevalues,datavalues). I would like to be able to view all of these…

asked Apr 29 '14 at 05:30 by Hugoagogo (reputation 1,598)

7 votes, 1 answer

what is "file_like_object", what is "file"; pickle.load() and pickle.loads()

I am trying to figure out the differences between pickle.load() and pickle.loads(). Somebody said that the kind of object pickle.load() processes is a "file_like_object", whereas pickle.loads() corresponds to a "file object".

python pickle data-processing

asked Jan 29 '18 at 10:19 by Eric Kani (reputation 809)
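For reference, the standard distinction is that pickle.load() reads a pickle from a binary file-like object (a file opened in "rb" mode, or an in-memory buffer such as io.BytesIO), while pickle.loads() takes the pickled data directly as a bytes object. A minimal sketch showing both; the sample dictionary is hypothetical and io.BytesIO stands in for a real file:

```python
import io
import pickle

data = {"sensor": "A1", "values": [1.5, 2.0, 2.5]}  # hypothetical sample data

# dumps()/loads() work on a bytes object held in memory.
blob = pickle.dumps(data)
from_bytes = pickle.loads(blob)

# dump()/load() work on a binary file-like object; io.BytesIO stands in for
# a file opened with open(path, "wb") and later open(path, "rb").
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)  # rewind to the start before reading back
from_file = pickle.load(buffer)

print(from_bytes == from_file == data)  # True
```

In other words, load() is roughly loads(f.read()) for a file object f: the "file" versus "bytes" difference is only about where the pickled data comes from.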

7 votes, 2 answers

How to use very large dataset in RNN TensorFlow?

I have a very large dataset: 7.9 GB of CSV files, 80% of which will serve as training data and the remaining 20% as test data. When I load the training data (6.2 GB), I get a MemoryError at the 80th iteration (80th file).…

pandas machine-learning tensorflow dataset data-processing

asked Jul 25 '17 at 09:19 by afagarap (reputation 650)
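A common way around a MemoryError like this is to stream the files in fixed-size batches instead of loading every file into memory at once. The sketch below is a minimal standard-library illustration of that idea, assuming plain CSV files; the file names and batch size are hypothetical, and feeding the batches into TensorFlow is not shown:

```python
import csv
import os
import tempfile
from typing import Iterator, List


def iter_batches(paths: List[str], batch_size: int) -> Iterator[List[List[str]]]:
    """Yield lists of at most batch_size CSV rows, reading one file at a time.

    Only the current batch and one open file handle are held in memory,
    rather than the whole dataset.
    """
    batch: List[List[str]] = []
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.reader(f):
                batch.append(row)
                if len(batch) == batch_size:
                    yield batch
                    batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# Usage with two small hypothetical files of three rows each.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(2):
    path = os.path.join(tmpdir, f"part_{i}.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([[i, j] for j in range(3)])
    paths.append(path)

batches = list(iter_batches(paths, batch_size=4))
print([len(b) for b in batches])  # [4, 2]
```

Because batches may span file boundaries, the batch size, not the file size, bounds peak memory use.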

7 votes, 3 answers

How to get the topics associated with each document using PySpark (2.1.0) LDA?

I am using PySpark's LDAModel to get topics from a corpus. My goal is to find the topics associated with each document. For that purpose I tried to set topicDistributionCol as per the docs. Since I am new to this, I am not sure what the purpose of this…

pyspark data-mining lda topic-modeling data-processing

asked Jan 31 '17 at 13:09 by Hiren patel (reputation 971)

7 votes, 6 answers

Is there a way to combine these queries?

I have begun working on some of the programming problems on HackerRank as a "productive distraction". I was working through the first few in the SQL section and came across this problem (link): Query the two cities in STATION with the shortest and longest…

sql-server database data-processing

asked Aug 24 '16 at 14:15 by mbm29414 (reputation 11,558)

7 votes, 1 answer

How to gracefully fallback to `NaN` value while reading integers from a CSV with Pandas?

While using read_csv with Pandas, if I want a given column to be converted to a type, a malformed value will interrupt the whole operation without any indication of the offending value. For example, running something like: import pandas as…

python csv pandas data-processing

asked May 12 '15 at 12:10 by danza (reputation 11,511)
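In pandas itself, one common approach is to read the column without forcing a dtype and then convert it with pd.to_numeric(..., errors="coerce"), which turns unparseable values into NaN. The sketch below shows the same fallback idea using only the standard library csv module, and additionally reports which values failed; the column name and sample data are hypothetical:

```python
import csv
import io
import math
from typing import List, Tuple


def read_int_column(text: str, column: str) -> Tuple[List[float], List[Tuple[int, str]]]:
    """Parse `column` as integers, falling back to NaN for malformed values.

    Returns the parsed values (as floats, since NaN is a float) and a list of
    (data row number, raw value) pairs for every value that failed to parse.
    """
    values: List[float] = []
    bad: List[Tuple[int, str]] = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        raw = row[column]
        try:
            values.append(float(int(raw)))
        except (ValueError, TypeError):  # malformed or missing value
            values.append(math.nan)
            bad.append((row_number, raw))
    return values, bad


sample = "id,count\n1,10\n2,x7\n3,42\n"  # hypothetical CSV content
values, bad = read_int_column(sample, "count")
print(values)  # [10.0, nan, 42.0]
print(bad)     # [(2, 'x7')]
```

Keeping the list of offending values addresses the second half of the complaint: the load no longer aborts, and the bad inputs can still be inspected afterwards.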

7 votes, 4 answers

Remove rows from a data frame that contain only 0s or just a single 0

I am trying to create a function in R that will allow me to filter my data set based on whether a row contains a single column with a zero in it. Furthermore, sometimes I only want to remove rows that are zero in all columns. Also, and this is where…

r filtering bioinformatics data-processing

asked Aug 08 '14 at 12:41 by KnightofniDK (reputation 139)
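The two rules described here, dropping a row if any column is zero versus dropping it only if every column is zero, differ only in whether the test is "any" or "all". A minimal Python sketch of that logic over plain lists of rows (an illustration of the idea, not an R data-frame solution; the sample rows are hypothetical):

```python
from typing import List, Sequence


def drop_zero_rows(rows: Sequence[Sequence[float]], require_all: bool = False) -> List[Sequence[float]]:
    """Remove rows containing zeros.

    require_all=False: drop a row if ANY column is zero.
    require_all=True:  drop a row only if ALL columns are zero.
    """
    test = all if require_all else any
    return [row for row in rows if not test(value == 0 for value in row)]


rows = [[1, 2, 3], [0, 5, 6], [0, 0, 0]]
print(drop_zero_rows(rows))                    # [[1, 2, 3]]
print(drop_zero_rows(rows, require_all=True))  # [[1, 2, 3], [0, 5, 6]]
```

In R, the same two tests are commonly written as df[rowSums(df == 0) == 0, ] (keep rows with no zeros) and df[rowSums(df != 0) > 0, ] (keep rows with at least one non-zero value).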

7 votes, 4 answers

Using Hibernate to load 20K products, modify the entities and update the DB

I am using Hibernate to update 20K products in my database. Currently I am pulling in the 20K products, looping through them, modifying some properties, and then updating the database. So: load products foreach products session…

java hibernate orm data-processing

asked Mar 07 '10 at 22:36 by Blankman (reputation 259,732)

7 votes, 5 answers

What is the best way to reduce cyclomatic complexity when validating data?

Right now I'm working on a web application that receives a significant amount of data from a database that has the potential to return null results. When going through the cyclomatic complexity of the application, a number of functions are weighing in…

code-metrics cyclomatic-complexity data-processing

asked Oct 15 '08 at 13:41 by rjzii (reputation 14,236)

7 votes, 1 answer

Replacing numbers within a range with a factor

Given a data frame column that is a series of integers (age), I want to convert ranges of integers into ordinal variables. My current code doesn't work; how do I do this? df <- read.table("http://dl.dropbox.com/u/822467/df.csv", header = TRUE, sep =…

r data-processing r-factor

asked Apr 19 '12 at 06:08 by RJ- (reputation 2,919)
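Converting ranges of integers into ordered categories is a binning step: each value is mapped to the interval it falls in. In R this is usually done with cut(), which can return an ordered factor via ordered_result = TRUE. The sketch below shows the same idea in Python with bisect; the cut points and labels are hypothetical, since the question's actual ranges are not shown:

```python
from bisect import bisect_right

# Hypothetical cut points and labels:
#   [0, 18) -> "child", [18, 65) -> "adult", [65, inf) -> "senior".
# Labels are listed in their natural (ordinal) order, so a label's index
# in LABELS doubles as its rank.
CUTS = [18, 65]
LABELS = ["child", "adult", "senior"]


def age_group(age: int) -> str:
    """Return the label of the left-closed interval that contains `age`."""
    return LABELS[bisect_right(CUTS, age)]


ages = [5, 18, 40, 64, 65, 90]
print([age_group(a) for a in ages])
# ['child', 'adult', 'adult', 'adult', 'senior', 'senior']
```

Note that R's cut() defaults to right-closed intervals (right = TRUE), whereas this sketch uses left-closed ones, so boundary values such as 18 and 65 can land in different groups depending on that choice.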

6 votes, 2 answers

Hive bucketing through sparkSQL

I have a question regarding bucketing in Hive. I have created a temporary table that is bucketed on the column key. Through Spark SQL I am inserting data into this temporary table. I have set hive.enforce.bucketing to true in spark…

apache-spark hive apache-spark-sql data-processing

asked Aug 02 '18 at 13:25 by Sumit D (reputation 171)

6 votes, 2 answers

How can I read specific data columns from a file in C?

Good day all, I am a beginner in C programming. I have this problem and have spent a huge amount of time on it without any considerable progress. My problem is as follows: I have a series of files with the extension .msr; they contain…

c data-processing

asked Jul 28 '10 at 10:32 by chriscol (reputation 91)

6 votes, 5 answers

Create new binary variables from single string of levels recorded for each observation

I have been fiddling with the Kaggle West-Nile Virus competition data as a means to practice fitting a spatio-temporal GAM. The first few rows of the (somewhat processed from the original CSV) weather data are below (plus the first 20 rows a…

r data-manipulation data-processing

asked Jun 11 '15 at 20:17 by Gavin Simpson (reputation 170,508)
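When each observation stores several levels in a single string, a standard approach is to split the string and create one 0/1 indicator variable per distinct level. A minimal Python sketch of that idea, assuming space-separated levels; the sample strings are hypothetical and not taken from the Kaggle data:

```python
from typing import Dict, List


def indicator_columns(values: List[str], sep: str = " ") -> Dict[str, List[int]]:
    """Turn strings of levels into one 0/1 column per distinct level."""
    # Split each observation into its set of levels, ignoring empty pieces
    # (an empty string or repeated separators yield no levels).
    level_sets = [set(filter(None, v.split(sep))) for v in values]
    levels = sorted(set().union(*level_sets))
    return {lvl: [int(lvl in s) for s in level_sets] for lvl in levels}


obs = ["a b", "", "b", "c a"]  # hypothetical observations
cols = indicator_columns(obs)
print(cols)
# {'a': [1, 0, 0, 1], 'b': [1, 0, 1, 0], 'c': [0, 0, 0, 1]}
```

Collecting the distinct levels first guarantees every observation gets a value (0 or 1) for every level, including observations with no levels at all.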

6 votes, 1 answer

Relational database versus R/Python data frames

I was exposed to tables and data structures in R before RDBMSs and other database systems. It is quite elegant in R/Python to create tables and lists from structured data (.csv or other formats) and then do data manipulations…

database database-design dataframe data-processing data-collection

asked May 14 '15 at 18:20 by KarthikS (reputation 883)

6 votes, 4 answers

How to read a 4 GB file on a 32-bit system

In my case I have several files; let's assume that I have a >4 GB data file. I want to read that file line by line and process each line. One of my restrictions is that the software has to run on 32-bit MS Windows or on 64-bit with a small amount of RAM…

c++ boost large-files 32-bit data-processing

asked Aug 05 '14 at 22:01 by bioky (reputation 71)
