Questions tagged [data-processing]

Data processing is the conversion of raw data into machine-readable form and its subsequent processing (such as storing, updating, rearranging, or printing out) by a computer.



909 questions
8
votes
3 answers

Plotting many lines as a heatmap

I have a large number (~1000) of files from a data logger that I am trying to process. If I wanted to plot the trend from a single one of these log files, I could do it using plot(timevalues,datavalues). I would like to be able to view all of these…
Hugoagogo
  • 1,598
  • 16
  • 34
7
votes
1 answer

what is "file_like_object", what is "file"; pickle.load() and pickle.loads()

I am trying to figure out the differences between pickle.load() and pickle.loads(). Somebody said that the kind of object pickle.load() processes is a "file_like_object", whereas pickle.loads() corresponds to a "file object".
Eric Kani
  • 809
  • 3
  • 10
  • 17
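
For orientation, in Python's standard library `pickle.load()` reads from an open binary file or file-like object, while `pickle.loads()` takes a bytes object directly. The sketch below illustrates that distinction; the `record` dictionary and the use of an in-memory `io.BytesIO` buffer in place of a file on disk are assumptions made for the example, not part of the question.

```python
import io
import pickle

# Hypothetical sample data, used only for illustration.
record = {"sensor": "A1", "values": [1, 2, 3]}

# dumps/loads work on bytes objects held in memory.
payload = pickle.dumps(record)
from_bytes = pickle.loads(payload)

# dump/load work on binary file-like objects (anything with write/read).
buffer = io.BytesIO()
pickle.dump(record, buffer)
buffer.seek(0)  # rewind so load() starts reading at the beginning
from_file = pickle.load(buffer)
```

A file opened with `open(path, "rb")` can be passed to `pickle.load()` in the same way as the buffer here.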
7
votes
2 answers

How to use very large dataset in RNN TensorFlow?

I have a very large dataset: 7.9 GB of CSV files, 80% of which shall serve as the training data, while the remaining 20% shall serve as test data. When I load the training data (6.2 GB), I get a MemoryError at the 80th iteration (80th file).…
afagarap
  • 650
  • 2
  • 10
  • 22
7
votes
3 answers

How to get the topic associated with each document using PySpark (2.1.0) LDA?

I am using the LDAModel of pyspark to get topics from a corpus. My goal is to find the topics associated with each document. For that purpose I tried to set topicDistributionCol as per the docs. Since I am new to this, I am not sure what the purpose of this…
7
votes
6 answers

Is there a way to combine these queries?

I have begun working some of the programming problems on HackerRank as a "productive distraction". I was working on the first few in the SQL section and came across this problem (link): Query the two cities in STATION with the shortest and longest…
mbm29414
  • 11,558
  • 6
  • 56
  • 87
7
votes
1 answer

How to gracefully fallback to `NaN` value while reading integers from a CSV with Pandas?

While using read_csv with Pandas, if I want a given column to be converted to a type, a malformed value will interrupt the whole operation, without any indication of the offending value. For example, running something like: import pandas as…
danza
  • 11,511
  • 8
  • 40
  • 47
7
votes
4 answers

Remove rows from a dataframe that contain only 0 or just a single 0

I am trying to create a function in R that will allow me to filter my data set based on whether a row contains a single column with a zero in it. Furthermore, sometimes I only want to remove rows that are zero in all columns. Also, and this is where…
KnightofniDK
  • 139
  • 1
  • 1
  • 9
7
votes
4 answers

Using Hibernate to load 20K products, modify the entities, and update the DB

I am using Hibernate to update 20K products in my database. As of now I am pulling in the 20K products, looping through them, modifying some properties, and then updating the database. So: load products foreach products session…
Blankman
  • 259,732
  • 324
  • 769
  • 1,199
7
votes
5 answers

What is the best way to reduce cyclomatic complexity when validating data?

Right now I'm working on a web application that receives a significant amount of data from a database that has a potential to return null results. When going through the cyclomatic complexity for the application a number of functions are weighing in…
rjzii
  • 14,236
  • 12
  • 79
  • 119
7
votes
1 answer

Replacing numbers within a range with a factor

Given a dataframe column which is a series of integers (age), I want to convert ranges of integers into ordinal variables. My current code doesn't work. How do I do this? df <- read.table("http://dl.dropbox.com/u/822467/df.csv", header = TRUE, sep =…
RJ-
  • 2,919
  • 3
  • 28
  • 35
6
votes
2 answers

Hive bucketing through Spark SQL

I have a question about bucketing in Hive. I have created a temporary table which is bucketed on the column key. I am inserting data into this temporary table through Spark SQL. I have set hive.enforce.bucketing to true in spark…
Sumit D
  • 171
  • 1
  • 3
  • 14
6
votes
2 answers

How can I read specific data columns from a file in C

Good day all, I am a beginner in C programming. I have this problem and have spent a huge amount of time on it without any considerable progress. My problem is as follows: I have a series of files with the extension (.msr), they contain…
chriscol
  • 91
  • 1
  • 2
  • 4
6
votes
5 answers

Create new binary variables from single string of levels recorded for each observation

I have been fiddling with the Kaggle West-Nile Virus competition data as a means to practice fitting a spatio-temporal GAM. The first few rows of the (somewhat processed from the original CSV) weather data are below (plus the first 20 rows a…
Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
6
votes
1 answer

Relational database versus R/Python data frames

I was exposed to the world of tables and data structures in R before RDBMSs and other database systems. It is quite elegant in R/Python to create tables and lists from structured data (.csv or other formats) and then do data manipulations…
6
votes
4 answers

How to read a 4GB file on a 32-bit system

In my case I have different files; let's assume that I have a >4GB file with data. I want to read that file line by line and process each line. One of my restrictions is that the software has to run on 32-bit MS Windows or on 64-bit with a small amount of RAM…
bioky
  • 71
  • 1
  • 7
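
Reading a file line by line, as the question above describes, keeps memory use roughly bounded by the longest line rather than by the file size. The excerpt does not say which language the asker uses, so the following is only a minimal Python sketch of the idea; the function name `count_matching_lines`, the predicate, and the small temporary demo file are all hypothetical.

```python
import os
import tempfile


def count_matching_lines(path, predicate):
    """Stream a text file one line at a time and count lines the predicate accepts."""
    count = 0
    # Iterating over the file object reads lazily; the whole file is never loaded.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if predicate(line.rstrip("\n")):
                count += 1
    return count


# Tiny demo file standing in for the multi-gigabyte input (an assumption).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("alpha\nbeta\nalpha beta\n")
    demo_path = tmp.name

demo_count = count_matching_lines(demo_path, lambda text: "alpha" in text)
os.remove(demo_path)
```

The same loop shape applies to any per-line processing: only the body of the `for` loop changes, and no step needs the full file in memory.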