Questions tagged [data-processing]

Data Processing concerns the converting of raw data to machine-readable form and its subsequent processing (as storing, updating, rearranging, or printing out) by a computer.

909 questions
-1
votes
2 answers

R - Sorting a large amount of data efficiently - memory issue

Obligatory system setup in case it helps: running Windows 10, R 3.2.3, Intel Core i7 2600K, 16 GB RAM. R is set to have access to as much RAM as it wants. Hello! I have a few hundred files, each with a data frame of size Nx29 or Nx31. I am combining…
Jibril
  • 967
  • 2
  • 11
  • 29
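For the R question above, one common way to sort more data than fits in memory is an external merge sort: sort fixed-size chunks, spill each sorted run to disk, then do a k-way merge of the runs. The sketch below is in Python rather than R and assumes each record has already been reduced to a single integer sort key written one per line; `external_sort` and `chunk_size` are hypothetical names, not part of the question's code.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_sorted_run(chunk: List[int], directory: str) -> str:
    """Sort one in-memory chunk and spill it to a temp file, one key per line."""
    chunk.sort()
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{key}\n" for key in chunk)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a sorted run back from disk without loading it whole."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(keys: Iterable[int], out_path: str, chunk_size: int = 1_000_000) -> None:
    """Sort an arbitrarily long stream of integer keys into out_path,
    holding at most chunk_size keys in memory at once (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        runs: List[str] = []
        chunk: List[int] = []
        for key in keys:
            chunk.append(key)
            if len(chunk) >= chunk_size:
                runs.append(_write_sorted_run(chunk, tmp))
                chunk = []
        if chunk:
            runs.append(_write_sorted_run(chunk, tmp))
        # heapq.merge performs a lazy k-way merge of the already-sorted runs.
        with open(out_path, "w") as out:
            for key in heapq.merge(*(_read_run(p) for p in runs)):
                out.write(f"{key}\n")
```

Peak memory is bounded by `chunk_size` plus one buffered line per run, at the cost of writing everything to disk twice.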
-1
votes
1 answer

Document arranging based on similarity using TF-IDF

I want to rank 100 documents based on similarity. For example 10 documents will be similar say (A, A', A'', A''',...) and another set of 10 documents could be similar say (B, B', B'', B''', ...). Now documents should be ranked as A, A'', A''', ...,…
Hemanthkumar
  • 51
  • 1
  • 6
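A minimal Python sketch of one way to approach the TF-IDF question above: build TF-IDF vectors with the standard library, then produce an ordering by greedily chaining each document to its most cosine-similar unvisited neighbour, so that similar documents (A, A', A'' ...) tend to end up adjacent. The whitespace tokenizer, the smoothed IDF formula, and the greedy chaining heuristic are all assumptions of this sketch, not the asker's method; a clustering step would be another option.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """TF-IDF weights per document, with a smoothed IDF so no weight is zero."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_order(docs: List[str]) -> List[int]:
    """Greedy chain: start at doc 0, repeatedly append the most similar unvisited doc."""
    if not docs:
        return []
    vecs = tfidf_vectors(docs)
    order = [0]
    remaining = set(range(1, len(docs)))
    while remaining:
        last = vecs[order[-1]]
        # Ties are broken in favour of the lower document index.
        nxt = max(remaining, key=lambda i: (cosine(last, vecs[i]), -i))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

The greedy chain is O(n²) similarity computations, which is fine for 100 documents.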
-1
votes
1 answer

Logical error in Python script, stuck on indentation

The goals of this script are simple: read in a .csv file, strip out instances of the escape sequence &amp; and replace them with &, and eliminate all rows that don't satisfy the following criteria: validate the lines to ensure that they have no more or…
smatthewenglish
  • 2,831
  • 4
  • 36
  • 72
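A sketch of the cleaning steps described in the Python question above, assuming the escape sequence is the HTML entity `&amp;` and that the (truncated) validation criterion is an exact column count; `clean_rows` and `expected_fields` are hypothetical names. All per-row logic sits inside a single loop body.

```python
import csv
from typing import Iterable, List


def clean_rows(lines: Iterable[str], expected_fields: int) -> List[List[str]]:
    """Keep rows with exactly expected_fields columns and unescape '&amp;' to '&'."""
    kept = []
    for row in csv.reader(lines):
        # Drop rows with more or fewer fields than expected.
        if len(row) != expected_fields:
            continue
        kept.append([field.replace("&amp;", "&") for field in row])
    return kept
```

With a real file, pass the open file object (opened with `newline=""`, as the `csv` module documentation recommends) instead of a list of strings.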
-1
votes
3 answers

Excel: Get the most frequent value for each group

I have a table (Excel) with two columns (Time 'hh:mm:ss', Value) and I want to get the most frequent value for each group of rows. For example, I have Time | Value 4:35:49 | 122 4:35:49 | 122 4:35:50 | 121 4:35:50 | 121 4:35:50 | 111 4:35:51 |…
Horthe92
  • 53
  • 1
  • 1
  • 6
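The question asks about Excel, but the underlying operation is a per-group mode. A minimal Python sketch of that operation, using the sample rows visible in the excerpt; `mode_per_group` is a hypothetical name, and breaking ties in favour of the value that appears first is an assumption, since the question does not say.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def mode_per_group(rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Most frequent Value for each Time group."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for time, value in rows:
        counts[time][value] += 1
    # most_common(1) returns the highest count; ties go to the value seen first.
    return {time: c.most_common(1)[0][0] for time, c in counts.items()}
```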
-1
votes
4 answers

How to send a List between activities through Intents in Android if the list is too big

What is the best way to send a large amount of data through Intents from one activity to another? Should I split the data into small amounts and send them one by one through Intents? If it takes a lot of Intents, such as around 1000, to send the…
sohan nohemy
  • 615
  • 5
  • 13
-1
votes
1 answer

Breaking down a SAS macro into pseudocode

I need to break down this SAS macro that adds suffixes to some number of variables into pseudocode, but there are some parts of it I don't fully understand. %macro add_suffix(lib,dsn, suffix); options pageno=1 nodate; OPTIONS OBS= 1; …
Danzo
  • 553
  • 3
  • 13
  • 26
-1
votes
1 answer

Elegant data processing in Python

I don't even know how to name my problem. I have a list of tuples in Python: (int, str, datetime, float). There are a bunch of rows in that list, sorted by datetime, and I'd like to count how many rows in a 5-minute time span, which have…
bartekmp
  • 403
  • 3
  • 9
  • 21
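A sketch for the 5-minute counting question above, assuming rows are `(int, str, datetime, float)` tuples as described and that windows are fixed 5-minute buckets aligned to midnight (sliding windows would need a different approach). The excerpt cuts off before stating which rows should be counted, so `predicate` is a hypothetical filter you would supply, and `count_per_window` is likewise a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str, datetime, float]


def count_per_window(
    rows: List[Row],
    predicate: Callable[[Row], bool],
    window: timedelta = timedelta(minutes=5),
) -> Dict[datetime, int]:
    """Count rows matching predicate, keyed by the start of their time window."""
    counts: Counter = Counter()
    for row in rows:
        if not predicate(row):
            continue
        ts = row[2]
        # Floor the timestamp to the start of its window, aligned to midnight.
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + ((ts - midnight) // window) * window
        counts[start] += 1
    return dict(counts)
```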
-1
votes
1 answer

How to send JSON data via curl?

I have some simple code, like this: import json from bottle import route, request,run @route('/process_json',methods='POST') def data_process(): data = json.loads(request.data) username = data['username'] password =…
szarad
  • 81
  • 1
  • 7
-1
votes
4 answers

What are some good Perl modules for flow-based programming on files?

What are some good Perl modules to process files based on configurations? Basically, I am working on taking data files, splitting them into columns, removing some rows based on some columns, removing unnecessary columns, comparing them to a baseline (writes…
kthakore
  • 1,566
  • 3
  • 17
  • 32
-2
votes
2 answers

If/else using datetime in Python

Name No.ID Date/Time No.PIN Verification Code Alif 100061 17/12/2022 07:53:26 Fingerprint Alif 100061 17/12/2022 13:00:25 Fingerprint Alif 100061 19/12/2022 07:54:59 Fingerprint Alif 100061 19/12/2022 16:18:14 Sidik…
Akazadi
  • 55
  • 6
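The excerpt shows day-first timestamps such as `17/12/2022 07:53:26`. A minimal Python sketch of parsing them with `datetime.strptime` and branching with if/else; the rule itself (scans before 12:00 count as check-in, later ones as check-out) and the `classify_scan` name are hypothetical, since the question is truncated before stating its condition.

```python
from datetime import datetime, time


def classify_scan(timestamp: str, cutoff: time = time(12, 0)) -> str:
    """Hypothetical rule: scans before `cutoff` are check-ins, the rest check-outs."""
    # Sample values look like "17/12/2022 07:53:26" (day/month/year).
    scanned = datetime.strptime(timestamp, "%d/%m/%Y %H:%M:%S")
    if scanned.time() < cutoff:
        return "check-in"
    else:
        return "check-out"
```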
-2
votes
1 answer

How to make an index column based on an existing column with a specific pattern in R?

I have a column named set in a data frame df, which looks like df <- data.frame(set <- c("","","","","","set","","","","","set","","","","","set")). Now I want a column set_sequence based on the pattern in column set, which should look like: df <-…
cd21
  • 23
  • 3
-2
votes
2 answers

How can I filter by value on a multi-valued column using awk?

I would like to use awk to filter out a multi-valued column. My data has two columns separated by the delimiter ;. The second column has three float values separated by whitespace. randUni15799:1;0.00 0.00 0.00 randUni1785:1;0.00 0.00…
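The question asks for awk; the same logic is sketched here in Python: split each line on `;`, split the second field on whitespace, and test the three floats. The excerpt does not say which values should be filtered out, so the rule used here (keep rows where at least one value exceeds a hypothetical `threshold`) and the `filter_rows` name are assumptions.

```python
from typing import Iterable, Iterator


def filter_rows(lines: Iterable[str], threshold: float = 0.0) -> Iterator[str]:
    """Yield lines whose second column has at least one value above threshold."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumes every non-blank line contains exactly one ';' separator.
        _name, values = line.split(";", 1)
        floats = [float(v) for v in values.split()]
        if any(v > threshold for v in floats):
            yield line
```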
-2
votes
1 answer

Delete text and all newline characters between 2 words in Python

I have the following text: \nOUTPUTFORMAT \n \'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat\'\nLOCATION\n \'hdfs://nameservice1/user/hive/warehouse/dev_cmt.db/badge\'\nTBLPROPERTIES (\n …
Shiva
  • 212
  • 2
  • 11
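A minimal Python sketch for removing everything between two marker words, newlines included, using a regular expression compiled with the `re.DOTALL` flag so that `.` also matches `\n`. It assumes both marker words should be kept, joined by a single space; `delete_between` and its `sep` parameter are hypothetical names.

```python
import re


def delete_between(text: str, start: str, end: str, sep: str = " ") -> str:
    """Remove everything between the first `start` and the next `end`,
    newlines included, keeping both marker words joined by `sep`."""
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), flags=re.DOTALL)
    # re.DOTALL makes '.' match '\n'; the lazy '.*?' stops at the nearest `end`.
    return pattern.sub(lambda m: start + sep + end, text, count=1)
```

For example, `delete_between(ddl, "LOCATION", "TBLPROPERTIES")` on text like the excerpt's (held in a hypothetical `ddl` variable) would drop the HDFS path line while keeping both keywords.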
-2
votes
1 answer

correlation failure - Pearson

I want to write correlation information to a data file as follows: korelacja=cor(p2,d2,method="pearson",use = "complete.obs") korelacja2=cor(p2,d2,method="kendall",use = "complete.obs") korelacja3=cor(p2,d2,method="spearman",use =…
Mateusz
  • 49
  • 4
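The excerpt uses R's `cor(..., use = "complete.obs")`. A Python sketch of the same Pearson computation, dropping incomplete pairs first; it returns `None` where R would return `NA`, for example when fewer than two complete pairs remain or when one column is constant (zero standard deviation), which is one common cause of a correlation "failure". Whether that is the asker's actual problem is not visible in the truncated excerpt, and `pearson_complete_obs` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def pearson_complete_obs(
    x: Sequence[Optional[float]], y: Sequence[Optional[float]]
) -> Optional[float]:
    """Pearson correlation over pairs where both values are present."""
    # Mimic use = "complete.obs": drop pairs with a missing or NaN value.
    pairs = [
        (a, b) for a, b in zip(x, y)
        if a is not None and b is not None and not math.isnan(a) and not math.isnan(b)
    ]
    if len(pairs) < 2:
        return None
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation (R gives NA)
    return sxy / math.sqrt(sxx * syy)
```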
-2
votes
3 answers

Finding the set difference(A-B) between two 1.75 GB CSV files containing 50 million rows

I have two files with 50 million rows each, each 1.75 GB in size. I am unable to load them into Google Colab or my computer to run a Python script to find the set difference (A-B). My computer and the Colab notebook crash when I try to load the…
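A sketch of one way to compute A-B without loading either 1.75 GB file at once: hash-partition both files into buckets on disk so identical rows always land in the same bucket, then take the set difference bucket by bucket. It assumes rows are compared as exact text lines (same formatting and column order in both files); `set_difference` and `n_buckets` are hypothetical names, and the output order is not preserved.

```python
import hashlib
import os
import tempfile
from typing import List


def _bucket(row: str, n_buckets: int) -> int:
    # hashlib gives a stable hash; built-in hash() of str is salted per process.
    digest = hashlib.md5(row.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


def _partition(path: str, directory: str, prefix: str, n_buckets: int) -> List[str]:
    """Stream one file into n_buckets smaller files by row hash."""
    paths = [os.path.join(directory, f"{prefix}_{i}.txt") for i in range(n_buckets)]
    outs = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                row = line.rstrip("\n")
                outs[_bucket(row, n_buckets)].write(row + "\n")
    finally:
        for out in outs:
            out.close()
    return paths


def set_difference(path_a: str, path_b: str, out_path: str, n_buckets: int = 64) -> None:
    """Write the distinct rows of A that do not appear in B (order not preserved)."""
    with tempfile.TemporaryDirectory() as tmp:
        parts_a = _partition(path_a, tmp, "a", n_buckets)
        parts_b = _partition(path_b, tmp, "b", n_buckets)
        with open(out_path, "w", encoding="utf-8") as out:
            for part_a, part_b in zip(parts_a, parts_b):
                # Only one bucket of B is held in memory at a time.
                with open(part_b, encoding="utf-8") as f:
                    rows_b = {line.rstrip("\n") for line in f}
                written = set()
                with open(part_a, encoding="utf-8") as f:
                    for line in f:
                        row = line.rstrip("\n")
                        if row not in rows_b and row not in written:
                            written.add(row)
                            out.write(row + "\n")
```

With 64 buckets, each bucket of a 1.75 GB file is roughly 27 MB of raw text, so the per-bucket Python sets stay far below what loading a whole file would need.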