Questions tagged [data-munging]

The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or input to an algorithm or system.

236 questions
0
votes
1 answer

Combining different column values in data.table from a table to form row values of another table

I have the following table DT-1. The columns represent different states: id col1 col2 col3 col4 col5 col6 qw-1 ABC XYZ QRT RWQ OIP KIJ qw-2 WET ERT YUP TIP IUR ETY qw-3 QRT ERT RWQ YUP 0 0 qw-4 XYZ …
amjear
  • 75
  • 1
  • 1
  • 4
0
votes
1 answer

rounding of numbers using R

I want to convert the following numbers in this way. I tried to use all possible methods, but I am unable to get the value I expected: value round off value 0.0 - 4.9 0 5.0 - 5.9 6 6.0 - 6.9 7 7.0 - 7.9 …
Dr.jeon
  • 15
  • 4
0
votes
1 answer

Converting Summary Table of Binary Outcome to Long Tidy DataFrame

I want to convert a table that has several categorical variables, as well as summary of the result of a binary experiment to long format to easily run a logistic regression model. Is there an easy way to do this that does not involve just making a…
user3658457
  • 275
  • 3
  • 11
0
votes
1 answer

Convert formatted hours to rounded decimal hours

What I want is this: These numbers are extracted from a script that I run on Selenium IDE. The problem is that those numbers come as text. What I need is to convert, for example, "56h 29m" to "56.5" ... "27.75h" to "27.75". Some of them are easy…
ryoishikawa74
  • 177
  • 3
  • 11
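For the hours-conversion question above, a minimal Python sketch of one way to parse such strings. The helper name `to_decimal_hours` and the quarter-hour rounding step are assumptions: the excerpt only shows "56h 29m" becoming "56.5" and "27.75h" becoming "27.75", which is consistent with rounding to the nearest 0.25 h but does not confirm it.

```python
import re

# Accepts strings such as "56h 29m", "27.75h", or "45m".
# The accepted input shapes are an assumption based on the two examples in the excerpt.
_DURATION = re.compile(r"^\s*(?:(\d+(?:\.\d+)?)\s*h)?\s*(?:(\d+)\s*m)?\s*$")


def to_decimal_hours(text, step=0.25):
    """Convert a formatted duration to decimal hours, rounded to a multiple of `step`."""
    match = _DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours = float(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    total = hours + minutes / 60
    # Round to the nearest multiple of `step` (assumed to be a quarter hour).
    return round(total / step) * step


print(to_decimal_hours("56h 29m"))
print(to_decimal_hours("27.75h"))
```

If a different granularity is wanted, `step` can be changed (for example `step=0.5` for half hours).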
0
votes
1 answer

Calling Perl code from a WPF application

I want some sample code to learn how to call a Perl module that processes input from an XML file in a WinForms/WPF application and returns a transformed XML file (basically I use Perl's data munging features) in the directory, or returns an error if…
iceman
  • 4,211
  • 13
  • 65
  • 92
0
votes
1 answer

Use a dataframe's column to select rows from another frame in the same order

Need some pandas jump start here: Consider two data frames A and B. Both contain a column id with identifier values: A: id valA 8 ? 2 ? 4 ? B: id valB valC 1 ? ? 4 ? ? …
clstaudt
  • 21,436
  • 45
  • 156
  • 239
0
votes
1 answer

Building Sentences from a dataframe in R

I'm trying to generate sentences from a dataframe. Below is the dataframe: # Code mycode <- c("AAABBB", "AAABBB", "AAACCC", "AAABBD") mycode <- sample(mycode, 20, replace = TRUE) # Date mydate…
John Smith
  • 2,448
  • 7
  • 54
  • 78
0
votes
1 answer

Split text using any suggested method

I have a plain text like this: Cart ID: A3N42M / Copy: A3N42P PO: 5000021337 Invoice: 3110021337 Cart ID: A3N3ZW / Copy: A3N3ZX/ PO: 5000021335 Invoice: 3110021335 Cart ID: A3N3ZL / Copy: A3N3ZM PO: 5000021336 Invoice: 3110021336 Original: A3N444…
ryoishikawa74
  • 177
  • 3
  • 11
0
votes
0 answers

SAS dataset count winning streak on baseball

Hi: I am processing a baseball dataset. I want to count a team's winning streak. I created a variable called win: if team A wins it is 1, else it is 0. I want to create a variable called winstreak: if team A wins 1 time, it is 1, if team…
Richard Li
  • 21
  • 1
  • 9
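For the winning-streak question above, a short sketch of the running-counter idea. It is written in Python rather than SAS, and it assumes the streak resets to 0 on every loss, which the truncated excerpt does not state explicitly. The function name `win_streak` is hypothetical.

```python
def win_streak(wins):
    """Return the running streak length for a sequence of 1 (win) / 0 (loss) flags."""
    streak = 0
    result = []
    for won in wins:
        # Extend the streak on a win; reset it on a loss (assumed behavior).
        streak = streak + 1 if won == 1 else 0
        result.append(streak)
    return result


print(win_streak([1, 1, 0, 1, 1, 1, 0]))  # [1, 2, 0, 1, 2, 3, 0]
```

The same pattern maps onto a retained counter variable in a row-by-row language: carry the counter across rows, increment on a win, zero it on a loss.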
0
votes
0 answers

Execute data munging steps on each component of a list in parallel

I have a list with two data.table objects in it. To give an idea, one table has 400,000 rows & 7 variables, the other has 750,000 rows & 12 variables. The two tables don't have the same columns. I do a lot of munging (different steps for each) on them.…
0
votes
3 answers

Data munging and data import scripting

I need to write some scripts to carry out some tasks on my server (running Ubuntu server 8.04 LTS). The tasks are to be run periodically, so I will be running the scripts as cron jobs. I have divided the tasks into "group A" and "group B" - because…
morpheous
  • 16,270
  • 32
  • 89
  • 120
-1
votes
2 answers

Create new column based on other columns from a different dataframe

I have 2 dataframes: df1 Time Apples Pears Grapes Peachs 10:00 3 5 5 2 11:00 1 0 2 9 12:00 20 2 7 3 df2 Class Item Factor A Apples 3 A Peaches 2 A …
star_it8293
  • 399
  • 3
  • 12
-1
votes
2 answers

Data manipulation in pandas on monthly, quarterly and annual level on multiple columns

I need to create a function which takes a dictionary as input and updates column values in the dataframe. My data looks…
-1
votes
1 answer

pandas group many columns to one column where every cell is a list of values

I have the dataframe df = c1 c2 c3 c4 c5 1. 2. 3. 1. 5 8. 2. 1. 3. 8 4. 9. 1 2. 3 And I want to group all columns into a single list that will be the only column, so I will get: df = l [1,2,3,1,5] [8,2,1,3,8] [4,9,1,2,3] (Shape of df was…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
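For the list-column question above, a minimal pandas sketch that collapses each row into one list. The sample values are taken from the excerpt and treated as integers, which is an assumption; the variable name `out` is illustrative only.

```python
import pandas as pd

# Sample frame rebuilt from the excerpt (values assumed to be integers).
df = pd.DataFrame(
    {
        "c1": [1, 8, 4],
        "c2": [2, 2, 9],
        "c3": [3, 1, 1],
        "c4": [1, 3, 2],
        "c5": [5, 8, 3],
    }
)

# Each row becomes a Python list; the result has a single column named "l".
out = pd.DataFrame({"l": df.to_numpy().tolist()}, index=df.index)

print(out)
```

Going through `to_numpy()` is usually faster than a row-wise `apply`, at the cost of coercing mixed column types to a common dtype.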
-1
votes
1 answer

Pandas how to explode several items of list for each new row

I have a dataframe: c1. c2. c3. l 1. 2. 3 [1,2,3,4,5,6,7] 3. 4. 8. [8,9,0] I want to explode it such that every 3 elements from each list in the column l become a new row, with a column for the triplet index within the original list.…
Cranjis
  • 1,590
  • 8
  • 31
  • 64
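For the triplet-explode question above, one possible pandas sketch: split each list into chunks of three, explode the chunks into rows, then number the chunks per source row. The names `chunks`, `source_row`, and `triplet_idx` are hypothetical, and keeping a shorter trailing chunk (such as `[7]`) is an assumption, since the excerpt is truncated before it says how leftovers should be handled.

```python
import pandas as pd

# Sample frame rebuilt from the excerpt.
df = pd.DataFrame(
    {
        "c1": [1, 3],
        "c2": [2, 4],
        "c3": [3, 8],
        "l": [[1, 2, 3, 4, 5, 6, 7], [8, 9, 0]],
    }
)


def chunks(values, size=3):
    """Split a list into consecutive pieces of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


out = (
    df.assign(l=df["l"].apply(chunks))  # each cell: a list of triplets
    .explode("l")                       # one row per triplet
    .rename_axis("source_row")
    .reset_index()                      # keep the original row label as a column
)
# Position of each triplet within its original list.
out["triplet_idx"] = out.groupby("source_row").cumcount()

print(out)
```

`DataFrame.explode` requires pandas 0.25 or later.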