The process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.
Questions tagged [data-munging]
236 questions
0
votes
1 answer
Combining different column values in data.table from a table to form row values of another table
I have the following table, DT-1. The columns represent different states.
id col1 col2 col3 col4 col5 col6
qw-1 ABC XYZ QRT RWQ OIP KIJ
qw-2 WET ERT YUP TIP IUR ETY
qw-3 QRT ERT RWQ YUP 0 0
qw-4 XYZ …

amjear
- 75
- 1
- 1
- 4
0
votes
1 answer
rounding of numbers using R
I want to convert the following numbers as shown below. I have tried every method I could think of, but I am unable to get the values I expect.
value round off value
0.0 - 4.9 0
5.0 - 5.9 6
6.0 - 6.9 7
7.0 - 7.9 …

Dr.jeon
- 15
- 4
0
votes
1 answer
Converting Summary Table of Binary Outcome to Long Tidy DataFrame
I want to convert a table that has several categorical variables, as well as a summary of the results of a binary experiment, to long format so that I can easily run a logistic regression model.
Is there an easy way to do this that does not involve just making a…

user3658457
- 275
- 3
- 11
0
votes
1 answer
Convert formatted hours to rounded decimal hours
What I want is this:
These numbers are extracted from a script that I run on Selenium IDE. The problem is that those numbers come as text.
What I need is to convert, for example, "56h 29m" to "56.5" ... "27.75h" to "27.75".
Some of them are easy…

ryoishikawa74
- 177
- 3
- 11
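For the hours-conversion question above, a minimal Python sketch follows. It assumes the inputs look like the two quoted examples ("56h 29m" and "27.75h") and that "rounded" means rounding to the nearest quarter hour, which is consistent with both examples but is not stated in the excerpt. The function name `to_decimal_hours` is hypothetical.

```python
import re

# Accepts "<hours>h" with an optional "<minutes>m" part, e.g. "56h 29m" or "27.75h".
_DURATION = re.compile(r"\s*(\d+(?:\.\d+)?)h(?:\s*(\d+)m)?\s*")


def to_decimal_hours(text: str) -> float:
    """Convert a formatted duration string to decimal hours.

    Assumption: results are rounded to the nearest quarter hour.
    """
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours = float(match.group(1))
    minutes = int(match.group(2) or 0)
    total = hours + minutes / 60
    # Scale to quarter hours, round, and scale back.
    return round(total * 4) / 4


print(to_decimal_hours("56h 29m"))
print(to_decimal_hours("27.75h"))
```

Strings that do not match the assumed pattern raise `ValueError` rather than being guessed at, so unexpected formats surface early.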
0
votes
1 answer
Calling Perl code from a WPF application
I want some sample code to learn how to call a Perl module from a WinForms/WPF application. The module processes input from an XML file (basically, I use Perl's data munging features) and either returns a transformed XML file in the directory or returns an error if…

iceman
- 4,211
- 13
- 65
- 92
0
votes
1 answer
Use a dataframe's column to select rows from another frame in the same order
Need some pandas jump start here:
Consider two data frames A and B. Both contain a column id with identifier values:
A: id valA
8 ?
2 ?
4 ?
B: id valB valC
1 ? ?
4 ? ?
…

clstaudt
- 21,436
- 45
- 156
- 239
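For the row-selection question above, one possible approach is a left merge on `id`, which keeps the row order of the left frame. This is a sketch only: the excerpt elides the actual values, so every value in `A` and `B` below is hypothetical, as are the extra rows in `B`.

```python
import pandas as pd

# Hypothetical data; only the column names and the first ids come from the excerpt.
A = pd.DataFrame({"id": [8, 2, 4], "valA": [0.8, 0.2, 0.4]})
B = pd.DataFrame(
    {
        "id": [1, 4, 2, 8],
        "valB": [10, 40, 20, 80],
        "valC": ["a", "d", "b", "h"],
    }
)

# A left merge keeps the key order of the left frame, so the rows of B
# come back in the order in which their ids appear in A.
selected = A[["id"]].merge(B, on="id", how="left")
print(selected)
```

Ids present in `A` but missing from `B` would appear as rows filled with `NaN`; whether that is the desired behavior depends on the part of the question that is not shown.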
0
votes
1 answer
Building Sentences from a dataframe in R
I'm trying to generate sentences from a dataframe.
Below is the dataframe
# Code
mycode <- c("AAABBB", "AAABBB", "AAACCC", "AAABBD")
mycode <- sample(mycode, 20, replace = TRUE)
# Date
mydate…

John Smith
- 2,448
- 7
- 54
- 78
0
votes
1 answer
Split text using any suggested method
I have a plain text like this:
Cart ID: A3N42M / Copy: A3N42P PO: 5000021337 Invoice: 3110021337
Cart ID: A3N3ZW / Copy: A3N3ZX/ PO: 5000021335 Invoice: 3110021335
Cart ID: A3N3ZL / Copy: A3N3ZM PO: 5000021336 Invoice: 3110021336
Original: A3N444…

ryoishikawa74
- 177
- 3
- 11
0
votes
0 answers
SAS dataset count winning streak on baseball
Hi: I am processing a baseball dataset. I want to count a team's winning streak. I created a variable called win: if team A wins it is 1, else it is 0. I want to create a variable called winstreak: if team A wins 1 time, it is 1, if team…

Richard Li
- 21
- 1
- 9
0
votes
0 answers
Execute data munging steps on each components of a list in parallel
I have a list with two data.table objects in it. To give an idea, one table has 400,000 rows & 7 variables, the other has 750,000 rows & 12 variables. The two tables don't have the same columns. I do a lot of munging (different steps for each) on them.…

JeanVuda
- 1,738
- 14
- 29
0
votes
3 answers
Data munging and data import scripting
I need to write some scripts to carry out some tasks on my server (running Ubuntu Server 8.04 LTS). The tasks are to be run periodically, so I will be running the scripts as cron jobs.
I have divided the tasks into "group A" and "group B" - because…

morpheous
- 16,270
- 32
- 89
- 120
-1
votes
2 answers
Create new column based on other columns from a different dataframe
I have 2 dataframes:
df1
Time Apples Pears Grapes Peachs
10:00 3 5 5 2
11:00 1 0 2 9
12:00 20 2 7 3
df2
Class Item Factor
A Apples 3
A Peaches 2
A …

star_it8293
- 399
- 3
- 12
-1
votes
2 answers
Data manipulation in pandas on monthly, quarterly and annual level on multiple columns
I need to create a function that takes a dictionary as input and updates column values in the dataframe. My data looks…

Amit Kumar
- 154
- 12
-1
votes
1 answer
pandas group many columns to one column where every cell is a list of values
I have the dataframe
df =
c1 c2 c3 c4 c5
1. 2. 3. 1. 5
8. 2. 1. 3. 8
4. 9. 1 2. 3
And I want to group all columns into a single list column that will be the only column, so I will get:
df =
l
[1,2,3,1,5]
[8,2,1,3,8]
[4,9,1,2,3]
(Shape of df was…

Cranjis
- 1,590
- 8
- 31
- 64
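For the question above about grouping all columns into one list column, a minimal pandas sketch is shown here. It assumes the values are the integers that appear in the expected output; the variable name `result` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "c1": [1, 8, 4],
        "c2": [2, 2, 9],
        "c3": [3, 1, 1],
        "c4": [1, 3, 2],
        "c5": [5, 8, 3],
    }
)

# Turn each row into a plain Python list and store it in a single column "l".
result = pd.DataFrame({"l": df.to_numpy().tolist()}, index=df.index)
print(result)
```

The resulting frame has one column and the same number of rows as `df`.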
-1
votes
1 answer
Pandas how to explode several items of list for each new row
I have a dataframe:
c1. c2. c3. l
1. 2. 3 [1,2,3,4,5,6,7]
3. 4. 8. [8,9,0]
I want to explode it such that every 3 elements from each list in the column l become a new row, with a column for the triplet index within the original list.…

Cranjis
- 1,590
- 8
- 31
- 64
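For the explode question above, one possible sketch splits each list into chunks of three, explodes the chunks into rows, and numbers them per original row. The expected output is cut off in the excerpt, so two things here are assumptions: a trailing chunk shorter than three elements is kept as-is, and the index column is named `triplet_idx`. The helper `chunk` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "c1": [1, 3],
        "c2": [2, 4],
        "c3": [3, 8],
        "l": [[1, 2, 3, 4, 5, 6, 7], [8, 9, 0]],
    }
)


def chunk(values, size=3):
    """Split a list into consecutive pieces of at most `size` elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# Replace each list by a list of chunks, then give every chunk its own row.
out = df.assign(l=df["l"].apply(chunk)).explode("l")
# Rows that came from the same original row share an index label,
# so a cumulative count per label gives the chunk position.
out["triplet_idx"] = out.groupby(level=0).cumcount()
out = out.reset_index(drop=True)
print(out)
```

`DataFrame.explode` requires pandas 0.25 or later.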