Data munging is the process of taking raw data and parsing, filtering, extracting, organizing, combining, cleaning, or otherwise converting it into a consistent, usable form for further processing or as input to an algorithm or system.
Questions tagged [data-munging]
236 questions
0
votes
0 answers
Data Munging Challenge. How do I join the correct coefficients to the correct observation in a summarized table?
Before I start, a basic answer to this question can be found here:
Correctly binding coefficients to summarized table
This question differs in that I need to join the correct coefficients to the correct position in the…

Jordan
0
votes
1 answer
Correctly binding coefficients to summarized table
I have a glm model and a summarized dataset that requires I bind the coefficients, standard error and p.value from the summary of the model to the summarized dataset. For an example, I used the mtcars data set. I added columns to the final unioned…

Jordan
0
votes
1 answer
data cleaning - conversion to tidyverse
I am curious if the following code could be converted to tidyverse code. I've tried dplyr::mutate and haven't been able to get it to work quite right.
df$Gender[df$Gender == "M"] <- "Man"
df$Gender[df$Gender == "Male"] <- "Man"
df$Gender[df$Gender…

AB Cross
0
votes
1 answer
Replacing elements in a dataframe not contained in a vector
Simple problem, but I couldn't find a solution: How to replace all elements in a dataframe not contained in a vector with a specific string?
My dataframe looks like this:
ID <- sample(1:8)
Country <- c("USA", "RUS", "Unknown", "Not specified",…

guillem
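The replacement described in this excerpt can be sketched in a few lines. This is a minimal Python illustration rather than the R solution the asker wanted. The `allowed` list and the `"Other"` placeholder are assumptions, since the excerpt is cut off before naming either; only the first four country values come from the question.

```python
def replace_unlisted(values, allowed, placeholder="Other"):
    """Return a copy of values where items not found in allowed become placeholder."""
    allowed = set(allowed)  # set lookup keeps the membership test cheap
    return [v if v in allowed else placeholder for v in values]


# First four values shown in the excerpt; the rest of the vector is truncated there.
country = ["USA", "RUS", "Unknown", "Not specified"]

# Hypothetical choice of which values count as valid.
cleaned = replace_unlisted(country, allowed=["USA", "RUS"])
```

The same idea carries over to a data frame column: test membership against the vector, then overwrite only the rows that fail the test.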
0
votes
1 answer
MySQL add period to name initials if no period exists
I have a dataset like this:
Juan Corona
Jane L Doe
John Q. Public
R S Fitzgerald
I need to clean this up so it's:
Juan Corona
Jane L. Doe
John Q. Public
R. S. Fitzgerald
But since MySQL doesn't support regex search and replace, I feel like I'm in a…

Slam
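One way to express the cleanup rule from this excerpt is a regular expression applied outside the database. The sketch below is Python, not MySQL, and it assumes that an initial is any single capital letter standing alone and followed by whitespace or the end of the string. The function name `add_initial_periods` is hypothetical. Under that assumption a one-letter word such as "I" would also gain a period.

```python
import re

# A lone capital letter: not preceded by a word character or apostrophe,
# and followed by whitespace or the end of the string (so "Q." is skipped).
_INITIAL = re.compile(r"(?<![\w'])([A-Z])(?=\s|$)")


def add_initial_periods(name: str) -> str:
    """Append a period to each standalone initial that lacks one."""
    return _INITIAL.sub(r"\1.", name)


# The four sample names from the excerpt.
names = ["Juan Corona", "Jane L Doe", "John Q. Public", "R S Fitzgerald"]
cleaned_names = [add_initial_periods(n) for n in names]
```

Because initials that already end in a period are left alone, running the function twice gives the same result as running it once.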
0
votes
1 answer
Getting the counts of the number of months a user has been using a particular service in SQL
I have data like below:
user_id month_id service
895 201612 S
262 201612 V
5300 201612 BB
Now there can be users who have used more than one service in a year, and I would like to have a query which gives me that. For…

Shuvayan Das
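The count named in the title can be sketched as a distinct-month tally per user and service. This is a Python illustration of the grouping logic, not the SQL query the asker wanted, and it assumes that "months using a service" means the number of distinct `month_id` values for each `(user_id, service)` pair. The last two rows are hypothetical additions so that one user has more than one month and more than one service.

```python
from collections import defaultdict


def months_per_service(rows):
    """Count distinct month_id values for each (user_id, service) pair."""
    months = defaultdict(set)
    for user_id, month_id, service in rows:
        months[(user_id, service)].add(month_id)
    return {key: len(seen) for key, seen in months.items()}


rows = [
    (895, 201612, "S"),    # from the excerpt
    (262, 201612, "V"),    # from the excerpt
    (5300, 201612, "BB"),  # from the excerpt
    (895, 201701, "S"),    # hypothetical extra row
    (895, 201701, "V"),    # hypothetical extra row
]
counts = months_per_service(rows)
```

In SQL the equivalent shape would be a `COUNT(DISTINCT month_id)` grouped by `user_id` and `service`.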
0
votes
1 answer
Regular Expressions not working with Pandas Dataframe
I've got a Pandas DataFrame made up of emails that I need to clean using regex. However, my attempts to clean the column aren't actually being applied to the text.
Example data is below:
|subeject | description …

James C
0
votes
1 answer
Elegantly reposition values within a dataframe
I'm working with the text layer of a PDF and have some minor corrections to make...
The tidy dataframe I've generated has one or two data values that are off by a row. I have the 'coordinates' of the incorrectly positioned values (defined by a…

joga
0
votes
3 answers
split vector or data.frame into intervals by condition and print interval's first and last value
I have a data.frame that looks like this:
v1 <- c(1:10)
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)
> dfb
v1 v2
1 1 FALSE
2 2 FALSE
3 3 TRUE
4 4 FALSE
5 5 FALSE
6 6 FALSE
7…

Wakan Tanka
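The excerpt is cut off before the desired output, so the splitting rule below is an assumption: each `TRUE` in `v2` is taken to start a new interval, and the first and last `v1` value of each interval is reported. This is a Python sketch of that reading, not the R answer, and the function name `split_intervals` is hypothetical.

```python
def split_intervals(values, flags):
    """Group values into runs where each True flag opens a new run.

    Returns a list of (first, last) tuples, one per run.
    """
    intervals = []
    for value, flag in zip(values, flags):
        if flag or not intervals:
            intervals.append([value, value])  # open a new run
        else:
            intervals[-1][1] = value  # extend the current run
    return [tuple(pair) for pair in intervals]


# Same data as the excerpt's v1 and v2.
v1 = list(range(1, 11))
v2 = [False, False, True, False, False, False, True, False, False, False]
result = split_intervals(v1, v2)
```

If the intended rule is instead that a `TRUE` row closes an interval, the only change needed is to append the new run after the flagged value rather than at it.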
0
votes
1 answer
One to Many merge row level
I have run into a data problem that I sense many have encountered. I currently have a data set that contains transaction information. Based on the Transaction_Number, I will find how long each person involved in the transaction spent on their part.
The…

CodeNoob
0
votes
0 answers
Apply factor column values to new columns in R
Did some extensive searching but couldn't find a solution. I have a dataframe that looks like this:
FAC | NUM | VAL
A | 1 | 100
A | 2 | 200
B | 1 | 300
B | 2 | 200
And I want it to look like this:
NUM | A | B
1 | 100 | 300
2 …

Stu Richards
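The reshape shown in this excerpt is a long-to-wide pivot: one row per `NUM`, one column per `FAC` level. Below is a minimal Python sketch of that transformation using plain dictionaries; the question itself asks about R, and the function name `pivot_wide` is hypothetical. The input rows are the four shown in the excerpt.

```python
def pivot_wide(rows):
    """Turn (FAC, NUM, VAL) rows into {NUM: {FAC: VAL}}."""
    wide = {}
    for fac, num, val in rows:
        wide.setdefault(num, {})[fac] = val
    return wide


rows = [
    ("A", 1, 100),
    ("A", 2, 200),
    ("B", 1, 300),
    ("B", 2, 200),
]
wide = pivot_wide(rows)
```

Each key of `wide` corresponds to one output row, and each inner key to one of the new columns.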
0
votes
2 answers
Regex help for find and replace
I have a file in notepad with this text as example:
*Given* I get an user ID from XXX
*And* I set header "Authorization" with value "invalid_token"
*When* I send a POST request to api/endpoint/"documentlibraryID"/"identity_id"/root/"new_name"
*Then*…

ryoishikawa74
0
votes
3 answers
Data munging with python: transforming string into rows
I'm fairly new to Python, and I need to perform some data munging. I'd like some advice on best practice for this: libraries, modules, better code to implement, or just direction.
So I have a text file with data organised in the following…

tyrfingnir
0
votes
3 answers
Retrieve a flattened array of values taken from an object array that exists within a top-level object of which I am given an array
Please feel free to modify the title; it was rather hard for me to explain and thus to search for.
var booking = [
{
x: "1",
y: "2",
days: [
{
hours: 8
},
]
},
{...}
]
var hoursBooked = [8, 2, 4, 8, 2, 8, 3,…

aspirant_sensei
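The flattening asked for here is a nested iteration: for every booking, take every entry in `days` and collect its `hours`. The sketch below shows the idea in Python rather than the JavaScript of the question. The second booking is hypothetical, since the excerpt elides it as `{...}`.

```python
booking = [
    {"x": "1", "y": "2", "days": [{"hours": 8}]},                # from the excerpt
    {"x": "3", "y": "4", "days": [{"hours": 2}, {"hours": 4}]},  # hypothetical
]

# Outer loop over bookings, inner loop over each booking's days.
hours_booked = [day["hours"] for b in booking for day in b["days"]]
```

A booking whose `days` list is empty simply contributes nothing to the result.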
0
votes
0 answers
Tika messed up the structure of my document, how to fix it?
After extracting the textual content from some PDF files, I noticed that Tika misaligned my document's text. For example, my original PDF doc looks like this:
Animal name: Cat
Food stock: …

tumbleweed