Questions tagged [data-wrangling]
1242 questions
-1
votes
1 answer
Python build dict from a mixture of dict keys and list values
Input body is:
{'columns': ['site_ref', 'site_name', 'region'], 'data': [['R005000003192', 'AIRTH DSR NS896876', 'WEST'], ['R005000003195', 'AIRTHREY DSR NS814971', 'WEST']]}
How could I build a new dict that will take the column values as keys and…

RAH
- 395
- 2
- 9
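
The excerpt above is cut off before the desired output, so the following is only a minimal sketch under one assumption: that the goal is one dict per row of `data`, keyed by the names in `columns`. The variable names `body` and `records` are illustrative and not from the question.

```python
# Input copied from the question excerpt.
body = {
    "columns": ["site_ref", "site_name", "region"],
    "data": [
        ["R005000003192", "AIRTH DSR NS896876", "WEST"],
        ["R005000003195", "AIRTHREY DSR NS814971", "WEST"],
    ],
}

# Assumption: pair each column name with the value at the same position in a row.
records = [dict(zip(body["columns"], row)) for row in body["data"]]

for record in records:
    print(record)
```

`zip` stops at the shorter of its inputs, so a row with fewer values than there are columns would silently produce a shorter dict.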
-1
votes
1 answer
resampling dataframe/survey troubleshoot
I am having trouble resampling my dataframe. Need some help?
I have a household survey in country X. Country X is divided into 3000 counties of different population sizes. The % of sampled households varied by county size. Smaller counties were…

YouLocalRUser
- 309
- 1
- 9
-1
votes
2 answers
Dataframe to a dictionary as some columns in a list as keys and one as value
I have a pandas dataframe df that looks like this:
col1 col2 col3
A X 1
B Y 2
C Z 3
I want to convert this into a dictionary with col1 and col2 in a list as key and col3 as value. So, the output would look…
Scratch
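
The expected output in the excerpt above is truncated, so this is a hedged sketch rather than the asker's exact target. A Python list cannot be a dict key because it is not hashable, so the sketch assumes a tuple `(col1, col2)` is an acceptable key. The name `result` is illustrative.

```python
import pandas as pd

# Data copied from the question excerpt.
df = pd.DataFrame({
    "col1": ["A", "B", "C"],
    "col2": ["X", "Y", "Z"],
    "col3": [1, 2, 3],
})

# Assumption: tuple keys stand in for the "list as key" the question asks for.
keys = zip(df["col1"], df["col2"])
result = dict(zip(keys, df["col3"]))

print(result)
```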
-1
votes
1 answer
Assign a "value" to a particular observation in R
I have frequency counts that line up with a set number of states of the world
Data=
S <- c("a","b","c","d","e")
n <- c(1,2,3,4,5)
df <- data.frame(S, n)
I want to create some values that line up with the n values for each, named with the relevant…

Gilrob
- 93
- 7
-1
votes
2 answers
Create a list of namedtuples from a dataframe
I have a dataframe like this:
df1
Name Category Age
Harry A 11
James B 23
Will A 19
I want to create a list of tuples using namedtuple from collections. The list should be like this:
output_list =…

star_it8293
- 399
- 3
- 12
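
The target `output_list` in the excerpt above is truncated, so the following is only one possible approach: `DataFrame.itertuples` already yields `collections.namedtuple` instances, with fields named after the columns. The tuple type name `Person` is a hypothetical choice, not taken from the question.

```python
import pandas as pd

# Data copied from the question excerpt.
df1 = pd.DataFrame({
    "Name": ["Harry", "James", "Will"],
    "Category": ["A", "B", "A"],
    "Age": [11, 23, 19],
})

# index=False drops the row index; name sets the namedtuple's type name (hypothetical).
output_list = list(df1.itertuples(index=False, name="Person"))

for person in output_list:
    print(person.Name, person.Category, person.Age)
```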
-1
votes
2 answers
Create new column based on other columns from a different dataframe
I have 2 dataframes:
df1
Time Apples Pears Grapes Peachs
10:00 3 5 5 2
11:00 1 0 2 9
12:00 20 2 7 3
df2
Class Item Factor
A Apples 3
A Peaches 2
A …

star_it8293
- 399
- 3
- 12
-1
votes
1 answer
R Concatenate Columns from Excel file based on sheet name and Column's name
Hello guys, I have an Excel file with multiple sheets, and these sheets don't always have the same structure. I want to be able to read the Excel file, read only some specific sheets, select some specific columns, and then create a…

R_Student
- 624
- 2
- 14
-1
votes
3 answers
R Aggregate data frame based on column values
I have a data set that looks like this:
> newex
Name Volume Period
1 oil 29000 Jun 21
2 gold 800 Mar 22
3 oil 21000 Jul 21
4 gold 1100 Sep 21
5 gold 3000 Feb 21
6 depower 3 Q1 21
7 oil…

Saïd Maanan
- 511
- 4
- 14
-1
votes
1 answer
Pandas deleting partly duplicate rows with wrong values in specific columns
I have a large dataframe from a csv file which has a few dozen columns. I have another csv file which I concatenated to the original. Now, the second file has exactly the same structure but a particular column may have incorrect values. I want to…

darzan
- 17
- 4
-1
votes
1 answer
Add column in first data frame based upon two columns in second data frame
I am trying to add a column to a first data frame based upon a second data frame.
Basically, data frame 1 contains values that also exist in data frame 2, and data frame 2 holds additional information that I would like to extract into data frame 1.
Down…

U_jex
- 83
- 6
-1
votes
1 answer
Compare column name with a row value and get another row value
I have a dataframe like this:
request_created_at sponsor_tier is_active status cash_in 2019/10 ... 2021/07
0 2019/10 2019/10 2.0 True 1 8901.00 ... …

h1tom1
- 1
- 2
-1
votes
2 answers
Can anyone explain what the value in square brackets, [2], after str.split("|", expand=True) actually means?
df1["state"] = df1["place_with_parent_names"].str.split("|",expand=True)[2]
What does [2] actually indicate after a string split method?

makt
- 89
- 2
- 15
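
To illustrate the question above: with `expand=True`, `str.split` returns a DataFrame whose columns are labelled 0, 1, 2, … (one per piece of the split), so `[2]` selects the column holding the third piece. The sample strings below are invented for illustration; the question does not show its data.

```python
import pandas as pd

# Hypothetical sample values; only the column name comes from the question.
df1 = pd.DataFrame({
    "place_with_parent_names": [
        "|Country|State A|City X|",
        "|Country|State B|City Y|",
    ]
})

# expand=True spreads the pieces across integer-labelled columns 0, 1, 2, ...
parts = df1["place_with_parent_names"].str.split("|", expand=True)
print(parts)

# [2] is an ordinary column lookup on that DataFrame: the third piece of each string.
df1["state"] = parts[2]
print(df1["state"])
```

Because these sample strings start with `|`, piece 0 is an empty string, which is why the state lands at index 2 rather than 1 here.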
-1
votes
2 answers
How to unite multiple columns (character data) without concatenating?
Within my data I have a subset of data that look like this:
| Incident | Year | Person1 | Person2 |
|:---------|:----:|:-------:|--------:|
| 1        | 2014 | A       | B       |
| 2        | 2014 | A       |         |
| 3        | 2016 | B       | C       |
…

burphound
- 161
- 7
-1
votes
1 answer
For every unit increase in one column's value, another column's entries increase
I have a simulation dataset with 500 replicates - each replicate contains 300 ids. When rep = 1, id ranges from 1-300; when rep = 2, id again ranges from 1-300 and so on.
I want to get the following: when rep = 1: id 1-300; when rep = 2: id 301-600…

andy dey
- 1
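
The mapping described in the excerpt above (rep 1 gives ids 1-300, rep 2 gives ids 301-600) is consistent with new id = (rep - 1) × 300 + id. A minimal sketch of that arithmetic follows; the function name `global_id` and the constant `IDS_PER_REP` are illustrative, and the sketch assumes every replicate has exactly 300 ids as stated.

```python
IDS_PER_REP = 300  # from the question: each replicate contains 300 ids


def global_id(rep: int, id_within_rep: int) -> int:
    """Map a (rep, id) pair to an id that is unique across replicates."""
    return (rep - 1) * IDS_PER_REP + id_within_rep


# Boundary cases matching the ranges given in the question.
print(global_id(1, 1), global_id(1, 300))  # first and last id of rep 1
print(global_id(2, 1), global_id(2, 300))  # first and last id of rep 2
```

The same expression can be applied column-wise to a data frame, since it only uses multiplication and addition.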
-1
votes
1 answer
Data wrangling in Python, calculate value from some conditions
I have the following dataframe in Python:
import pandas as pd
df = pd.DataFrame({
'CRDACCT_DLQ_CYC_1_MNTH_AGO' : [3, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
…

Anwar San
- 93
- 10