Questions tagged [data-wrangling]

1242 questions
0
votes
1 answer

Transforming a variable using the if-else if function

I have a data set for which I want to calculate z-scores by year. Example: Year Score 1999 120 1999 132 1998 120 1997 132 2000 120 2002 132 1998 160 1997 142 ... etc. What I want is: Year …
JeffB
  • 139
  • 1
  • 10
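A minimal sketch of per-year z-scores, assuming pandas (the excerpt does not say which language the asker uses) and using the sample rows from the question. The column name `z` is hypothetical, and the sample standard deviation (pandas' default) is an assumption about which definition is wanted.

```python
import pandas as pd

# Sample rows taken from the question's example.
df = pd.DataFrame({
    "Year": [1999, 1999, 1998, 1997, 2000, 2002, 1998, 1997],
    "Score": [120, 132, 120, 132, 120, 132, 160, 142],
})

grouped = df.groupby("Year")["Score"]

# Per-year z-score: (score - year mean) / year sample standard deviation.
# A year with a single row has an undefined standard deviation,
# so its z-score comes out as NaN.
df["z"] = (df["Score"] - grouped.transform("mean")) / grouped.transform("std")

print(df)
```

`transform` returns a result aligned to the original rows, so no merge back onto the data frame is needed.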
0
votes
2 answers

Creating a new dataframe based on two conditions from two other dataframes

I'm fairly new to coding languages and have been asked to create a new dataframe based on two existing dataframes. Dataframe 1 is the original and dataframe 2 is a subset of the original. The new data frame needs to be a copy of the original with…
Victoria
  • 1
  • 1
0
votes
1 answer

How to add a ':' after every 2 characters in a pandas DataFrame column

I have a pandas df that looks like this: I'm trying to split up the Gun_Time and Net_Time columns by adding a ':' after every 2 numbers. I've tried some regex with a simple function but have been unable to come up with the correct solution to…
Dasax121
  • 23
  • 8
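One possible approach, sketched under assumptions: the original data is not shown in the excerpt, so the sample values below are hypothetical, and the columns are assumed to hold digit strings such as `"012345"`. The regex inserts a colon after each pair of digits that is followed by another digit, so no trailing colon is added.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Gun_Time": ["012345", "005959"],
    "Net_Time": ["012300", "005930"],
})

for col in ["Gun_Time", "Net_Time"]:
    # (\d{2}) captures two digits; (?=\d) requires another digit to follow,
    # which keeps the last pair from getting a trailing ':'.
    df[col] = df[col].astype(str).str.replace(r"(\d{2})(?=\d)", r"\1:", regex=True)

print(df)
```

If the columns are stored as integers, leading zeros will already be lost and would need restoring first, for example with `str.zfill`.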
0
votes
1 answer

Adding Column into data frame

I want to add a column into my dataframe that identifies which group my data is from (in this case, year). What I have is: Var1 Var2 Var3 X X X X X X X X X What I want is: Var1 Var2 Var3 Year X X X …
JeffB
  • 139
  • 1
  • 10
0
votes
1 answer

Updating a File in R by adding a column/vector

Is there any way that I can update an existing .csv file by adding a column/vector that I have scraped from the web? I have a web scraper that pulls COVID-19 data and I am trying to create a file that has positive cases in columns and each column is…
0
votes
2 answers

How to Iterate & Count Each Categorical Value Based on Some Condition

I am working on a dataset and I want to iterate through each value to find the count of job and marital status based on the deposit. Example:
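A rough sketch of counting categories against a condition column, assuming pandas and assuming the columns are named `job`, `marital`, and `deposit` (the excerpt's example is not shown, so the data below is hypothetical). `pd.crosstab` avoids explicit iteration.

```python
import pandas as pd

# Hypothetical data; column names are assumed from the question's wording.
df = pd.DataFrame({
    "job": ["admin", "admin", "technician", "technician", "admin"],
    "marital": ["married", "single", "married", "married", "single"],
    "deposit": ["yes", "no", "yes", "yes", "yes"],
})

# Rows are categories, columns are deposit values, cells are row counts.
job_counts = pd.crosstab(df["job"], df["deposit"])
marital_counts = pd.crosstab(df["marital"], df["deposit"])

print(job_counts)
print(marital_counts)
```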
0
votes
1 answer

I want to rearrange my data in RStudio (sorting and cbind)

If I have a data frame as follows: R/C DD CC1 CC2 RR1 a 36 37 RR1 b 21 22 RR1 c 24 25 RR1 d 196 198 RR2 e 37 38 RR2 f 17 17 RR2 g 16 16 RR2 h 48 51 RR3 i 89 90 RR3 j 79 80 RR3 k 26 26 RR3 h 48 51…
0
votes
0 answers

How to reset the row index of dataset

I'm a newbie to the whole data thing, and I'm trying to clean my data and get it into a form I can work with. df = pd.read_excel('https://query.data.world/s/s3t37yqxxeoabyocyh6g33fojskwvq') df.head() What should I do to re-store the…
gyungrokna
  • 25
  • 4
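Going by the title, a minimal sketch of resetting a row index in pandas might look like the following. The small frame is hypothetical (the question's Excel file is not reproduced here), and the filtering step only stands in for whatever cleaning left gaps in the index.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
df = pd.DataFrame({"value": [10, 20, 30, 40]})

# Dropping rows leaves gaps in the index: 0, 2, 3.
df = df[df["value"] != 20]

# drop=True discards the old index instead of keeping it as a new column.
df = df.reset_index(drop=True)

print(df)
```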
0
votes
3 answers

How to remove an entire row if it contains zero in a specific range of columns in R?

If my data frame is: A1 A2 B1 B2 C1 C2 row1 67 8 0 99 67 84 row2 8 22 25 5 72 0 row3 0 83 35 68 17 13 row4 69 37 52 93 67 78 row5 68 64 68 90 61 38 row6 16 30 2 19 40 1 row7 …
0
votes
2 answers

Stack/Melt Sets of Columns

I'm trying to take a single table with 10 columns and merge/union/stack into 2 columns. The current layout is like this ID_1 | Name_1 | ID_2 | Name_2 | ID_3 | Name_3 I'm trying to get this into the format with column headers "ID" and "Name" ID |…
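A sketch of one way to stack the column pairs, assuming pandas (the excerpt does not name a tool) and using only the three pairs shown in the layout; the values are hypothetical. Each `ID_n`/`Name_n` pair is renamed to `ID`/`Name` and the pieces are concatenated.

```python
import pandas as pd

# Hypothetical values; the column layout follows the question.
wide = pd.DataFrame({
    "ID_1": [1, 2], "Name_1": ["a", "b"],
    "ID_2": [3, 4], "Name_2": ["c", "d"],
    "ID_3": [5, 6], "Name_3": ["e", "f"],
})

# Take each pair of columns, give it the common headers, then stack them.
pieces = [
    wide[[f"ID_{n}", f"Name_{n}"]].rename(
        columns={f"ID_{n}": "ID", f"Name_{n}": "Name"}
    )
    for n in (1, 2, 3)
]
long = pd.concat(pieces, ignore_index=True)

print(long)
```

For the full ten-column table, the tuple `(1, 2, 3)` would be extended to cover every suffix present.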
0
votes
3 answers

R Rearrange columns in dataframe based on date values in column names

I've got a dataframe that has monthly survey scores for certain hospitals. Each month, we store the score obtained by the hospital (_Score column) and the corresponding average score for all hospitals for that month (_Average column). Here's a…
Varun
  • 1,211
  • 1
  • 14
  • 31
0
votes
2 answers

Create a random binary variable for a subset of observations assigning 1 to a specific proportion of rows

I have a dataframe... df <- tibble( id = 1:10, family = c("a","a","b","b","c", "d", "e", "f", "g", "h") ) Families will only contain 2 members at most (so they're either individuals or pairs). For individuals (families with only one row,…
Tom
  • 279
  • 1
  • 12
0
votes
1 answer

Creating a bar chart using ggplot with a manual dataframe (one row, 5 columns)

This is my data frame. I could not find a way to construct a bar plot using ggplot with the column names and x as the values indicated.
0
votes
1 answer

Problem with pipe within purrr:map2 and mutate

nested_numeric <- model_table %>% group_by(ano_fiscal) %>% select(-c("ano_estudo", "payout", "div_ratio","ebitda", "name.company", "alavancagem","div_pl", "div_liq", "div_total")) %>% nest() nested_numeric # A tibble: 7 x 2 #…
0
votes
1 answer

dataframe using list vs dictionary

import pandas as pd pincodes = [800678,800456] numbers = [2567890, 256757] labels = ['R','M'] first = pd.DataFrame({'Number':numbers, 'Pincode':pincodes}, index=labels) print(first) The above code gives me the following…