Questions tagged [data-wrangling]
1242 questions
0 votes, 3 answers
Pandas DataFrame, divide a column having multiple values into multiple columns and remove null values
I have a dataframe in which one particular column has temperature values like those shown below:
'35-40',
'35-40',
'40-45',
'40-45',
'45-50',
'40-45',
'40-45',
nan,
'40-45',
nan,
'40-45',
'40-45',
'35-40',
I am trying to create a new column…

Moaz Mohammed Husain
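For the temperature-range question above, here is a minimal sketch of the split in plain Python. It assumes every non-missing value has the form 'low-high' with integer bounds and that missing values arrive as None or float NaN. In pandas the usual route is `Series.str.split('-', expand=True)`, which returns the two halves as separate columns and keeps missing rows missing, so `dropna()` can remove them afterwards. The helper name `split_range` is hypothetical.

```python
import math


def split_range(values):
    """Split strings like '35-40' into parallel lists of low and high bounds.

    Missing entries (None or float NaN) become None in both lists, so the
    output keeps one position per input row; drop them afterwards if needed.
    """
    lows, highs = [], []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            lows.append(None)
            highs.append(None)
            continue
        low, high = v.split("-")  # assumes exactly one '-' per value
        lows.append(int(low))
        highs.append(int(high))
    return lows, highs


temps = ["35-40", "40-45", float("nan"), "45-50"]
low, high = split_range(temps)
print(low)   # [35, 40, None, 45]
print(high)  # [40, 45, None, 50]
```

Keeping None placeholders preserves row alignment with the rest of the table; removing the null rows is then a separate, explicit filtering step.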
0 votes, 2 answers
Transforming a categorical column into columns in python
I am trying to calculate the time period in seconds that cars were not available. I have the following table:
╔═════════════════════╦═══════════╦══════╦═════════════╗
║ statusDateTime ║ shift ║ car ║ isAvaliable ║
║ 2019-04-02 02:58:39 ║…

Shokan
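For the car-availability question, the core calculation can be sketched as follows. Because the table is truncated, this assumes that each row records a status change and that an unavailable period runs from a row with `isAvaliable` false to the same car's next row with it true. The `shift` column is ignored, `unavailable_seconds` is a hypothetical helper, and every timestamp in the sample except the first is made up.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def unavailable_seconds(rows):
    """Total seconds each car spent unavailable.

    rows: iterable of (status_datetime, car, is_available) tuples.
    Assumption: each row is a status change, and an unavailable period
    ends at the same car's next row with is_available=True.
    """
    # Process each car's events in chronological order.
    events = sorted(rows, key=lambda r: (r[1], datetime.strptime(r[0], FMT)))
    totals = defaultdict(float)
    down_since = {}  # car -> datetime at which it became unavailable
    for ts, car, available in events:
        t = datetime.strptime(ts, FMT)
        if not available and car not in down_since:
            down_since[car] = t
        elif available and car in down_since:
            totals[car] += (t - down_since.pop(car)).total_seconds()
    return dict(totals)


# Made-up sample rows; only the first timestamp comes from the question.
rows = [
    ("2019-04-02 02:58:39", "car1", False),
    ("2019-04-02 03:00:09", "car1", True),
    ("2019-04-02 03:10:00", "car2", False),
    ("2019-04-02 03:10:30", "car2", True),
]
print(unavailable_seconds(rows))  # {'car1': 90.0, 'car2': 30.0}
```

Cars that are still unavailable at their last row are not counted; closing those periods at a chosen end time would be a small extension.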
0 votes, 1 answer
Approximate character matching using R
I have two data files. One contains a single column with company names (usually hospitals), and the other contains a list of companies with their respective addresses. The problem is that the company names do not exactly…

Nneka
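The question asks about R, where base R's `agrep()` and `adist()` are the usual starting points for approximate matching. To keep one language across these sketches, the same idea is shown with Python's standard-library `difflib.get_close_matches`, which scores pairs by a sequence-similarity ratio rather than by edit distance. The function name, the sample names, and the 0.6 cutoff (difflib's default) are illustrative assumptions; a real hospital list will need the cutoff tuned and borderline matches reviewed by hand.

```python
import difflib


def match_names(queries, candidates, cutoff=0.6):
    """Map each query name to its most similar candidate, or None if none passes cutoff."""
    # Compare case-insensitively but return the candidate's original spelling.
    by_lower = {c.lower(): c for c in candidates}
    result = {}
    for q in queries:
        hits = difflib.get_close_matches(q.lower(), list(by_lower), n=1, cutoff=cutoff)
        result[q] = by_lower[hits[0]] if hits else None
    return result


# Hypothetical names for illustration.
candidates = ["St. Mary's Hospital", "City General Hospital", "Riverside Clinic"]
print(match_names(["St Marys Hospital", "General Hosp"], candidates))
```

Once each name maps to a matched candidate, the addresses can be joined on that matched name.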
0 votes, 1 answer
Azure Data Factory: Dataset Dynamic DB Table name not resolving in Data Wrangling Flow
I created a DataSet which points to a table in my database. The name of the table is set as dynamic content: @concat(dataset().db_prefix, '_Baseline_CIs'). This works when checking in the Dataset through 'Preview Data'. The table contents are…

Denis Schlesinger
0 votes, 1 answer
Splitting one column into two columns using data wrangling with R
I would really appreciate your help using R for data wrangling. I have data in which I want to split one column (variable) into two wherever applicable, depending on other variables. For example, as per the sample below, the data represents…

azizi tamimi
0 votes, 2 answers
Conditional if statement based on row values in R
I am new to R and I would really appreciate your assistance in this.
I have a data frame with 11 indicator variables, each with two levels, 'Y' and 'N'.
I would like a new column that concatenates the column names wherever the row value equals 'Y',
for example:

Siyabonga Mbonambi
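For the Y/N question, the row-wise logic is to keep the names of the columns whose value is 'Y' and join them. Here is a minimal sketch that treats each row as a dict; the column names `var1` to `var3`, the comma separator, and the helper name are assumptions. In R the same idea is usually expressed with `apply()` over rows.

```python
def flagged_columns(rows, columns, sep=","):
    """For each row, join the names of the columns whose value is 'Y'."""
    return [sep.join(c for c in columns if row[c] == "Y") for row in rows]


# Hypothetical indicator columns; the question has 11 of them.
cols = ["var1", "var2", "var3"]
rows = [
    {"var1": "Y", "var2": "N", "var3": "Y"},
    {"var1": "N", "var2": "N", "var3": "N"},
]
print(flagged_columns(rows, cols))  # ['var1,var3', '']
```

Rows with no 'Y' values get an empty string, which is easy to replace with NA later if preferred.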
0 votes, 1 answer
How to map the mean imputation results from a training set to a test set?
I have a vector:
mean_imputed_values_trainining_set <- c(0.5247570, 0.4077914,0.1393320,0.8233340, 0.3610365,0.1805526, 0.2375011, 9.8848462 )
I tried creating a custom function that uses the values in this vector to impute NA values. First…

Loncar
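For the imputation question, the key point is that the means are computed on the training set only and then looked up by column name when filling the test set. That way the test data never influences the means, and columns cannot be mismatched by position. Here is a minimal sketch with columns stored as dicts of lists; `fit_means`, `impute`, and the sample data are hypothetical. In R, giving the vector of means names that match the test set's column names serves the same purpose.

```python
import math
from statistics import mean


def is_missing(x):
    return x is None or (isinstance(x, float) and math.isnan(x))


def fit_means(train):
    """Per-column means of the training set, ignoring missing values.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    return {col: mean(v for v in vals if not is_missing(v)) for col, vals in train.items()}


def impute(data, means):
    """Fill missing values in each column with that column's training mean."""
    return {col: [means[col] if is_missing(v) else v for v in vals]
            for col, vals in data.items()}


train_set = {"x": [1.0, None, 3.0], "y": [10.0, 20.0, float("nan")]}
test_set = {"x": [None, 5.0], "y": [float("nan"), 1.0]}
means = fit_means(train_set)      # {'x': 2.0, 'y': 15.0}
print(impute(test_set, means))    # {'x': [2.0, 5.0], 'y': [15.0, 1.0]}
```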
0 votes, 1 answer
Calculating number of observations per group in R
I would like to calculate column D based on the date column A. Column D should represent the number of observations grouped by column B.
Edit: fake data below
data <- structure(list(date = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 9L,
10L, 11L,…

user6883405
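The excerpt is truncated, so the exact rule for column D is not visible. Below are two common readings, sketched in Python with hypothetical helper names. The first gives the total number of rows in each row's group of column B. The second gives a running count within the group and assumes the rows are already sorted by the date column A. In dplyr these correspond roughly to `add_count()` and to `group_by()` followed by `mutate(row_number())`.

```python
from collections import Counter


def group_counts(groups):
    """Number of rows sharing each row's group label (total per group)."""
    counts = Counter(groups)
    return [counts[g] for g in groups]


def running_counts(groups):
    """Running count within each group; assumes rows are already sorted by date."""
    seen = Counter()
    out = []
    for g in groups:
        seen[g] += 1
        out.append(seen[g])
    return out


col_b = ["x", "y", "x", "x", "y"]  # hypothetical group labels
print(group_counts(col_b))    # [3, 2, 3, 3, 2]
print(running_counts(col_b))  # [1, 1, 2, 3, 2]
```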
-1 votes, 1 answer
Stacked bar graph with combined subgroups
I have data like this:
df <- data.frame(
groups = c(rep("A", 5),
rep("B", 3),
rep("C", 2),
rep(c("D","E","F","G","H"), 1)),
subgroups = paste0("Subgroup", 1:15),
length = c(103,112,141,152,50,
…

cookiemonster
-1 votes, 1 answer
dplyr fill columns based on rows
I have data that looks like this (with more rows and more columns, but summarising here):
class section
a NA
b s1
c NA
d NA
a NA
b s2
c NA
d NA
a NA
b s3
c NA
d NA
Class a always comes before b, and c/d always comes after b. The data works as…

cookiemonster
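For the section-filling question, a common approach is to number the blocks, starting a new block at every class 'a' row, and copy each block's 'b' section to all of its rows. In dplyr this is typically `group_by()` on a block id built with `cumsum(class == "a")`, followed by `tidyr::fill(section, .direction = "downup")`. The plain-Python sketch below assumes, as the excerpt states, that every block starts with 'a' and has one 'b' row carrying the section; `fill_sections` is a hypothetical name.

```python
def fill_sections(rows):
    """Give every row in an a/b/c/d block the section recorded on its 'b' row.

    rows: list of (cls, section) tuples, with section None when missing.
    Assumes each block starts at a class 'a' row and has one 'b' row.
    """
    # Block id increments at every 'a' row (like cumsum(class == "a") in R).
    block_ids, current = [], -1
    for cls, _ in rows:
        if cls == "a":
            current += 1
        block_ids.append(current)
    # Section carried by each block's 'b' row.
    section_of = {blk: sec for blk, (cls, sec) in zip(block_ids, rows) if cls == "b"}
    # Fill from the block's section; keep the original value if the block has no 'b'.
    return [(cls, section_of.get(blk, sec)) for blk, (cls, sec) in zip(block_ids, rows)]


data = [("a", None), ("b", "s1"), ("c", None), ("d", None),
        ("a", None), ("b", "s2"), ("c", None), ("d", None)]
print(fill_sections(data))
```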
-1 votes, 1 answer
Referencing multiple columns and rows to calculate new value in a new column
Here is my data.frame {sf}. I converted it to long form, so the UID represents the polygon. Each UID has anywhere from 0 to 3 species present. The summed percentage for each UID will be 100 or…

Crippycajes
-1 votes, 1 answer
How to manipulate a dataframe in Python?
I have the following sample dataframe:
I need to transform it to look like:
Please note that:
In the new column, the last two columns of the original data frame have been replaced by the district variables of the second-to-last column of the original df
The…

user3087182
-1 votes, 2 answers
R: How do I paste a value from a column in a df based on a match in two different columns?
I have a data frame where I have NAs in one column (B). My goal is to fill in these NAs with the corresponding value of column E where column D has the same value as column C. I want to do that within each ID tier.
My data frame looks like this:
…

luise
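For the NA-filling question, one reading of the goal is a lookup: for each row with a missing B, find the row in the same ID whose D equals this row's C and take its E. The sketch below assumes exactly that, with rows as dicts keyed by the column names from the question. If several rows in an ID share the same D, the last one wins. In R this corresponds to a self-join (for example with `merge()` or `dplyr::left_join()`) followed by `dplyr::coalesce()`. The helper name and sample rows are made up.

```python
def fill_from_match(rows):
    """Fill missing B with E from the row, within the same ID, whose D equals this row's C.

    rows: list of dicts with keys 'ID', 'B', 'C', 'D', 'E'; missing B is None.
    If several rows in one ID share a D value, the last one wins.
    """
    # Index E by (ID, D) so every lookup stays inside its own ID.
    lookup = {(r["ID"], r["D"]): r["E"] for r in rows}
    filled = []
    for r in rows:
        new = dict(r)
        if new["B"] is None:
            new["B"] = lookup.get((r["ID"], r["C"]))  # stays None if no match
        filled.append(new)
    return filled


# Made-up rows for illustration.
rows = [
    {"ID": 1, "B": None, "C": "p", "D": "q", "E": 5},
    {"ID": 1, "B": 7, "C": "r", "D": "p", "E": 9},
    {"ID": 2, "B": None, "C": "p", "D": "p", "E": 3},
]
print([r["B"] for r in fill_from_match(rows)])  # [9, 7, 3]
```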
-1 votes, 1 answer
How to turn two column dataframe into a word cooccurrence matrix?
I have an R dataframe that consists of two columns, id and text, and I want to turn it into a cooccurrence matrix of word pairs that appear together in the same id's list of words.
So, this dataframe:
df <- data.frame(id = c(1, 1, 1, 2, 2, 2), text =…

nlplearner
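For the co-occurrence question, one way to build the matrix is to collect the set of words for each id and count every unordered pair of distinct words once per id. The sketch assumes each `text` entry holds one or more whitespace-separated words; the helper name and sample words are made up. In R, the cross product of a binary id-by-word incidence matrix, `crossprod(m)`, gives the same counts off the diagonal.

```python
from itertools import combinations


def cooccurrence_matrix(ids, texts):
    """Symmetric matrix counting the ids in which each pair of distinct words appears together."""
    # Collect the set of words for each id (texts split on whitespace).
    words_by_id = {}
    for i, t in zip(ids, texts):
        words_by_id.setdefault(i, set()).update(t.split())
    vocab = sorted(set().union(*words_by_id.values()))
    index = {w: k for k, w in enumerate(vocab)}
    m = [[0] * len(vocab) for _ in vocab]
    # Count each unordered pair once per id, mirrored across the diagonal.
    for words in words_by_id.values():
        for a, b in combinations(sorted(words), 2):
            m[index[a]][index[b]] += 1
            m[index[b]][index[a]] += 1
    return vocab, m


ids = [1, 1, 1, 2, 2, 2]
texts = ["cat", "dog", "fish", "cat", "dog", "bird"]  # made-up words
vocab, m = cooccurrence_matrix(ids, texts)
print(vocab)  # ['bird', 'cat', 'dog', 'fish']
for row in m:
    print(row)
```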
-1 votes, 1 answer
How to refactor hard coded mapping of population for different ages to automated function?
I have two datasets df_population_by_age (has estimated population proportion by sex and age) and df_population_bracket (has actual population per age group). The idea is to use the estimated proportions from df_population_age, to calculate the…

Mazil_tov998
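For the population question, one way to replace the hard-coded mapping is to treat the estimated proportions as weights within each bracket. Every (sex, age) cell then receives the bracket's actual total multiplied by its share of the bracket's summed proportions. This is an assumed rule, since the excerpt is truncated; the dict layouts, the inclusive (low, high) bracket bounds, and the name `distribute` are all hypothetical.

```python
def distribute(proportions, brackets):
    """Split each age bracket's population across (sex, age) cells.

    proportions: {(sex, age): estimated proportion}
    brackets: {(low_age, high_age): population}, bounds inclusive
    Returns {(sex, age): estimated population}.
    """
    result = {}
    for (low, high), total in brackets.items():
        # Cells whose age falls inside this bracket, for any sex.
        cells = {k: p for k, p in proportions.items() if low <= k[1] <= high}
        weight = sum(cells.values())
        for key, p in cells.items():
            # Share of the bracket total proportional to the cell's estimate.
            result[key] = total * p / weight if weight else 0.0
    return result


props = {("F", 0): 0.125, ("M", 0): 0.125, ("F", 1): 0.25, ("M", 1): 0.25, ("F", 2): 0.25}
brackets = {(0, 1): 600, (2, 4): 100}
print(distribute(props, brackets))
# {('F', 0): 100.0, ('M', 0): 100.0, ('F', 1): 200.0, ('M', 1): 200.0, ('F', 2): 100.0}
```

Because the weights are normalised within each bracket, the estimated cells always add back up to the bracket's actual population.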