Questions tagged [data-wrangling]

1242 questions
0
votes
3 answers

Pandas DataFrame, divide a column having multiple values into multiple columns and remove null values

I have a dataframe in which one particular column holds temperature ranges like those shown below: '35-40', '35-40', '40-45', '40-45', '45-50', '40-45', '40-45', nan, '40-45', nan, '40-45', '40-45', '35-40'. I am trying to create a new column…
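One possible reading of the question above is that each range should be split into a lower and an upper bound, with the missing rows dropped. A minimal pandas sketch under that assumption follows; the column names `temperature`, `temp_low` and `temp_high` are hypothetical, since the excerpt is cut off before the desired output is described.

```python
import numpy as np
import pandas as pd

# Sample values taken from the question; the column name is an assumption.
df = pd.DataFrame(
    {"temperature": ["35-40", "35-40", "40-45", "40-45", "45-50", np.nan, "40-45"]}
)


def split_range(frame, column="temperature"):
    """Split 'low-high' strings into two numeric columns and drop missing rows."""
    # str.extract returns one column per capture group; missing rows stay NaN.
    bounds = frame[column].str.extract(r"^(\d+)-(\d+)$")
    out = frame.copy()
    out["temp_low"] = pd.to_numeric(bounds[0])
    out["temp_high"] = pd.to_numeric(bounds[1])
    # Remove the rows whose original value was missing.
    return out.dropna(subset=[column]).reset_index(drop=True)


result = split_range(df)
```

Using `str.extract` rather than `str.split` keeps the output at exactly two columns even when a value does not match the expected pattern.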
0
votes
2 answers

Transforming a categorical column into columns in python

I am trying to calculate the time period in seconds that cars were not available. I have the following table: ╔═════════════════════╦═══════════╦══════╦═════════════╗ ║ statusDateTime ║ shift ║ car ║ isAvaliable ║ ║ 2019-04-02 02:58:39 ║…
Shokan
  • 13
  • 4
0
votes
1 answer

approximate character matching using R

I have two data files. One of the files contains only one column with the name of the company (usually a hospital) and the other one contains a list of companies with their respective addresses. The problem is that the company names do not exactly…
Nneka
  • 1,764
  • 2
  • 15
  • 39
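The question above asks about R, but the idea of approximate name matching can be sketched in Python with the standard library's `difflib`. This is only an illustration: the company names and the `best_match` helper are hypothetical, and the 0.6 cutoff is an arbitrary starting point that would need tuning on real data.

```python
from difflib import get_close_matches

# Hypothetical example data: names to look up and a directory to search.
names = ["St Mary Hospital", "Zzzz"]
directory = ["St. Mary's Hospital", "General Hospital"]


def best_match(name, candidates, cutoff=0.6):
    """Return the most similar candidate, or None if nothing clears the cutoff."""
    matches = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Map every name to its closest directory entry (or None).
matched = {name: best_match(name, directory) for name in names}
```

Names that come back as `None` would still need manual review, which is usually expected with fuzzy matching.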
0
votes
1 answer

Azure Data Factory: Dataset Dynamic DB Table name not resolving in Data Wrangling Flow

I created a DataSet which points to a table in my database. The name of the table is set as dynamic content: @concat(dataset().db_prefix, '_Baseline_CIs'). This works when checking in the Dataset through 'Preview Data'. The table contents are…
0
votes
1 answer

Splitting one column into two columns using data wrangling with R

I would really appreciate your help in using R for data wrangling. I have data in which I want to split one column (variable) into two, whenever applicable, as conditioned by other variables. For example, as per the sample below, the data represents…
azizi tamimi
  • 53
  • 1
  • 8
0
votes
2 answers

Conditional if statement based on row values in r

I am new to R and I would really appreciate your assistance with this. I have a dataframe with 11 indicator variables, each having two levels, 'Y' and 'N'. I would like to have a new column which concatenates the column names whenever the row value equals 'Y', i.e.
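Although the question is about R, the row-wise logic can be sketched in pandas. The example assumes a small frame with three hypothetical indicator columns (`v1` to `v3`) instead of the eleven mentioned, and a hypothetical helper `flagged_columns`.

```python
import pandas as pd

# Hypothetical indicator data with 'Y'/'N' levels.
df = pd.DataFrame(
    {"v1": ["Y", "N", "N"], "v2": ["Y", "Y", "N"], "v3": ["N", "Y", "N"]}
)


def flagged_columns(row, flag="Y", sep=","):
    """Join the names of the columns whose value in this row equals the flag."""
    return sep.join(row[row == flag].index)


# Apply row by row; rows without any 'Y' produce an empty string.
df["flagged"] = df.apply(flagged_columns, axis=1)
```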
0
votes
1 answer

How to map the mean imputation results from a training set to a test set?

I have a vector: mean_imputed_values_trainining_set <- c(0.5247570, 0.4077914,0.1393320,0.8233340, 0.3610365,0.1805526, 0.2375011, 9.8848462 ) I tried creating a custom function, where the results from a vector would impute NA values. First…
Loncar
  • 125
  • 8
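The general pattern behind the question above is to compute the column means on the training set only and then reuse them to fill missing values in the test set. A minimal pandas sketch of that pattern follows (the question itself uses R); the data and the function names `fit_mean_imputer` and `apply_mean_imputer` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical training and test data with missing values.
train = pd.DataFrame({"a": [1.0, 3.0, np.nan], "b": [2.0, np.nan, 4.0]})
test = pd.DataFrame({"a": [np.nan, 5.0], "b": [1.0, np.nan]})


def fit_mean_imputer(frame):
    """Compute per-column means, ignoring missing values."""
    return frame.mean(numeric_only=True)


def apply_mean_imputer(frame, means):
    """Fill missing values column by column, matching on column names."""
    return frame.fillna(means)


means = fit_mean_imputer(train)          # learned from the training set only
test_imputed = apply_mean_imputer(test, means)
```

Matching by column name rather than by position avoids silently imputing the wrong mean if the column order differs between the two sets.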
0
votes
1 answer

Calculating number of observations per group in R

I would like to calculate column D based on the date column A. Column D should represent the number of observations grouped by column B. Edit: fake data below data <- structure(list(date = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 9L, 10L, 11L,…
user6883405
  • 393
  • 3
  • 14
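If column D is simply meant to hold the size of each group in column B, a grouped transform is one way to do it. The sketch below is a pandas analogue of what the question asks for in R, with hypothetical dates and group labels; if a running count within each group is wanted instead, `cumcount()` would be the closer fit.

```python
import pandas as pd

# Hypothetical data: column A holds dates, column B the grouping key.
df = pd.DataFrame(
    {
        "A": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]
        ),
        "B": ["x", "x", "y", "x", "y"],
    }
)

# Count the non-missing dates in each group and broadcast the result
# back onto every row of that group.
df["D"] = df.groupby("B")["A"].transform("count")
```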
-1
votes
1 answer

Stacked bar graph with combined subgroups

I have data like this: df <- data.frame( groups = c(rep("A", 5), rep("B", 3), rep("C", 2), rep(c("D","E","F","G","H"), 1)), subgroups = paste0("Subgroup", 1:15), length = c(103,112,141,152,50, …
-1
votes
1 answer

dplyr fill columns based on rows

I have data that looks like this (with more rows and more columns, but summarising here): class section a NA b s1 c NA d NA a NA b s2 c NA d NA a NA b s3 c NA d NA. Class a always comes before b, and c/d always come after b. The data works as…
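The excerpt is cut off, so the intended output is an assumption here: that every row in an a/b/c/d block should receive the section value found on that block's b row. Under that assumption, a pandas sketch of the idea (the question itself asks for dplyr) could look like this; the column name `section_filled` is hypothetical.

```python
import numpy as np
import pandas as pd

# Data shaped like the sample in the question (two blocks shown).
df = pd.DataFrame(
    {
        "class": ["a", "b", "c", "d", "a", "b", "c", "d"],
        "section": [np.nan, "s1", np.nan, np.nan, np.nan, "s2", np.nan, np.nan],
    }
)

# A new block starts at every 'a' row, so a cumulative sum of that
# condition gives each block its own id.
block = (df["class"] == "a").cumsum()

# "first" returns the first non-missing value in each block, which is
# then broadcast to all rows of that block.
df["section_filled"] = df.groupby(block)["section"].transform("first")
```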
-1
votes
1 answer

Referencing multiple columns and rows to calculate new value in a new column

Here is my data.frame {sf}. I converted this to longform, so the UID represents the polygon. In each UID there is anywhere from 0 to 3 species present. Sum percent for each UID will be 100 or…
-1
votes
1 answer

How to manipulate a dataframe in Python?

I have the following sample dataframe: I need to transform it to look like: Please note that: In the new column, the last 2 columns in original data frame have been replaced by the district variables of second last column of original df The…
user3087182
  • 63
  • 1
  • 1
  • 8
-1
votes
2 answers

R: How do I paste a value from a column in a df based on a match in two different columns?

I have a data frame where I have NAs in one column (B). My goal is to fill in these NAs with the corresponding value of column E where column D has the same value as column C. I want to do that within each ID tier. my data frame looks like this: …
luise
  • 3
  • 3
-1
votes
1 answer

How to turn two column dataframe into a word cooccurrence matrix?

I have an R dataframe that consists of two columns, id and text, and I want to turn it into a cooccurrence matrix of word pairs that appear together in the same id's list of words. So, this dataframe: df <- data.frame(id = c(1, 1, 1, 2, 2, 2), text =…
nlplearner
  • 115
  • 1
  • 10
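One way to build such a matrix is to join the table to itself on `id`, so that every pair of words sharing an id becomes a row, and then cross-tabulate the pairs. The sketch below shows this in pandas rather than R. The ids come from the question, but the words are hypothetical because the excerpt is truncated before the `text` values, and it assumes each word appears at most once per id.

```python
import pandas as pd

# ids from the question; the words are made up for illustration.
df = pd.DataFrame(
    {"id": [1, 1, 1, 2, 2, 2], "text": ["a", "b", "c", "a", "b", "d"]}
)

# Self-join on id: one row for every ordered pair of words in the same id.
pairs = df.merge(df, on="id", suffixes=("_a", "_b"))

# Drop the pairs that match a word with itself.
pairs = pairs[pairs["text_a"] != pairs["text_b"]]

# Count how often each pair occurs across all ids.
cooccurrence = pd.crosstab(pairs["text_a"], pairs["text_b"])
```

The resulting matrix is symmetric, because each unordered pair shows up once in each direction after the self-join.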
-1
votes
1 answer

How to refactor hard coded mapping of population for different ages to automated function?

I have two datasets df_population_by_age (has estimated population proportion by sex and age) and df_population_bracket (has actual population per age group). The idea is to use the estimated proportions from df_population_age, to calculate the…
Mazil_tov998
  • 396
  • 1
  • 13