Questions tagged [data-analysis]

Data Analysis involves extracting meaning and insights from raw data. It involves methods and algorithms that examine, clean, transform and model the data to obtain conclusions.

Typically, data analysis involves a series of steps: measuring some parameters of interest, collecting the data, cleaning it, storing it in meaningful ways, summarizing and examining it, and testing various hypotheses about the data.
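
As a rough illustration of the cleaning, summarizing and hypothesis-testing steps, here is a minimal Python sketch using only the standard library. The measurements, the group names `control` and `treated`, and the helper functions `clean`, `summarize` and `welch_t` are all hypothetical, made up for this example. Welch's t statistic is just one possible choice of test.

```python
import math
import statistics

# Hypothetical raw measurements for two groups; None marks a missing reading.
raw = {
    "control": [5.1, 4.9, None, 5.3, 5.0, 5.2],
    "treated": [5.8, 6.1, 5.9, None, 6.0, 6.2],
}


def clean(values):
    """Cleaning step: drop missing readings."""
    return [v for v in values if v is not None]


def summarize(values):
    """Summarizing step: count, mean and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def welch_t(a, b):
    """Hypothesis-testing step: Welch's t statistic for mean(b) - mean(a)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


cleaned = {name: clean(values) for name, values in raw.items()}
summary = {name: summarize(values) for name, values in cleaned.items()}
t_stat = welch_t(cleaned["control"], cleaned["treated"])

for name, stats in summary.items():
    print(name, stats)
print("Welch t statistic:", t_stat)
```

A large absolute value of the statistic suggests the group means differ. Turning it into a p-value needs the t distribution, which a library such as SciPy would normally provide.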

More information can be found on Wikipedia's Data Analysis page.

4642 questions
1
vote
1 answer

TfIDf Vectorizer weights

Hi, I have a lemmatized text in the format shown by lemma. I want to get the TfIdf score for each word. This is the function that I wrote: import numpy as np import pandas as pd from sklearn.feature_extraction.text import…
MaryTJ
  • 19
  • 1
  • 5
1
vote
0 answers

Why does good local validation give a bad score in a Kaggle competition?

This might be a general question. I was trying to build a predictive model in a Kaggle competition. I used some of the traditional methods like XGBoost, LightGBM and Random Forest. I tried to split the train data to train and validation into…
1
vote
1 answer

How do I pivot my dataframe multiple times in pandas while creating a new column merging multiple columns?

I find this a rather complex challenge, as I need to merge multiple columns in a dataframe together and then pivot the table multiple times (I think). So the provided input is this dataframe which I have filtered out: name …
oldselflearner1959
  • 633
  • 1
  • 5
  • 22
1
vote
1 answer

TypeError: data type "category" not understood

While solving a problem, I get an error when I try to use dtype='category'. I have read previous answers and applied them, but none of them solves the problem because they are all very old. My code is: train =…
Shubham Sharma
  • 2,763
  • 5
  • 31
  • 46
1
vote
2 answers

How to find delay between two sets of data in Matlab?

I have two sets of data taken from experiments, and they look very similar, except there is a horizontal offset between them, which I believe is due to some bugs in the instrument setting. Suppose they have the form y1=f(x1) and y2=f(x2)= f(x1+c),…
Physicist
  • 2,848
  • 8
  • 33
  • 62
1
vote
1 answer

Feature extraction from data stored in PostgreSQL database

I have some data stored in PostgreSQL database, which contains fields like cost, start date, end date, country, etc. Please take a look at the data here. Now what I want to do is extract some of the important features/fields from this data and store…
1
vote
0 answers

matching values between two dataframes with a condition in pandas

I have two dataframes, df1, Values 0 Sri 1 pyd 2 NaN 3 sri, is 4 keyboard 5 kumar,cricketer df2, Values | Names Sri | Sri is a good player NaN | NaN sri, is | Sri is a good…
Pyd
  • 6,017
  • 18
  • 52
  • 109
1
vote
0 answers

Aggregate all columns with data.table using 2 fixed columns

I have a custom function I would like to apply to a data table such as follows: DT = data.table(x = rep(c("a","b","c"), each = 2), x2 = rep(c("h","j"), each = 3), y = c(1,3), v = 1:6, …
Aus_10
  • 670
  • 7
  • 15
1
vote
1 answer

What is the proper way of combining time series data with metadata in pandas?

I have two csv files: customer.csv: id name birthday 1 Martin 28.04.1990 2 Twain 30.11.1835 .... and purchases.csv: purchase_id customer_id item price 1 1 About the ugly…
Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
1
vote
2 answers

matching rows between dataframes in pandas in python

I have two dataframes, df1, Names one two three Sri is a good player Ravi is a mentor Kumar is a cricketer df2, values sri NaN sri, is kumar,cricketer I am trying to get the row in df1 which contains all the items in df2. My expected…
Pyd
  • 6,017
  • 18
  • 52
  • 109
1
vote
1 answer

Python/Pandas - check if column's values is the same by another column values

I have DataFrame like this: product_id dt products_qty stock_qty 0 8225 2017-10-16 12.000 13.000 1 8280 2017-10-16 0.000 11.000 2 8225 2017-10-17 0.000 41.000 3 …
user8407067
1
vote
0 answers

How to connect the Power BI Cloud to a PostgreSQL DB (on Azure)

I have my ERP database, PostgreSQL, on my Azure VM. I want to connect the Power BI Cloud to pull the dashboard and reports. Is that possible right now? I have checked the Cloud Power BI and could not find any solution for PostgreSQL. Is there any…
Sajeev
  • 783
  • 3
  • 14
  • 46
1
vote
2 answers

value matching between two DataFrames using pandas in python

Hi I have two DataFrames like below DF1 Alpha | Numeric | Special and | 1 | @ or | 2 | # lol ok | 4 | & DF2 with single column Content boy or girl school @ morn pyc LoL ok…
Pyd
  • 6,017
  • 18
  • 52
  • 109
1
vote
1 answer

how to apply set and ignorecase in a single datacolumn in pandas

I have a df, Keys one, ONE ram, Ram kumar Raj,rAj cricket level,LeVel kum,num first I want to apply set and ignore case on df["Keys"], make it a single value and achieve df Name one ram kumar raj cricket level kum,num 2nd…
Pyd
  • 6,017
  • 18
  • 52
  • 109
1
vote
1 answer

how to get for multiple columns

I have a data frame like this: Id row Date BuyTime SellPrice App 1 1 2017-10-30 94520 0 9:00:00 1 2 2017-10-30 94538 0 9:00:00 1 3 2017-10-30 94609 0 …
ary
  • 151
  • 1
  • 2
  • 14