Questions tagged [data-analysis]

Data Analysis involves extracting meaning and insights from raw data. It involves methods and algorithms that examine, clean, transform and model the data to obtain conclusions.

Typically, data analysis involves a series of steps: measuring some parameters of interest, collecting the data, cleaning it, storing it in meaningful ways, summarizing and examining it, and testing various hypotheses about the data.
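The cleaning, storing, summarizing and testing steps can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard recipe. The measurements are made-up toy values, and the group labels `control` and `treatment` and the helper names (`clean`, `group_values`, `summarize`, `permutation_test`) are hypothetical. The hypothesis test is a simple two-sided permutation test on the difference in means, which is only one of many tests one could choose.

```python
import random
import statistics

# Hypothetical toy measurements: (group, value); None marks a failed reading.
raw = [
    ("control", 4.1), ("control", 3.9), ("control", None), ("control", 4.4),
    ("control", 4.0), ("treatment", 4.8), ("treatment", 5.1), ("treatment", 4.6),
    ("treatment", None), ("treatment", 5.0),
]

def clean(records):
    """Cleaning: drop records with missing values."""
    return [(g, v) for g, v in records if v is not None]

def group_values(records):
    """Storing: organize the cleaned data as {group: [values]}."""
    groups = {}
    for g, v in records:
        groups.setdefault(g, []).append(v)
    return groups

def summarize(groups):
    """Summarizing: count, mean and sample standard deviation per group."""
    return {
        g: {"n": len(vs), "mean": statistics.mean(vs), "stdev": statistics.stdev(vs)}
        for g, vs in groups.items()
    }

def permutation_test(a, b, n_iter=5_000, seed=0):
    """Testing: two-sided permutation test for a difference in means (p-value estimate)."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = a + b  # new list, so shuffling does not modify a or b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)

groups = group_values(clean(raw))
summary = summarize(groups)
p_value = permutation_test(groups["control"], groups["treatment"])
print(summary)
print(f"p = {p_value:.4f}")
```

A permutation test is used here only because it needs nothing beyond the standard library; with SciPy available, a t-test would be the more common choice.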

More information can be found on Wikipedia's Data Analysis page.

4642 questions

1 vote · 2 answers

Adding text between data columns in Python

I have a df: Names Values TC Ram 2 TC Count pechi 2 TC Count Sunil 1 TC Count Ravi 1 TC Count sri 1 TC Count. I want to add some text between the data columns. I tried df.join but…
Pyd

1 vote · 1 answer

Python/Pandas - Delete duplicate rows by column value

I have a DataFrame like this: sale_id dt receipts_qty 31 196.0 2017-02-19 95.0 32 203.0 2017-02-20 101.0 33 196.0 2017-02-21 105.0 34 196.0 …
user8407067
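For a frame like the one in this excerpt, pandas' `DataFrame.drop_duplicates` with the `subset` argument is the usual tool. A minimal sketch, assuming the goal is to keep only the first row for each `sale_id`. The excerpt is cut off, so which duplicate to keep is a guess; `keep="last"` and `keep=False` (drop every duplicated row) are the alternatives. Only the three complete rows shown in the excerpt are used.

```python
import pandas as pd

# The three complete rows from the question excerpt (row 34 is truncated there).
df = pd.DataFrame(
    {
        "sale_id": [196.0, 203.0, 196.0],
        "dt": ["2017-02-19", "2017-02-20", "2017-02-21"],
        "receipts_qty": [95.0, 101.0, 105.0],
    },
    index=[31, 32, 33],
)
df["dt"] = pd.to_datetime(df["dt"])

# Keep the first occurrence of each sale_id; later rows with the same value are dropped.
deduped = df.drop_duplicates(subset="sale_id", keep="first")
print(deduped)
```

If "first" should mean earliest date rather than first in file order, sort before dropping: `df.sort_values("dt").drop_duplicates(subset="sale_id")`.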

1 vote · 0 answers

Finding most common path in order in R

I'm trying to find the most common (most visited) route or path, in visit order, from a list of areas visited by many participants. Here is what a toy data set looks like: In this dataset the time values are in seconds, and the smallest time means the first visit of…
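One way to approach this: sort each participant's visits by time, join the areas into an ordered path, and count identical paths. The question's own data is not visible in the excerpt, so the sketch below uses a hypothetical toy data set of `(participant, area, time)` tuples. It is written in Python for consistency with the other examples here; in R the same steps map onto `order()`, `paste(..., collapse = ...)` and `table()`. It also assumes "path" means each participant's complete visit sequence; counting individual transitions (A to B) would be a different question.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (participant, area, time in seconds).
visits = [
    ("p1", "A", 30), ("p1", "B", 10), ("p1", "C", 50),
    ("p2", "B", 5), ("p2", "A", 20), ("p2", "C", 40),
    ("p3", "A", 12), ("p3", "C", 25),
]

def most_common_paths(visits, top=1):
    """Return the `top` most frequent ordered paths as (path, count) pairs."""
    by_participant = defaultdict(list)
    for participant, area, t in visits:
        by_participant[participant].append((t, area))
    paths = Counter()
    for records in by_participant.values():
        # Smallest time = first visit, so sort ascending by time.
        ordered = [area for _, area in sorted(records)]
        paths[" -> ".join(ordered)] += 1
    return paths.most_common(top)

print(most_common_paths(visits))
```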

1 vote · 1 answer

Mapping matching word count on a column using pandas in Python

I have a df, Name Step Description Ram 1 Ram is oNe of the good cricketer Ram 2 gopal one Sri 1 Sri is one of the member Sri 2 ravi good Kumar 1 Kumar is a keeper Madhu 1 good…
Pyd

1 vote · 1 answer

Spotfire: multiple OVER statements in one custom expression

I have a table of travel expenses for analysis. I would like to create a calculated column with a value for the maximum count of records with a certain category for each employee on any given day. For example, if the category being reviewed is…
cookiemnstr247

1 vote · 2 answers

Pandas: combining same rows into a new column while preserving the row (not a simple group-by)

I am a pandas beginner and in need of some help. I have the following pandas dataframe: ID Val-A Val-B aab12 lower -30 dbc11 lower -10 aab12 upper 50 dbc11 upper 20 I want to produce a new dataframe from the…
Tom

1 vote · 0 answers

$ operator not defined for this S4 class

I have been working on applying the top.topic.words function and got this error. Would anyone be able to help me out with this? > top.words <- top.topic.words(l$V1, 5, by.score=TRUE) Error in l$V1 : $ operator not defined for this S4…

1 vote · 1 answer

Can you merge two principal components?

I am running a regression on the Big Five personality traits and how birth order affects those traits. First, I am trying to build five variables, based on surveys, that capture those traits. I have thought about creating dummies for each question in the…
MissM

1 vote · 2 answers

Mapping keywords to a DataFrame column using pandas in Python

I have a dataframe, DF, Name Stage Description Sri 1 Sri is one of the good singer in this two 2 Thanks for reading Ram 1 Ram is one of the good cricket player ganesh 1 good driver and a…
Pyd

1 vote · 1 answer

How to map several keywords to DataFrame column values using pandas in Python

Hi, I have a list of keywords: keyword_list=['one','two'] DF: Name Description Sri Sri is one of the good singer in this two Ram Ram is one of the good cricket player I want to find the rows that contain all the…
Pyd
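A minimal sketch, assuming the truncated sentence means "rows whose `Description` contains every keyword in `keyword_list`", matched as whole words and case-insensitively. The DataFrame uses the two rows shown in the excerpt; the helper name `contains_all` is hypothetical.

```python
import re

import pandas as pd

keyword_list = ["one", "two"]
df = pd.DataFrame(
    {
        "Name": ["Sri", "Ram"],
        "Description": [
            "Sri is one of the good singer in this two",
            "Ram is one of the good cricket player",
        ],
    }
)

def contains_all(text, keywords):
    """True if every keyword appears in `text` as a whole word (case-insensitive)."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(k.lower() in words for k in keywords)

# Series.apply forwards extra keyword arguments to the function.
mask = df["Description"].apply(contains_all, keywords=keyword_list)
print(df[mask])
```

Whole-word matching keeps "one" from matching inside words such as "someone"; if substring matches are wanted, `df["Description"].str.contains(k)` for each keyword, combined with `&`, is the simpler route.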

1 vote · 1 answer

Partition multiple tables based on a primary key using Apache Spark or any big data tool

I have data for 75 e-commerce customer accounts in a CSV file. I also have transaction records in another file. Here, the account number is the primary key. Each account has on average 500 transactions. Now, I want to process this data and…

1 vote · 0 answers

Retrieve classification performance from "crossval"

I am classifying some data as a dummy-test against a zero vector, using a Support Vector Machine (SVM), as follows: kernel = 'linear'; C =1; class1 = double(data(labels==1,:)); class2 = zeros([size(class1,1),size(class1,2)]); data =…
TestGuest

1 vote · 1 answer

How to generate a list of contemporaneous loans from loan level data?

I am trying to detect multiple borrowing in a loan level data set that looks as follows: d = {'start_month': [1,2,4,1,14], 'customer': ['A','A','A','C','C'], 'branch': [1,2,3,2,1], 'maturity_month': [13,14,16,13,26]} df = pd.DataFrame(data=d) I…
Ruth
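A minimal sketch, assuming two loans of the same customer are contemporaneous when their `[start_month, maturity_month)` intervals overlap, i.e. each starts before the other matures. It merges the question's DataFrame with itself on `customer`, keeps each unordered pair once, and filters overlapping pairs. The `loan_id` column is a hypothetical addition so that pairs can be identified; `_x`/`_y` are pandas' merge suffixes.

```python
import pandas as pd

d = {'start_month': [1, 2, 4, 1, 14], 'customer': ['A', 'A', 'A', 'C', 'C'],
     'branch': [1, 2, 3, 2, 1], 'maturity_month': [13, 14, 16, 13, 26]}
df = pd.DataFrame(data=d)
df["loan_id"] = df.index  # hypothetical identifier for each loan

# Pair every loan with every loan of the same customer.
pairs = df.merge(df, on="customer", suffixes=("_x", "_y"))
# Drop self-pairs and mirrored duplicates (keep each unordered pair once).
pairs = pairs[pairs["loan_id_x"] < pairs["loan_id_y"]]

# Half-open intervals [start, maturity) overlap if each starts before the other matures.
overlap = (pairs["start_month_x"] < pairs["maturity_month_y"]) & (
    pairs["start_month_y"] < pairs["maturity_month_x"]
)
contemporaneous = pairs.loc[overlap, ["customer", "loan_id_x", "loan_id_y", "branch_x", "branch_y"]]
print(contemporaneous)
```

With this data, all three of customer A's loans overlap one another, while customer C's second loan starts (month 14) after the first matures (month 13). If maturity months should count as inclusive, change `<` to `<=` in both comparisons. The self-merge grows with the square of each customer's loan count, which is fine for loan-level data with a handful of loans per customer.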

1 vote · 0 answers

When comparing time series, how to make up for temporal misalignments in real time?

(Figure: alignment of two time series using DTW; source: Wikipedia.) I'm trying to extract features and learn patterns from time series automatically. However, certain algorithms expect the training samples to be temporally aligned. Using DTW…
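For reference, the classic dynamic time warping (DTW) distance that this excerpt refers to can be computed with a small dynamic program. This is a minimal O(n*m) sketch using absolute difference as the local cost. It is the offline textbook version, not a real-time or streaming variant, which is what the question is ultimately after.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]

x = [0, 1, 2, 3, 2, 1, 0]
y = [0, 0, 1, 2, 3, 2, 1, 0]  # same shape, delayed by one step
print(dtw_distance(x, y))                       # warping absorbs the delay
print(sum(abs(p - q) for p, q in zip(x, y)))    # naive point-by-point comparison
```

The warped distance is zero because the extra leading sample in `y` is matched to the first sample of `x`, while the naive comparison penalizes every shifted point. For long or streaming inputs, restricting the warping path to a band around the diagonal (a Sakoe-Chiba band) is a common way to bound the cost.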

1 vote · 2 answers

If I want to predict future purchases in online shopping using historical data, do I need data science, data analysis, or big data?

I want to learn to predict future events, for example the number of plane crashes in 2018 using the past two decades of plane crash data, or how many T-shirts with Justin Bieber's face on them will be sold by 2018…
Raja