Questions tagged [data-analysis]

Data Analysis involves extracting meaning and insights from raw data. It involves methods and algorithms that examine, clean, transform and model the data to obtain conclusions.

Typically, data analysis involves a series of steps: measuring some parameters of interest, collecting the data, cleaning it, storing it in meaningful ways, then summarizing and examining it, and testing various hypotheses about the data.
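The steps listed above can be sketched in a few lines of Python. This is only an illustration using the standard library: the measurements, the `None` marker for missing readings, and the assumed mean of 4.0 are all made up for the example.

```python
import statistics

# Hypothetical raw measurements; None marks a missing reading.
raw = [4.1, None, 3.9, 4.4, None, 4.0, 4.6]

# Cleaning: drop the missing readings.
clean = [x for x in raw if x is not None]

# Summarizing: basic descriptive statistics.
summary = {
    "n": len(clean),
    "mean": statistics.mean(clean),
    "stdev": statistics.stdev(clean),
}

# Testing a hypothesis: one-sample t statistic against an assumed mean of 4.0.
assumed_mean = 4.0
t_stat = (summary["mean"] - assumed_mean) / (summary["stdev"] / summary["n"] ** 0.5)

print(summary)
print(t_stat)
```

In practice the t statistic would be compared against a t distribution with n - 1 degrees of freedom to decide whether the difference is significant.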

More information can be found on Wikipedia's Data Analysis page.

4642 questions
1
vote
1 answer

Pattern Recognition in Datasets without Visualisation for Data Analysis

Using machine learning, how can I recognise patterns in data without using data visualisation, so that the machine recognises the patterns on its own and I can use them for further analysis without needing to analyse visualisations myself…
1
vote
1 answer

Seaborn automatically hides yticks if there are too many

I'm using seaborn to draw a heatmap. But if there are too many yticks, some of them are automatically hidden. The result looks like: As you can see, the yticks only show 1, 3, 5, 7, ..., 31, 33. How can I make seaborn or matplotlib show all of…
sunnwmy
1
vote
1 answer

Numpy np.newaxis

saleprice_scaled = StandardScaler().fit_transform(df_train['SalePrice'][:,np.newaxis]); Why is newaxis being used here? I know newaxis, but I can't figure out its use in this particular situation.
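A minimal sketch of what the indexing in that line does, using NumPy only. The `prices` array is a hypothetical stand-in for `df_train['SalePrice']`. scikit-learn scalers such as `StandardScaler` expect 2-D input of shape (n_samples, n_features), which is the likely reason a single column is given an extra axis here.

```python
import numpy as np

# Hypothetical stand-in for df_train['SalePrice'] as a plain 1-D array.
prices = np.array([208500.0, 181500.0, 223500.0, 140000.0])

# Indexing with np.newaxis adds a length-1 axis, turning (n,) into (n, 1).
column = prices[:, np.newaxis]

print(prices.shape)
print(column.shape)

# reshape(-1, 1) produces the same 2-D column layout.
assert np.array_equal(column, prices.reshape(-1, 1))
```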
1
vote
1 answer

How to remove an entire column if a particular row has duplicate values in a dataframe in Python

I have a dataframe like this,

df:
     Name     City
0    sri      chennai
1    pedhci   pune
2    bahra    pune

There is a duplicate in the City column. I tried: df["City"].drop_duplicates() but it gives only that particular column. my…
Pyd
1
vote
1 answer

R: Shiny scatter plot that cycles through time variable

Using shiny and ggplot2, I am trying to create a scatter plot that steps through the Time variable (starting at 00:00 and ending at 24:00) and plots X and Y when the time for that coordinate comes up, which will create a frame by frame reference…
t00T
1
vote
3 answers

Subtracting rows between data frames in pandas

I have two dataframes,

df1:
Name  | std
kumar | 8
Ravi  | 10
Sri   | 2
Ram   | 4

df2:
Name | std
Sri  | 2
Ram  | 4

I want to subtract the df2 rows from df1. I tried df1.subtract(df2,fill_value=None) but I am getting an error, TypeError:…
Pyd
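One possible reading of the question above is that rows of df1 which also appear in df2 should be dropped (an anti-join). That reading is an assumption. `DataFrame.subtract` performs element-wise arithmetic, so it does not remove rows. A minimal sketch using a left merge with `indicator=True`:

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["kumar", "Ravi", "Sri", "Ram"], "std": [8, 10, 2, 4]})
df2 = pd.DataFrame({"Name": ["Sri", "Ram"], "std": [2, 4]})

# Left merge on all shared columns; the _merge column records where each row was found.
merged = df1.merge(df2, how="left", indicator=True)

# Keep only the rows that appear in df1 but not in df2.
result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(result)
```

Because the merge uses every shared column as a key, a row is dropped only when both Name and std match.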
1
vote
2 answers

Grouping pandas dataframe into groups based on a repeating sequence of values in one column

I have 2 columns in my dataframe: x and y. x is continually repeating between 1-4 and I need to find out some statistics about the sections where x=2, e.g. mid-point and average etc. I have created a third column using .shift(-1): …
AM94
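A sketch of one common way to label contiguous runs in a column, which may help with the question above. It uses `.shift()` compared against the current row plus `cumsum()`, not the asker's `.shift(-1)` column, and the data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: x cycles through 1-4, y holds the measurements.
df = pd.DataFrame({
    "x": [1, 2, 2, 3, 4, 1, 2, 2, 2, 3],
    "y": [5, 6, 8, 1, 0, 3, 10, 20, 30, 7],
})

# A new run starts wherever x differs from the previous row.
df["run"] = (df["x"] != df["x"].shift()).cumsum()

# Statistics for each contiguous section where x == 2.
stats = df[df["x"] == 2].groupby("run")["y"].agg(["mean", "count"])

print(stats)
```

Each section where x == 2 gets its own run id, so other aggregations (median, first, last) can be added to the `agg` list as needed.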
1
vote
1 answer

How to get the count of occurrences of a list of keywords in a data column in a dataframe in Python

my_list=["one","is"]

df
Out[6]:
    Name   Story
0   Kumar  Kumar is one of the great player in his team
1   Ravi   Ravi is a good poet
2   Ram    Ram drives well

if any one of the items in my_list is present in the "Story"…
Pyd
1
vote
2 answers

Markov Chain Model

How can I generate the transition matrix and predict the next 2 events using a Markov chain model? I have the data in the form shown below…
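A minimal sketch of estimating a first-order transition matrix from an observed sequence. The question's data is not shown, so the `events` list is hypothetical. The `predict` helper is also an illustration: it greedily follows the most probable transition at each step, which is a simplification of computing the full two-step distribution.

```python
from collections import Counter, defaultdict

# Hypothetical event sequence; the question's actual data is not shown.
events = ["A", "B", "A", "C", "A", "B", "B", "A", "B"]

# Count observed transitions between consecutive events.
counts = defaultdict(Counter)
for current, nxt in zip(events, events[1:]):
    counts[current][nxt] += 1

# Normalise each row of counts into transition probabilities.
transition = {
    state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
    for state, row in counts.items()
}

def predict(state, steps):
    """Greedy prediction: follow the most probable transition at each step."""
    path = []
    for _ in range(steps):
        state = max(transition[state], key=transition[state].get)
        path.append(state)
    return path

print(transition)
print(predict(events[-1], 2))
```

A state that never appears as the start of a transition has no row in `transition`, so `predict` would raise a `KeyError` for it. Real data may need a fallback for that case.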
1
vote
1 answer

Sorting dataframe

I have two dataframes of the same dimensions. As shown, I want to order the dataframe data such that its id_num follows the order of id_num in reference. The output should look like the dataframe required. How can I do it? reference <-…
sid
1
vote
1 answer

Convert a 28 level categorical variable to matrix

I have a data set with one column, company, and I will do regression modelling on this dataset. Should I convert it using model.matrix or just assign values from 1-28 in one column? What is the relevance of converting it to 28 columns when lm…
Ankit Katiyar
1
vote
2 answers

Selecting multiple index ranges from a vector

I have a vector containing say 30000 elements. I want to get a new vector out of it which will have 15000 elements from index 1:5, 11:15, 21:25 and so on till 29991:29995. How can I do it using "R programming"?
sid
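The selection pattern described above (the first 5 of every block of 10) can be sketched as follows. This is written in Python rather than the R the question asks for, and the `values` list is hypothetical. Python indices are 0-based, so 1-based positions 1-5, 11-15, and so on correspond to indices where `i % 10 < 5`.

```python
# Hypothetical vector with 30000 elements.
values = list(range(1, 30001))

# Keep the first 5 of every block of 10
# (1-based positions 1-5, 11-15, ..., 29991-29995).
selected = [v for i, v in enumerate(values) if i % 10 < 5]

print(len(selected))
```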
1
vote
1 answer

How to doubly normalize scientific data by baseline time point and then controls in R

I have a data.table with a bunch of parameters (amplitude, rate, area, etc.; there are 23 in total) that belong to specific wells (a single experiment, if you will; there are 48 in total), grouped by treatments (there are usually ~10 in total), and…
JVP
1
vote
0 answers

Replacing NA in Rolling mean for observations that don't have preceding observations equal to window size

structure(list(Score = c(2, 5, 1, 36, 69, 8, 54, 25, 2, 2, 2, 5, 5, 4, 1), ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)), .Names = c("Score", "ID"), row.names = c(NA, -15L), class = "data.frame") Suppose I take group_by (window=3),…
SriM
1
vote
1 answer

ValueError with LabelEncoder and OneHotEncoder

I'm trying to turn a categorical string column into several binary dummy variable columns, but I'm getting a ValueError. Here's the code:

import sys, os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from dateutil import…