Questions tagged [eda]

event-driven architecture

Event-driven architecture (EDA) is a software architecture pattern promoting the production, detection, consumption of, and reaction to events.

108 questions
0
votes
1 answer

Why am I getting TypeError while creating a CDF?

For the code below I am using Haberman's dataset (link: https://www.kaggle.com/gilsousa/habermans-survival-data-set/version/1). df_1 = df.loc[df["survival_status"] == "1"]; # here "1" comes from the dataset; 1 means survived, means 1…
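The excerpt is cut off before the failing line, so the cause of the TypeError cannot be confirmed. One possibility worth checking is a dtype mismatch: if `survival_status` is stored as integers, comparing it with the string `"1"` would typically select no rows. The sketch below uses made-up stand-in data (not the real Haberman file), a hypothetical `age` column, and a hypothetical helper `empirical_cdf` to show one way a CDF could be built with `numpy.histogram`.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data; the real dataset and its column names may differ.
df = pd.DataFrame({
    "age": [30, 34, 38, 42, 47, 52, 60, 65],
    "survival_status": [1, 1, 2, 1, 2, 1, 1, 2],
})

# Compare against the integer 1, assuming the column holds integers.
df_1 = df.loc[df["survival_status"] == 1]


def empirical_cdf(values, bins=4):
    """Return bin edges and the cumulative share of observations per bin."""
    counts, bin_edges = np.histogram(values, bins=bins)
    pdf = counts / counts.sum()   # share of observations in each bin
    cdf = np.cumsum(pdf)          # running total of those shares
    return bin_edges, cdf


bin_edges, cdf = empirical_cdf(df_1["age"].to_numpy(dtype=float))
```

Checking `df["survival_status"].dtype` before filtering is a cheap way to see whether the comparison value should be `1` or `"1"`.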
0
votes
1 answer

Replace values in a pandas dataframe

I have a pandas dataframe which is generated based on events. Each event has a unique ID and generates repeated rows in the dataframe. The problem is that some of these repeated rows contain random values which are different from each…
0
votes
1 answer

How do I merge the "Holidays" column into my masterdata? I seem to be getting a KeyError

https://ibb.co/b7VHCc3 [DataFrames] Whenever I try to run this I keep getting a KeyError: masterdata = cabdata.merge(transaction, on='Transaction ID').merge(customer, on='Customer ID').merge(city, on='City').merge(holidaydata, on='Holidays') The…
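In pandas, `merge(..., on=key)` needs `key` to exist as a column in both frames, and a missing key raises a KeyError. The frames in the question are not shown here, so the sketch below uses two tiny made-up frames and assumes a shared `Date` column (a hypothetical name) to illustrate joining on a common key so that `Holidays` arrives as a new column.

```python
import pandas as pd

# Made-up stand-ins for the frames in the question; column names are assumptions.
masterdata = pd.DataFrame({
    "Transaction ID": [1, 2, 3],
    "Date": ["2018-01-01", "2018-01-02", "2018-12-25"],
})
holidaydata = pd.DataFrame({
    "Date": ["2018-01-01", "2018-12-25"],
    "Holidays": ["New Year's Day", "Christmas Day"],
})

# Find the columns the two frames actually share before choosing a join key.
shared = sorted(set(masterdata.columns) & set(holidaydata.columns))

# A left join keeps every transaction; non-holiday dates get NaN in "Holidays".
merged = masterdata.merge(holidaydata, on=shared, how="left")
```

Printing `holidaydata.columns` and the columns of the intermediate result is usually the quickest way to see which key the two sides have in common.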
0
votes
1 answer

pandas profiling with dask-dataframe. IndexError

I get an IndexError (IndexError: only integers, slices (:), ellipsis, numpy.newaxis and integer or boolean arrays are valid indices) while pandas profiling with dask. data: 290170 x 55 import dask.dataframe as dd from pandas_profiling import…
0
votes
1 answer

Working on exploratory data analysis and I am now at the data cleaning stage

I am wondering: when I have a date column in my dataset and it contains NULL values, what is the best way to impute the nulls in a datetime dtype? As for the float values, I have already imputed them with the mean, but I am stuck…
user18257965
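There is no single best way to impute missing dates; the right choice depends on what the column means. As one hedged option, the sketch below keeps a flag marking which rows were missing and then fills gaps from the neighbouring rows. It assumes the rows are in a meaningful order, and the column name `event_date` is hypothetical.

```python
import pandas as pd

# Made-up example column with two missing dates (NaT).
df = pd.DataFrame({
    "event_date": pd.to_datetime(["2021-01-01", None, "2021-01-05", None]),
})

# Record which rows were originally missing before changing anything.
df["event_date_missing"] = df["event_date"].isna()

# Forward-fill from the previous row, then back-fill any leading gaps.
df["event_date_filled"] = df["event_date"].ffill().bfill()
```

Keeping the flag column lets later analysis tell imputed dates apart from observed ones.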
0
votes
0 answers

How do I determine cutline when distribution is exponential-like?

I am currently working on e-commerce customer data analysis. I plotted the customer-click distribution and got the image below. X = number of clicks / Y = number of customers. It looks like an exponential distribution to me (not exactly, but similar). I…
nowheretogo
0
votes
2 answers

EDA for loop on multiple columns of dataframe in Python

Just a random q. If there's a dataframe, df, from the Boston Homes ds, and I'm trying to do EDA on a few of the columns, set to a variable feature_cols, which I could use afterwards to check for na, how would one go about this? I have the following,…
daniness
0
votes
1 answer

KubeMQ Omits Acks / Acknowledgements From Streams

This functionality: https://kafka.apache.org/10/documentation/streams/developer-guide/config-streams.html#acks seems to be missing from KubeMQ streams. Are we missing something or have they just omitted it? This would fundamentally change our…
Daisy Day
0
votes
1 answer

How to get a specific tag text from a similar type of tag in XML in Python?

I have the tags below: Applicants: Fortune V Separate Account FILING DATES: The…

0
votes
2 answers

I want only those names that have a value count > 15, but it's giving True and False for all the names

df['player_of_match'].value_counts()>15 I am getting this output: CH Gayle True AB de Villiers True MS Dhoni True ... BA Bhatt False WPUJC Vaas False AD Mascarenhas False when I am…
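The comparison `value_counts() > 15` returns a boolean Series, which is why every name shows up with True or False. A common pattern is to use that boolean Series as a mask on the counts themselves. The sketch below uses a tiny made-up frame and a threshold of 1 instead of 15 so the effect is visible; the variable names are illustrative.

```python
import pandas as pd

# Tiny made-up sample; the real data would have many more rows.
df = pd.DataFrame({
    "player_of_match": ["CH Gayle"] * 3 + ["MS Dhoni"] * 2 + ["BA Bhatt"],
})

threshold = 1  # the question uses 15; lowered here to suit the small sample

counts = df["player_of_match"].value_counts()

# Use the boolean comparison as a mask to keep only the frequent names.
frequent = counts[counts > threshold]

# The names themselves live in the index of the filtered Series.
names = frequent.index.tolist()
```

With the question's data, setting `threshold = 15` would keep only the players whose count exceeds 15.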
0
votes
2 answers

Should event-driven architecture be targeted for all data & analytics platforms?

For example, you have an IT estate where a mix of batch and real-time data sources exists across multiple systems, e.g. ERP, project management, asset, website, monitoring etc. The aim is to integrate the data sources into a cloud environment…
0
votes
2 answers

How to map two column values in a pandas dataframe (location-id, location-name) and spot errors in the dataframe?

My dataset has two columns named location-id and location-name. Each location name is given a unique location id. location-id / location-name: 234 SL, 456 IN, 234 SL, 123 EN. As each location has a unique…
user15962699
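One way to spot rows that break a one-to-one mapping is to count distinct partners per key with `groupby(...).nunique()`. The sketch below reuses the four rows from the excerpt and adds a fifth, deliberately inconsistent row that is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "location-id": [234, 456, 234, 123, 456],
    # The last row (456, "EN") is a made-up error added for illustration.
    "location-name": ["SL", "IN", "SL", "EN", "EN"],
})

# An id that maps to more than one name violates the one-to-one rule.
names_per_id = df.groupby("location-id")["location-name"].nunique()
bad_ids = names_per_id[names_per_id > 1].index.tolist()

# Likewise, a name that maps to more than one id is suspect.
ids_per_name = df.groupby("location-name")["location-id"].nunique()
bad_names = ids_per_name[ids_per_name > 1].index.tolist()

# Collect every row involved in a conflict for manual review.
suspect_rows = df[
    df["location-id"].isin(bad_ids) | df["location-name"].isin(bad_names)
]
```

This flags the conflicting rows but does not decide which one is wrong; that usually needs a trusted reference mapping.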
0
votes
1 answer

How to change dtype of multiple specific columns

How do I change multiple columns to float when I also need to change "," to "."? The DF looks like this, except I dropped all the NaN values as I don't need those rows. These are the dtypes. And this is the way I'm doing it now, but it takes a hella long time. I…
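The question's own code is not visible in the excerpt, so the slow part cannot be identified. A usually fast approach is to work on whole columns with the vectorized string methods rather than row by row. The sketch below assumes the affected columns hold strings with a decimal comma; the frame and the column names are made up.

```python
import pandas as pd

# Made-up frame: two numeric columns stored as strings with decimal commas.
df = pd.DataFrame({
    "price": ["1,5", "2,25"],
    "weight": ["10,0", "3,75"],
    "label": ["a", "b"],
})

numeric_cols = ["price", "weight"]  # hypothetical list of columns to convert

for col in numeric_cols:
    # Replace the decimal comma on the whole column at once, then cast.
    df[col] = df[col].str.replace(",", ".", regex=False).astype(float)
```

If the data comes from a CSV file, passing `decimal=","` to `pandas.read_csv` may avoid the conversion step altogether.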
0
votes
1 answer

How to handle multi-value row of numerical value with unknown size in python?

I was trying to solve the recent Analytics Vidhya hackathon LTFS (Bank Data), and there I faced a somewhat unusual problem, though not too unusual. Let me explain. Problem: There are a few columns in a Bureau dataset named REPORTED DATE - HIST, CUR…
Darkstar Dream
0
votes
1 answer

Adding empty columns to python DataFrame

I am trying to add a couple of empty columns to a Python DataFrame; the columns to be added are given as a list. How could I do…
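One possible approach, sketched here with a made-up frame and a hypothetical list `new_cols`, is `DataFrame.reindex`, which creates any column that does not exist yet and fills it with NaN.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})   # made-up starting frame
new_cols = ["b", "c"]              # hypothetical list of columns to add

# Reindexing on the combined column list adds the new columns, filled with NaN.
df = df.reindex(columns=[*df.columns, *new_cols])
```

The existing columns keep their values, and the new ones can be filled in later.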