Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can involve predictive analytics and usually requires a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions
1
vote
1 answer

Unzipping tokenizers\punkt.zip in nltk.download('punkt')

I have integrated nltk into my Python project, but after installing punkt with nltk.download('punkt') it only shows "Unzipping tokenizers\punkt.zip". I have checked the nltk-data download location for confirmation, but nothing happened.
1
vote
2 answers

How to convert monthly data to yearly data with the value as the mean average over the 12 months? (Python Pandas)

This is what my data looks like:

     month  total_mobile_subscription
0  1997-01                     414000
1  1997-02                     423000
2  1997-03                     431000
3  1997-04                     479000
4  1997-05  …
Lim Ze Kai
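One possible way to do the monthly-to-yearly conversion asked about above is to group by calendar year and take the mean. This is only a sketch: it uses the four complete rows visible in the excerpt, so the average covers those months rather than a full year, and the output column name `year` is an assumption.

```python
import pandas as pd

# Rows copied from the question excerpt; the real data has more months.
df = pd.DataFrame({
    "month": ["1997-01", "1997-02", "1997-03", "1997-04"],
    "total_mobile_subscription": [414000, 423000, 431000, 479000],
})

# Parse the "YYYY-MM" strings into timestamps so the year can be extracted.
df["month"] = pd.to_datetime(df["month"], format="%Y-%m")

# Group by calendar year and average whatever months are present.
yearly = (
    df.groupby(df["month"].dt.year)["total_mobile_subscription"]
      .mean()
      .rename_axis("year")  # hypothetical name for the new key column
      .reset_index()
)
print(yearly)
```

With a complete data set, each year's value would be the mean over its 12 months.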
1
vote
0 answers

How to predict sales data as total and sub category wise?

I have a data set of a tea export company and it includes total export and tea types and weight categories. It looked like this:

Date        Type   Weight  Quantity    Price
2016-01-01  black  bags    1734136.51  1131.30
2016-01-01  black  …
John Snape
1
vote
2 answers

Pandas: combining information from rows with consecutive duplicate values in a column

The title might be a bit confusing here, but in essence: We have a dataframe, let’s say like this:

   SKUCODE PROCCODE   RUNCODE  RSNCODE           BEGTIME           ENDTIME  RSNNAME
0   218032        A  21183528     1010  2020-11-19 04:00  2020-11-19 04:15  Bank…
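A common pattern for collapsing consecutive duplicates is to label each run with a shift-and-cumsum key and then aggregate per run. The sketch below assumes the duplicated column is RSNCODE and that the merged row should span from the earliest BEGTIME to the latest ENDTIME; the question excerpt does not confirm either, and the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows; only the column names come from the question.
df = pd.DataFrame({
    "RSNCODE": [1010, 1010, 2020, 1010],
    "BEGTIME": pd.to_datetime(["2020-11-19 04:00", "2020-11-19 04:15",
                               "2020-11-19 04:30", "2020-11-19 04:45"]),
    "ENDTIME": pd.to_datetime(["2020-11-19 04:15", "2020-11-19 04:30",
                               "2020-11-19 04:45", "2020-11-19 05:00"]),
})

# A new run starts whenever the value differs from the previous row,
# so the cumulative sum of those change flags is a per-run identifier.
run_id = (df["RSNCODE"] != df["RSNCODE"].shift()).cumsum().rename("run_id")

# Collapse each run into one row spanning its first start and last end.
merged = (
    df.groupby(run_id)
      .agg(RSNCODE=("RSNCODE", "first"),
           BEGTIME=("BEGTIME", "min"),
           ENDTIME=("ENDTIME", "max"))
      .reset_index(drop=True)
)
print(merged)
```

Because the key is based on position rather than value, the second block of 1010 stays separate from the first one.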
1
vote
1 answer

High ROC-AUC and recall, but low precision and accuracy in balanced dataset

I'm using the Titanic dataset, so it's pretty balanced (about 60:40), and the GaussianNB model (standard parameters) has an accuracy of 0.659. When I plotted F1, precision and recall I discovered the reason for such a low score. F1, precision and recall of…
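To see how high recall can coexist with low precision and accuracy, it may help to compute the metrics from a confusion matrix by hand. The counts below are hypothetical (chosen only to mimic a 60:40 split with a model that predicts the positive class too often) and are not taken from the question. ROC-AUC is computed from the ranking of scores rather than from a fixed threshold, so it can stay high even when the default threshold yields many false positives.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts: 40 actual positives, 60 actual negatives.
# The model finds almost every positive but also flags many negatives.
result = classification_metrics(tp=38, fp=32, fn=2, tn=28)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In a case like this, moving the decision threshold is one thing to try, since it trades recall for precision without retraining the model.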
1
vote
1 answer

Can I use the output of tf.keras.utils.image_dataset_from_directory to train an autoencoder?

To put it simply, I'd like to be able to use a Keras dataset created from a local image directory to train an autoencoder. To clarify, this is a model that approximates the identity function for images: ideally, the output is exactly equal to the…
user3170530
1
vote
1 answer

How to count text event type and transform it into country-year data using pandas?

I am trying to convert a dataframe where each row is a specific event, and each column has information about the event. I want to turn this into data in which each row is a country and year with information about the number and characteristics about…
taraamcl
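A minimal sketch of turning event-level rows into country-year counts, assuming columns named `country`, `year` and `event_type` (the question excerpt does not show the real column names, and the rows here are invented):

```python
import pandas as pd

# Invented event-level data: one row per event.
events = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "year": [2000, 2000, 2001, 2000],
    "event_type": ["protest", "riot", "protest", "protest"],
})

# Count events per (country, year, event_type), then pivot the event types
# into columns so that each row is one country-year.
counts = (
    events.groupby(["country", "year", "event_type"])
          .size()
          .unstack(fill_value=0)
          .reset_index()
)
print(counts)
```

Other per-event characteristics could be summarized the same way by swapping `size()` for an aggregation such as a sum or mean over the relevant column.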
1
vote
1 answer

Which statistical test to use?

I have a dataset containing two columns X and Y. Column Y is binary with values 0 and 1. There is also a range for column Y (150, 400) which are the standard results. Which statistical test should I use to find if values in column X outside the given…
kaniosx
1
vote
1 answer

How to plot top k rows by a given column as a bar plot in FacetGrid (with code, dummy data, and solution in matplotlib)

The task is to plot the top 5 NBA players according to accumulated points as a bar plot, and to compare east and west players. I implemented the solution in matplotlib, but I want to know how to do it with FacetGrid. import numpy as np import…
enter_thevoid
1
vote
1 answer

How to use Python's in operator in a dataframe to search for a string and return a boolean in a new column in the same dataframe

I have a dataframe df which contains movie data. I want to create a new column in df called "drama_movie" which contains the value True for the movies that are dramas and False if they are not. I tried it with the following…
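One way this could look, assuming the genre information lives in a string column called `genres` (a guess, since the excerpt does not show the column names; the rows are invented). The first version uses the vectorized `Series.str.contains`; the second applies Python's `in` operator to each value.

```python
import pandas as pd

# Invented movie data; the "genres" column name is an assumption.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Drama|Romance", "Comedy", None],
})

# Vectorized substring test; na=False maps missing genres to False.
df["drama_movie"] = df["genres"].str.contains("Drama", regex=False, na=False)

# Equivalent element-wise version using the `in` operator directly.
# The isinstance check guards against missing (non-string) values.
df["drama_movie_in"] = df["genres"].apply(
    lambda g: isinstance(g, str) and "Drama" in g
)
print(df)
```

Writing `"Drama" in df["genres"]` on the whole column checks the index labels rather than the values, which is a frequent source of confusion with this approach.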
1
vote
1 answer

A different merge

So I have two tables and those are the…
Yago Dias
1
vote
0 answers

Feed the output of a polars groupby aggregation back in to an operation on the original dataframe?

I have some time series data in a polars LazyFrame where I am detecting events based on extended periods in which some criterion is true. The signal I am basing this on isn't reliable, so if there is a very small amount of time between events, I roll…
1
vote
1 answer

How can I merge my columns into a single one using a multiindex

I have a DataFrame looking like this: year 2015 2016 2017 2018 2019 2015 2016 2017 2018 2019 ... 2015 2016 2017 2018 2019 2015 2016 2017 2018 2019 PATIENTS PATIENTS PATIENTS …
Apollo
1
vote
0 answers

Why is the minimum value 18K when my original data has a maximum value of 12K?

My data has a maximum value of 12K, but when I visualize the data it says that the minimum value is 18K, which is wrong. The problem is with the A(s) variable. This is the code in Python: date CLEARANCE PERMITS INSPECTION Facilities licenses …
kjnk
1
vote
1 answer

Looping over and mapping slices of a DataArray in xarray

Say I have a multidimensional DataArray and I would like to loop over slices according to some dimension and change them. For example, the first dimension is time and I would like for each time to receive a DataArray that represents a slice of that…
davegri