Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It often involves predictive analytics and usually requires a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions
1 vote, 1 answer

Selecting row with highest value based on two different columns

I have a dataframe with 3 columns. For rows that share the same city and the same ID, I want to keep the row with the maximum value and drop the rows with lower values. E.g.:

City       ID  Value
London     1   12.45
Amsterdam  1   14.56
Paris      1   16.89
New…
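A minimal sketch of one way to keep only the highest-value row per city and ID, assuming pandas. The sample rows below are hypothetical because the question's own table is truncated; only the column names `City`, `ID`, and `Value` come from the excerpt.

```python
import pandas as pd

# Hypothetical sample data (the table in the question is cut off).
df = pd.DataFrame({
    "City": ["London", "London", "Paris", "Paris"],
    "ID": [1, 1, 1, 2],
    "Value": [12.45, 10.00, 16.89, 3.50],
})

# idxmax returns, for each (City, ID) group, the index label of the
# row holding the largest Value.
best_idx = df.groupby(["City", "ID"])["Value"].idxmax()

# Select those rows and renumber the index.
result = df.loc[best_idx].reset_index(drop=True)
print(result)
```

If several rows in a group tie for the maximum, `idxmax` keeps only the first one, which may or may not be what the asker wants.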
1 vote, 0 answers

Unable to get the same dimensions for the resulting data even when using the same preprocessing

I am a beginner in data science. So, I started from Kaggle's Titanic competition. Everything was good until I came to the stage to preprocess the dataset. I am unable to get the same dimensions of the resulting train and test data when using the…
1 vote, 1 answer

Getting an error from hdbscan while importing bertopic

I'm trying to import bertopic, but it gives the following error. I tried different versions and recreated the environment, but the error is still the same. I'm using Apple M2 Pro…
Salihcan
1 vote, 1 answer

sklearn OrdinalEncoder default ordering (categories='auto')

What is the default rule used by sklearn's OrdinalEncoder to determine the order of the categories when categories='auto'? Is it just sorted lexicographically? I couldn't find it in the docs.
nivniv
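A small check of the ordering asked about above, assuming scikit-learn and NumPy are installed. The category labels are made up for illustration. With the default `categories='auto'`, the encoder learns the unique values of each column from the training data and stores them in sorted order, which for strings is lexicographic rather than semantic.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single-column input with string categories.
X = np.array([["medium"], ["low"], ["high"], ["low"]], dtype=object)

enc = OrdinalEncoder()  # categories='auto' is the default
codes = enc.fit_transform(X)

# Categories learned from the data, in the order used for encoding.
learned = enc.categories_[0].tolist()
print(learned)        # sorted: high, low, medium
print(codes.ravel())  # each value is the position of its category in `learned`
```

To get a meaningful order such as low < medium < high, the order can be passed explicitly, for example `OrdinalEncoder(categories=[["low", "medium", "high"]])`.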
1 vote, 0 answers

How to Deal with Multiple Patterns in Financial Pattern Classification?

I have a question regarding my project on financial pattern classification using deep learning. Currently, I have organized my data into sequences of 30 days, and within each 30-day sequence, there are multiple patterns. I'm uncertain about how to…
1 vote, 1 answer

I am trying to get the top 5 products from a list of products

I have data like this:

Category  subcategory  crn  product1  product2  product3  product4  product5  product6  product7
A         X            1    1         1         1         1         1         1         1
A         Y            1    1         1         1         1         1         1         1
A         Z            1    1         1         1         1         1         1         1
B         X            1    1         1         1         1         1         1         1

And I want to show output like…
Alok
1 vote, 0 answers

ImageDataGenerator.flow_from_dataframe still has problems with Overfitting

I have an image dataset of 2432 images, each belonging to one of 3 categories. The labels are stored in a csv file with the image id and the label (T1). The distribution of the data is: negative 1695, positive 648, neutral 89. I'm trying to…
1 vote, 1 answer

Two dictionary columns with different lengths: using Series.explode gives mismatched rows

I have two columns in my data frame that each contain multiple dictionaries, and I want to expand them into multiple columns. The problem is that when I use Series.explode, the rows are mismatched. Example: Column A Column B Columns C Cell 1 {"a":0.5}…
1 vote, 4 answers

Combine two Pandas rows into one with duplicated columns for time series

I have the following problem that I am trying to solve. I have two Pandas Dataframe rows with the same columns: Column A Column B Cell 1 Cell 2 Cell 3 Cell 4 I want to combine both rows into one single row by appending the…
1 vote, 1 answer

Why Remove Trend and Seasonality in Time Series Forecasting?

I am struggling to understand why we need to remove trend and seasonality components from non-stationary time series data when performing time series forecasting in Python. Won't removing these components affect the accuracy of the forecasted data,…
Gautham A
1 vote, 1 answer

How to increase pandas explode function performance?

I have data as below in a dataframe:

FID     SID_START  SID_END
404915  1          3

and this should be expanded as below:

FID     SID
404915  1
404915  2
404915  3

so that I can group by SID to get the count. I have around 480 million rows and I am…
abcd_1234
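One possible way to expand the ranges without building a Python list per row and calling `explode`, sketched with NumPy's `repeat` and `arange`. The first row is taken from the question; the second row is hypothetical, added only to show that ranges of different lengths work. This has not been benchmarked at the 480-million-row scale mentioned, and the expanded frame still has to fit in memory.

```python
import numpy as np
import pandas as pd

# First row is from the question; the second row is a made-up example.
df = pd.DataFrame({
    "FID": [404915, 404916],
    "SID_START": [1, 5],
    "SID_END": [3, 6],
})

# Number of SIDs each input row expands into (ranges are inclusive).
lengths = (df["SID_END"] - df["SID_START"] + 1).to_numpy()

# Repeat FID and the range start once per output row.
fid = np.repeat(df["FID"].to_numpy(), lengths)
starts = np.repeat(df["SID_START"].to_numpy(), lengths)

# Offset of each output row inside its own range: global position
# minus the position at which that range begins.
range_begin = np.repeat(np.cumsum(lengths) - lengths, lengths)
sid = starts + np.arange(lengths.sum()) - range_begin

expanded = pd.DataFrame({"FID": fid, "SID": sid})
print(expanded)
```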
1 vote, 2 answers

Difference between Word2Vec and contextual embedding

I am trying to understand the difference between word embedding and contextual embedding. Below is my understanding; please add any corrections. A word embedding algorithm has a global vocabulary (dictionary) of words. When we are performing…
tovijayak
1 vote, 0 answers

Anaconda new environment error: Paths don't have the same drive

I want to create a new environment in Anaconda, but I get an error: ValueError("Paths don't have the same drive") File "Q:\CI_Anlisten\Users\lkh004\Conda\lib\ntpath.py", line 804, in commonpath raise ValueError("Paths don't have the same drive")
1 vote, 1 answer

Sum of a column that resets to zero throughout a process

Is there an easy way to compute a total for a column that increments but can reset back to zero throughout the dataset? I have started to go down the path of a for loop, keeping track of the previous value if it isn't a zero and using multiple…
Parker3306
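A loop-free sketch for the question above, under one interpretation: the column is a running counter that climbs and occasionally drops back to zero, and the wanted total is the sum of all increases across resets. The sample values are hypothetical, and the excerpt is truncated, so the asker's real data may need different handling (for example if a reset does not land exactly on zero).

```python
import pandas as pd

# Hypothetical counter that climbs and resets to zero twice.
counter = pd.Series([0, 2, 5, 0, 1, 4, 0, 3])

# Row-to-row change; the first row has no predecessor, so use its own value.
# A reset shows up as a negative change and is clipped to zero.
increments = counter.diff().fillna(counter.iloc[0]).clip(lower=0)

# Total growth accumulated across all resets.
total = increments.sum()
print(total)
```

For this sample the counter peaks at 5, 4, and 3 before each reset or the end of the data, so the total is 12.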
1 vote, 3 answers

Getting a ModuleNotFoundError with librosa

I am trying to load the audio files into a NumPy array using this code: #%% import librosa import matplotlib.pyplot as plt import IPython.display as ipd import os, os.path import time import joblib import numpy as np #%% fname =…