Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can include predictive analytics and usually involves a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data-mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions

1 answer

Selecting row with highest value based on two different columns

I have a dataframe with 3 columns. I want to apply a rule: for the same city and the same ID, keep the row with the maximum value and drop the row with the lower value. E.g.:
City ID Value
London 1 12.45
Amsterdam 1 14.56
Paris 1 16.89
New…

python pandas dataframe numpy data-science

asked Jul 03 '23 at 15:55

Vivek Sukhala
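
For the "highest value per City and ID" question above, one common pandas idiom is `groupby(...).idxmax()` followed by `.loc`. This is a sketch, not the accepted answer. The first three rows come from the question. The second London row (13.10) is a hypothetical duplicate, added only so the rule has something to drop.

```python
import pandas as pd

# Rows 1-3 are from the question; the 13.10 London row is a hypothetical duplicate.
df = pd.DataFrame({
    "City": ["London", "Amsterdam", "Paris", "London"],
    "ID": [1, 1, 1, 1],
    "Value": [12.45, 14.56, 16.89, 13.10],
})

# For each (City, ID) group, idxmax gives the index label of the row with the
# largest Value; .loc keeps only those rows and drops the lower-valued ones.
best_idx = df.groupby(["City", "ID"])["Value"].idxmax()
result = df.loc[best_idx].reset_index(drop=True)
print(result)
```

When two rows tie, `idxmax` keeps the first one. An equivalent alternative is `df.sort_values("Value", ascending=False).drop_duplicates(["City", "ID"])`.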

0 answers

Unable to get same dimensions of the resulting data even using same preprocessing process

I am a beginner in data science, so I started with Kaggle's Titanic competition. Everything was fine until I reached the preprocessing stage. I am unable to get the same dimensions of the resulting train and test data when using the…

python machine-learning data-science

asked Jul 02 '23 at 07:01

Hamadullah Bijarani
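
The excerpt for the question above is truncated, so its real cause is unknown. One frequent cause of this symptom is one-hot encoding train and test separately with `pd.get_dummies`. When a category value appears in only one of the sets, each set ends up with different dummy columns. The sketch below assumes that cause. "Embarked" and "Fare" are real Titanic column names, but the row values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy rows; column names match the Titanic data, values do not.
train = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"], "Fare": [7.25, 71.28, 8.05, 53.10]})
test = pd.DataFrame({"Embarked": ["S", "C"], "Fare": [7.83, 9.69]})

train_enc = pd.get_dummies(train, columns=["Embarked"])
test_enc = pd.get_dummies(test, columns=["Embarked"])
# test has no "Q" rows, so it lacks the Embarked_Q column: shapes differ.
print(train_enc.shape, test_enc.shape)

# Align test to the training columns: missing dummy columns are added and
# filled with 0, and any column not seen in training is dropped.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(test_aligned.shape)
```

Another option is to fit scikit-learn's `OneHotEncoder(handle_unknown="ignore")` on the training data only and reuse it to transform the test set.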

1 answer

Getting an error from hdbscan while importing bertopic

I'm trying to import bertopic, but it gives the following error. I tried different versions and re-created a new environment, but it's still the same. I'm using an Apple M2 Pro…

python anaconda data-science topic-modeling

asked Jun 30 '23 at 12:56

Salihcan

1 answer

sklearn OrdinalEncoder default ordering (categories='auto')

What is the default rule used by sklearn's OrdinalEncoder to determine the order of the categories when categories='auto'? Is it just sorted lexicographically? I couldn't find it in the docs.

scikit-learn data-science

asked Jun 26 '23 at 13:22

nivniv

0 answers

How to Deal with Multiple Patterns in Financial Pattern Classification?

I have a question regarding my project on financial pattern classification using deep learning. Currently, I have organized my data into sequences of 30 days, and within each 30-day sequence, there are multiple patterns. I'm uncertain about how to…

deep-learning nlp conv-neural-network data-science recurrent-neural-network

asked Jun 26 '23 at 00:44

Mouad

1 answer

I am trying to get the top 5 products from a list of products

So I have data like this:
Category subcategory crn product1 product2 product3 product4 product5 product6 product7
A X 1 1 1 1 1 1 1 1
A Y 1 1 1 1 1 1 1 1
A Z 1 1 1 1 1 1 1 1
B X 1 1 1 1 1 1 1 1
and I want to show output like…

python pandas dataframe machine-learning data-science

asked Jun 23 '23 at 06:45

Alok
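
The desired output in the question above is cut off. This sketch assumes that "top 5" means the five product columns with the largest totals within each Category. In the question's sample every value is 1, so all products would tie. The counts below are therefore hypothetical, chosen to give a distinct ranking.

```python
import pandas as pd

# Hypothetical counts (the question's sample is all 1s, which would tie).
columns = ["Category", "subcategory", "crn"] + [f"product{i}" for i in range(1, 8)]
rows = [
    ("A", "X", 1, 5, 1, 2, 4, 0, 3, 1),
    ("A", "Y", 1, 3, 0, 1, 2, 1, 2, 1),
    ("A", "Z", 1, 2, 1, 1, 2, 0, 1, 1),
    ("B", "X", 1, 0, 7, 1, 2, 5, 3, 4),
]
df = pd.DataFrame(rows, columns=columns)

# Only the productN columns are ranked; crn and the labels are excluded.
product_cols = [c for c in df.columns if c.startswith("product")]
# Sum each product within a Category, then keep the 5 largest totals.
totals = df.groupby("Category")[product_cols].sum()
top5 = {cat: row.nlargest(5).index.tolist() for cat, row in totals.iterrows()}
print(top5)
```

If "top" should be computed per subcategory instead, group by `["Category", "subcategory"]`.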

0 answers

ImageDataGenerator.flow_from_dataframe still has problems with Overfitting

I have an image dataset of 2432 images, each belonging to one of 3 categories. The labels are stored in a CSV file with the image ID and the label (T1). The distribution of the data is: negative 1695, positive 648, neutral 89. I'm trying to…

python tensorflow data-science imbalanced-data imblearn

asked Jun 22 '23 at 23:48

Javier Romero.
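
The question above is tagged imbalanced-data and gives the class counts. One common mitigation for this kind of skew is class weighting. It does not cure overfitting on its own. The sketch uses the "balanced" formula n_samples / (n_classes * count), which is the formula scikit-learn applies for `class_weight="balanced"`. Keras `model.fit` accepts a `class_weight` dict keyed by integer class index. The `class_indices` mapping below is an assumed stand-in for the iterator's real `class_indices` attribute, which orders classes alphanumerically when no explicit class list is given.

```python
# Class counts as stated in the question.
counts = {"negative": 1695, "positive": 648, "neutral": 89}
n_samples = sum(counts.values())  # 2432
n_classes = len(counts)

# "Balanced" weighting: rarer classes get proportionally larger weights.
class_weight = {label: n_samples / (n_classes * c) for label, c in counts.items()}
for label, w in class_weight.items():
    print(f"{label}: {w:.3f}")

# Keras wants integer class indices as keys. With flow_from_dataframe, read them
# from the iterator's class_indices; this dict is an assumed stand-in.
class_indices = {"negative": 0, "neutral": 1, "positive": 2}
keras_class_weight = {class_indices[k]: w for k, w in class_weight.items()}
```

The neutral class ends up weighted roughly 19 times more heavily than the negative class, which matches its share of the data.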

1 answer

Have two dictionary columns with different lengths; want to use Series.explode, but the rows mismatch

I have two columns in my data frame that contain multiple dictionaries, and I want to expand them into multiple columns, but the problem is that when I use Series.explode the rows mismatch. Example: Column A Column B Column C Cell 1 {"a":0.5}…

python pandas dataframe data-science data-analysis

asked Jun 21 '23 at 08:07

SHAIMA ALJAHANI

4 answers

Combine two Pandas rows into one with duplicated columns for time series

I have the following problem that I am trying to solve. I have two pandas DataFrame rows with the same columns:
Column A Column B
Cell 1 Cell 2
Cell 3 Cell 4
I want to combine both rows into one single row by appending the…

python pandas dataframe data-science data-manipulation

asked Jun 18 '23 at 19:32

bktllr
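
A minimal sketch for the question above, assuming the goal is one wide row whose columns repeat the originals with a row-position suffix. The `_0`, `_1` naming is an assumption. A time step or date could serve as the suffix instead. The input is the two-row example from the question.

```python
import pandas as pd

# The two-row example from the question.
df = pd.DataFrame({"Column A": ["Cell 1", "Cell 3"], "Column B": ["Cell 2", "Cell 4"]})

def rows_to_one_row(frame: pd.DataFrame) -> pd.DataFrame:
    """Flatten all rows into a single row, suffixing each column name with the row position."""
    # Each row is a Series indexed by column name; add_suffix renames those labels.
    pieces = [row.add_suffix(f"_{i}") for i, (_, row) in enumerate(frame.iterrows())]
    # Concatenate into one long Series, then turn it into a one-row DataFrame.
    return pd.concat(pieces).to_frame().T

combined = rows_to_one_row(df)
print(combined)
```

`iterrows` is fine for a handful of rows. For many row pairs, a `pivot` on an explicit position column would scale better.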

1 answer

Why Remove Trend and Seasonality in Time Series Forecasting?

I am struggling to understand why we need to remove trend and seasonality components from non-stationary time series data when performing time series forecasting in Python. Won't removing these components affect the accuracy of the forecasted data,…

python time-series data-science forecasting

asked Jun 17 '23 at 17:11

Gautham A
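
On the accuracy concern raised in the question above: removing trend and seasonality by differencing is reversible. A model is fit on the more stationary differenced series, and its forecasts are mapped back by undoing the differencing, which restores the trend and seasonal pattern. The sketch shows this round trip with seasonal differencing on a synthetic series (linear trend plus a 12-step cycle). Differencing is one common way to do the removal. Decomposition-based detrending works on the same principle, but it is not shown here.

```python
import numpy as np

def difference(x: np.ndarray, lag: int = 1) -> np.ndarray:
    """Return x[t] - x[t - lag]; lag=m cancels a cycle of period m."""
    return x[lag:] - x[:-lag]

def undifference(head: np.ndarray, d: np.ndarray, lag: int = 1) -> np.ndarray:
    """Invert difference(): rebuild x from its first `lag` values and the differences."""
    out = np.empty(len(head) + len(d))
    out[:lag] = head
    for t in range(len(d)):
        out[t + lag] = out[t] + d[t]
    return out

# Synthetic monthly series: linear trend (slope 0.5) plus a 12-step seasonal cycle.
t = np.arange(48)
series = 0.5 * t + 3.0 * np.sin(2 * np.pi * t / 12)

# Lag-12 differencing cancels the cycle; the trend becomes a constant 0.5 * 12 = 6.
seasonal_diff = difference(series, lag=12)
print(np.allclose(seasonal_diff, 6.0))

# Round trip: undoing the differencing recovers the original series exactly,
# so nothing is lost by modelling on the differenced scale.
rebuilt = undifference(series[:12], seasonal_diff, lag=12)
print(np.allclose(rebuilt, series))
```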

1 answer

How to increase pandas explode function performance?

I have data in a dataframe as below:
FID SID_START SID_END
404915 1 3
and this should be expanded as below:
FID SID
404915 1
404915 2
404915 3
so that I can group by SID to get the count. I have around 480 million rows and I am…

python pandas dataframe numpy data-science

asked Jun 16 '23 at 14:32

abcd_1234
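
For the `explode` performance question above, two vectorized NumPy sketches may help. Both assume integer, inclusive SID ranges. `expand_ranges` builds the expanded FID/SID table with `np.repeat` instead of creating and exploding lists. `count_per_sid` covers the stated goal of counting per SID. It uses a difference array (+1 at each start, -1 just past each end, then a running sum), so the expanded rows are never materialized. That matters at 480 million input rows. The second FID row below is hypothetical. Both function names are illustrative.

```python
import numpy as np
import pandas as pd

def expand_ranges(df: pd.DataFrame) -> pd.DataFrame:
    """Expand each (FID, SID_START, SID_END) row into one row per SID, inclusive."""
    start = df["SID_START"].to_numpy()
    end = df["SID_END"].to_numpy()
    lengths = end - start + 1                      # SIDs produced by each input row
    # Position of each output element inside its own block: 0, 1, ..., length - 1
    block_starts = np.repeat(np.cumsum(lengths) - lengths, lengths)
    offsets = np.arange(lengths.sum()) - block_starts
    return pd.DataFrame({
        "FID": np.repeat(df["FID"].to_numpy(), lengths),
        "SID": np.repeat(start, lengths) + offsets,
    })

def count_per_sid(df: pd.DataFrame) -> np.ndarray:
    """Count how many ranges cover each SID without building the expanded rows.

    Assumes SIDs are non-negative integers; result[i] is the count for SID i.
    """
    start = df["SID_START"].to_numpy()
    end = df["SID_END"].to_numpy()
    size = int(end.max()) + 2
    # +1 where a range opens, -1 one past where it closes.
    diff = np.bincount(start, minlength=size) - np.bincount(end + 1, minlength=size)
    # Running sum turns the open/close markers into coverage counts.
    return np.cumsum(diff)[:-1]

# First row is the question's example; the second row is hypothetical.
df = pd.DataFrame({"FID": [404915, 404916], "SID_START": [1, 2], "SID_END": [3, 2]})
expanded = expand_ranges(df)
print(expanded)
print(count_per_sid(df))
```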

2 answers

Difference between Word2Vec and contextual embedding

I am trying to understand the difference between word embedding and contextual embedding. Below is my understanding; please add any corrections. A word embedding algorithm has a global vocabulary (dictionary) of words. When we are performing…

machine-learning deep-learning nlp data-science

asked Jun 14 '23 at 08:33

tovijayak

0 answers

Anaconda new environment error: Paths don't have the same drive

I want to create a new environment in Anaconda, but I get an error: ValueError("Paths don't have the same drive") File "Q:\CI_Anlisten\Users\lkh004\Conda\lib\ntpath.py", line 804, in commonpath raise ValueError("Paths don't have the same drive")

anaconda data-science

asked Jun 07 '23 at 15:42

Leyla Elkhamlichi

1 answer

Sum of column that resets to zero throughout a process

Is there an easy way to get a total for a column that increments but can reset back to zero throughout the dataset? I have started down the path of a for loop, keeping track of the previous value if it isn't zero, and using multiple…

python pandas dataframe data-science

asked Jun 07 '23 at 12:51

Parker3306
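
A loop-free sketch for the question above. It assumes the column is a counter that only counts up and that every reset is observed as an exact 0. Under that assumption, the total is the first value plus every positive step-to-step change. The negative jump at each reset is clipped to zero, so it does not cancel earlier progress. The series values are hypothetical.

```python
import pandas as pd

# Hypothetical counter column that counts up and resets to zero twice.
s = pd.Series([0, 1, 2, 3, 0, 1, 2, 0, 4])

# Positive changes are real increments; the drop at each reset is clipped to 0.
increments = s.diff().clip(lower=0)
# diff() leaves NaN in the first slot (skipped by sum), so add the first value back.
total = s.iloc[0] + increments.sum()
print(total)
```

This gives the same result as summing the peak reached before each reset plus the final value (3 + 2 + 4). If a reset can land on a non-zero value between samples, the clipped difference misses that partial count.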

3 answers

Getting a ModuleNotFoundError with librosa

I am trying to load the audio files into a NumPy array using this code:
#%%
import librosa
import matplotlib.pyplot as plt
import IPython.display as ipd
import os, os.path
import time
import joblib
import numpy as np
#%%
fname =…

python python-3.x data-science librosa modulenotfounderror

asked Jun 06 '23 at 20:42

DaddyMuffin
