Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can include predictive analytics and usually involves a lot of data wrangling. General, non-implementation questions about data science belong on the dedicated Stack Exchange communities listed below.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; such questions are probably off-topic here.

9099 questions

3 answers

How to pass only necessary features to pipeline after SelectKBest

I have a regular tabular dataset, and 100 features from the database are added. I want to push it into a regular sklearn.pipeline containing preprocessing, encoding, some custom transformers, etc. The penultimate estimator would be…

scikit-learn data-science feature-engineering mlops data-engineering

asked Aug 19 '23 at 08:06

Nikitosiwe

1 answer

How to get average/mean with mapping in pandas dataframe?

I have a dataframe that looks something like this:
Birthyear  Weight
1992       2
1993       2.2
1992       3
1993       2.5
1994       2.4
1993       1.8
1994       2.1
Note: This is an example; I have 100k+ rows and more years. I want to get a new DataFrame in which I…

python pandas dataframe data-science mean

asked Aug 14 '23 at 14:07

Tabare De Los Santos
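
A minimal pandas sketch of one common approach, using the example rows from the question: `groupby` computes the mean weight per birth year, and `Series.map` can broadcast those means back onto each row if a per-row column is wanted. The column name `mean_weight` is hypothetical, chosen only for illustration.

```python
import pandas as pd

# Example rows copied from the question (the real frame has 100k+ rows).
df = pd.DataFrame({
    "Birthyear": [1992, 1993, 1992, 1993, 1994, 1993, 1994],
    "Weight": [2, 2.2, 3, 2.5, 2.4, 1.8, 2.1],
})

# One row per birth year holding the mean weight for that year.
means = df.groupby("Birthyear", as_index=False)["Weight"].mean()

# Optional: map each row's birth year to its group mean (hypothetical column name).
year_to_mean = means.set_index("Birthyear")["Weight"]
df["mean_weight"] = df["Birthyear"].map(year_to_mean)

print(means)
```

If only the per-row column is needed, `df.groupby("Birthyear")["Weight"].transform("mean")` produces it in one step without the intermediate `means` frame.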

0 answers

All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough'

I was studying the "In Depth: k-Means Clustering" section of Jake VanderPlas's Python Data Science Handbook and came across the following code block: from sklearn.datasets import load_digits from sklearn.manifold import TSNE from…

scikit-learn data-science k-means scikit-learn-pipeline tsne

asked Aug 07 '23 at 08:02

nakoshimati

0 answers

I'm getting an import error with ydata-profiling-4.4.0: `BaseSettings` has been moved to the `pydantic-settings` package

I know that Pydantic V2 introduced changes that make it incompatible with V1, so I switched from pandas_profiling to ydata_profiling. Because of that, I had to switch versions of the dependencies, but now I'm getting a complex error which makes…

python pandas machine-learning data-science

asked Aug 06 '23 at 02:32

Atharva Rao

0 answers

Trying to find the optimal threshold using Youden's index and the ROC curve, but its accuracy and F1 score are much lower than at most other thresholds?

Youden’s J statistic:
J = Sensitivity + Specificity - 1
J = Sensitivity + (1 - FalsePositiveRate) - 1
J = TruePositiveRate - FalsePositiveRate
Goal: maximum TPR and minimum FPR.
fpr, tpr, thresholds =…

python scikit-learn statistics data-science roc

asked Aug 03 '23 at 16:45

Sauron
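
A minimal NumPy sketch of the Youden rule described in this question, assuming binary labels and that a score at or above the threshold counts as a positive prediction. `youden_threshold` is a hypothetical helper, not a scikit-learn function; it tries every distinct score as a threshold and keeps the one with the largest J = TPR - FPR. The toy labels and scores are made up for illustration.

```python
import numpy as np


def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    A sample is predicted positive when its score >= threshold; every
    distinct score is tried as a candidate threshold (hypothetical helper).
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()

    best_threshold, best_j = None, -np.inf
    for threshold in np.unique(y_score)[::-1]:  # highest score first
        predicted = y_score >= threshold
        tpr = (predicted & y_true).sum() / n_pos
        fpr = (predicted & ~y_true).sum() / n_neg
        j = tpr - fpr
        if j > best_j:  # strict ">" keeps the highest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# With sklearn.metrics.roc_curve output, the same rule is usually written as
# thresholds[np.argmax(tpr - fpr)].
threshold, j = youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(threshold, j)
```

Because J weights sensitivity and specificity equally, the threshold it selects is not guaranteed to maximize accuracy or F1, especially when the classes are imbalanced.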

1 answer

How to download XLSX file from DOI link?

I want to download two files automatically from Python for a reproducible statistical analysis. These are the links:
https://doi.org/10.1371/journal.pone.0282068.s001
https://doi.org/10.1371/journal.pone.0282068.s002
I tried: import requests url =…

python data-science doi

asked Aug 01 '23 at 14:32

Galen

0 answers

SeisBench can't download dataset because of firewall

I use a Jupyter notebook in Anaconda for data science work. When I try to download the data using data = sbd.Iquique(), I get this log: 2023-08-01 20:50:49,209 | seisbench | WARNING | Check available storage and memory before downloading and…

pytorch anaconda data-science

asked Aug 01 '23 at 14:00

Lyfora

1 answer

Folium popup not working when rendering HTML

I want to use HTML formatting in a folium map popup. When I try to render HTML using a def format_popup_content(row) function, the map does not display. How do I format the popup? This is what I have tried so far: def format_popup_content(row): …

python data-science folium

asked Jul 24 '23 at 07:30

Ocean Vue

0 answers

Clustering Algorithms with Periodic Boundary Conditions

I've been working on a project that involves the clustering of data with periodic boundary conditions. So, I am looking for clustering algorithms that can effectively handle datasets where periodicity plays a significant role. My data is 3D and I am…

python-3.x algorithm data-science cluster-analysis

asked Jul 17 '23 at 20:45

Saha_1994
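
One common way to make distance-based clustering respect periodic boundaries is the minimum-image convention: wrap each coordinate difference into [-L/2, L/2] before taking the norm. A minimal NumPy sketch, assuming an orthorhombic box with known edge lengths `box` that is periodic on all three axes; `periodic_distance_matrix` is a hypothetical helper. The resulting matrix can be passed to an algorithm that accepts precomputed distances, such as scikit-learn's `DBSCAN(metric="precomputed")`.

```python
import numpy as np


def periodic_distance_matrix(points, box):
    """Pairwise Euclidean distances under the minimum-image convention.

    points: (n, 3) coordinates inside an orthorhombic box.
    box: (3,) box edge lengths, assumed periodic along every axis.
    """
    points = np.asarray(points, dtype=float)
    box = np.asarray(box, dtype=float)
    # Displacement vectors between every pair of points: shape (n, n, 3).
    delta = points[:, None, :] - points[None, :, :]
    # Wrap each component into [-box/2, box/2] so the nearest periodic image is used.
    delta -= box * np.round(delta / box)
    return np.sqrt((delta ** 2).sum(axis=-1))


points = np.array([[0.5, 0.0, 0.0], [9.5, 0.0, 0.0], [5.0, 5.0, 5.0]])
dist = periodic_distance_matrix(points, box=[10.0, 10.0, 10.0])
print(dist[0, 1])  # 1.0: the two points are neighbours across the x boundary
```

The full matrix needs O(n²) memory, so for large 3D datasets a neighbour search that supports periodic boxes may be a better fit than a dense precomputed matrix.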

2 answers

How to calculate time differences without a date and only with times?

import pandas as pd stoptimes_df = pd.DataFrame({ 'trip_id': ['1', '1', '1', '2', '2', '2'], 'arrival_time': ["12:10:00", "12:20:00", "12:30:00", "27:32:00", "27:39:00", "27:45:00"], 'departure_time': ["12:10:00", "12:20:00",…

python pandas dataframe datetime data-science

asked Jul 14 '23 at 17:22

leolumpy
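
A minimal sketch, assuming these are GTFS-style stop times where hours past 24 (such as "27:32:00") belong to the same service day: convert each HH:MM:SS string to seconds since the start of the service day, then take per-trip differences. Only the `arrival_time` column is reproduced because `departure_time` is cut off in the excerpt; the helper `hms_to_seconds` and the columns `arrival_s` and `arrival_delta` are hypothetical names.

```python
import pandas as pd

stoptimes_df = pd.DataFrame({
    "trip_id": ["1", "1", "1", "2", "2", "2"],
    "arrival_time": ["12:10:00", "12:20:00", "12:30:00", "27:32:00", "27:39:00", "27:45:00"],
})


def hms_to_seconds(value: str) -> int:
    """Convert 'HH:MM:SS' to seconds; hours may exceed 23 (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


stoptimes_df["arrival_s"] = stoptimes_df["arrival_time"].map(hms_to_seconds)

# Difference to the previous stop within the same trip; the first stop of each trip is NaT.
stoptimes_df["arrival_delta"] = pd.to_timedelta(
    stoptimes_df.groupby("trip_id")["arrival_s"].diff(), unit="s"
)
print(stoptimes_df)
```

Working in plain seconds avoids attaching an arbitrary calendar date and keeps times such as "27:32:00" ordered correctly after earlier stops of the same service day.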

1 answer

Fill NaN values in Polars using a custom-defined function for a specific column

I have this code in pandas: df[col] = ( df[col] .fillna(method="ffill", limit=1) .apply(lambda x: my_function(x)) ) I want to re-write this in Polars. I have tried this: df = df.with_columns( …

python pandas data-science python-polars

asked Jul 12 '23 at 07:05

Honio

3 answers

Using linear optimisation, how do I minimize the Total Cost in a dataframe?

I have a Pandas dataframe with 3 columns (Product, Weight, Total Cost) as follows (expanded to make it clearer): df = { 'Product': ['Product 1', 'Product 2', 'Product 3', 'Product 4', 'Product 1', 'Product 2', 'Product 3',…

python pandas data-science linear-programming

asked Jul 09 '23 at 12:34

t24opb

0 answers

8-bit quantization: prediction outputs uncorrelated with underlying model

I quantized a basic TFLite regression model to int8, but the prediction output seems highly uncorrelated with the output of the underlying model before quantization. All the code and steps taken to train and quantize the model are shown below to…

tensorflow data-science quantization tflite 8-bit

asked Jul 08 '23 at 23:30

Bemz

1 answer

How to convert Pandas Dataframe to the shape of a correlation matrix

I have a pandas dataframe which looks vaguely like this:
Out[130]:
   xvar          yvar    meanRsquared
0  filled_water  precip  0.119730
1  filled_water  snow    0.113214
2  filled_water  …

python pandas dataframe data-science correlation

asked Jul 06 '23 at 22:56

yeet_man
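
A minimal sketch using only the two rows visible in the excerpt (the rest of the frame is elided): `DataFrame.pivot` turns the long `xvar`/`yvar`/`meanRsquared` layout into a matrix, and concatenating a copy with `xvar` and `yvar` swapped first makes it symmetric like a correlation matrix. This assumes each unordered pair appears only once and no row has `xvar` equal to `yvar`; otherwise the mirrored rows duplicate existing ones and `pivot` raises a `ValueError`. Pairs missing from the data, and the diagonal, come out as NaN.

```python
import pandas as pd

# The two rows visible in the question; the rest of the frame is elided.
long_df = pd.DataFrame({
    "xvar": ["filled_water", "filled_water"],
    "yvar": ["precip", "snow"],
    "meanRsquared": [0.119730, 0.113214],
})

# Mirror each (xvar, yvar) pair so the result is symmetric.
mirrored = long_df.rename(columns={"xvar": "yvar", "yvar": "xvar"})
both = pd.concat([long_df, mirrored], ignore_index=True)

# Rows become xvar, columns become yvar, cells hold meanRsquared.
matrix = both.pivot(index="xvar", columns="yvar", values="meanRsquared")
print(matrix)
```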

1 answer

How to avoid NaN values when I use frame['Colum'].map(dict)

I have the following dataset, frame1:
Color   Item
Red     Shirt
White   Shoes
Yellow  Shirt
Green   Shoes
I want to set the color of every Shoes item to "Blue", so I use map: x = {"Shoes": "Blue"} fr1["Color"] = fr1["Item"].map(x) I expected…

pandas dataframe numpy data-science

asked Jul 06 '23 at 00:30

Josue Medina
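
`Series.map` with a dict returns NaN for every value that has no key in the dict, which is why the Shirt rows lose their color. A minimal sketch of one fix, using the data from the question (the question calls the frame both `frame1` and `fr1`; `fr1` is used here): fill the unmapped entries back in from the existing `Color` column.

```python
import pandas as pd

# Data from the question.
fr1 = pd.DataFrame({
    "Color": ["Red", "White", "Yellow", "Green"],
    "Item": ["Shirt", "Shoes", "Shirt", "Shoes"],
})

x = {"Shoes": "Blue"}

# map() yields NaN for items missing from the dict, so fall back to the
# existing color wherever the mapping has no entry.
fr1["Color"] = fr1["Item"].map(x).fillna(fr1["Color"])
print(fr1)
```

For a single item, boolean indexing is an equally direct alternative: `fr1.loc[fr1["Item"] == "Shoes", "Color"] = "Blue"`.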
