Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can involve predictive analytics and usually requires a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge and insights from data in various forms, both structured and unstructured.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions
1
vote
1 answer

Scaling a dataset in matplotlib on x and y axis relative to another dataset?

I am trying to scale two different sets of data to be visually equivalent. The green data set has extreme Y values and significantly more data points, so the orange data set falls flat and short. What functions exist that allow me to scale them…
1
vote
1 answer

AttributeError: module 'keras.preprocessing.sequence' has no attribute 'pad_sequences'

I'm getting this error: AttributeError: module 'keras.preprocessing.sequence' has no attribute 'pad_sequences' import keras from keras import preprocessing from keras.utils import pad_sequences tokenizer =…
1
vote
1 answer

fillna not working in pandas for a specific column

I assigned zero to the NaN values in the TEMPERATURE column, but it is not working properly. All other columns work properly, but this one does not. I expect the NaN values in the temperature column to become 0, but it is not working…
1
vote
2 answers

Converting multi column header df from wide to long format in python

I would like to convert an Excel file from wide format into long format. I'm reading an Excel file which not only has two rows of headers but also includes merged cells in the header. Input Example: | Task | Name | May,2022 | Jun,2022 | Jul,2022 …
Jogibaer
1
vote
1 answer

'Vect' not defined sklearn logistic regression error message

I have this pipeline that I used for a text classifier, and it works fine. from sklearn.feature_extraction.text import TfidfTransformer from sklearn.feature_extraction.text import CountVectorizer from sklearn.metrics import accuracy_score from…
1
vote
1 answer

Mitigation for imblearn pipelines

I'm trying to mitigate unfairness for a model I trained using an imblearn pipeline with ADASYN. My pipeline looks like this: loaded_model = Pipeline(steps=[('feature_scaler', StandardScaler()), ('adasyn_resampling',…
1
vote
1 answer

Remove numbers in tuples and enter them in new rows in csv

I have an ugly CSV file with one column and only 2 rows, but it has many tuples in it. It looks like this: Column A (1, 2, 3)(4, 5, 6)(7, 8, 9) (3, 2, 1)(5, 3, 6)(9, 8, 7) and I want it to look like Column A Column B Column…
alwan01
1
vote
1 answer

Problem with data type declaration in a function in Julia

I am currently trying to write some code to simulate a Lotka-Volterra model of a population of prey and predators. I have written this code: function PP_gen(N_prey::Int128, N_pred::Int128, max_time::Int128, λ_breed::Float64,…
Kotatsu
1
vote
1 answer

Can't append or concat two GeoDataFrames even though the data share the same CRS

I was going through this tutorial, in which we imported data from OSM. While modifying the data, there was a command to add missing highway speed limits to unclassified roads. Here's the tutorial, and while recreating the…
kaydubz
1
vote
1 answer

trough detection algorithm returns peak data

Hi, I have made an algorithm to detect the time a wave takes from the beginning of one trough to the next trough, in order to calculate the duration of all separate waves, but the trough function keeps returning some peaks. The algorithm: import pandas as pd import numpy…
zahab
1
vote
0 answers

How to access camera or webcam in kaggle?

I tried using a webcam in Kaggle to capture images for a face detection project, but it's not working. The code works well in Jupyter Notebook on my system. Here is the code that I'm using: cap = cv2.VideoCapture(0) for imgnum in…
Alireza Atashnejad
1
vote
0 answers

need to find peak duration of the data

I have a stream of data in a csv file that signifies the time with date in the 1st column and value in the 2nd column. The data is plotted below. I need to write an algorithm that gives me an array with time according to how long the peak…
zahab
1
vote
1 answer

What is the issue here? I have already installed geopandas and folium with pip install from PowerShell

I was getting ready to perform an EDA on tsunami data. As I was importing the following modules, the geopandas module gave a not-found error, so I installed it from PowerShell using pip, but the error persisted, and the same is happening…
1
vote
1 answer

Assign a number vector to each group of unique values in R df

I have a df with 2 million (2m) rows, which contains a column of different entries grouped by unique IDs. For example: df <- df %>% mutate(A = c(1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6,6, ... , 2m, 2m, 2m, 2m)) I want to…
Davidmac
1
vote
0 answers

Web scraping: collect chart data

I am completely new to web scraping, and I've decided to go for it by learning some basics of Python. The data I would like to collect is the chart on the following website: "https://www.amundi-ee.com/entr/product/view/QS0009102334" (no need for a…