Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can involve predictive analytics and usually requires a lot of data wrangling. General questions about data science should be posted to the appropriate community sites.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge and insights from data in various forms, both structured and unstructured.

Wikipedia

NOTE: If you want to use this tag for a question that does not directly concern implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise, your question is probably off-topic here.

9099 questions
1 vote, 1 answer

Why does the algorithm sometimes not behave as intended?

We are currently working on a college project. We have been tasked with optimizing the maintenance schedule for repairs on bikes from a bike-sharing service. The bikes can only be rented from and returned to bike docking stations. We need to calculate…
1 vote, 0 answers

ModuleNotFoundError: No module named 'sklearn.ensemble._bagging'

ModuleNotFoundError: No module named 'sklearn.ensemble._bagging'. Which version of scikit-learn is suitable for avoiding this error? I am facing this issue with Python 3.7, and I can't update the version.
1 vote, 0 answers

Create tables using OptBinning with custom bins

I want to use the optbinning library to create tables with all the metrics, assuming I already have all the bins. I don't want to optimize the binning process; I just want the tables for my current bins. Despite the fact that…
TomasLeon
1 vote, 0 answers

Troubleshooting 'Notebook validation failed: data.cells' error in Jupyter Notebook

How do I resolve the error "Notebook validation failed: data.cells[{data__cells_x}] must be valid exactly by one definition (0 matches found)" in Jupyter Notebook? How can I resolve the "Notebook validation failed: data.cells[{data__cells_x}] must be…
1 vote, 1 answer

What does ' ::Page{} ' do in R/RStudio?

What does ::Page{} do in R/RStudio? I'm studying data science through the IBM certification course on Coursera, and the notes contain this line of code in every code block, with no explanation of what the "Page" function does. #load ggplot…
1 vote, 1 answer

How can Python recognize header positions and extract header info?

I have a CSV file that contains 3 or more headers. I want Python or pandas to recognize the header positions and extract the header info from the file. Here is an example of a CSV file that I have: "Level and…
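A minimal sketch for the multi-header CSV question, assuming (since the actual file is not shown in full) that each header block is separated from the previous one by a blank line and that the first row after a gap is a header. `RAW` and `find_sections` are hypothetical names, and the sample data is invented for illustration. It uses only the standard `csv` module and records each header's 1-based line number via `reader.line_num`.

```python
import csv
import io

# Hypothetical file contents: two blocks, each with its own header row,
# separated by a blank line (an assumed layout, not the asker's real file).
RAW = """Level,Count
A,1
B,2

Region,Total
North,10
South,20
"""


def find_sections(text):
    """Split CSV text into sections; the first row after a blank line is a header."""
    reader = csv.reader(io.StringIO(text))
    sections = []
    current = None
    for row in reader:
        if not any(cell.strip() for cell in row):
            # A blank row ends the current section
            current = None
            continue
        if current is None:
            # First non-blank row after a gap: record it as a header and its line number
            current = {"header_line": reader.line_num, "header": row, "rows": []}
            sections.append(current)
        else:
            current["rows"].append(row)
    return sections


sections = find_sections(RAW)
for section in sections:
    print(section["header_line"], section["header"], len(section["rows"]))
```

Each section's `rows` can then be passed to `pandas.DataFrame(rows, columns=header)` if a DataFrame is wanted. If the real file marks headers differently (for example, by a keyword in the first cell), only the header-detection condition needs to change.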
1 vote, 1 answer

AttributeError: 'FloatProgress' object has no attribute 'style'

import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset
import stanza
stanza.download('en')
nlp = stanza.Pipeline(lang='en')
The above code is used for creating a pipeline. Stanza provides a plethora of pre-trained NLP…
1 vote, 2 answers

How do I drop and change dtype in a Pipeline with sklearn?

I have some scraped data that needs cleaning. After the cleaning, I want to create "numerical and categorical pipelines" inside a ColumnTransformer, such as:
categorical_cols = df.select_dtypes(include='object').columns
numerical_cols =…
Odiseon
1 vote, 2 answers

How to read a column and apply a function to each cell as a tuple?

I'm trying to analyze a database with coordinates (X, Y). I need to read each value in that column and classify it as either North or South if it's "Y", or East or West if it's "X". So basically, what I want to do is read each value in that column and…
rosvend
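A sketch for the coordinate question, assuming the column holds `(X, Y)` tuples and that direction is decided by sign (non-negative X as East, non-negative Y as North). That sign convention and the `classify` helper are assumptions, not something the question states. `Series.apply` passes each cell to the function as-is, so the tuple can be unpacked directly.

```python
import pandas as pd


def classify(coord):
    """Map an (x, y) tuple to (east/west, north/south) labels by sign.

    Assumed convention: x >= 0 is East and y >= 0 is North, so 0 falls on East/North.
    """
    x, y = coord
    east_west = "East" if x >= 0 else "West"
    north_south = "North" if y >= 0 else "South"
    return east_west, north_south


# Hypothetical data: one column holding (X, Y) tuples
df = pd.DataFrame({"coords": [(3.5, -2.0), (-1.0, 4.0), (0.0, 0.0)]})

# Series.apply calls classify once per cell, passing each tuple unchanged;
# the result is a column of (east/west, north/south) tuples
df["direction"] = df["coords"].apply(classify)
print(df)
```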
1 vote, 3 answers

Index a different range of indices from each row of a NumPy array

I have two arrays of indices with shape m. I need to take the mean of the values in between the indices from an array with shape m x n. Can this be done without iterating through each row? What is the fastest way to do this? idx0 = np.array([1, 3,…
dotto
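One loop-free approach for per-row index ranges is a prefix sum: with a leading zero column, the sum of `arr[i, a:b]` equals `csum[i, b] - csum[i, a]`, so fancy indexing yields every row's total at once. This sketch assumes half-open ranges `[idx0, idx1)` with `idx1 > idx0`; the helper name `range_means` and the sample arrays are hypothetical.

```python
import numpy as np


def range_means(arr, idx0, idx1):
    """Mean of arr[i, idx0[i]:idx1[i]] for every row i, without a Python loop.

    Assumes half-open ranges [idx0, idx1) and idx1 > idx0 for every row.
    """
    m, n = arr.shape
    # Prefix sums with a leading zero column: csum[i, k] == arr[i, :k].sum()
    csum = np.zeros((m, n + 1), dtype=float)
    csum[:, 1:] = np.cumsum(arr, axis=1)
    rows = np.arange(m)
    # Pair each row index with its own start/end column
    totals = csum[rows, idx1] - csum[rows, idx0]
    return totals / (idx1 - idx0)


arr = np.arange(12).reshape(3, 4)
idx0 = np.array([1, 0, 2])
idx1 = np.array([3, 4, 4])
means = range_means(arr, idx0, idx1)
print(means)
```

For very large floating-point inputs, subtracting prefix sums can lose some precision compared with summing each slice directly; a boolean mask built from `np.arange(n)` compared against `idx0[:, None]` and `idx1[:, None]` is an alternative that trades memory for exactness.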
1 vote, 2 answers

How can I double unpivot data like in this example in SQL?

I've seen that you can sometimes use CROSS APPLY, but I don't think it will work in this case because I have 10 columns for "Days" (for 10 years) and 10 columns for "Discharges" (for 10 years)... so I need this pivoted into 10 different rows per zip and age…
manavjn
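The question asks for SQL. As a point of comparison only, the same double unpivot can be done in pandas with `pandas.wide_to_long`, which unpivots several column groups (here `Days` and `Discharges`) in one call. This is a pandas substitute, not a SQL solution, and it assumes the year is encoded in column names such as `Days_2019`; the table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one Days_<year> and one Discharges_<year> column per year
wide = pd.DataFrame({
    "zip": ["10001", "10002"],
    "age": ["18-34", "35-64"],
    "Days_2019": [5, 7],
    "Days_2020": [6, 8],
    "Discharges_2019": [1, 2],
    "Discharges_2020": [3, 4],
})

# Unpivot both stubs at once: each (zip, age) pair gets one row per year,
# with Days and Discharges side by side
long = pd.wide_to_long(
    wide,
    stubnames=["Days", "Discharges"],
    i=["zip", "age"],
    j="year",
    sep="_",
).reset_index()
print(long)
```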
1 vote, 1 answer

Efficiently iterating over a list to extract count for multiple variables

I have a dataset of medical insurance variables and am interested in understanding how the proportion of smokers ('yes', 'no') differs between regions ('northwest', 'northeast', 'southwest', 'southeast'). I have used a for loop to iterate over each…
whorrodwi
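The per-region loop can likely be replaced by a single `pandas.crosstab` call with `normalize="index"`, which divides each region's counts by that region's total. The column names `region` and `smoker` follow the question; the rows are made-up sample data.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the insurance data
df = pd.DataFrame({
    "region": ["northwest", "northwest", "southeast", "southeast", "southeast"],
    "smoker": ["yes", "no", "yes", "yes", "no"],
})

# Count smoker values per region, then normalize each row so that
# the proportions within a region sum to 1
proportions = pd.crosstab(df["region"], df["smoker"], normalize="index")
print(proportions)
```

`df.groupby("region")["smoker"].value_counts(normalize=True)` gives the same proportions in long (stacked) form, which can be more convenient for plotting.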
1 vote, 1 answer

Collapse rows by common variable of list

I want to collapse the rows of a dataframe to create the ortholog group of each ortholog and its corresponding genes. For example:
Column A | Column B
Ortho1 | gene1
Ortho2 | gene2, gene3
Ortho3 | gene4, gene5, gene6
Ortho4 | gene5,…
Jin_soo
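A sketch under one reading of the ortholog question (the excerpt is truncated): split the comma-separated gene lists, `explode` them into one row per ortholog/gene pair, then group by gene to see which orthologs share it. The column names and the genes listed for `Ortho4` are hypothetical, since the original example is cut off. Merging rows that share a gene into a single group would need an additional connected-components step that is not shown here.

```python
import pandas as pd

# Hypothetical input modelled on the example: genes stored as comma-separated strings
df = pd.DataFrame({
    "ortholog": ["Ortho1", "Ortho2", "Ortho3", "Ortho4"],
    "genes": ["gene1", "gene2, gene3", "gene4, gene5, gene6", "gene5, gene7"],
})

# One row per (ortholog, gene) pair
pairs = df.assign(gene=df["genes"].str.split(", ")).explode("gene")

# Collapse back by gene: the list of orthologs that contain each gene
by_gene = pairs.groupby("gene")["ortholog"].agg(list)
print(by_gene)
```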
1 vote, 0 answers

I ran a command that was supposed to show me data about my object detection AI, but I get an error that I can't solve

Basically, I have this command: python Tensorflow\models\research\object_detection\model_main_tf2.py --model_dir=Tensorflow\workspace\models\my_ssd_mobnet --pipeline_config_path=Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config…
1 vote, 0 answers

pandas group memory usage reduction

Hello, I have some code that uses a large amount of memory. regressor_df is a DataFrame with over 14 million elements. When I remove the location from the groupby, the amount of RAM needed drops by about 26 GB. How can I run this…
wezzie
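A common way to cut memory when grouping by a repeated string column such as a location is to convert the key to a `category` dtype and pass `observed=True` to `groupby`. This is a general technique, not a diagnosis of the asker's code (which is not shown in full); the DataFrame below is a small hypothetical stand-in for `regressor_df`, and the savings on real data will differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for regressor_df: a repeated string key plus a value column
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "location": rng.choice(["north", "south", "east"], size=n),
    "value": np.arange(n, dtype="float64"),
})

before = df["location"].memory_usage(deep=True)

# Store the grouping key as a categorical: small integer codes plus one copy of each label
df["location"] = df["location"].astype("category")
after = df["location"].memory_usage(deep=True)

# observed=True keeps only category combinations that actually occur, which avoids
# materializing a full cartesian product when grouping by several categoricals
means = df.groupby("location", observed=True)["value"].mean()
print(before, after)
print(means)
```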