Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can include predictive analytics and usually involves a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions
1
vote
0 answers

How to make a visualization that shows every second's data?

I get data at my job that contains datetime, Temperature, Current (mA), and Brake release. The data comes from a crane motor. In one day the tags generate about 80k-100k rows in Excel. I have to make a visualization where the datetime, current,…
1
vote
2 answers

Apply a multiplication to random elements on my dataframe

What I want is to randomly choose a given number of elements in my dataframe and apply an operation to those elements in an 'eta' column: a multiplication by a number that is itself randomly chosen from a range. I'm stuck…
Alberto
  • 91
  • 7
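One possible way to approach the question above, as a minimal sketch only. The helper name `scale_random_rows`, its parameters, and the sample data are hypothetical, and the sketch assumes a unique index, a numeric 'eta' column, and one independent factor per chosen row (the excerpt does not say whether a single shared factor is wanted).

```python
import numpy as np
import pandas as pd

def scale_random_rows(df, n_rows, low, high, column="eta", seed=None):
    """Multiply `column` in `n_rows` randomly chosen rows by factors drawn from [low, high)."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Pick distinct row labels without replacement (assumes a unique index).
    chosen = rng.choice(out.index.to_numpy(), size=n_rows, replace=False)
    # Draw one independent factor per chosen row (an assumption, see lead-in).
    factors = rng.uniform(low, high, size=n_rows)
    out.loc[chosen, column] = out.loc[chosen, column].to_numpy() * factors
    return out, chosen

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"eta": [1.0, 2.0, 3.0, 4.0, 5.0]})
scaled, chosen = scale_random_rows(df, n_rows=2, low=2.0, high=3.0, seed=0)
```

Returning the chosen labels alongside the modified copy makes it easy to check which rows were touched; passing a `seed` keeps the selection reproducible.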
1
vote
1 answer

How to fix ValueError in Mixed Effect Linear Regression (Python)?

I have to analyze my dataframe in Python. I have uploaded an image of what my table looks like. I need to fit the "stage" (dependent variable), "overallscore" (independent variable), "spatialreasoning" (independent variable) &…
Anusha
  • 11
  • 2
1
vote
3 answers

Create new column that counts each string match

I aim to create a new column based on multiple string matches. This new column will return a 1 for every match on the event column. Date Event 0 2022-11-01 Breakfast 1 2022-11-01 Breakfast 2 2022-11-01 …
1
vote
0 answers

NMF reconstruction on test data

I am trying to train NMF on a training set and apply it to a test set, but I run into an incompatible-shape problem in the second step. I wrote Python code to do NMF: def cost(X, W, H): """ Compute the Euclidean distance-based…
1
vote
1 answer

How to evaluate the quality of PCA returned by torch.pca_lowrank()?

I use the following piece of code: U, S, V = torch.pca_lowrank(A, q=self.n_components) self.V = V self.projection = torch.matmul(A, V) How to compute the cumulative percent variance or any other accuracy metric (single value between 0 and 100%)…
Serge Rogatch
  • 13,865
  • 7
  • 86
  • 158
1
vote
1 answer

Compute weighted cumulative sum for all pairs of start point / end point in numpy

I have a numpy array signal and I am trying to compute the following value: C = (signal[t_begin:t_end + 1] * (np.arange(t_end - t_begin + 1) + 1)).sum() for all pairs t_begin, t_end such that 0 <= t_begin <= t_end < len(signal) without using a for…
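The quantity in the question can be rewritten with two prefix sums, which removes the explicit loop. This is a sketch under stated assumptions, not necessarily what the asker ended up using: the function name `weighted_sums_all_pairs` is hypothetical, and the result is an n-by-n matrix, so memory grows quadratically with the signal length. The identity used is sum over t in [b, e] of s[t]·(t − b + 1) = Σ t·s[t] − (b − 1)·Σ s[t].

```python
import numpy as np

def weighted_sums_all_pairs(signal):
    """Return C with C[b, e] = sum(signal[b:e+1] * (1, 2, ..., e-b+1)) for b <= e, NaN elsewhere."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    idx = np.arange(n)
    # Prefix sums with a leading zero: P[k] = sum(signal[:k]), Q[k] = sum(idx[:k] * signal[:k]).
    P = np.concatenate(([0.0], np.cumsum(signal)))
    Q = np.concatenate(([0.0], np.cumsum(idx * signal)))
    b = idx[:, None]  # start points as a column
    e = idx[None, :]  # end points as a row
    # sum_{t=b..e} s[t] * (t - b + 1) = sum t*s[t] - (b - 1) * sum s[t]
    C = (Q[e + 1] - Q[b]) - (b - 1) * (P[e + 1] - P[b])
    C[b > e] = np.nan  # pairs with t_begin > t_end are not defined
    return C

C = weighted_sums_all_pairs([1.0, 2.0, 3.0])
```

Because the result comes from differences of cumulative sums, long signals with large values may lose some floating-point precision compared with the direct sum.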
1
vote
2 answers

In R: Why can't I color the dots on this plot?

In R, I use a self-generated data frame of parties with their seats in an election. The problem is that the parliament plot is generated correctly, but the points that form the plot are shown in black and I cannot display them with the color that…
1
vote
2 answers

How to convert date and time inside data frame to float64 datatype?

I have Excel file data as shown in the image below (https://i.stack.imgur.com/kbI7C.png), following this tutorial with the data mentioned. I use a Colab notebook with the following code: import numpy as np import pandas as pd import matplotlib.pyplot as…
Mohammed
  • 346
  • 1
  • 12
1
vote
2 answers

How to count people who are below a position

I'm looking to count how many people are below a given user of the data frame.

Employee  Manager
A         -
B         A
C         A
D         A
E         A
F         B
G         B
H         C
I         C

I would like to get in the output: I, H, G, F, E and D have no employees below; C has two…
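A minimal sketch of one way to count everyone below each person, using the Employee/Manager pairs from the question. It assumes the hierarchy is a tree with no cycles, that indirect reports should be counted too, and that the "-" manager entry means "no manager" (represented here as `None`); the function name `count_reports` is hypothetical.

```python
from collections import defaultdict

def count_reports(pairs):
    """Count everyone below each employee, given (employee, manager) pairs forming a tree."""
    children = defaultdict(list)
    employees = []
    for employee, manager in pairs:
        employees.append(employee)
        if manager is not None:
            children[manager].append(employee)

    counts = {}

    def count(name):
        # Everyone below `name` = each direct report plus everyone below that report.
        if name not in counts:
            counts[name] = sum(1 + count(child) for child in children.get(name, []))
        return counts[name]

    return {name: count(name) for name in employees}

pairs = [("A", None), ("B", "A"), ("C", "A"), ("D", "A"), ("E", "A"),
         ("F", "B"), ("G", "B"), ("H", "C"), ("I", "C")]
result = count_reports(pairs)
# result: A has 8 people below, B and C have 2 each, D through I have 0
```

The recursion caches each subtree count, so every person is visited once; a very deep hierarchy would need an iterative traversal instead.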
1
vote
1 answer

How to remove columns that contain all the same value

I have count data (columns) in the form of presence/absence (1/0) of various genes in different samples that belong to one of two categories. I am running a Fisher's exact test (fisher.test) for each gene, but I get an error whenever that gene is present (1) or…
ABee
  • 17
  • 4
1
vote
0 answers

Not reading / opening of root file from uproot

I am trying to read a file.root using uproot, but it throws this error: File "/Users/siddharthkeshar/opt/anaconda3/lib/python3.9/site-packages/uproot/source/chunk.py", line 370, in wait raise OSError( OSError: expected Chunk of length…
1
vote
1 answer

What object is a sklearn.pipeline.Pipeline that applies a ColumnTransformer actually fitting on when fit(X, Y) is called on it

I am trying to get an idea of the inner workings of a scikit-learn Pipeline. Consider the data set and pipeline construction below. data = pd.DataFrame({ 'Name': ['Alice', 'Bob', 'Charlie'], 'Age' : [30, 40, 37], 'City': ['Amsterdam',…
gebruiker
  • 115
  • 5
1
vote
2 answers

Can I count unique values of a column and add the counts as separate columns for each unique value, preferably separately for each subject?

For example, I have my data in the following form:

   Group Product
0      1       A
1      1       A
2      1       B
3      2       A
4      2       B
5      2       C
6      3       A
7      3       C
8      3       C

What I would like to…
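Since the desired output is cut off in the excerpt, the following is only a guess at the intent: one row per Group and one count column per unique Product. Under that assumption, `pd.crosstab` produces such a table directly from the data shown in the question.

```python
import pandas as pd

# Data as listed in the question.
df = pd.DataFrame({
    "Group":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Product": ["A", "A", "B", "A", "B", "C", "A", "C", "C"],
})

# One row per Group, one column per Product, cells hold occurrence counts.
counts = pd.crosstab(df["Group"], df["Product"])
print(counts)
```

Combinations that never occur (for example Group 1 with Product C) appear as 0 rather than being dropped, which is usually what a wide count table needs.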
1
vote
1 answer

How to fill gaps in a time series that is formatted in columns, using Python

I have time series data with each year's data stored in different columns. This data is known to have some data errors which are causing minor holes in the time series data that I'd like to correct. For reference, the data looks like this df =…
mar56
  • 11
  • 1