Questions tagged [mlops]

This tag is for programming questions about MLOps, which is the application of DevOps principles in the design and deployment of Machine Learning (ML) systems.

Related tags

  • mlflow
  • kubeflow
  • feature-store
228 questions
1
vote
0 answers

Need Guidance for deploying Huggingface model with Flask on Kubernetes with Ingress and GPU support

I have developed a chatbot-based application from multiple services (several Node.js servers plus Flask servers), dockerized them, deployed them as Kubernetes pods, and used the minikube Ingress-Nginx controller. The problem I am facing is that my chatbot…
Ishan Joshi
  • 151
  • 3
  • 19
1
vote
0 answers

RLEstimator class in Sagemaker referencing python scripts that I can't alter when using Vowpal Wabbit image

I'm trying to create a Sagemaker hosted endpoint using the RLEstimator class and the Vowpal Wabbit image to create a contextual bandit. Example here: Reinforcement Learning with Sagemaker. My code for creating the training job works fine: vw_image_uri =…
1
vote
3 answers

An error occurred while loading the model. No module named 'pandas.core.indexes.numeric'. Error in Databricks classification model serving endpoint

I'm currently struggling with setting up serving endpoints for classification models in Azure Databricks. I've tried this for a few different classification models, such as the following example provided by Databricks themselves, using…
MattC
  • 11
  • 1
1
vote
1 answer

Prometheus: Integrate history data

I want to use Grafana and Prometheus to monitor some ML models in production. I already have a connector that exports metrics stored in MLflow and makes them visible to Prometheus. I can now query these metrics in Prometheus but they are all shown…
natbb06
  • 11
  • 1
1
vote
0 answers

How to serve a model doing image classification or object detection using mlflow serving

Context: I am training a YOLOv8 model and have successfully registered it in MLflow. The model, which is used for object detection, supports input in the following formats: a string form of a NumPy array (obtained after opening the image with OpenCV)…
1
vote
1 answer

How do you access DVC remote storage to view the file content?

I am very new to DVC and have run into a few problems with remote storage. I stored my data in DVC remote storage, configured here (.dvc/config file): [core] remote = dvc-remote ['remote "dvc-remote"'] url = /tmp/dvc-storage Questions: Where can I…
cyMLOps
  • 23
  • 3
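The config quoted in the question above names a default remote and the directory it points to. As a minimal sketch, the snippet below reads that same config text and resolves the default remote's URL, which is the directory to look in for the stored data. It uses Python's standard `configparser` purely for illustration; that choice, the `DVC_CONFIG` constant, and the `default_remote_url` helper are assumptions of this sketch, not DVC's own API, and DVC may parse the file differently.

```python
import configparser

# Config text taken from the question excerpt (.dvc/config),
# laid out one setting per line.
DVC_CONFIG = """
[core]
remote = dvc-remote
['remote "dvc-remote"']
url = /tmp/dvc-storage
"""


def default_remote_url(config_text: str) -> str:
    """Return the URL of the remote named by [core] remote.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Name of the default remote, e.g. "dvc-remote".
    name = parser["core"]["remote"]
    # The remote's section header keeps its quotes: 'remote "<name>"'.
    section = f"'remote \"{name}\"'"
    return parser[section]["url"]


print(default_remote_url(DVC_CONFIG))
```

For a local remote like this one, the printed path is an ordinary directory that can be browsed with normal file tools.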
1
vote
1 answer

How to change or specify a DVC experiment name?

How do I change the name of the experiment? I tried to use dvc exp run -n to name the project and then used git to push to GitHub. However, the experiment name is still a SHA. Tried: I tried to use dvc exp run -n to name the project and then used git to push to…
cyMLOps
  • 23
  • 3
1
vote
1 answer

`mlflow server` - Difference between `--default-artifact-root` and `--artifacts-destination`

I am using mlflow server to set up an MLflow tracking server. mlflow server has two command options that accept an artifact URI, --default-artifact-root and --artifacts-destination. From my understanding, --artifacts-destination is used when the…
wavingtide
  • 1,032
  • 4
  • 19
1
vote
0 answers

Kedro, running inference on user input

I have a pipeline with the model I want to use. Outside of the project, I have an app.py file where I'm going to create the UI/UX for my users to run my model. Right now I'm just using a sample string but later on, you can imagine that there will be…
João Areias
  • 1,192
  • 11
  • 41
1
vote
1 answer

Metrics for any step of Sagemaker pipeline (not just TrainingStep)

My understanding is that in order to compare different trials of a pipeline (see image), the metrics can only be obtained from the TrainingStep, using the metric_definitions argument for an Estimator. In my pipeline, I extract metrics in the…
duff18
  • 672
  • 1
  • 6
  • 19
1
vote
1 answer

How to Update an Azure ML Dataset with a new pandas DataFrame and How to Revert to a Specific Version if Needed

Is there a way to update an existing Azure ML Dataset using a pandas DataFrame and update the version? The default Dataset is stored in a blob as a CSV file. How can we approach this? Also let's say we want to change the latest version to…
Imperial_J
  • 306
  • 1
  • 7
  • 23
1
vote
0 answers

Project couldn't be created in SageMaker MLOps Project Walkthrough Using Third-party Git Repos in AWS

I tried to follow the SageMaker MLOps Project Walkthrough Using Third-party Git Repos using an AWS pipeline. I am a beginner in AWS. This is the error: "Your project couldn't be created Studio encountered an error when creating your project. Try recreating the project…
Udan
  • 39
  • 3
1
vote
1 answer

Vertex pipeline model training component stuck running forever because of metadata issue

I'm attempting to run a Vertex pipeline (custom model training) which I was able to run successfully in a different project. As far as I'm aware, all the pieces of infrastructure (service accounts, buckets, etc.) are identical. The error appears in…
1
vote
0 answers

How to run inference of a custom pytorch model (module) on pyspark dataframe?

How do I run inference with a custom PyTorch model (module) on a PySpark DataFrame? I have a class that uses a PyTorch model: def get_model_obj(model): model = model.module if hasattr(model, "module") else model return model class…
1
vote
1 answer

ml_metadata.errors.AlreadyExistsError: Given node already exists

I ran into a problem using TFX, MLMD, and Apache Airflow as the orchestrator. The local DAG runner provided by TFX works fine, resulting in distinct artifacts for each pipeline component run. The problem arises when Airflow is used as the…
Parham Davari
  • 379
  • 4
  • 6