
I was looking at Databricks because it integrates with AWS services like Kinesis, but it looks to me like SageMaker is a direct competitor to Databricks. We are heavily using AWS; is there any reason to add Databricks to the stack, or does SageMaker fill the same role?

L Xandor

2 Answers


SageMaker is a great tool for deployment: it simplifies a lot of the work of configuring containers, and you only need to write 2-3 lines of code to deploy a model as an endpoint and use it. SageMaker also provides a development platform (Jupyter Notebook) that supports Python and Scala development (via the sparkmagic kernel), and I managed to install an external Scala kernel in the Jupyter notebook. Overall, SageMaker provides end-to-end ML services. Databricks, on the other hand, has an unbeatable notebook environment for Spark development.

Conclusion

  1. Databricks is the better platform for big data (Scala, PySpark) development; its notebook environment is unbeatable.

  2. SageMaker is better for deployment. And if you are not working on big data, SageMaker is a perfect choice: Jupyter notebooks + scikit-learn + mature containers + super easy deployment.

  3. SageMaker provides real-time inference that is very easy to build and deploy, which is very impressive. You can check the official SageMaker examples on GitHub: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/scikit_learn_inference_pipeline
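The "2-3 lines to deploy" claim above can be sketched with the SageMaker Python SDK. This is a minimal sketch, not the answerer's exact code: the S3 model path, IAM role ARN, entry point script, and instance type are all placeholder assumptions, and the deployment calls need AWS credentials. Only the small CSV helper runs locally:

```python
import io

def to_csv_payload(rows):
    """Serialize feature rows into the headerless CSV body that a
    SageMaker scikit-learn endpoint expects (one row per line)."""
    buf = io.StringIO()
    for row in rows:
        buf.write(",".join(str(v) for v in row) + "\n")
    return buf.getvalue()

# The deployment itself really is only a few lines with the SageMaker
# Python SDK (requires AWS credentials; names below are assumptions):
#
# from sagemaker.sklearn import SKLearnModel
# model = SKLearnModel(
#     model_data="s3://my-bucket/model.tar.gz",             # assumption
#     role="arn:aws:iam::123456789012:role/SageMakerRole",  # assumption
#     entry_point="inference.py",                           # assumption
#     framework_version="0.23-1",
# )
# predictor = model.deploy(initial_instance_count=1,
#                          instance_type="ml.m5.large")
# predictor.predict(to_csv_payload([[5.1, 3.5, 1.4, 0.2]]))
```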

seninus
  • Thanks! Is it an either/or kind of thing, or is there any reason why you would use both in the same pipeline? – L Xandor Mar 20 '19 at 21:46
  • If you are working on big data analytics using Spark, I recommend using Databricks + SageMaker (though I think Databricks is more expensive for very large analytics projects). A Spark ML pipeline plus SageMaker endpoint deployment and CloudWatch monitoring is perfect. But if you are working on small data, Databricks is not necessary; a Jupyter notebook with SageMaker is enough. – seninus Mar 21 '19 at 02:23
  • Is SageMaker Studio Lab a direct response to the notebook environment that Databricks excels in? https://aws.amazon.com/sagemaker/studio-lab/ – sam yi Jan 05 '22 at 16:08
  • @samyi Studio Lab is a simple UI for experimentation and getting started in ML. It is not meant for production tasks. So no, Studio Lab is not a response to Databricks; it is more akin to Google Colab notebooks. – Dileep Kumar Patchigolla Mar 24 '22 at 09:57

Having worked in both environments within the last year, I specifically remember:

  • Databricks had easy access to stored databases/tables to query from, with Scala/Spark available right inside the notebooks. It was nice to preview schemas, query quickly, and be off to the races for research. I also remember how quickly you could set up a timed job on a notebook (e.g., re-run every month) and rescale it to much cheaper job instance types with a few button clicks. These features might exist somewhere in AWS, but I remember them being great in Databricks.

  • AWS SageMaker + Lambda + API Gateway: Legitimately, just today I worked through a deployment with SageMaker + Lambda + API Gateway, and after getting used to some of the syntax and specifics of Lambda and API Gateway it was pretty straightforward. Doing another AWS deployment wouldn't take more than 20 minutes (barring unique specifics). Other things like Model Monitor and CloudWatch are nice as well. I also noticed Jupyter notebook kernels for many languages, such as Python (what I used), R, and Scala, along with pre-installed packages like conda and the SageMaker ML packages and methods.
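The SageMaker + Lambda + API Gateway wiring described above can be sketched roughly as below. The endpoint name and the request body shape are assumptions, not the answerer's actual setup; only the payload helper runs without AWS:

```python
import json

def event_to_payload(event):
    """Pull the feature vector out of an API Gateway proxy event and
    serialize it as the CSV body a SageMaker endpoint expects."""
    body = json.loads(event["body"])
    return ",".join(str(v) for v in body["features"])

# Hypothetical Lambda handler gluing API Gateway to the endpoint
# (endpoint name and request shape are assumptions):
#
# import boto3
# runtime = boto3.client("sagemaker-runtime")
#
# def lambda_handler(event, context):
#     resp = runtime.invoke_endpoint(
#         EndpointName="sklearn-endpoint",   # assumption
#         ContentType="text/csv",
#         Body=event_to_payload(event),
#     )
#     return {"statusCode": 200,
#             "body": resp["Body"].read().decode("utf-8")}
```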

kevin_theinfinityfund