As 2020 begins, there have been few cloud data science announcements, so I put together some predictions. Here are three things I believe will happen in 2020.
1. Cloud Collaboration
I think we are going to see more interoperability between the major cloud providers. For example, Azure Arc now allows you to run Azure products on a Kubernetes cluster running anywhere (even in Amazon Web Services or Google Cloud), and AWS Outposts runs AWS on-premises.
Large enterprise organizations already have complex, multi-provider environments, so it only makes sense for the big cloud providers to start working with each other. I think the days of choosing a single cloud provider are numbered. Organizations need to move quickly, and that means choosing the right cloud tools for the job, regardless of the provider. Organizations do not have the time to adopt a new cloud provider every time the requirements change.
2. AutoML Drama
I did not know what else to call this. Automated Machine Learning (AutoML) is really popular right now. If you are unfamiliar with the term, here is a description. AutoML is a technique which takes raw data as an input and automatically creates a predictive model. It does model and feature selection automatically. It even does some feature engineering. Plus, it is only getting better as time goes by.
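If the description feels abstract, here is a deliberately tiny sketch of the idea in plain Python. It is not how any real AutoML product works. The names `toy_automl`, `fit_mean` and `fit_line`, and the made-up `size`/`noise`/`price` data, are all my own assumptions for illustration. It simply tries every feature and model combination and keeps whichever one has the lowest error on held-out rows.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline model: always predict the average of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Least-squares straight line through a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    """Mean squared error of a model on the given rows."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def toy_automl(rows, target, holdout=0.25):
    """Try every (feature, model) pair; keep the one with the lowest holdout error."""
    split = int(len(rows) * (1 - holdout))
    train, test = rows[:split], rows[split:]
    features = [k for k in rows[0] if k != target]
    candidates = {"mean": fit_mean, "line": fit_line}  # hypothetical model zoo
    best = None
    for feature in features:
        for name, fit in candidates.items():
            model = fit([r[feature] for r in train], [r[target] for r in train])
            score = mse(model, [r[feature] for r in test], [r[target] for r in test])
            if best is None or score < best[0]:
                best = (score, feature, name, model)
    return best

# Made-up data: price depends on size, while noise is unrelated to price.
rows = [{"size": i, "noise": (i * 7) % 5, "price": 3 * i + 1} for i in range(12)]
score, feature, name, model = toy_automl(rows, "price")
print(feature, name)
```

Real AutoML tools search far bigger spaces of models, features and settings, but the loop of "try candidates, score them, keep the best" is the heart of it.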
I believe 2020 will bring some large and possibly heated debates about using AutoML. It is a nice new technology with a ton of promise. However, the questions below will need to be addressed.
- Is AutoML better than humans?
- When should AutoML be used?
- Does every project need AutoML?
- What is AutoML good at?
- Will AutoML replace data scientists?
3. A move away from Jupyter Notebooks
Wait, wait, before you send me some unkind messages, please hear me out. Jupyter notebooks were created as an environment for data scientists to quickly run analysis and share results and code with others. Jupyter is awesome for that, and it is way better than dealing with software installations and outdated versions.
However, because of its success amongst data scientists, it has become an enterprise tool. It is now used for the following things:
- Reusing code between projects
- Setting up deployment pipelines
- Integrating with source control
- Auto-scheduling jobs
Unfortunately, I don’t feel it does those things well. Jupyter notebooks were not designed for those tasks. The features were tacked on after it was built, and they feel a bit clunky. Thus, I believe 2020 will bring some better tools for doing enterprise data science. I don’t know if it will be via an IDE or some other tool, but I think something new is coming.
I also made a video with my thoughts on the same topic.