Playing with H2O

Testing out H2O on a Kaggle Get Started Tutorial (Slightly Outdated)

In recent years, technology companies have put tremendous effort into building data science platforms that streamline machine learning and its operationalization, for instance KNIME, H2O, and Databricks. I decided to test out H2O, one of the leading machine learning platforms.

Installation and Getting Started

Installation is easy if you follow the download instructions. Once H2O is running, H2O Flow can be opened in a browser on localhost, providing a notebook-style interface (kind of like Jupyter Notebook) for performing data science.
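For readers who prefer to start H2O from Python instead of the downloaded package, here is a minimal sketch. It assumes the `h2o` Python package and a Java runtime are installed, which this post did not use. The helper names `flow_url` and `start_h2o` are made up for illustration, and 54321 is H2O's default port.

```python
def flow_url(port=54321):
    """Return the local address where H2O Flow is served (54321 is H2O's default port)."""
    return f"http://localhost:{port}"


def start_h2o(port=54321):
    """Start (or connect to) a local H2O instance and return the Flow address.

    Assumption: `pip install h2o` has been run and Java is available.
    The import lives inside the function so the helper above works without H2O installed.
    """
    import h2o

    h2o.init(port=port)  # launches a local H2O server if none is running on this port
    return flow_url(port)


if __name__ == "__main__":
    # Open the printed address in a browser to reach H2O Flow.
    print(start_h2o())
```

Opening the printed address in a browser should bring up the same Flow interface described above.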

The UI is simple to follow and does not require any prior programming experience. Most of the data preparation steps are point-and-click and very intuitive.

Model Training

AutoML is also very straightforward in this case.

I was able to feed in the raw data from this competition without any preprocessing such as one-hot encoding. The data contains both numerical and string values. AutoML took care of all this and was able to produce predictions. I chose not to exclude any models from training and set the training time to 1 hour.
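I did everything in Flow, but roughly the same run could be sketched with H2O's Python API. This is only an illustration under assumptions: the file paths and target column are placeholders you would supply yourself, `automl_settings` and `run_automl` are hypothetical helper names, and the `h2o` package plus Java must be installed. The settings mirror the ones above: a 1 hour time budget and no excluded algorithms.

```python
def automl_settings(hours=1.0, exclude_algos=None):
    """Build keyword arguments for H2OAutoML from a time budget in hours."""
    return {
        "max_runtime_secs": int(hours * 3600),  # 1 hour -> 3600 seconds
        "exclude_algos": exclude_algos,         # None keeps every algorithm in play
    }


def run_automl(train_path, test_path, target, hours=1.0):
    """Train AutoML on raw CSV data and return predictions for the test file.

    Assumptions: train_path, test_path and target are supplied by the caller;
    they are placeholders, not names taken from the competition.
    """
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    # import_file parses numeric and string columns itself,
    # so no manual one-hot encoding is done here.
    train = h2o.import_file(train_path)
    test = h2o.import_file(test_path)

    # For a classification target stored as numbers, convert it first:
    # train[target] = train[target].asfactor()
    features = [col for col in train.columns if col != target]

    aml = H2OAutoML(**automl_settings(hours))
    aml.train(x=features, y=target, training_frame=train)

    print(aml.leaderboard.head())    # ranking of the models AutoML tried
    return aml.leader.predict(test)  # H2OFrame of predictions from the best model
```

Calling `run_automl` with your own training file, test file, and target column name would start a one-hour run; the returned frame still needs to be reshaped into whatever submission format the competition expects.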

Results

I uploaded the prediction results to Kaggle and received a decent score. This prediction outperforms my previous attempt using AutoML Tables on GCP and places me in the top 46% of the leaderboard.

Summary

Overall, H2O is very simple to pick up and can be used by anyone with a strong interest in data science, even without programming experience. The entire process from data ingestion to prediction took only 1.5 hours (including installation), with 1 hour spent on training. The prediction also outperforms the one I got from the AutoML Tables service on GCP. I highly recommend trying out H2O.