Prototype Data Models rapidly with Autoprototype: our newest open source contribution - Ideas2IT

I am so happy to blog about our latest open source contribution: Autoprototype.

As data scientists, we often spend a few hours, or sometimes a few days, finding the best prototype model for our data! Selecting the model hyperparameters is another important job, and it often requires close attention to avoid overfitting and help the model learn better. The fix for this problem is here! Ideas2IT brings you the automatic prototyping library "Autoprototype". True to its name, this module:

  1. automates the tedious rapid prototyping process for the given data, and
  2. selects the appropriate hyperparameters for you.

This is easy and requires only a few lines of code. Apart from the input data, the module needs only a few other parameters; every other step of rapid prototyping is then automated. The module ships with default search spaces for the hyperparameters being optimized. However, users are also free to construct the search spaces dynamically.

This module is a wrapper around the popular hyperparameter optimization tool Optuna, which runs the optimization through iterative trials. The module takes the data as its primary input and suggests a model to the user based on these Optuna trials. Optuna enables efficient hyperparameter optimization by adopting state-of-the-art algorithms for sampling hyperparameters and efficiently pruning unpromising trials.

We kept some of the most important features of Optuna in mind while building the library, namely:

  1. Efficient optimization algorithms: Optuna enables efficient hyperparameter optimization by adopting state-of-the-art algorithms for sampling hyperparameters and efficiently pruning unpromising trials.
  2. Pythonic search space: Search spaces can be defined for parameters of every supported type: categorical, integer and floating point.
  3. Lightweight and versatile: Optuna is written entirely in Python and has few dependencies, so it is easy to apply to your own data.

So let’s get started with the module!

Installation

Installation is simple. The package has only a few small dependencies, which are installed along with it. The command to install the package is:

$ pip install autoprototype

Supported Libraries

The first release of the library comes with support for popular scikit-learn (sklearn) models and TensorFlow Keras ANN and CNN models. The tables below list the model structures and hyperparameters that are available.

Sklearn

| Model Structure     | Hyperparameters | Type            | Description                                |
|---------------------|-----------------|-----------------|--------------------------------------------|
| Decision Tree       | min_sample_leaf | model parameter | minimum number of samples per leaf         |
| Logistic Regression | Max Iteration   | model parameter | max iterations for convergence of the model |
| Ridge Regression    | alpha           | model parameter | penalty / tuning parameter                 |
| Lasso Regression    | alpha           | model parameter | penalty / tuning parameter                 |
| Linear Regression   | Normalize       | data parameter  | Boolean for normalizing the data           |
| SVM                 | c               | model parameter | penalty / tuning parameter                 |
| Random Forest       | rf_max_dept     | model parameter | max depth of the RF trees                  |
|                     | n_estimators    | model parameter | number of estimator trees                  |

Apart from the above parameters, a cross-validation parameter "k", which sets the number of cross-validation folds, is also suggested for each model.

Let’s look at the code required to use the module!

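from autoprototype.sklearn import sklearnopt

# X_train, y_train: your training features and labels
hpo = sklearnopt(X_train, y_train)
# trials: the number of optimization trials to run
trial, params, value = hpo.get_best_params(n_trials=trials)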

That’s it! It’s just two lines of code. A sample output is given below:

[Image: sample output of the sklearn optimization]

In this example, the best model is Logistic Regression with a maximum of 1132 iterations, and it reaches an accuracy score of 96.42% under 8-fold cross-validation.

To run the examples, follow the steps here.

Tensorflow Keras

| Model | Parameters          | Type                            | Description                                                 |
|-------|---------------------|---------------------------------|-------------------------------------------------------------|
| ANN   | n_layers            | hidden layer structure          | number of hidden layers                                     |
|       | units               | hidden layer structure          | number of units in each hidden layer                        |
|       | dropout rate        | hidden layer structure          | dropout rate                                                |
|       | learning rate       | optimizer parameter             | learning rate for the optimizer                             |
|       | optimizer name      | model compilation parameter     | optimizer used to compile the model                         |
| CNN   | nconv               | hidden layer structure          | number of convolution layers (excluding the first layer)    |
|       | filters             | hidden layer structure          | filters in each of these conv layers                        |
|       | kernel size         | hidden layer structure          | size of the kernel in each conv layer                       |
|       | strides             | hidden layer structure          | stride value in each conv layer                             |
|       | activation          | hidden layer structure          | activation function in each conv layer                      |
|       | dropout rate        | hidden layer structure          | dropout rate                                                |
|       | n_fullly_con_layers | full connection layer structure | number of fully connected layers                            |
|       | units_fcl           | full connection layer structure | number of units in each fully connected layer               |
|       | learning rate       | optimizer parameter             | learning rate for the optimizer                             |
|       | optimizer name      | model compilation parameter     | optimizer used to compile the model                         |

The code required is again pretty simple! Just feed in your data and a few mandatory arguments to run the optimization. To look at the arguments and their types, you can refer here!

For ANN

from autoprototype.tf_keras import kerasopt

# CLASSES: number of target classes; n_trials: number of optimization trials to run
hpo = kerasopt(x_train, y_train, EPOCHS=10, classes=CLASSES)
trial, params, value = hpo.get_best_params(n_trials=n_trials)

Sample output:

[Image: sample output of the ANN optimization]

For CNN models, two other mandatory parameters (arch and input_shape) are required, as follows:

hpo = kerasopt(x_train, y_train, EPOCHS=10, classes=120,
               max_units_fcl=400, max_conv_filters=1000,
               arch="cnn", input_shape=(128, 128, 3), steps_per_epoch=10)

Note: You must set arch="cnn" to run the CNN optimization, and input_shape must be provided.

Let’s see the sample output:

[Image: sample output of the CNN optimization]
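A side note on input_shape: when the number of conv layers, the kernel size and the strides are all searched, the input size limits which combinations are feasible, because every conv layer shrinks the feature map. The sketch below shows the arithmetic, assuming "valid" padding (the Keras Conv2D default). Whether Autoprototype uses that padding is not stated here, and the helper names are hypothetical.

```python
def conv_output_size(size, kernel_size, stride):
    """Output length along one axis of a convolution with 'valid' padding."""
    if size < kernel_size:
        raise ValueError("kernel is larger than the input")
    return (size - kernel_size) // stride + 1


def feature_map_sizes(input_size, layers):
    """Track the feature map size through a stack of (kernel_size, stride) layers."""
    sizes = [input_size]
    for kernel_size, stride in layers:
        sizes.append(conv_output_size(sizes[-1], kernel_size, stride))
    return sizes


# A 128-pixel axis passed through three 3x3 conv layers with stride 2.
print(feature_map_sizes(128, [(3, 2), (3, 2), (3, 2)]))  # [128, 63, 31, 15]
```

With small inputs or large strides the size reaches the kernel size after only a few layers, which is one reason to pass the true input shape of your images.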

You are now ready with your prototyped model!

With this module, the manual and tedious job of finding the best model structure and hyperparameters is reduced to only a few minutes! Just put your data in, and we will suggest what is best for you. Life’s easy, isn’t it?

Please follow the examples for a better understanding of how to use the module!

The source code is available as a public repository in the Ideas2IT main repository. We are open to suggestions and changes through PRs in the repository. The pip release can be found here.