Prototype Data Models Rapidly with Autoprototype: Our Newest Open Source Contribution

I am happy to introduce our latest open source contribution: Autoprototype.

As data scientists, we routinely spend hours, and sometimes days, finding the best prototype model for our data. Selecting the model's hyperparameters is another important job, and it often requires close attention to avoid overfitting and to help the model learn well. To address this problem, Ideas2IT brings you the automatic prototyping library “Autoprototype”. True to its name, this module:

automates the tedious rapid prototyping process for the given data, and
selects the appropriate hyperparameters for you.

This is easy and requires only a few lines of code. Apart from the input data, the module needs only a few other parameters; after that, every step of rapid prototyping is automated. The module ships with default search spaces for the hyperparameters being optimized, but users are also free to construct the search spaces dynamically.

This module is a wrapper around Optuna, the popular hyperparameter optimization tool. Optuna runs the optimization as a series of iterative trials: the module takes the data as its primary input and suggests a model to the user based on these trials. We kept some of Optuna's most important features in mind while constructing the library, namely:

Efficient optimization algorithms: Optuna enables efficient hyperparameter optimization by adopting state-of-the-art algorithms for sampling hyperparameters and efficiently pruning unpromising trials.
Pythonic search spaces: Search spaces can be defined for parameters of every type: categorical, integer, and floating point.
Lightweight and versatile: Optuna is written entirely in Python and has few dependencies, so it can readily be applied to real-world data.
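To make the idea of trials over a mixed search space concrete, here is a minimal sketch in plain Python. It is not Optuna and not Autoprototype's code: it uses simple random search instead of Optuna's samplers, it does no pruning, and the model names, parameter ranges, and the toy_score function are all made up for illustration. It only shows the shape of the loop: sample categorical, integer, and floating-point values, score the candidate, and keep the best trial.

```python
import math
import random


def sample_params(rng):
    """Draw one candidate from a mixed-type search space (all names hypothetical)."""
    model = rng.choice(["ridge", "svm", "random_forest"])  # categorical choice
    if model == "random_forest":
        return {
            "model": model,
            "n_estimators": rng.randint(10, 200),  # integer parameter
            "max_depth": rng.randint(2, 16),       # integer parameter
        }
    # Floating-point parameter, sampled log-uniformly between 1e-3 and 1e2.
    key = "alpha" if model == "ridge" else "c"
    return {"model": model, key: 10 ** rng.uniform(-3, 2)}


def toy_score(params):
    """Stand-in for a validation score; a real run would train and evaluate a model."""
    if params["model"] == "random_forest":
        return (0.9
                - 0.001 * abs(params["n_estimators"] - 100)
                - 0.01 * abs(params["max_depth"] - 8))
    strength = params.get("alpha", params.get("c"))
    return 0.85 - 0.05 * abs(math.log10(strength))


def random_search(n_trials, seed=0):
    """Run n_trials random trials and return (trial number, params, value) of the best."""
    rng = random.Random(seed)
    best = None
    for number in range(n_trials):
        params = sample_params(rng)
        value = toy_score(params)
        if best is None or value > best[2]:
            best = (number, params, value)
    return best


trial, params, value = random_search(n_trials=50)
print(trial, params, round(value, 4))
```

Optuna replaces the random draws with smarter sampling and can stop unpromising trials early, which is where the efficiency gains described above come from.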

So let’s get started with the module!

Installation

Installation is simple, and the few small dependencies are installed alongside the main package. The command to install the package is:

    $ pip install autoprototype

Supported Libraries

The first release of the library supports popular scikit-learn models, as well as TensorFlow Keras ANN and CNN models. The tables below list the model structures and hyperparameters that are available.

Sklearn

Model Structure      Hyperparameters  Type             Description
-------------------  ---------------  ---------------  ----------------------------------------
Decision Tree        min_sample_leaf  model parameter  minimum number of samples per leaf
Logistic Regression  Max Iteration    model parameter  maximum iterations for model convergence
Ridge Regression     alpha            model parameter  penalty / tuning parameter
Lasso Regression     alpha            model parameter  penalty / tuning parameter
Linear Regression    Normalize        data parameter   Boolean for normalizing the data
SVM                  c                model parameter  penalty / tuning parameter
Random Forest        rf_max_dept      model parameter  maximum depth of the RF trees
                     n_estimators     model parameter  number of estimator trees

Apart from the above parameters, a cross-validation parameter “k”, which sets the number of cross-validation folds, is also suggested for each model. Let’s look at the code that is required!

    from autoprototype.sklearn import sklearnopt

    # X_train, y_train: your training features and labels
    # trials: the number of optimization trials to run
    hpo = sklearnopt(X_train, y_train)
    trial, params, value = hpo.get_best_params(n_trials=trials)

That's it! Apart from the import, it's just two lines of code. A sample output is given below:

[Image: sample output of the sklearn optimization]

In this example trial, the best model is Logistic Regression with a maximum of 1132 iterations; it achieves an accuracy score of 96.42% with 8-fold cross-validation. To run the examples, follow the steps here.
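For readers less familiar with the "k" parameter, the sketch below shows what a k-fold split looks like. It assumes standard, unshuffled k-fold cross-validation with contiguous folds; how Autoprototype splits the data internally may differ, and the function names here are hypothetical helpers, not part of the library.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_val_score(score_fn, n_samples, k):
    """Average score_fn(train_idx, val_idx) over all k folds."""
    scores = [score_fn(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(n_samples=20, k=8))
print([len(val) for _, val in folds])  # [3, 3, 3, 3, 2, 2, 2, 2]
```

Each sample lands in exactly one validation fold, so a score reported "upon 8-fold cross-validation" is an average over eight train-and-validate rounds rather than a single split.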

TensorFlow Keras

Model  Parameters           Type                             Description
-----  -------------------  -------------------------------  -----------------------------------------------------
ANN    n_layers             hidden layer structure           number of hidden layers
       units                hidden layer structure           number of units in each hidden layer
       dropout rate         hidden layer structure           dropout rate
       learning rate        optimizer parameter              learning rate for the optimizer
       optimizer name       model compilation parameter      optimizer used to compile the model
CNN    nconv                hidden layer structure           number of convolution layers (except the first layer)
       filters              hidden layer structure           filters in each of these conv layers
       kernel size          hidden layer structure           size of the kernel in each conv layer
       strides              hidden layer structure           stride value in each conv layer
       activation           hidden layer structure           activation function in each conv layer
       dropout rate         hidden layer structure           dropout rate
       n_fullly_con_layers  full connection layer structure  number of fully connected layers
       units_fcl            full connection layer structure  number of units in each fully connected layer
       learning rate        optimizer parameter              learning rate for the optimizer
       optimizer name       model compilation parameter      optimizer used to compile the model

The code required is again pretty simple! Just feed in your data and a few mandatory arguments to run the optimization. To look at the arguments and their types, you can refer here!
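The ANN rows above describe a search space whose size depends on an earlier choice: the number of hidden layers decides how many "units" and "dropout rate" values have to be picked. Here is a rough sketch of sampling one such configuration in plain Python. The ranges, optimizer names, and function names are assumptions made for illustration; they are not Autoprototype's defaults, and the real library delegates sampling to Optuna.

```python
import random


def sample_ann_config(rng, max_layers=4, max_units=128):
    """Sample one hypothetical ANN configuration; the layer list depends on n_layers."""
    n_layers = rng.randint(1, max_layers)
    layers = [
        {"units": rng.randint(4, max_units), "dropout": rng.uniform(0.0, 0.5)}
        for _ in range(n_layers)
    ]
    return {
        "n_layers": n_layers,
        "layers": layers,
        "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform in [1e-5, 1e-1]
        "optimizer": rng.choice(["Adam", "RMSprop", "SGD"]),
    }


def count_dense_params(config, n_features, n_classes):
    """Count weights and biases of the dense stack described by config."""
    total, fan_in = 0, n_features
    for layer in config["layers"]:
        total += fan_in * layer["units"] + layer["units"]
        fan_in = layer["units"]
    # Final classification layer on top of the last hidden layer.
    return total + fan_in * n_classes + n_classes


config = sample_ann_config(random.Random(1))
print(config["n_layers"], config["optimizer"],
      count_dense_params(config, n_features=20, n_classes=3))
```

Counting the parameters is only there to show how quickly the sampled structure changes the model size, which is one reason an upper bound on the units per layer is useful.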

For ANN

    from autoprototype.tf_keras import kerasopt

    hpo = kerasopt(x_train, y_train, EPOCHS=10, classes=CLASSES)
    trial, params, value = hpo.get_best_params(n_trials=n_trials)

Sample output:

For CNN models, two other mandatory parameters (arch and input_shape) are required, as follows:

    hpo = kerasopt(x_train, y_train, EPOCHS=10, classes=120,
                   max_units_fcl=400, max_conv_filters=1000,
                   arch="cnn", input_shape=(128, 128, 3), steps_per_epoch=10)

Note: You must set arch="cnn" to run the CNN optimization, and input_shape must be provided. Let's see the sample output:

You are now ready with your prototyped model! With this module, the manual and tedious job of finding the best model structure and hyperparameters is reduced to only a few minutes. Just put your data in, and we will suggest what is best for you. Life’s easy, isn’t it?

Please follow the examples for a better understanding of how to use the module. The source code is available as a public repository under the main Ideas2IT repository. We are open to suggestions and changes through PRs in the repository. The pip release can be found here.

About Ideas2IT

Are you looking to build a great product or service? Do you foresee technical challenges? If you answered yes to these questions, then you must talk to us. We are a world-class custom .NET development company. We take up projects that are in our area of expertise. We know what we are good at and, more importantly, what we are not. We carefully choose projects where we strongly believe that we can add value, not just in engineering but also in how well we understand the domain. Book a free consultation with us today. Let’s work together.

Ideas2IT Team