Development and Release of Type4Py: Machine Learning-based Type Auto-completion for Python

Type4Py is an ML-based type auto-completion tool for Python. It helps developers gradually add type annotations to their codebases and benefit from the advantages of static typing, such as better code completion, early bug detection, program repair, and type inference.

Roadmap

Future work

VSCode Extension

The extension sends an opened Python source file to the server and receives a JSON response.
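
To make this exchange concrete, here is a minimal client-side sketch in Python using only the standard library. The actual extension is not written this way; the endpoint URL, the plain-text request body, and the helper names `build_request` and `request_predictions` are all assumptions made for illustration.

```python
import json
import urllib.request

# Hypothetical endpoint; the real server address is not given in this post.
PREDICT_URL = "http://localhost:8080/predict"


def build_request(source_code: str, url: str = PREDICT_URL) -> urllib.request.Request:
    """Wrap the contents of a Python source file in an HTTP POST request."""
    return urllib.request.Request(
        url,
        data=source_code.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def request_predictions(source_code: str, url: str = PREDICT_URL, timeout: float = 10.0) -> dict:
    """Send the source file to the server and decode the JSON response."""
    with urllib.request.urlopen(build_request(source_code, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `request_predictions(open("example.py").read())` would then return the decoded JSON as a dictionary, provided a server is listening at the assumed URL.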

Releasing

The development environment is used to test, debug, and profile Type4Py’s server-side components before releasing new features/fixes into the production code.

Kubernetes

Deploy ML applications using K8s

Containerizing ML applications

Containerization means packaging an application together with all its dependencies and required run-time libraries.

Dataset

The ManyTypes4Py dataset contains 5.2K Python projects and 4.2M type annotations.

Feature Extraction

We extract three kinds of type hints: identifiers, code context, and visible type hints (VTHs).
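
The sketch below illustrates roughly what such an extraction step could look like with Python's built-in `ast` module. It is not the Type4Py pipeline itself. The function name `extract_hints`, the output layout, and the simplified definitions used here (parameter usages as code context, imported names as visible type hints) are assumptions made for illustration.

```python
import ast


def extract_hints(source: str) -> dict:
    """Collect rough, illustrative hints from one Python source file."""
    tree = ast.parse(source)
    lines = source.splitlines()
    hints = {"identifiers": [], "code_context": [], "visible_type_hints": []}

    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # Visible type hints (simplified): names brought into scope by imports.
            hints["visible_type_hints"].extend(
                alias.asname or alias.name for alias in node.names
            )
        elif isinstance(node, ast.FunctionDef):
            # Identifiers: the function name and its parameter names.
            params = [arg.arg for arg in node.args.args]
            hints["identifiers"].append({"function": node.name, "params": params})
            # Code context (simplified): body lines in which a parameter is used.
            used_on = sorted({
                child.lineno
                for child in ast.walk(node)
                if isinstance(child, ast.Name) and child.id in params
            })
            hints["code_context"].append({
                "function": node.name,
                "lines": [lines[n - 1].strip() for n in used_on],
            })
    return hints


SAMPLE = '''
from typing import List
import os

def scale(values, factor):
    total = sum(values)
    return total * factor
'''

hints = extract_hints(SAMPLE)
```

For the sample above, `hints["identifiers"]` holds the function name `scale` with its two parameters, and `hints["code_context"]` holds the two body lines that use them.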

Implementation

The server-side components are all written in Python.

Overview

On the client side (developers), there is a VSCode extension, while the Type4Py model and its pipeline are deployed on our servers.

Monitoring ML applications

Use Prometheus to monitor your application.
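
As a rough illustration of what Prometheus scrapes, the sketch below renders a single counter in the Prometheus text exposition format using only the standard library. The `Counter` class and the metric name `predict_requests_total` are hypothetical. In practice, the official `prometheus_client` package would normally handle this for you.

```python
import threading


class Counter:
    """A tiny, thread-safe counter rendered in Prometheus text format."""

    def __init__(self, name: str, help_text: str):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount: int = 1) -> None:
        """Increase the counter; counters are only ever incremented."""
        if amount < 0:
            raise ValueError("a counter can only be incremented")
        with self._lock:
            self._value += amount

    def render(self) -> str:
        """Return the HELP, TYPE, and sample lines for this metric."""
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Hypothetical metric: count every call to the prediction endpoint.
predict_requests = Counter("predict_requests_total", "Number of prediction requests.")
predict_requests.inc()
```

Serving the output of `render()` from a `/metrics` path lets a Prometheus server collect the value on each scrape.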

Wrapping Up

We hope that this post was useful and that you can now deploy your first ML model somewhere so that people can try it.

Acknowledgments

The Type4Py model, its pipeline, and VSCode extension are all designed and developed at the Software Analytics Lab of SERG, Delft University of Technology.

Deploying an ML model

High-level steps to deploy

Deployment

To deploy the Type4Py model to the production environment, we convert the pre-trained PyTorch model to an ONNX model, which allows us to query the model on both GPUs and CPUs with very fast inference speed and lower VRAM consumption.

Model Architecture & Training

The Type4Py model consists of two RNNs with LSTM units, one for identifiers and another for code context. Their outputs are concatenated into a single vector, which is passed through a fully-connected linear layer. The final linear layer maps the learned type annotation into a high-dimensional feature space, called Type Clusters.

Exporting ML models

Oftentimes, ML frameworks are optimized to speed up model training, not prediction/inference.

Creating a REST API to query a model

You need to create a tiny REST API with a prediction endpoint.
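
A minimal sketch of such an API, using only the standard library's `http.server`, might look like the following. A real deployment would more likely use a web framework. The `/predict` path, the plain-text request body, and the placeholder `predict` function (which stands in for the actual model call) are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(source_code: str) -> dict:
    """Placeholder for the real model call; returns a dummy prediction."""
    return {"num_lines": len(source_code.splitlines()), "predictions": []}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the (hypothetical) prediction endpoint is served.
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        # Read the request body, which is assumed to be raw Python source.
        length = int(self.headers.get("Content-Length", 0))
        source_code = self.rfile.read(length).decode("utf-8")
        # Query the model and send the result back as JSON.
        body = json.dumps(predict(source_code)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence the default per-request logging.


def make_server(host: str = "127.0.0.1", port: int = 8080) -> HTTPServer:
    """Create the server; call serve_forever() on the result to start it."""
    return HTTPServer((host, port), PredictHandler)
```

Running `make_server().serve_forever()` starts the API locally, after which a POST request to `/predict` with a Python file as the body returns a JSON document.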

Next steps

To learn more, you can take Andrew Ng's course on ML engineering for production.

Source

Get in