Deploying deep neural networks is an essential endeavor in ML system design. OpenVINO™, developed by Intel, is a toolkit designed specifically for this purpose.
It stands out for its ability to optimize and accelerate neural network performance on a range of Intel hardware. This makes it an invaluable tool for developers looking to deploy AI models more effectively.
OpenVINO™ supports many well-known model formats and offers tools for model optimization, performance tuning, and easy deployment.
In this tutorial, we’ll explore a core component of this toolkit: the OpenVINO™ Converter, introduced in the 2023.1 OpenVINO™ release. It replaces the Model Optimizer, which is now considered legacy.
Let’s get started.
OpenVINO™ Converter
The OpenVINO™ converter is a tool that significantly enhances the performance of deep learning models. Unlike generic optimizers, it explicitly targets Intel architectures (including standard Intel CPUs), ensuring that models are not only optimized in general but also finely tuned to leverage the full capabilities of Intel processors.
In my experiments, it gave an additional 22 to 47% improvement, depending on the CPU type. That’s a non-trivial gain and worth checking.
It works on Intel hardware, including CPUs, integrated GPUs, and VPUs like the Intel Movidius Neural Compute Stick. It is especially valuable for CPUs, as most cloud providers still use Intel processors for their dedicated server offerings. That opens the door to cheap optimization and inference for uncomplicated models.
The converter (or optimizer) achieves this by converting and compressing AI models into an intermediate representation (IR) format. This process includes crucial steps like layer fusion and horizontal fusion, which streamline the neural network for more efficient computation. Doing so reduces latency and increases throughput, which is particularly important for real-time applications.
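In code, the whole conversion boils down to two calls. Here is a minimal sketch, assuming a hypothetical source model file named model.onnx:

import openvino as ov

# Convert the source model (ONNX here; TensorFlow and other formats work too) into an in-memory IR
ov_model = ov.convert_model("model.onnx")

# Serialize the IR to disk: model.xml (topology) plus model.bin (weights, compressed to FP16 by default)
ov.save_model(ov_model, "model.xml")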
In the next section, we will download and install the toolkit.
How to install
Since the 2022.1 release, we can install OpenVINO™ directly via PyPI.
# Installing OpenVINO™
# Make sure you have the newest version after installation
pip install openvino
And that’s it. In the past, installation required running shell scripts, but now a single pip install is enough.
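To double-check the installation, you can print the runtime version and list the devices OpenVINO™ detects on your machine (a quick sanity check; the exact device list depends on your hardware):

import openvino as ov

# Print the installed OpenVINO™ runtime version
print(ov.get_version())

# List devices available for inference, e.g. ['CPU'] or ['CPU', 'GPU']
print(ov.Core().available_devices)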
Convert to Intermediate Representation (IR)
Here are some general steps to use the OpenVINO™ converter effectively:
- Model preparation: Ensure your model is in a supported format like TensorFlow, Caffe, or ONNX.
- Model conversion: Use the Model Converter API to convert your model to OpenVINO’s Intermediate Representation (IR) format. This creates the .xml and .bin files (the IR).
- Model optimization (optional): Fine-tune the conversion process with flags for batch size, precision, and other model-specific optimizations.
- Verification: Test the IR model using OpenVINO’s Inference Engine to ensure it’s performing as expected.
Converting a TensorFlow model
Here is an example of converting the MobileNet_v2 image classification model from TensorFlow Hub.
First, we need to install the dependencies:
# Install the dependencies
# Make sure to have the latest version of OpenVINO™
pip install tensorflow_hub tensorflow pillow numpy matplotlib
pip install "openvino>=2023.2.0"
Then, perform the conversion:
from pathlib import Path
import os
from urllib.request import urlretrieve

# Silence TensorFlow's C++ log output before TensorFlow is imported
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow_hub as hub
import tensorflow as tf
# PIL, numpy, and matplotlib come in handy when verifying the converted model
import PIL
import numpy as np
import matplotlib.pyplot as plt
import openvino as ov

tf.get_logger().setLevel("ERROR")
IMAGE_SHAPE = (224, 224)
IMAGE_URL = "https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg"
IMAGE_PATH = "data/grace_hopper.jpg"
MODEL_URL = "https://www.kaggle.com/models/google/mobilenet-v2/frameworks/tensorFlow2/variations/100-224-classification/versions/2"
MODEL_PATH = "models/mobilenet_v2_100_224.xml"

# Download the sample image used later to verify the converted model
Path(IMAGE_PATH).parent.mkdir(parents=True, exist_ok=True)
if not Path(IMAGE_PATH).exists():
    urlretrieve(IMAGE_URL, IMAGE_PATH)
# Load the TensorFlow Hub model as a Keras layer
model = hub.KerasLayer(MODEL_URL, input_shape=IMAGE_SHAPE + (3,))

# Perform the actual conversion to IR format
Path(MODEL_PATH).parent.mkdir(parents=True, exist_ok=True)
if not Path(MODEL_PATH).exists():
    converted_model = ov.convert_model(model)
    ov.save_model(converted_model, MODEL_PATH)
And you should be able to see the .xml and .bin files locally.
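To verify the converted model (the last step from the checklist above), we can load the IR with the OpenVINO™ runtime and run it on the sample image. Here is a minimal sketch that reuses the paths from the conversion script; the exact preprocessing (resizing to 224×224 and scaling pixels to [0, 1]) is an assumption based on this model family’s usual input format:

import numpy as np
import openvino as ov
from PIL import Image

# Load and compile the IR model for the CPU
core = ov.Core()
compiled_model = core.compile_model("models/mobilenet_v2_100_224.xml", "CPU")

# Preprocess the sample image: resize and scale pixel values to [0, 1]
image = Image.open("data/grace_hopper.jpg").resize((224, 224))
input_data = np.expand_dims(np.array(image, dtype=np.float32) / 255.0, axis=0)

# Run inference and print the index of the top predicted class
result = compiled_model(input_data)[compiled_model.output(0)]
print("Predicted class index:", int(np.argmax(result)))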
What types of models can we convert
You can convert a significant number of architectures to the intermediate format. For some of them, the process can be more challenging. For example, converting a PyTorch model will require first converting it to ONNX and then to IR. This tutorial will help you do that, and there is a quick sketch after the list below.
In general, most Keras classification models can be converted easily. You can also try with:
- Semantic segmentation;
- ConvNeXt;
- YOLOv8;
- Speech recognition models (wav2vec2);
- TensorFlow object detection models;
- TensorFlow Lite models;
- PaddlePaddle models;
- PyTorch models.
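For the PyTorch route mentioned above, here is a minimal sketch of the ONNX path, using torchvision’s ResNet-18 as a stand-in model (any exportable PyTorch model would work similarly):

import torch
import torchvision
import openvino as ov

# Load a pretrained PyTorch model and switch it to inference mode
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.eval()

# Export to ONNX; the dummy input fixes the expected input shape
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet18.onnx")

# Convert the ONNX file to IR and save the .xml/.bin pair
ov_model = ov.convert_model("resnet18.onnx")
ov.save_model(ov_model, "resnet18.xml")

Recent OpenVINO™ releases can also accept a torch.nn.Module directly in ov.convert_model (with an example_input argument), which skips the ONNX step entirely.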
Case studies
In the real world, OpenVINO™ has shown remarkable success in various use cases.
One notable example I worked on is optimizing and serving a computer vision model for an e-commerce application. By leveraging the OpenVINO™ converter, the model’s inference speed increased significantly, allowing real-time analysis of customer behavior in the online store.
The next step after converting to the intermediate format is deployment. It can be very similar to deploying a model with TensorFlow Serving, which we covered in another tutorial. I plan to write a dedicated article on OpenVINO™ serving, so stay tuned.
In conclusion, OpenVINO™ is a robust solution for optimizing and deploying deep learning models, offering significant performance improvements and broad applicability in real-world scenarios.