Introduction
In a world increasingly driven by visual content, the ability to move between images and numerical representations is more important than ever. Whether the goal is semantic search across a large dataset or finding similar images in a big collection, converting images to vectors is a fundamental step. This transformation can be daunting, however, especially when you need to process an entire folder of images.
In this tutorial, we walk through the journey from images to vectors using Vector Forge, a library I created to simplify this process, coupled with the CLIP ViT-B/32 model from OpenAI. We start with a folder full of images and transform each one into a 512-dimensional vector, ready to be used in applications such as semantic search platforms and image similarity analysis.
Choosing the right model
The beauty of Vector Forge lies in its flexibility to work with different models, each catering to varied project needs. The model you choose for vectorization plays a crucial role as different models produce vectors of different sizes. For instance, in this tutorial, we are using the CLIP ViT-B/32 model which yields vectors of size 512. However, Vector Forge also supports other models like CLIP ViT-L/14 producing vectors of size 768, VGG16 and VGG19 each producing vectors of size 512, and Xception that produces vectors of size 2048. This versatility allows you to choose a model that aligns well with the requirements of your project, whether it’s the vector size or the model’s performance in handling text and/or image data. The ease of switching between models in Vector Forge makes it a powerful tool, adaptable to a wide range of vectorization tasks.
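To make the size trade-off concrete, the following sketch records the vector sizes listed above and estimates the raw storage a collection of vectors would need. The dictionary keys are plain descriptive labels, not Vector Forge identifiers, and the figure of 4 bytes per value assumes the vectors are stored as 32-bit floats, which may not match how you end up storing them.

```python
# Vector sizes as stated in this tutorial. The keys are descriptive labels,
# not identifiers from the Vector Forge API.
VECTOR_SIZES = {
    "CLIP ViT-B/32": 512,
    "CLIP ViT-L/14": 768,
    "VGG16": 512,
    "VGG19": 512,
    "Xception": 2048,
}


def estimate_storage_bytes(model_label: str, image_count: int, bytes_per_value: int = 4) -> int:
    """Estimate raw vector storage, assuming fixed-width values (float32 by default)."""
    return VECTOR_SIZES[model_label] * bytes_per_value * image_count


if __name__ == "__main__":
    for label in VECTOR_SIZES:
        size_mb = estimate_storage_bytes(label, image_count=10_000) / 1_000_000
        print(f"{label}: about {size_mb:.1f} MB for 10,000 images")
```

Larger vectors cost proportionally more storage and comparison time, so the estimate is a quick way to sanity-check a model choice before processing a big folder.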
Preparing the environment
Before diving into the process of vectorizing images, it’s important to have a well-prepared environment. This ensures that all the required libraries are accessible, providing a smooth ride throughout our journey. Let’s start by setting up a virtual environment and then installing Vector Forge.
Setting up a virtual environment
A virtual environment is a way to keep dependencies required by different projects separate. It’s a good practice to create a virtual environment for each project to avoid potential conflicts between dependencies. Here’s how you can set it up:
- First, ensure you have Python 3.10 installed on your machine. You can download it from the official Python website.
- Open a terminal and run the following command to install virtualenv if you don’t have it installed yet:
pip install virtualenv
- Now, navigate to the directory where you want to create your project, and run the following command to create a virtual environment named venv:
virtualenv -p python3.10 venv
- Activate the virtual environment. On macOS and Linux, use the following command:
source venv/bin/activate
- On Windows, use the following command:
venv\Scripts\activate
Now that the virtual environment is activated, all the packages installed will be confined to this environment, keeping your system tidy.
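If you are unsure whether the activation worked, a quick check from Python itself can help. The sketch below relies on the standard library only; in_virtualenv is a hypothetical helper name chosen for this example, and very old virtualenv releases signalled the environment differently, so treat the result as a hint rather than proof.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter appears to run inside a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print("Virtual environment active:", in_virtualenv())
```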
Installing Vector Forge
With our virtual environment ready, it’s time to install Vector Forge, the library that will assist us in vectorizing images.
- In the same terminal with the activated virtual environment, run the following command to install Vector Forge:
pip install vector_forge
And that’s it! With just a few commands, we now have a clean, isolated environment with Vector Forge installed, ready to take on the task of processing a folder full of images and transforming them into vectors. In the next section, we will delve deeper into the script that will carry out this task, exploring the code that will drive our image vectorization endeavor.
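Before moving on, you may want to confirm that the package can be found from the active environment. This is a minimal sketch using the standard library; it assumes the import name is vector_forge, as used in the script later in this tutorial, and is_installed is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("vector_forge"):
        print("vector_forge is available in this environment")
    else:
        print("vector_forge was not found; check that the virtual environment is active")
```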
Getting to know the script
Now that our environment is set up, it’s time to delve into the script that will be doing the heavy lifting of processing our folder of images and converting them into vectors. This script is the core of our project, and understanding its structure and functionality is crucial. Let’s break down the main components of the script to get a better understanding of how it operates.
Understanding the arguments
The script is designed to be run from the command line, and it takes two arguments:
- The path to the folder containing the images to be processed (-i or --input).
- The path to the output CSV file where the vector data will be saved (-o or --output).
These arguments are specified using Python’s argparse module, which makes it easy to write user-friendly command-line interfaces. The argparse module also generates help and usage messages and issues errors when users give the program invalid arguments.
Here’s a snippet from the script illustrating the argument parsing section:
# Define the argument parser to read the command line arguments
parser = argparse.ArgumentParser(description='Process images in a folder and save the vectors to a CSV file.')
parser.add_argument("-i", "--input", type=str, help='The path to the folder containing the images to process.')
parser.add_argument("-o", "--output", type=str, help='The path to the output CSV file.')

# Parse the arguments
args = vars(parser.parse_args())
With these arguments, users can easily specify the input folder and output file path when running the script, making it flexible and easy to use.
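To see what the parsed arguments look like without touching any images, you can feed the parser an explicit argument list. The sketch below wraps the same parser in a hypothetical build_parser function and adds required=True, which the tutorial’s script does not use; with it, argparse reports a missing path up front instead of passing None further down.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the same two-argument parser as the tutorial script."""
    parser = argparse.ArgumentParser(
        description="Process images in a folder and save the vectors to a CSV file."
    )
    # required=True is an addition for this sketch; the original script omits it.
    parser.add_argument("-i", "--input", type=str, required=True,
                        help="The path to the folder containing the images to process.")
    parser.add_argument("-o", "--output", type=str, required=True,
                        help="The path to the output CSV file.")
    return parser


if __name__ == "__main__":
    # Passing a list simulates: python process_folder.py -i images/ -o result.csv
    args = vars(build_parser().parse_args(["-i", "images/", "-o", "result.csv"]))
    print(args["input"], args["output"])
```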
Diving into the process_images function
The process_images function is the heart of our script. It takes in the folder path and the output file path as arguments, processes the images in the specified folder, and writes the vector data to the output CSV file.
Here’s a step-by-step breakdown of what happens inside the process_images function:
- Creating the Resize Function:
  - A lambda function resize_fn is created to resize images to a width of 500 pixels using the resize_image function from Vector Forge.
    # Create the resize function with appropriate width
    resize_fn = lambda img: resize_image(img, width=500)
- Initializing the Vectorizer:
  - The Vectorizer instance is created with the CLIP model, the custom image preprocessor, and normalization enabled.
    # Initialize the vectorizer with the CLIP model, custom image preprocessor, and normalization enabled
    vectorizer = Vectorizer(model=Models.CLIP_B_P32, image_preprocessor=resize_fn, normalization=True)
- Opening the Output File:
  - The output CSV file is opened in write mode, and a CSV writer is created to write data to the file.
    # Open the output file in write mode
    with open(output_file, mode='w', newline='') as file:
        writer = csv.writer(file)
- Processing Images:
  - The load_from_folder method of the Vectorizer instance is used to process each image in the specified folder.
  - The vector and file information for each image are extracted, and the data is written to the CSV file.
    # Iterate through each image in the specified folder
    # Extract the vector and file information using the built-in load_from_folder method
    for vector_str, file_info in vectorizer.load_from_folder(folder_path, return_type="str", file_info_extractor=get_file_info):
        file_name, file_size = file_info["file_name"], file_info["file_size"]  # Extract file info
        print(f"[INFO] {file_name} is processed")
        writer.writerow([file_name, file_size, vector_str])  # Write the information to the CSV file
This function encapsulates the entire process of reading images from a folder, vectorizing them, and saving the vector data to a CSV file. By modularizing this process into a single function, the script remains organized, easy to understand, and easy to use.
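The normalization flag is not explained further in this tutorial. A common meaning in embedding pipelines is L2 normalization, where each vector is scaled to unit length so that comparisons depend on direction rather than magnitude. The sketch below illustrates that idea in plain Python; whether Vector Forge does exactly this is an assumption, and l2_normalize is a hypothetical helper, not part of the library.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(value * value for value in vector))
    if norm == 0.0:
        return list(vector)
    return [value / norm for value in vector]


if __name__ == "__main__":
    unit = l2_normalize([3.0, 4.0])
    length = math.sqrt(sum(value * value for value in unit))
    print(unit, length)
```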
The journey of transforming images into meaningful vectors begins with organizing your images effectively. A well-structured folder not only makes the process smooth but also ensures that every image is accounted for during the vectorization process. In this section, we will explore how to organize your images for processing.
Organizing your images for processing
Before running the script, it’s essential to have your images neatly organized in a folder. This organization facilitates a trouble-free process as the script will traverse through the specified folder to find and process each image. Here’s a simple structure that you could follow:
tree --dirsfirst
.
├── images
│   ├── image-1.jpg
│   ├── image-2.jpg
│   └── image-3.jpg
└── process_folder.py
In this structure:
- All the images to be processed are placed within a folder named images.
- The script process_folder.py is located in the project root, next to the images folder.
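Before running the script, it can be useful to check which files in the folder actually look like images. The following sketch lists them with the standard library. The set of extensions is an assumption made for this example; the formats Vector Forge accepts may differ, and list_images is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions for this sketch; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(folder: str) -> List[Path]:
    """Return the image files directly inside a folder, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    folder = "images"
    if Path(folder).is_dir():
        for image_path in list_images(folder):
            print(image_path.name)
    else:
        print(f"Folder not found: {folder}")
```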
Putting everything together
This section will provide you with the full script for your reference, followed by step-by-step instructions on how to run it to process a folder of images and convert them into vectors.
import argparse
import csv

from vector_forge import Vectorizer, Models
from vector_forge.image_preprocessors import resize_image
from vector_forge.info_extractors import get_file_info


def process_images(folder_path: str, output_file: str):
    # Create the resize function with appropriate width
    resize_fn = lambda img: resize_image(img, width=500)

    # Initialize the vectorizer with the CLIP model, custom image preprocessor, and normalization enabled
    vectorizer = Vectorizer(model=Models.CLIP_B_P32, image_preprocessor=resize_fn, normalization=True)

    # Open the output file in write mode
    with open(output_file, mode='w', newline='') as file:
        writer = csv.writer(file)

        # Iterate through each image in the specified folder
        # Extract the vector and file information using the built-in load_from_folder method
        for vector_str, file_info in vectorizer.load_from_folder(folder_path, return_type="str", file_info_extractor=get_file_info):
            file_name, file_size = file_info["file_name"], file_info["file_size"]  # Extract file info
            print(f"[INFO] {file_name} is processed")
            writer.writerow([file_name, file_size, vector_str])  # Write the information to the CSV file


# Define the argument parser to read the command line arguments
parser = argparse.ArgumentParser(description='Process images in a folder and save the vectors to a CSV file.')
parser.add_argument("-i", "--input", type=str, help='The path to the folder containing the images to process.')
parser.add_argument("-o", "--output", type=str, help='The path to the output CSV file.')

# Parse the arguments
args = vars(parser.parse_args())

# Call the process_images function with the provided arguments
process_images(args["input"], args["output"])
print(f"[INFO] all images processed. Result file: {args['output']}")
And to run the script on your folder:
python process_folder.py -i images/ -o result.csv
A note on performance
As with many robust tools, Vector Forge has its own set of performance characteristics that are worth understanding to ensure a smooth and efficient operation. In this section, we’ll discuss the initial startup time and the warming up phase which are inherent to Vector Forge’s operation.
Initial startup time of Vector Forge
When you run the script for the first time, you might notice a delay before the processing of images begins. This delay is due to the initial startup time of Vector Forge, which is spent on downloading the model specified for vectorization. The models, like CLIP ViT-B/32 or others, are crucial for the vectorization process, and they need to be downloaded and loaded into memory before the vectorization can commence.
This download is a one-time operation; once the model is downloaded, it’s cached on your machine for future use. Therefore, the next time you run the script or any other script that utilizes the same model with Vector Forge, this delay won’t occur, leading to a much quicker startup.
Understanding the warming up phase
Apart from the initial startup time, there’s a warming up phase where the model is loaded into memory and gets ready for inference. Although the actual inference, where images are processed and converted into vectors, is fast, this warming up phase adds to the time it takes for the script to run, especially the first time you run it.
The warming up phase is a common characteristic of machine learning models, ensuring that everything is in place for efficient and accurate vectorization. Once the model is warmed up, the process of iterating through the folder and vectorizing the images is swift.
Understanding these performance aspects helps set the right expectations regarding the script’s execution time and ensures a smoother user experience. With the knowledge of what happens behind the scenes, you can better plan your vectorization tasks and manage your time effectively.
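To see where the time goes in your own runs, you can wrap individual stages in a simple timer. The sketch below does not call Vector Forge at all: load_model_stub is a stand-in that sleeps briefly to mimic an expensive one-time setup, and lru_cache mimics the idea of paying that cost once and reusing the result afterwards. Both names, and the timed helper, are hypothetical.

```python
import time
from functools import lru_cache
from typing import Callable, Tuple


@lru_cache(maxsize=1)
def load_model_stub() -> str:
    """Stand-in for an expensive model load; sleeps instead of loading weights."""
    time.sleep(0.05)
    return "model-ready"


def timed(fn: Callable[[], str]) -> Tuple[str, float]:
    """Call fn and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, first = timed(load_model_stub)   # pays the simulated setup cost
    _, second = timed(load_model_stub)  # served from the cache
    print(f"first call: {first:.4f}s, second call: {second:.4f}s")
```

In a real run you would time the Vectorizer construction and the loop over the folder separately, which shows how much of the total is setup rather than inference.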
A glimpse into the output
Having traversed the path from organizing images to processing them into vectors, it’s now time to peek into the bounty of our endeavor – the output CSV file. This file is the tangible result of our script’s execution, housing the vectorized data of our images. Let’s explore the structure of the produced CSV file and glean some insights into the vectorized data.
Structure of the produced CSV file
The CSV file generated as the output is structured in a way to encapsulate essential information about each processed image alongside its vectorized data. Here’s a snapshot of how the data is organized within the CSV file:
file_name, file_size, vector_data
image-1.jpg, 34567, [0.0345, 0.1234, ..., 0.4567]
image-2.jpg, 45678, [0.0456, 0.2345, ..., 0.5678]
...
- file_name: This column holds the name of the image file.
- file_size: This column records the size of the image file in bytes.
- vector_data: This column stores the vectorized data of the image, represented as a list of numbers.
Each row corresponds to an image from the input folder, capturing its name, size, and the precious vector data that encapsulates the image’s essence in a format ready for numerous applications like similarity search, machine learning, and more.
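Reading the file back is the natural first step for any of these applications. The sketch below parses rows of the shape shown above. It assumes the vector column is a bracketed, comma-separated list that json.loads can read, which matches the snapshot but is not confirmed for every return type, and it skips a header row if one is present, since the script as written does not emit one. read_vectors is a hypothetical helper.

```python
import csv
import io
import json
from typing import Iterator, List, TextIO, Tuple


def read_vectors(handle: TextIO) -> Iterator[Tuple[str, int, List[float]]]:
    """Yield (file_name, file_size, vector) for each data row of the CSV."""
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # ignore blank or malformed rows
        file_name, file_size, vector_str = row
        if not file_size.strip().isdigit():
            continue  # skip a header row, if one was added
        # Assumption: the vector is stored as a JSON-compatible list string.
        yield file_name, int(file_size), json.loads(vector_str)


if __name__ == "__main__":
    sample = io.StringIO('image-1.jpg,34567,"[0.0345, 0.1234, 0.4567]"\n')
    for name, size, vector in read_vectors(sample):
        print(name, size, len(vector))
```

For a real file, replace the StringIO sample with open("result.csv", newline="").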
Insights into the vectorized data
The vector data is the heart of the output, embodying the essence of each image in a numerical format. Each vector is a list of numbers, with its length corresponding to the size of the vectors produced by the chosen model (e.g., 512 for the CLIP ViT-B/32 model).
These vectors are more than just strings of numbers; they are high-dimensional representations that capture the underlying patterns and features of the images. With the right tools, like similarity measures or machine learning algorithms, these vectors can be leveraged to uncover relationships between images, categorize them, or even use them as input for further processing in machine learning pipelines.
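One widely used similarity measure for such vectors is cosine similarity, which compares the direction of two vectors and ignores their length. The following is a minimal plain-Python sketch of the formula, intended as an illustration rather than an optimized implementation; for large collections a numerical library or a vector database would be the more practical choice.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
    orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
    print(same_direction, orthogonal)
```

Values close to 1 indicate vectors pointing in nearly the same direction, which for image embeddings usually corresponds to visually or semantically similar images.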
The journey from raw images to insightful vectors is a transformative one. As you delve into the vectorized data, you’re stepping into a realm where images transcend their visual form, morphing into numerical avatars ready to tell tales of their inherent features and similarities. This glimpse into the output is your first step into unraveling the stories that lie within your data, setting the stage for deeper exploration and discovery in your future projects.
Future work
The voyage from images to vectors is just the beginning of a myriad of possibilities. With a CSV file brimming with vector data, you’re now standing at the gateway to a realm of exciting explorations. In this section, we will briefly touch upon the prospects of inserting this vector data into a PostgreSQL database and dabbling in vector operations using the pg-vector extension (the project itself is published under the name pgvector).
Preview of inserting the data into a PostgreSQL database
With the vector data snugly stored in a CSV file, the next logical step could be to insert this data into a PostgreSQL database. Storing vector data in a database like PostgreSQL not only provides a structured and secure storage solution but also opens the door to powerful database operations.
The process of inserting the data could be as simple as creating a table with appropriate columns for file name, file size, and vector data, and then using a SQL COPY command or a script to load the data from the CSV file into the database.
This setup lays a solid foundation for harnessing the full power of vector data by leveraging SQL’s querying capabilities, making data retrieval, analysis, and manipulation a breeze.
Introduction to vector operations with the pg-vector extension
Once your vector data is nestled within a PostgreSQL database, the real fun begins. By employing the pg-vector extension, you can perform a variety of vector operations right within the database.
The pg-vector extension is tailored for handling vector data, providing functionalities like similarity search, nearest neighbor search, and many other vector operations which are indispensable for tasks like image similarity, semantic search, and more.
With the pg-vector extension, operations that would typically require extracting data and processing it externally can now be performed right within the database, saving time and resources.
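As a rough preview, the sketch below only builds the SQL text that such a setup might use and does not connect to a database. The table and column names are hypothetical. The vector(n) column type and the <-> distance operator come from the pgvector project, which is enabled in a database with CREATE EXTENSION vector. The statements are assembled with string formatting purely for readability; real code should pass values as query parameters through a database driver.

```python
from typing import Sequence


def create_table_sql(table: str, dimensions: int) -> str:
    """Build a CREATE TABLE statement with a pgvector column of the given size."""
    return (
        f"CREATE TABLE {table} ("
        "id bigserial PRIMARY KEY, "
        "file_name text, "
        "file_size bigint, "
        f"embedding vector({dimensions}));"
    )


def nearest_neighbors_sql(table: str, query_vector: Sequence[float], limit: int = 5) -> str:
    """Build a query returning the rows closest to query_vector by L2 distance."""
    # pgvector accepts vector literals written as '[v1,v2,...]'.
    literal = "[" + ",".join(str(value) for value in query_vector) + "]"
    return (
        f"SELECT file_name FROM {table} "
        f"ORDER BY embedding <-> '{literal}' LIMIT {limit};"
    )


if __name__ == "__main__":
    print("CREATE EXTENSION IF NOT EXISTS vector;")
    print(create_table_sql("images", 512))
    print(nearest_neighbors_sql("images", [0.1, 0.2, 0.3], limit=3))
```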
In future tutorials, we will delve deeper into how to set up the pg-vector extension, insert vector data into a PostgreSQL database, and perform various vector operations. These adventures will further extend the capabilities of your projects, leading to more robust and efficient systems.
Conclusion
Embarking on the journey from images to vectors, we’ve navigated through setting up the environment, understanding the script, harnessing the power of Vector Forge, and eventually, witnessing the transformation of images into meaningful vectors. The output CSV file we have now is a reservoir of vectorized data, ready to be channeled into many applications such as semantic search platforms, image similarity tasks, and much more.
The ease and efficiency of this process are largely attributed to Vector Forge, a tool that simplifies the vectorization of images. Our script, too, played a pivotal role in orchestrating this transformation seamlessly. The structure of our folder, the choice of vectorization model, and the organization of the script all converged to make this endeavor a success.
As we peered into the produced CSV file, we saw a glimpse of the potential that this vectorized data holds. The vectors are not just numerical representations; they are the essence of the images, ready to partake in further computational adventures. The prospect of inserting this data into a PostgreSQL database and exploring vector operations with the pg-vector extension unveils a horizon of possibilities awaiting exploration.
This venture has laid a robust foundation for future tutorials where we will delve deeper into utilizing the vector data, performing vector operations in a database environment, and exploring more advanced use cases. The realms of image vectorization and analysis are vast, and what we’ve explored is just the tip of the iceberg. As you step into the exciting domain of vector operations and analysis, a universe of discovery and innovation unfolds.
With the knowledge and tools at your disposal, you’re well-equipped to further explore, experiment, and innovate. The road from images to insightful vector data is a rewarding one, filled with learning, exploration, and the potential to unlock new avenues in your projects and research. As you venture forth, may the vectors be with you!
References
- Vector Forge Library:
- Models Used:
- OpenAI’s CLIP Models: CLIP: Connecting Text and Images Using Contrastive Learning
- Database and Vector Operations:
- PostgreSQL: Official Website
- pg-vector extension: GitHub Repository
- Further Reading:
These references provide a foundation for understanding the tools and techniques employed in this tutorial, and serve as a gateway for those keen on diving deeper into the realms of image vectorization, database management for vector data, and further analysis and utilization of vectorized data.
Citation
Emanuilov, S. “Converting Images to Vectors using Vector Forge and CLIP”, UnfoldAI, 2023, https://unfoldai.com/images-to-vectors-using-vector-forge-and-clip/
@incollection{Emanuilov_2023_ImagesToVectors,
  author = {Simeon Emanuilov},
  title = {Converting Images to Vectors using Vector Forge and CLIP},
  booktitle = {UnfoldAI},
  year = {2023},
  url = {https://unfoldai.com/images-to-vectors-using-vector-forge-and-clip/},
}