The problem: we want to take a picture of a document with our phone and send it privately to a recipient. This article tries to solve several issues that appear in real life:
- a photo of a document can be messy: parts of it are hard to read, and the frame contains a lot of unneeded data;
- the picture can be intercepted by a third party (a hosting provider, an image sharing service, etc.), while we want everything to stay private;
- we want a fast way to show the image in a browser, converted directly from the ciphertext.
Is this a real-life problem? Taking a picture of a document and “repairing” the perspective is nothing new, but the need for it comes up very often in real life. It is common for someone to ask: “Please scan this contract and send it to me today” or “Please scan this declaration and send it to me in an hour”. But what do we do if we don’t have a scanner? Of course, we take a picture with our phone and send a messy photo. Then we get a response like “Hey, do you have a scanner? I don’t want to see your shoes in the picture”, and so on.
How will we solve the problem? With math, of course. We will go through the process in several steps, using edge and corner detection, a change of perspective, and crypto libraries to hide the important information and transmit it easily.
What libraries will we use? We will combine several types of libraries:
- numpy: the standard for numerical operations in Python;
- OpenCV: image processing in Python;
- blurhash-python: generates a placeholder of the image. It is not very useful for documents, but we may want to extend this private image sharing service in the future, and then placeholders will be very important. A reader without the password can see only the initial preview and has no way to view the whole picture;
- imutils: a series of convenience functions that make basic image processing operations (translation, rotation, resizing, skeletonization, displaying Matplotlib images, sorting contours, detecting edges, and more) easier with OpenCV;
- pyaes: a pure-Python implementation of the AES (FIPS-197) block cipher and its common modes of operation (CBC, CFB, CTR, ECB, OFB), with no dependencies beyond the standard Python library;
- pbkdf2: the password-based key derivation function PBKDF2, specified in RSA PKCS#5 v2.0.
What encodings and cryptographic algorithms will we use to secure the data?
- base64: converts the image to text and back (an encoding, not encryption). It is useful for sending a picture to browsers;
- blurhash: generates placeholders of the image;
- AES: encrypts the base64 text.
Change of perspective in math
Why is the change of perspective so important in this text? This article looks at the change of perspective from the practical side, but going deeper into the concept and its mathematical fundamentals is crucial for understanding the whole picture.
The following article gives a deep overview of the mathematics behind the perspective transformation, the mathematical ideas that occur in art and computer graphics: https://www.math.utah.edu/~treiberg/Perspect/Perspect.htm
The question of how to draw three-dimensional objects on a flat surface prompted the development of a new subject, projective geometry, whose exponent was Girard Desargues (1591–1661).
Parallel transformation of points
The perspective transformations that describe how a point in three-dimensional space is mapped to the drawing plane can be explained simply using elementary geometry. We begin by setting up coordinates. A projection involves two coordinate systems. A point in the coordinate system of the object to be drawn is given by X = (x, y, z), and the corresponding point in the imaging system (on the drawing plane) is P = (u, v). If we use the standard right-handed system, then x and y correspond to width and depth, and z corresponds to height. On the drawing plane, we let u be the horizontal variable and v the vertical.
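We can measure the distances between pairs of points in the usual way using the Euclidean metric. If

$$X_1 = (x_1, y_1, z_1), \qquad P_1 = (u_1, v_1),$$

and so on, then:

$$\operatorname{dist}(X_1, X_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}, \qquad \operatorname{dist}(P_1, P_2) = \sqrt{(u_1 - u_2)^2 + (v_1 - v_2)^2}.$$

The projection from X to P is called a parallel projection if all sets of parallel lines in the object are mapped to parallel lines on the drawing. Such a mapping is given by an affine transformation, which is of the form

$$P = f(X) = T + AX,$$

where T is a fixed vector in the plane and A is a constant matrix with two rows and three columns (written here as acting on the column vector X). Parallel projection has the further property that ratios are preserved. That is, if X_1, X_2, X_3, X_4 are collinear points in the object and P_1, P_2, P_3, P_4 are their images, then the ratio of distances is preserved under parallel projection:

$$\frac{\operatorname{dist}(P_1, P_2)}{\operatorname{dist}(P_3, P_4)} = \frac{\operatorname{dist}(X_1, X_2)}{\operatorname{dist}(X_3, X_4)}.$$

Of course, the denominators are assumed to be nonzero.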
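The ratio property can be checked numerically. The snippet below is a minimal sketch, assuming an affine map of the form P = T + AX with a 2 x 3 matrix A; the concrete values of A, T and the line through the points are arbitrary illustrative choices, not taken from the referenced article.

```python
import numpy as np

# Hypothetical affine parallel projection P = T + A @ X.
# A (2 x 3) and T are arbitrary illustrative values.
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
T = np.array([2.0, -1.0])


def project(X):
    """Map a 3-D point X = (x, y, z) to the drawing plane P = (u, v)."""
    return T + A @ np.asarray(X, dtype=float)


def dist(a, b):
    """Euclidean distance between two points of the same dimension."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


# Four collinear points X_i = X0 + t_i * D on one line in the object.
X0 = np.array([1.0, 2.0, 3.0])
D = np.array([2.0, -1.0, 0.5])
X1, X2, X3, X4 = (X0 + t * D for t in (0.0, 1.0, 3.0, 7.0))
P1, P2, P3, P4 = (project(X) for X in (X1, X2, X3, X4))

ratio_object = dist(X1, X2) / dist(X3, X4)
ratio_drawing = dist(P1, P2) / dist(P3, P4)
print(ratio_object, ratio_drawing)  # both are approximately 0.25
```

Both ratios come out the same because an affine map scales every segment on one line by the same factor. A true perspective projection (the kind a phone camera produces) does not preserve this ratio, which is why the later steps need a perspective transform rather than an affine one.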
Full process
Step 0. Requirements
Starting a Python script is always a pain when you don’t know the required libraries and their versions. That’s why I created a requirements.txt file:
opencv-python==4.2.0.34
numpy==1.16.4
imutils==0.5.3
blurhash-python==1.0.2
pyaes==1.6.1
pbkdf2==1.3
Step 1. Read the image
At this stage we make the imports that we will use throughout the article. Please don’t forget them, otherwise the later steps will not work as expected. We also define some helper functions for later use. They wrap basic OpenCV operations that are repeated many times, so it is good practice to keep them in functions (read_image, show_image_opencv, save_image_opencv, etc.). We also add a function get_current_dir, which helps when we don’t know the current directory or want to load the image from a different location.
Please keep in mind that on *nix systems (like macOS), show_image_opencv may not work very well: it can “freeze” at the destroyAllWindows() call.
We read our input file, called bulsatcom.png, which is placed in the resources directory (FILES_DIR) of the course project. Then we keep the input image in one variable and a copy of it in another.
# Import the required libraries
import os
import base64
import binascii
import secrets

import numpy as np
import cv2
import imutils
import blurhash
import pyaes
import pbkdf2

# Constants
FILES_DIR = "resources"  # Where we will put the temp files


def get_current_dir(is_print=False):
    """
    Helper function to get the current dir, in case you need it to build the full path to an image
    """
    if is_print:
        # Print the path
        print(os.getcwd())
    else:
        # Return it for function usage
        return os.getcwd()


def read_image(input_image):
    """
    Function to read an image and return an OpenCV object
    """
    # cv2.imread does not raise for a missing or unreadable file, it returns None
    image = cv2.imread(input_image)
    if image is None:
        raise ValueError(f"Your input file '{input_image}' doesn't seem to be a valid image.")
    return image


def show_image_opencv(image_instance, name="Image in OpenCV"):
    """
    Function to show an image in an OpenCV popup.
    It is possible to have some problems on *nix systems.
    """
    try:
        cv2.imshow(name, image_instance)
        cv2.waitKey(0)
        cv2.destroyAllWindows()
    except cv2.error as error:
        print(f"OpenCV could not show the image: {error}")


def save_image_opencv(image_instance, target_name=os.path.join(FILES_DIR, "result.jpg")):
    """
    Save a file from an OpenCV image instance.
    """
    # cv2.imwrite returns False when the file could not be written
    if not cv2.imwrite(target_name, image_instance):
        print(f"Could not save the image to target: {target_name}")


# Get the input image as an OpenCV object
input_image = read_image(os.path.join(FILES_DIR, "bulsatcom.png"))

# Make a copy of the image
original_image = input_image.copy()

# Save the image, even though it makes little sense at this stage (only to show it as the expected result)
save_image_opencv(input_image, os.path.join(FILES_DIR, "input_image.jpg"))
Original file:
The expected result of this step: we now have an OpenCV object holding the image, and a copy of the image saved as input_image.jpg.
Step 2. Identify the edges
Every image has some noise, and our goal in this step is to clean it up. One approach is to convert the colored image into a gray one. After that, we blur the image with a (3, 3) filter. Blurring reduces high-frequency noise and makes the detection of contours easier.
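To see what the gray conversion and the (3, 3) averaging filter do to the pixel values, here is a small NumPy sketch. It is an illustration of the idea and not the OpenCV implementation; the helper names to_gray and box_blur_3x3 are hypothetical, the channel weights are the commonly used luma weights, and the border is mirrored as an assumption about edge handling.

```python
import numpy as np


def to_gray(bgr):
    """Weighted sum of the B, G, R channels (the usual luma weights)."""
    weights = np.array([0.114, 0.587, 0.299])  # order: blue, green, red
    return bgr.astype(float) @ weights


def box_blur_3x3(gray):
    """Replace every pixel with the mean of its 3 x 3 neighbourhood."""
    padded = np.pad(gray, 1, mode="reflect")  # mirror the border pixels
    h, w = gray.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


# A flat 5 x 5 image with one "noisy" bright pixel in the middle.
image = np.full((5, 5, 3), 10, dtype=np.uint8)
image[2, 2] = (100, 100, 100)

gray = to_gray(image)
blurred = box_blur_3x3(gray)
print(round(float(gray[2, 2]), 3), round(float(blurred[2, 2]), 3))
```

The single bright pixel drops from about 100 to about 20 after blurring, because it is averaged with its eight darker neighbours. That is the sense in which blurring suppresses high-frequency noise before edge detection.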
We have only one function here, detect_edges. It accepts the input image and returns an image containing only the edges.
Maybe the most interesting part here is the Canny Edge Detection. Canny Edge Detection is a popular edge detection algorithm. It was developed by John F. Canny in 1986. It is a multi-stage algorithm and the steps in short are:
- Noise Reduction;
- Finding Intensity Gradient of the Image;
- Non-maximum Suppression;
- Hysteresis Thresholding.
What we finally get are the strong edges in the image. In the call to cv2.Canny, the first argument is the image instance (already gray and blurred), and the second and third arguments are our minVal and maxVal thresholds, respectively.
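The role of minVal and maxVal is easiest to see in the last stage, hysteresis thresholding. The following is a simplified one-dimensional sketch of that idea in plain Python, not OpenCV's implementation: values at or above maxVal are "strong", values between the two thresholds are "weak" and survive only if they are connected to a strong value. The function name hysteresis_1d and the sample numbers are made up for illustration.

```python
def hysteresis_1d(gradient, min_val, max_val):
    """Keep strong samples and the weak samples connected to a strong one."""
    n = len(gradient)
    strong = [g >= max_val for g in gradient]
    weak = [min_val <= g < max_val for g in gradient]
    keep = strong[:]
    # Grow outwards from every strong sample through neighbouring weak samples.
    stack = [i for i in range(n) if strong[i]]
    while stack:
        i = stack.pop()
        for j in (i - 1, i + 1):
            if 0 <= j < n and weak[j] and not keep[j]:
                keep[j] = True
                stack.append(j)
    return keep


gradient = [20, 150, 450, 180, 50, 200, 120, 30]
edges = hysteresis_1d(gradient, min_val=100, max_val=400)
print(edges)
# [False, True, True, True, False, False, False, False]
```

The values 200 and 120 lie between the thresholds but touch no strong sample, so they are discarded, while 150 and 180 are kept because they sit next to 450.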
def detect_edges(input_image):
    """
    Function to return an edged image from a normal OpenCV image
    """
    # Convert the image to gray scale
    # That way we should be able to remove color noise
    # https://docs.opencv.org/2.4/modules/imgproc/doc/miscellaneous_transformations.html#cvtcolor
    gray_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)

    # Blur the image to remove the high frequency noise
    # This will help us find and detect contours in the gray image (made above)
    # https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_filtering/py_filtering.html#averaging
    gray_image_blured = cv2.blur(gray_image, (3, 3))

    # Perform Canny edge detection (minVal=100, maxVal=400, Sobel aperture size 3)
    # The aperture size is passed by keyword: the fourth positional argument is the output array
    # https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_canny/py_canny.html
    edged_image = cv2.Canny(gray_image_blured, 100, 400, apertureSize=3)

    return edged_image


# Use our function and perform edge detection on the input image
edged_image = detect_edges(input_image)

# Save the image in order to show it below for demo purposes
save_image_opencv(edged_image, os.path.join(FILES_DIR, "edged_image.jpg"))
The expected result of this step: we have only one function here, but a very important one. It cleans some of the noise in the image by applying filters and returns the detected edges.
Additional methods, articles & approaches for edge detection:
Step 3. Detect document edges in the image
One of the most interesting parts is finding the contours in the image. It is also a challenge (but a very important one) to find the contour with the largest area. That way we exclude big letters or images inside the paper. We only need the largest area, i.e. the whole document.
We write a function calculate_draw_contours that uses several built-in OpenCV functions, such as findContours. It returns the approximated polygon of the largest contour together with that contour’s perimeter.
def calculate_draw_contours(edged_image, target_image):
    """
    Function to calculate and draw the contours.
    """
    # Find the contours
    # https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html#findcontours
    all_contours = cv2.findContours(edged_image.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    all_contours = imutils.grab_contours(all_contours)

    # Sort by contourArea in reverse order and keep the largest element
    # https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html#contourarea
    all_contours = sorted(all_contours, key=cv2.contourArea, reverse=True)[:1]

    # Calculate the contour perimeter (curve length)
    # https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html#arclength
    contour_perimeter = cv2.arcLength(all_contours[0], True)

    # Approximate a polygonal curve with the specified precision
    # https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html#approxpolydp
    approximated_poly = cv2.approxPolyDP(all_contours[0], 0.02 * contour_perimeter, True)

    # Draw the contours on the target image
    # https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin.html#how-to-draw-the-contours
    cv2.drawContours(target_image, [approximated_poly], -1, (0, 255, 0), 2)

    return approximated_poly, contour_perimeter


# Use the function to draw our contours
approximated_poly, contour_perimeter = calculate_draw_contours(edged_image, input_image)

# Save the image in order to show it below for demo purposes
save_image_opencv(input_image, os.path.join(FILES_DIR, "contoured_image.jpg"))
The expected result of this step: we have the contour of the document, drawn on the image.
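The idea of "sort by area and keep the largest" can be illustrated without OpenCV. The sketch below computes polygon areas with the shoelace formula and picks the biggest one; the contour names and coordinates are invented for the example, and polygon_area is a hypothetical helper rather than a replacement for cv2.contourArea.

```python
def polygon_area(points):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical contours: a page outline, a logo and a large letter.
contours = {
    "page": [(10, 10), (210, 20), (200, 300), (5, 290)],
    "logo": [(50, 50), (90, 50), (90, 80), (50, 80)],
    "letter": [(100, 100), (120, 100), (110, 140)],
}

largest = max(contours, key=lambda name: polygon_area(contours[name]))
print(largest, polygon_area(contours["logo"]))
# page 1200.0
```

The page outline wins by a wide margin, so shapes printed inside the document do not compete with it. This only works when the whole sheet is visible and its outline forms one closed contour.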
Step 4. Identify and extract document boundary/edges
This is one of the hardest parts of this article. We have the coordinates of all the corners of our document, and it is crucial to arrange them so that we know which coordinate corresponds to which corner.
Images are composed of pixels. A gray picture has no color channels, which would otherwise be an extra dimension, so we can work with it in two dimensions: width and height.
# Reshape the coordinates array (the approximated polygon must have exactly four points)
approximated_poly = approximated_poly.reshape(4, 2)

# An array to hold the ordered coordinates
rectangle = np.zeros((4, 2), dtype="float32")

# The top-left corner has the smallest sum x + y,
# the bottom-right corner has the largest sum
s = np.sum(approximated_poly, axis=1)
rectangle[0] = approximated_poly[np.argmin(s)]
rectangle[2] = approximated_poly[np.argmax(s)]

# The top-right corner has the smallest difference y - x,
# the bottom-left corner has the largest difference
diff = np.diff(approximated_poly, axis=1)
rectangle[1] = approximated_poly[np.argmin(diff)]
rectangle[3] = approximated_poly[np.argmax(diff)]

# Top left (tl), top right (tr), bottom right (br), bottom left (bl)
(tl, tr, br, bl) = rectangle


def calculate_max_width_height(tl, tr, br, bl):
    """
    Function to calculate the max width and height from the corner coordinates.
    """
    # Calculate the width
    width_a = np.sqrt((tl[0] - tr[0])**2 + (tl[1] - tr[1])**2)
    width_b = np.sqrt((bl[0] - br[0])**2 + (bl[1] - br[1])**2)
    max_width = max(int(width_a), int(width_b))

    # Calculate the height
    height_a = np.sqrt((tl[0] - bl[0])**2 + (tl[1] - bl[1])**2)
    height_b = np.sqrt((tr[0] - br[0])**2 + (tr[1] - br[1])**2)
    max_height = max(int(height_a), int(height_b))

    return max_width, max_height


max_width, max_height = calculate_max_width_height(tl, tr, br, bl)
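A worked example makes the sum and difference rule easier to follow. This is a plain-Python sketch with made-up corner coordinates; order_corners is a hypothetical helper that mirrors the rule used above (smallest and largest x + y, smallest and largest y - x).

```python
def order_corners(points):
    """Return corners as (top-left, top-right, bottom-right, bottom-left)."""
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return top_left, top_right, bottom_right, bottom_left


# Corners in the arbitrary order a contour detector might report them.
corners = [(200, 300), (10, 10), (5, 290), (210, 20)]
print(order_corners(corners))
# ((10, 10), (210, 20), (200, 300), (5, 290))
```

Here the sums are 500, 20, 295 and 230, and the differences y - x are 100, 0, 285 and -190, which yields the order shown. The rule assumes the document is roughly upright in the photo; for a sheet rotated by about 45 degrees, two corners can have nearly equal sums and the ordering becomes unreliable.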
Step 5. Apply perspective transform
Once we have the dimensions, we can construct the destination points. We can use the getPerspectiveTransform function from OpenCV, which calculates a perspective transform from four pairs of corresponding points. After that, we can use warpPerspective, which applies the perspective transformation to an image.
# Set of destination points
# Dimensions of the new image
destinations = np.array([
    [0, 0],
    [max_width - 1, 0],
    [max_width - 1, max_height - 1],
    [0, max_height - 1]], dtype="float32")

# Calculates a perspective transform from four pairs of the corresponding points.
# https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#getperspectivetransform
transformation_matrix = cv2.getPerspectiveTransform(rectangle, destinations)


def apply_transformation(image_instance, transformation_matrix, max_width, max_height):
    # Applies a perspective transformation to an image
    # https://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#warpperspective
    scan = cv2.warpPerspective(image_instance, transformation_matrix, (max_width, max_height))
    return scan


# Apply the transformation from our function
scanned_image = apply_transformation(original_image, transformation_matrix, max_width, max_height)

# Save the temp files
save_image_opencv(scanned_image, os.path.join(FILES_DIR, "scanned_image.jpg"))
The expected result of this step: an almost scanned image, with a much better perspective.
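For readers who want to see what "a perspective transform from four pairs of points" means numerically, here is a minimal NumPy sketch. It solves the standard eight-equation linear system for a 3 x 3 matrix with its last entry fixed to 1 and then maps the source corners through it. The helper names and the sample coordinates are hypothetical, and this is an illustration of the math, not OpenCV's own code.

```python
import numpy as np


def perspective_matrix(src, dst):
    """Solve for the 3 x 3 matrix H that maps each src point to its dst point."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), same pattern for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)  # the last entry h33 is fixed to 1


def warp_point(H, point):
    """Apply H in homogeneous coordinates and divide by the third component."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w


# Ordered document corners (tl, tr, br, bl) and the target rectangle.
src = [(10, 10), (210, 20), (200, 300), (5, 290)]
dst = [(0, 0), (199, 0), (199, 289), (0, 289)]

H = perspective_matrix(src, dst)
for p, q in zip(src, dst):
    u, v = warp_point(H, p)
    print(p, "->", (round(float(u), 3), round(float(v), 3)), "expected", q)
```

Each source corner lands on its destination corner up to rounding. The division by the third homogeneous component is what distinguishes a perspective transform from an affine one.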
Step 6. Encode the image in base64
But what is Base64?
Base64 is a way of encoding 8-bit binary data as text, using only 64 printable ASCII characters. Only the characters A-Z, a-z, 0–9, + and / are used to represent data, with = used for padding. Each character carries 6 bits, so three 8-bit bytes (24 bits) are converted into four Base64 characters.
The term Base64 is taken from the Multipurpose Internet Mail Extensions (MIME) standard, which is widely used for HTTP and XML, and was originally developed for encoding email attachments for transmission.
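A short example with Python's standard base64 module shows the three-bytes-to-four-characters grouping and the = padding in practice:

```python
import base64

encoded = base64.b64encode(b"Man")
print(encoded)                    # b'TWFu'
print(base64.b64encode(b"Ma"))    # b'TWE='  (one padding character)
print(base64.b64encode(b"M"))     # b'TQ=='  (two padding characters)
print(base64.b64decode(encoded))  # b'Man'

# Every 3 input bytes become 4 output characters,
# so the encoded text is about a third larger than the input.
payload = bytes(300)
print(len(base64.b64encode(payload)))  # 400
```

The size growth is worth keeping in mind for images: a 3 MB photo becomes roughly 4 MB of text.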
Why do we use Base64?
Base64 is very useful for representing binary data: it makes binary data look and act like plain text, which makes it more reliable to store in databases, send in emails, or embed in text-based formats such as XML. In short, Base64 represents data as an ASCII string.
Why DON’T we use Base64 everywhere?
Base64 does some important things for us, but we should not use it everywhere, especially in web development. For example, the encoded output is about a third larger than the original binary data. Here you can find an interesting article about this.
def image_encode_base64(image_path):
    """
    Function to convert the image to Base64.
    It is necessary if we want to achieve the image->text transformation
    """
    try:
        # The with-statement closes the file even if reading fails
        with open(image_path, "rb") as image:
            image_read = image.read()
    except OSError as error:
        # Raise instead of returning a message, so an error text is never encrypted as if it were an image
        raise ValueError(f"Image path '{image_path}' is not correct or there is an error in reading.") from error
    return base64.encodebytes(image_read)


def image_decode_base64(base64_image, target_file="result.jpg"):
    """
    Reverse of the function above: it saves an image from a Base64 string
    """
    image_64_decode = base64.decodebytes(base64_image)
    with open(target_file, "wb") as image_result:
        image_result.write(image_64_decode)
    return f"File ready: {target_file}"


# Use our function to encode the image in Base64
encoded_image = image_encode_base64(os.path.join(FILES_DIR, "scanned_image.jpg"))
print(encoded_image)

# Save the result in a file, if we want to
image_decode_base64(base64_image=encoded_image, target_file=os.path.join(FILES_DIR, "base64_decoded.jpg"))
Step 7. Get also the blurhash value of the image
BlurHash is a compact representation of a placeholder for an image. I find it useful in projects where I want to save bandwidth and show a placeholder until the image is actually loaded. It is also a good fit for this article, as we can calculate the BlurHash value of a picture and store it in a DB. After that, we can show a “preview” in the browsers of users who are not allowed to view the full picture/document.
It can serve as a secret-keeping variant of an image: it carries some data about the picture, but not enough to read it or identify patterns.
More links and information about it:
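To get a feeling for how little information a BlurHash carries, the sketch below inspects the structure of a hash string using only the standard library. It relies on the published BlurHash format as I understand it: a base-83 alphabet, a first character that encodes the number of horizontal and vertical components, and a total length of 4 + 2 * x * y characters. The function name blurhash_components is hypothetical, and the sample string is the well-known demo hash from the BlurHash project.

```python
BASE83 = (
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    "#$%*+,-.:;=?@[]^_{|}~"
)


def blurhash_components(blurhash_string):
    """Read the number of horizontal and vertical components from a BlurHash."""
    size_flag = BASE83.index(blurhash_string[0])
    x_components = size_flag % 9 + 1
    y_components = size_flag // 9 + 1
    expected_length = 4 + 2 * x_components * y_components
    if len(blurhash_string) != expected_length:
        raise ValueError("length does not match the component counts")
    return x_components, y_components


sample = "LEHV6nWB2yk8pyo0adR*.7kCMdnj"
print(blurhash_components(sample), len(sample))
# (4, 3) 28
```

A whole preview fits into 28 characters here, which describe only 4 x 3 coarse color components, so no text of the document can be recovered from it. The hash itself would be produced by the blurhash-python package from the requirements; as far as I know its README shows an encode function that takes an image file plus the x and y component counts, but that call is not exercised in this sketch.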
Step 8. Encrypt with AES
The example below illustrates simple password-based AES encryption (PBKDF2 + AES-CTR) without message authentication (unauthenticated encryption). I find this useful for this article, as we want to encrypt the base64 equivalent of the image and make it “password protected”, so that nobody can see the content, even if they own the servers or somehow read our message.
Useful links for such operations: https://cryptobook.nakov.com/symmetric-key-ciphers/aes-encrypt-decrypt-examples
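One detail of password-based key derivation is easy to miss: the recipient can only rebuild the key if they know the salt as well as the password. The sketch below demonstrates this with the standard library's hashlib.pbkdf2_hmac instead of the pbkdf2 package used in this article; the iteration count and the choice of SHA-256 are assumptions made for the illustration.

```python
import hashlib
import os


def derive_key(password, salt, iterations=200_000):
    """Derive a 32-byte (256-bit) key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


password = "ToPs3cr3t*c0d3123"
salt = os.urandom(16)  # random, not secret: send it along with the ciphertext

sender_key = derive_key(password, salt)
recipient_key = derive_key(password, salt)        # same password + same salt
other_key = derive_key(password, os.urandom(16))  # same password, new salt

print(len(sender_key), sender_key == recipient_key, sender_key == other_key)
# 32 True False
```

The salt does not need to be kept secret. Its job is to make sure the same password produces a different key for every document, so it can travel next to the ciphertext.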
# Example password to derive the key from
password = "ToPs3cr3t*c0d3123"


def derive_encryption_key(password, password_salt=None):
    """
    Derive a 256-bit AES encryption key from the password.
    The salt is returned too: it is needed to derive the same key again for decryption.
    """
    if password_salt is None:
        # Generate a random salt
        password_salt = os.urandom(16)
    key = pbkdf2.PBKDF2(password, password_salt).read(32)
    return key, password_salt


# Check our function
key, password_salt = derive_encryption_key(password)
print(f"AES encryption key: {binascii.hexlify(key)}")
# AES encryption key: b'5e201ffa89337ce2c13a9cc5d9185643dd72362d8150b2c177c83ec2d47f0081'
# (the value differs on every run, because the salt is random)


def encrypt_text(key, text_to_encrypt):
    """
    Encrypt the plaintext with the given key:
    ciphertext = AES-256-CTR-Encrypt(plaintext, key, iv)
    """
    # Random initial counter value (the AES block size is 128 bits)
    iv = secrets.randbits(128)
    aes = pyaes.AESModeOfOperationCTR(key, pyaes.Counter(iv))
    ciphertext = aes.encrypt(text_to_encrypt)
    print(f"Encrypted (first 32 bytes): {binascii.hexlify(ciphertext[:32])}")
    return ciphertext, iv


# Use our function to encrypt the base64 image
ciphertext, iv = encrypt_text(key, encoded_image)


def decrypt_text(key, iv, ciphertext):
    """
    Decrypt the ciphertext with the given key:
    plaintext = AES-256-CTR-Decrypt(ciphertext, key, iv)
    """
    aes = pyaes.AESModeOfOperationCTR(key, pyaes.Counter(iv))
    decrypted = aes.decrypt(ciphertext)
    print(f"Decrypted (first 32 bytes): {decrypted[:32]}")
    return decrypted


# Decrypt the image
decrypted_text = decrypt_text(key, iv, ciphertext)

# Save as a target file
image_decode_base64(base64_image=decrypted_text, target_file=os.path.join(FILES_DIR, "result.jpg"))
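Because the scheme above is unauthenticated, anyone who can modify the ciphertext in transit can flip bits in the decrypted image without being noticed. One common remedy is encrypt-then-MAC: compute an HMAC over the IV and the ciphertext and verify it before decrypting. The following is a minimal standard-library sketch of that idea; the helper names, the placeholder key and the placeholder ciphertext are assumptions for illustration, and in practice the MAC key should be derived separately from the encryption key.

```python
import hashlib
import hmac


def add_tag(mac_key, iv_bytes, ciphertext):
    """Append an HMAC-SHA256 tag computed over the IV and the ciphertext."""
    tag = hmac.new(mac_key, iv_bytes + ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def verify_and_strip_tag(mac_key, iv_bytes, tagged):
    """Return the ciphertext if the tag matches, otherwise raise ValueError."""
    ciphertext, tag = tagged[:-32], tagged[-32:]
    expected = hmac.new(mac_key, iv_bytes + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was modified or the key is wrong")
    return ciphertext


# Placeholder values standing in for real key material and AES-CTR output.
mac_key = bytes(range(32))
iv_bytes = bytes(16)
ciphertext = b"\x8f\x01\x7a example ciphertext"

tagged = add_tag(mac_key, iv_bytes, ciphertext)
print(verify_and_strip_tag(mac_key, iv_bytes, tagged) == ciphertext)  # True

# Flip one bit to simulate tampering on the way to the recipient.
tampered = bytes([tagged[0] ^ 1]) + tagged[1:]
try:
    verify_and_strip_tag(mac_key, iv_bytes, tampered)
except ValueError as error:
    print("rejected:", error)
```

The recipient runs the verification first and only decrypts when it succeeds. An authenticated mode such as AES-GCM would achieve the same goal in one step, but it is outside the set of modes listed for pyaes above.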
Step 9. Send the cipher text and visualize in browsers
This step is optional, and we are not going to go deep into the topic. The idea is that once we have the encrypted image plus the blurhash to show in the browser (a short preview), a user with the password can decrypt the ciphertext and get the base64 string, which can then be converted to an image. For this to work, the salt and the initial counter value must be sent along with the ciphertext. It is fairly easy to make a JavaScript library that accepts the BlurHash value and the ciphertext and, after a successful password entry, displays the base64 image (natively in HTML).
An example library that can be used for such operations (AES decryption in the browser) can be found here: https://github.com/ricmoo/aes-js
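Displaying a base64 image "natively in HTML" usually means a data URI. The sketch below builds one in Python to show the exact string format a browser expects; to_data_uri is a hypothetical helper, and the few bytes used as input only imitate the start of a JPEG file.

```python
import base64


def to_data_uri(image_bytes, mime_type="image/jpeg"):
    """Build a data URI that a browser can use directly as an <img> src."""
    # b64encode emits no line breaks, unlike encodebytes (newline every 76 chars).
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The first bytes of a JPEG file, standing in for the decrypted image data.
fake_jpeg = b"\xff\xd8\xff\xe0" + bytes(8)
uri = to_data_uri(fake_jpeg)
print(uri)
print(f'<img src="{uri}" alt="decrypted document">')
```

If the text was produced with base64.encodebytes, as in the encoding step of this article, it contains line breaks that should be removed before it is placed into a data URI.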
Summary
What did we want to do in this article, in short?
- take a picture of a document with our phone;
- repair the perspective to get an almost scanned document;
- encode it in base64;
- get the blurhash value;
- encrypt with AES;
- send the ciphertext;
- show a blurhash preview;
- decrypt and display it in browsers with the available libraries.
This solves some problems of private document/picture sharing and of repairing the perspective of a photographed document. We use various techniques to achieve this, and the approach can easily be turned into an API: I tried to write most of it as functions, which can be transformed into endpoints.
Similar articles and research
- https://github.com/AmeyCaps/Document_Scanner
- https://towardsdatascience.com/document-scanner-using-computer-vision-opencv-and-python-20b87b1cbb06
- https://github.com/andrewdcampbell/OpenCV-Document-Scanner
What does this article add?
- it extends the idea with private document sharing using encryption methods;
- a descriptive explanation of the functions, steps and math concepts;
- some tests of the functions, which will help us if something is broken in calculations.