Accelerating KBY-AI SDKs with Kubernetes Configuration

KBY-AI’s server SDKs can run on a Kubernetes configuration to enable acceleration and handle multiple requests efficiently.

If you are using a Kubernetes configuration, you can send multiple requests in parallel and receive the responses concurrently. This approach significantly reduces overall API response time and improves throughput.

To validate the performance, we tested the KBY-AI ID Document Liveness Detection SDK by measuring the response time when sending multiple requests in parallel.

Creating EKS Cluster

EKS stands for Amazon Elastic Kubernetes Service. It's a managed Kubernetes service offered by Amazon Web Services (AWS) that allows users to run Kubernetes without having to manage the underlying infrastructure.

It simplifies deploying, managing, and scaling containerized applications using Kubernetes on AWS infrastructure.

AWS provides official documentation for creating an EKS cluster. You can follow their step-by-step guide to create an EKS cluster on the AWS console.

EKS cluster dashboard on AWS console
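
If you prefer to script the cluster creation instead of clicking through the console, the AWS SDK for Python (boto3) exposes the same API. The sketch below is a minimal example only; the cluster name, IAM role ARN, and subnet IDs are placeholders you would replace with your own values.

import boto3

# Create an EKS control plane via the AWS API (equivalent to the console wizard).
# The cluster name, role ARN, and subnet IDs below are placeholders -- substitute your own.
eks = boto3.client("eks", region_name="us-east-1")

response = eks.create_cluster(
    name="kby-ai-cluster",                                      # placeholder cluster name
    roleArn="arn:aws:iam::123456789012:role/eks-cluster-role",  # placeholder IAM role
    resourcesVpcConfig={
        "subnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"],    # placeholder subnets
    },
)
print(response["cluster"]["status"])  # typically "CREATING" right after the call

# Block until the control plane is ready (this can take 10-15 minutes).
eks.get_waiter("cluster_active").wait(name="kby-ai-cluster")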

Adding Node Group To Cluster

Once you have created the EKS cluster, you need to add a node group to it.

We added a node group with 20 nodes to the cluster to measure the response time of the KBY-AI ID Document Liveness Detection SDK under multi-threading conditions. Each node was configured with 2 CPU cores and 8GB of RAM.

Node group dashboard on AWS
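
The node group can likewise be created through the API instead of the console. Here is a minimal boto3 sketch, again with placeholder role and subnet values; the m5.large instance type matches the 2-CPU / 8GB shape described above.

import boto3

eks = boto3.client("eks", region_name="us-east-1")

# Add a 20-node managed node group to the cluster. m5.large instances provide
# 2 vCPUs and 8 GB of RAM each, matching the configuration described above.
# The cluster name, node group name, subnets, and node role ARN are placeholders.
eks.create_nodegroup(
    clusterName="kby-ai-cluster",                               # placeholder cluster name
    nodegroupName="kby-ai-nodes",                               # placeholder node group name
    scalingConfig={"minSize": 20, "maxSize": 20, "desiredSize": 20},
    instanceTypes=["m5.large"],
    subnets=["subnet-aaaa1111", "subnet-bbbb2222"],             # placeholder subnets
    nodeRole="arn:aws:iam::123456789012:role/eks-node-role",    # placeholder IAM role
)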

We allocated CPU, RAM, and pods to each node as shown in the diagram below.

Capacity allocation on node
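
To verify how much CPU, memory, and pod capacity each node actually exposes, you can query the cluster with the official Kubernetes Python client. This is a minimal sketch, assuming your kubeconfig already points at the cluster (for example after running aws eks update-kubeconfig).

from kubernetes import client, config

# Load credentials from the local kubeconfig.
config.load_kube_config()
v1 = client.CoreV1Api()

# Print the allocatable CPU, memory, and pod count reported by each node.
for node in v1.list_node().items:
    alloc = node.status.allocatable
    print(node.metadata.name, alloc["cpu"], alloc["memory"], alloc["pods"])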

Preparing Python Script To Measure API Response Time

We prepared a Python script that measures API response time when sending 1,000 requests simultaneously, each carrying a Base64-encoded image.

To run the script, you need to prepare the Base64 image data (base64.txt file).
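
If you don't already have the Base64 data, you can generate base64.txt from any test image with a few lines of Python (the input filename here is just an example):

import base64

# Encode a sample ID document image as Base64 and save it for the benchmark script.
# "id_card.jpg" is an example filename -- use any test image you have.
with open("id_card.jpg", "rb") as image_file:
    encoded = base64.b64encode(image_file.read()).decode("utf-8")

with open("base64.txt", "w") as out_file:
    out_file.write(encoded)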

import requests
import sys
import time
from concurrent.futures import ThreadPoolExecutor

# API Endpoint
api_url = "http://ad90b13a512d447b38c87a1570b3660c-657463739.us-east-1.elb.amazonaws.com:80/process_image_base64"
# api_url = "http://44.218.215.247:9001/process_image_base64"

# Load Base64 Data from File
base64_file = "base64.txt"

try:
    with open(base64_file, "r") as file:
        base64_data = file.read().strip()  # Read and remove any extra spaces/newlines
except FileNotFoundError:
    print(f"Error: {base64_file} not found.")
    sys.exit(1)

# Example Payload
payload = {
    "base64": base64_data
}

# Number of concurrent threads
num_threads = 1000

# Function to make an API call
def call_api(thread_id):
    start_time = time.time()
    try:
        response = requests.post(api_url, json=payload, timeout=1000)  # Generous 1,000-second timeout per request
        elapsed_time = time.time() - start_time
        print(f"Thread {thread_id}: {response.status_code} - {response.text[:100]} (Time: {elapsed_time:.2f}s)")
    except requests.exceptions.RequestException as e:
        print(f"Thread {thread_id}: Request failed - {e}")

# Execute concurrent requests
if __name__ == "__main__":
    start_time = time.time()
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(call_api, i) for i in range(num_threads)]
        for future in futures:
            future.result()  # Ensure all tasks complete
    total_time = time.time() - start_time
    print(f"Total Execution Time: {total_time:.2f}s")

Results

We ran the Python script against the API hosted on both the EKS cluster and a single EC2 instance, comparing parallel requests sent to the cluster with requests sent sequentially to the EC2 instance.

It took 1,381.77 seconds to receive all 1,000 responses through the API from the EC2 instance when sending requests sequentially.

ID document liveness detection API response time on EC2 instance

In contrast, it took 57.25 seconds to receive all 1,000 responses through the API from the EKS Kubernetes cluster when sending all requests simultaneously.

ID document liveness detection response time on EKS cluster

As you can see, deploying the KBY-AI ID Document Liveness Detection SDK to an EKS cluster with a node group reduced the total API response time by roughly 24x (from 1,381.77 seconds to 57.25 seconds).

To learn more about accelerating our SDKs, please contact us.
