# Accelerating KBY-AI SDKs with Kubernetes Configuration

If you are using a `Kubernetes` configuration, you can send multiple requests in parallel and receive the responses simultaneously. This approach significantly reduces overall `API` response time and improves throughput.
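The effect can be sketched with a small, self-contained `Python` example. Here `fake_api_call` is a stand-in that simulates a fixed-latency request (it is not part of the SDK): sending the calls through a thread pool takes roughly the time of one call instead of the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_api_call(_):
    """Stand-in for a real API request with ~50 ms of simulated latency."""
    time.sleep(0.05)

NUM_REQUESTS = 10

# Serial: total time is roughly the sum of all per-request latencies.
start = time.time()
for i in range(NUM_REQUESTS):
    fake_api_call(i)
serial_time = time.time() - start

# Parallel: total time is roughly the latency of the single slowest request.
start = time.time()
with ThreadPoolExecutor(max_workers=NUM_REQUESTS) as pool:
    list(pool.map(fake_api_call, range(NUM_REQUESTS)))
parallel_time = time.time() - start

print(f"serial: {serial_time:.2f}s, parallel: {parallel_time:.2f}s")
```

The same principle applies at cluster scale: with enough replicas behind the service, `1,000` in-flight requests are handled concurrently rather than queued one after another.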

To validate the performance, we tested the [KBY-AI ID Document Liveness SDK](https://docs.kby-ai.com/help/product/id-document-liveness-detection-sdk) by measuring the response time when sending multiple requests in parallel.

## Creating EKS Cluster

`EKS` stands for `Amazon Elastic Kubernetes Service`. It's a managed `Kubernetes` service offered by `Amazon Web Services (AWS)` that allows users to run `Kubernetes` without having to manage the underlying control plane and infrastructure.

It simplifies deploying, managing, and scaling containerized applications using Kubernetes on AWS infrastructure.

`AWS` provides official documentation for creating an `EKS` cluster. You can follow its step-by-step guide to create an `EKS` cluster from the `AWS` console.

<figure><img src="https://2589216230-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1WwtQK0VFwKRGmIjGA3I%2Fuploads%2FTVxwMYMFL0RMwqG7MoAN%2Feks.png?alt=media&#x26;token=cacfee1b-326c-4802-a6a6-a70fbe68fda1" alt=""><figcaption><p>EKS cluster dashboard on AWS console</p></figcaption></figure>

## Adding Node Group To Cluster

Once you have created the `EKS` cluster, you need to add a node group to it.

We added a `node` group with `20` nodes to the cluster to measure the response time of the [KBY-AI ID Document Liveness Detection SDK](https://kby-ai.com/id-document-liveness-detection-sdk/) under concurrent load. Each `node` was configured with `2` `CPU` cores and `8GB` of `RAM`.

<figure><img src="https://2589216230-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1WwtQK0VFwKRGmIjGA3I%2Fuploads%2FH1ZkmVBXbIxqWcJZYblm%2Fnode_group.png?alt=media&#x26;token=b6983178-d985-4677-9df8-4d9e38b80455" alt=""><figcaption><p>Node group dashboard on AWS</p></figcaption></figure>

We allocated `CPU`, `RAM`, and `pods` to each `node` as shown in the diagram below.

<figure><img src="https://2589216230-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1WwtQK0VFwKRGmIjGA3I%2Fuploads%2FiTNNVp3E38xlNfN64hd8%2Fnode_capacity.png?alt=media&#x26;token=741b3db9-c065-45a8-89f2-b667a6d5a57b" alt=""><figcaption><p>Capacity allocation on node</p></figcaption></figure>
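The aggregate capacity of such a node group is simple arithmetic. The values below are the ones from our setup (`20` nodes, `2` CPU cores and `8GB` of RAM each); adjust them for your own cluster. Note that a small fraction of each node's capacity is reserved by `Kubernetes` system components, so the usable figures are slightly lower.

```python
nodes = 20           # nodes in the group
vcpus_per_node = 2   # vCPUs per node
ram_per_node_gb = 8  # RAM per node, in GB

total_vcpus = nodes * vcpus_per_node
total_ram_gb = nodes * ram_per_node_gb

print(f"{total_vcpus} vCPUs, {total_ram_gb} GB RAM across {nodes} nodes")
```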

## Preparing Python Script To Measure API Response Time

We provide a `Python` script below that measures `API` response time when sending `1,000` requests simultaneously, each carrying a `Base64`-encoded image.

To run the script, you need to prepare the `Base64` image data in a `base64.txt` file.
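If you only have a raw image file, `base64.txt` can be produced with a few lines of `Python`. This is a minimal sketch; the file name `sample.png` is just a placeholder for your own image.

```python
import base64

def image_to_base64_file(image_path, output_path="base64.txt"):
    """Encode an image file as Base64 text and write it to output_path."""
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    with open(output_path, "w") as out_file:
        out_file.write(encoded)
    return encoded

# Usage (hypothetical file name):
# image_to_base64_file("sample.png")
```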

```python
import requests
import time
from concurrent.futures import ThreadPoolExecutor

# API Endpoint
api_url = "http://ad90b13a512d447b38c87a1570b3660c-657463739.us-east-1.elb.amazonaws.com:80/process_image_base64"
# api_url = "http://44.218.215.247:9001/process_image_base64"

# Load Base64 Data from File
base64_file = "base64.txt"

try:
    with open(base64_file, "r") as file:
        base64_data = file.read().strip()  # Read and remove any extra spaces/newlines
except FileNotFoundError:
    print(f"Error: {base64_file} not found.")
    exit(1)

# Example Payload
payload = {
    "base64": base64_data
}

# Number of concurrent threads
num_threads = 1000

# Function to make an API call
def call_api(thread_id):
    start_time = time.time()
    try:
        response = requests.post(api_url, json=payload, timeout=1000)  # Generous 1,000-second timeout so slow responses are not dropped
        elapsed_time = time.time() - start_time
        print(f"Thread {thread_id}: {response.status_code} - {response.text[:100]} (Time: {elapsed_time:.2f}s)")
    except requests.exceptions.RequestException as e:
        print(f"Thread {thread_id}: Request failed - {e}")

# Execute concurrent requests
if __name__ == "__main__":
    start_time = time.time()
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(call_api, i) for i in range(num_threads)]
        for future in futures:
            future.result()  # Ensure all tasks complete
    total_time = time.time() - start_time
    print(f"Total Execution Time: {total_time:.2f}s")
```

## Result

We ran the `Python` script against the `API` deployed on both the `EKS` cluster and a single `EC2` instance, comparing concurrent requests to the cluster against serial requests to the `EC2` instance.

It took `1,381.77` seconds to receive all `1,000` responses through the `API` from the `EC2` instance when sending requests sequentially.

<figure><img src="https://2589216230-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1WwtQK0VFwKRGmIjGA3I%2Fuploads%2FJJdytN1dTaHiSXkL0o4v%2Fec2_result.png?alt=media&#x26;token=ecdabd0b-5c31-441f-a853-ea1fddaa0807" alt=""><figcaption><p>ID document liveness detection API response time on EC2 instance</p></figcaption></figure>

In contrast, it took `57.25` seconds to receive all `1,000` responses through the `API` from the `EKS` `Kubernetes` cluster when sending all requests simultaneously.

<figure><img src="https://2589216230-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1WwtQK0VFwKRGmIjGA3I%2Fuploads%2FeFtA9DaczBH78JfL1ubv%2Feks_result.png?alt=media&#x26;token=8117bb76-3d62-4083-8d78-56cc90a6c9ca" alt=""><figcaption><p>ID document liveness detection response time on EKS cluster</p></figcaption></figure>
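Putting the two measurements side by side gives the speedup directly (the numbers are the ones reported above):

```python
ec2_serial_seconds = 1381.77   # 1,000 sequential requests against the EC2 instance
eks_parallel_seconds = 57.25   # 1,000 concurrent requests against the EKS cluster

speedup = ec2_serial_seconds / eks_parallel_seconds
print(f"EKS was roughly {speedup:.1f}x faster")
```

On these figures the cluster delivered roughly a 24x reduction in total wall-clock time.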

As you can see, deploying the [KBY-AI ID Document Liveness Detection SDK](https://kby-ai.com/id-document-liveness-detection-sdk/) to an `EKS` cluster with a `node` group significantly reduced `API` response time.

To learn more about accelerating our `SDKs`, please [contact us](https://kby-ai.com/contact-us/).
