# Accelerating KBY-AI SDKs with Kubernetes Configuration

If you deploy a KBY-AI SDK on `Kubernetes`, the cluster can process many requests in parallel instead of one after another. This reduces the total time needed to serve a batch of `API` requests.

To validate the performance, we tested the [KBY-AI ID Document Liveness SDK](/help/product/id-document-liveness-detection-sdk.md) by measuring the response time when sending multiple requests in parallel.

## Creating EKS Cluster

`EKS` stands for `Elastic Kubernetes Service`. It's a managed `Kubernetes` service offered by `Amazon Web Services (AWS)` that allows users to run `Kubernetes` without having to operate the control plane themselves.

It simplifies deploying, managing, and scaling containerized applications using Kubernetes on AWS infrastructure.

`AWS` provides official documentation for creating an `EKS` cluster. You can follow their step-by-step guide to create an `EKS` cluster in the `AWS` console.

<figure><img src="/files/hq3qjPdi9lwrYSLMWts6" alt=""><figcaption><p>EKS cluster dashboard on AWS console</p></figcaption></figure>

## Adding Node Group To Cluster

Once you have created the `EKS` cluster, you need to add a node group to it.

To measure the response time of the [KBY-AI ID Document Liveness Detection SDK](https://kby-ai.com/id-document-liveness-detection-sdk/) under concurrent load, we added a `node` group with `20` nodes to the cluster. Each `node` was configured with `2` `CPU` cores and `8GB` of `RAM`.

<figure><img src="/files/6KSzvOxw7uoT6pgqMEX0" alt=""><figcaption><p>Node group dashboard on AWS</p></figcaption></figure>

We allocated `CPU`, `RAM`, and `pods` to each `node` as shown in the diagram below.

<figure><img src="/files/rLRCKyCOcaju3tmP53L0" alt=""><figcaption><p>Capacity allocation on node</p></figcaption></figure>

## Preparing Python Script To Measure API Response Time

The following `Python` script measures `API` response time when sending `1,000` requests concurrently, each carrying the same `Base64`-encoded image.

Before running the script, save the `Base64`-encoded image data to a file named `base64.txt` in the directory you run the script from.
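The `base64.txt` file can be produced from any image file. The snippet below is a minimal sketch that assumes the endpoint accepts a plain `Base64` string with no `data:` URI prefix, which this page does not state explicitly. The function name `write_base64_file` is hypothetical and used only for illustration.

```python
import base64
from pathlib import Path


def write_base64_file(image_path: str, output_path: str = "base64.txt") -> int:
    """Encode an image file as Base64 text and write it to output_path.

    Returns the number of Base64 characters written.
    """
    raw_bytes = Path(image_path).read_bytes()
    # b64encode returns bytes; decode to ASCII so the file holds plain text.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    Path(output_path).write_text(encoded)
    return len(encoded)
```

For example, `write_base64_file("id_card.jpg")` would create `base64.txt` in the current directory, where `id_card.jpg` stands in for your own image.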

```python
import requests
import time
from concurrent.futures import ThreadPoolExecutor

# API Endpoint
api_url = "http://ad90b13a512d447b38c87a1570b3660c-657463739.us-east-1.elb.amazonaws.com:80/process_image_base64"
# api_url = "http://44.218.215.247:9001/process_image_base64"

# Load Base64 Data from File
base64_file = "base64.txt"

try:
    with open(base64_file, "r") as file:
        base64_data = file.read().strip()  # Read and remove any extra spaces/newlines
except FileNotFoundError:
    print(f"Error: {base64_file} not found.")
    raise SystemExit(1)  # Stop with a non-zero exit code

# Example Payload
payload = {
    "base64": base64_data
}

# Number of concurrent threads
num_threads = 1000

# Function to make an API call
def call_api(thread_id):
    start_time = time.time()
    try:
        response = requests.post(api_url, json=payload, timeout=1000)  # Timeout in seconds
        elapsed_time = time.time() - start_time
        print(f"Thread {thread_id}: {response.status_code} - {response.text[:100]} (Time: {elapsed_time:.2f}s)")
    except requests.exceptions.RequestException as e:
        print(f"Thread {thread_id}: Request failed - {e}")

# Execute concurrent requests
if __name__ == "__main__":
    start_time = time.time()
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(call_api, i) for i in range(num_threads)]
        for future in futures:
            future.result()  # Ensure all tasks complete
    total_time = time.time() - start_time
    print(f"Total Execution Time: {total_time:.2f}s")
```
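The script prints one line per request, which is hard to read for `1,000` requests. As a hedged sketch, the per-request times could be condensed into a few summary figures. The helper `summarize_latencies` is hypothetical, and it assumes you have collected the elapsed times into a list, which the script above does not do as written.

```python
from statistics import mean, median
from typing import Dict, List


def summarize_latencies(latencies: List[float]) -> Dict[str, float]:
    """Summarize per-request times (in seconds) from one test run."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile, computed with integer arithmetic:
    # rank = ceil(0.95 * n), then converted to a zero-based index.
    p95_rank = (95 * len(ordered) + 99) // 100
    return {
        "min": ordered[0],
        "mean": mean(ordered),
        "median": median(ordered),
        "p95": ordered[p95_rank - 1],
        "max": ordered[-1],
    }
```

To feed it real data, `call_api` would need to `return elapsed_time` (it currently returns nothing); the values could then be gathered from `future.result()` in the main loop.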

## Result

We measured two scenarios with `1,000` requests each: sending the requests sequentially to the `API` on a single `EC2` instance, and sending them all simultaneously to the `API` on the `EKS` cluster.

It took `1,381.77` seconds to receive all `1,000` responses through the `API` from the `EC2` instance when sending requests sequentially.
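The code used for the sequential run is not shown on this page. A minimal sketch of such a baseline is given here, assuming a callable that sends one request per call. The name `run_sequentially` and its `send` parameter are hypothetical.

```python
import time
from typing import Callable, List, Tuple


def run_sequentially(send: Callable[[int], object], count: int) -> Tuple[float, List[float]]:
    """Call send(i) one at a time and time each call.

    Returns (total_seconds, per_request_seconds).
    """
    latencies: List[float] = []
    start = time.perf_counter()
    for i in range(count):
        request_start = time.perf_counter()
        send(i)  # Blocks until this request finishes before starting the next
        latencies.append(time.perf_counter() - request_start)
    total = time.perf_counter() - start
    return total, latencies
```

With the script from the previous section, `run_sequentially(call_api, 1000)` would send the same `1,000` requests one after another instead of through the thread pool.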

<figure><img src="/files/QThxVBum0MyMZ9xqomIk" alt=""><figcaption><p>ID document liveness detection API response time on EC2 instance</p></figcaption></figure>

In contrast, it took `57.25` seconds to receive all `1,000` responses through the `API` from the `EKS` `Kubernetes` cluster when sending all requests simultaneously.

<figure><img src="/files/ItcnqRfW8BuY0Pgz3oUn" alt=""><figcaption><p>ID document liveness detection response time on EKS cluster</p></figcaption></figure>

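As you can see, deploying the [KBY-AI ID Document Liveness Detection SDK](https://kby-ai.com/id-document-liveness-detection-sdk/) to an `EKS` cluster with a `node` group and sending requests in parallel cut the total time for `1,000` requests from `1,381.77` seconds to `57.25` seconds, which is roughly `24` times faster.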
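The two totals reported above can be turned into a speedup factor and a throughput figure. The short calculation below uses only the numbers from this page; the variable names are illustrative.

```python
# Totals reported in the Result section
sequential_seconds = 1381.77  # EC2 instance, requests sent one at a time
parallel_seconds = 57.25      # EKS cluster, requests sent simultaneously
request_count = 1000

# How many times faster the parallel run finished
speedup = sequential_seconds / parallel_seconds

# Requests completed per second in each scenario
sequential_throughput = request_count / sequential_seconds
parallel_throughput = request_count / parallel_seconds

print(f"Speedup: {speedup:.1f}x")
print(f"Sequential throughput: {sequential_throughput:.2f} requests/s")
print(f"Parallel throughput: {parallel_throughput:.2f} requests/s")
```

In the sequential run, this also means each request took about `1.38` seconds on average (`1,381.77` seconds divided by `1,000` requests).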

To learn more about accelerating our `SDK`s, please [contact us](https://kby-ai.com/contact-us/).

