TL;DR: Give Me The Code
For the full code of the process-image example, visit https://github.com/yevhen-k/triton-tutorials and check the following files and directories:
- client-process-image.py
- models/process-image
Introduction
In this post, we’re going to explore how to send and receive images using Triton.
In Python and deep learning, images are primarily represented in two ways:
- Bytes: The raw, encoded data of an image (e.g., a .jpg or .png file).
- Tensors: The decoded image data, often a NumPy array or an OpenCV cv::Mat object, where the image is a multi-dimensional array of pixel values.
Depending on which representation you use, you’ll need to configure your Triton server and client differently.
To represent an image as bytes and wrap it into a NumPy array, you can use the following code:
from pathlib import Path
import numpy as np
with Path.open(Path("assets/bus.jpg"), "rb") as f:
    data = f.read()
np_data = np.frombuffer(data, dtype=np.uint8)
To represent an image as a tensor, you can use:
from pathlib import Path
import cv2
np_data = cv2.imread("assets/bus.jpg")
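The two representations look very different once loaded: the bytes form is a flat 1-D uint8 buffer whose length equals the file size, while the tensor form is a 3-D height × width × channels array. A quick sketch to see the difference (assuming assets/bus.jpg exists):
from pathlib import Path
import cv2
import numpy as np

raw = np.frombuffer(Path("assets/bus.jpg").read_bytes(), dtype=np.uint8)
pixels = cv2.imread("assets/bus.jpg")
print(raw.shape)     # 1-D: (file size in bytes,)
print(pixels.shape)  # 3-D: (height, width, channels)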
What Do We Need to Process Images?
To process images in Python, we typically rely on libraries like OpenCV, Pillow, and Scikit-Image. However, none of these libraries are included in the official Triton server Docker image. If you run the command below, you’ll see a list of the pre-installed packages, none of which are for image processing.
docker run nvcr.io/nvidia/tritonserver:24.08-py3 pip3 freeze
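If you want to double-check, you can filter that list for common image libraries; an empty result confirms none of them are installed (the grep pattern here is just an illustration):
docker run nvcr.io/nvidia/tritonserver:24.08-py3 pip3 freeze | grep -i -E "opencv|pillow|scikit-image"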
Preparation
Since the official Triton Docker image doesn’t include image processing libraries, we need to extend it. To do this, we’ll create a Dockerfile and a requirements.txt file.
Dockerfile:
FROM nvcr.io/nvidia/tritonserver:24.08-py3
WORKDIR /requirements-dir
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
RUN apt-get update -y && apt-get install ffmpeg libsm6 libxext6 -y
COPY ./requirements.txt requirements.txt
RUN --mount=type=cache,target=/root/.cache python3 -m pip install -r requirements.txt
CMD ["bash"]
requirements.txt file:
rfdetr @ git+https://github.com/roboflow/rf-detr.git
torch==2.5.1
torchvision==0.20.1
opencv-python==4.10.0.84
We’ll install rfdetr, torch, and torchvision for future use, but for now, we only need the opencv-python package.
To build your Docker image, use the following commands:
REPO_ID=YOUR_DOCKERHUB_REPO_ID
IMAGE_NAME=triton-pytorch-rfdetr
docker build --rm -t ${IMAGE_NAME} .
docker tag ${IMAGE_NAME} ${REPO_ID}/${IMAGE_NAME}:24.08-py3
docker push ${REPO_ID}/${IMAGE_NAME}:24.08-py3
If you prefer to build the image yourself without publishing it, run only the docker build and docker tag commands and skip docker push. Alternatively, you can use the pre-built image yevhenk10s/triton-pytorch-rfdetr:24.08-py3, which already has all the necessary packages for this project.
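To use the pre-built image, pull it from Docker Hub first:
docker pull yevhenk10s/triton-pytorch-rfdetr:24.08-py3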
A word of caution: it's always wise to be skeptical of random Docker images you find online. For a production environment, always build your own images from a trusted source. In short: do not trust random Docker images!
Echo Image Example
In this example, we’ll send an image as a byte array to the Triton server and get those same bytes back, unchanged.
We already know how to pass bytes to Triton from our Echo JSON post. We’ll use the same trick here with images:
- First, we’ll convert the image object into a byte array.
- Next, we’ll wrap those bytes into a NumPy array.
- Finally, we’ll pass that NumPy array to the Triton client.
The Client Side
On the client side, we’ll read an image as a byte array, send it to the Triton server, and expect the exact same byte array to be returned.
# client-echo-image.py
from pathlib import Path
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils as utils
# 00 Read image bytes
with Path.open(Path("assets/bus.jpg"), "rb") as f:
    data = f.read()
np_data = np.frombuffer(data, dtype=np.uint8)
# 01 Make a client
grpc_client = grpcclient.InferenceServerClient(url="localhost:8001", verbose=False)
model_name = "echo-image"
# 02 Prepare inputs/outputs
# Here we assume that model input name for the image is "in:jpg"
inputs = [
grpcclient.InferInput(
"in:jpg", np_data.shape, utils.np_to_triton_dtype(np_data.dtype)
)
]
inputs[0].set_data_from_numpy(np_data)
# Here we assume that model output name is "out:jpg"
outputs = [
grpcclient.InferRequestedOutput("out:jpg"),
]
# 03 Make actual request
results = grpc_client.infer(model_name, inputs, outputs=outputs)
# 04 Convert the Triton response back to a NumPy array
image = results.as_numpy("out:jpg")
# 05 Now we have our image back
# Here we can do whatever we want with the image
with Path.open(Path("sample.jpg"), "wb") as f:
    f.write(image.tobytes())
image is a NumPy array containing the bytes of the encoded JPG image. This means we're sending and receiving the raw bytes of the image file, not the decoded pixel data.
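Since the response holds encoded JPG bytes rather than pixels, you can also decode it directly on the client instead of (or in addition to) writing it to disk. A minimal sketch:
import cv2
# `image` is the 1-D uint8 array of JPG bytes returned by Triton
decoded = cv2.imdecode(image, cv2.IMREAD_UNCHANGED)
print(decoded.shape)  # (height, width, channels)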
The config.pbtxt File
We’ve already established the model name and the input/output tensor names on the client side. Now, in the config.pbtxt file, we’ll configure these exact same parameters.
name: "echo-image"
max_batch_size: 0
backend: "python"
input [
{
name: "in:jpg"
data_type: TYPE_UINT8
dims: [ -1 ]
}
]
output [
{
name: "out:jpg"
data_type: TYPE_UINT8
dims: [ -1 ]
}
]
instance_group [
{
kind: KIND_CPU
}
]
This process should feel familiar, as it's no different from what we did in the Echo JSON example.
The model.py File
On the client side, we send the raw bytes of a JPG image. So, within the model's execute() method, you'll need to use cv2.imdecode() to decode those bytes into an actual pixel array.
input_tensor = pb_utils.get_input_tensor_by_name(request, "in:jpg")
input_arr: np.ndarray = input_tensor.as_numpy()
image = cv2.imdecode(input_arr, cv2.IMREAD_UNCHANGED)
Now you can process the image array as you normally would. Once you’re done, you need to encode the NumPy array back into a JPG byte buffer.
# Note: cv2.imencode() takes a file extension, not a file name
ret, buf = cv2.imencode(".jpg", image)
np_data = np.frombuffer(buf, dtype=np.uint8)
out_tensor = pb_utils.Tensor("out:jpg", np_data.astype(self.output_dtype))
With the byte buffer ready, you can now create the response object.
inference_response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
responses.append(inference_response)
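Putting these fragments together, a minimal model.py for the echo-image model might look like the sketch below. This is an illustration of the overall structure; check the repository for the exact file.
# models/echo-image/1/model.py (sketch)
import json
import cv2
import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        model_config = json.loads(args["model_config"])
        output_config = pb_utils.get_output_config_by_name(model_config, "out:jpg")
        self.output_dtype = pb_utils.triton_string_to_numpy(output_config["data_type"])

    def execute(self, requests):
        responses = []
        for request in requests:
            # Decode the incoming JPG bytes into a pixel array
            input_tensor = pb_utils.get_input_tensor_by_name(request, "in:jpg")
            image = cv2.imdecode(input_tensor.as_numpy(), cv2.IMREAD_UNCHANGED)
            # (Process `image` here if needed; for echo we return it as-is)
            # Encode the pixels back into JPG bytes
            ret, buf = cv2.imencode(".jpg", image)
            np_data = np.frombuffer(buf, dtype=np.uint8)
            out_tensor = pb_utils.Tensor("out:jpg", np_data.astype(self.output_dtype))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses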
Running The Model
Start the Triton server:
docker run --gpus=1 --rm --net=host -v ${PWD}/models:/models yevhenk10s/triton-pytorch-rfdetr:24.08-py3 tritonserver --model-repository=/models
Here we use the custom Docker image we built earlier to start the server.
Wait for a bit for Triton to start the model, and then make a request:
python client-echo-image.py
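"Wait for a bit" can also be made deterministic: the gRPC client exposes readiness probes you can poll before sending requests. Here is a small sketch (a hypothetical wait-for-triton.py helper, not part of the repository):
# wait-for-triton.py (hypothetical helper)
import time
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

client = grpcclient.InferenceServerClient(url="localhost:8001")
while True:
    try:
        # Both the server and the model must report ready
        if client.is_server_ready() and client.is_model_ready("echo-image"):
            break
    except InferenceServerException:
        pass  # the server is not accepting connections yet
    time.sleep(1.0)
print("Triton and the echo-image model are ready")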
For the complete code of the echo-image example, you can visit my GitHub repository at https://github.com/yevhen-k/triton-tutorials. Check the following files and directories:
- client-echo-image.py
- models/echo-image
Note: in this example, we both decode and encode the image on the model side.
Process Image Example
In this example, we’ll send an image as bytes, just like before, but this time we’ll receive a resized image back in the form of a tensor. The image will be resized to a shape of [512, 512, 3]. This shows how you can use the Triton server for more than just model serving. It’s a great tool for any necessary data manipulation, including preprocessing, post-processing, and even data augmentation.
The Client Side
On the client side, the process is familiar. We’ll read the image, turn its bytes into a tensor, and send it to the Triton server. The only new step is that after receiving the processed image back, we must reshape it to the correct dimensions.
# client-process-image.py
from pathlib import Path
import cv2
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils as utils
# 00 Read image bytes
with Path.open(Path("assets/bus.jpg"), "rb") as f:
    data = f.read()
np_data = np.frombuffer(data, dtype=np.uint8)
# 01 Make a client
grpc_client = grpcclient.InferenceServerClient(url="localhost:8001", verbose=False)
model_name = "process-image"
# 02 Prepare inputs/outputs
# Here we assume that model input name for the image is "in:jpg"
inputs = [
grpcclient.InferInput(
"in:jpg", np_data.shape, utils.np_to_triton_dtype(np_data.dtype)
)
]
inputs[0].set_data_from_numpy(np_data)
# Here we assume that model output name is "out:tensor"
outputs = [
grpcclient.InferRequestedOutput("out:tensor"),
]
# 03 Make actual request
results = grpc_client.infer(model_name, inputs, outputs=outputs)
# 04 Convert the Triton response back to a NumPy array
image = results.as_numpy("out:tensor")
# 05 Reshape the flat array into an image
response_shape = [512, 512, 3]
image = np.reshape(image, response_shape)
# 06 Now we can do whatever we want
# with the response:
ok = cv2.imwrite("sample.jpg", image)
assert ok
The config.pbtxt File
Based on the model name, tensor names, and dimensions, we can write the config.pbtxt:
name: "process-image"
max_batch_size: 0
backend: "python"
input [
{
name: "in:jpg"
data_type: TYPE_UINT8
dims: [ -1 ]
}
]
output [
{
name: "out:tensor"
data_type: TYPE_UINT8
dims: [512, 512, 3] # row, col, chan
}
]
instance_group [
{
kind: KIND_CPU
}
]
The model.py File
We’ve already set the desired dimensions in the config.pbtxt file, but how do we access those values within our Python model?
The key is to access the model configuration from the initialize() function:
import json
from typing import Dict

import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args: Dict[str, str]) -> None:
        self.model_config = json.loads(args["model_config"])
        input_config = pb_utils.get_input_config_by_name(self.model_config, "in:jpg")
        self.input_dtype = pb_utils.triton_string_to_numpy(input_config["data_type"])
        output_config = pb_utils.get_output_config_by_name(
            self.model_config, "out:tensor"
        )
        self.output_dtype = pb_utils.triton_string_to_numpy(output_config["data_type"])
        self.output_shape = output_config["dims"]
Now, self.output_shape holds the target shape for our image. We can use this variable directly in our execute() function to resize the image.
# Get input tensor by name and decode it to the NumPy array:
input_tensor = pb_utils.get_input_tensor_by_name(request, "in:jpg")
input_arr: np.ndarray = input_tensor.as_numpy()
image = cv2.imdecode(input_arr, cv2.IMREAD_UNCHANGED)
# Process image to the target shape; cv2.resize expects (width, height)
resized_image = cv2.resize(image, (self.output_shape[1], self.output_shape[0]))
# Prepare output: flatten the resized image into a 1-D byte buffer
outputs = np.frombuffer(resized_image.tobytes(), dtype=np.uint8)
out_tensor = pb_utils.Tensor("out:tensor", outputs.astype(self.output_dtype))
inference_response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
responses.append(inference_response)
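Assembled, the process-image model.py looks roughly like the sketch below (an illustration; see the repository for the exact file):
# models/process-image/1/model.py (sketch)
import json
import cv2
import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        self.model_config = json.loads(args["model_config"])
        output_config = pb_utils.get_output_config_by_name(self.model_config, "out:tensor")
        self.output_dtype = pb_utils.triton_string_to_numpy(output_config["data_type"])
        self.output_shape = output_config["dims"]  # [512, 512, 3]

    def execute(self, requests):
        responses = []
        for request in requests:
            # Decode the incoming JPG bytes into a pixel array
            input_tensor = pb_utils.get_input_tensor_by_name(request, "in:jpg")
            image = cv2.imdecode(input_tensor.as_numpy(), cv2.IMREAD_UNCHANGED)
            # Resize to the configured target; cv2.resize expects (width, height)
            resized = cv2.resize(image, (self.output_shape[1], self.output_shape[0]))
            # Return the raw pixels as a flat uint8 buffer; the client reshapes it
            out = np.frombuffer(resized.tobytes(), dtype=np.uint8)
            out_tensor = pb_utils.Tensor("out:tensor", out.astype(self.output_dtype))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses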
Running The Model
Start the Triton server:
docker run --gpus=1 --rm --net=host -v ${PWD}/models:/models yevhenk10s/triton-pytorch-rfdetr:24.08-py3 tritonserver --model-repository=/models
Here we use the custom Docker image we built earlier to start the server.
Wait for a bit for Triton to start the model, and then make a request:
python client-process-image.py
For the full code of the process-image example, visit https://github.com/yevhen-k/triton-tutorials and check the following files and directories:
- client-process-image.py
- models/process-image
Note: in this example, we only decode the image on the model side; no encoding back to JPG is necessary.
Home Assignment
Here are some challenges to help you practice what you’ve learned:
- Send an image as a tensor to the Triton server instead of a byte array.
- Consider the best way to split image decoding and encoding between the client and server. What factors should you think about?
  - Access to specialized hardware: Is it better to decode on the client if it has a GPU, or on the server where the model is?
  - Network bandwidth: Is it more efficient to send a smaller, compressed image (as bytes) or a larger, uncompressed tensor?
- Instead of hardcoding the target image size in the config.pbtxt file, try sending the target size in a JSON to the model.
Wrapping It Up
We made great progress today learning how to send and receive images with Triton. We saw that NumPy acts as a container for image bytes, while OpenCV is used to decode and encode the images.
A key takeaway is that to process images on the server, you need to install the necessary dependencies directly into your Triton Docker image.