Triton. Echo JSON. Part 2

- 7 mins read

Series: Triton

Is Triton Just for Tensors?

In our last post, we made some great progress, learning how to set up and run a simple Triton server to pass a tensor back and forth. But what if a tensor isn’t enough?

As ML engineers, we often need to send more than just numerical arrays. We might want to send images, JSON objects, or audio files to our models. This brings up a big question: if Triton only deals in tensors, how do we handle all this other data?

That’s exactly what we’ll explore in this and the following posts. As a hands-on example, we’re going to build a Triton echo server that accepts a JSON and sends it right back, unchanged.

It’s All Just Bytes, Really

Here’s a key takeaway from our last post: if you can represent your data as a simple sequence of bytes, you can pass it to and from Triton. We’re going to use this principle to solve our JSON echoing problem.

The Client Side

Let’s start with our JSON. Can we represent it as bytes on the client side? The answer is a resounding “yes!”

# client-echo-json.py
from pathlib import Path
import json

# If the JSON is stored in a file,
# the JSON bytes are just the file bytes:
with Path("assets/dummy.json").open("rb") as f:
    data = f.read()

# If the JSON is a Python object,
# the JSON bytes are the bytes of its
# serialized string representation:
dummy = {"a": 1, "b": 2}
dummy_string = json.dumps(dummy)
data = dummy_string.encode()

Here, data is a bytes object that we’re going to send to the Triton server.

On the client side, the Triton library typically works with NumPy arrays to represent tensors. So, the question becomes: how do we convert our bytes object into a NumPy array that Triton can understand?

# client-echo-json.py
import numpy as np

# NOTE: pay attention to the data type!
np_data = np.frombuffer(data, dtype=np.uint8)

And that’s it! We’ve successfully converted our JSON to a NumPy array!
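To convince yourself that nothing is lost in this conversion, you can round-trip the bytes (a quick sketch, assuming data still holds the dummy object’s bytes from above):

# client-echo-json.py
# Round trip: the uint8 view preserves the original bytes exactly
assert np_data.tobytes() == data
assert json.loads(np_data.tobytes()) == dummy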

Now, let’s prepare a request to the Triton server with our custom data as the payload:

# client-echo-json.py
import tritonclient.grpc as grpcclient
import tritonclient.utils as utils

grpc_client = grpcclient.InferenceServerClient(url="localhost:8001", verbose=False)
model_name = "echo-json"

# Here we assume that the model input name for our JSON data is "config:json"
inputs = [
    grpcclient.InferInput(
        "config:json", np_data.shape, utils.np_to_triton_dtype(np_data.dtype)
    )
]
inputs[0].set_data_from_numpy(np_data)

# Here we assume that the model output name is "response:json"
outputs = [
    grpcclient.InferRequestedOutput("response:json"),
]

# Make the actual request
results = grpc_client.infer(model_name, inputs, outputs=outputs)

# Convert the Triton response back to a NumPy array
response_json_arr = results.as_numpy("response:json")

# Decode the response back into a JSON object
response_json = json.loads(response_json_arr.tobytes())

Putting it all together:

  1. Convert your JSON data into bytes.
  2. Create a NumPy array from those bytes, making sure to use the np.uint8 data type.
  3. Send a request to the echo-json model, specifying our input as config:json and our output as response:json.
  4. Convert the NumPy array in the response back into bytes so we can work with the original data again.
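
As a quick sanity check, you can verify the full round trip on the client (a minimal sketch, assuming you sent the dummy object from above):

# client-echo-json.py
# The echoed JSON should match exactly what we sent
assert response_json == dummy, "Echo mismatch!"
print("Round trip OK:", response_json)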

So, we have all the necessary information to construct the config.pbtxt file.

The config.pbtxt File

Now, let’s turn our attention to the config.pbtxt file, where we’ll define our model’s configuration and its input/output tensors.

We already have a model name, and we know our input and output tensors are of the bytes type, as we established on the client side. The only thing left is to define their shapes. This is simple for a typical NumPy array like np.array([ [1], [2] ], dtype=np.float32), where the shape is clearly [2, 1].
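
You can check this directly in Python (a quick illustration):

import numpy as np

arr = np.array([[1], [2]], dtype=np.float32)
print(arr.shape)  # (2, 1)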

But what about a JSON object like {"a": 1, "b": 2}? How do we define its shape, especially since a JSON can hold arbitrary data and its size can change?

This is where Triton’s special [-1] shape comes in.

Here’s a tip: When you don’t know the exact number of elements for a certain dimension of a tensor, you can use -1 to represent an arbitrary size.

For example, if your tensor may have shapes [1, 3], [2, 3], [3, 3] and so on:

np_data = np.array(
    [
        [1, 2, 3],
    ],
    dtype=np.float32,
)

np_data = np.array(
    [
        [1, 2, 3],
        [4, 5, 6],
    ],
    dtype=np.float32,
)

np_data = np.array(
    [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ],
    dtype=np.float32,
)

you can use the shape [-1, 3] in config.pbtxt to cover all of these possibilities:

data_type: TYPE_FP32
dims: [ -1, 3 ]

Take a moment to think about how we can use this -1 size feature for the JSON case. For impatient readers, here is the full listing of the config.pbtxt file for the echo-json model:

# config.pbtxt
name: "echo-json"
max_batch_size: 0
backend: "python"

input [
    {
        name: "config:json"
        data_type: TYPE_UINT8
        dims: [ -1 ]
    }
]

output [
    {
        name: "response:json"
        data_type: TYPE_UINT8
        dims: [ -1 ]
    }
]

instance_group [
    {
      count: 1
      kind: KIND_CPU
    }
]

Using -1 in the shape of a tensor in the config.pbtxt file allows you to handle input and output data where the size is not fixed, which is exactly the case with a JSON object. Since a JSON can vary in size, you need a way to tell Triton to accept a variable-length byte array.

Here’s how it works for a JSON:

  • The entire JSON is converted into a single-dimensional array of bytes.
  • Since the number of bytes can change for each request, we use [-1] as the shape. This tells Triton to expect a single dimension of arbitrary size.

This feature is crucial because it allows Triton to handle a variety of data types, not just traditional tensors with fixed shapes, as long as they can be represented as bytes. For the echo-json model, this lets us describe the input and output in a flexible way.
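
To see the variable length in action, compare the shapes of two differently sized JSON payloads (a minimal sketch; both satisfy dims: [ -1 ]):

import json
import numpy as np

small = np.frombuffer(json.dumps({"a": 1}).encode(), dtype=np.uint8)
large = np.frombuffer(json.dumps({"a": 1, "b": list(range(100))}).encode(), dtype=np.uint8)

# Two different lengths, e.g. (8,) and (405,);
# both are one-dimensional, so both match dims: [ -1 ]
print(small.shape, large.shape)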

The model.py File

Okay, we’ve prepared the client and the config.pbtxt file for our model. Now it’s time to build the model itself.

On the model side, the process is pretty straightforward:

  1. Receive the request.
  2. Extract the tensor using its name.
  3. Convert it into a NumPy array.
  4. Decode the JSON data from that array.

To create the response, we just reverse the process:

  1. Convert the JSON back into a NumPy array.
  2. Wrap the NumPy array into a Triton tensor, making sure to use the correct data type and name.
  3. Wrap the tensor in a Triton response object.

It’s a simple flow, and here’s how it looks in the code:

# model.py

import json
from typing import Dict, List

import numpy as np

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args: Dict[str, str]) -> None:
        # In `initialize()` we parse the model config (config.pbtxt)
        # and look up the output tensor's data type
        self.model_config = model_config = json.loads(args["model_config"])

        output_config = pb_utils.get_output_config_by_name(
            model_config, "response:json"
        )
        self.output_dtype = pb_utils.triton_string_to_numpy(output_config["data_type"])

    def execute(
        self, requests: "List[pb_utils.InferenceRequest]"
    ) -> "List[pb_utils.InferenceResponse]":
        responses = []
        for request in requests:
            # Get input by name
            input_tensor = pb_utils.get_input_tensor_by_name(request, "config:json")
            input_arr: np.ndarray = input_tensor.as_numpy()
            # Load JSON object from bytes we sent from the client
            input_json = json.loads(input_arr.tobytes())

            # Got input json: `input_json`.
            # Process `input_json` here
            # [...]

            # Prepare response
            json_string = json.dumps(input_json)
            outputs = np.frombuffer(json_string.encode(), dtype=np.uint8)
            out_tensor = pb_utils.Tensor(
                "response:json", outputs.astype(self.output_dtype)
            )

            inference_response = pb_utils.InferenceResponse(output_tensors=[out_tensor])

            responses.append(inference_response)

        return responses
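
With config.pbtxt and model.py in place, the model repository should follow Triton’s standard layout (assuming the models/ directory from the previous post):

models/
└── echo-json/
    ├── config.pbtxt
    └── 1/
        └── model.py

Here 1/ is the model version directory that Triton expects.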

Running The Model

In the previous post, we already covered in detail how to start the Triton server and make requests to it. Start the Triton server:

docker run --gpus=1 --rm --net=host -v ${PWD}/models:/models nvcr.io/nvidia/tritonserver:24.08-py3 tritonserver --model-repository=/models
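
If the client script’s imports fail, the gRPC client we used above ships in the tritonclient package (assuming a standard pip setup):

pip install "tritonclient[grpc]"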

Wait for a bit for Triton to start the model, and then make a request:

python client-echo-json.py

As a result, you should get back the same JSON that you sent to the server.

For the full code of the echo-json example, please visit https://github.com/yevhen-k/triton-tutorials and check the following files and directories:

  • client-echo-json.py
  • models/echo-json

Home Assignment

Here are some exercises to help you solidify what you’ve learned. Feel free to experiment with the code we’ve built and see what happens:

  1. Change the data type: is it a good idea to pass JSONs as float32?
  2. Experiment with input/output shapes
    1. What happens if you use a fixed, but arbitrary, dimension instead of [-1]?
    2. If you know the exact length of your input and output tensors ahead of time, how can you hardcode them instead of using [-1]?
  3. Try modifying the JSON data within your model.py script before sending it back.
  4. Is it possible to make a model that returns two JSONs? How would you implement that on both the server and client sides?

Wrapping It Up

Today, we figured out how to send and receive arbitrary data types to and from the Triton server, using JSONs as our example.

The core idea is to use a dimension size of [-1] in your config.pbtxt when you don’t know the exact number of elements in a tensor ahead of time. Converting your data into a Triton tensor is much simpler than it sounds: you just need to turn the data into a byte array and then wrap those bytes in a NumPy array.