
zerohertzLib.mlops.server

Classes:

Name Description
BaseTritonPythonModel

Class for using the Python backend in Triton Inference Server

BaseTritonPythonModel

Bases: ABC

Class for using the Python backend in Triton Inference Server

Note

Abstract Base Class: subclass it and define the abstract method _inference, which performs the model's inference, before use

Examples:

model.py:

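class TritonPythonModel(zz.mlops.BaseTritonPythonModel):
    def initialize(self, args: dict[str, str]) -> None:
        super().initialize(args)
        # Model and cfg are user-defined placeholders
        self.model = Model(cfg)

    # Parameter names must match the input names declared in config.pbtxt
    def _inference(self, input) -> tuple[Any]:
        return self.model(input)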

Normal Logs (Without Batching):

2025-09-25 16:06:51.904 | INFO     | zerohertzLib.mlops.triton:initialize:* - Initialize: {
    "name": "...",
    "platform": "",
    "backend": "python",
    "runtime": "",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 0,
...
2025-09-25 16:22:48.226 | INFO     | zerohertzLib.mlops.triton:execute:* - Called
2025-09-25 16:22:48.234 | DEBUG    | zerohertzLib.mlops.triton:_get_inputs:* - inputs: images=(2078, 1470, 3)
2025-09-25 16:22:48.234 | INFO     | zerohertzLib.mlops.triton:execute:* - Inference start
2025-09-25 16:22:49.026 | INFO     | zerohertzLib.mlops.triton:execute:* - Inference completed (0.79s)
2025-09-25 16:22:49.026 | DEBUG    | zerohertzLib.mlops.triton:_set_outputs:* - outputs: boxes=(12, 4), scores=(12,), labels=(12,)

Normal Logs (With Batching):

2025-11-07 08:36:52.242 | INFO     | zerohertzLib.mlops.triton:execute:* - Called
2025-11-07 08:36:52.276 | DEBUG    | zerohertzLib.mlops.triton:_get_inputs:* - inputs: images=(5, 3000, 3000, 3)
2025-11-07 08:36:52.276 | INFO     | zerohertzLib.mlops.triton:execute:* - Inference start
2025-11-07 08:36:54.091 | INFO     | zerohertzLib.mlops.triton:execute:* - Inference completed (1.81s)
2025-11-07 08:36:54.092 | DEBUG    | zerohertzLib.mlops.triton:_set_outputs:* - outputs (0 ~ 1): bboxes=(235, 4, 2), (293, 4, 2), texts=(235,), (293,), scores=(235,), (293,), batch_index=(235,), (293,)
2025-11-07 08:36:54.092 | DEBUG    | zerohertzLib.mlops.triton:_set_outputs:* - outputs (2 ~ 4): bboxes=(293, 4, 2), (46, 4, 2), (235, 4, 2), texts=(293,), (46,), (235,), scores=(293,), (46,), (235,), batch_index=(293,), (46,), (235,)

Error Logs:

2025-09-25 16:26:32.004 | ERROR    | zerohertzLib.mlops.triton:execute:* - zerohertzLib!
Traceback (most recent call last):
> File "/usr/local/lib/python3.10/dist-packages/zerohertzLib/mlops/triton.py", line 371, in execute
    outputs = self._inference(**inputs)
            |    |            -> {'images': array([[[ 38,  38,  38],
            |    |                       [ 37,  37,  37],
            |    |                       [ 37,  37,  37],
            |    |                       ...,
            |    |                       [255, 255, 255],
            |    |                ...
            |    -> <function TritonPythonModel._inference at 0x7f106f48f400>
            -> <1.model.TritonPythonModel object at 0x7f121fa1f010>

File "/models/docling_layout_old_static/1/model.py", line 34, in _inference
    raise Exception("zerohertzLib!")
Exception: zerohertzLib!

Methods:

Name Description
execute

Method executed when Triton Inference Server is called

finalize

Method executed when Triton Inference Server shuts down

initialize

Method executed when Triton Inference Server starts

_get_inputs

_get_inputs(requests: list[Any]) -> tuple[dict[str, NDArray[DTypeLike]], list[int]]
Source code in zerohertzLib/mlops/server.py
def _get_inputs(
    self, requests: list[Any]
) -> tuple[dict[str, NDArray[DTypeLike]], list[int]]:
    batch_index = [0]
    _inputs = defaultdict(list)
    for request in requests:
        for index, cfg_input in enumerate(self.cfg["input"]):
            value = pb_utils.get_input_tensor_by_name(
                request, cfg_input["name"]
            ).as_numpy()
            if index == 0 and 0 < self.max_batch_size:
                batch_index.append(batch_index[-1] + value.shape[0])
            _inputs[cfg_input["name"]].append(value)
    inputs = {}
    for key, value in _inputs.items():
        inputs[key] = np.concatenate(value, axis=0)
    logger.debug(
        "inputs: "
        + ", ".join([f"{key}={value.shape}" for key, value in inputs.items()])
    )
    return inputs, batch_index
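
The `batch_index` list returned by `_get_inputs` is a running total of the first input's leading dimension across requests. The following is a minimal pure-Python sketch of that bookkeeping only. `batch_offsets` and `request_sizes` are hypothetical names used for illustration, and the request sizes 2 and 3 are an assumption chosen to match the `0 ~ 1` and `2 ~ 4` ranges in the batching logs.

```python
def batch_offsets(request_sizes: list[int]) -> list[int]:
    """Return cumulative start offsets, beginning at 0 (hypothetical helper)."""
    offsets = [0]
    for size in request_sizes:
        # Each request contributes `size` items along axis 0 of the concatenated input.
        offsets.append(offsets[-1] + size)
    return offsets


# Two requests carrying 2 and 3 images respectively (assumed sizes).
offsets = batch_offsets([2, 3])

# Request k owns items offsets[k] .. offsets[k + 1] - 1 of the concatenated batch.
ranges = [(start, end - 1) for start, end in zip(offsets, offsets[1:])]
```

When `max_batch_size` is 0, the source above never appends to `batch_index`, so it stays `[0]` and the non-batching branch of `_set_outputs` is used.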

_inference abstractmethod

_inference(**inputs: NDArray[DTypeLike]) -> Any | tuple[Any]

Private method that performs model inference (must be overridden in a subclass)

Parameters:

Name Type Description Default
inputs NDArray[DTypeLike]

Inputs used for model inference (determined by the inputs declared in config.pbtxt)

{}

Returns:

Type Description
Any | tuple[Any]

The model's inference result

Source code in zerohertzLib/mlops/server.py
@abstractmethod
def _inference(self, **inputs: NDArray[DTypeLike]) -> Any | tuple[Any]:
    """
    Private method that performs model inference (must be overridden in a subclass)

    Args:
        inputs: Inputs used for model inference (determined by the inputs declared in `config.pbtxt`)

    Returns:
        The model's inference result
    """
    pass
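
Because `execute` calls `self._inference(**inputs)` with a dict keyed by the input names from `config.pbtxt`, an override can declare those names as ordinary parameters. The sketch below illustrates this with a stand-in base class so that it runs without Triton. `SketchBase`, `run`, `Doubler`, and the input name `images` are assumptions for illustration, not part of the library.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchBase(ABC):
    """Hypothetical stand-in for the base class; not the library's code."""

    @abstractmethod
    def _inference(self, **inputs: Any) -> Any:
        ...

    def run(self, inputs: dict[str, Any]) -> Any:
        # Mirrors the `self._inference(**inputs)` call: dict keys become keyword names.
        return self._inference(**inputs)


class Doubler(SketchBase):
    # Named `images` because the (assumed) config declares an input called "images".
    def _inference(self, images: list[int]) -> list[int]:
        return [2 * value for value in images]


result = Doubler().run({"images": [1, 2, 3]})
```

A subclass that does not define `_inference` cannot be instantiated, and a parameter name that does not match an input name raises a `TypeError` at call time.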

_set_outputs

_set_outputs(outputs: tuple[Any], batch_index: list[int]) -> list[Any]
Source code in zerohertzLib/mlops/server.py
def _set_outputs(self, outputs: tuple[Any], batch_index: list[int]) -> list[Any]:
    responses = []
    if 0 < self.max_batch_size:
        for index in range(len(batch_index) - 1):
            batch_tensors = defaultdict(list)
            for batch in range(batch_index[index], batch_index[index + 1]):
                for cfg_output, value in zip(self.cfg["output"], outputs):
                    _value = value[batch]
                    if cfg_output["name"] == "batch_index":
                        _value -= batch_index[index]
                    batch_tensors[cfg_output["name"]].append(_value)
            output_tensors = []
            for cfg_output in self.cfg["output"]:
                value = np.concatenate(batch_tensors[cfg_output["name"]], axis=0)
                output_tensors.append(
                    pb_utils.Tensor(
                        cfg_output["name"],
                        value.astype(
                            pb_utils.triton_string_to_numpy(cfg_output["data_type"])
                        ),
                    )
                )
            responses.append(
                pb_utils.InferenceResponse(output_tensors=output_tensors)
            )
            logger.debug(
                f"outputs ({batch_index[index]} ~ {batch_index[index + 1] - 1}): "
                + ", ".join(
                    [
                        f"{key}="
                        + ", ".join([f"{_value.shape}" for _value in value])
                        for key, value in batch_tensors.items()
                    ]
                )
            )
        return responses
    output_tensors = []
    for cfg_output, value in zip(self.cfg["output"], outputs):
        output_tensors.append(
            pb_utils.Tensor(
                cfg_output["name"],
                value.astype(
                    pb_utils.triton_string_to_numpy(cfg_output["data_type"])
                ),
            )
        )
    responses.append(pb_utils.InferenceResponse(output_tensors=output_tensors))
    logger.debug(
        "outputs: "
        + ", ".join(
            [
                f"""{key["name"]}={value.shape}"""
                for key, value in zip(self.cfg["output"], outputs)
            ]
        )
    )
    return responses
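
In the batching branch of `_set_outputs`, per-item results are grouped back by request, and any output named `batch_index` is shifted so that it is relative to the start of its own request. The following sketch reproduces that grouping with plain lists in place of NumPy arrays (`itertools.chain` stands in for `np.concatenate`). `split_per_request`, the `rebase` flag, and the sample data are hypothetical.

```python
from itertools import chain


def split_per_request(
    per_item: list[list], offsets: list[int], rebase: bool = False
) -> list[list]:
    """Group per-item results by request and flatten each group (hypothetical helper)."""
    responses = []
    for start, end in zip(offsets, offsets[1:]):
        merged = list(chain.from_iterable(per_item[start:end]))
        if rebase:
            # Make item indices relative to the start of this request.
            merged = [value - start for value in merged]
        responses.append(merged)
    return responses


offsets = [0, 2, 5]  # two requests holding 2 and 3 items (assumed)
texts = [["a", "b"], ["c"], ["d"], [], ["e", "f"]]
item_ids = [[0, 0], [1], [2], [], [4, 4]]

texts_by_request = split_per_request(texts, offsets)
ids_by_request = split_per_request(item_ids, offsets, rebase=True)
```

The second request's item indices 2 and 4 become 0 and 2, so each client sees indices that refer to its own inputs only.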

execute

execute(requests: list[Any]) -> list[Any]

Method executed when Triton Inference Server is called

Parameters:

Name Type Description Default
requests list[Any]

Model inputs sent by the client

required

Returns:

Type Description
list[Any]

Model inference results returned to the client

Source code in zerohertzLib/mlops/server.py
def execute(self, requests: list[Any]) -> list[Any]:
    """Method executed when Triton Inference Server is called

    Args:
        requests: Model inputs sent by the client

    Returns:
        Model inference results returned to the client
    """
    logger.info("Called")
    try:
        inputs, batch_index = self._get_inputs(requests=requests)
        logger.info("Inference start")
        start = time.time()
        outputs = self._inference(**inputs)
        end = time.time()
        logger.info(f"Inference completed ({end - start:.2f}s)")
        if not isinstance(outputs, tuple):
            outputs = tuple([outputs])
        responses = self._set_outputs(outputs=outputs, batch_index=batch_index)
    except Exception as exc:
        logger.exception(exc)
        responses = [
            pb_utils.InferenceResponse(
                output_tensors=[], error=pb_utils.TritonError(exc)
            )
            for _ in requests
        ]
    return responses
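
Two behaviours of `execute` are easy to miss: a return value that is not a tuple is wrapped as a single output, and a failure produces one error response per request. The sketch below mimics both without Triton. `as_tuple`, `call_with_fallback`, and the string error messages are hypothetical stand-ins for the `pb_utils` objects used in the source.

```python
from typing import Any, Callable


def as_tuple(outputs: Any) -> tuple:
    """Wrap a single return value so later code can always zip over a tuple."""
    return outputs if isinstance(outputs, tuple) else (outputs,)


def call_with_fallback(infer: Callable[[], Any], num_requests: int) -> list:
    """Return the normalized outputs, or one error message per request on failure."""
    try:
        return [as_tuple(infer())]
    except Exception as exc:
        return [f"error: {exc}" for _ in range(num_requests)]


def failing() -> None:
    raise Exception("zerohertzLib!")


single = as_tuple([1, 2])  # a list counts as ONE output, not two
ok = call_with_fallback(lambda: ("boxes", "scores"), num_requests=2)
failed = call_with_fallback(failing, num_requests=2)
```

Under this reading, a model with several outputs should return a tuple from `_inference`, since a list would be treated as a single output.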

finalize

finalize() -> None

Method executed when Triton Inference Server shuts down

Source code in zerohertzLib/mlops/server.py
def finalize(self) -> None:
    """Method executed when Triton Inference Server shuts down"""
    logger.info("Finalize")

initialize

initialize(args: dict[str, str]) -> None

Method executed when Triton Inference Server starts

Parameters:

Name Type Description Default
args dict[str, str]

Model information contained in config.pbtxt

required
Source code in zerohertzLib/mlops/server.py
def initialize(self, args: dict[str, str]) -> None:
    """Method executed when Triton Inference Server starts

    Args:
        args: Model information contained in `config.pbtxt`
    """
    self.cfg = json.loads(args["model_config"])
    logger.info(f"Initialize: {json.dumps(self.cfg, indent=4)}")
    self.device = "cpu"
    device = args.get("model_instance_device_id", None)
    if device is not None:
        self.device = f"cuda:{device}"
    self.max_batch_size = self.cfg.get("max_batch_size", 0)
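
The argument handling in `initialize` can be exercised outside Triton with a hand-built `args` dict. The following sketch repeats the same three steps (parse `model_config`, pick a device string, read `max_batch_size`) as a standalone function. `parse_args` and the sample `args` values are assumptions for illustration.

```python
import json


def parse_args(args: dict[str, str]) -> tuple[dict, str, int]:
    """Derive the config, device string, and max batch size (hypothetical helper)."""
    cfg = json.loads(args["model_config"])
    device_id = args.get("model_instance_device_id")
    # Fall back to CPU when no device id is supplied.
    device = "cpu" if device_id is None else f"cuda:{device_id}"
    return cfg, device, cfg.get("max_batch_size", 0)


# Hand-built arguments; real values are supplied by the server.
gpu_args = {
    "model_config": json.dumps({"name": "demo", "max_batch_size": 8}),
    "model_instance_device_id": "0",
}
cpu_args = {"model_config": json.dumps({"name": "demo"})}

_, gpu_device, gpu_batch = parse_args(gpu_args)
_, cpu_device, cpu_batch = parse_args(cpu_args)
```

A missing `max_batch_size` defaults to 0, which selects the non-batching path in `_get_inputs` and `_set_outputs`.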