mxnet cpu memory leak when running inference on model

I'm running into a memory leak when performing inference on an mxnet model (i.e. converting an image buffer to a tensor and running one forward pass through the model).

A minimal reproducible example is below:

import mxnet
from gluoncv import model_zoo
from gluoncv.data.transforms.presets import ssd

model = model_zoo.get_model('ssd_512_resnet50_v1_coco')
model.initialize()

for _ in range(100000):
  # note: an example imgbuf string is too long to post
  # see gist or use requests etc to obtain
  imgbuf = 
  ndarray = mxnet.image.imdecode(imgbuf, to_rgb=1)
  tensor, orig = ssd.transform_test(ndarray, 512)
  labels, confidences, bboxs = model.forward(tensor)

The result is a linear increase of RSS memory (from 700MB up to 10GB+).

The problem persists with other pretrained models and with a custom model that I am trying to use. Inspecting the process with garbage-collection tools does not show any increase in the number of Python objects.

This gist has the full code snippet including an example imgbuf.

Environment info:

python 2.7.15

gcc 4.2.1

mxnet-mkl 1.3.1

gluoncv 0.3.0

Answers

MXNet runs an asynchronous engine to maximize parallel execution of operators. That means every call that enqueues an operation or copies data returns immediately, and the operation itself is queued on the MXNet backend. In the loop as you have written it, you are enqueueing operations faster than they are being processed, so the backlog of pending work, and the memory it holds, keeps growing.

You can add an explicit synchronization point, for example .asnumpy() or .wait_to_read() on an NDArray, or mx.nd.waitall() (with mxnet imported as mx). MXNet will then wait for the enqueued operations to complete before Python execution continues.

This will solve your issue:

import mxnet
from gluoncv import model_zoo
from gluoncv.data.transforms.presets import ssd

model = model_zoo.get_model('ssd_512_resnet50_v1_coco')
model.initialize()

for _ in range(100000):
  # note: an example imgbuf string is too long to post
  # see gist or use requests etc to obtain
  imgbuf = 
  ndarray = mxnet.image.imdecode(imgbuf, to_rgb=1)
  tensor, orig = ssd.transform_test(ndarray, 512)
  labels, confidences, bboxs = model.forward(tensor)
  mxnet.nd.waitall()  # block until all enqueued operations have finished
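
To see why a synchronization point keeps memory bounded, here is a toy model of the situation in plain Python. It is only an analogy and does not use or reproduce MXNet's engine: the function simulate and its parameters are hypothetical names made up for this sketch, and it assumes the backend needs several loop iterations' worth of time to finish one operation.

```python
from collections import deque


def simulate(num_iterations, steps_per_op, sync_each_iteration):
    """Return the peak number of pending operations in a toy async engine.

    Hypothetical model: the frontend enqueues one operation per iteration,
    while the backend needs `steps_per_op` iterations to finish one.
    """
    pending = deque()
    peak = 0
    progress = 0  # backend time spent on the operation at the head of the queue
    for i in range(num_iterations):
        pending.append(i)  # enqueueing returns immediately
        peak = max(peak, len(pending))
        if sync_each_iteration:
            # Analogous to a waitall()-style call: block until the queue drains.
            pending.clear()
            progress = 0
        else:
            # The backend advances one time step while the frontend keeps looping.
            progress += 1
            if progress == steps_per_op:
                pending.popleft()
                progress = 0
    return peak


unsynced_peak = simulate(1000, 4, sync_each_iteration=False)
synced_peak = simulate(1000, 4, sync_each_iteration=True)
print("peak backlog without sync:", unsynced_peak)
print("peak backlog with sync:", synced_peak)
```

Without synchronization the backlog grows with the number of iterations, which mirrors the linear RSS growth in the question. With a synchronization point in every iteration it never holds more than one pending operation.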

Read more about MXNet asynchronous execution here: http://d2l.ai/chapter_computational-performance/async-computation.html

Posted on by Thomas