Tuesday, October 31, 2017

A trick in evaluate image using CNTK | 一個使用CNTK評估圖片的技巧

Most of time when you search for CNTK's example, they are training and evaluating model using bulk data, but in some production field, we need evaluate one or few image from real time generated data, how does CNTK framework work on it? Let's find out.
通常你搜尋CNTK的範例,它們都在用大量資料訓練跟驗證模型,但是在一些實際使用情況下,我們需要評估即時產生的一張或幾張圖片時,CNTK框架可以表現得如何呢? 讓我們來看看。

I writed some example for compare evaluate performance between from C#/CNTK, Python/CNTK and Python/OpenCv+DNN, full solution put on github CNTK Evaluate Performance Test with detail description, so I will skip the code detail focus on others.
我寫了一些範例來比較C#/CNTK, Python/CNTK 跟 Python/OpenCv+DNN 之間的驗證效率,完整專案放在github上 CNTK Evaluate Performance Test 還有各種細節描述,所以我這邊就會跳過程式細節的部份來說其他的。

Let's check the performance test result first, CNTK framework(both C# and Python) need over 1100ms to evaluate one image, but OpenCv+DNN only need 150ms to do same thing on same pc.

This result was weird, so I post new issue on CNTK, thank for the help of CNTK team, I know how to solve the program in few days.

At first,CNTK team recommend me try multiple evaluations and see if it gets faster, the result was here.

First test was same, second test extremely slow, but the rest was extremely fast. CNTK team explain how and why the result performance like that.

In CNTK we have autotuning for determining which convolution to be used during execution. This happens in the second minibatch.

Autotuning consumes extra memory, thus in the first minibatch we use zero/low memory convolution methods to make sure data buffers are allocated first. Then during the second minibatch we perform autotuning to find the fastest algorithm. Starting the third minibatch things will run very fast.

You may consider feeding some dummy data with same dimension as your input, and treat first two minibatches as model initialization.
(source) 這段我懶得翻啦

Overall, if we need using CNTK to evaluate things in real time, we need to add two dummy minibatch to warm up it, then the evaluation can run very fast.

BTW,if your system don't need VERY fast evaluation, you can try using CPU as device, on my environment it look like can avoid autotuning extra time, although the evaluate speed not so fast but maybe it can fit your need.
另外,如果你的系統不需要非常快速的評估,你可以試試改用CPU來算,在我的環境下看起來這樣可以避掉 autotuning 的額外時間,雖然速度沒那麼快不過也許可以應付你的需求。

Hope you enjoy it.

No comments:

Post a Comment