Tuesday, January 30, 2018

[C#] OpenCvSharp DNN with YOLO2

yolo
On my post "OpenCV DNN speed compare in Python, C#, C++", Blaise Thunderbytes asked me to implement pjreddie's YOLO with OpenCvSharp, which is why this post exists :P

Since OpenCV 3.3.1, the DNN module has supported parsing YOLO models, so we can easily use the pre-trained YOLO models now. The OpenCV docs have a YOLO object detection tutorial written in C++; if you use C++, you can check it out. I will use C# with OpenCvSharp.


Better Speed

yolo4
Compared with my previous test, YOLOv2 544x544 was almost 2x faster than SSD 512x512 (1000ms vs 1900ms using CPU and OpenCvSharp), and about 1.5x faster than SSD with Python (1000ms vs 1500ms).


Let's look at the code. Because we use the DNN module to load the Darknet model, the code template is similar to the previous one.

            var cfg = "yolo-voc.cfg";
            var model = "yolo-voc.weights"; //YOLOv2 544x544
            var threshold = 0.3;
We use the YOLOv2 VOC 544x544 model.



            var blob = CvDnn.BlobFromImage(org, 1 / 255.0, new Size(544, 544), new Scalar(), true, false);
            var net = CvDnn.ReadNetFromDarknet(cfg, model);
            net.SetInput(blob, "data");
Set up the blob. Remember that the parameter values are important: changing them changes the result.
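To make those arguments a bit more concrete, here is a rough sketch of the per-pixel arithmetic that the scale factor, mean and swapRB arguments stand for: (value - mean) * scale, with the blue and red channels swapped when swapRB is true. It uses plain values instead of a Mat, skips the resize to 544x544 entirely, and the `ToBlobValues` helper is a made-up name for illustration, not part of OpenCvSharp.

```csharp
using System;

static class BlobSketch
{
    // Hypothetical helper (not an OpenCvSharp API): converts one BGR pixel
    // into the three values the network would see for that pixel.
    public static float[] ToBlobValues(byte b, byte g, byte r,
                                       double scale, bool swapRB, double mean = 0)
    {
        float fb = (float)((b - mean) * scale);
        float fg = (float)((g - mean) * scale);
        float fr = (float)((r - mean) * scale);
        // OpenCV images are BGR; swapRB = true reorders the channels to RGB
        return swapRB ? new[] { fr, fg, fb } : new[] { fb, fg, fr };
    }

    static void Main()
    {
        // Same settings as in the post: scale = 1/255, mean = 0, swapRB = true
        var values = ToBlobValues(0, 128, 255, 1 / 255.0, true);
        Console.WriteLine(string.Join(", ", values));
    }
}
```

With scale = 1 / 255.0 every channel ends up in the 0~1 range. If the scale is left at 1, the network receives values up to 255 and the detections change, which is why these values matter.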



const int prefix = 5;   //skip 0~4

for (int i = 0; i < prob.Rows; i++)
{
 var confidence = prob.At<float>(i, 4);
 if (confidence > threshold)
 {
  //get classes probability
  Cv2.MinMaxLoc(prob.Row[i].ColRange(prefix, prob.Cols), out _, out Point max);
  var classes = max.X;
  var probability = prob.At<float>(i, classes + prefix);

  if (probability > threshold) //second check on the class probability, filters weak detections
  {
   //get center and width/height
   var centerX = prob.At<float>(i, 0) * w;
   var centerY = prob.At<float>(i, 1) * h;
   var width = prob.At<float>(i, 2) * w;
   var height = prob.At<float>(i, 3) * h;
   //label formatting
   var label = $"{Labels[classes]} {probability * 100:0.00}%";
   Console.WriteLine($"confidence {confidence * 100:0.00}% {label}");
   var x1 = (centerX - width / 2) < 0 ? 0 : centerX - width / 2; //avoid left side over edge
   //draw result
   org.Rectangle(new Point(x1, centerY - height / 2), new Point(centerX + width / 2, centerY + height / 2), Colors[classes], 2);
   var textSize = Cv2.GetTextSize(label, HersheyFonts.HersheyTriplex, 0.5, 1, out var baseline);
   Cv2.Rectangle(org, new Rect(new Point(x1, centerY - height / 2 - textSize.Height - baseline),
     new Size(textSize.Width, textSize.Height + baseline)), Colors[classes], Cv2.FILLED);
   Cv2.PutText(org, label, new Point(x1, centerY - height / 2-baseline), HersheyFonts.HersheyTriplex, 0.5, Scalar.Black);
  }
 }
}
YOLO's output format looks like this:
0,1 : Center x, y
2,3 : Width, Height
4 : Confidence
rest : Individual class probabilities
In this case, VOC has 20 classes, so columns 5~24 are the class probabilities.
After spending some time figuring this out, the rest of the code just draws the result like before.
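As a minimal sketch of that layout, the snippet below decodes a single output row held in a plain float array, without any OpenCvSharp types. The `Box` struct and the `Decode` helper are made-up names for illustration, and the sample row uses 3 classes instead of VOC's 20 to keep it short.

```csharp
using System;

static class YoloRowSketch
{
    // Hypothetical result type for this sketch, in pixel coordinates
    public struct Box
    {
        public int ClassId;
        public float Probability;
        public float Left, Top, Width, Height;
    }

    // Row layout: [centerX, centerY, width, height, confidence, p0, p1, ...]
    // The first four values are relative to the image size (0~1).
    public static Box? Decode(float[] row, int imageWidth, int imageHeight, float threshold)
    {
        const int prefix = 5; // skip 0~4
        if (row[4] <= threshold) return null; // low box confidence

        // find the class with the highest probability
        int best = prefix;
        for (int c = prefix + 1; c < row.Length; c++)
            if (row[c] > row[best]) best = c;
        if (row[best] <= threshold) return null; // low class probability

        float width = row[2] * imageWidth;
        float height = row[3] * imageHeight;
        return new Box
        {
            ClassId = best - prefix,
            Probability = row[best],
            Left = row[0] * imageWidth - width / 2,
            Top = row[1] * imageHeight - height / 2,
            Width = width,
            Height = height
        };
    }

    static void Main()
    {
        // Made-up row: centered box, confidence 0.9, class 1 is the best class
        var row = new float[] { 0.5f, 0.5f, 0.2f, 0.4f, 0.9f, 0.1f, 0.8f, 0.05f };
        var box = Decode(row, 640, 480, 0.3f);
        if (box.HasValue)
            Console.WriteLine($"class {box.Value.ClassId} at ({box.Value.Left}, {box.Value.Top}), size {box.Value.Width}x{box.Value.Height}");
    }
}
```

Returning null for rejected rows keeps both threshold checks in one place; the drawing code then only has to deal with rows that passed.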

BTW, I added an extra check, (probability > threshold), to make the result look better. Without it, the result looks like this.
yolo1


And here is the final result.
yolo2

And some other pictures.
yolo5

yolo3


Here is the full code; you can also get it from GitHub.

using System;
using System.Diagnostics;
using System.Linq;
using OpenCvSharp;
using OpenCvSharp.Dnn;

namespace OpenCvDnnYolo
{
    class Program
    {
        private static readonly string[] Labels = { "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor" };
        private static readonly Scalar[] Colors = Enumerable.Repeat(false, 20).Select(x => Scalar.RandomColor()).ToArray();
        static void Main()
        {
            var file = "bali.jpg";
            // https://pjreddie.com/darknet/yolo/
            var cfg = "yolo-voc.cfg";
            var model = "yolo-voc.weights"; //YOLOv2 544x544
            var threshold = 0.3;

            var org = Cv2.ImRead(file);
            var w = org.Width;
            var h = org.Height;
            //set up the blob; the parameter values are important
            var blob = CvDnn.BlobFromImage(org, 1 / 255.0, new Size(544, 544), new Scalar(), true, false);
            var net = CvDnn.ReadNetFromDarknet(cfg, model);
            net.SetInput(blob, "data");

            Stopwatch sw = new Stopwatch();
            sw.Start();
            //forward model
            var prob = net.Forward();
            sw.Stop();
            Console.WriteLine($"Runtime:{sw.ElapsedMilliseconds} ms");

            /* YOLO2 VOC output
             0 1 : center                    2 3 : w/h
             4 : confidence                  5 ~24 : class probability */
            const int prefix = 5;   //skip 0~4

            for (int i = 0; i < prob.Rows; i++)
            {
                var confidence = prob.At<float>(i, 4);
                if (confidence > threshold)
                {
                    //get classes probability
                    Cv2.MinMaxLoc(prob.Row[i].ColRange(prefix, prob.Cols), out _, out Point max);
                    var classes = max.X;
                    var probability = prob.At<float>(i, classes + prefix);

                    if (probability > threshold) //second check on the class probability, filters weak detections
                    {
                        //get center and width/height
                        var centerX = prob.At<float>(i, 0) * w;
                        var centerY = prob.At<float>(i, 1) * h;
                        var width = prob.At<float>(i, 2) * w;
                        var height = prob.At<float>(i, 3) * h;
                        //label formatting
                        var label = $"{Labels[classes]} {probability * 100:0.00}%";
                        Console.WriteLine($"confidence {confidence * 100:0.00}% {label}");
                        var x1 = (centerX - width / 2) < 0 ? 0 : centerX - width / 2; //avoid left side over edge
                        //draw result
                        org.Rectangle(new Point(x1, centerY - height / 2), new Point(centerX + width / 2, centerY + height / 2), Colors[classes], 2);
                        var textSize = Cv2.GetTextSize(label, HersheyFonts.HersheyTriplex, 0.5, 1, out var baseline);
                        Cv2.Rectangle(org, new Rect(new Point(x1, centerY - height / 2 - textSize.Height - baseline),
                                new Size(textSize.Width, textSize.Height + baseline)), Colors[classes], Cv2.FILLED);
                        Cv2.PutText(org, label, new Point(x1, centerY - height / 2-baseline), HersheyFonts.HersheyTriplex, 0.5, Scalar.Black);
                    }
                }
            }
            using (new Window("died.tw", org))
            {
                Cv2.WaitKey();
            }
        }
    }
}

Hope you enjoy it.


It seems a lot of people can't get the right YOLO weight file after it was upgraded to version 3, so I uploaded my solution, including the weight file.
Download Here
It should be click-and-run, I hope :)

2019/1/10 Update: I have a new post on YOLO v3; you can try it if you need it.

18 comments:

  1. I got this exception on line 26 with your code. Do you know how I could solve it?
    OpenCvSharp.OpenCVException
    HResult=0x80131500
    Message=ifile.is_open()
    Source=OpenCvSharp
    StackTrace:
    at OpenCvSharp.NativeMethods.<>c.<.cctor>b__1579_0(ErrorCode status, String funcName, String errMsg, String fileName, Int32 line, IntPtr userdata)
    at OpenCvSharp.NativeMethods.dnn_readNetFromDarknet(String cfgFile, String darknetModel)
    at OpenCvSharp.Dnn.Net.ReadNetFromDarknet(String cfgFile, String darknetModel)
    at OpenCvSharp.Dnn.CvDnn.ReadNetFromDarknet(String cfgFile, String darknetModel)
    at OpenCvDnnYolo.Program.Main() in F:\Icarian\Escritorio\OpenCvSharpDnnYolo-master\OpenCvDnnYolo\Program.cs:line 26

  2. Is there a way to only detect 1 kind of object, to get better FPS processing?

    1. As far as I know, retraining the model with one class can give better speed. If you are using YOLO, you can try Tiny YOLO: it is a lot faster but has lower mAP.

  3. Hi again,

    Trying to get this working but getting this error "OpenCvSharp.OpenCVException: 'separator_index < line.size()'" at this line var net = CvDnn.ReadNetFromDarknet(cfg, model);

    I have tried a few yolov2 and yolov3 config and weight file with the same exception. Could you upload the exact files you used?

    1. I downloaded YOLOv2 from https://pjreddie.com/darknet/yolo/ , you can try it.

    2. Hi Derek,

      I uploaded my solution, you can download it from https://mega.nz/#!knhiwT4Z!aVqbGvDjl__wAPIcJVsq1CM8OhjhFPKHZv6aiaNOKUc

  4. Cv2.MinMaxLoc(prob.Row[i].ColRange(prefix, prob.Cols), out _, out Point max); I get an error that Point is a type but is used like a variable. Can you help me?

    1. Maybe your .NET/C# version is too low?

    2. Hey, just had this issue as well but this was fixed by declaring this variable outside of the line, like so:

      Point min;
      Point max;
      Cv2.MinMaxLoc(prob.Row[i].ColRange(prefix, prob.Cols), out min, out max);
      var classes = max.X;

      I'm actually having some trouble myself where everything is working mostly fine, however I'm getting an issue with

      org.Rectangle(new Point(x1, centerY - height / 2), new Point(centerX + width / 2, centerY + height / 2), Colors[classes], 2); with the error ArgumentException: right > left

      Any ideas? I'm using Unity for this implementation and it's working up until this line, going so far as correctly identifying the images in a Debug.Log, but failing to draw the rectangle


    3. First point should be centerX-width/2 instead of x1

    4. Tried that but sadly to no avail, been messing around fruitlessly but I can't seem to figure it out.

      Here's a snippet of my results, https://imgur.com/a/hcowSNY , it's my first time working with neural nets and I can't tell if these are off or correct (huge numbers for an image only around 700x400). If anyone could compare to their own results it would be much appreciated, using the picture horses.jpg for this one and it would be the first resultset.


    5. I've fixed the issue, in my version of C# I was required to make casts to several data types which I naively assumed were integers. I downloaded your project and adjusted my types to be the same as yours and she works perfectly!

  5. Anyone have any tips on using this for YOLO3? I did some research and I think it's necessary to rebuild the OpenCV dll's in OpenCVSharp

  6. Hi Died, could you share how to use GPU-accelerated computing with OpenCvSharp?
    I tried, but got the error message "cannot use cuda model".

    1. Sorry for the late reply.
      OpenCvSharp3 can enable GPU, but you have to build it yourself; it's hard, and I never succeeded lol.
      As for OpenCvSharp4, it only supports CPU.
