Saturday, March 29, 2014

Performance with opencv and python, fps reduced

So, I decided to sharpen the knifes and get real data on what affects performance in running Behave.

The problem: 

A refactor in python code using opencv, made the program running slower.

The refactor consisted in transforming basic code using functions, "Functional Code":
https://github.com/Kieleth/behave/commit/fda558d61c11720d346a107c174a052f97c2ee3b

Into more abstracted "Code with classes":
https://github.com/Kieleth/behave/commit/503e698dbcbea2c7224fae72c12a94832552c606

Difference was observed from +-12fps to +-5fps. Functionality is the same.

The weapons:

- Python performance tips:
https://wiki.python.org/moin/PythonSpeed/PerformanceTips
- Python profiler:
http://docs.python.org/2/library/profile.html
- Opencv optimisation guide:
http://answers.opencv.org/question/755/object-detection-slow/#760

Process:

First of all was to create a branch from the functional code in git, this allows to explore both possibilities and move between them.

Now, lets get some data:

Functional code:

It runs smoothly at around 10fps.

Added a break in the code, that is triggered after 20 seconds of running the app:

profile:
python2.7-32 -m profile -s cumulative capturebasic.py
         16009 function calls (15918 primitive calls) in 16.698 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   16.698   16.698 profile:0(<code object <module> at 0x445a88, file "capturebasic.py", line 1>)
        1    0.041    0.041   16.697   16.697 capturebasic.py:1(<module>)
        1    0.253    0.253   16.465   16.465 capturebasic.py:10(launch)
      129    6.311    0.049    6.311    0.049 :0(waitKey)
      129    2.615    0.020    2.615    0.020 :0(imshow)
      129    2.554    0.020    2.554    0.020 :0(detectMultiScale)
      129    1.867    0.014    1.867    0.014 :0(read)
      129    1.124    0.009    1.124    0.009 :0(flip)
      129    0.688    0.005    0.688    0.005 :0(cvtColor)
      129    0.602    0.005    0.602    0.005 :0(equalizeHist)
        1    0.205    0.205    0.205    0.205 :0(VideoCapture)...
cProfile:
python2.7-32 -m cProfile -s cumulative capturebasic.py 
         16184 function calls (16093 primitive calls) in 20.487 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.044    0.044   20.488   20.488 capturebasic.py:1(<module>)
        1    0.073    0.073   20.331   20.331 capturebasic.py:10(launch)
      141    8.175    0.058    8.175    0.058 {cv2.waitKey}
      141    3.450    0.024    3.450    0.024 {cv2.imshow}
      141    2.565    0.018    2.565    0.018 {method 'detectMultiScale' of 'cv2.CascadeClassifier' objects}
      141    1.903    0.013    1.903    0.013 {method 'read' of 'cv2.VideoCapture' objects}
        1    1.309    1.309    1.309    1.309 {cv2.VideoCapture}
      141    1.219    0.009    1.219    0.009 {cv2.flip}
      141    0.592    0.004    0.592    0.004 {cv2.equalizeHist}
      141    0.565    0.004    0.565    0.004 {cv2.cvtColor}
        1    0.303    0.303    0.303    0.303 {method 'release' of 'cv2.VideoCapture' objects}
        1    0.115    0.115    0.115    0.115 {cv2.CascadeClassifier}
Found:


Added a break in the code that exits the program after 100 frames:

Using linux "time":
#3 passes: 
time python2.7-32 capturebasic.py 
real 0m15.977s,       real 0m15.411s,       real 0m15.398s


Code with classes:

Running sluggish at 5fps.

Break at 20 seconds of running the app:
python2.7-32 -m cProfile -s cumulative capturebasic.py
         13488 function calls (13397 primitive calls) in 21.944 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.160    0.160   21.946   21.946 capturebasic.py:1(<module>)
       52    0.001    0.000    8.801    0.169 classifiers.py:34(__init__)
       52    0.003    0.000    8.800    0.169 classifiers.py:4(__init__)
       52    8.797    0.169    8.797    0.169 {cv2.CascadeClassifier}
       52    0.001    0.000    5.382    0.104 gui.py:17(should_quit)
       52    5.382    0.103    5.382    0.103 {cv2.waitKey}
       52    1.768    0.034    1.768    0.034 {cv2.imshow}
       52    0.001    0.000    1.679    0.032 classifiers.py:21(detect_multiscale)
       52    1.678    0.032    1.678    0.032 {method 'detectMultiScale' of 'cv2.CascadeClassifier' objects}
        1    1.178    1.178    1.178    1.178 {cv2.VideoCapture}
       52    0.952    0.018    0.952    0.018 {method 'read' of 'cv2.VideoCapture' objects}
       52    0.002    0.000    0.838    0.016 utils.py:15(prepare_frame_for_detection)
       52    0.000    0.000    0.714    0.014 utils.py:11(flip_frame)
       52    0.714    0.014    0.714    0.014 {cv2.flip}
       52    0.423    0.008    0.423    0.008 {cv2.cvtColor}
       52    0.413    0.008    0.413    0.008 {cv2.equalizeHist}
        1    0.330    0.330    0.330    0.330 {method 'release' of 'cv2.VideoCapture' objects}
        1    0.003    0.003    0.107    0.107 __init__.py:106(<module>)
        1    0.000    0.000    0.050    0.050 add_newdocs.py:9(<module>)
        2    0.003    0.002    0.042    0.021 __init__.py:1(<module>)
        1    0.002    0.002    0.037    0.037 __init__.py:15(<module>)
        2    0.007    0.004    0.029    0.014 __init__.py:2(<module>)
        1    0.000    0.000    0.028    0.028 type_check.py:3(<module>)
       52    0.001    0.000    0.016    0.000 gui.py:11(display_faces)
Found:

  • "{cv2.CascadeClassifier}" is executed 52 times, that is, every frame, whereas in the functional code is only executed 1.

Break after 100 frames:
time python2.7-32 capturebasic.py
real 0m27.675s,     real 0m25.162s,      real 0m26.867s

Solution:

In the refactored code I was instantiating the face detector "FaceClassifier" class in every pass of the loop:
while(True):
    ...
    face_classifier = FaceClassifier()
    faces_list = face_classifier.detect_multiscale(frame_prepared)
Taking it out of the while:
face_classifier = FaceClassifier()
while(True):
    ...
    faces_list = face_classifier.detect_multiscale(frame_prepared)
And fps are back on track:
Cleaned up camera.
         18736 function calls (18645 primitive calls) in 21.654 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.092    0.092   21.656   21.656 capturebasic.py:1(<module>)
      168    0.008    0.000    9.017    0.054 gui.py:17(should_quit)
      168    9.008    0.054    9.008    0.054 {cv2.waitKey}
      168    3.317    0.020    3.317    0.020 {cv2.imshow}
      168    0.020    0.000    2.745    0.016 classifiers.py:21(detect_multiscale)
      168    2.725    0.016    2.725    0.016 {method 'detectMultiScale' of 'cv2.CascadeClassifier' objects}
      168    2.077    0.012    2.077    0.012 {method 'read' of 'cv2.VideoCapture' objects}
      168    0.001    0.000    1.322    0.008 utils.py:11(flip_frame)
      168    1.320    0.008    1.320    0.008 {cv2.flip}
      168    0.015    0.000    1.277    0.008 utils.py:15(prepare_frame_for_detection)
        1    1.165    1.165    1.165    1.165 {cv2.VideoCapture}
      168    0.648    0.004    0.648    0.004 {cv2.equalizeHist}
      168    0.613    0.004    0.613    0.004 {cv2.cvtColor}
        1    0.332    0.332    0.332    0.332 {method 'release' of 'cv2.VideoCapture' objects}
        1    0.002    0.002    0.104    0.104 __init__.py:106(<module>) 
Timings as well:
time python2.7-32 capturebasic.py
real 0m14.860s,    real 0m17.444s,    real 0m13.883s
This is a bug, but a good one, since it has forced me to dig into performance tools used with opencv. I'm pretty sure this will not be the last time to have to use them in the project.


No comments:

Post a Comment