## NEC unveils facial recognition system for 2020 Tokyo Olympics – The Verge

But will it work?

## Finding webcams

To list all devices on Windows using ffmpeg:

ffmpeg -list_devices true -f dshow -i dummy


and on Linux:

v4l2-ctl --list-devices

To get the capabilities of a device on Windows:

ffmpeg -f dshow -list_options true -i video="Mobius"

where “Mobius” is the name of the camera.

On the Mac, list the devices with

ffmpeg -f avfoundation -list_devices true -i ""
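The platform-specific listing commands above can be wrapped in a small Python helper. This is a sketch that assumes `ffmpeg` (or `v4l2-ctl` on Linux) is on the PATH; the function names are hypothetical. ffmpeg writes the device list to its log on stderr and typically exits with an error because `dummy` (or `""`) is not a real input, so the sketch does not treat a non-zero exit code as a failure.

```python
import subprocess
import sys


def list_devices_command(platform=sys.platform):
    """Return the device-listing command from this post for the given platform."""
    if platform.startswith("win"):
        # DirectShow devices on Windows
        return ["ffmpeg", "-list_devices", "true", "-f", "dshow", "-i", "dummy"]
    if platform == "darwin":
        # AVFoundation devices on macOS
        return ["ffmpeg", "-f", "avfoundation", "-list_devices", "true", "-i", ""]
    # Video4Linux devices on Linux
    return ["v4l2-ctl", "--list-devices"]


def list_devices(platform=sys.platform):
    """Run the listing command and return everything it printed (stdout + stderr)."""
    result = subprocess.run(
        list_devices_command(platform),
        capture_output=True,
        text=True,
        check=False,  # ffmpeg usually exits non-zero after listing devices
    )
    return result.stdout + result.stderr
```

Calling `print(list_devices())` on a machine with a camera attached should show the same listing as typing the command by hand. Passing the arguments as a list (no shell) means the empty `""` input name on the Mac needs no extra quoting.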

## A curated list of deep learning resources for computer vision

A really useful list of key papers applying deep learning to computer vision.

## Axis IP camera

I use Axis IP cameras a lot for capturing images and video. The image quality is great and they are highly customisable. I use a P1344 camera, which supports still images, MJPEG and H264. Still images are fine for a one-off capture but too slow for video work; for that I need either MJPEG or H264. Both have their pros and cons.

MJPEG is essentially a sequence of JPEG images. It’s easy to use and the quality is good, depending on the compression settings. The downside is that it needs a higher network bitrate than H264.

H264 is a lossy video compression format that has become ubiquitous, both for compressed video on the internet and on Blu-ray discs.

The camera can be controlled through the URL. To get an H264 stream (using VLC in this case, though ffplay works perfectly well), type the following at the command prompt:

vlc rtsp://192.168.0.103:554/axis-media/media.amp

or, if the camera requires a username and password,

vlc "rtsp://192.168.0.103:554/axis-media/media.amp?user=XXX&password=XXXX"

The image resolution can be changed with

vlc "rtsp://192.168.0.103:554/axis-media/media.amp?resolution=640x480"

The available resolutions depend on the camera. There are many other settings you can apply, such as bit rate and compression; see the Axis VAPIX documentation for the full list. An easy way to manage them is to create a Stream Profile on the camera’s settings page. A number of profiles are built in, and you can select one like so:

vlc "rtsp://192.168.0.103:554/axis-media/media.amp?streamprofile=Quality"
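Since the camera is driven entirely through URL parameters, it can help to build these URLs programmatically rather than by hand. Below is a minimal sketch using only the Python standard library. `axis_rtsp_url` is a hypothetical helper, and the parameter names (`resolution`, `fps`, `streamprofile`) are just the ones used in this post, so check the VAPIX documentation for what your camera actually accepts.

```python
from urllib.parse import urlencode


def axis_rtsp_url(host, port=554, **params):
    """Build an Axis RTSP URL with optional query parameters (hypothetical helper)."""
    base = f"rtsp://{host}:{port}/axis-media/media.amp"
    # urlencode keeps the keyword order and percent-escapes special characters,
    # which matters for values such as passwords
    return f"{base}?{urlencode(params)}" if params else base


url = axis_rtsp_url("192.168.0.103", streamprofile="Quality")
```

The resulting string can be handed straight to a player, e.g. `subprocess.run(["ffplay", url])`. Passing it as a list argument also sidesteps the shell quoting needed for the `?` and `&` characters on the command line.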



Still images can be captured using

http://192.168.0.103/axis-cgi/jpg/image.cgi?resolution=320x240&compression=25

Here I’ve selected the resolution and compression factor. You can grab the image by pasting the above URL into a browser.
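The same still-image request can be scripted. This sketch uses `urllib.request` from the standard library; the helper names are hypothetical, and it assumes the camera allows anonymous access to `image.cgi`. If yours asks for a login you would need to add an authentication handler (for example `urllib.request.HTTPDigestAuthHandler`) to the opener.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def still_image_url(host, resolution="320x240", compression=25):
    """URL for a single JPEG frame, using the parameters from the example above."""
    query = urlencode({"resolution": resolution, "compression": compression})
    return f"http://{host}/axis-cgi/jpg/image.cgi?{query}"


def grab_still(host, path, timeout=5.0, **kwargs):
    """Download one JPEG from the camera, save it to `path`, return its size in bytes."""
    with urlopen(still_image_url(host, **kwargs), timeout=timeout) as resp:
        data = resp.read()
    # JPEG files start with the Start Of Image marker FF D8
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("response does not look like a JPEG image")
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

For example, `grab_still("192.168.0.103", "frame.jpg", compression=40)` would save one frame at compression 40.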

## Image Quality

The images above show subregions of example images captured at full resolution as JPEGs. There are significant compression artifacts in the image even at low compression ratios. Setting the compression ratio below 40 appears to have little effect on image quality.

H264 streams appear to be similarly affected. The bitmap image shows some improvement; however, the data rate needed to transmit it is considerably larger.

Note that the RMS errors are calculated relative to the JPEG image captured with a compression factor of 0.
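The post doesn’t show how the RMS errors were computed, so here is a sketch of the standard definition, assuming both images have already been loaded as equally sized grey-level arrays (e.g. with an imaging library) and that the reference is the compression-0 capture. The function name is hypothetical.

```python
import numpy as np


def rms_error(image, reference):
    """Root-mean-square pixel difference between two equally sized grey-level images."""
    # Convert to float first: subtracting uint8 arrays directly would wrap around
    a = np.asarray(image, dtype=np.float64)
    b = np.asarray(reference, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.sqrt(np.mean((a - b) ** 2)))
```

The float conversion is the important design choice here: JPEG decoders usually return `uint8` arrays, and `0 - 255` in `uint8` silently becomes `1`.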

## Benchmarking

Playing the 640x480, 15 fps MJPEG stream with

ffplay "http://192.168.0.103/axis-cgi/mjpg/video.cgi?resolution=640x480&fps=15"

uses 60% of one core on my Odroid XU4 and 17% on my 2.7 GHz iMac, whereas the equivalent H264 stream

ffplay "rtsp://192.168.0.103:554/axis-media/media.amp?resolution=640x480&fps=15"

uses 88% on the Odroid and 14% on the iMac.

## Mathematical representation of images and optics

The way we represent an image mathematically can have a big impact on our ability to manipulate it. Conceptually, it would be simplest to represent an image (let’s assume it’s grey-level) as a 2D array. If my image is a 2D array $$\mathbf{X}$$, I could implement the effect of a linear shift-invariant blurring function $$\mathbf{H}$$ and produce an output image $$\mathbf{F}$$ via the convolution operator:

$$\mathbf{F}=\mathbf{H} \ast \mathbf{X}$$

I could do other things with this notation, such as introducing a shift operator to move my image by one pixel:

$$\mathbf{F}=\mathbf{\acute {H}} \ast \mathbf{X}$$

where $$\mathbf{\acute {H}}=[0, 0 ,1]$$.
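A minimal NumPy sketch of the shift-invariant case, using a 1D row of pixels rather than a 2D image for brevity (for 2D, `scipy.signal.convolve2d` plays the same role). The pixel values and the box-blur kernel are illustrative assumptions. Because `np.convolve` flips the kernel, $$[0, 0, 1]$$ delays the row by one pixel, and with `mode="same"` the vacated border pixel is filled with zero.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # a 1D "image" row

h_blur = np.array([1.0, 1.0, 1.0]) / 3.0  # shift-invariant 3-tap box blur
h_shift = np.array([0.0, 0.0, 1.0])       # the shift kernel from the text

# The same kernel is applied at every pixel; borders are zero-padded
blurred = np.convolve(x, h_blur, mode="same")
shifted = np.convolve(x, h_shift, mode="same")  # every pixel moves by one place
```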

The problem is that this is all shift invariant: the same blur or shift is applied to every pixel in the image. What if the amount of blurring and shifting changes from pixel to pixel, as it does in a real image because of imperfections in the camera’s lens? I would need a separate kernel $$\mathbf{\acute{H}}$$ for every pixel. A more convenient approach is to drop the 2D convolution and implement our system using matrix multiplications. To do this we lexicographically rearrange the 2D image matrix into a 1D vector, column by column:

$$\left[ \begin{array}{ccc} a & b & c \\ d & e & f \\ g & h & i \end{array}\right] \longrightarrow \left[ \begin{array}{c} a\\d\\g\\b\\e\\h\\c\\f\\i \end{array} \right]$$

In Matlab this would be implemented with X1d=X(:), and we can transform it back to 2D, given the original number of rows and columns, with X=reshape(X1d,rows,cols).
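For readers using Python, a NumPy equivalent of the two Matlab calls is sketched below. NumPy defaults to row-major (C) ordering, so `order="F"` is needed to reproduce Matlab’s column-by-column stacking shown in the example above.

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
rows, cols = X.shape

x1d = X.flatten(order="F")                   # column-major, like Matlab's X(:)
X_back = x1d.reshape(rows, cols, order="F")  # like reshape(X1d, rows, cols)
```

Here `x1d` is `[1, 4, 7, 2, 5, 8, 3, 6, 9]`, matching the $$a, d, g, b, e, h, c, f, i$$ ordering.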

For simplicity’s sake I shall reduce the number of pixels in my image to 3. But what can we do with this? Well, let’s look at a matrix multiply operation:

$$\left[ \begin{array}{ccc} a & b & c \\ d & e & f \\ g & h & i \end{array}\right] \left[ \begin{array}{c}x\\y\\z\end{array}\right] = \left[ \begin{array}{c} ax+by+cz\\dx+ey+fz\\gx+hy+iz\end{array}\right]$$

Each row of the matrix acts as an operator that produces one output pixel from the input pixels, so I’ve effectively got a shift-variant convolution. For example, I could blur the first pixel and leave the rest the same:

$$\left[ \begin{array}{ccc} 0.33& 0.33 & 0.33 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right] \left[ \begin{array}{c}x\\y\\z\end{array}\right] = \left[ \begin{array}{c} 0.33x+0.33y+0.33z\\y\\z\end{array}\right]$$

Note that the 1s go down the diagonal for the pixels that are left unchanged.

I could implement a shift on the second pixel:

$$\left[ \begin{array}{ccc} 1& 0 & 0 \\ 0 & 0& 1 \\ 0 & 0 & 1 \end{array}\right] \left[ \begin{array}{c}x\\y\\z\end{array}\right] = \left[ \begin{array}{c} x\\z\\z\end{array}\right]$$

By changing the values, I could implement rotations and warps.
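The two 3-pixel examples above can be checked numerically. In this sketch the pixel values are arbitrary, and exact thirds are used instead of 0.33 so that the blurring row sums to 1 and preserves brightness.

```python
import numpy as np

v = np.array([10.0, 20.0, 30.0])  # the pixels x, y, z

# Blur only the first pixel: row 0 averages all three, rows 1-2 are identity rows
B = np.array([[1/3, 1/3, 1/3],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Shift only the second pixel: output y is replaced by input z
S = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])

blurred = B @ v  # (x+y+z)/3, y, z
shifted = S @ v  # x, z, z
```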

We can also combine several matrices to define our system. If $$\mathbf{S}$$ is a shift matrix and $$\mathbf{B}$$ is a blurring matrix, we can simply multiply them together

$$\mathbf{F}=\mathbf{SBX}$$

to describe our shift variant optical system.
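A short sketch of the composition, reusing the illustrative 3-pixel blur and shift matrices. Because matrix multiplication is associative, the whole system can be precomputed as a single matrix $$\mathbf{SB}$$. It is not commutative, though, so in $$\mathbf{SBX}$$ the blur acts first and the shift second, and swapping them gives a different image.

```python
import numpy as np

# Blur and shift matrices from the 3-pixel examples (1/3 used instead of 0.33)
B = np.array([[1/3, 1/3, 1/3],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
S = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
X = np.array([10.0, 20.0, 30.0])

system = S @ B  # one matrix for the whole optical system
F = system @ X  # same as S @ (B @ X): blur first, then shift

# Shifting before blurring gives a different result
F_other_order = (B @ S) @ X
```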

As an additional step, we may wish to introduce the effect of sensor pixel size. We can do this by giving our original image a much higher resolution and then using a decimation filter to reduce it to a low-resolution camera image. To do this we create a matrix with $$N$$ rows, equal to the number of pixels in the decimated image, and $$M$$ columns, equal to the number of pixels in the high-resolution image.

$$\left[ \begin{array}{cccc} 0.5 & 0.5 & 0 & 0 \\ 0 & 0 & 0.5 & 0.5 \end{array}\right] \left[ \begin{array}{c}w\\x\\y\\z\end{array}\right] = \left[ \begin{array}{c} 0.5w+0.5x \\ 0.5y+0.5z\end{array}\right]$$

shows how we can halve the resolution in one dimension; this extends easily to 2D.
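One way to make the 2D extension concrete is sketched below; the helper name and image size are hypothetical. With column-major vectorisation (Matlab’s `X(:)`), applying a row decimator $$\mathbf{D}_r$$ and a column decimator $$\mathbf{D}_c$$ as $$\mathbf{D}_r \mathbf{X} \mathbf{D}_c^T$$ is the same as multiplying the vectorised image by the Kronecker product $$\mathbf{D}_c \otimes \mathbf{D}_r$$, which averages each 2x2 block of high-resolution pixels.

```python
import numpy as np


def decimation_matrix(m, factor=2):
    """N x M matrix averaging each run of `factor` adjacent samples (M = N * factor)."""
    if m % factor:
        raise ValueError("length must be a multiple of the decimation factor")
    n = m // factor
    d = np.zeros((n, m))
    for i in range(n):
        d[i, i * factor:(i + 1) * factor] = 1.0 / factor
    return d


# 1D: the 2 x 4 matrix from the text
D = decimation_matrix(4)
y = D @ np.array([1.0, 3.0, 5.0, 7.0])  # → [2., 6.]

# 2D: vec(Dr X Dc^T) = kron(Dc, Dr) vec(X) for column-major vec
rows, cols = 4, 6
X = np.arange(rows * cols, dtype=float).reshape(rows, cols)
Dr, Dc = decimation_matrix(rows), decimation_matrix(cols)
D2 = np.kron(Dc, Dr)  # (rows/2 * cols/2) x (rows * cols)
small = (D2 @ X.flatten(order="F")).reshape(rows // 2, cols // 2, order="F")
```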

## Update

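It’s worth noting that if the blurring is shift invariant (which is a lot easier to deal with) and the image boundaries are treated as periodic, the matrix is circulant; for a 2D image it is block circulant with circulant blocks. For a 1D signal of $$M$$ pixels it has the form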

$$\left[ \begin{array}{ccccc} d(0) & d(M-1) & d(M-2) & \ldots & d(1) \\ d(1) & d(0) & d(M-1) & \ldots & d(2) \\ d(2) & d(1) & d(0) & \ldots & d(3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d(M-1) & d(M-2) & d(M-3) & \ldots & d(0) \end{array}\right]$$

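Note that each row is a cyclically shifted version of the one above it. This is important because such a matrix is easy to invert: it is diagonalised by the discrete Fourier transform, so multiplying by it is a circular convolution with $$d$$, and inverting it amounts to dividing by the DFT of $$d$$, provided none of those DFT coefficients is zero.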
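A quick NumPy check of the FFT-based inversion. The kernel `d` is a hypothetical symmetric 3-tap blur chosen so that its DFT has no zeros, and the signal `x` is arbitrary. The result is compared against a direct dense solve, which costs $$O(M^3)$$ instead of the FFT’s $$O(M \log M)$$.

```python
import numpy as np


def circulant(d):
    """Circulant matrix with C[i, j] = d[(i - j) mod M], matching the layout above."""
    m = len(d)
    idx = (np.arange(m)[:, None] - np.arange(m)[None, :]) % m
    return d[idx]


d = np.array([0.6, 0.2, 0.0, 0.0, 0.0, 0.2])  # hypothetical periodic blur kernel
C = circulant(d)
x = np.array([1.0, 2.0, 4.0, 8.0, 4.0, 2.0])
f = C @ x  # blurred signal, equal to the circular convolution of d and x

# Invert in the Fourier domain: X_hat = F_hat / D_hat (no zeros in D_hat here)
x_rec = np.real(np.fft.ifft(np.fft.fft(f) / np.fft.fft(d)))
x_direct = np.linalg.solve(C, f)  # dense solve, for comparison
```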