Purely spatial vision chips are concerned only with image enhancement, edge detection, and other stationary visual tasks. Spatio-temporal vision chips are concerned with the time-dependent features of the image. Although spatial vision chips have shown a degree of robustness under different lighting conditions, no comparable claims have yet been made about the robust operation of spatio-temporal vision chips.
The difficulty of implementing motion detection in VLSI stems from the fact that almost all models require some form of delay or storage element. Most algorithms also require inputs from several time frames. Storage or delay elements, apart from consuming considerable area, are difficult to implement. Another reason for the less robust operation of motion detectors is the temporal contrast required for reliable motion detection. The temporal contrast of objects in real scenes is relatively small and is barely enough to trigger many analog motion detection chips. The unsatisfactory results from most motion detector chips are driving recent implementations towards intuitive but robust solutions with some structural deviations from the original models.
Algorithms devised for motion detection chips fall into two main categories: biological and computational. Some early implementations were based on the optic-flow theory, which belongs to the computational category. Owing to the complexity and inherent problems of this theory, however, no recent motion detection chips have been based on it. In fact, all computational algorithms for motion detection are very complex, and only a few motion detection chips are based on them. Biological models, on the other hand, such as Reichardt's correlative motion detector, offer a simple structure that is VLSI friendly. A large number of vision chips have therefore adopted these models or modified versions of them.
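To make the structural simplicity of a correlative detector concrete, the following is a minimal discrete-time sketch in Python. It is an illustration under stated assumptions, not a description of any particular chip: the function name `reichardt_response`, the sampled input lists, and the use of a pure sample delay (in place of the low-pass filter that analog implementations commonly use as the delay element) are all choices made here for illustration. Each half of the detector multiplies the delayed signal of one photoreceptor by the undelayed signal of its neighbour, and the two mirror-image halves are subtracted.

```python
def reichardt_response(left, right, delay=1):
    """Summed opponent output of a two-receptor correlation detector.

    `left` and `right` are equally long lists of photoreceptor samples.
    `delay` is a pure sample delay, assumed here for simplicity.
    A positive result indicates left-to-right motion, a negative result
    right-to-left motion, and zero indicates no net directional signal.
    """
    if len(left) != len(right):
        raise ValueError("both receptor signals must have the same length")
    total = 0.0
    for t in range(delay, len(left)):
        # Half-detector tuned to left-to-right motion:
        # delayed left signal correlated with the current right signal.
        preferred = left[t - delay] * right[t]
        # Mirror-image half-detector tuned to right-to-left motion.
        null = right[t - delay] * left[t]
        total += preferred - null
    return total


# A brightness pulse that reaches the left receptor one sample
# before it reaches the right receptor.
pulse_early = [0, 0, 1, 0, 0, 0]
pulse_late = [0, 0, 0, 1, 0, 0]

rightward = reichardt_response(pulse_early, pulse_late)
leftward = reichardt_response(pulse_late, pulse_early)
stationary = reichardt_response(pulse_early, pulse_early)
```

In this sketch the sign of the output encodes direction, and a stimulus that changes identically at both receptors produces no net response. The single multiplication per half-detector and the single delay element are what make the structure compact, although the delay element is precisely the part identified above as costly to implement.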
It should be mentioned that subtracting two frames of an image, although a temporal processing function, cannot be considered motion detection, which implies at least the detection of optical flow in time and in space.
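One way to see why frame subtraction alone falls short is the following small Python sketch on one-dimensional frames. The helper names `bar_frame` and `frame_difference` and the pixel values are hypothetical, chosen only for this illustration. A bright bar moving left on a dark background and a dark bar moving right on a bright background yield exactly the same difference image, so the difference by itself does not determine the direction of motion.

```python
def bar_frame(width, position, bar_value, background):
    """Build a 1-D frame containing a single-pixel bar at `position`."""
    frame = [background] * width
    frame[position] = bar_value
    return frame


def frame_difference(prev_frame, next_frame):
    """Pixel-wise temporal difference between two consecutive frames."""
    return [b - a for a, b in zip(prev_frame, next_frame)]


# Bright bar (1) on a dark background (0), moving from pixel 4 to pixel 3.
bright_moving_left = frame_difference(
    bar_frame(8, 4, 1, 0), bar_frame(8, 3, 1, 0)
)

# Dark bar (0) on a bright background (1), moving from pixel 3 to pixel 4.
dark_moving_right = frame_difference(
    bar_frame(8, 3, 0, 1), bar_frame(8, 4, 0, 1)
)

# The two difference images are identical although the motions are opposite.
ambiguous = bright_moving_left == dark_moving_right
```

The difference image marks where the intensity changed, but resolving the direction requires additional spatial information, such as the contrast polarity of the moving edge or a comparison between neighbouring pixels across time.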