Weiss, Simoncelli, and Adelson’s 2002 paper Motion Illusions as Optimal Percepts sets out a useful model to explain a few of the inconsistencies in human motion perception. First, some background on a couple of these inconsistencies.
The aperture problem arises when parallel lines move behind a small two-dimensional aperture. You can view an animation of it here. We perceive the motion as diagonal, but the lines could just as well be moving right or downward. To resolve this ambiguity, you need to be able to see the ends of the bars.
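To make the ambiguity concrete, here is a minimal Python sketch. It assumes the only thing an observer can measure through the aperture is the component of motion perpendicular to the lines; the function name and the particular velocities are illustrative choices, not taken from the paper.

```python
import numpy as np

def normal_component(velocity, normal):
    """Project a 2D velocity onto the direction perpendicular to the lines."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return np.dot(velocity, n) * n

# Lines tilted at 45 degrees; their normal points down and to the right.
normal = np.array([1.0, -1.0])

# Three hypothetical true motions of the lines.
rightward = np.array([1.0, 0.0])
downward = np.array([0.0, -1.0])
diagonal = np.array([0.5, -0.5])

# What can be measured through the aperture in each case.
observed = [normal_component(v, normal) for v in (rightward, downward, diagonal)]
for v in observed:
    print(v)
```

All three motions leave the same measurable component, which is why seeing only the middle of the bars cannot tell them apart.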
One application of the aperture problem that the researchers examine is a moving rhombus. You should view it here, and compare the “thinnest” and “fattest” rhombi with high contrast and occluders on. The “thin” one should appear to move diagonally, while the “fat” one should appear to move horizontally.
Before this paper, the rules for estimating coherent pattern motion could not account for both of these effects at once. The “intersection of constraints” model is able to explain the horizontal perception in the “fat” rhombus, but not the diagonal motion in the “thin” rhombus. The “vector average” model is able to explain the perception of diagonal motion in the “thin” rhombus, but not the horizontal motion in the “fat” rhombus.
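The difference between the two older rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two edge orientations (normals at 60 and 80 degrees) and the helper names are hypothetical, and each edge is assumed to reveal only its speed along its own normal.

```python
import numpy as np

def unit(angle_deg):
    """Unit vector at the given angle from the horizontal axis."""
    a = np.deg2rad(angle_deg)
    return np.array([np.cos(a), np.sin(a)])

def intersection_of_constraints(normals, normal_speeds):
    """Find the single velocity v satisfying n_i . v = c_i for both edges."""
    return np.linalg.solve(np.asarray(normals), np.asarray(normal_speeds))

def vector_average(normals, normal_speeds):
    """Average the normal velocity vectors c_i * n_i of the edges."""
    return np.mean(np.asarray(normals) * np.asarray(normal_speeds)[:, None], axis=0)

# Hypothetical shape translating horizontally.
true_velocity = np.array([1.0, 0.0])

# Assumed normals of two visible edges.
normals = np.array([unit(60.0), unit(80.0)])

# Each edge only reveals its speed along its own normal.
normal_speeds = normals @ true_velocity

ioc = intersection_of_constraints(normals, normal_speeds)
va = vector_average(normals, normal_speeds)
print("intersection of constraints:", ioc)
print("vector average:", va)
```

With these assumed edges, the intersection of constraints recovers the horizontal motion, while the vector average points diagonally upward. Neither rule switches behavior between shapes, which is the gap described above.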
The three authors create a new model that can explain both of these phenomena at the same time. One other phenomenon that their model explains is that when contrast increases, perceived velocity increases as well. The paper becomes most interesting when they explain how they arrived at their model, which they break up into five steps:
1) They make an assumption of “intensity conservation,” meaning that points in the visual field move but do not alter in intensity over time.
2) They assume that intensity will not be conserved exactly and therefore that there will be some noise, for which they add a variable of n.
3) They assume that this noise is Gaussian (a bell-shaped curve) and that the velocity will remain constant within a small spatial neighborhood. They also make an assumption about intensity that I don’t fully follow (my error, I’m sure) but that allows them to approximate intensity linearly over a short time frame.
4) They assume a prior favoring low speeds. This means that before any data come in, the most likely velocity is no movement, and that, all else being equal, the slower of two candidate velocities is considered the more likely one.
5) They assume that the entire image moves based on a single translation velocity.
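Taken together, the five assumptions above lead to an estimate that can be computed in closed form. The Python sketch below is my own minimal illustration, not the authors' code: it assumes a linearized constraint I_x v_x + I_y v_y + I_t = n at every point, Gaussian noise with standard deviation sigma_noise, a zero-mean Gaussian prior on velocity with standard deviation sigma_prior, and one velocity for the whole image. The function names, the synthetic grating, and the parameter values are all hypothetical.

```python
import numpy as np

def map_velocity(Ix, Iy, It, sigma_noise=1.0, sigma_prior=1.0):
    """Most probable velocity under a Gaussian likelihood and a slow-speed prior.

    Solves (A^T A + lam * I) v = -A^T b, where A stacks the spatial
    derivatives, b holds the temporal derivatives, and
    lam = (sigma_noise / sigma_prior)^2.
    """
    A = np.column_stack([Ix, Iy])
    b = np.asarray(It, dtype=float)
    lam = (sigma_noise / sigma_prior) ** 2
    return -np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)

def grating_gradients(contrast, normal, velocity, n_points=50, seed=0):
    """Synthetic derivatives for a grating seen through an aperture (assumed setup)."""
    rng = np.random.default_rng(seed)
    n = np.asarray(normal, dtype=float) / np.linalg.norm(normal)
    # Gradient magnitudes scale with contrast; all gradients share one direction.
    magnitude = contrast * rng.uniform(0.5, 1.5, n_points)
    Ix, Iy = magnitude * n[0], magnitude * n[1]
    # Intensity conservation: I_t = -(I_x v_x + I_y v_y), noise-free here.
    It = -(Ix * velocity[0] + Iy * velocity[1])
    return Ix, Iy, It

true_velocity = np.array([1.0, 0.0])
normal = np.array([1.0, -1.0])

v_low = map_velocity(*grating_gradients(0.05, normal, true_velocity))
v_high = map_velocity(*grating_gradients(1.0, normal, true_velocity))
print("low contrast estimate: ", v_low, "speed", np.linalg.norm(v_low))
print("high contrast estimate:", v_high, "speed", np.linalg.norm(v_high))
```

In this sketch the low-contrast grating yields a slower estimate than the high-contrast one, because weak gradients let the slow-speed prior dominate. Both estimates point along the grating's normal, since nothing in the data constrains motion parallel to the lines.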
Through some substitution and calculus, they are able to write their equation using standard linear algebra. It is elegant, and predicts some empirical data well. This is an influential paper in vision modeling, and according to Scopus it has already been cited 114 times.
Y. Weiss, E.P. Simoncelli, E.H. Adelson, Motion illusions as optimal percepts, Nature Neuroscience 5 (2002), pp. 598-604. doi: 10.1038/nn858.