[Optimoz] gesture recognition with a hidden markov model

Aaron Lenfestey lenfestey at gmail.com
Sat Oct 29 19:16:30 EDT 2005


Hi,

well, I've had at least 2 methods in my head, one of which is very
simple and the other substantially more complex (computationally).  We
might have to do some experiments to find out exactly how much extra
work low-end machines can afford to do, but my guess is that 300Mhz
machines are effectively off the consideration radar (very little
extra work can be done without causing unacceptable performance).  In
any case, any gesture recognizer(s) that we do come up with will be an
'advanced option' and the user can choose to stick with gesture
recognizer classic (tm) if they want.  Finally, has a C/C++ backend
been definitively ruled out of optimoz?  In this case, these problems
would be, by and large, moot.  We could of course fork off into
optimoz++ until the day comes when js can handle our codes on common
denominator hardware.

Now, for something a little more technical:

Proposal 1: at each time step we maintain a probability distribution
over the characters in our alphabet.  We define HMM transition
probabilities as a joint distribution over pairs of characters (before
and after).  We can do exact inference (forward-backward) in something
like N^2*T time, where T = # data points and N = # characters in
alphabet (~10).  I've implemented this in Matlab with a smaller
alphabet (4) and it works reasonably well.

Proposal 2: at each time step we maintain a probability distribution
over prefixes of valid inputs.  Transition probabilities are the same,
defined as a function of the last character of the prefix and the
possible subsequent characters that would still yield a valid
sequence.  From a modeling perspective, this is a lot more interesting
but exact inference requires that we consider a distribution over all
possible prefixes to every sequence.  Most of these will be very
unlikely, so I think its amenable to approximation, also we can think
about possible sampling algorithms.  Worst case: N^2*T, where N ~
1000.  There is probably a lot of literature on how to do this
efficiently, which I haven't read (sorry).

Anyway, proposal 1 amounts to glorified smoothing, although in
practice I think it can yield significant improvements.  There is
still room in the model to learn user behavior.  Proposal 2 also makes
use of which input sequence the user intends to make.  Exact inference
is so much harder because the state space is so much larger.  I'd be
interested in trying out particle filters or something like that;
sampling algorithms like these would give the a user a dial which can
be tuned to their hardware (performance vs. accuracy).

-Aaron


More information about the Optimoz mailing list