
title: Gesture management
This research work aims at extending gestures in XR in order to make XRSH optionally spatial. While a physical keyboard might remain the best way to interact, it is still important to consider multiple ways to interact with content. Hand tracking has improved to the point of being relatively reliable, and some frameworks like AFrame already support it with pinch gestures.
This research supports the exploration of other gestures beyond the thumb/index pinch.
This approach does not rely on any statistical solution (see related approaches at the end of this page for that). Instead, each gesture must be explicitly programmed based on the available joints. To better understand how hands (for now, but potentially full-body tracking in the future) are represented, check the Immersive Web explainer https://immersive-web.github.io/webxr-hand-input/
Note that because this is relatively complex, some helpers are provided, e.g.
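As a minimal sketch of what that explainer describes, the WebXR Hand Input spec defines 25 named joints per hand (a wrist, 4 thumb joints, and 5 joints for each other finger); an XRHand is a map keyed by these names, which is what per-joint gesture checks iterate over. The snippet below only reconstructs the name list, it does not require a WebXR runtime:

```javascript
// Rebuild the 25 joint names defined by the WebXR Hand Input spec.
// Thumb: metacarpal, phalanx-proximal, phalanx-distal, tip (no intermediate).
// Other fingers additionally have a phalanx-intermediate joint.
const FINGERS = ['thumb', 'index-finger', 'middle-finger', 'ring-finger', 'pinky-finger'];

function allJointNames() {
  const names = ['wrist'];
  for (const finger of FINGERS) {
    names.push(`${finger}-metacarpal`, `${finger}-phalanx-proximal`);
    if (finger !== 'thumb') names.push(`${finger}-phalanx-intermediate`);
    names.push(`${finger}-phalanx-distal`, `${finger}-tip`);
  }
  return names; // 1 wrist + 4 thumb joints + 4 * 5 finger joints = 25
}
```

In a real session one would iterate `inputSource.hand.keys()` instead and query each joint with `frame.getJointPose(hand.get(name), referenceSpace)`.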
For more see https://forgejo.isvery.ninja/xrsh/xrsh/src/branch/gestures/src/gestures.js#L43-L48
This research does not provide an exhaustive list of usable gestures, as this would be impossible. This seems trivial, yet consider that with only 2 hands, each with 5 fingers of 3 to 4 joints, there are already more than 45 contact points. Combinations, e.g. touching 2 joints then 2 others, grow exponentially. One can also consider touching then not touching (or touching the "air"), or "scrolling" between 2 joints. To summarize: even though focusing solely on 2 hands seems like a tractable problem, the richness is surprisingly vast.
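A rough sanity check of this combinatorial claim, counting all unordered joint pairs rather than only the touchable contact points mentioned above, and assuming the spec's 25 joints per hand:

```javascript
// Count possible joint-pair "contacts", assuming 25 joints per hand
// as in the WebXR Hand Input spec.
const JOINTS_PER_HAND = 25;
const choose2 = n => n * (n - 1) / 2;

const sameHandPairs  = 2 * choose2(JOINTS_PER_HAND);        // pairs within each hand
const crossHandPairs = JOINTS_PER_HAND * JOINTS_PER_HAND;   // one joint from each hand
const totalPairs     = sameHandPairs + crossHandPairs;      // equals choose2(50)

// Ordered sequences of two distinct pairs (touch 2 joints, then 2 others)
const pairSequences = totalPairs * (totalPairs - 1);
```

Even before adding timing, release, or "scrolling" between joints, the pair count is already in the thousands and two-step sequences in the millions, which illustrates why an exhaustive list is impossible.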
Even though microGlyph (or uGlyph) has great descriptive power, its overlap with the WebXR Hand Tracking specification is only partial. It is a useful tool for human-computer interaction (HCI) researchers to precisely describe gestures beyond text and images, but despite having a computational description (e.g. LaTeX), the partial mapping is not very useful here. There is also no adoption beyond HCI research.
It might be possible that more helper structures (e.g. allJointsNames) or functions (e.g. proximityBetweenJointsCheck) would make the partial mapping usable, but this still means checking potential gestures in real time.
An interesting way forward would be to do so and rate how performant gestures are. Even better would be for helpers to integrate a notion of hierarchy and ergonomics. A practical example: if the 2 hands are far apart, there is no need to check the finger positions between hands, as those will also be too far apart.
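A sketch of that hierarchical pruning idea, under loud assumptions: `proximityBetweenJointsCheck` and `crossHandContacts` are hypothetical names (the former is only mentioned as a possible helper above, neither exists in gestures.js), and joints are plain {x,y,z} objects in meters:

```javascript
// Hypothetical helpers, illustrating hierarchy/ergonomics-aware pruning.
const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);

let checksDone = 0; // instrumentation, only to show how much work pruning saves
function proximityBetweenJointsCheck(a, b, threshold) {
  checksDone++;
  return dist(a, b) < threshold;
}

// Only test cross-hand joint contacts when the wrists are close enough
// that any contact is geometrically possible (~2x a hand span, here 0.25 m).
function crossHandContacts(left, right, threshold = 0.015, handSpan = 0.25) {
  if (dist(left.wrist, right.wrist) > 2 * handSpan) return []; // prune early
  const contacts = [];
  for (const [nameL, posL] of Object.entries(left))
    for (const [nameR, posR] of Object.entries(right))
      if (proximityBetweenJointsCheck(posL, posR, threshold))
        contacts.push([nameL, nameR]);
  return contacts;
}
```

With hands a meter apart, the wrist test alone rejects all 625 cross-hand pair checks; the same idea extends down the hierarchy (hand, then finger, then joint).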
For training, see `addDebbugGraph()` and `drawPoints(points)` to estimate which joints and poses can be used for your custom gesture while in XR. This could be extended by making the threshold line movable while in XR.
See also XR hand keyboard focus
Testing code https://forgejo.isvery.ninja/xrsh/xrsh/src/branch/gestures/src/gestures_tests.html#L60
window.test = function( event ){ console.log('custom callback showcase:', event.detail) }
AFRAME.scenes[0].setAttribute('template-gesture-detector', 'callback:test;')
Alternatively, without customizing the callback, just target an element:
AFRAME.scenes[0].setAttribute('template-gesture-detector', 'target:#demotarget;')
AFRAME.scenes[0].querySelector("[isoterminal]").object3D.children[0].material.color
document.querySelector("[isoterminal]").components.isoterminal.term.send(e.detail.key)
document.querySelector("[isoterminal]").components.isoterminal.term.exec("ls")
Specifically, .term.send() and .term.exec() on the isoterminal component.
For more see https://xrsh.isvery.ninja/#Custom%20REPLs
The core of the suggestion remains at https://forgejo.isvery.ninja/xrsh/xrsh/src/branch/gestures/src/gestures.js#L168-L211 namely an AFrame component split in 3 parts:
management of gestures itself, in XR
See Gesture management by toggling gesture detection by the user
Debugging joint distances via the console is impractical. As mentioned earlier, there are more than 45 values just for pose (each pose itself being 2 triplets: x,y,z for position and roll,pitch,yaw for rotation), which is visually uninterpretable, if even computationally tractable on a standalone headset. Consequently a debug graph visualizer has been built. The idea is to show the current value of each dimension over time; the goal is then to find upper/lower thresholds to filter values in or out. Despite this approach, it remains complicated to define the right detector, one that matches only the intended gesture even as relative positions change. This means that some gestures must rely on relative pose rather than absolute values. A practical example is the palm facing the user: the gesture holds even as the rotation of all individual joints changes.
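The palm-facing-the-user example can be sketched with relative pose only, no absolute thresholds. Assumptions to flag: the palm normal is approximated by the cross product of two palm vectors (wrist to index metacarpal, wrist to pinky metacarpal), joints are plain {x,y,z} objects, and the normal's sign flips between left and right hands, which a real detector must account for:

```javascript
// Relative-pose sketch: is the palm roughly facing the view direction?
const sub   = (a, b) => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot   = (a, b) => a.x * b.x + a.y * b.y + a.z * b.z;
const cross = (a, b) => ({
  x: a.y * b.z - a.z * b.y,
  y: a.z * b.x - a.x * b.z,
  z: a.x * b.y - a.y * b.x,
});
const norm = v => {
  const l = Math.hypot(v.x, v.y, v.z);
  return { x: v.x / l, y: v.y / l, z: v.z / l };
};

// true when the approximated palm normal is within ~45 degrees of viewDir
function palmFacing(wrist, indexMeta, pinkyMeta, viewDir, cosThreshold = 0.7) {
  const normal = norm(cross(sub(indexMeta, wrist), sub(pinkyMeta, wrist)));
  return dot(normal, norm(viewDir)) > cosThreshold;
}
```

Because only joint positions relative to each other and to the viewer are used, the check stays true while the whole hand rotates with the head, exactly the property absolute per-joint thresholds lack.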
AFrame component positional-context, filtering on where a gesture is done, see https://forgejo.isvery.ninja/xrsh/xrsh/src/branch/gestures/src/gestures.js#L308-L334
My notes on Tools gather what I know or want to know. Consequently they are not and will never be complete references. For this, official manuals and online communities provide much better answers.