title: Gesture management

This research work aims to extend gestures in XR in order to make XRSH optionally spatial. Even though a physical keyboard might remain the best way to interact, it is still important to consider multiple ways to interact with content. Hand tracking has improved to the point of being relatively reliable, and some frameworks like AFrame already support hand tracking with pinch gestures.

This research supports the exploration of other gestures beyond the thumb/index pinch.

This approach does not rely on any statistical solution; see the related approaches at the end of this page for that. Here, instead, each gesture must be explicitly programmed based on the available joints. To better understand how hands (for now, but potentially full body tracking in the future) are represented, check the Immersive Web explainer https://immersive-web.github.io/webxr-hand-input/

Note that because this is relatively complex, some helpers are provided, e.g.

  • fingersNames
  • tips
  • thumbParts
  • fingerParts
  • fingers, and finally
  • allJointsNames

For more see https://forgejo.isvery.ninja/xrsh/xrsh/src/branch/gestures/src/gestures.js#L43-L48
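
For illustration, here is a sketch of what such helper structures can look like, built from the joint names defined in the WebXR Hand Input specification. The canonical definitions live in gestures.js at the link above; the exact shapes below are assumptions:

```javascript
// Sketch of helper structures over WebXR Hand Input joint names.
// Assumption: illustrative only; the canonical versions are in gestures.js.
const fingersNames = ['thumb', 'index-finger', 'middle-finger', 'ring-finger', 'pinky-finger'];

// per-finger joint segments, as named by the WebXR Hand Input spec
const thumbParts  = ['metacarpal', 'phalanx-proximal', 'phalanx-distal', 'tip'];
const fingerParts = ['metacarpal', 'phalanx-proximal', 'phalanx-intermediate', 'phalanx-distal', 'tip'];

// fingertip joints, e.g. 'thumb-tip', 'index-finger-tip', ...
const tips = fingersNames.map(name => name + '-tip');

// all 25 joints of one hand: the wrist plus every finger segment
const allJointsNames = ['wrist'].concat(
  fingersNames.flatMap(name =>
    (name === 'thumb' ? thumbParts : fingerParts).map(part => `${name}-${part}`)
  )
);
```

The spec defines 4 joints for the thumb and 5 for each other finger, which with the wrist gives 25 joints per hand.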

Notation and training on gestures

This research does not provide an exhaustive list of usable gestures, as that is impossible. This seems trivial, yet consider that with only 2 hands, each with 5 fingers of 3 to 4 joints, there are already more than 45 contact points. Combinations, touching 2 joints then 2 others, grow exponentially. One can also consider touching then non-touching (or touching the "air"), or "scrolling" between 2 joints. To summarize: even though focusing solely on 2 hands seems like a tractable problem, the richness is surprisingly vast.
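
To make the scale concrete, a quick count of pairwise contacts (pure arithmetic, no XR API involved): with roughly 50 joints across both hands, the unordered joint pairs alone already number in the thousands, before considering sequences or motion between pairs.

```javascript
// Count unordered pairs of joints: n * (n - 1) / 2
function pairCount(n) {
  return n * (n - 1) / 2;
}

// ~25 joints per hand in the WebXR Hand Input spec, so ~50 for both hands
console.log(pairCount(50));                 // 1225 possible joint-to-joint contacts
// a sequence of two successive pairs (touch 2 joints, then 2 others)
// already exceeds a million combinations
console.log(pairCount(50) * pairCount(50)); // 1500625
```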

Even though microGlyph (or uGlyph) has great descriptive power, its overlap with the WebXR Hand Tracking specification is only partial. It is a useful tool for human-computer interaction (HCI) researchers to precisely describe gestures beyond text and images, but despite a computational description (e.g. LaTeX), the partial mapping is not very useful. There is also no adoption beyond HCI research.

It might be possible that more helping structures (e.g. allJointsNames) or functions (e.g. proximityBetweenJointsCheck) make the partial mapping usable, but this still means checking potential gestures in real time.

An interesting way could be to do so and rate how performant gestures are. Even better would be helpers that integrate a notion of hierarchy and ergonomics. A practical example: if the 2 hands are far apart, there is no need to check the finger positions between hands, as those will also be too far apart.
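
A minimal sketch of such hierarchical pruning, assuming plain {x, y, z} joint positions in the same coordinate space (the helper names, including proximityBetweenJointsCheck, are hypothetical here, not the actual gestures.js API):

```javascript
// Hypothetical helpers for hierarchical gesture checks.
// Assumption: joints are plain {x, y, z} objects in the same coordinate space.
function distance(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// proximity check between two joints against a threshold (in meters)
function proximityBetweenJointsCheck(jointA, jointB, threshold = 0.02) {
  return distance(jointA, jointB) < threshold;
}

// hierarchy/ergonomics: if the wrists are far apart, skip all the
// per-finger checks between hands, as those joints must be far apart too
function twoHandGestureCandidate(leftWrist, rightWrist, maxSpan = 0.3) {
  return distance(leftWrist, rightWrist) < maxSpan;
}
```

With such a guard, the per-frame cost of cross-hand finger checks is only paid when the hands are plausibly close enough.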

For training, see `addDebbugGraph()` and `drawPoints(points)` to estimate what joints and pose can be used for your custom gesture while in XR. This could be extended by making the threshold line movable while in XR.

Recorded demonstrations:

See also XR hand keyboard focus

Testing code https://forgejo.isvery.ninja/xrsh/xrsh/src/branch/gestures/src/gestures_tests.html#L60

Demonstration of a customized callback for the gesture detector:

```javascript
window.test = function( event ){ console.log('custom callback showcase:', event.detail) }
AFRAME.scenes[0].setAttribute('template-gesture-detector', 'callback:test;')
```

Alternatively, without customizing the callback, just target an element:

```javascript
AFRAME.scenes[0].setAttribute('template-gesture-detector', 'target:#demotarget;')
```

Integration with isoterminal

```javascript
AFRAME.scenes[0].querySelector("[isoterminal]").object3D.children[0].material.color
document.querySelector("[isoterminal]").components.isoterminal.term.send(e.detail.key)
document.querySelector("[isoterminal]").components.isoterminal.term.exec("ls")
```

Specifically .term.send() and .term.exec() on the isoterminal component.
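
For example, a gesture callback could forward events to the terminal along these lines (a sketch; the event shape, i.e. whether `event.detail` carries a `.key` or a `.command`, is an assumption, not the documented detector output):

```javascript
// Forward a detected gesture to the isoterminal component's term object.
// Assumption (hypothetical event shape): event.detail may carry a .key
// for keystroke-like gestures, or a .command to execute.
function forwardGestureToTerminal(term, event) {
  if (event.detail && event.detail.key) {
    term.send(event.detail.key);     // type into the terminal
  } else if (event.detail && event.detail.command) {
    term.exec(event.detail.command); // run a command, e.g. "ls"
  }
}
```

In XRSH the term object would come from `document.querySelector("[isoterminal]").components.isoterminal.term`, as in the snippet above.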

For more see https://xrsh.isvery.ninja/#Custom%20REPLs

Gesture template

The core of the suggestion remains at https://forgejo.isvery.ninja/xrsh/xrsh/src/branch/gestures/src/gestures.js#L168-L211, namely an AFrame component split in 3 parts:

  1. gesture detection in customizable throttle tick (50ms default frequency)
  2. gesture reception via event listening
  3. removable event listener to add/remove gestures
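
As a rough illustration of that three-part split, a skeleton could look like the following. The component name, schema, and event name here are assumptions, not the actual gestures.js code; `AFRAME.utils.throttleTick` is the standard A-Frame way to throttle a tick handler:

```javascript
// Sketch of the three-part component structure described above.
// Assumption: names and schema are illustrative, not the gestures.js API.
const gestureDetector = {
  schema: { throttle: { type: 'number', default: 50 } }, // ms, customizable

  init: function () {
    // 1. gesture detection in a customizable throttled tick (50 ms default)
    this.tick = AFRAME.utils.throttleTick(this.tick, this.data.throttle, this);
    // 2. gesture reception via event listening
    this.onGesture = (e) => console.log('gesture:', e.detail);
    this.el.addEventListener('gesture', this.onGesture);
  },

  tick: function () {
    // inspect joints here and emit on a match,
    // e.g. this.el.emit('gesture', { name: 'pinch' })
  },

  remove: function () {
    // 3. removable event listener, so gestures can be added/removed
    this.el.removeEventListener('gesture', this.onGesture);
  }
};

// only register when running inside an A-Frame page
if (typeof AFRAME !== 'undefined') {
  AFRAME.registerComponent('gesture-detector-sketch', gestureDetector);
}
```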

Gesture management

This concerns the management of gestures itself, while in XR.

See Gesture management by toggling gesture detection by the user

Debugging joint distances via the console is impractical. As mentioned earlier, there are more than 45 values just for pose (each pose being 2 triplets: x,y,z for position and roll,pitch,yaw for rotation), which is visually uninterpretable, if even computationally tractable on a standalone headset. Consequently, a debug graph visualizer has been built. The idea is to show the current value of each quantity over time. The goal is then to find upper/lower thresholds to filter values in or out. Despite this approach, it remains complicated to define the right detector, one that matches the right values only for the wanted gesture even while relative positions change. This means that some gestures rely on relative pose rather than absolute values. A practical example is the palm facing the user: this remains true even while the rotation of all joints changes.
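
The threshold idea behind the debug graph can be sketched as a simple band filter over sampled values (hypothetical helpers, not part of gestures.js):

```javascript
// Band filter, as used conceptually by the debug graph: a gesture value
// (e.g. a joint distance or angle) counts as a match only while it stays
// between a lower and an upper threshold.
function withinThresholds(value, lower, upper) {
  return value >= lower && value <= upper;
}

// Assumption: samples come from the throttled tick, e.g. every 50 ms.
// A gesture "holds" if every recent sample stayed inside the band.
function gestureHolds(samples, lower, upper) {
  return samples.every(v => withinThresholds(v, lower, upper));
}
```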

The AFRAME component positional-context filters on where a gesture is done, see https://forgejo.isvery.ninja/xrsh/xrsh/src/branch/gestures/src/gestures.js#L308-L334
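
The idea can be sketched as a bounding-box test on the gesture's position (hypothetical code; the actual component is at the link above):

```javascript
// Positional-context filtering: only accept a gesture when it happens
// inside a given region, e.g. a box in front of the user.
// Assumption: positions are plain {x, y, z} in scene coordinates.
function insideBox(pos, min, max) {
  return pos.x >= min.x && pos.x <= max.x &&
         pos.y >= min.y && pos.y <= max.y &&
         pos.z >= min.z && pos.z <= max.z;
}

// e.g. ignore pinches done below waist height:
// insideBox(wristPosition, {x:-1, y:0.9, z:-1}, {x:1, y:2, z:1})
```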

Other approaches

A: gesture-manager component uGlyphs integration (https://lig-microglyph.imag.fr)

    B: uGlyphs XRSH-integration demo + docs

C: train on gesture (via uGlyph description and "test")

    D: train on gesture XRSH-integration demo + docs

E: components which broadcast events for common tasks (speech-input-command, copy-paste e.g.)

    F: XRSH integration demo + docs for task 6E

G: event-remapping functionality component which allows rerouting of events (keyboard/mouse-event to ‘pinch’ for people without indexfinger, mouseclick to certain uGlyph e.g.)

    H: XRSH integration demo + docs for task 6G

Note

My notes on Tools gather what I know or want to know. Consequently they are not and will never be complete references. For this, official manuals and online communities provide much better answers.