Introduction

In a previous research study conducted by the authors, qualitative interviews revealed that visually impaired users often use the cameras on their smartphones. One notable instance of camera use that stood out was an application called Envision AI. This app provides multiple functionalities specifically designed for visually impaired users, including the ability to read any text (even handwritten) using the camera, speech-based descriptions of what is in the camera frame, and the ability to search for specific objects (e.g. a chair) in the surroundings. On further investigation, we found that even though the Envision AI app provides all of these functionalities, one primary aspect of using the camera, framing, remains hard for visually impaired users. Framing a shot is something sighted users often take for granted, mainly because of the real-time (visual) feedback loop available to them. Multiple applications such as CamScanner, Evernote, and Adobe Scan offer document scanning and OCR capabilities; using these, a visually impaired user could, in principle, "listen" to any document they desire. The technology exists and has been implemented in numerous instances. The primary roadblock that prevents visually impaired users from using this technology to its full extent is that they cannot properly frame the document of interest within the camera's view. The current workaround is to ask people nearby for assistance. The aim of this work is to give users the independence to perform this activity themselves.
Interaction Design

The first point of interaction is initializing the tracking for the frame. In order for the system to work on horizontal, vertical, or inclined surfaces, we decided to use an adaptive approach to recognize the plane on which the document of interest is kept. This is done in a novel way: the user places the phone flat against the surface of the document. Once this is done, the user taps anywhere on the screen to initialize tracking. At this point the orientation values of the phone on the document surface are recorded, and subsequent values are compared against them. At the same time, the user receives vibration feedback from the system indicating that the input has been received and that tracking has been enabled.
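As a rough illustration of this step, the following Python sketch records a reference orientation when the user taps the screen and compares later readings against it. The orientation values are assumed to come from the platform's motion API (e.g. CoreMotion on iOS or the rotation-vector sensor on Android); the function names are hypothetical and not the app's actual implementation.

    import math

    def capture_reference(yaw, pitch, roll):
        """Record the phone's orientation (radians) while it rests on the document.

        Called when the user taps the screen; the returned reference is what all
        subsequent orientation readings are compared against.
        """
        return {"yaw": yaw, "pitch": pitch, "roll": roll}

    def angular_deviation(reference, current):
        """Combine the per-axis differences into a single angular error (radians)."""
        return math.sqrt(
            sum((current[axis] - reference[axis]) ** 2 for axis in ("yaw", "pitch", "roll"))
        )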
Once the user has initialized the system, they proceed to lift the phone (or move it away) from the document surface. While doing this, they are provided with continuous auditory feedback about the orientation of the phone relative to the plane of the document. Non-speech auditory tones were chosen because research has shown that tonal modalities are faster, and in some instances more accurate, than speech modalities. Another important factor in this decision was the need for real-time feedback: it lets the user make corrections quickly, which is only possible in a real-time, closed feedback loop. If we give the user the ability to understand the state of the system and what steps they need to take, we can also allow more freedom and make the system less constrained.
In non-speech audio, there are multiple ways of representing information, including pitch, loudness, timbre, and simultaneous audio streams. Pitch was chosen because prior research has shown that pitch differences are one of the most effective methods to differentiate between values [1].
Pitch (or frequency) is proportionally mapped to how far the user is from the required orientation. By giving the user feedback about the distance from the required orientation, we essentially transform three variables (yaw, pitch, and roll) into a single feedback variable (tone). While multiple tonal variations could be used, research has shown that humans are most receptive to tones between MIDI note 20 and MIDI note 100, and it is exceptionally hard to comprehend more than two simultaneous tones. Thus, this data conversion technique helps make the feedback concise and more intuitive.
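A minimal sketch of this mapping is shown below. The cap on the angular error and the linear scaling are illustrative assumptions, not values reported in this work; only the MIDI 20 to MIDI 100 range comes from the text.

    import math

    MIDI_LOW, MIDI_HIGH = 20, 100        # tone range cited as well perceived
    MAX_DEVIATION = math.radians(45)     # assumed cap on the angular error that is sonified

    def deviation_to_frequency(deviation):
        """Map a single angular error (radians) to a tone frequency in Hz.

        The error is clamped, scaled linearly onto the MIDI range, and converted
        with the standard MIDI-to-frequency relation (MIDI 69 = A4 = 440 Hz).
        """
        d = min(abs(deviation), MAX_DEVIATION)
        midi = MIDI_LOW + (d / MAX_DEVIATION) * (MIDI_HIGH - MIDI_LOW)
        return 440.0 * 2.0 ** ((midi - 69) / 12.0)

Under this mapping, zero error corresponds to MIDI 20 (roughly 26 Hz) and the maximum error to MIDI 100 (roughly 2637 Hz); whether smaller error should map to the lower or the higher pitch is a design choice the sketch leaves open.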
In initial tests, we found that continuously playing tones were sometimes irritating to users. To overcome this, we added the ability to pause the auditory feedback by tapping the screen once tracking has been initialized; the audio remains paused for as long as the user holds their finger on the screen. If, for some reason, the user would like to restart the process by which frame tracking is enabled, a button on the lower part of the screen can be pressed to reset the process. Vibration feedback is used to indicate that an app reset has occurred.
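The touch handling can be summarized with a small state sketch. The event and vibration hooks below are hypothetical placeholders for the platform-specific calls, not the app's actual code.

    class FramingFeedback:
        """Sketch of the interaction states: tap to initialize, hold to mute, reset button."""

        def __init__(self):
            self.reference = None   # orientation recorded on the document surface
            self.muted = False      # True while the user holds a finger on the screen

        def on_tap(self, orientation):
            if self.reference is None:
                self.reference = dict(orientation)  # initialize tracking
                self.vibrate()                      # confirm that input was received

        def on_touch_down(self):
            if self.reference is not None:
                self.muted = True                   # pause tones while the finger is held

        def on_touch_up(self):
            self.muted = False                      # resume tones on release

        def on_reset_button(self):
            self.reference = None                   # restart the framing process
            self.vibrate()                          # signal that the reset occurred

        def vibrate(self):
            pass  # placeholder for the platform haptic call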
References

[1] Peres and Lane, "Auditory Graphs: The Effects of Redundant Dimensions and Divided Attention."
Team

Prabodh Sakhardande
Santiago Arconada Alvarez