RUEI-CHE CHANG

Human-AI Interaction | Accessibility

rueiche@umich.edu
Twitter
Google Scholar
Curriculum Vitae


I am a Ph.D. candidate in the Department of Computer Science at the
University of Michigan, advised by Anhong Guo.

My research focuses on building human-AI systems that enable blind or
visually impaired people to access the real world, specifically through
audio and captioning interfaces.

During my Ph.D., I interned at Meta Reality Labs. Prior to that, I earned a
Master's degree in Computer Science from Dartmouth College and a Bachelor's
degree in Electrical Engineering from National Cheng Kung University in
Taiwan.


NEWS

Oct 03, 2024  WorldScribe awarded Best Paper Award 🏆 at UIST’24!
Jul 04, 2024  WorldScribe is conditionally accepted to UIST’24
Jul 04, 2024  EditScribe and CustomAD are conditionally accepted to ASSETS’24


SELECTED PUBLICATIONS

 1. WorldScribe: Towards Context-Aware Live Visual Descriptions
    Ruei-Che Chang, Yuxuan Liu, and Anhong Guo
    In Proceedings of the 37th Annual ACM Symposium on User Interface Software
    and Technology (UIST ’24), Pittsburgh, PA, USA, 2024
    
    🏆 Best Paper Award
    
    Automated live visual descriptions can aid blind people in understanding
    their surroundings with autonomy and independence. However, providing
    descriptions that are rich, contextual, and just-in-time has been a
    long-standing challenge in accessibility. In this work, we develop
    WorldScribe, a system that generates automated live real-world visual
    descriptions that are customizable and adaptive to users’ contexts: (i)
    WorldScribe’s descriptions are tailored to users’ intents and prioritized
    based on semantic relevance. (ii) WorldScribe is adaptive to visual
    contexts, e.g., providing consecutive, succinct descriptions for dynamic
    scenes while presenting longer, more detailed ones for stable settings. (iii)
    WorldScribe is adaptive to sound contexts, e.g., increasing volume in noisy
    environments, or pausing when conversations start. Powered by a suite of
    vision, language, and sound recognition models, WorldScribe introduces a
    description generation pipeline that balances the tradeoffs between their
    richness and latency to support real-time use. The design of WorldScribe is
    informed by prior work on providing visual descriptions and a formative
    study with blind participants. Our user study and subsequent pipeline
    evaluation show that WorldScribe can provide real-time and fairly accurate
    visual descriptions to facilitate environment understanding that is adaptive
    and customized to users’ contexts. Finally, we discuss the implications and
    further steps toward making live visual descriptions more context-aware and
    humanized.
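
    The adaptive behavior described above can be made concrete with a short
    sketch. The snippet below is a hypothetical illustration of the control
    logic only: the model stubs (describe_fast, describe_rich, speak), the
    motion threshold, and the volume levels are all invented for this sketch
    and are not WorldScribe's actual implementation.

    from dataclasses import dataclass

    @dataclass
    class Frame:
        pixels: bytes          # raw camera frame
        motion_score: float    # 0.0 (static) .. 1.0 (highly dynamic)

    def describe_fast(frame: Frame) -> str:
        """Stub for a low-latency, succinct captioning model."""
        return "a person walking by"

    def describe_rich(frame: Frame) -> str:
        """Stub for a slower, more detailed vision-language model."""
        return "a person in a red jacket walking past a row of parked bicycles"

    def speak(text: str, volume: float) -> None:
        """Stub for text-to-speech output."""
        print(f"[vol={volume:.1f}] {text}")

    def live_description_step(frame: Frame, ambient_db: float,
                              conversation_active: bool) -> None:
        # Pause descriptions when a conversation starts.
        if conversation_active:
            return
        # Dynamic scenes get short, frequent captions; stable scenes get
        # longer, richer ones (the richness/latency tradeoff above).
        text = describe_fast(frame) if frame.motion_score > 0.5 else describe_rich(frame)
        # Raise output volume in noisy environments.
        volume = 0.9 if ambient_db > 60 else 0.5
        speak(text, volume)

    live_description_step(Frame(b"", motion_score=0.8), ambient_db=72,
                          conversation_active=False)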

 2. EditScribe: Non-Visual Image Editing with Natural Language Verification
    Loops
    Ruei-Che Chang, Yuxuan Liu, Lotus Zhang, and Anhong Guo
    In Proceedings of the 26th International ACM SIGACCESS Conference on
    Computers and Accessibility (ASSETS ’24), St. John’s, Newfoundland and
    Labrador, Canada, 2024
    
    
    
    Image editing is an iterative process that requires precise visual
    evaluation and manipulation for the output to match the editing intent.
    However, current image editing tools do not provide accessible interaction
    nor sufficient feedback for blind and low vision individuals to achieve this
    level of control. To address this, we developed EditScribe, a prototype
    system that makes image editing accessible using natural language
    verification loops powered by large multimodal models. Using EditScribe, the
    user first comprehends the image content through initial general and object
    descriptions, then specifies edit actions using open-ended natural language
    prompts. EditScribe performs the image edit, and provides four types of
    verification feedback for the user to verify the performed edit, including a
    summary of visual changes, AI judgement, and updated general and object
    descriptions. The user can ask follow-up questions to clarify and probe into
    the edits or verification feedback before performing another edit. In a
    study with ten blind or low-vision users, we found that EditScribe supported
    participants in performing and verifying image edit actions non-visually. We
    observed different prompting strategies from participants and their
    perceptions of the various types of verification feedback. Finally, we
    discuss the implications of leveraging natural language verification loops
    to make visual authoring non-visually accessible.
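
    The verification loop above reduces to a simple edit-then-verify cycle.
    The following Python sketch is an assumed illustration with stubbed model
    calls (describe_image, apply_edit, and verification_feedback are
    placeholders, not EditScribe's real API); it shows only the four feedback
    types and the iterative structure.

    def describe_image(image: str) -> str:
        """Stub for general and object descriptions from a multimodal model."""
        return f"description of {image}"

    def apply_edit(image: str, prompt: str) -> str:
        """Stub for an image edit driven by an open-ended language prompt."""
        return f"{image}+'{prompt}'"

    def verification_feedback(before: str, after: str, prompt: str) -> dict:
        """Stub producing the four feedback types named in the abstract."""
        return {
            "summary of visual changes": f"what changed after '{prompt}'",
            "AI judgement": "whether the edit appears to match the intent",
            "updated general description": describe_image(after),
            "updated object descriptions": describe_image(after),
        }

    def edit_session(image: str, prompts: list[str]) -> str:
        # The user first comprehends the image, then iterates:
        # edit -> verify -> (optional follow-up questions) -> next edit.
        print(describe_image(image))
        for prompt in prompts:
            edited = apply_edit(image, prompt)
            for kind, text in verification_feedback(image, edited, prompt).items():
                print(f"{kind}: {text}")
            image = edited
        return image

    edit_session("photo.jpg", ["brighten the sky", "remove the car on the left"])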

 3. SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality
    Awareness
    Ruei-Che Chang, Chia-Sheng Hung, Bing-Yu Chen, Dhruv Jain, and Anhong Guo
    In Proceedings of the 2024 ACM Designing Interactive Systems Conference (DIS
    ’24), IT University of Copenhagen, Denmark, 2024
    
    
    
    Mixed-reality (MR) soundscapes blend real-world sound with virtual audio
    from hearing devices, presenting intricate auditory information that is hard
    to discern and differentiate. This is particularly challenging for blind or
    visually impaired individuals, who rely on sounds and descriptions in their
    everyday lives. To understand how complex audio information is consumed, we
    analyzed online forum posts within the blind community, identifying
    prevailing challenges, needs, and desired solutions. We synthesized the
    results and propose SoundShift for increasing MR sound awareness, which
    includes six sound manipulations: Transparency Shift, Envelope Shift,
    Position Shift, Style Shift, Time Shift, and Sound Append. To evaluate the
    effectiveness of SoundShift, we conducted a user study with 18 blind
    participants across three simulated MR scenarios, where participants
    identified specific sounds within intricate soundscapes. We found that
    SoundShift increased MR sound awareness and minimized cognitive load.
    Finally, we developed three real-world example applications to demonstrate
    the practicality of SoundShift.
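
    Two of the six manipulations lend themselves to a compact illustration.
    The sketch below shows Transparency Shift as a gain rebalance between
    real-world passthrough and virtual audio, and Time Shift as delayed
    replay; both functions are hypothetical simplifications, not the paper's
    implementation.

    def transparency_shift(real: list[float], virtual: list[float],
                           real_gain: float) -> list[float]:
        """Rebalance real-world passthrough against virtual audio by
        scaling the real-world channel before mixing (Transparency Shift)."""
        return [real_gain * r + v for r, v in zip(real, virtual)]

    def time_shift(sound: list[float], delay_samples: int) -> list[float]:
        """Replay a sound later by prepending silence (Time Shift), e.g.,
        to resurface a missed notification after a conversation ends."""
        return [0.0] * delay_samples + sound

    # Duck the real-world channel to 40% while a virtual cue plays,
    # and replay a missed sound two samples later.
    mix = transparency_shift([0.2, 0.3], [0.5, 0.5], real_gain=0.4)
    delayed = time_shift([0.8, 0.6], delay_samples=2)
    print(mix, delayed)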

 4. OmniScribe: Authoring Immersive Audio Descriptions for 360° Videos
    Ruei-Che Chang, Chao-Hsien Ting, Chia-Sheng Hung, Wan-Chen Lee, Liang-Jin
    Chen, Yu-Tzu Chao, Bing-Yu Chen, and Anhong Guo
    In Proceedings of the 35th Annual ACM Symposium on User Interface Software
    and Technology (UIST ’22), Bend, OR, USA, 2022
    
    
    
    Blind people typically access videos via audio descriptions (AD) crafted by
    sighted describers who comprehend, select, and describe crucial visual
    content in the videos. 360° video is an emerging storytelling medium that
    enables immersive experiences that people may not otherwise encounter in
    everyday life. However, the omnidirectional nature of 360° videos makes it
    challenging for describers to perceive the holistic visual content and
    interpret spatial information that is essential to create immersive ADs for
    blind people. Through a formative study with a professional describer, we
    identified key challenges in describing 360° videos and iteratively designed
    OmniScribe, a system that supports the authoring of immersive ADs for 360°
    videos. OmniScribe uses AI-generated content-awareness overlays for
    describers to better grasp 360° video content. Furthermore, OmniScribe
    enables describers to author spatial AD and immersive labels for blind users
    to consume the videos immersively with our mobile prototype. In a study with
    11 professional and novice describers, we demonstrated the value of
    OmniScribe in the authoring workflow; and a study with 8 blind participants
    revealed the promise of immersive AD over standard AD for 360° videos.
    Finally, we discuss the implications of promoting 360° video accessibility.
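
    The spatial AD idea can be illustrated with a simple equal-power stereo
    panner that places a description at the azimuth of the object it
    describes. This is a hypothetical sketch under that assumption;
    OmniScribe's mobile prototype is not shown here.

    import math

    def pan_gains(azimuth_deg: float) -> tuple[float, float]:
        """Map an object's azimuth (-90 = hard left, +90 = hard right)
        to equal-power left/right gains."""
        azimuth = max(-90.0, min(90.0, azimuth_deg))
        theta = (azimuth + 90.0) / 180.0 * (math.pi / 2)
        return math.cos(theta), math.sin(theta)

    def spatialize(samples: list[float], azimuth_deg: float) -> list[tuple[float, float]]:
        """Render a mono description as a stereo stream panned toward
        the described object's direction."""
        left, right = pan_gains(azimuth_deg)
        return [(s * left, s * right) for s in samples]

    # A description anchored 45 degrees to the user's right:
    print(spatialize([0.5, 0.25], azimuth_deg=45.0))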

© Copyright 2024 Ruei-Che Chang. Last updated: October 17, 2024.