
Improving the Scalability of Moderated Usability Testing for Voice Products
NDA Notice
This case study has been anonymized to protect proprietary information. Company name, internal tools, and identifying details have been generalized. The work, process, and outcomes accurately reflect my contributions and responsibilities.
📌 TL;DR - UX Research Case Study
Role: UX Researcher
Company: Mid-sized B2B technology company (anonymized)
Focus: Improving the efficiency and reliability of moderated usability testing
Summary
Led end-to-end improvements to a legacy usability testing workflow by consolidating tools, refining moderation practices, and building lightweight solutions when existing tools fell short, balancing speed, research integrity, and real-world constraints to deliver more consistent insights.
Key contributions
- Evaluated and selected usability testing platforms
- Streamlined the moderator script to improve session flow
- Designed and built a custom Wizard-of-Oz tool under tight timelines
- Solved technical constraints affecting live session quality
- Applied AI selectively to support (not replace) researcher judgement
Methods
Moderated usability testing | Wizard-of-Oz | Task-based studies | Qualitative synthesis

The Story
Role
UX Researcher (End-to-End Ownership)
Company
Mid-sized B2B technology company building voice-first, AI-enabled products
Timeline
Several months (iterative)
Methods
Moderated usability testing, Wizard-of-Oz, task-based studies, qualitative analysis, AI-assisted synthesis
The Challenge
When I joined this organization, usability testing existed—but it wasn’t designed to scale.
Voice-based usability studies relied on a legacy Wizard-of-Oz (WoZ) tool inherited through acquisition. Access was limited to a small number of individuals, workflows were fragmented across multiple platforms, and running a single study required coordination between multiple people.
As usability testing was increasingly positioned as part of the company’s service offerings, these constraints became a risk. The challenge wasn’t just improving individual studies—it was building a sustainable usability testing capability that could support growing demand without sacrificing research quality.
Why This Was Hard
Voice-based usability testing sits outside most mainstream UX tooling and guidance. The majority of usability platforms are optimized for websites, apps, or physical products—not conversational or voice experiences.
At the same time:
- AI tools were rapidly emerging, but not all were appropriate for research
- Automation needed to be balanced with experimental control
- Study integrity mattered more than novelty
- Timelines and access constraints were real
Ultimately, there was no established playbook to follow.
Goals
I defined success as:
- Reducing time from recruitment to insights
- Enabling a single researcher to run end-to-end studies
- Consolidating tools where possible
- Introducing AI only where it improved efficiency without compromising validity
- Improving consistency across studies
The Process
Step 1: Rethinking the Research Platform
The existing research workflow relied on several disconnected tools for recruitment, moderation, scripting, and analysis. Each handoff added friction and extended timelines.
I evaluated multiple third-party usability platforms, focusing on how well they supported moderated research, not just automated insights. Evaluation criteria included:
- Ease of use
- Insight quality
- Cost and scalability
- Flexibility for different study types
- Maturity of AI features
Rather than treating AI as a requirement in its own right, I assessed whether AI capabilities actually reduced researcher workload while preserving control.
Outcome
I recommended a moderated usability platform that consolidated recruitment, testing, and analysis while offering selective AI support. This reduced tool fragmentation and improved repeatability across studies.
Step 2: Streamlining the Moderator Script
The moderator script had evolved into a lengthy, fragmented document. Sessions often took nearly 10 minutes of setup before participants reached their first task, which occasionally forced tasks to be skipped due to time constraints.
I redesigned the moderator script to:
- Remove unnecessary content
- Improve clarity and flow
- Live in a single, editable view
- Support real-time note-taking
Outcome
- Pre-task setup time reduced to ~5 minutes
- Tasks consistently completed within session time
- Improved session pacing and participant experience
Step 3: A Deliberate Decision About AI
I explored using AI-driven audio tools to automate Wizard-of-Oz responses. On paper, this promised hands-off execution and consistency.
In practice, testing revealed:
- Variability in responses
- Script deviations
- Occasional hallucinations
This posed a clear risk to study validity. I made a deliberate decision not to use AI for this portion of the research, prioritizing methodological integrity over automation.
Step 4: Building a Custom Wizard-of-Oz Tool
Three days before a scheduled usability study, access to the existing WoZ tool became unavailable. Canceling or delaying the study would have disrupted timelines and stakeholder expectations.
Instead, I built a lightweight WoZ tool from scratch.
Using HTML/CSS and iterative collaboration with an AI coding assistant, I created a reliable, soundboard-style tool that allowed precise control over audio playback during live sessions.
Key features included:
- Step-by-step audio playback
- Visual prompts and expected participant responses
- Time tracking per audio clip
- Clear visual hierarchy to reduce moderator error
The tool was designed, built, tested, and used successfully within two days.
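To make the shape of the tool concrete, here is a minimal sketch of how a soundboard like this can be built as a single static page. All file names, labels, and expected responses below are placeholders rather than the actual study script; the sketch only illustrates the pattern of one row per scripted step with a play control, the expected participant response, and a per-clip timer.

```html
<!-- Hypothetical sketch: one row per scripted step, with a play button,
     the expected participant response, and a running timer per clip.
     File names and labels are placeholders, not the actual study script. -->
<ol id="steps">
  <li>
    <button class="play" data-src="clips/step-01-greeting.mp3">Play: Greeting</button>
    <span class="expected">Expected: participant asks to check an order</span>
    <span class="elapsed">0.0s</span>
  </li>
  <!-- ...one <li> per scripted clip... -->
</ol>

<script>
  // Wire up each button: play its clip and show elapsed time so the
  // moderator can pace the session against the script.
  document.querySelectorAll('.play').forEach((button) => {
    button.addEventListener('click', () => {
      const audio = new Audio(button.dataset.src);
      const elapsed = button.parentElement.querySelector('.elapsed');
      const startedAt = performance.now();
      const timer = setInterval(() => {
        elapsed.textContent = ((performance.now() - startedAt) / 1000).toFixed(1) + 's';
      }, 100);
      audio.addEventListener('ended', () => clearInterval(timer));
      audio.play();
    });
  });
</script>
```

Because everything lives in one static page, a tool like this can run locally in any modern browser with no backend, which is part of what made a two-day turnaround realistic.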
After the usability study, the tool was further refined, improving its accessibility and user interface.
Troubleshooting
Solving Audio Constraints in Live Sessions
The moderated testing platform did not support direct system audio sharing. My initial workaround (playing audio through external speakers into a microphone) worked but reduced audio quality.
I diagnosed the technical limitation, collaborated with an AI assistant, and implemented an audio-routing solution that enabled direct desktop audio playback into live sessions.
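For context on what an audio-routing fix can look like: if a meeting platform only captures microphone input, clip playback can be directed to a virtual audio device that the platform then treats as the mic. The sketch below is one possible approach, assuming a Chromium-based browser (which supports HTMLMediaElement.setSinkId) and an installed virtual audio driver; the device label is a placeholder, and the routing setup I actually used depended on the specific platform and operating system.

```html
<script>
  // One possible approach (illustrative, not the exact setup used):
  // direct clip playback to a specific output device, e.g. a virtual audio
  // cable that the meeting platform sees as a microphone input.
  // setSinkId() is supported in Chromium-based browsers; the device label
  // passed in is a placeholder and depends on the installed virtual driver.
  async function playThroughDevice(src, deviceLabel) {
    // Device labels are only exposed after the user grants audio permission.
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    stream.getTracks().forEach((track) => track.stop()); // permission prompt only

    const devices = await navigator.mediaDevices.enumerateDevices();
    const output = devices.find(
      (d) => d.kind === 'audiooutput' && d.label.includes(deviceLabel)
    );

    const audio = new Audio(src);
    if (output) {
      await audio.setSinkId(output.deviceId); // route playback to that device
    }
    await audio.play();
  }

  // Hypothetical usage:
  // playThroughDevice('clips/step-01-greeting.mp3', 'Virtual Cable');
</script>
```

Whatever the specific mechanism, the gain comes from removing the speaker-to-microphone hop, so participants hear the clips directly rather than room audio picked up by a mic.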
Outcome
-
Improved audio clarity and professionalism
-
Reduced technical risk during moderation

Outcomes
Impact
This work transformed voice-based usability testing from a fragile, multi-person effort into a more scalable, researcher-owned capability.
Results
- Reduced time from study execution to insights
- Enabled single-researcher study execution
- Improved consistency and session flow
- Preserved research integrity while selectively leveraging AI
- Sparked internal interest in formalizing the WoZ tool for broader use
What I learned...
- AI can accelerate research - but only when applied thoughtfully
- Tooling decisions directly affect research quality and consistency
- Constraint-driven problem solving often produces more resilient systems
- Building research infrastructure can be as impactful as delivering insights
Why this matters
This project reflects how I approach UX research:
- Methodology first
- Pragmatic about tools
- Focused on scale and usability
- Willing to build solutions when none exist
