Showing posts with label accessibility.

Tuesday, December 17, 2024

Add browser speech input & output to your app

One of the amazing benefits of modern machine learning is that computers can reliably turn text into speech, or transcribe speech into text, across multiple languages and accents. We can then use those capabilities to make our web apps more accessible for anyone who has a situational, temporary, or chronic issue that makes typing difficult. That describes so many people - for example, a parent holding a squirmy toddler in their arms, an athlete with a broken arm, or an individual with Parkinson's disease.

There are two approaches we can use to add speech capabilities to our apps:

  1. Use the built-in browser APIs: the SpeechRecognition API and SpeechSynthesis API.
  2. Use a cloud-based service, like the Azure Speech API.

Which one to use? The great thing about the browser APIs is that they're free and available in most modern browsers and operating systems. The drawback is that they're often not as powerful and flexible as cloud-based services, and the speech output often sounds much more robotic. There are also a few niche browser/OS combos where the built-in APIs don't work, like SpeechRecognition in Microsoft Edge on an M1 Mac. That's why we decided to add both options to azure-search-openai-demo, to let developers decide for themselves.

In this post, I'm going to show you how to add speech capabilities using the free built-in browser APIs, since free APIs are often easier to get started with, and it's important to do what we can to improve the accessibility of our apps. The GIF below shows the end result, a chat app with both speech input and output buttons:

GIF of speech input and output for a chat app

All of the code described in this post is part of openai-chat-vision-quickstart, so you can grab the full code yourself after seeing how it works.

Speech input with SpeechRecognition API

To make it easier to add a speech input button to any app, I'm wrapping the functionality inside a custom HTML element, SpeechInputButton. First I construct the speech input button element with an instance of the SpeechRecognition API, making sure to use the browser's preferred language if one is set:

class SpeechInputButton extends HTMLElement {
  constructor() {
    super();
    this.isRecording = false;
    // Some browsers (like Safari) only expose the API under a webkit prefix
    const SpeechRecognition =
      window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
      this.dispatchEvent(
        new CustomEvent("speech-input-error", {
          detail: { error: "SpeechRecognition not supported" },
        })
      );
      return;
    }
    this.speechRecognition = new SpeechRecognition();
    // Use the browser's preferred language (userLanguage is a legacy IE fallback)
    this.speechRecognition.lang = navigator.language || navigator.userLanguage;
    this.speechRecognition.interimResults = false;
    this.speechRecognition.continuous = true;
    this.speechRecognition.maxAlternatives = 1;
  }

Then I define the connectedCallback() method that will be called whenever this custom element has been added to the DOM. When that happens, I define the inner HTML to render a button and attach event listeners for both mouse and keyboard events. Since we want this to be fully accessible, keyboard support is important.

connectedCallback() {
  this.innerHTML = `
        <button class="btn btn-outline-secondary" type="button" title="Start recording (Shift + Space)">
            <i class="bi bi-mic"></i>
        </button>`;
  this.recordButton = this.querySelector('button');
  this.recordButton.addEventListener('click', () => this.toggleRecording());
  document.addEventListener('keydown', this.handleKeydown.bind(this));
}
  
handleKeydown(event) {
  if (event.key === 'Escape') {
    this.abortRecording();
  } else if (event.key === ' ' && event.shiftKey) { // Shift + Space
    event.preventDefault();
    this.toggleRecording();
  }
}
  
toggleRecording() {
  if (this.isRecording) {
    this.stopRecording();
  } else {
    this.startRecording();
  }
}
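Note that connectedCallback attaches the keydown listener to the document, and .bind() returns a new function each time it's called, so the bound reference must be saved somewhere if we ever want to remove the listener. Here's a sketch of how that cleanup could look in a disconnectedCallback (my addition for illustration, not part of the original element):

// In connectedCallback, store the bound handler before registering it:
//   this.boundKeydown = this.handleKeydown.bind(this);
//   document.addEventListener('keydown', this.boundKeydown);

disconnectedCallback() {
  // Remove the document-level shortcut listener when the element leaves the DOM
  document.removeEventListener('keydown', this.boundKeydown);
}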

The majority of the code is in the startRecording function. It sets up a listener for the "result" event from the SpeechRecognition instance, which contains the transcribed text. It also sets up a listener for the "end" event, which is triggered either automatically after a few seconds of silence (in some browsers) or when the user ends the recording by clicking the button. Finally, it sets up a listener for any "error" events. Once all listeners are ready, it calls start() on the SpeechRecognition instance and styles the button to be in an active state.

startRecording() {
  if (this.speechRecognition == null) {
    this.dispatchEvent(
      new CustomEvent("speech-input-error", {
        detail: { error: "SpeechRecognition not supported" },
      })
    );
    return; // Bail out here, since the calls below would fail on a null instance
  }

  this.speechRecognition.onresult = (event) => {
    let input = "";
    for (const result of event.results) {
      input += result[0].transcript;
    }
    this.dispatchEvent(
      new CustomEvent("speech-input-result", {
        detail: { transcript: input },
      })
    );
  };

  this.speechRecognition.onend = () => {
    this.isRecording = false;
    this.renderButtonOff();
    this.dispatchEvent(new Event("speech-input-end"));
  };

  this.speechRecognition.onerror = (event) => {
    if (this.speechRecognition) {
      this.speechRecognition.stop();
      if (event.error == "no-speech") {
        this.dispatchEvent(
          new CustomEvent("speech-input-error", {
            detail: { error: "No speech was detected. Please check your system audio settings and try again." },
          })
        );
      } else if (event.error == "language-not-supported") {
        this.dispatchEvent(
          new CustomEvent("speech-input-error", {
            detail: { error: "The selected language is not supported. Please try a different language." },
          })
        );
      } else if (event.error != "aborted") {
        this.dispatchEvent(
          new CustomEvent("speech-input-error", {
            detail: { error: "An error occurred while recording. Please try again: " + event.error },
          })
        );
      }
    }
  };

  this.speechRecognition.start();
  this.isRecording = true;
  this.renderButtonOn();
}

If the user stops the recording using the keyboard shortcut or button click, we call stop() on the SpeechRecognition instance. At that point, anything the user had said will be transcribed and become available via the "result" event.

stopRecording() {
  if (this.speechRecognition) {
    this.speechRecognition.stop();
  }
}

Alternatively, if the user presses the Escape keyboard shortcut, we instead call abort() on the SpeechRecognition instance, which stops the recording and does not send any previously untranscribed speech over.

abortRecording() {
  if (this.speechRecognition) {
    this.speechRecognition.abort();
  }
}

Once the custom HTML element is fully defined, we register it with the desired tag name, speech-input-button:

customElements.define("speech-input-button", SpeechInputButton);

To use the custom speech-input-button element in a chat application, we add it to the HTML for the chat form:


  <speech-input-button></speech-input-button>
  <input id="message" name="message" class="form-control form-control-sm" type="text" rows="1"></input>

Then we attach an event listener for the custom events dispatched by the element, and we update the input text field with the transcribed text:

const speechInputButton = document.querySelector("speech-input-button");
const messageInput = document.getElementById("message"); // The text input from the form above
speechInputButton.addEventListener("speech-input-result", (event) => {
    messageInput.value += " " + event.detail.transcript.trim();
    messageInput.focus();
});
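The element also dispatches speech-input-error events, as seen in startRecording above, so it's worth listening for those too. A minimal sketch, where showError is a hypothetical helper that displays a message in your app's UI:

speechInputButton.addEventListener("speech-input-error", (event) => {
    // Surface the human-readable message dispatched by the custom element
    showError(event.detail.error);
});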

You can see the full custom HTML element code in speech-input.js and the usage in index.html. There's also a fun pulsing animation for the button's active state in styles.css.

Speech output with SpeechSynthesis API

Once again, to make it easier to add a speech output button to any app, I'm wrapping the functionality inside a custom HTML element, SpeechOutputButton. When defining the custom element, we specify an observed attribute named "text", to store whatever text should be turned into speech when the button is clicked.

class SpeechOutputButton extends HTMLElement {
  static observedAttributes = ["text"];
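Declaring observedAttributes tells the browser to invoke attributeChangedCallback whenever the "text" attribute changes. This element simply reads the attribute when the button is clicked, so it doesn't strictly need that callback, but here's a sketch of how it could react to changes (my addition, not code from the original element):

attributeChangedCallback(name, oldValue, newValue) {
  // If the text changes mid-playback, stop the now-stale speech
  if (name === "text" && this.isPlaying && oldValue !== newValue) {
    this.stopSpeech();
  }
}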

In the constructor, we check to make sure the SpeechSynthesis API is supported, and remember the browser's preferred language for later use.

constructor() {
  super();
  this.isPlaying = false;
  const SpeechSynthesis = window.speechSynthesis || window.webkitSpeechSynthesis;
  if (!SpeechSynthesis) {
    this.dispatchEvent(
      new CustomEvent("speech-output-error", {
        detail: { error: "SpeechSynthesis not supported" }
    }));
    return;
  }
  this.synth = SpeechSynthesis;
  this.lngCode = navigator.language || navigator.userLanguage;
}

When the custom element is added to the DOM, I define the inner HTML to render a button and attach mouse and keyboard event listeners:

connectedCallback() {
    this.innerHTML = `
            <button class="btn btn-outline-secondary" type="button">
                <i class="bi bi-volume-up"></i>
            </button>`;
    this.speechButton = this.querySelector("button");
    this.speechButton.addEventListener("click", () =>
      this.toggleSpeechOutput()
    );
    document.addEventListener('keydown', this.handleKeydown.bind(this));
}
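That connectedCallback references a handleKeydown method that isn't shown in the excerpt. Based on the Escape behavior described later in this post, a minimal version would look something like this sketch:

handleKeydown(event) {
  // Stop any in-progress speech when the user presses Escape
  if (event.key === 'Escape') {
    this.stopSpeech();
  }
}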

The majority of the code is in the toggleSpeechOutput function. If the speech is not yet playing, it creates a new SpeechSynthesisUtterance instance, passes it the "text" attribute, and sets the language and audio properties. It attempts to use a voice that's optimal for the desired language, but falls back to "en-US" if none is found. It attaches event listeners for the start and end events, which change the button's style to look either active or inactive. Finally, it tells the SpeechSynthesis API to speak the utterance.

toggleSpeechOutput() {
    if (!this.isConnected) {
      return;
    }
    const text = this.getAttribute("text");
    if (this.synth != null) {
      if (this.isPlaying || text === "") {
        this.stopSpeech();
        return;
      }

      // Create a new utterance and play it.
      const utterance = new SpeechSynthesisUtterance(text);
      utterance.lang = this.lngCode;
      utterance.volume = 1;
      utterance.rate = 1;
      utterance.pitch = 1;

      let voice = this.synth
        .getVoices()
        .filter((voice) => voice.lang === this.lngCode)[0];
      if (!voice) {
        voice = this.synth
          .getVoices()
          .filter((voice) => voice.lang === "en-US")[0];
      }
      utterance.voice = voice;

      utterance.onstart = () => {
        this.isPlaying = true;
        this.renderButtonOn();
      };

      utterance.onend = () => {
        this.isPlaying = false;
        this.renderButtonOff();
      };
      
      this.synth.speak(utterance);
    }
  }

When the user no longer wants to hear the speech output, indicated either via another press of the button or by pressing the Escape key, we call cancel() from the SpeechSynthesis API.

stopSpeech() {
  if (this.synth) {
    this.synth.cancel();
    this.isPlaying = false;
    this.renderButtonOff();
  }
}

Once the custom HTML element is fully defined, we register it with the desired tag name, speech-output-button:

customElements.define("speech-output-button", SpeechOutputButton);

To use this custom speech-output-button element in a chat application, we construct it dynamically each time we receive a full response from the LLM, and call setAttribute to pass in the text to be spoken:

const speechOutput = document.createElement("speech-output-button");
speechOutput.setAttribute("text", answer);
messageDiv.appendChild(speechOutput);

You can see the full custom HTML element code in speech-output.js and the usage in index.html. This button also uses the same pulsing animation for the active state, defined in styles.css.

Acknowledgments

I want to give a huge shout-out to John Aziz for his amazing work adding speech input and output to the azure-search-openai-demo, as that was the basis for the code I shared in this blog post.

Wednesday, July 10, 2024

Playwright and Pytest parametrization for responsive E2E tests

I am a big fan of Playwright, a tool for end-to-end testing that was originally built for Node.js but is also available in Python and other languages.

Playwright 101

Playwright tests interact with a running app through a page object and make assertions about its state. For example, here's a simplified test for the chat functionality of our open-source RAG solution:

from playwright.sync_api import Page, expect


def test_chat(page: Page, live_server_url: str):
    page.goto(live_server_url)
    expect(page).to_have_title("Azure OpenAI + AI Search")
    expect(page.get_by_role("heading", name="Chat with your data")).to_be_visible()

    page.get_by_placeholder("Type a new question").click()
    page.get_by_placeholder("Type a new question").fill("Whats the dental plan?")
    page.get_by_role("button", name="Submit question").click()

    expect(page.get_by_text("Whats the dental plan?")).to_be_visible()

We then run that test using pytest and the pytest-playwright plugin on headless browsers, typically chromium, though other browsers are supported. We can run the tests locally and in our GitHub Actions workflows.
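For reference, a typical local invocation looks like this (assuming pytest-playwright is installed):

python3 -m pytest --browser chromium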

Viewport testing

We recently improved the responsiveness of our RAG solution, with different font sizing and margins in smaller viewports, plus a burger menu:

Screenshot of RAG chat at a small viewport size

Fortunately, Playwright makes it easy to change the viewport of a browser window, via the set_viewport_size function:

page.set_viewport_size({"width": 600, "height": 1024})

I wanted to make sure that all the functionality was still usable at all supported viewport sizes. I didn't want to write a new test for every viewport size, however. So I wrote this parametrized pytest fixture:

import pytest
from playwright.sync_api import Page


@pytest.fixture(params=[(480, 800), (600, 1024), (768, 1024), (992, 1024), (1024, 768)])
def sized_page(page: Page, request):
    width, height = request.param
    page.set_viewport_size({"width": width, "height": height})
    yield page

Then I modified the most important tests to take the sized_page fixture instead:

def test_chat(sized_page: Page, live_server_url: str):
    page = sized_page
    page.goto(live_server_url)

Since our website now has a burger menu at smaller viewport sizes, I also had to add an optional click() on that menu:

if page.get_by_role("button", name="Toggle menu").is_visible():
    page.get_by_role("button", name="Toggle menu").click()

Now we can confidently say that all our functionality works at the supported viewport sizes, and if we have any regressions, we can add additional tests or viewport sizes as needed. So cool!

Monday, August 7, 2023

Accessibility snapshot testing for Python web apps (Part 2)

In my previous post, I showed a technique that used axe-core along with pytest and Playwright to make sure that pages in your web apps have no accessibility violations. That's a great approach if it works for you, but realistically, most webpages have a non-zero number of accessibility violations, due to limited engineering resources or dependence on third-party libraries. Do we just give up on being able to test them? No! Instead, we use a different approach: snapshot testing.


Snapshot testing

Snapshot testing is a way to test that the output of a function matches a previously saved snapshot.

Here's an example using pytest-snapshot:

def emojify(s):
    return s.replace('love', '❤️').replace('python', '🐍')

def test_function_output_with_snapshot(snapshot):
    snapshot.assert_match(emojify('I love python'), 'snapshot.txt')

The first time we run the test, it will save the output to a file. We check the generated snapshots into source control. Then, the next time anyone runs that test, it will compare the output to the saved snapshot.
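With pytest-snapshot, that first run is done with the --snapshot-update flag:

python3 -m pytest --snapshot-update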


Snapshot testing + axe-core

First, a big kudos to Michael Wheeler from UMich and his talk on Automated Web Accessibility Testing for the idea of using snapshot testing with axe-core.

Here's the approach: We save snapshots of the axe-core violations and check them into source control. That way, our tests will let us know when new violations come up, and our snapshot files keep track of which parts of the codebase need accessibility improvements.

To make it as easy as possible, I made a pytest plugin that combines Playwright, axe-core, and snapshot testing. You can install it along with the browser engines like so:

python3 -m pip install pytest-axe-playwright-snapshot
python3 -m playwright install --with-deps

Here's an example test from a Flask app:

from flask import url_for
from playwright.sync_api import Page

def test_index(page: Page, axe_pytest_snapshot):
    page.goto(url_for("index", _external=True))
    axe_pytest_snapshot(page)

Running the snapshot tests

First run: We specify the --snapshot-update argument to tell the plugin to save the snapshots to file.

python3 -m pytest --snapshot-update

That saves a file like this one to a directory named after the test and browser engine, like snapshots/test_violations/chromium/snapshot.txt:

color-contrast (serious) : 2
empty-heading (minor) : 1
link-name (serious) : 1

Subsequent runs: The plugin compares the new snapshot to the saved snapshot, and raises an assertion error if they differ.

python3 -m pytest

Let's look through some example outputs next.


Test results

New accessibility issue 😱

If there are violations in the new snapshot that weren't in the old, the test will fail with a message like this:

E  AssertionError: New violations found: html-has-lang (serious)
E  That's bad news! 😱 Either fix the issue or run `pytest --snapshot-update` to update the snapshots.
E  html-has-lang - Ensures every HTML document has a lang attribute
E    URL: https://dequeuniversity.com/rules/axe/4.4/html-has-lang?application=axeAPI
E    Impact Level: serious
E    Tags: ['cat.language', 'wcag2a', 'wcag311', 'ACT']
E    Elements Affected:
E    1) Target: html
E       Snippet: <html>
E       Messages:
E       * The <html> element does not have a lang attribute

Fixed accessibility issue 🎉

If there are fewer violations in the new snapshot than in the old one, the test will also fail, but with a happy message like this:

E  AssertionError: Old violations no longer found: html-has-lang (serious).
E  That's good news! 🎉 Run `pytest --snapshot-update` to update the snapshots.

CI/CD integration

Once you've got snapshot testing set up, it's a great idea to run it on every potential change to your codebase.

Here's an example of a failing GitHub Actions run due to an accessibility violation, using this workflow file:

Screenshot of GitHub Actions workflow that shows test failure due to accessibility violations
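As a rough sketch, such a workflow job could look like this (the action versions and requirements file name here are illustrative assumptions, not the repo's actual file):

name: Accessibility tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: python3 -m pip install -r requirements-dev.txt
      - run: python3 -m playwright install --with-deps
      - run: python3 -m pytest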

Fixing accessibility issues

What should you do if you realize you've introduced an accessibility violation, or if you are tasked with reducing existing violations? You can read the reports from pytest to get the gist of the accessibility violations, but it's often easier to use a browser extension that runs the same axe-core rules.

Also consider an IDE extension like the VS Code Axe Linter.


Don't rely on automation to find all issues

I think it's really important for web apps to measure their accessibility violations, so that they can avoid introducing accessibility regressions and eventually resolve existing violations. However, automated tools can only go so far: according to the axe-core docs, axe-core finds about 57% of WCAG issues automatically. There can still be many issues with your site, like with tab order or keyboard access.

In addition to automation, please consider other ways to discover issues, such as paying for an external accessibility audit, engaging with your disabled users, and hiring engineers with disabilities.

Friday, July 21, 2023

Automated accessibility audits for Python web apps (Part 1)

We all know by now the importance of accessibility for webpages. But it's surprisingly easy to create inaccessible web experiences, and unknowingly deploy those to production. How do we check for accessibility issues? One approach is to install a browser extension like Accessibility Insights and run that on changed webpages. I love that extension, but I don't trust myself to remember to run it. So I've been working on tools for running accessibility tests on Python web apps, which I'll present at next week's North Bay Python.

In this post, I'm going to share a way to automatically verify that a Python web app has *zero* accessibility issues -- or at least, zero issues that can be caught by automated testing. One should always do additional manual tests (like keyboard tests) and work with disabled users to discover all issues.

Setup

Here's what we'll need:

  • Playwright: A tool for end-to-end testing in various browser engines. Similar to Selenium, if you're familiar with that.
  • Axe-core: An accessibility engine for automated Web UI testing, built with JavaScript. Used by many other tools, like the Accessibility Insights browser extension.
  • axe-playwright-python: A package that I developed to connect the two together, running axe-core on Playwright pages and returning the results in useful formats.

For this example, I'll also use Flask, Pytest, and pytest-flask to run a local server during testing. However, you could easily use other frameworks (like Django and unittest).
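For instance, with pytest-django, an equivalent test could use its live_server fixture, which exposes the server URL (a sketch, assuming a route at the site root):

def test_a11y(live_server, page: Page):
    page.goto(live_server.url + "/")
    results = Axe().run(page)
    assert results.violations_count == 0, results.generate_report()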

The test

Here's the full code for a test of the four main routes on my personal website (pamelafox.org):

from axe_playwright_python.sync_playwright import Axe
from flask import url_for
from playwright.sync_api import Page

def test_a11y(app, live_server, page: Page):
    page.goto(url_for("home_page", _external=True))
    results = Axe().run(page)
    assert results.violations_count == 0, results.generate_report()

Let's break that down:

  • def test_a11y(app, live_server, page: Page):

    The app and live_server fixtures take care of starting up the app at a local URL. The app fixture comes from my conftest.py, and the live_server fixture comes from pytest-flask. (A sketch of such a conftest.py follows this list.)

  • page.goto(url_for("home_page", _external=True))

    I use the Page fixture from Playwright to navigate to a route from my app.

  • results = Axe().run(page)

    Using the Axe object from my axe-playwright-python package, I run axe-core on the page.

  • assert results.violations_count == 0, results.generate_report()

    I assert that the violations count is zero, but I also provide a human-friendly report as the assertion message. That way, if any violations were found, I'll see the report in the pytest output.
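As promised above, here's a minimal sketch of a conftest.py that could provide the app fixture that pytest-flask's live_server depends on. The import path and factory name are illustrative assumptions; see the repository for the real fixture:

# conftest.py
import pytest

from pamelafox_site import create_app  # hypothetical import path for the app factory


@pytest.fixture
def app():
    # pytest-flask's live_server fixture requires an `app` fixture returning the Flask app
    app = create_app()
    app.config.update({"TESTING": True})
    return app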

For the full code, see the tests/ folder in the GitHub repository.

The output

When there are no violations found, the test passes! 🎉

When there are any violations found, the pytest output looks like this:

    def test_a11y(app, live_server, page: Page):
        axe = Axe()
        page.goto(url_for("home_page", _external=True))
        results = axe.run(page)
>       assert results.violations_count == 0, results.generate_report()
E       AssertionError: Found 1 accessibility violations:
E         Rule Violated:
E         image-alt - Ensures <img> elements have alternate text or a role of none or presentation
E             URL: https://dequeuniversity.com/rules/axe/4.4/image-alt?application=axeAPI
E             Impact Level: critical
E             Tags: ['cat.text-alternatives', 'wcag2a', 'wcag111', 'section508', 'section508.22.a', 'ACT']
E             Elements Affected:
E               1)      Target: img
E                       Snippet: <img src="http://blog.pamelafox.org/bla.jpg">
E                       Messages:
E                       * Element does not have an alt attribute
E                       * aria-label attribute does not exist or is empty
E                       * aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty
E                       * Element has no title attribute
E                       * Element's default semantics were not overridden with role="none" or role="presentation"
E       assert 1 == 0

I can then read the report, look for the HTML matching the snippet, and make it accessible. In the case above, there's an img tag missing an alt attribute. Once I fix that, the test passes.
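In that example, the fix is as simple as adding an alt attribute to the img tag (the description text here is made up for illustration):

<img src="http://blog.pamelafox.org/bla.jpg" alt="Photo from one of Pamela's talks">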

Checking more routes

To check additional routes, I can either add more tests or parametrize the current test like so:

import pytest


@pytest.mark.parametrize("route", ["home_page", "projects", "talks", "interviews"])
def test_a11y(app, live_server, page: Page, route: str):
    axe = Axe()
    page.goto(url_for(route, _external=True))
    results = axe.run(page)
    assert results.violations_count == 0, results.generate_report()

For testing a route where user interaction causes a change in the page, I can use Playwright to interact with the page and then run Axe after the interaction. Here's an example of that from another app:

def test_quiz_submit(page: Page, snapshot, fake_quiz):
    page.goto(url_for("quizzes.quiz", quiz_id=fake_quiz.id, _external=True))
    page.get_by_label("Your name:").click()
    page.get_by_label("Your name:").fill("Pamela")
    page.get_by_label("Ada Lovelace").check()
    page.get_by_label("pip").check()
    page.get_by_role("button", name="Submit your score!").click()
    expect(page.locator("#score")).to_contain_text("You scored 25% on the quiz.")
    results = Axe().run(page)
    assert results.violations_count == 0, results.generate_report()

Is perfection possible?

Fortunately, I was able to fix all of the accessibility violations for my very small personal website. However, many webpages are much bigger and more complicated, and it may not be possible to address all the violations. Is it possible to run tests like this in that situation? Yes, but we need to do something like snapshot testing: tracking the violations over time and ensuring that changes don't introduce additional violations. I'll show an approach for that in Part 2 of this blog post series. Stay tuned!