Tom Yahav

Posted on Apr 7 • Edited on Apr 17

How to Build an AI Image Caption Generator in Vue 3

#vue #ai #frontend #programming

Introduction & Context

Images are everywhere—but they often lack proper descriptions. Whether for accessibility, SEO, or UX, adding meaningful captions is essential. But what if we could generate them automatically using AI?

In this article, we’ll build a Vue 3 component that uses the Hugging Face Inference API to generate captions for uploaded images. You’ll learn how to create a file uploader, show an image preview, and request a caption from an image-to-text model with a single API call.

Perfect for frontend devs looking to bring intelligent features into visual interfaces!

Goals and What You’ll Learn

By the end of this tutorial, you’ll be able to:
Build a drag-and-drop or file upload component
Send images to an AI image captioning model
Display the uploaded image preview and its AI-generated caption
Handle loading and error states gracefully
Enhance accessibility with alt-text or voiceover features

Tech Stack

Vue 3
Axios for API calls
Hugging Face Inference API (BLIP or similar image-to-text model)
Tailwind CSS (optional for styling)

Prerequisites

Create a free account on Hugging Face.
Generate an API token from your account settings.

Store it in an .env file:
VITE_HUGGINGFACE_API_KEY=your_token_here
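
One thing to keep in mind: Vite inlines every variable prefixed with VITE_ into the client bundle, so this token is readable by anyone who loads the page. That is fine for local experiments, but not for a public deployment. Here is a minimal sketch of a guard that fails early when the token is missing. Note that buildAuthHeaders is a hypothetical helper made up for this example, not part of Vue, Axios, or the Hugging Face API.

```javascript
// Hypothetical helper: builds the headers for the Inference API request.
// In the component, `token` would be import.meta.env.VITE_HUGGINGFACE_API_KEY.
function buildAuthHeaders(token, contentType = 'application/octet-stream') {
  if (typeof token !== 'string' || token.trim() === '') {
    // Fail before any network request is made
    throw new Error('Missing API token: set VITE_HUGGINGFACE_API_KEY in your .env file')
  }
  return {
    Authorization: `Bearer ${token.trim()}`,
    'Content-Type': contentType
  }
}
```

You could pass the result straight to Axios as the headers option, which keeps the "did I forget the .env file?" check in one place.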

Building the Component: ImageCaptioner.vue

<template>
  <div class="max-w-lg mx-auto p-4 border rounded bg-white">
    <h2 class="text-lg font-semibold mb-2">AI Image Caption Generator</h2>

    <input type="file" accept="image/*" @change="handleUpload" class="mb-4" />

    <div v-if="imageUrl" class="mb-4">
      <img :src="imageUrl" alt="Uploaded preview" class="rounded shadow" />
    </div>

    <button
      @click="generateCaption"
      :disabled="!imageBlob || loading"
      class="bg-blue-600 text-white px-4 py-2 rounded disabled:opacity-50"
    >
      {{ loading ? 'Generating...' : 'Generate Caption' }}
    </button>

    <div v-if="caption" class="mt-4 p-2 border rounded bg-gray-100">
      <strong>Caption:</strong>
      <p>{{ caption }}</p>
    </div>
  </div>
</template>

<script setup>
import { ref } from 'vue'
import axios from 'axios'

const imageUrl = ref(null)
const imageBlob = ref(null)
const caption = ref('')
const loading = ref(false)

const handleUpload = (e) => {
  const file = e.target.files[0]
  if (!file) return

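  // Release the previous preview URL so replaced images are not kept in memory
  if (imageUrl.value) URL.revokeObjectURL(imageUrl.value)

  imageBlob.value = file
  imageUrl.value = URL.createObjectURL(file)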
  caption.value = ''
}

const generateCaption = async () => {
  if (!imageBlob.value) return

  loading.value = true
  caption.value = ''

  try {
    // The image is sent as the raw request body, so no FormData wrapper is needed
    const response = await axios.post(
      'https://api-inference.huggingface.co/models/Salesforce/blip-image-captioning-base',
      imageBlob.value,
      {
        headers: {
          Authorization: `Bearer ${import.meta.env.VITE_HUGGINGFACE_API_KEY}`,
          'Content-Type': 'application/octet-stream'
        }
      }
    )

    caption.value = response.data[0]?.generated_text || 'No caption returned.'
  } catch (err) {
    console.error('Error:', err)
    caption.value = 'An error occurred. Try again.'
  } finally {
    loading.value = false
  }
}
</script>
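
The component above writes its error message into caption, so a failure is rendered under the "Caption:" label. If you want to handle error states more gracefully, one option is to parse the response and the error separately. The sketch below assumes a successful response looks like [{ generated_text: '...' }] and that a 503 status means the model is still loading. Both are assumptions about the API's behavior, and extractCaption and describeError are hypothetical names.

```javascript
// Hypothetical helper: pulls the caption out of an Inference API response body.
// Assumes a successful response looks like [{ generated_text: '...' }].
function extractCaption(data) {
  const text = Array.isArray(data) ? data[0]?.generated_text : undefined
  return typeof text === 'string' && text.trim() !== '' ? text.trim() : null
}

// Hypothetical helper: turns an Axios-style error into a user-facing message.
// Axios sets err.response when the server replied with an error status.
function describeError(err) {
  const status = err?.response?.status
  if (status === 401 || status === 403) return 'The API token was rejected. Check your .env file.'
  if (status === 429) return 'Rate limit reached. Wait a moment and try again.'
  // Assumption: 503 is returned while the model is warming up
  if (status === 503) return 'The model is still loading. Try again in a few seconds.'
  if (!err?.response) return 'Network error. Check your connection and try again.'
  return 'An error occurred. Try again.'
}
```

In the component, you could store the result of describeError in its own ref (for example errorMessage) and render it in a separate element, so caption only ever holds a real caption.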

Accessibility and UX Improvements

Image Preview: Helps users confirm they uploaded the correct file.
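
The preview only helps if the selected file is actually an image, and the accept attribute on the input is a hint rather than a guarantee. Here is a rough sketch of a check you could run before creating the preview. pickImageFile is a hypothetical helper, and the 5 MB default is an arbitrary assumption, not a limit documented by the API.

```javascript
// Hypothetical helper: returns the first acceptable image from a FileList-like
// object (e.target.files for an <input>, e.dataTransfer.files for a drop event).
// The 5 MB default is an arbitrary assumption, not an API limit.
function pickImageFile(files, maxBytes = 5 * 1024 * 1024) {
  const file = files?.[0]
  if (!file) return { file: null, error: 'No file selected.' }
  if (!String(file.type).startsWith('image/')) {
    return { file: null, error: 'Please choose an image file.' }
  }
  if (file.size > maxBytes) {
    return { file: null, error: 'That image is too large.' }
  }
  return { file, error: null }
}
```

Because it only needs something indexable, the same function works for drag-and-drop: add @dragover.prevent and @drop.prevent handlers to a wrapper element and pass e.dataTransfer.files from the drop handler.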

Disabled Button State: Prevents repeated submissions.

Alt Text: Use the generated caption as an alt attribute for better accessibility.

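💡 Tip: Add aria-live="polite" to the caption container so screen readers announce the caption when it arrives. Live regions are announced most reliably when the container is already in the DOM before its text changes, so consider v-show instead of v-if on that element.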
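
To make the alt text idea concrete, here is a small sketch that derives alt text from the caption and falls back to a generic label while no caption exists. toAltText is a hypothetical helper, and the 125-character cap is a common guideline rather than a hard rule.

```javascript
// Hypothetical helper: derives alt text from the generated caption.
// Falls back to a generic label while no caption is available.
function toAltText(caption, fallback = 'Uploaded preview', maxLength = 125) {
  const text = typeof caption === 'string' ? caption.trim() : ''
  if (text === '') return fallback
  // Keep alt text short; 125 characters is a common guideline, not a hard rule
  return text.length > maxLength
    ? text.slice(0, maxLength - 1).trimEnd() + '…'
    : text
}
```

In the template this would be bound as :alt="toAltText(caption)". This assumes caption only ever holds a real caption. The component above also writes its error message into caption, so that case would need a separate ref.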

Feature Ideas for Enhancement

Add a copy caption button.
Automatically insert the caption as alt for the image.
Add text-to-speech (TTS) using the Web Speech API.
Support multiple captions or translation with AI.
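
The first and third ideas can be sketched with standard browser APIs: navigator.clipboard.writeText for copying and speechSynthesis.speak for text-to-speech. The helper names copyCaption and speakCaption are hypothetical, and the browser objects are passed in as parameters (with the globals as defaults) so the functions degrade quietly where the APIs are missing.

```javascript
// Hypothetical helper: copies the caption with the async Clipboard API.
// Resolves to false when the API is unavailable or permission is denied.
async function copyCaption(text, clipboard = globalThis.navigator?.clipboard) {
  if (!text || !clipboard) return false
  try {
    await clipboard.writeText(text)
    return true
  } catch {
    return false
  }
}

// Hypothetical helper: reads the caption aloud with the Web Speech API.
// Returns false when speech synthesis is not supported.
function speakCaption(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  if (!text || !synth || !Utterance) return false
  synth.cancel() // stop any caption that is still being read
  synth.speak(new Utterance(text))
  return true
}
```

Both can be wired to buttons next to the caption, for example @click="copyCaption(caption)" and @click="speakCaption(caption)". The Clipboard API generally requires a secure context (HTTPS or localhost).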

Links and References

Hugging Face Inference API
BLIP Model
Vue 3 Docs
Axios

Summary and Conclusion

You’ve just built an intelligent, user-friendly image captioning tool powered by Hugging Face and Vue 3. This type of component is a great example of how frontend developers can leverage AI to improve usability and accessibility in real-world apps.

With just a few tools, you can turn static interfaces into intelligent, responsive experiences—and delight your users in the process.

Call to Action / Community Engagement

What other AI features would you want to see in Vue components? Have you tried AI for image recognition, tagging, or alt-text generation?

Drop your thoughts, feedback, or experiments in the comments below!
