An AI-Powered Multimodal Shopping Assistant
A shopping list curator that uses AI to understand your shopping needs from text, voice, or images and returns matching product recommendations.
- Voice Input: Speak your shopping list naturally
- Image Recognition: Upload photos of handwritten lists or receipts
- AI-Powered Processing: Google Gemini AI breaks down complex requests
- Smart Recommendations: Semantic search for the best product matches
- Modern UI: Responsive React interface with Tailwind CSS
- Real-time Processing: Fast OCR and recommendation engine
- Framework: Django REST API
- Database: MongoDB with vector embeddings
- AI Models:
  - Google Gemini 2.0 Flash for natural language processing
  - SentenceTransformers for semantic similarity
  - Mistral OCR for image text extraction
- Search: Cosine similarity matching for product recommendations
- Framework: React with modern hooks
- Styling: Tailwind CSS for responsive design
- Features: Voice recognition, image upload, real-time updates
- Python 3.8+
- Node.js 16+
- MongoDB
- API Keys for:
  - Google Gemini AI
  - Mistral OCR
- Clone and navigate to backend: `cd shopping-curator`
- Create virtual environment:
  - `python -m venv venv`
  - `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
- Install dependencies: `pip install -r requirements.txt`
- Configure environment variables: create a `.env` file in `shopping-curator/` containing:
  - `GOOGLE_API_KEY=your_google_api_key`
  - `MISTRAL_API_KEY=your_mistral_api_key`
  - `MONGO_URI=mongodb://localhost:27017/`
- Run Django migrations: `python manage.py migrate`
- Start the backend server: `python manage.py runserver`
- Navigate to frontend: `cd shopping-ui`
- Install dependencies: `npm install`
- Start the React app: `npm start`
- Access the application: open http://localhost:3000
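
Before starting the backend, it can help to confirm that the three variables from the `.env` example are actually visible to the process. The following is a minimal sketch, not part of the project: the helper name `missing_settings` is hypothetical, and it assumes the `.env` values have already been loaded into the process environment by whatever mechanism the backend uses.

```python
import os

# Variable names taken from the .env example in the backend setup steps.
REQUIRED_VARS = ("GOOGLE_API_KEY", "MISTRAL_API_KEY", "MONGO_URI")


def missing_settings(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper: pass a dict to check arbitrary settings,
    or nothing to check the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_settings()` with no arguments inspects `os.environ`; an empty list means all three variables are set.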
- Type your shopping items in the text area
- Click "Get Recommendations" to process
- View categorized product suggestions
- Click the microphone button
- Speak your shopping list naturally
- The app will transcribe and process your speech
- Click "Upload Image" button
- Select a photo of your handwritten list or receipt
- OCR will extract text and process items
Processes shopping list input and returns recommendations.
Request Body:
- items (string): Text input of shopping items
- image (file, optional): Image file for OCR processing
Response:
    {
      "items": [
        {
          "name": "milk",
          "recommendations": [
            {
              "name": "Great Value Whole Milk",
              "brand": "Great Value",
              "price": 3.48,
              "category": "Dairy"
            }
          ]
        }
      ]
    }

- Input Processing: Text, voice, or image input collection
- OCR Extraction: Mistral OCR converts images to text
- AI Breakdown: Google Gemini processes complex requests into individual items
- Embedding Generation: SentenceTransformers creates vector representations
- Similarity Search: Cosine similarity matching against product database
- Recommendation Ranking: Top 5 products returned per item
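
The similarity search and ranking steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it uses pure Python instead of NumPy to stay dependency-free, toy 3-dimensional vectors stand in for SentenceTransformers embeddings, and the `embedding` field name, the function names, and the catalog entries are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def top_matches(query_vec, products, k=5):
    """Return the k products whose embeddings are most similar to query_vec.

    `products` is a list of dicts carrying an "embedding" key
    (hypothetical field name) next to the product fields.
    """
    ranked = sorted(
        products,
        key=lambda p: cosine_similarity(query_vec, p["embedding"]),
        reverse=True,
    )
    # Drop the embedding so each result resembles a recommendation entry.
    return [
        {key: value for key, value in p.items() if key != "embedding"}
        for p in ranked[:k]
    ]


# Toy catalog: made-up products and vectors, for illustration only.
catalog = [
    {"name": "Great Value Whole Milk", "category": "Dairy", "embedding": [0.9, 0.1, 0.0]},
    {"name": "Sourdough Bread", "category": "Bakery", "embedding": [0.1, 0.9, 0.0]},
    {"name": "Oat Milk", "category": "Dairy", "embedding": [0.7, 0.2, 0.1]},
]
best = top_matches([1.0, 0.0, 0.0], catalog, k=2)
```

With these toy vectors, `best` holds the two dairy products, with the whole milk ranked first. The default `k=5` mirrors the "top 5 products per item" step; a catalog of real size would more likely use vectorised NumPy operations or a database-side vector index.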
├── shopping-curator/ # Django Backend
│ ├── api/
│ │ ├── views.py # Main API endpoints
│ │ ├── chain.py # LangChain AI integration
│ │ └── models.py # Database models
│ ├── mysite/ # Django configuration
│ ├── mistral_ocr_inference.py # OCR processing
│ └── manage.py # Django management
├── shopping-ui/ # React Frontend
│ ├── src/
│ │ ├── App.js # Main React component
│ │ ├── components/ # Reusable components
│ │ └── index.js # App entry point
│ └── package.json # Dependencies
└── requirements.txt # Python dependencies
- Django - Web framework
- MongoDB - Document database
- Google Gemini AI - Natural language processing
- Mistral OCR - Image text extraction
- SentenceTransformers - Text embeddings
- LangChain - AI orchestration
- NumPy - Mathematical operations
- React 19 - UI framework
- Tailwind CSS - Utility-first styling
- Axios - HTTP client
- Web Speech API - Voice recognition
- Environment variables for API keys
- CORS configuration for secure cross-origin requests
- Input validation and error handling
- Secure file upload handling
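
As one illustration of what upload validation might look like, the sketch below checks an image's file name and size before it is passed to OCR. The allowed extensions, the size limit, and the `validate_upload` helper are assumptions made for this example; the project's actual rules are not documented here.

```python
import os

# Hypothetical limits; adjust to whatever the backend really accepts.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MiB


def validate_upload(filename, size_bytes):
    """Return an error message for a rejected upload, or None if it passes."""
    # basename() discards any directory components a client may have sent.
    name = os.path.basename(filename)
    ext = os.path.splitext(name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"Unsupported file type: {ext or 'none'}"
    if size_bytes <= 0:
        return "File is empty"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "File is too large"
    return None
```

An extension check alone does not prove a file is a real image, so a production setup would likely also inspect the file contents.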
- Configure production database
- Set environment variables
- Use WSGI server (Gunicorn)
- Set up reverse proxy (Nginx)
- Build production bundle: `npm run build`
- Deploy to static hosting (Netlify, Vercel)
- Configure API endpoint URLs
- Google for Gemini AI API
- Mistral AI for OCR capabilities
- Hugging Face for SentenceTransformers
- Open source community for the amazing tools and libraries