This project is a web scraping and caching application built with Flask and Scrapy. It fetches product information from two different websites, caches the data, and serves it through a web interface. The cache is updated periodically to ensure the data remains fresh.
- Scrapes product data (images and titles) from two websites.
- Caches scraped data locally to reduce redundant network requests.
- Background thread for periodic cache updates.
- Flask-based web interface to display the data.
- Organized and modular codebase for maintainability.
- Python 3.7+
- pip (Python package manager)
- Clone the repository:
  git clone https://github.com/aryala7/BigBossScraper.git
  cd BigBossScraper
- Set up a virtual environment (optional but recommended):
  python3 -m venv venv
  source venv/bin/activate  # On Windows, use venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Start the application:
  python app.py
- Open your browser and navigate to http://127.0.0.1:5000/ to view the application.
.
├── app.py # Main Flask application
├── cache_manager.py # Handles data scraping and caching logic
├── templates/ # HTML templates for the web interface
├── static/ # Static assets (CSS, JS, images)
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Displays a random sample of 20 products from each category.
- Displays all products for a specified category (glas or fliese).
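The selection logic behind these two views can be sketched independently of Flask. This is a minimal illustration, not the code in app.py: the function names `sample_products` and `products_for_category`, the `CATEGORIES` tuple, and the cache layout (a dict mapping each category to a list of product dicts) are all assumptions made for the example.

```python
import random

# Assumed layout: category name -> list of product dicts (title, image).
CATEGORIES = ("glas", "fliese")
SAMPLE_SIZE = 20


def sample_products(cache, k=SAMPLE_SIZE, rng=random):
    """Return up to k randomly chosen products for each category."""
    result = {}
    for category in CATEGORIES:
        products = cache.get(category, [])
        # random.sample raises ValueError when k exceeds the population,
        # so cap k at the number of products actually available.
        result[category] = rng.sample(products, min(k, len(products)))
    return result


def products_for_category(cache, category):
    """Return every cached product for one category, or None if unknown."""
    if category not in CATEGORIES:
        return None  # a view would typically translate this into a 404
    return list(cache.get(category, []))


if __name__ == "__main__":
    demo_cache = {
        "glas": [{"title": f"Glas {i}", "image": f"glas_{i}.jpg"} for i in range(50)],
        "fliese": [{"title": f"Fliese {i}", "image": f"fliese_{i}.jpg"} for i in range(5)],
    }
    sampled = sample_products(demo_cache)
    print({name: len(items) for name, items in sampled.items()})
    print(len(products_for_category(demo_cache, "glas")))
```

Capping the sample size keeps the index view working even when a category has fewer than 20 cached products, for example right after a partial scrape.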
- Cache expiration: The cache is updated every 15 minutes by default. You can modify CACHE_EXPIRATION in cache_manager.py to change this interval.
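One way an expiring cache with a background refresh thread might be structured is sketched below. It is an illustration only and may differ from what cache_manager.py does: the `TimedCache` class, the `start_background_refresh` helper, and the choice of seconds as the unit for `CACHE_EXPIRATION` are assumptions.

```python
import threading
import time

# Assumption: the interval is expressed in seconds (15 minutes here).
CACHE_EXPIRATION = 15 * 60


class TimedCache:
    """Holds the result of fetch() and re-fetches once it is too old."""

    def __init__(self, fetch, expiration=CACHE_EXPIRATION, clock=time.monotonic):
        self._fetch = fetch            # callable that performs the scrape
        self._expiration = expiration
        self._clock = clock            # injectable so tests can fake time
        self._lock = threading.Lock()
        self._data = None
        self._updated_at = None

    def is_expired(self):
        with self._lock:
            if self._updated_at is None:
                return True
            return self._clock() - self._updated_at >= self._expiration

    def refresh(self):
        data = self._fetch()           # slow work happens outside the lock
        with self._lock:
            self._data = data
            self._updated_at = self._clock()

    def get(self):
        if self.is_expired():
            self.refresh()
        with self._lock:
            return self._data


def start_background_refresh(cache, interval, stop_event):
    """Refresh the cache every `interval` seconds until stop_event is set."""

    def loop():
        # Event.wait returns False on timeout and True once the event is set.
        while not stop_event.wait(interval):
            cache.refresh()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Running the thread as a daemon means it does not block interpreter shutdown, and waiting on an `Event` rather than calling `time.sleep` lets the loop stop promptly when asked.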
- Create a new branch:
  git checkout -b feature-name
- Make your changes and commit them:
  git commit -am "Add new feature"
- Push the branch and create a pull request:
  git push origin feature-name
Currently, there are no automated tests included. You can add tests using frameworks like pytest or unittest.
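As a starting point, a first unittest module could look like the sketch below. The function under test, `normalize_product`, is a hypothetical stand-in defined inside the example; it is not part of this project, and real tests would import functions from app.py or cache_manager.py instead.

```python
import unittest


def normalize_product(raw):
    """Hypothetical helper: trim the title and require an image URL."""
    title = raw.get("title", "").strip()
    image = raw.get("image")
    if not title or not image:
        return None
    return {"title": title, "image": image}


class NormalizeProductTest(unittest.TestCase):
    def test_strips_whitespace_from_title(self):
        product = normalize_product({"title": "  Glas 1 ", "image": "a.jpg"})
        self.assertEqual(product, {"title": "Glas 1", "image": "a.jpg"})

    def test_rejects_product_without_image(self):
        self.assertIsNone(normalize_product({"title": "Glas 1"}))


if __name__ == "__main__":
    unittest.main()
```

Saved under a name such as test_products.py (an arbitrary choice), the module can be run with `python -m unittest`, and pytest collects `unittest.TestCase` classes as well.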
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Feel free to open issues or submit pull requests with improvements or fixes.
- Flask: For providing a lightweight web framework.
- Scrapy: For its powerful web scraping capabilities.
- Requests: For handling HTTP requests.