Flask Scrapy Caching Application (BigBossScraper)

This project is a web scraping and caching application built with Flask and Scrapy. It fetches product information from two different websites, caches the data, and serves it through a web interface. The cache is updated periodically to ensure the data remains fresh.

Features

Scrapes product data (images and titles) from two websites.
Caches scraped data locally to reduce redundant network requests.
Background thread for periodic cache updates.
Flask-based web interface to display the data.
Organized and modular codebase for maintainability.
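
The background update listed above could be sketched roughly as follows. This is not the project's actual code: `start_periodic_refresh`, `REFRESH_INTERVAL_SECONDS`, and the `refresh` callback are hypothetical names used only for illustration, and the 15-minute value mirrors the default mentioned under Configuration.

```python
import threading

# Hypothetical constant; 15 minutes mirrors the documented default interval.
REFRESH_INTERVAL_SECONDS = 15 * 60


def start_periodic_refresh(refresh, interval_seconds=REFRESH_INTERVAL_SECONDS):
    """Run refresh() right away, then again every interval_seconds.

    The work happens in a daemon thread so it never blocks the web server.
    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop_event = threading.Event()

    def worker():
        while not stop_event.is_set():
            try:
                refresh()
            except Exception as exc:
                # Keep the thread alive if a single refresh fails.
                print(f"cache refresh failed: {exc}")
            # wait() returns True as soon as stop_event is set.
            if stop_event.wait(interval_seconds):
                break

    thread = threading.Thread(target=worker, daemon=True, name="cache-refresher")
    thread.start()
    return stop_event
```

Waiting on an `Event` rather than calling `time.sleep` lets the loop stop promptly when the application shuts down.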

Prerequisites

Python 3.7+
pip (Python package manager)

Installation

Clone the repository:

git clone https://github.com/aryala7/BigBossScraper.git
cd BigBossScraper

Set up a virtual environment (optional but recommended):

python3 -m venv venv
source venv/bin/activate # On Windows, use venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```

Usage

Start the application:
```
python application.py
```
Open your browser and navigate to http://127.0.0.1:5000/ to view the application.

Project Structure

.
├── application.py         # Main Flask application
├── cache_manager.py       # Handles data scraping and caching logic
├── templates/             # HTML templates for the web interface
├── static/                # Static assets (CSS, JS, images)
├── requirements.txt       # Python dependencies
└── readme.md              # Project documentation

Endpoints

`/`

Displays a random sample of 20 products from each category.
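
As a rough illustration of the sampling described above, the sketch below picks up to 20 products per category. The function name `sample_products` and the dictionary shape (category name mapped to a list of products) are assumptions, not taken from the project's code.

```python
import random

SAMPLE_SIZE = 20  # matches the sample size described for this endpoint


def sample_products(products_by_category, sample_size=SAMPLE_SIZE):
    """Return up to sample_size randomly chosen products for each category.

    products_by_category is assumed to map a category name to a list.
    """
    return {
        # min() avoids a ValueError when a category has fewer items than requested.
        category: random.sample(products, min(sample_size, len(products)))
        for category, products in products_by_category.items()
    }
```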

`/category/<name>`

Displays all products for the specified category. Supported values for `<name>` are `glas` (glass) and `fliese` (tile).
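
A minimal sketch of the lookup behind this endpoint, kept framework-free so it runs on its own. `VALID_CATEGORIES` and `products_for_category` are hypothetical names, and how the real application handles an unknown category is not documented here.

```python
VALID_CATEGORIES = ("glas", "fliese")  # the two category names this endpoint accepts


def products_for_category(name, products_by_category):
    """Return all products for a category, or None if the name is unknown.

    A route handler could translate None into a 404 response.
    """
    if name not in VALID_CATEGORIES:
        return None
    # A known category with nothing cached yet yields an empty list.
    return products_by_category.get(name, [])
```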

Configuration

Cache expiration: The cache is refreshed every 15 minutes by default. To change the interval, edit CACHE_EXPIRATION in cache_manager.py.
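
One way such an expiration check could work is sketched below, under stated assumptions: that CACHE_EXPIRATION is expressed in seconds and that the cache is a JSON file on disk. The helper names `is_cache_fresh` and `load_or_refresh` and the `scrape` callback are hypothetical and may not match cache_manager.py.

```python
import json
import os
import time

# Assumed to be in seconds; 15 minutes is the documented default.
CACHE_EXPIRATION = 15 * 60


def is_cache_fresh(path, max_age=CACHE_EXPIRATION, now=None):
    """Return True if the cache file exists and is younger than max_age seconds."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < max_age


def load_or_refresh(path, scrape, max_age=CACHE_EXPIRATION):
    """Load cached JSON if it is fresh; otherwise call scrape() and rewrite the file."""
    if is_cache_fresh(path, max_age):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    data = scrape()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False)
    return data
```

Using the file's modification time as the timestamp avoids storing a separate "last updated" field next to the data.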

Development

Adding a New Feature

Create a new branch:
```
git checkout -b feature-name
```
Make your changes and commit them (the -a flag stages only files Git already tracks, so run git add first for any new files):
```
git commit -am "Add new feature"
```
Push the branch and create a pull request:
```
git push origin feature-name
```

Testing

The project currently has no automated tests. You can add them with a framework such as pytest or unittest.
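
A first test could look something like the sketch below, which uses unittest from the standard library. `normalize_product` is a hypothetical helper defined here only so the example is self-contained; a real test would import a function from the project instead.

```python
import unittest


def normalize_product(raw):
    """Hypothetical helper: keep only a stripped title and an image URL."""
    return {"title": raw.get("title", "").strip(), "image": raw.get("image", "")}


class NormalizeProductTest(unittest.TestCase):
    def test_strips_whitespace_from_title(self):
        product = normalize_product({"title": "  Glass panel \n", "image": "a.jpg"})
        self.assertEqual(product, {"title": "Glass panel", "image": "a.jpg"})

    def test_missing_fields_default_to_empty_strings(self):
        self.assertEqual(normalize_product({}), {"title": "", "image": ""})
```

Saved in a file whose name starts with test_, this can be run with python -m unittest from the project root.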

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributions

Contributions are welcome! Feel free to open issues or submit pull requests with improvements or fixes.

Acknowledgments

Flask: For providing a lightweight web framework.
Scrapy: For its powerful web scraping capabilities.
Requests: For handling HTTP requests.
