feat: Automatic Scraping & Parsing of Academic Calendar #199

@amaydixit11

Feature Request: Automatic Scraping & Parsing of Academic Calendar


Problem Statement

Current Issue:
The academic calendar is published on the IIT Bhilai website as a PDF or HTML table, and admins currently have to copy the dates into our Calendar component by hand. This is time-consuming and error-prone, and it has to be repeated every semester or year.


Proposed Solution

✅ Automatic Academic Calendar Scraping

✅ Integration with Existing Calendar UI

  • New events should automatically appear in the existing Calendar component with category-appropriate icons and color coding.
  • Admins should have the ability to review and edit scraped data before publishing (optional improvement).

Technical Notes

| Layer | Requirement |
| --- | --- |
| Scraper | A cron-triggered script |
| Data Parsing | Match patterns like: "holiday", "exam", "commencement", "registration" |
| Database | Add relevant fields to calendar events table |
| Retry & Fail-Safe | If scraping fails, continue using last known data |
| Optional | Cache PDF locally with versioning for historical comparison |
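
To make the Data Parsing and Retry & Fail-Safe requirements concrete, here is a minimal sketch, not a final design. The names `CATEGORY_PATTERNS`, `classify_event`, and `refresh_events`, the `"other"` fallback category, the JSON cache file, and the `title` key on each event are all assumptions made for illustration; the real events table and scraper may look different.

```python
import json
import re
from pathlib import Path

# Hypothetical keyword -> category map built from the patterns listed above.
CATEGORY_PATTERNS = {
    "holiday": re.compile(r"\bholiday", re.IGNORECASE),
    "exam": re.compile(r"\bexam", re.IGNORECASE),
    "commencement": re.compile(r"\bcommence", re.IGNORECASE),
    "registration": re.compile(r"\bregistration", re.IGNORECASE),
}


def classify_event(title: str) -> str:
    """Return the first category whose pattern matches, else "other"."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(title):
            return category
    return "other"


def refresh_events(scrape, cache_path: Path) -> list[dict]:
    """Run scrape(); if it fails or returns nothing, reuse the last cached data."""
    try:
        events = scrape()
        if not events:
            raise ValueError("scraper returned no events")
    except Exception:
        # Fail-safe: keep serving the last known data instead of an empty calendar.
        if cache_path.exists():
            return json.loads(cache_path.read_text(encoding="utf-8"))
        return []
    for event in events:
        event["category"] = classify_event(event["title"])
    # Only a successful scrape overwrites the cache.
    cache_path.write_text(json.dumps(events), encoding="utf-8")
    return events
```

The cron job would call `refresh_events` with whatever scraper function is eventually written. Matching on the first hit keeps the rules easy to audit, but titles that mention two keywords (for example a holiday during exam week) would need an explicit priority order.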

Possible Libraries

  • BeautifulSoup4 / lxml (for HTML scraping)
  • PyMuPDF or pdfplumber if format changes back to PDF
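
As a rough illustration of the HTML path with BeautifulSoup4, the sketch below pulls rows out of the first table on a page. The column order (date first, event title second), the function name `parse_calendar_table`, and the sample markup are assumptions for the example only; the actual calendar page has not been inspected here and may be structured differently.

```python
from bs4 import BeautifulSoup


def parse_calendar_table(html: str) -> list[dict]:
    """Extract {"date", "title"} rows from the first <table> in the page."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    events = []
    for row in table.find_all("tr"):
        cells = [cell.get_text(" ", strip=True) for cell in row.find_all("td")]
        # Header rows use <th>, so they yield no <td> cells and are skipped.
        if len(cells) >= 2:
            events.append({"date": cells[0], "title": cells[1]})
    return events


# Made-up sample markup, only to show the expected input shape.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Event</th></tr>
  <tr><td>01 Aug 2025</td><td>Commencement of classes</td></tr>
  <tr><td>15 Aug 2025</td><td>Independence Day (Holiday)</td></tr>
</table>
"""

events = parse_calendar_table(SAMPLE_HTML)
```

Dates are left as raw strings on purpose, since the format used on the institute page is not known; normalizing them would be a separate step once real data is available.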

Alternatives Considered

  1. Manual Upload of Events
    ❌ Still requires regular admin effort
  2. Direct API Feed from Institute Website
    ❌ No such API currently exists

Mockups & Visual Examples

  • Check out the Figma design mockups in the README.
