DatAasee (0.5)

DatAasee centralizes and interlinks distributed library/research metadata into an API‑first union catalog.

A Metadata-Lake for Libraries

Licenses: MIT (additionally CC-BY for openapi.yaml)

Function: Metadata-Lake, Metadata Catalog, Metadata Aggregator, Union Catalog

Audience: University Libraries, Research Libraries, Academic Libraries, Scientific Libraries

Documentation

Getting Started (Deployment)

Quick Start (prepare a dedicated directory and run the following inside it):

$ wget https://raw.githubusercontent.com/ulbmuenster/dataasee/0.5/compose.yaml
$ mkdir -p -m 766 backup
$  DL_PASS=password1 DB_PASS=password2 docker compose up

Web: http://localhost:8000 (API: http://localhost:8343/api/v1/ )

  • Requires Docker Compose (compatible with Docker and Podman).
  • Deployment does not require cloning the repository; the compose.yaml file is sufficient.
  • See the Deploy Documentation for details.
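
After `docker compose up`, the services may need some time before they answer. The following is a minimal sketch of a readiness check that polls the two URLs from the Quick Start with wget. The function name `wait_for_url` and the retry defaults are illustrative and not part of DatAasee, and it is an assumption that the API base path itself answers with a success status; a concrete endpoint from openapi.yaml may be needed instead.

```shell
#!/bin/sh
# Sketch: poll a URL until it responds (names and defaults are illustrative).

WEB_URL="http://localhost:8000"
API_URL="http://localhost:8343/api/v1/"

# wait_for_url URL [TRIES] [DELAY_SECONDS]
# Returns 0 as soon as wget receives a successful response,
# or 1 after TRIES unsuccessful attempts.
wait_for_url() {
  url=$1
  tries=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # --spider checks the URL without saving the response body
    if wget -q --spider "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_url "$WEB_URL" && wait_for_url "$API_URL" && echo ready` prints `ready` only once both URLs respond.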

Tech Stack Canvas

  • Setting: Many distributed data and metadata sources
  • Goals:
    • Centralize metadata
    • Interlinked metadata catalog
    • Super-index for bibliographic and research data
  • Features:
    • Interact through HTTP-API (JSON)
    • Search by filter, full-text, source, DOI
    • Custom query via: SQL, Gremlin, Cypher, MQL, GraphQL
  • Frontend: Lowdefy (Optional)
  • Backend: Connect (formerly Benthos)
  • Data Storage: ArcadeDB (Graph Database)
  • Infrastructure: Compose (via Docker or Podman)
  • Deployment: via Harbor (at Uni Münster)
  • Monitoring: Container Logs (local logging driver)
  • Integrations:
    • Protocols: OAI-PMH (HTTP), S3 (HTTP), GET (HTTP), DatAasee (HTTP)
    • Encodings: XML (Plain-Text)
    • Formats: DataCite (XML), DC (XML), LIDO (XML), MARC (XML), MODS (XML)
  • Exports: DataCite (JSON), BibJSON (JSON)
  • Security: Privileged endpoints (CQRS)
  • Testing: check-jsonschema
  • Development: GitHub
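
Since all interaction goes through the HTTP API and responses are JSON, a small shell helper can be enough for experiments. The sketch below only assembles request URLs under the API base from the Quick Start and fetches them with wget. The helper names `api_url` and `api_get` are hypothetical, and the sketch deliberately does not name real endpoints or parameters: those have to be taken from openapi.yaml. Values are not URL-encoded here.

```shell
#!/bin/sh
# Sketch: build and fetch API URLs (helper names are hypothetical).

# Base URL as given in the Quick Start, overridable via the environment.
API_BASE="${API_BASE:-http://localhost:8343/api/v1}"

# api_url ENDPOINT [key=value ...]
# Prints the full URL; parameters are joined with '?' and '&'.
# Note: no URL-encoding is performed on keys or values.
api_url() {
  endpoint=$1
  shift
  url="$API_BASE/$endpoint"
  sep='?'
  for pair in "$@"; do
    url="$url$sep$pair"
    sep='&'
  done
  printf '%s\n' "$url"
}

# api_get ENDPOINT [key=value ...]
# Fetches the URL and writes the (JSON) response body to stdout.
api_get() {
  wget -q -O - "$(api_url "$@")"
}
```

With placeholder names, `api_url example a=1 b=2` prints `http://localhost:8343/api/v1/example?a=1&b=2`; replace `example`, `a` and `b` with an endpoint and parameters defined in openapi.yaml.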

Default Ports

  • 8343 DatAasee API
  • 8000 Web Frontend
  • 2480 Database API (Development Container Images Only)
  • 9999 Database JMX (Development Container Images Only)

API Cheat Sheet

Repository Contents

  • api/ API definition and message schemas
  • assets/ Logos and style definition
  • backend/ Processor pipeline and component definitions
  • container/ Dockerfiles
  • database/ Database initialization, schemas and enumerated data
  • docs/ Documentation of software, data and architecture
  • frontend/ Prototype frontend definition
  • tests/ Test definitions and data

Getting Started (Development)

  • Available make targets:
    • make setup Build server images (builds development images)
    • make start Start servers
    • make stop Stop servers
    • make reset Stop and start servers
    • make build Build release images (pass REGISTRY= to set container image registry)
    • make empty Delete database backups
    • make logs Show logs (requires grep)
    • make peak Report peak database memory usage (requires grep)
    • make test Run tests (requires check-jsonschema, busybox, wget)
    • make tidy List violations of StrictYAML (requires yamllint)
    • make todo List inline TODOs in repo (requires grep)
  • Custom make variable: COMPOSE (set Compose implementation)
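
The targets above can be chained into a simple development cycle. The following sketch assumes that `make start` returns once the servers are running and that `COMPOSE` accepts a command string such as `docker compose` or `podman compose`; neither is confirmed here, and the function name `dev_cycle` is illustrative.

```shell
#!/bin/sh
# Sketch: build, start, test, then always stop (assumptions noted above).

# Compose implementation handed to make; the value format is an assumption.
COMPOSE="${COMPOSE:-docker compose}"

# dev_cycle
# Runs setup, start and test in order, stops the servers afterwards,
# and returns the status of the setup/start/test chain.
dev_cycle() {
  make COMPOSE="$COMPOSE" setup &&
  make COMPOSE="$COMPOSE" start &&
  make COMPOSE="$COMPOSE" test
  status=$?
  # Stop the servers even if an earlier step failed
  make COMPOSE="$COMPOSE" stop
  return "$status"
}
```

Release images are built separately, for example with `make build REGISTRY=registry.example.org`, where the registry host is a placeholder.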

Contributors

tl;dr

DatAasee is a centralized metasearch for distributed metadata.