Releases · yusufkaraaslan/Skill_Seekers

30 May 20:46

v3.7.0

c71befe

v3.7.0 Latest


[3.7.0] - 2026-05-30

Theme: AI-driven project knowledge base (skill-seekers scan) — bootstrap a complete skill set for a project in one command, with safety/observability/coverage hardening throughout.

Added

skill-seekers scan <dir> command (#327) — point it at any project; an AI agent inspects manifests, README, Dockerfile/CI, sampled source files (first 2 KB each), and the git remote, then emits one Skill Seekers config per detected framework plus a <project>-codebase.json for the project's own code. Each config is stamped with metadata.detected_version so re-scans report added / version-bumped / removed dependencies. An internationalized canonical-name resolver (CJK + EU language suffixes) lets detections like "Godot 引擎" resolve to godot. An out-dir cache means re-scans reuse prior emissions and respect manual edits. The doctor-style report shows pluralized counts and a resolved / AI-generated / unresolved / archived breakdown.
Coverage: scan recognizes ~50 manifest types (Pipfile, environment.yml, deno.json, flake.nix, Chart.yaml, stack.yaml, deps.edn, dune-project, BUILD.bazel, …) and walks src/lib/app/cmd/crates/packages/apps/services/backend/frontend plus root-level files (catches Django, flat-layout Python, Go, Rust workspaces, JS monorepos).
Cost + safety flags: --max-ai-generations N (default 10) caps unbounded AI generation for monorepos; --dry-run previews what would be emitted without writing or invoking AI; --probe-urls HEAD-probes AI-generated URLs with retry-on-404; --no-fetch / --no-generate / --no-publish-prompt for offline / CI use.
Community submission (opt-in): freshly AI-generated configs can be submitted to the community registry via a native-async flow. Pre-checks GITHUB_TOKEN, idempotency-guards against duplicate issues, retries transient failures with backoff.
Archival: configs that disappear from detections are moved (not deleted) to out_dir/.archived/<UTC-timestamp>/ so the user never loses hand-edited work and out_dir stays clean.
Docs: new docs/getting-started/05-scan-a-project.md; entries in README, FAQ, CLI Reference, Feature Matrix, Config Format, Environment Variables, and the Quick Start cross-link.

Changed

CLI dispatch unified (#327) — scan and doctor now consume the parsed-args namespace directly via Command(args).execute() instead of building a second argparse.ArgumentParser. Eliminates the _reconstruct_argv hack for these commands; remaining ~14 commands flagged for migration.
Config schema: detected_version lives under metadata.detected_version (alongside metadata.version for the config-schema version) rather than at top level. Backwards-compatible reader; old top-level placements migrate on next stamp.
SourceDetector.CODE_PROJECT_MARKERS is now public (was _CODE_PROJECT_MARKERS); cross-module callers no longer reach into a private attribute.
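A backwards-compatible reader for the relocated detected_version field might look like the sketch below. The function names are hypothetical and the real reader may differ; the only facts taken from the notes are that the new location is metadata.detected_version and that old top-level placements migrate on the next stamp.

```python
import copy


def read_detected_version(config: dict):
    """Prefer metadata.detected_version, fall back to the legacy top-level key."""
    metadata = config.get("metadata") or {}
    if "detected_version" in metadata:
        return metadata["detected_version"]
    return config.get("detected_version")


def stamp_detected_version(config: dict, version: str) -> dict:
    """Return a stamped copy; a legacy top-level key is dropped (migrated)."""
    stamped = copy.deepcopy(config)
    stamped.pop("detected_version", None)
    metadata = dict(stamped.get("metadata") or {})
    metadata["detected_version"] = version
    stamped["metadata"] = metadata
    return stamped
```

Returning a copy rather than mutating in place is a design choice of this sketch, not something the release notes specify.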

Fixed

Correctness (#327) — diff layer keyed by stable filename slug instead of internal config name (eliminates phantom add/remove churn); resolve_config_path lookups now append .json so local-disk + user-dir paths actually find files; out-dir cache prevents redundant API/AI calls on re-scan; lowercase filename slugs prevent duplicate-file accumulation across runs.
Safety (#327) — atomic JSON writes via os.replace so SIGINT mid-write can't corrupt a config and silently flip it to "removed" on the next scan; _safe_size guards stat() so a broken symlink in src/ no longer crashes the scan; AgentClient.call exceptions caught and logged; AI-generated config names rejected if they fail the registry regex; URL probe catches AI hallucinations of base_url before writing.
Observability (#327) — logging.basicConfig in scan so logger.warning/error reaches the user (was silently dropped); non-zero exit code when no configs and no codebase config were emitted, so CI pipelines detect total-failure scans.
Publish flow (#327) — native async (asyncio.run at single entry, asyncio.to_thread for input()); pre-check GITHUB_TOKEN with actionable hint instead of asking N "yes/no" questions and failing N times; idempotency check (search existing open issues) prevents duplicate submissions; retry with backoff on transient failures; nested-event-loop detection with clear message instead of opaque traceback.
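Two of the techniques named above, atomic writes via os.replace and an async flow with retry and a non-blocking prompt, can be sketched as follows. All function names, the retry count, and the choice of ConnectionError as the "transient" error are assumptions for illustration; the project's own code may be structured differently.

```python
import asyncio
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target are on the same filesystem,
        # so readers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        # Also covers KeyboardInterrupt: drop the temp file, keep the old config.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


async def retry_with_backoff(operation, attempts: int = 3, base_delay: float = 0.5):
    """Await operation(); on ConnectionError sleep base_delay * 2**n and retry."""
    for attempt in range(attempts):
        try:
            return await operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


async def confirm(prompt: str) -> bool:
    """Ask a yes/no question in a worker thread so the event loop stays free."""
    answer = await asyncio.to_thread(input, prompt)
    return answer.strip().lower() in {"y", "yes"}
```

A caller would wrap the whole flow in a single asyncio.run(...) at the entry point, which matches the "asyncio.run at single entry" wording in the notes.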


03 May 10:54

github-actions

v3.6.0

4cd5140

v3.6.0

[3.6.0] - 2026-05-03

Theme: Quality-of-life release — packaging targets, GitHub issue workflow, codebase analysis fixes, and source detection hardening.

Added

IBM Bob packaging target — new --target bob adaptor and agent install support for IBM's Bob agent platform (#366)
GitHub issue filtering — --github-issue-state, --github-issue-labels, and --github-issue-since filters in the GitHub scraper for narrowing which issues are pulled (#367)
Per-issue files — GitHub scraper now writes one Markdown file per issue instead of a single bundle, improving navigation and downstream chunking (#367)
Pinecone frontmatter — Pinecone vector exports now include consistent YAML frontmatter for metadata round-tripping (#367)

Fixed

Unified scraper now generates codebase_analysis/ index — local sources were producing C3.x outputs with broken SKILL.md links; the unified skill builder now wires up the index and resolves links correctly (#362, #376)
Guides fallback fires correctly — unified_skill_builder was emitting a truthy placeholder for empty guides which suppressed the fallback content; placeholder removed (#364, #375)
HTML URLs no longer treated as local files — source_detector now checks for http(s):// before falling through to the local-path branch, fixing false-positive routing (#373)
PDF extracted images appear in markdown — pdf_scraper now inserts ![](…) references for images extracted from PDFs so they render in the generated SKILL.md (#369)
C3.x output for local sources — unified command was skipping the C3.x analysis pipeline for local codebase sources; now emits the full pattern/test/guide/config/router output (#363, #372)
Language filter passed to C3.x clone analysis — repos cloned for analysis now respect --languages instead of analyzing every file (fixes #361, #370)
Unity vs Unreal detection — Unity projects with C# imports were being misidentified as Unreal; detection now keys on C# import patterns (fixes #365, #368)
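The ordering fix for HTML URLs (#373) comes down to testing for a URL scheme before any file-extension check. A minimal sketch, with a hypothetical classify_source function and simplified return labels that are not the real source_detector API:

```python
import re
from pathlib import Path

_URL_PREFIX = re.compile(r"^https?://", re.IGNORECASE)


def classify_source(source: str) -> str:
    """Return 'web', 'html', or 'local' (labels are illustrative only)."""
    # The scheme check must come first: a URL ending in .html would otherwise
    # be routed to the local-HTML branch.
    if _URL_PREFIX.match(source):
        return "web"
    if Path(source).suffix.lower() in {".html", ".htm"}:
        return "html"
    return "local"
```

For example, classify_source("https://example.com/docs/index.html") yields "web", while classify_source("docs/index.html") yields "html".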


12 Apr 19:00

github-actions

v3.5.1

006822f

v3.5.1

[3.5.1] - 2026-04-12

Added

Centralized defaults.json config — single source of truth for all default values (rate_limit, max_pages, workers, async_mode, enhancement, analysis, RAG settings). New defaults.py loader module. All 15+ files that previously hardcoded defaults now read from this file (#356)
Low-signal code snippet filtering — _is_low_signal_code_snippet() filters junk patterns like bare True, options, single identifiers from quick references (#360)
Pattern description normalization — _normalize_pattern_description() cleans boilerplate prefixes and truncates to first meaningful sentence (#360)
Example language priority ranking — _example_language_priority() ranks Python > Bash > JSON > etc. for SKILL.md examples (#360)
checkpoint_exists() method on DocToSkillConverter — was called but never defined (#360)
Unified config source normalization — DocToSkillConverter.__init__ merges fields from sources[0] into flat config for compatibility (#360)
display_name support in SKILL.md generation — produces cleaner titles and slugs (#360)
New tests: test_doc_scraper_entrypoint.py (regression for _run_scraping), quick-reference quality tests, docs-only compatibility tests, nested reference coverage tests (#360)

Changed

max_pages default is now unlimited (-1) — the scraper fetches all pages unless the user explicitly sets --max-pages. Previously defaulted to 500 (#356)
--no-rate-limit flag now works — was defined in CLI arguments but never consumed by ExecutionContext (#356)
constants.py reads from defaults.json — no longer contains hardcoded magic numbers (#356)
ExecutionContext.ScrapingSettings — rate_limit and max_pages now use real defaults instead of None, preventing None-poisoning downstream (#356)
SKILL.md frontmatter cleanup — empty doc_version: and version: fields are now omitted; placeholder sections removed (#360)
Enhancement now routes through platform adaptors instead of importing the nonexistent enhance_skill_md helper (#360)
quality_metrics.py uses rglob for nested reference directories in unified skills (#360)

Fixed

TypeError: '>' not supported between instances of 'NoneType' and 'int' — rate_limit defaulted to None in ExecutionContext, which flowed through config.get("rate_limit", DEFAULT) (dict.get returns None when the key exists with value None, ignoring the fallback). Fixed in doc_scraper.py (sync + async paths), estimate_pages.py, and sync_config.py (#356, #359)
discover_urls() loop never executed with unlimited max_pages — len(discovered) < -1 is always False. Added unlimited mode guard (#356)
converter.scrape() called nonexistent method in _run_scraping() — changed to converter.scrape_all() (#360)
None-safety for BeautifulSoup attributes — link["href"], sitemap.text, meta_desc["content"] guarded against None XML text nodes (#360)
Python 3.10 compatibility — removed a backslash inside an f-string expression in quality_metrics.py, which is a syntax error before Python 3.12 (#360)
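Three of the fixes above are general Python pitfalls, shown here in isolation. The helper names and the 0.5 default are hypothetical; the behavior of dict.get, the -1 comparison, and the pre-3.12 f-string restriction are standard Python.

```python
DEFAULT_RATE_LIMIT = 0.5  # hypothetical default, for illustration only


def get_or_default(config: dict, key: str, default):
    """Like dict.get, but also falls back when the key exists with value None."""
    value = config.get(key)
    return default if value is None else value


def may_discover_more(discovered_count: int, max_pages: int) -> bool:
    """Treat max_pages == -1 as unlimited; otherwise enforce the cap."""
    return max_pages == -1 or discovered_count < max_pages


def bullet_block(items: list[str]) -> str:
    """Join items with newlines inside an f-string in a 3.10-compatible way."""
    # Before Python 3.12 the expression part of an f-string cannot contain a
    # backslash, so the separator is hoisted into a variable first.
    newline = "\n"
    return f"Items:{newline}{newline.join(items)}"


config = {"rate_limit": None}
# dict.get only uses the fallback when the key is missing, so this is still None:
raw = config.get("rate_limit", DEFAULT_RATE_LIMIT)
safe = get_or_default(config, "rate_limit", DEFAULT_RATE_LIMIT)
```

Here raw is None, which is the value that later triggered the TypeError on comparison, while safe is 0.5.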


11 Apr 13:00

github-actions

v3.5.0

c21749a

v3.5.0

[3.5.0] - 2026-04-09

Theme: Grand Unification — one command, one interface, direct converters. Agent-agnostic architecture, marketplace pipeline, smart SPA discovery, all content extraction enabled by default. 80+ files changed across the codebase.

Added

Grand Unification — unified create command as single entry point for all 18 source types with auto-detection, direct converter invocation, and centralized enhancement (#346)
Agent-agnostic AgentClient abstraction — all 5 enhancers now support Claude, Kimi, Codex, Copilot, OpenCode, and custom agents via a unified interface. Auto-detects agent from API keys instead of hardcoding (#336)
Kimi CLI integration with stdin piping and output parsing (#336)
MarketplacePublisher — publish skills to Claude Code plugin marketplace repos (#336)
MarketplaceManager — register and manage marketplace repositories (#336)
ConfigPublisher — push configs to registered config source repos (#336)
push_config MCP tool for automated config publishing (#336)
Smart SPA discovery engine — three-layer discovery: sitemap.xml, llms.txt, SPA nav rendering (#336)
"browser": true config support for JavaScript SPA sites with browser renderer timeout defaults (60s, domcontentloaded) (#336)
Dynamic routing via _build_argv() — replaced manual arg forwarding with dynamic forwarder, added 7 missing CLI flags (#336)
Kotlin language support for codebase analysis — Full C3.x pipeline support: AST parsing (classes, objects, functions, data/sealed classes, extension functions, coroutines), dependency extraction, design pattern recognition (object declaration→Singleton, companion object→Factory, sealed class→Strategy), test example extraction (JUnit, Kotest, MockK, Spek), language detection patterns, config detection (build.gradle.kts), and extension maps across all analyzers (#287)
Headless browser rendering (--browser flag) — uses Playwright to render JavaScript SPA sites (React, Vue, etc.) that return empty HTML shells. Auto-installs Chromium on first use. Optional dep: pip install "skill-seekers[browser]" (#321)
skill-seekers doctor command — 8 diagnostic checks (Python version, package install, git, core/optional deps, API keys, MCP server, output dir) with pass/warn/fail status and --verbose flag (#316)
Prompt injection check workflow — bundled prompt-injection-check workflow scans scraped content for injection patterns (role assumption, instruction overrides, delimiter injection, hidden instructions). Added as first stage in default and security-focus workflows. Flags suspicious content without removing it (#324)
Codex CLI plugin manifest (.codex-plugin/plugin.json) for OpenAI Codex integration (#350)
6 behavioral UML diagrams — 3 sequence (create pipeline, GitHub+C3.x flow, MCP invocation), 2 activity (source detection, enhancement pipeline), 1 component (runtime dependencies with interface contracts)
134 new tests — test_agent_client.py, test_config_publisher.py, _build_argv tests. Total: 3194 passed, 39 expected skips (#336)

Changed

All content extraction features enabled by default — pattern detection, test examples, how-to guides, config extraction, and router generation no longer require explicit opt-in
Renamed claude-enhanced merge mode to ai-enhanced — backward compatibility alias kept (#336)
Removed 118+ hardcoded Claude references across 60+ files (#336)
Refactored 5 enhancers to use AgentClient abstraction (#336)
Removed 50-file GitHub API analysis limit (#336)
Removed 100-file config extraction limit (#336)
Fixed unified scraper default max_pages from 100 to 500 (#336)
Centralized enhancement timeouts to 45min default with unlimited support (#336)
Excluded slow MCP/e2e tests from CI coverage step to prevent timeout

Fixed

glob('*.md') replaced with rglob('*.md') in all adaptors — fixes packaging when skills are in nested directories (#349)
scraped_data list-vs-dict bug in conflict detection (#336)
base_url passthrough to doc scraper subprocess (#336)
URL filtering now uses base directory correctly (#336)
C3.x analysis data loss (#336)
--enhance-level flag not passed correctly (#336)
guide_enhancer method rename — _call_claude_api renamed to _call_ai (#336)
11 pre-existing test failures fixed (#336)
Per-file language detection in GitHub scraper (#336)
GitHub language detection crashes with TypeError when API response contains non-integer metadata keys (e.g., "url") — now filters to integer values only (#322)
C3.x codebase analysis crashes with TypeError — _run_c3_analysis() and _analyze_c3x() passed removed enhance_with_ai/ai_mode kwargs to analyze_codebase() instead of enhance_level (#323)
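Two of these fixes are easy to illustrate in isolation: rglob versus glob for nested skill directories (#349), and filtering a language payload down to integer values (#322). The function names and the sample payload are assumptions for the sketch, not the project's actual code.

```python
from pathlib import Path


def collect_markdown(skill_dir: Path) -> list[str]:
    """Return all .md files under skill_dir as sorted relative POSIX paths.

    rglob descends into subdirectories; glob('*.md') would only see the top level.
    """
    return sorted(p.relative_to(skill_dir).as_posix() for p in skill_dir.rglob("*.md"))


def language_byte_counts(payload: dict) -> dict[str, int]:
    """Keep only integer-valued entries, dropping metadata such as a 'url' string."""
    return {
        key: value
        for key, value in payload.items()
        if isinstance(value, int) and not isinstance(value, bool)
    }
```

The explicit bool exclusion is a choice of this sketch, since bool is a subclass of int in Python.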

Security

Removed command injection via cloned repo script execution (#336)
Replaced git add -A with targeted staging in marketplace publisher (#336)
Clear auth tokens from cached .git/config after clone (#336)
Use defusedxml for sitemap XML parsing (XXE protection) (#336)
Path traversal validation for config names (#336)
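Path traversal validation for config names is typically an allow-list check applied before the name is ever joined to a directory. The pattern below is an assumption for illustration; the release notes do not state the actual rule the project uses.

```python
import re

# Hypothetical allow-list: lowercase letters, digits, '_' and '-', starting
# with a letter or digit. No slashes or dots, so '../' cannot appear.
_SAFE_CONFIG_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")


def is_safe_config_name(name: str) -> bool:
    """Return True only if the whole name matches the allow-list pattern."""
    return _SAFE_CONFIG_NAME.fullmatch(name) is not None
```

An allow-list is generally safer than trying to strip dangerous sequences, because anything not explicitly permitted is rejected.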


25 Mar 19:21

yusufkaraaslan

v3.4.0

336ab6a

v3.4.0 — 12 LLM Platforms, SPA Detection, UML Architecture

What's New in v3.4.0

Theme: 8 new LLM platform adaptors (12 total), 7 new CLI agent paths (18 total), OpenCode skill tools, SPA site detection, 8 bug fixes, and full UML architecture documentation.

Platform Expansion: 5 → 12 LLM Targets

New Platform	Flag	Base
OpenCode	`--target opencode`	Directory-based, dual YAML
Kimi	`--target kimi`	OpenAI-compatible
DeepSeek	`--target deepseek`	OpenAI-compatible
Qwen	`--target qwen`	OpenAI-compatible
OpenRouter	`--target openrouter`	OpenAI-compatible
Together AI	`--target together`	OpenAI-compatible
Fireworks AI	`--target fireworks`	OpenAI-compatible

All new platforms inherit from a shared OpenAI-compatible base class for consistent behavior.

Agent Expansion: 11 → 18 Install Paths

New agents: roo, cline, aider, bolt, kilo, continue, kimi-code

OpenCode Skill Tools

Skill splitter — auto-split large docs into focused sub-skills with router
Bi-directional converter — import/export between OpenCode and any platform format

Distribution

Smithery manifest (smithery.yaml)
GitHub Actions template for automated skill updates
Claude Code Plugin with slash commands

Bug Fixes

sanitize_url() crash on Python 3.14 strict urlparse (#284)
Blind /index.html.md append breaking non-Docusaurus sites (#277)
Unified scraper temp config format (#317)
Unicode arrows breaking Windows cp1252 terminals
CLI flags in plugin slash commands
MiniMax adaptor improvements (#319)
Misleading "Scraped N pages" count — now shows (N saved, M skipped) (#320)
SPA site detection — warns when site requires JavaScript rendering (#320, #321)

Documentation

Full UML architecture — 14 class diagrams synced from source code via StarUML
StarUML HTML API reference export
Ecosystem section linking all Skill Seekers repos
Architecture references in README and CONTRIBUTING
Consolidated Docs/ into docs/

Test Results

2929 passed, 39 skipped, 0 failures

Install / Upgrade

pip install --upgrade skill-seekers

Full changelog: https://github.com/yusufkaraaslan/Skill_Seekers/blob/main/CHANGELOG.md


15 Mar 22:27

github-actions

v3.3.0

2b725aa

v3.3.0

[3.3.0] - 2026-03-16

Theme: 10 new source types (17 total), EPUB unified integration, sync-config command, performance optimizations, 12 README translations, and 19 bug fixes. 117 files changed, +41,588 lines since v3.2.0.

Supported Source Types (17)

#	Type	CLI Command	Config Type	Auto-Detection
1	Documentation (web)	`scrape` / `create <url>`	`documentation`	HTTP/HTTPS URLs
2	GitHub repository	`github` / `create owner/repo`	`github`	`owner/repo` or github.com URLs
3	PDF document	`pdf` / `create file.pdf`	`pdf`	`.pdf` extension
4	Word document	`word` / `create file.docx`	`word`	`.docx` extension
5	EPUB e-book	`epub` / `create file.epub`	`epub`	`.epub` extension
6	Video	`video` / `create <url/file>`	`video`	YouTube/Vimeo URLs, video extensions
7	Local codebase	`analyze` / `create ./path`	`local`	Directory paths
8	Jupyter Notebook	`jupyter` / `create file.ipynb`	`jupyter`	`.ipynb` extension
9	Local HTML	`html` / `create file.html`	`html`	`.html`/`.htm` extensions
10	OpenAPI/Swagger	`openapi` / `create spec.yaml`	`openapi`	`.yaml`/`.yml` with OpenAPI content
11	AsciiDoc	`asciidoc` / `create file.adoc`	`asciidoc`	`.adoc`/`.asciidoc` extensions
12	PowerPoint	`pptx` / `create file.pptx`	`pptx`	`.pptx` extension
13	RSS/Atom feed	`rss` / `create feed.rss`	`rss`	`.rss`/`.atom` extensions
14	Man pages	`manpage` / `create cmd.1`	`manpage`	`.1`–`.8`/`.man` extensions
15	Confluence wiki	`confluence`	`confluence`	API or export directory
16	Notion pages	`notion`	`notion`	API or export directory
17	Slack/Discord chat	`chat`	`chat`	Export directory or API

Added

10 New Skill Source Types (17 total)

Skill Seekers now supports 17 source types — up from 7. Every new type is fully integrated into the CLI (skill-seekers <type>), create command auto-detection, unified multi-source configs, config validation, the MCP server, and the skill builder.

Jupyter Notebook — skill-seekers jupyter --notebook file.ipynb or skill-seekers create file.ipynb
- Extracts markdown cells, code cells with outputs, kernel metadata, imports, and language detection
- Handles single files and directories of notebooks; filters .ipynb_checkpoints
- Optional dependency: pip install "skill-seekers[jupyter]" (nbformat)
- Entry point: skill-seekers-jupyter
Local HTML — skill-seekers html --html-path file.html or skill-seekers create file.html
- Parses HTML using BeautifulSoup with smart main content detection (<article>, <main>, .content, largest div)
- Extracts headings, code blocks, tables (to markdown), images, links; converts inline HTML to markdown
- Handles single files and directories; supports .html, .htm, .xhtml extensions
- No extra dependencies (BeautifulSoup is a core dep)
OpenAPI/Swagger — skill-seekers openapi --spec spec.yaml or skill-seekers create spec.yaml
- Parses OpenAPI 3.0/3.1 and Swagger 2.0 specs from YAML or JSON (local files or URLs via --spec-url)
- Extracts endpoints, parameters, request/response schemas, security schemes, tags
- Resolves $ref references with circular reference protection; handles allOf/oneOf/anyOf
- Groups endpoints by tags; generates comprehensive API reference markdown
- Source detection sniffs YAML file content for openapi: or swagger: keys (avoids false positives on non-API YAML files)
- Optional dependency: pip install "skill-seekers[openapi]" (pyyaml — already a core dep, guard added for safety)
AsciiDoc — skill-seekers asciidoc --asciidoc-path file.adoc or skill-seekers create file.adoc
- Regex-based parser (no external library required) with optional asciidoc library support
- Extracts headings (= through =====), [source,lang] code blocks, |=== tables, admonitions (NOTE/TIP/WARNING/IMPORTANT/CAUTION), and include:: directives
- Converts AsciiDoc formatting to markdown; handles single files and directories
- Optional dependency: pip install "skill-seekers[asciidoc]" (asciidoc library for advanced rendering)
PowerPoint (.pptx) — skill-seekers pptx --pptx file.pptx or skill-seekers create file.pptx
- Extracts slide text, speaker notes, tables, images (with alt text), and grouped shapes
- Detects code blocks by monospace font analysis (30+ font families)
- Groups slides into sections by layout type; handles single files and directories
- Optional dependency: pip install "skill-seekers[pptx]" (python-pptx)
RSS/Atom Feeds — skill-seekers rss --feed-url <url> / --feed-path file.rss or skill-seekers create feed.rss
- Parses RSS 2.0, RSS 1.0, and Atom feeds via feedparser
- Optionally follows article links (--follow-links, default on) to scrape full page content using BeautifulSoup
- Extracts article titles, summaries, authors, dates, categories; configurable --max-articles (default 50)
- Source detection matches .rss and .atom extensions (.xml excluded to avoid false positives)
- Optional dependency: pip install "skill-seekers[rss]" (feedparser)
Man Pages — skill-seekers manpage --man-names git,curl / --man-path dir/ or skill-seekers create git.1
- Extracts man pages by running man command via subprocess or reading .1–.8/.man files directly
- Handles gzip/bzip2/xz compressed man files; strips troff/groff formatting (backspace overstriking, macros, font escapes)
- Parses structured sections (NAME, SYNOPSIS, DESCRIPTION, OPTIONS, EXAMPLES, SEE ALSO)
- Source detection uses basename heuristic to avoid false positives on log rotation files (e.g., access.log.1)
- No external dependencies (stdlib only)
Confluence — skill-seekers confluence --base-url <url> --space-key <key> or --export-path dir/
- API mode: fetches pages from Confluence REST API with pagination (atlassian-python-api)
- Export mode: parses Confluence HTML/XML export directories
- Extracts page content, code/panel/info/warning macros, page hierarchy, tables
- Optional dependency: pip install "skill-seekers[confluence]" (atlassian-python-api)
Notion — skill-seekers notion --database-id <id> / --page-id <id> or --export-path dir/
- API mode: fetches pages via Notion API with support for 20+ block types (paragraph, heading, code, callout, toggle, table, etc.)
- Export mode: parses Notion Markdown/CSV export directories
- Extracts rich text with annotations (bold, italic, code, links), 16+ property types for database entries
- Optional dependency: pip install "skill-seekers[notion]" (notion-client)
Slack/Discord Chat — skill-seekers chat --export-path dir/ or --token <token> --channel <channel>
- Slack: parses workspace JSON exports or fetches via Slack Web API (slack_sdk)
- Discord: parses DiscordChatExporter JSON or fetches via Discord HTTP API
- Extracts messages, code snippets (fenced blocks), shared URLs, threads, reactions, attachments
- Generates per-channel summaries and topic categorization
- Optional dependency: pip install "skill-seekers[chat]" (slack-sdk)

EPUB Unified Pipeline Integration

EPUB (.epub) input support via skill-seekers create book.epub or skill-seekers epub --epub book.epub
- Extracts chapters, metadata (Dublin Core), code blocks, images, and tables from EPUB 2 and EPUB 3 files
- DRM detection with clear error messages (Adobe ADEPT, Apple FairPlay, Readium LCP)
- Font obfuscation correctly identified as non-DRM
- EPUB 3 TOC bug workaround (ignore_ncx option)
- --help-epub flag for EPUB-specific help
- Optional dependency: pip install "skill-seekers[epub]" (ebooklib)
- 107 tests across 14 test classes
EPUB added to unified scraper — _scrape_epub() method, scraped_data["epub"], config validation (_validate_epub_source), and dry-run display. Previously EPUB worked standalone but was missing from multi-source configs.

Unified Skill Builder — Generic Merge System

_generic_merge() — Priority-based section merge for any combination of source types not covered by existing pairwise synthesis (docs+github, docs+pdf, etc.). Produces YAML frontmatter + source-attributed sections.
_append_extra_sources() — Appends additional source type content (e.g., Jupyter + PPTX) to pairwise-synthesized SKILL.md.
_generate_generic_references() — Generates references/<type>/index.md for any source type, with ID resolution fallback chain.
_SOURCE_LABELS dict — Human-readable labels for all 17 source types used in merge attribution.

Config Validator Expansion

17 source types in VALID_SOURCE_TYPES — All new types plus word and video now have per-type validation methods.
_validate_word_source() — Validates path field for Word documents (was previously missing).
_validate_video_source() — Validates url, path, or playlist field for video sources (was previously missing).
11 new _validate_*_source() methods — One for each new type with appropriate required-field checks.

Source Detection Improvements

7 new file extension detections in SourceDetector.detect() — .ipynb, .html/.htm, .pptx, .adoc/.asciidoc, .rss/.atom, .1–.8/.man, .yaml/.yml (with content sniffing)
_looks_like_openapi() — Content sniffing for YAML files: only classifies as OpenAPI if the file contains openapi: or swagger: key in first 20 lines (prevents false positives on docker-compose, Ansible, Kubernetes manifests, etc.)
Man page basename heuristic — .1–.8 extensions only detected as man pages if the basename has no dots (e.g., git.1 matches but access.log.1 does not)
.xml excluded from RSS detection — Too generic; only...
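The OpenAPI content sniffing and the man page basename heuristic described in this list can be sketched as below. The function names mirror the notes but the bodies are assumptions: in particular, matching only top-level keys and applying the no-dots rule to .man files as well are choices made for this illustration.

```python
import re
from pathlib import Path

_OPENAPI_KEY = re.compile(r"^(openapi|swagger)\s*:")
_MAN_SUFFIXES = {f".{section}" for section in range(1, 9)} | {".man"}


def looks_like_openapi(path: Path, max_lines: int = 20) -> bool:
    """True if a top-level openapi: or swagger: key appears in the first lines."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        for index, line in enumerate(handle):
            if index >= max_lines:
                break
            if _OPENAPI_KEY.match(line):
                return True
    return False


def looks_like_man_page(filename: str) -> bool:
    """True for names like git.1, False for rotated logs like access.log.1."""
    path = Path(filename)
    if path.suffix not in _MAN_SUFFIXES:
        return False
    # A dot left in the stem (access.log) suggests a rotated file, not a man page.
    return "." not in path.stem
```

Reading only the first 20 lines keeps detection cheap on large YAML files while still catching specs, which normally declare their version near the top.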


02 Mar 09:44

yusufkaraaslan

v3.2.0

73349c6

v3.2.0 — Video Extraction, Word Support, Pinecone Adaptor

Theme: Video source support, Word document support, Pinecone adaptor, and quality improvements. 94 files changed, +23,500 lines since v3.1.3. 2,540 tests passing.

🎬 Video Extraction Pipeline

Complete video extraction system that converts YouTube videos and local video files into AI-consumable skills.

skill-seekers video --url <youtube-url> — New CLI command for video scraping
skill-seekers create <youtube-url> — Auto-detects YouTube URLs
Transcript extraction — 3-tier fallback: YouTube API → yt-dlp → faster-whisper
Visual OCR — Multi-engine ensemble (EasyOCR + pytesseract) for code frames
Panel detection — Splits IDE screenshots into independent sub-sections
Code timeline — Tracks code evolution across frames with edit history
Two-pass AI enhancement — Cleans OCR noise using transcript context
GPU auto-detection — skill-seekers video --setup detects CUDA/ROCm/CPU and installs correct PyTorch
197 tests covering models, metadata, transcript, visual, OCR, and CLI

📄 Word Document (.docx) Support

skill-seekers word --docx <file> — Full pipeline: mammoth → HTML → sections → SKILL.md
skill-seekers create document.docx — Auto-detects .docx files
Smart code detection — Identifies monospace paragraphs as code blocks
Install: pip install "skill-seekers[docx]"

🌲 Pinecone Vector Database Adaptor

skill-seekers package output/ --format pinecone --upload — Direct Pinecone upload
Full CRUD operations with namespace support
OpenAI and Sentence Transformers embedding support
Batch upsert with configurable batch sizes
764 tests for comprehensive coverage

🐛 Bug Fixes

6 OCR quality fixes — Skip webcam frames, clean IDE decorations, fix duplicate lines, filter UI junk
15 video pipeline fixes — Timeout handling, MCP integration, filename collisions, dependency management
Issue #300 — Selector fallback & dry-run link discovery (ReactFlow found 20+ pages, was 1)
Issue #301 — setup.sh macOS fix
RAG chunking crash — Fixed AttributeError: output_dir
Chunk overlap auto-scaling — Scales to max(50, chunk_tokens // 10)
Reference file limits removed — No more caps on GitHub issues, releases, or code blocks
See CHANGELOG.md for full details
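The chunk overlap auto-scaling rule quoted above, max(50, chunk_tokens // 10), works out as follows. The function name is hypothetical; only the formula comes from the notes.

```python
def auto_chunk_overlap(chunk_tokens: int) -> int:
    """Overlap is 10% of the chunk size (integer division), with a floor of 50 tokens."""
    return max(50, chunk_tokens // 10)


# Small chunks hit the 50-token floor; larger chunks scale with size.
examples = {size: auto_chunk_overlap(size) for size in (200, 512, 1000, 4000)}
```

With these inputs, examples is {200: 50, 512: 51, 1000: 100, 4000: 400}.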

📦 Install / Upgrade

pip install --upgrade skill-seekers

# With video support
pip install "skill-seekers[video]"
skill-seekers video --setup  # Auto-detect GPU, install deps

# With Word support
pip install "skill-seekers[docx]"

# With Pinecone
pip install "skill-seekers[pinecone]"

# Everything
pip install "skill-seekers[all]"

Full Changelog: https://github.com/yusufkaraaslan/Skill_Seekers/blob/main/CHANGELOG.md


24 Feb 19:57

github-actions

v3.1.3

e42aade

v3.1.3

[3.1.3] - 2026-02-24

🐛 Hotfix — Explicit Chunk Flags & Argument Pipeline Cleanup

Fixed

Issue #299: skill-seekers package --target claude unrecognised argument crash — _reconstruct_argv() in main.py emits default flag values back into argv when routing subcommands. package_skill.py had a 105-line inline argparser that used different flag names to those in arguments/package.py, so forwarded flags were rejected. Fixed by replacing the inline block with a call to add_package_arguments(parser) — the single source of truth.

Changed

package_skill.py argparser refactored — Replaced ~105 lines of inline argparse duplication with a single add_package_arguments(parser) call. Flag names are now guaranteed consistent with _reconstruct_argv() output, preventing future argument-name drift.
Explicit chunk flag names — All --chunk-* flags now include unit suffixes to eliminate ambiguity between RAG tokens and streaming characters:
- --chunk-size (RAG tokens) → --chunk-tokens
- --chunk-overlap (RAG tokens) → --chunk-overlap-tokens
- --chunk (enable RAG chunking) → --chunk-for-rag
- --streaming-chunk-size (chars) → --streaming-chunk-chars
- --streaming-overlap (chars) → --streaming-overlap-chars
- --chunk-size in PDF extractor (pages) → --pdf-pages-per-chunk
setup_logging() centralized — Added setup_logging(verbose, quiet) to utils.py and removed 4 duplicate module-level logging.basicConfig() calls from doc_scraper.py, github_scraper.py, codebase_scraper.py, and unified_scraper.py


24 Feb 04:09

yusufkaraaslan

v3.1.2

90e5e8f

v3.1.2 — Gemini Fix & Enhance Dispatcher

What's Changed

🐛 Critical Bug Fixes

Gemini enhancement 404 errors — The gemini-2.0-flash-exp model was retired by Google, causing all Gemini enhancement requests to fail with 404. Replaced with gemini-2.5-flash (stable GA).

skill-seekers enhance auto-detection — The documented behaviour of automatically using API mode when an API key is present was never implemented. This release fixes it:

ANTHROPIC_API_KEY set → Claude API mode
GOOGLE_API_KEY set → Gemini API mode
OPENAI_API_KEY set → OpenAI API mode
No key → LOCAL mode (Claude Code Max, free)

Use --mode LOCAL to force local mode even when API keys are present.
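The auto-detection rules above amount to a first-match lookup over environment variables, with an explicit mode taking precedence. The sketch below is illustrative only: the function name, the returned mode labels, and the assumption that keys are checked in the order listed are not confirmed by the notes.

```python
import os

# Checked in the order the release notes list them (precedence is assumed).
_KEY_TO_MODE = (
    ("ANTHROPIC_API_KEY", "claude"),
    ("GOOGLE_API_KEY", "gemini"),
    ("OPENAI_API_KEY", "openai"),
)


def detect_enhance_mode(env=None, forced_mode=None) -> str:
    """Pick an enhancement mode from API keys; forced_mode (e.g. 'LOCAL') wins."""
    env = os.environ if env is None else env
    if forced_mode:
        return forced_mode
    for variable, mode in _KEY_TO_MODE:
        if env.get(variable):  # empty strings count as unset
            return mode
    return "LOCAL"
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.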

create command argument forwarding — Universal flags (--dry-run, --verbose, --quiet, --name, --description) were crashing when used with GitHub, PDF, and codebase sources. All fixed. Also adds --dry-run support to skill-seekers github and skill-seekers pdf.

Upgrade

pip install --upgrade skill-seekers

docker pull yusufk/skill-seekers:latest

Full Changelog

See CHANGELOG.md for complete details.


23 Feb 09:12

yusufkaraaslan

v3.1.1

022b8a4

v3.1.1

What's Changed

fix: use getattr for max_pages in create command web routing by @YusufKaraaslanSpyke in #294
hotfix: v3.1.1 — fix create command max_pages AttributeError by @yusufkaraaslan in #295
Max page hot fix by @yusufkaraaslan in #296

Full Changelog: v3.1.0...v3.1.1

Contributors

yusufkaraaslan and YusufKaraaslanSpyke

