Releases: yusufkaraaslan/Skill_Seekers
v3.7.0
[3.7.0] - 2026-05-30
Theme: AI-driven project knowledge base (skill-seekers scan) — bootstrap a complete skill set for a project in one command, with safety/observability/coverage hardening throughout.
Added
- `skill-seekers scan <dir>` command (#327) — point at any project; an AI agent inspects manifests, README, Dockerfile/CI, sampled source files (first 2 KB each), and the git remote, then emits one Skill Seekers config per detected framework plus a `<project>-codebase.json` for the project's own code. Each config is stamped with `metadata.detected_version` so re-scans report added / version-bumped / removed dependencies. Internationalized canonical-name resolver (CJK + EU language suffixes) so detections like "Godot 引擎" resolve to `godot`. Out-dir cache means re-scans reuse prior emissions and respect manual edits. Doctor-style report with pluralized counts and resolved / AI-generated / unresolved / archived breakdown.
  - Coverage: scan recognizes ~50 manifest types (Pipfile, environment.yml, deno.json, flake.nix, Chart.yaml, stack.yaml, deps.edn, dune-project, BUILD.bazel, …) and walks `src` / `lib` / `app` / `cmd` / `crates` / `packages` / `apps` / `services` / `backend` / `frontend` plus root-level files (catches Django, flat-layout Python, Go, Rust workspaces, JS monorepos).
  - Cost + safety flags: `--max-ai-generations N` (default 10) caps unbounded AI generation for monorepos; `--dry-run` previews what would be emitted without writing or invoking AI; `--probe-urls` HEAD-probes AI-generated URLs with retry-on-404; `--no-fetch` / `--no-generate` / `--no-publish-prompt` for offline / CI use.
  - Community submission (opt-in): freshly AI-generated configs can be submitted to the community registry via a native-async flow. Pre-checks `GITHUB_TOKEN`, idempotency-guards against duplicate issues, retries transient failures with backoff.
  - Archival: configs that disappear from detections are moved (not deleted) to `out_dir/.archived/<UTC-timestamp>/` so the user never loses hand-edited work and `out_dir` stays clean.
  - Docs: new `docs/getting-started/05-scan-a-project.md`; entries in README, FAQ, CLI Reference, Feature Matrix, Config Format, Environment Variables, and the Quick Start cross-link.
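The archival behaviour described above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `archive_stale_configs`, the timestamp format, and the idea of matching configs by filename stem are all assumptions made for the example.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def archive_stale_configs(out_dir: Path, detected_slugs: set[str]) -> list[Path]:
    """Move configs that are no longer detected into out_dir/.archived/<UTC-timestamp>/."""
    # Assumed timestamp layout; the release notes only say "UTC-timestamp".
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive_dir = out_dir / ".archived" / stamp
    moved = []
    for config_path in sorted(out_dir.glob("*.json")):
        if config_path.stem in detected_slugs:
            continue  # still detected: leave it (and any manual edits) in place
        archive_dir.mkdir(parents=True, exist_ok=True)
        target = archive_dir / config_path.name
        shutil.move(str(config_path), str(target))
        moved.append(target)
    return moved


# Usage in a throwaway directory: "flask" is no longer detected, "godot" is.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "godot.json").write_text("{}", encoding="utf-8")
    (out / "flask.json").write_text("{}", encoding="utf-8")
    archived = archive_stale_configs(out, {"godot"})
    archived_names = [p.name for p in archived]
    remaining = sorted(p.name for p in out.glob("*.json"))
    print(archived_names, remaining)
```

Moving rather than deleting keeps hand-edited files recoverable, at the cost of an ever-growing `.archived/` directory that the user has to prune.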
Changed
- CLI dispatch unified (#327) — `scan` and `doctor` now consume the parsed-args namespace directly via `Command(args).execute()` instead of building a second `argparse.ArgumentParser`. Eliminates the `_reconstruct_argv` hack for these commands; remaining ~14 commands flagged for migration.
- Config schema: `detected_version` lives under `metadata.detected_version` (alongside `metadata.version` for the config-schema version) rather than at top level. Backwards-compatible reader; old top-level placements migrate on next stamp.
- `SourceDetector.CODE_PROJECT_MARKERS` is now public (was `_CODE_PROJECT_MARKERS`); cross-module callers no longer reach into a private attribute.
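A backwards-compatible reader of the kind described for `detected_version` might look like the sketch below. The helper names `read_detected_version` and `stamp_detected_version` are hypothetical; only the two key locations come from the notes above.

```python
from typing import Any, Optional


def read_detected_version(config: dict[str, Any]) -> Optional[str]:
    """Prefer metadata.detected_version, fall back to the legacy top-level key."""
    metadata = config.get("metadata") or {}
    if metadata.get("detected_version") is not None:
        return metadata["detected_version"]
    return config.get("detected_version")


def stamp_detected_version(config: dict[str, Any], version: str) -> dict[str, Any]:
    """Return a copy with the version under metadata and no top-level copy."""
    migrated = {k: v for k, v in config.items() if k != "detected_version"}
    migrated["metadata"] = {**(config.get("metadata") or {}), "detected_version": version}
    return migrated


legacy = {"name": "godot", "detected_version": "4.2"}  # old top-level placement
print(read_detected_version(legacy))  # 4.2
restamped = stamp_detected_version(legacy, "4.3")
print(restamped)  # {'name': 'godot', 'metadata': {'detected_version': '4.3'}}
```

Reading both locations but writing only the new one lets old files keep working until the next stamp rewrites them.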
Fixed
- Correctness (#327) — diff layer keyed by stable filename slug instead of internal config name (eliminates phantom add/remove churn); `resolve_config_path` lookups now append `.json` so local-disk + user-dir paths actually find files; out-dir cache prevents redundant API/AI calls on re-scan; lowercase filename slugs prevent duplicate-file accumulation across runs.
- Safety (#327) — atomic JSON writes via `os.replace` so SIGINT mid-write can't corrupt a config and silently flip it to "removed" on the next scan; `_safe_size` guards `stat()` so a broken symlink in `src/` no longer crashes the scan; `AgentClient.call` exceptions caught and logged; AI-generated config names rejected if they fail the registry regex; URL probe catches AI hallucinations of `base_url` before writing.
- Observability (#327) — `logging.basicConfig` in scan so `logger.warning`/`error` reaches the user (was silently dropped); non-zero exit code when no configs and no codebase config were emitted, so CI pipelines detect total-failure scans.
- Publish flow (#327) — native async (`asyncio.run` at single entry, `asyncio.to_thread` for `input()`); pre-check `GITHUB_TOKEN` with actionable hint instead of asking N "yes/no" questions and failing N times; idempotency check (search existing open issues) prevents duplicate submissions; retry with backoff on transient failures; nested-event-loop detection with clear message instead of opaque traceback.
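The atomic-write pattern named in the Safety item can be sketched as below. This is a generic illustration of writing to a temporary file and swapping it in with `os.replace`; the function name `write_json_atomic` is hypothetical and the project's own helper may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        # Interrupted or failed: drop the partial temp file, keep the old config.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


# Usage in a throwaway directory: the second write replaces the first in one step.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "godot.json"
    write_json_atomic(target, {"name": "godot", "metadata": {"detected_version": "4.2"}})
    write_json_atomic(target, {"name": "godot", "metadata": {"detected_version": "4.3"}})
    final_config = json.loads(target.read_text(encoding="utf-8"))
    leftover_files = sorted(p.name for p in Path(tmp).iterdir())
```

Because readers only ever see the old file or the complete new one, an interrupted write cannot leave a half-written config behind.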
v3.6.0
[3.6.0] - 2026-05-03
Theme: Quality-of-life release — packaging targets, GitHub issue workflow, codebase analysis fixes, and source detection hardening.
Added
- IBM Bob packaging target — new `--target bob` adaptor and agent install support for IBM's Bob agent platform (#366)
- GitHub issue filtering — `--github-issue-state`, `--github-issue-labels`, and `--github-issue-since` filters in the GitHub scraper for narrowing which issues are pulled (#367)
- Per-issue files — GitHub scraper now writes one Markdown file per issue instead of a single bundle, improving navigation and downstream chunking (#367)
- Pinecone frontmatter — Pinecone vector exports now include consistent YAML frontmatter for metadata round-tripping (#367)
Fixed
- Unified scraper now generates `codebase_analysis/` index — local sources were producing C3.x outputs with broken SKILL.md links; the unified skill builder now wires up the index and resolves links correctly (#362, #376)
- Guides fallback fires correctly — `unified_skill_builder` was emitting a truthy placeholder for empty guides which suppressed the fallback content; placeholder removed (#364, #375)
- HTML URLs no longer treated as local files — `source_detector` now checks for `http(s)://` before falling through to the local-path branch, fixing false-positive routing (#373)
- PDF extracted images appear in markdown — `pdf_scraper` now inserts references for images extracted from PDFs so they render in the generated SKILL.md (#369)
- C3.x output for local sources — `unified` command was skipping the C3.x analysis pipeline for local codebase sources; now emits the full pattern/test/guide/config/router output (#363, #372)
- Language filter passed to C3.x clone analysis — repos cloned for analysis now respect `--languages` instead of analyzing every file (fixes #361, #370)
- Unity vs Unreal detection — Unity projects with C# imports were being misidentified as Unreal; detection now keys on C# import patterns (fixes #365, #368)
v3.5.1
[3.5.1] - 2026-04-12
Added
- Centralized `defaults.json` config — single source of truth for all default values (`rate_limit`, `max_pages`, `workers`, `async_mode`, enhancement, analysis, RAG settings). New `defaults.py` loader module. All 15+ files that previously hardcoded defaults now read from this file (#356)
- Low-signal code snippet filtering — `_is_low_signal_code_snippet()` filters junk patterns like bare `True`, `options`, single identifiers from quick references (#360)
- Pattern description normalization — `_normalize_pattern_description()` cleans boilerplate prefixes and truncates to first meaningful sentence (#360)
- Example language priority ranking — `_example_language_priority()` ranks Python > Bash > JSON > etc. for SKILL.md examples (#360)
- `checkpoint_exists()` method on `DocToSkillConverter` — was called but never defined (#360)
- Unified config source normalization — `DocToSkillConverter.__init__` merges fields from `sources[0]` into flat config for compatibility (#360)
- `display_name` support in SKILL.md generation — produces cleaner titles and slugs (#360)
- New tests: `test_doc_scraper_entrypoint.py` (regression for `_run_scraping`), quick-reference quality tests, docs-only compatibility tests, nested reference coverage tests (#360)
Changed
- `max_pages` default is now unlimited (`-1`) — the scraper fetches all pages unless the user explicitly sets `--max-pages`. Previously defaulted to 500 (#356)
- `--no-rate-limit` flag now works — was defined in CLI arguments but never consumed by `ExecutionContext` (#356)
- `constants.py` reads from `defaults.json` — no longer contains hardcoded magic numbers (#356)
- `ExecutionContext.ScrapingSettings` — `rate_limit` and `max_pages` now use real defaults instead of `None`, preventing None-poisoning downstream (#356)
- SKILL.md frontmatter cleanup — empty `doc_version:` and `version:` fields are now omitted; placeholder sections removed (#360)
- Enhancement routing through platform adaptors instead of importing nonexistent `enhance_skill_md` helper (#360)
- `quality_metrics.py` uses `rglob` for nested reference directories in unified skills (#360)
Fixed
- `TypeError: '>' not supported between instances of 'NoneType' and 'int'` — `rate_limit` defaulted to `None` in `ExecutionContext`, which flowed through `config.get("rate_limit", DEFAULT)` (`dict.get` returns `None` when the key exists with value `None`, ignoring the fallback). Fixed in `doc_scraper.py` (sync + async paths), `estimate_pages.py`, and `sync_config.py` (#356, #359)
- `discover_urls()` loop never executed with unlimited `max_pages` — `len(discovered) < -1` is always False. Added unlimited mode guard (#356)
- `converter.scrape()` called nonexistent method in `_run_scraping()` — changed to `converter.scrape_all()` (#360)
- None-safety for BeautifulSoup attributes — `link["href"]`, `sitemap.text`, `meta_desc["content"]` guarded against None XML text nodes (#360)
- Python 3.10 compatibility — backslash in f-string in `quality_metrics.py` not supported before 3.12 (#360)
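The first two fixes in this list come down to two small Python pitfalls, reproduced in the sketch below. The default value and the helper names `get_or_default` and `under_page_limit` are hypothetical, chosen only to illustrate the behaviour described above.

```python
DEFAULT_RATE_LIMIT = 0.5  # hypothetical default, for illustration only

config = {"rate_limit": None}  # the key is present, but its value is None

# dict.get only falls back when the key is missing, so None leaks through.
naive = config.get("rate_limit", DEFAULT_RATE_LIMIT)
print(naive)  # None


def get_or_default(cfg: dict, key: str, default):
    """Treat an explicit None the same as a missing key."""
    value = cfg.get(key)
    return default if value is None else value


safe = get_or_default(config, "rate_limit", DEFAULT_RATE_LIMIT)
print(safe)  # 0.5


def under_page_limit(discovered_count: int, max_pages: int) -> bool:
    """Treat -1 as 'unlimited' instead of comparing a count against it."""
    return max_pages == -1 or discovered_count < max_pages


print(0 < -1)                   # False: the unguarded loop condition never holds
print(under_page_limit(0, -1))  # True
```

Comparing `naive > 0` would raise the `TypeError` quoted in the first item, which is why the guard has to sit where the value is read rather than where it is used.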
v3.5.0
[3.5.0] - 2026-04-09
Theme: Grand Unification — one command, one interface, direct converters. Agent-agnostic architecture, marketplace pipeline, smart SPA discovery, all content extraction enabled by default. 80+ files changed across the codebase.
Added
- Grand Unification — unified `create` command as single entry point for all 18 source types with auto-detection, direct converter invocation, and centralized enhancement (#346)
- Agent-agnostic `AgentClient` abstraction — all 5 enhancers now support Claude, Kimi, Codex, Copilot, OpenCode, and custom agents via a unified interface. Auto-detects agent from API keys instead of hardcoding (#336)
- Kimi CLI integration with stdin piping and output parsing (#336)
- `MarketplacePublisher` — publish skills to Claude Code plugin marketplace repos (#336)
- `MarketplaceManager` — register and manage marketplace repositories (#336)
- `ConfigPublisher` — push configs to registered config source repos (#336)
- `push_config` MCP tool for automated config publishing (#336)
- Smart SPA discovery engine — three-layer discovery: sitemap.xml, llms.txt, SPA nav rendering (#336)
- `"browser": true` config support for JavaScript SPA sites with browser renderer timeout defaults (60s, `domcontentloaded`) (#336)
- Dynamic routing via `_build_argv()` — replaced manual arg forwarding with dynamic forwarder, added 7 missing CLI flags (#336)
- Kotlin language support for codebase analysis — Full C3.x pipeline support: AST parsing (classes, objects, functions, data/sealed classes, extension functions, coroutines), dependency extraction, design pattern recognition (object declaration→Singleton, companion object→Factory, sealed class→Strategy), test example extraction (JUnit, Kotest, MockK, Spek), language detection patterns, config detection (`build.gradle.kts`), and extension maps across all analyzers (#287)
- Headless browser rendering (`--browser` flag) — uses Playwright to render JavaScript SPA sites (React, Vue, etc.) that return empty HTML shells. Auto-installs Chromium on first use. Optional dep: `pip install "skill-seekers[browser]"` (#321)
- `skill-seekers doctor` command — 8 diagnostic checks (Python version, package install, git, core/optional deps, API keys, MCP server, output dir) with pass/warn/fail status and `--verbose` flag (#316)
- Prompt injection check workflow — bundled `prompt-injection-check` workflow scans scraped content for injection patterns (role assumption, instruction overrides, delimiter injection, hidden instructions). Added as first stage in `default` and `security-focus` workflows. Flags suspicious content without removing it (#324)
- Codex CLI plugin manifest (`.codex-plugin/plugin.json`) for OpenAI Codex integration (#350)
- 6 behavioral UML diagrams — 3 sequence (create pipeline, GitHub+C3.x flow, MCP invocation), 2 activity (source detection, enhancement pipeline), 1 component (runtime dependencies with interface contracts)
- 134 new tests — `test_agent_client.py`, `test_config_publisher.py`, `_build_argv` tests. Total: 3194 passed, 39 expected skips (#336)
Changed
- All content extraction features enabled by default — pattern detection, test examples, how-to guides, config extraction, and router generation no longer require explicit opt-in
- Renamed `claude-enhanced` merge mode to `ai-enhanced` — backward compatibility alias kept (#336)
- Removed 118+ hardcoded Claude references across 60+ files (#336)
- Refactored 5 enhancers to use `AgentClient` abstraction (#336)
- Removed 50-file GitHub API analysis limit (#336)
- Removed 100-file config extraction limit (#336)
- Fixed unified scraper default `max_pages` from 100 to 500 (#336)
- Centralized enhancement timeouts to 45min default with unlimited support (#336)
- Excluded slow MCP/e2e tests from CI coverage step to prevent timeout
Fixed
- `glob('*.md')` replaced with `rglob('*.md')` in all adaptors — fixes packaging when skills are in nested directories (#349)
- `scraped_data` list-vs-dict bug in conflict detection (#336)
- `base_url` passthrough to doc scraper subprocess (#336)
- URL filtering now uses base directory correctly (#336)
- C3.x analysis data loss (#336)
- `--enhance-level` flag not passed correctly (#336)
- `guide_enhancer` method rename — `_call_claude_api` renamed to `_call_ai` (#336)
- 11 pre-existing test failures fixed (#336)
- Per-file language detection in GitHub scraper (#336)
- GitHub language detection crashes with `TypeError` when API response contains non-integer metadata keys (e.g., `"url"`) — now filters to integer values only (#322)
- C3.x codebase analysis crashes with `TypeError` — `_run_c3_analysis()` and `_analyze_c3x()` passed removed `enhance_with_ai`/`ai_mode` kwargs to `analyze_codebase()` instead of `enhance_level` (#323)
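The language-detection fix (#322) amounts to ignoring non-integer entries before doing arithmetic on byte counts. A minimal sketch, assuming a response shaped like a name-to-bytes mapping with a stray metadata key; the function names and the sample payload are hypothetical:

```python
from typing import Any


def language_byte_counts(payload: dict[str, Any]) -> dict[str, int]:
    """Keep only entries whose value is an integer byte count."""
    return {
        name: count
        for name, count in payload.items()
        if isinstance(count, int) and not isinstance(count, bool)
    }


def language_shares(payload: dict[str, Any]) -> dict[str, float]:
    """Convert byte counts to fractions of the total, skipping metadata keys."""
    counts = language_byte_counts(payload)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()} if total else {}


# Hypothetical response: two languages plus a non-integer metadata key.
payload = {"Python": 9000, "Shell": 1000, "url": "https://example.invalid/languages"}
shares = language_shares(payload)
print(shares)  # {'Python': 0.9, 'Shell': 0.1}
```

Without the filter, `sum()` over the raw values would hit the string and raise the `TypeError` described in the item above.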
Security
- Removed command injection via cloned repo script execution (#336)
- Replaced `git add -A` with targeted staging in marketplace publisher (#336)
- Clear auth tokens from cached `.git/config` after clone (#336)
- Use `defusedxml` for sitemap XML parsing (XXE protection) (#336)
- Path traversal validation for config names (#336)
v3.4.0 — 12 LLM Platforms, SPA Detection, UML Architecture
What's New in v3.4.0
Theme: 8 new LLM platform adaptors (12 total), 7 new CLI agent paths (18 total), OpenCode skill tools, SPA site detection, 8 bug fixes, and full UML architecture documentation.
Platform Expansion: 5 → 12 LLM Targets
| New Platform | Flag | Base |
|---|---|---|
| OpenCode | `--target opencode` | Directory-based, dual YAML |
| Kimi | `--target kimi` | OpenAI-compatible |
| DeepSeek | `--target deepseek` | OpenAI-compatible |
| Qwen | `--target qwen` | OpenAI-compatible |
| OpenRouter | `--target openrouter` | OpenAI-compatible |
| Together AI | `--target together` | OpenAI-compatible |
| Fireworks AI | `--target fireworks` | OpenAI-compatible |
All new platforms inherit from a shared OpenAI-compatible base class for consistent behavior.
Agent Expansion: 11 → 18 Install Paths
New agents: roo, cline, aider, bolt, kilo, continue, kimi-code
OpenCode Skill Tools
- Skill splitter — auto-split large docs into focused sub-skills with router
- Bi-directional converter — import/export between OpenCode and any platform format
Distribution
- Smithery manifest (`smithery.yaml`)
- GitHub Actions template for automated skill updates
- Claude Code Plugin with slash commands
Bug Fixes
- `sanitize_url()` crash on Python 3.14 strict `urlparse` (#284)
- Blind `/index.html.md` append breaking non-Docusaurus sites (#277)
- Unified scraper temp config format (#317)
- Unicode arrows breaking Windows cp1252 terminals
- CLI flags in plugin slash commands
- MiniMax adaptor improvements (#319)
- Misleading "Scraped N pages" count — now shows `(N saved, M skipped)` (#320)
- SPA site detection — warns when site requires JavaScript rendering (#320, #321)
Documentation
- Full UML architecture — 14 class diagrams synced from source code via StarUML
- StarUML HTML API reference export
- Ecosystem section linking all Skill Seekers repos
- Architecture references in README and CONTRIBUTING
- Consolidated `Docs/` into `docs/`
Test Results
2929 passed, 39 skipped, 0 failures
Install / Upgrade
`pip install --upgrade skill-seekers`

Full changelog: https://github.com/yusufkaraaslan/Skill_Seekers/blob/main/CHANGELOG.md
v3.3.0
[3.3.0] - 2026-03-16
Theme: 10 new source types (17 total), EPUB unified integration, sync-config command, performance optimizations, 12 README translations, and 19 bug fixes. 117 files changed, +41,588 lines since v3.2.0.
Supported Source Types (17)
| # | Type | CLI Command | Config Type | Auto-Detection |
|---|---|---|---|---|
| 1 | Documentation (web) | `scrape` / `create <url>` | `documentation` | HTTP/HTTPS URLs |
| 2 | GitHub repository | `github` / `create owner/repo` | `github` | owner/repo or github.com URLs |
| 3 | PDF document | `pdf` / `create file.pdf` | `pdf` | .pdf extension |
| 4 | Word document | `word` / `create file.docx` | `word` | .docx extension |
| 5 | EPUB e-book | `epub` / `create file.epub` | `epub` | .epub extension |
| 6 | Video | `video` / `create <url/file>` | `video` | YouTube/Vimeo URLs, video extensions |
| 7 | Local codebase | `analyze` / `create ./path` | `local` | Directory paths |
| 8 | Jupyter Notebook | `jupyter` / `create file.ipynb` | `jupyter` | .ipynb extension |
| 9 | Local HTML | `html` / `create file.html` | `html` | .html/.htm extensions |
| 10 | OpenAPI/Swagger | `openapi` / `create spec.yaml` | `openapi` | .yaml/.yml with OpenAPI content |
| 11 | AsciiDoc | `asciidoc` / `create file.adoc` | `asciidoc` | .adoc/.asciidoc extensions |
| 12 | PowerPoint | `pptx` / `create file.pptx` | `pptx` | .pptx extension |
| 13 | RSS/Atom feed | `rss` / `create feed.rss` | `rss` | .rss/.atom extensions |
| 14 | Man pages | `manpage` / `create cmd.1` | `manpage` | .1–.8/.man extensions |
| 15 | Confluence wiki | `confluence` | `confluence` | API or export directory |
| 16 | Notion pages | `notion` | `notion` | API or export directory |
| 17 | Slack/Discord chat | `chat` | `chat` | Export directory or API |
Added
10 New Skill Source Types (17 total)
Skill Seekers now supports 17 source types — up from 7. Every new type is fully integrated into the CLI (skill-seekers <type>), create command auto-detection, unified multi-source configs, config validation, the MCP server, and the skill builder.
- Jupyter Notebook — `skill-seekers jupyter --notebook file.ipynb` or `skill-seekers create file.ipynb`
  - Extracts markdown cells, code cells with outputs, kernel metadata, imports, and language detection
  - Handles single files and directories of notebooks; filters `.ipynb_checkpoints`
  - Optional dependency: `pip install "skill-seekers[jupyter]"` (nbformat)
  - Entry point: `skill-seekers-jupyter`
- Local HTML — `skill-seekers html --html-path file.html` or `skill-seekers create file.html`
  - Parses HTML using BeautifulSoup with smart main content detection (`<article>`, `<main>`, `.content`, largest div)
  - Extracts headings, code blocks, tables (to markdown), images, links; converts inline HTML to markdown
  - Handles single files and directories; supports `.html`, `.htm`, `.xhtml` extensions
  - No extra dependencies (BeautifulSoup is a core dep)
- OpenAPI/Swagger — `skill-seekers openapi --spec spec.yaml` or `skill-seekers create spec.yaml`
  - Parses OpenAPI 3.0/3.1 and Swagger 2.0 specs from YAML or JSON (local files or URLs via `--spec-url`)
  - Extracts endpoints, parameters, request/response schemas, security schemes, tags
  - Resolves `$ref` references with circular reference protection; handles `allOf`/`oneOf`/`anyOf`
  - Groups endpoints by tags; generates comprehensive API reference markdown
  - Source detection sniffs YAML file content for `openapi:` or `swagger:` keys (avoids false positives on non-API YAML files)
  - Optional dependency: `pip install "skill-seekers[openapi]"` (pyyaml — already a core dep, guard added for safety)
- AsciiDoc — `skill-seekers asciidoc --asciidoc-path file.adoc` or `skill-seekers create file.adoc`
  - Regex-based parser (no external library required) with optional `asciidoc` library support
  - Extracts headings (= through =====), `[source,lang]` code blocks, `|===` tables, admonitions (NOTE/TIP/WARNING/IMPORTANT/CAUTION), and `include::` directives
  - Converts AsciiDoc formatting to markdown; handles single files and directories
  - Optional dependency: `pip install "skill-seekers[asciidoc]"` (asciidoc library for advanced rendering)
- PowerPoint (.pptx) — `skill-seekers pptx --pptx file.pptx` or `skill-seekers create file.pptx`
  - Extracts slide text, speaker notes, tables, images (with alt text), and grouped shapes
  - Detects code blocks by monospace font analysis (30+ font families)
  - Groups slides into sections by layout type; handles single files and directories
  - Optional dependency: `pip install "skill-seekers[pptx]"` (python-pptx)
- RSS/Atom Feeds — `skill-seekers rss --feed-url <url>` / `--feed-path file.rss` or `skill-seekers create feed.rss`
  - Parses RSS 2.0, RSS 1.0, and Atom feeds via feedparser
  - Optionally follows article links (`--follow-links`, default on) to scrape full page content using BeautifulSoup
  - Extracts article titles, summaries, authors, dates, categories; configurable `--max-articles` (default 50)
  - Source detection matches `.rss` and `.atom` extensions (`.xml` excluded to avoid false positives)
  - Optional dependency: `pip install "skill-seekers[rss]"` (feedparser)
- Man Pages — `skill-seekers manpage --man-names git,curl` / `--man-path dir/` or `skill-seekers create git.1`
  - Extracts man pages by running `man` command via subprocess or reading `.1`–`.8`/`.man` files directly
  - Handles gzip/bzip2/xz compressed man files; strips troff/groff formatting (backspace overstriking, macros, font escapes)
  - Parses structured sections (NAME, SYNOPSIS, DESCRIPTION, OPTIONS, EXAMPLES, SEE ALSO)
  - Source detection uses basename heuristic to avoid false positives on log rotation files (e.g., `access.log.1`)
  - No external dependencies (stdlib only)
- Confluence — `skill-seekers confluence --base-url <url> --space-key <key>` or `--export-path dir/`
  - API mode: fetches pages from Confluence REST API with pagination (`atlassian-python-api`)
  - Export mode: parses Confluence HTML/XML export directories
  - Extracts page content, code/panel/info/warning macros, page hierarchy, tables
  - Optional dependency: `pip install "skill-seekers[confluence]"` (atlassian-python-api)
- Notion — `skill-seekers notion --database-id <id>` / `--page-id <id>` or `--export-path dir/`
  - API mode: fetches pages via Notion API with support for 20+ block types (paragraph, heading, code, callout, toggle, table, etc.)
  - Export mode: parses Notion Markdown/CSV export directories
  - Extracts rich text with annotations (bold, italic, code, links), 16+ property types for database entries
  - Optional dependency: `pip install "skill-seekers[notion]"` (notion-client)
- Slack/Discord Chat — `skill-seekers chat --export-path dir/` or `--token <token> --channel <channel>`
  - Slack: parses workspace JSON exports or fetches via Slack Web API (`slack_sdk`)
  - Discord: parses DiscordChatExporter JSON or fetches via Discord HTTP API
  - Extracts messages, code snippets (fenced blocks), shared URLs, threads, reactions, attachments
  - Generates per-channel summaries and topic categorization
  - Optional dependency: `pip install "skill-seekers[chat]"` (slack-sdk)
EPUB Unified Pipeline Integration
- EPUB (.epub) input support via `skill-seekers create book.epub` or `skill-seekers epub --epub book.epub`
  - Extracts chapters, metadata (Dublin Core), code blocks, images, and tables from EPUB 2 and EPUB 3 files
  - DRM detection with clear error messages (Adobe ADEPT, Apple FairPlay, Readium LCP)
  - Font obfuscation correctly identified as non-DRM
  - EPUB 3 TOC bug workaround (`ignore_ncx` option)
  - `--help-epub` flag for EPUB-specific help
  - Optional dependency: `pip install "skill-seekers[epub]"` (ebooklib)
  - 107 tests across 14 test classes
- EPUB added to unified scraper — `_scrape_epub()` method, `scraped_data["epub"]`, config validation (`_validate_epub_source`), and dry-run display. Previously EPUB worked standalone but was missing from multi-source configs.
Unified Skill Builder — Generic Merge System
- `_generic_merge()` — Priority-based section merge for any combination of source types not covered by existing pairwise synthesis (docs+github, docs+pdf, etc.). Produces YAML frontmatter + source-attributed sections.
- `_append_extra_sources()` — Appends additional source type content (e.g., Jupyter + PPTX) to pairwise-synthesized SKILL.md.
- `_generate_generic_references()` — Generates `references/<type>/index.md` for any source type, with ID resolution fallback chain.
- `_SOURCE_LABELS` dict — Human-readable labels for all 17 source types used in merge attribution.
Config Validator Expansion
- 17 source types in `VALID_SOURCE_TYPES` — All new types plus `word` and `video` now have per-type validation methods.
- `_validate_word_source()` — Validates `path` field for Word documents (was previously missing).
- `_validate_video_source()` — Validates `url`, `path`, or `playlist` field for video sources (was previously missing).
- 11 new `_validate_*_source()` methods — One for each new type with appropriate required-field checks.
Source Detection Improvements
- 7 new file extension detections in `SourceDetector.detect()` — `.ipynb`, `.html`/`.htm`, `.pptx`, `.adoc`/`.asciidoc`, `.rss`/`.atom`, `.1`–`.8`/`.man`, `.yaml`/`.yml` (with content sniffing)
- `_looks_like_openapi()` — Content sniffing for YAML files: only classifies as OpenAPI if the file contains `openapi:` or `swagger:` key in first 20 lines (prevents false positives on docker-compose, Ansible, Kubernetes manifests, etc.)
- Man page basename heuristic — `.1`–`.8` extensions only detected as man pages if the basename has no dots (e.g., `git.1` matches but `access.log.1` does not)
- `.xml` excluded from RSS detection — Too generic; only...
v3.2.0 — Video Extraction, Word Support, Pinecone Adaptor
Theme: Video source support, Word document support, Pinecone adaptor, and quality improvements. 94 files changed, +23,500 lines since v3.1.3. 2,540 tests passing.
🎬 Video Extraction Pipeline
Complete video extraction system that converts YouTube videos and local video files into AI-consumable skills.
- `skill-seekers video --url <youtube-url>` — New CLI command for video scraping
- `skill-seekers create <youtube-url>` — Auto-detects YouTube URLs
- Transcript extraction — 3-tier fallback: YouTube API → yt-dlp → faster-whisper
- Visual OCR — Multi-engine ensemble (EasyOCR + pytesseract) for code frames
- Panel detection — Splits IDE screenshots into independent sub-sections
- Code timeline — Tracks code evolution across frames with edit history
- Two-pass AI enhancement — Cleans OCR noise using transcript context
- GPU auto-detection — `skill-seekers video --setup` detects CUDA/ROCm/CPU and installs correct PyTorch
- 197 tests covering models, metadata, transcript, visual, OCR, and CLI
📄 Word Document (.docx) Support
- `skill-seekers word --docx <file>` — Full pipeline: mammoth → HTML → sections → SKILL.md
- `skill-seekers create document.docx` — Auto-detects .docx files
- Smart code detection — Identifies monospace paragraphs as code blocks
- Install: `pip install skill-seekers[docx]`
🌲 Pinecone Vector Database Adaptor
- `skill-seekers package output/ --format pinecone --upload` — Direct Pinecone upload
- Full CRUD operations with namespace support
- OpenAI and Sentence Transformers embedding support
- Batch upsert with configurable batch sizes
- 764 tests for comprehensive coverage
🐛 Bug Fixes
- 6 OCR quality fixes — Skip webcam frames, clean IDE decorations, fix duplicate lines, filter UI junk
- 15 video pipeline fixes — Timeout handling, MCP integration, filename collisions, dependency management
- Issue #300 — Selector fallback & dry-run link discovery (ReactFlow found 20+ pages, was 1)
- Issue #301 — `setup.sh` macOS fix
- RAG chunking crash — Fixed `AttributeError: output_dir`
- Chunk overlap auto-scaling — Scales to `max(50, chunk_tokens // 10)`
- Reference file limits removed — No more caps on GitHub issues, releases, or code blocks
- See CHANGELOG.md for full details
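For the chunk overlap auto-scaling rule listed above, a short worked example shows how the 50-token floor interacts with the 10% scaling. The function name `auto_chunk_overlap` is hypothetical; only the formula comes from the notes.

```python
def auto_chunk_overlap(chunk_tokens: int) -> int:
    """Overlap is 10% of the chunk size, but never below 50 tokens."""
    return max(50, chunk_tokens // 10)


for size in (256, 512, 2048):
    print(size, auto_chunk_overlap(size))
# 256 50
# 512 51
# 2048 204
```

Below 500 tokens per chunk the floor dominates, so small chunks carry a proportionally larger overlap.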
📦 Install / Upgrade
pip install --upgrade skill-seekers
# With video support
pip install skill-seekers[video]
skill-seekers video --setup # Auto-detect GPU, install deps
# With Word support
pip install skill-seekers[docx]
# With Pinecone
pip install skill-seekers[pinecone]
# Everything
pip install skill-seekers[all]

Full Changelog: https://github.com/yusufkaraaslan/Skill_Seekers/blob/main/CHANGELOG.md
v3.1.3
[3.1.3] - 2026-02-24
🐛 Hotfix — Explicit Chunk Flags & Argument Pipeline Cleanup
Fixed
- Issue #299: `skill-seekers package --target claude` unrecognised argument crash — `_reconstruct_argv()` in `main.py` emits default flag values back into argv when routing subcommands. `package_skill.py` had a 105-line inline argparser that used different flag names to those in `arguments/package.py`, so forwarded flags were rejected. Fixed by replacing the inline block with a call to `add_package_arguments(parser)` — the single source of truth.
Changed
- `package_skill.py` argparser refactored — Replaced ~105 lines of inline argparse duplication with a single `add_package_arguments(parser)` call. Flag names are now guaranteed consistent with `_reconstruct_argv()` output, preventing future argument-name drift.
- Explicit chunk flag names — All `--chunk-*` flags now include unit suffixes to eliminate ambiguity between RAG tokens and streaming characters:
  - `--chunk-size` (RAG tokens) → `--chunk-tokens`
  - `--chunk-overlap` (RAG tokens) → `--chunk-overlap-tokens`
  - `--chunk` (enable RAG chunking) → `--chunk-for-rag`
  - `--streaming-chunk-size` (chars) → `--streaming-chunk-chars`
  - `--streaming-overlap` (chars) → `--streaming-overlap-chars`
  - `--chunk-size` in PDF extractor (pages) → `--pdf-pages-per-chunk`
- `setup_logging()` centralized — Added `setup_logging(verbose, quiet)` to `utils.py` and removed 4 duplicate module-level `logging.basicConfig()` calls from `doc_scraper.py`, `github_scraper.py`, `codebase_scraper.py`, and `unified_scraper.py`
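The "single source of truth" pattern from this release can be illustrated with plain `argparse`. The sketch below reuses the name `add_package_arguments` and three of the renamed flags from the notes, but the positional argument, the defaults, and the function body are assumptions made for the example, not the project's actual definitions.

```python
import argparse


def add_package_arguments(parser: argparse.ArgumentParser) -> None:
    """Register packaging flags once; every entry point calls this function."""
    parser.add_argument("skill_dir")                       # assumed positional
    parser.add_argument("--target", default="claude")      # assumed default
    parser.add_argument("--chunk-for-rag", action="store_true")
    parser.add_argument("--chunk-tokens", type=int, default=512)          # assumed default
    parser.add_argument("--chunk-overlap-tokens", type=int, default=50)   # assumed default


def build_parser(prog: str) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog=prog)
    add_package_arguments(parser)
    return parser


# Two entry points share one definition, so forwarded argv always parses.
router_parser = build_parser("skill-seekers package")
standalone_parser = build_parser("package_skill.py")

argv = ["output/", "--target", "claude", "--chunk-tokens", "256"]
routed = router_parser.parse_args(argv)
direct = standalone_parser.parse_args(argv)
print(routed == direct, routed.chunk_tokens)  # True 256
```

With the flags defined in one place, a rename such as `--chunk-size` to `--chunk-tokens` cannot leave one parser behind.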
v3.1.2 — Gemini Fix & Enhance Dispatcher
What's Changed
🐛 Critical Bug Fixes
Gemini enhancement 404 errors — The gemini-2.0-flash-exp model was retired by Google, causing all Gemini enhancement requests to fail with 404. Replaced with gemini-2.5-flash (stable GA).
skill-seekers enhance auto-detection — The documented behaviour of automatically using API mode when an API key is present was never implemented. This release fixes it:
- `ANTHROPIC_API_KEY` set → Claude API mode
- `GOOGLE_API_KEY` set → Gemini API mode
- `OPENAI_API_KEY` set → OpenAI API mode
- No key → LOCAL mode (Claude Code Max, free)
Use --mode LOCAL to force local mode even when API keys are present.
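The key-to-mode mapping above can be sketched as a small lookup. The function name, the lowercase mode strings, and the precedence when several keys are set are assumptions; the release notes only state which key selects which mode and that `--mode LOCAL` overrides them.

```python
from typing import Mapping, Optional

# Checked in this order; precedence between keys is an assumption.
_KEY_TO_MODE = (
    ("ANTHROPIC_API_KEY", "claude"),
    ("GOOGLE_API_KEY", "gemini"),
    ("OPENAI_API_KEY", "openai"),
)


def detect_enhance_mode(env: Mapping[str, str], forced: Optional[str] = None) -> str:
    """Pick an enhancement mode from available API keys unless one is forced."""
    if forced:
        return forced
    for variable, mode in _KEY_TO_MODE:
        if env.get(variable):
            return mode
    return "LOCAL"


print(detect_enhance_mode({"GOOGLE_API_KEY": "dummy"}))                  # gemini
print(detect_enhance_mode({}))                                           # LOCAL
print(detect_enhance_mode({"OPENAI_API_KEY": "dummy"}, forced="LOCAL"))  # LOCAL
```

In real use the `env` argument would be `os.environ`; passing a mapping keeps the function easy to test.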
create command argument forwarding — Universal flags (--dry-run, --verbose, --quiet, --name, --description) were crashing when used with GitHub, PDF, and codebase sources. All fixed. Also adds --dry-run support to skill-seekers github and skill-seekers pdf.
Upgrade
pip install --upgrade skill-seekers
docker pull yusufk/skill-seekers:latest
Full Changelog
See CHANGELOG.md for complete details.
v3.1.1
What's Changed
- fix: use getattr for max_pages in create command web routing by @YusufKaraaslanSpyke in #294
- hotfix: v3.1.1 — fix create command max_pages AttributeError by @yusufkaraaslan in #295
- Max page hot fix by @yusufkaraaslan in #296
Full Changelog: v3.1.0...v3.1.1