fix(unified): emit C3.x output for local sources (#363) #372

Merged
yusufkaraaslan merged 1 commit into development from fix/issue-363-local-c3-analysis-output
Apr 29, 2026
@yusufkaraaslan

Owner

Fixes #363

Problem

The unified skill builder dropped all C3.x analysis output produced by local sources. _run_local_codebase_analysis correctly extracted test_examples, patterns, how_to_guides, config_patterns, architecture, etc., wrote them to cache (local_analysis_{idx}_{name}/), and stored them on scraped_data["local"] — but unified_skill_builder.py only consumed GitHub sources for C3.x reference generation and SKILL.md summarization.

Concretely, both code paths only looked at GitHub:

  • _generate_references() at lines 1095-1101 iterated scraped_data.get("github", []) and called _generate_c3_analysis_references per repo.
  • _generate_skill_md() at lines 884-896 read scraped_data.get("github", {}) and passed only the first GitHub source's c3_analysis to _format_c3_summary_section.

So a unified config with a local source and extract_tests=true would happily run the test-example extractor (122 files / 292 examples in the issue), then quietly throw the result away.

The bug is broader than the issue title suggests — it's not just test_examples, it's the entire C3.x payload from local sources.

Solution

Refactor and extend in unified_skill_builder.py:

  1. Refactor _generate_c3_analysis_references to delegate to a new shared helper _write_codebase_analysis_references(c3_data, source_id, github_data=None) that does the actual work. The GitHub path is now a thin lookup wrapper.
  2. Add _generate_local_codebase_analysis_references that walks scraped_data["local"], skips sources with no C3 fields, and calls the shared writer with a sanitized source ID. Wired into _generate_references next to the GitHub loop.
  3. Add _sanitize_source_id to make local IDs filesystem-safe (slashes, spaces, special chars → _; empty → "local").
  4. Add _collect_c3_payloads that gathers C3 data from both GitHub and local sources into a flat list.
  5. Rewrite _format_c3_summary_section to accept either a single dict (legacy) or a list of payloads, aggregating counts (test examples, design patterns, how-to guides, config files, security alerts) across all sources.
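
The ID sanitization, payload collection, and count aggregation described in steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names drop the leading underscore to mark them as hypothetical stand-ins, and the data shapes (GitHub entries carrying a `c3_analysis` dict, local entries carrying C3 fields directly, each field being a list) are assumptions made for the example.

```python
import re
from typing import Any, Dict, List, Union

# Assumed set of C3.x fields; the PR names these but the real list may differ.
C3_FIELDS = ("test_examples", "patterns", "how_to_guides", "config_patterns", "architecture")


def sanitize_source_id(raw: str) -> str:
    """Replace every character outside [A-Za-z0-9_-] with '_'; empty input becomes 'local'."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", raw)
    return cleaned or "local"


def collect_c3_payloads(scraped_data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Gather C3 payloads from GitHub and local sources into one flat list."""
    payloads: List[Dict[str, Any]] = []
    for repo in scraped_data.get("github", []):
        c3 = repo.get("c3_analysis")  # assumed key on GitHub entries
        if c3:
            payloads.append(c3)
    for source in scraped_data.get("local", []):
        # Skip local sources that carry no C3 fields at all.
        if any(source.get(field) for field in C3_FIELDS):
            payloads.append(source)
    return payloads


def summarize_counts(payloads: Union[Dict[str, Any], List[Dict[str, Any]]]) -> Dict[str, int]:
    """Sum per-field item counts; a single dict is accepted for backward compatibility."""
    if isinstance(payloads, dict):
        payloads = [payloads]
    totals = {"test_examples": 0, "patterns": 0, "how_to_guides": 0}
    for payload in payloads:
        for key in totals:
            totals[key] += len(payload.get(key) or [])
    return totals
```

Accepting either a dict or a list in the summary function keeps existing single-source callers working while letting new callers pass every payload at once.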

Testing

7 new regression tests in tests/test_unified.py:

  • test_local_source_test_examples_become_references_363 — the headline issue: local source with test_examples produces references/codebase_analysis/<sanitized-id>/examples/test_examples.json and an ARCHITECTURE.md mentioning the example count.
  • test_local_source_skill_md_includes_summary_363 — SKILL.md gets the C3 summary section when only local has data.
  • test_format_c3_summary_aggregates_across_sources_363 — counts sum across multiple payloads.
  • test_format_c3_summary_backward_compat_dict_arg_363 — single-dict input still works.
  • test_collect_c3_payloads_combines_github_and_local_363 — combined harvesting from both source types.
  • test_local_source_without_c3_data_skipped_363 — empty local sources don't create empty reference dirs.
  • test_sanitize_source_id_363 — ID normalization edge cases.

All 31 tests/test_unified.py tests pass; 93 unified-related tests across test_unified.py + test_c3_integration.py + test_unified_scraper_orchestration.py + test_unified_analyzer.py all pass. Lint + format clean.

🤖 Generated with Claude Code

The unified skill builder previously dropped all C3.x analysis (test_examples,
patterns, how_to_guides, config_patterns, architecture, ...) produced by
local sources. Reference generation and the SKILL.md summary both consumed
GitHub sources only, so a unified config with extract_tests=true would
extract examples to cache but never surface them.

Refactor _generate_c3_analysis_references to delegate to a shared
_write_codebase_analysis_references helper, then wire a parallel
_generate_local_codebase_analysis_references loop that walks
scraped_data["local"] and emits the same reference layout per source
(filesystem-safe IDs via _sanitize_source_id).

Rewrite _format_c3_summary_section to take a list of payloads and
aggregate counts across sources; collect them via a new _collect_c3_payloads
helper that pulls from both GitHub and local. Single-dict input still works
for backward compatibility.

Adds 7 regression tests in tests/test_unified.py covering the headline
issue, SKILL.md aggregation, multi-source collection, ID sanitization,
and the no-C3-data-skipped case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 26, 2026


Codecov Report

❌ Patch coverage is 93.00000% with 7 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/skill_seekers/cli/unified_skill_builder.py | 93.00% | 7 Missing ⚠️ |

@yusufkaraaslan yusufkaraaslan merged commit 7409f25 into development Apr 29, 2026
8 checks passed
yusufkaraaslan added a commit that referenced this pull request May 2, 2026
…inks (#362) (#376)

ARCHITECTURE.md is always written at
`references/codebase_analysis/{source_id}/ARCHITECTURE.md`, but four
SKILL.md call sites historically linked to
`references/codebase_analysis/ARCHITECTURE.md` (no source_id). That
target never existed once outputs became per-source-namespaced, so a
reader following the link from SKILL.md hit a 404. The user-visible
result for #362: detected patterns *are* in the references tree (after
PR #372 wired them up) but SKILL.md's "see ARCHITECTURE.md" pointer led
nowhere, making the analysis appear missing.

Generate `references/codebase_analysis/index.md` after all per-source
references are written, listing each source's ARCHITECTURE.md and any
populated subsection (patterns, examples, guides, configuration). Route
the four SKILL.md links through this stable target so the path resolves
whether the build has one source or many.

The index is omitted when no codebase analysis ran, so skills built from
docs/PDF/etc. only do not get a stray empty index.
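
The index generation described above could look roughly like the sketch below. It is an illustration under assumptions, not the code from the commit: `write_analysis_index` is a hypothetical name, the subsection directory names are taken from the list in the message and assumed to match the on-disk layout, and the link text is invented for the example.

```python
from pathlib import Path
from typing import Optional

# Assumed subdirectory names, mirroring the subsections named in the commit message.
SUBSECTIONS = ("patterns", "examples", "guides", "configuration")


def write_analysis_index(analysis_root: Path) -> Optional[Path]:
    """Write index.md listing each source's ARCHITECTURE.md; return None if nothing to list."""
    if not analysis_root.is_dir():
        return None
    sources = sorted(
        d for d in analysis_root.iterdir()
        if d.is_dir() and (d / "ARCHITECTURE.md").is_file()
    )
    if not sources:
        return None  # no codebase analysis ran, so no stray empty index

    lines = ["# Codebase Analysis Index", ""]
    for src in sources:
        lines.append(f"## {src.name}")
        lines.append(f"- [Architecture]({src.name}/ARCHITECTURE.md)")
        for sub in SUBSECTIONS:
            sub_dir = src / sub
            # Only list subsections that actually contain files.
            if sub_dir.is_dir() and any(sub_dir.iterdir()):
                lines.append(f"- [{sub.title()}]({src.name}/{sub}/)")
        lines.append("")

    index_path = analysis_root / "index.md"
    index_path.write_text("\n".join(lines), encoding="utf-8")
    return index_path
```

Because the index sits at a fixed path above the per-source directories, a link to it resolves regardless of how many sources the build contains.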

Tests:
- TestCodebaseAnalysisIndex (3 cases): index lists each local source,
  SKILL.md link resolves on disk, no index when no C3.x data.
- Updated test_skill_md_includes_c3_summary to assert the new link.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yusufkaraaslan yusufkaraaslan mentioned this pull request May 3, 2026
yusufkaraaslan added a commit that referenced this pull request May 3, 2026
* fix: detect Unity via C# imports to prevent misidentification as Unreal (fixes #365) (#368)

Unity C# projects were incorrectly detected as Unreal when the analyzed source
files contained paths with 'Source/' or 'Content/' subdirectories, which are
also valid Unreal engine markers.

Root causes:
1. Game engine detection did not check import_content, so 'using UnityEngine;'
   statements were ignored entirely.
2. Unity markers lacked import-based signals ('UnityEngine') and the unique
   Unity Package Manager file ('Packages/manifest.json').

Fix:
- Add 'UnityEngine' and 'Packages/manifest.json' to Unity FRAMEWORK_MARKERS.
- Extend the game engine detection loop to also check import_content, using
  the same high-confidence threshold (>= 1 import match) already applied to
  other frameworks like Django and Spring.
- Path/directory-based detection still requires 2+ matches to avoid false
  positives from generic directory names.
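
The two-tier threshold described in the fix can be sketched as follows. This is a simplified illustration and not the detector's real code: `ENGINE_MARKERS` and `detect_engine` are hypothetical names, and any marker not named in the message (`Assets/`, `.uproject`) is an assumption added to make the example self-contained.

```python
from typing import List, Optional

# Hypothetical marker table; only UnityEngine, Packages/manifest.json,
# Source/ and Content/ are named in the commit message.
ENGINE_MARKERS = {
    "Unity": ("UnityEngine", "Packages/manifest.json", "Assets/"),
    "Unreal": ("Source/", "Content/", ".uproject"),
}


def detect_engine(paths: List[str], import_content: str) -> Optional[str]:
    """Return the detected engine name, or None when the evidence is too weak."""
    # Pass 1: a single import match is treated as high-confidence evidence.
    for engine, markers in ENGINE_MARKERS.items():
        if any(marker in import_content for marker in markers):
            return engine
    # Pass 2: path-based detection needs two or more distinct marker hits,
    # because directory names like Source/ are too generic on their own.
    for engine, markers in ENGINE_MARKERS.items():
        hits = sum(1 for marker in markers if any(marker in p for p in paths))
        if hits >= 2:
            return engine
    return None
```

With paths such as `Assets/Source/Player.cs` and `Assets/Content/Level.cs`, the path-only pass alone would report Unreal; checking the import text first is what lets `using UnityEngine;` win.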

Tests: add test_architectural_pattern_detector.py covering:
- Unity detected via UnityEngine imports alone
- Unity not misidentified as Unreal when a Source/ subfolder exists
- Unreal projects still detected correctly
- Unity detected via Packages/manifest.json in file paths

Co-authored-by: octo-patch <octo-patch@github.com>

* fix: pass language filter to C3.x clone analysis (fixes #361) (#370)

The _run_c3_analysis method was always passing languages=None to
analyze_codebase, ignoring the language filter configured in the
GitHub source config. This caused the C3.x codebase analysis on
cloned repos to either find no source files (when the repo only
has files of the filtered language) or analyze the wrong language
set entirely.

Now passes source.get("languages") so the language filter is
respected consistently with the local source analysis.

Co-authored-by: octo-patch <octo-patch@github.com>

* Add IBM Bob packaging target and agent install support (#366)

* Add IBM Bob packaging target and agent install support

* Update README.md

* Fix IBM Bob adaptor compatibility for Python 3.10/3.11

* formatting fixed

* feat: GitHub issue filtering, per-issue files, and Pinecone frontmatter (#367)

* feat: GitHub issue filtering, per-issue files, and Pinecone frontmatter

Add issue filtering (--issue-labels, --issue-state, --since, --max-comments),
per-issue markdown files with YAML frontmatter, Pinecone adaptor frontmatter
parsing into vector metadata, and full body preservation (was truncated to
500 chars). Includes 598 lines of new tests.

* fix: preserve previous defaults for issue scraping

- Per-issue files are now opt-in via --per-issue-files (was always-on)
- Comment fetching disabled by default (--max-comments 0, was 50)
- Issue body truncated to 500 chars by default (full body only with --per-issue-files)
- Add test for default truncation and default no-comment behavior

* fix: address PR #367 review (dead code, label kwargs, Z-suffix, issues subdir)

- Drop unreachable setup_argument_parser/main from github_scraper.py
- Pass --issue-labels as plain strings to PyGithub (drops extra get_label call)
- Normalize trailing 'Z' in --since for Python 3.10 fromisoformat compatibility
- Per-issue files moved to references/issues/{owner}-{repo}-{n}.md to avoid
  collisions when multiple repos share a skill_dir
- Document data.json body truncation when --per-issue-files is set
- Help text: note --max-comments cost and --since 'Z' suffix support
- Tests: Z-suffix parsing, label-as-strings, per-issue subdir + collision,
  malformed YAML frontmatter resilience in pinecone adaptor
- Re-sync uv.lock against origin/development
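
The trailing-'Z' normalization mentioned above matters because `datetime.fromisoformat` on Python 3.10 rejects the `Z` suffix. A minimal sketch of the workaround, with `parse_since` as a hypothetical helper name rather than the scraper's actual function:

```python
from datetime import datetime, timezone


def parse_since(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' as UTC."""
    if value.endswith("Z"):
        # Rewrite 'Z' as an explicit UTC offset that fromisoformat accepts.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


# A 'Z'-suffixed timestamp parses to a timezone-aware UTC datetime.
parsed = parse_since("2026-01-15T12:00:00Z")
print(parsed.tzinfo == timezone.utc)
```

Inputs without a suffix, such as a bare date, pass through unchanged and yield a naive datetime.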

---------

Co-authored-by: Joseph Petty <greenflux@Josephs-MacBook-Pro.local>

* fix(unified): emit C3.x output for local sources (#363) (#372)

The unified skill builder previously dropped all C3.x analysis (test_examples,
patterns, how_to_guides, config_patterns, architecture, ...) produced by
local sources. Reference generation and the SKILL.md summary both consumed
GitHub sources only, so a unified config with extract_tests=true would
extract examples to cache but never surface them.

Refactor _generate_c3_analysis_references to delegate to a shared
_write_codebase_analysis_references helper, then wire a parallel
_generate_local_codebase_analysis_references loop that walks
scraped_data["local"] and emits the same reference layout per source
(filesystem-safe IDs via _sanitize_source_id).

Rewrite _format_c3_summary_section to take a list of payloads and
aggregate counts across sources; collect them via a new _collect_c3_payloads
helper that pulls from both GitHub and local. Single-dict input still works
for backward compatibility.

Adds 7 regression tests in tests/test_unified.py covering the headline
issue, SKILL.md aggregation, multi-source collection, ID sanitization,
and the no-C3-data-skipped case.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: insert markdown image refs for extracted_images in PDF output (#369)

* fix: insert markdown image refs for extracted_images in PDF output (fixes #338)

Images extracted by pdf_extractor_poc were saved to assets/images/ but
never referenced in the generated markdown files. The _generate_reference_file
method checked for page["images"] (legacy format) but the extractor stores
images as page["extracted_images"] with filename/path keys, not raw data.

Added handling for the extracted_images format: writes
![filename](../assets/images/filename) references for each extracted image.
The legacy images format (with raw data) is preserved for backward compat.

Also adds test coverage for the extracted_images reference generation.

* fix(pdf): restore TestErrorHandling class and address review feedback

- Restore `class TestErrorHandling(unittest.TestCase):` declaration that
  was accidentally dropped during the prior patch — its 3 tests were
  silently inherited by TestImageHandling, breaking class-based filtering.
- Drop the dead dummy-image setup in test_extracted_images_references_in_markdown
  (the markdown writer never reads the file).
- Use friendlier alt text: `Image from page {page_number}` instead of the
  raw filename.
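
As a rough sketch of the reference generation described in these two commits: the function below builds one markdown image line per entry in `page["extracted_images"]`. The name `image_refs` and the `page_number` key are assumptions for illustration; only the `extracted_images` list, its `filename` key, the `../assets/images/` prefix, and the alt-text wording come from the messages above.

```python
from typing import Any, Dict, List


def image_refs(page: Dict[str, Any]) -> List[str]:
    """Build markdown image references for a page's extracted images."""
    page_number = page.get("page_number", "?")  # assumed key name
    refs = []
    for image in page.get("extracted_images", []):
        filename = image.get("filename")
        if not filename:
            continue  # skip malformed entries rather than emit a broken link
        refs.append(f"![Image from page {page_number}](../assets/images/{filename})")
    return refs
```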

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: yusyus <yusufkaraaslan.yk@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: detect HTML URLs before treating as local files (#373)

* fix: detect HTML URLs before treating as local files

Fixes the create command's auto-detect source feature to properly handle
URLs ending in .html extension (e.g., https://api.flutter.dev/flutter/rendering/RenderObject-class.html).

## Problem
- URLs with .html extension were incorrectly detected as local HTML files
- The extension check happened before URL detection in the detection order
- This caused web-based HTML documentation to fail processing

## Solution
- Modified _detect_html() to check if source is a URL first
- If source starts with http:// or https://, route to web scraper
- Otherwise treat as local file and route to html_scraper
- This enables internet fetch with fallback to local file

## Impact
- Users can now pass URLs like https://api.flutter.dev/flutter/rendering/RenderObject-class.html
- The create command will try to fetch from the internet first
- Falls back to local file if internet fetch fails
- Backward compatible: local .html files still work as before

* fix(source_detector): clean lint, sharpen docstring, add HTML-URL regression tests

- Strip W293 trailing whitespace that broke ruff in CI.
- Replace misleading "tries to fetch... falls back" wording — the dispatch
  is a prefix check, not a fetch-with-fallback.
- Add two regression tests covering both http:// and https:// URLs that
  end in .html (e.g. Flutter API docs), so the original bug — local-file
  routing for web URLs — cannot silently return.
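
The dispatch described here is a plain prefix check. A minimal sketch, assuming a hypothetical `route_html_source` helper and made-up route labels (`"web"` and `"html"`) in place of whatever `_detect_html()` actually returns:

```python
def route_html_source(source: str) -> str:
    """Pick a scraper for a source that ends in .html."""
    # URLs go to the web scraper even though they end in .html.
    if source.startswith(("http://", "https://")):
        return "web"
    # Anything else is treated as a local file path.
    return "html"
```

No network request happens at this stage; the decision depends only on the string prefix.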

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: yusyus <yusufkaraaslan.yk@pm.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unified): drop guides truthy placeholder so fallback can fire (#364) (#375)

`_load_guide_collection` returned `{"guides": []}` when the tutorials
directory was missing or empty. That dict is truthy, which silently
short-circuits the `primary or fallback` chain in `_scrape_local()` and
`_run_c3_analysis()`:

    "how_to_guides": self._load_guide_collection(refs / "tutorials")
    or self._load_guide_collection(temp_output / "tutorials"),

When the post-`_generate_references` location (`refs/tutorials/`) is
missing — for example because the move was skipped or the cache holds
pre-move state — the truthy placeholder wins and the real
`guide_collection.json` sitting at `temp_output/tutorials/` is never
loaded. The unified skill builder then writes an empty
`references/codebase_analysis/{repo}/guides/guide_collection.json` and a
minimal `index.md`, even though the cache has guides ready to render.

Return `{}` (matching `_load_api_reference`'s falsy-on-miss contract) so
the `or` chain falls through correctly. Add four regression tests
covering the missing-dir, empty-dir, present, and fallback-wins cases.
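
The short-circuit is easy to reproduce in isolation. The sketch below uses a hypothetical `load_guide_collection` stand-in (not the real `_load_guide_collection`) that follows the falsy-on-miss contract, so the `or` chain can reach the fallback location:

```python
import json
from pathlib import Path


def load_guide_collection(tutorials_dir: Path) -> dict:
    """Load guide_collection.json, returning {} (falsy) when it is absent."""
    collection = tutorials_dir / "guide_collection.json"
    if not collection.is_file():
        return {}  # falsy, so `primary or fallback` falls through
    return json.loads(collection.read_text(encoding="utf-8"))


# The old placeholder was truthy, which is why the fallback never ran.
print(bool({"guides": []}))  # True: a non-empty dict, even with an empty list inside
print(bool({}))              # False
```

A dict is truthy whenever it has at least one key, regardless of what the values contain, which is what made `{"guides": []}` win the `or`.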

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(unified): generate codebase_analysis index, fix broken SKILL.md links (#362) (#376)

ARCHITECTURE.md is always written at
`references/codebase_analysis/{source_id}/ARCHITECTURE.md`, but four
SKILL.md call sites historically linked to
`references/codebase_analysis/ARCHITECTURE.md` (no source_id). That
target never existed once outputs became per-source-namespaced, so a
reader following the link from SKILL.md hit a 404. The user-visible
result for #362: detected patterns *are* in the references tree (after
PR #372 wired them up) but SKILL.md's "see ARCHITECTURE.md" pointer led
nowhere, making the analysis appear missing.

Generate `references/codebase_analysis/index.md` after all per-source
references are written, listing each source's ARCHITECTURE.md and any
populated subsection (patterns, examples, guides, configuration). Route
the four SKILL.md links through this stable target so the path resolves
whether the build has one source or many.

The index is omitted when no codebase analysis ran, so skills built from
docs/PDF/etc. only do not get a stray empty index.

Tests:
- TestCodebaseAnalysisIndex (3 cases): index lists each local source,
  SKILL.md link resolves on disk, no index when no C3.x data.
- Updated test_skill_md_includes_c3_summary to assert the new link.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v3.6.0 — IBM Bob target, GitHub issue filtering, codebase analysis fixes

Adds IBM Bob packaging target (#366), GitHub issue filtering with per-issue
files and Pinecone frontmatter (#367), and seven fixes across the unified
scraper (codebase_analysis index + guides fallback + C3.x for local sources +
language filter for clones), source detector (HTML URL detection), PDF
scraper (extracted_images markdown refs), and engine detection (Unity vs
Unreal via C# imports).

Verified: 3066 passed, 126 skipped, 0 failed (~14 min, CI-aligned exclusions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(docker): use Compose v2 plugin (`docker compose`) in PR test job (#378)

GitHub-hosted ubuntu-latest runners no longer ship the standalone
`docker-compose` v1 binary. The Test Docker Compose step has been
failing on every PR with `command not found` (exit 127). Switching
to the Docker CLI plugin form (`docker compose`) restores the check.

Only the PR-only `test-images` job was affected; the actual build/push
job uses `docker/build-push-action` and is unaffected.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Octopus <liyuan851277048@icloud.com>
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Rafflesia Khan <11699686+RafflesiaKhan@users.noreply.github.com>
Co-authored-by: GreenFlux <support@greenflux.us>
Co-authored-by: Joseph Petty <greenflux@Josephs-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Bartek Stoliński <53336850+bstolinski@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Extracted test examples not included in unified skill output
