fix(unified): emit C3.x output for local sources (#363) #372
Merged
yusufkaraaslan merged 1 commit on Apr 29, 2026
The unified skill builder previously dropped all C3.x analysis (`test_examples`, `patterns`, `how_to_guides`, `config_patterns`, `architecture`, ...) produced by local sources. Reference generation and the SKILL.md summary both consumed GitHub sources only, so a unified config with `extract_tests=true` would extract examples to cache but never surface them.
Refactor `_generate_c3_analysis_references` to delegate to a shared `_write_codebase_analysis_references` helper, then wire a parallel `_generate_local_codebase_analysis_references` loop that walks `scraped_data["local"]` and emits the same reference layout per source (filesystem-safe IDs via `_sanitize_source_id`).
Rewrite `_format_c3_summary_section` to take a list of payloads and aggregate counts across sources; collect them via a new `_collect_c3_payloads` helper that pulls from both GitHub and local. Single-dict input still works for backward compatibility.
Adds 7 regression tests in `tests/test_unified.py` covering the headline issue, SKILL.md aggregation, multi-source collection, ID sanitization, and the no-C3-data-skipped case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
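A rough sketch of that delegation, for orientation only: `write_analysis_references`, `write_local_references`, and `sanitize_source_id` are hypothetical stand-ins for the private helpers named in the commit message, the sanitizing rule (anything outside `[A-Za-z0-9_-]` becomes `_`) is an assumption, and local entries are assumed to carry a `name` plus C3.x fields at the top level. Only the output layout (`references/codebase_analysis/<id>/examples/test_examples.json` next to an `ARCHITECTURE.md` that reports the example count) is taken from the PR description.

```python
import json
import re
from pathlib import Path

# Hypothetical list of C3.x fields; the real builder may check more.
C3_FIELDS = ("test_examples", "patterns", "how_to_guides", "config_patterns", "architecture")


def sanitize_source_id(raw: str) -> str:
    """Make a source ID filesystem-safe (assumed rule: chars outside [A-Za-z0-9_-] -> '_')."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", raw or "")
    return cleaned or "local"  # empty IDs fall back to "local"


def write_analysis_references(c3_data: dict, source_id: str, refs_dir: Path) -> Path:
    """Shared writer (hypothetical): same per-source layout for GitHub and local sources."""
    out_dir = refs_dir / "codebase_analysis" / source_id
    out_dir.mkdir(parents=True, exist_ok=True)
    examples = c3_data.get("test_examples") or []
    if examples:
        (out_dir / "examples").mkdir(exist_ok=True)
        (out_dir / "examples" / "test_examples.json").write_text(json.dumps(examples, indent=2))
    (out_dir / "ARCHITECTURE.md").write_text(
        f"# Architecture: {source_id}\n\nTest examples: {len(examples)}\n"
    )
    return out_dir


def write_local_references(scraped_data: dict, refs_dir: Path) -> list[Path]:
    """Walk scraped_data['local'] and skip sources that carry no C3.x fields."""
    written = []
    for source in scraped_data.get("local", []):
        c3_data = {key: source[key] for key in C3_FIELDS if source.get(key)}
        if not c3_data:
            continue  # no C3 data -> no empty reference directory
        source_id = sanitize_source_id(source.get("name", ""))
        written.append(write_analysis_references(c3_data, source_id, refs_dir))
    return written
```

Routing both source types through one writer is what keeps their layouts identical; in this sketch a GitHub wrapper would differ only in how it looks up `c3_data` before calling `write_analysis_references`.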
Codecov Report ❌ Patch coverage is
yusufkaraaslan added a commit that referenced this pull request on May 2, 2026
…inks (#362) (#376)
ARCHITECTURE.md is always written at `references/codebase_analysis/{source_id}/ARCHITECTURE.md`, but four SKILL.md call sites historically linked to `references/codebase_analysis/ARCHITECTURE.md` (no source_id). That target never existed once outputs became per-source-namespaced, so a reader following the link from SKILL.md hit a 404. The user-visible result for #362: detected patterns *are* in the references tree (after PR #372 wired them up) but SKILL.md's "see ARCHITECTURE.md" pointer led nowhere, making the analysis appear missing.
Generate `references/codebase_analysis/index.md` after all per-source references are written, listing each source's ARCHITECTURE.md and any populated subsection (patterns, examples, guides, configuration). Route the four SKILL.md links through this stable target so the path resolves whether the build has one source or many. The index is omitted when no codebase analysis ran, so skills built from docs/PDF/etc. only do not get a stray empty index.
Tests:
- TestCodebaseAnalysisIndex (3 cases): index lists each local source, SKILL.md link resolves on disk, no index when no C3.x data.
- Updated test_skill_md_includes_c3_summary to assert the new link.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merged
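A minimal sketch of the index step described in the #376 message, assuming the per-source directories already exist under `references/codebase_analysis/`. `write_codebase_analysis_index` is a hypothetical name, and of the subsection folder names only `examples` and `guides` appear elsewhere on this page; `patterns` and `configuration` are guesses.

```python
from pathlib import Path
from typing import Optional

# Assumed subsection folder names; only "examples" and "guides" appear in the PR text.
SUBSECTIONS = ("patterns", "examples", "guides", "configuration")


def write_codebase_analysis_index(analysis_dir: Path) -> Optional[Path]:
    """Write analysis_dir/index.md linking every source's ARCHITECTURE.md.

    Returns None (and writes nothing) when no per-source analysis exists,
    so docs-only or PDF-only skills do not get an empty index.
    """
    if not analysis_dir.is_dir():
        return None
    sources = sorted(
        p for p in analysis_dir.iterdir() if (p / "ARCHITECTURE.md").is_file()
    )
    if not sources:
        return None

    lines = ["# Codebase Analysis", ""]
    for src in sources:
        lines.append(f"## {src.name}")
        lines.append(f"- [ARCHITECTURE.md]({src.name}/ARCHITECTURE.md)")
        for sub in SUBSECTIONS:
            sub_dir = src / sub
            # List a subsection only if its folder exists and contains something.
            if sub_dir.is_dir() and any(sub_dir.iterdir()):
                lines.append(f"- [{sub}]({src.name}/{sub}/)")
        lines.append("")

    index_path = analysis_dir / "index.md"
    index_path.write_text("\n".join(lines))
    return index_path
```

Pointing SKILL.md at the index rather than at one source's ARCHITECTURE.md gives a single link target that exists whether the build has one source or many.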
yusufkaraaslan added a commit that referenced this pull request on May 3, 2026
* fix: detect Unity via C# imports to prevent misidentification as Unreal (fixes #365) (#368) Unity C# projects were incorrectly detected as Unreal when the analyzed source files contained paths with 'Source/' or 'Content/' subdirectories, which are also valid Unreal engine markers. Root causes: 1. Game engine detection did not check import_content, so 'using UnityEngine;' statements were ignored entirely. 2. Unity markers lacked import-based signals ('UnityEngine') and the unique Unity Package Manager file ('Packages/manifest.json'). Fix: - Add 'UnityEngine' and 'Packages/manifest.json' to Unity FRAMEWORK_MARKERS. - Extend the game engine detection loop to also check import_content, using the same high-confidence threshold (>= 1 import match) already applied to other frameworks like Django and Spring. - Path/directory-based detection still requires 2+ matches to avoid false positives from generic directory names. Tests: add test_architectural_pattern_detector.py covering: - Unity detected via UnityEngine imports alone - Unity not misidentified as Unreal when a Source/ subfolder exists - Unreal projects still detected correctly - Unity detected via Packages/manifest.json in file paths Co-authored-by: octo-patch <octo-patch@github.com> * fix: pass language filter to C3.x clone analysis (fixes #361) (#370) The _run_c3_analysis method was always passing languages=None to analyze_codebase, ignoring the language filter configured in the GitHub source config. This caused the C3.x codebase analysis on cloned repos to either find no source files (when the repo only has files of the filtered language) or analyze the wrong language set entirely. Now passes source.get("languages") so the language filter is respected consistently with the local source analysis. 
Co-authored-by: octo-patch <octo-patch@github.com> * Add IBM Bob packaging target and agent install support (#366) * Add IBM Bob packaging target and agent install support * Update README.md * Fix IBM Bob adaptor compatibility for Python 3.10/3.11 * formattiing fixed * feat: GitHub issue filtering, per-issue files, and Pinecone frontmatter (#367) * feat: GitHub issue filtering, per-issue files, and Pinecone frontmatter Add issue filtering (--issue-labels, --issue-state, --since, --max-comments), per-issue markdown files with YAML frontmatter, Pinecone adaptor frontmatter parsing into vector metadata, and full body preservation (was truncated to 500 chars). Includes 598 lines of new tests. * fix: preserve previous defaults for issue scraping - Per-issue files are now opt-in via --per-issue-files (was always-on) - Comment fetching disabled by default (--max-comments 0, was 50) - Issue body truncated to 500 chars by default (full body only with --per-issue-files) - Add test for default truncation and default no-comment behavior * fix: address PR #367 review (dead code, label kwargs, Z-suffix, issues subdir) - Drop unreachable setup_argument_parser/main from github_scraper.py - Pass --issue-labels as plain strings to PyGithub (drops extra get_label call) - Normalize trailing 'Z' in --since for Python 3.10 fromisoformat compatibility - Per-issue files moved to references/issues/{owner}-{repo}-{n}.md to avoid collisions when multiple repos share a skill_dir - Document data.json body truncation when --per-issue-files is set - Help text: note --max-comments cost and --since 'Z' suffix support - Tests: Z-suffix parsing, label-as-strings, per-issue subdir + collision, malformed YAML frontmatter resilience in pinecone adaptor - Re-sync uv.lock against origin/development --------- Co-authored-by: Joseph Petty <greenflux@Josephs-MacBook-Pro.local> * fix(unified): emit C3.x output for local sources (#363) (#372) The unified skill builder previously dropped all C3.x analysis 
(test_examples, patterns, how_to_guides, config_patterns, architecture, ...) produced by local sources. Reference generation and the SKILL.md summary both consumed GitHub sources only, so a unified config with extract_tests=true would extract examples to cache but never surface them. Refactor _generate_c3_analysis_references to delegate to a shared _write_codebase_analysis_references helper, then wire a parallel _generate_local_codebase_analysis_references loop that walks scraped_data["local"] and emits the same reference layout per source (filesystem-safe IDs via _sanitize_source_id). Rewrite _format_c3_summary_section to take a list of payloads and aggregate counts across sources; collect them via a new _collect_c3_payloads helper that pulls from both GitHub and local. Single-dict input still works for backward compatibility. Adds 7 regression tests in tests/test_unified.py covering the headline issue, SKILL.md aggregation, multi-source collection, ID sanitization, and the no-C3-data-skipped case. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: insert markdown image refs for extracted_images in PDF output (#369) * fix: insert markdown image refs for extracted_images in PDF output (fixes #338) Images extracted by pdf_extractor_poc were saved to assets/images/ but never referenced in the generated markdown files. The _generate_reference_file method checked for page["images"] (legacy format) but the extractor stores images as page["extracted_images"] with filename/path keys, not raw data. Added handling for the extracted_images format: writes  references for each extracted image. The legacy images format (with raw data) is preserved for backward compat. Also adds test coverage for the extracted_images reference generation. 
* fix(pdf): restore TestErrorHandling class and address review feedback - Restore `class TestErrorHandling(unittest.TestCase):` declaration that was accidentally dropped during the prior patch — its 3 tests were silently inherited by TestImageHandling, breaking class-based filtering. - Drop the dead dummy-image setup in test_extracted_images_references_in_markdown (the markdown writer never reads the file). - Use friendlier alt text: `Image from page {page_number}` instead of the raw filename. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: yusyus <yusufkaraaslan.yk@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: detect HTML URLs before treating as local files (#373) * fix: detect HTML URLs before treating as local files Fixes the create command's auto-detect source feature to properly handle URLs ending in .html extension (e.g., https://api.flutter.dev/flutter/rendering/RenderObject-class.html). ## Problem - URLs with .html extension were incorrectly detected as local HTML files - The extension check happened before URL detection in the detection order - This caused web-based HTML documentation to fail processing ## Solution - Modified _detect_html() to check if source is a URL first - If source starts with http:// or https://, route to web scraper - Otherwise treat as local file and route to html_scraper - This enables internet fetch with fallback to local file ## Impact - Users can now pass URLs like https://api.flutter.dev/flutter/rendering/RenderObject-class.html - The create command will try to fetch from the internet first - Falls back to local file if internet fetch fails - Backward compatible: local .html files still work as before * fix(source_detector): clean lint, sharpen docstring, add HTML-URL regression tests - Strip W293 trailing whitespace that broke ruff in CI. - Replace misleading "tries to fetch... 
falls back" wording — the dispatch is a prefix check, not a fetch-with-fallback. - Add two regression tests covering both http:// and https:// URLs that end in .html (e.g. Flutter API docs), so the original bug — local-file routing for web URLs — cannot silently return. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: yusyus <yusufkaraaslan.yk@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(unified): drop guides truthy placeholder so fallback can fire (#364) (#375) `_load_guide_collection` returned `{"guides": []}` when the tutorials directory was missing or empty. That dict is truthy, which silently short-circuits the `primary or fallback` chain in `_scrape_local()` and `_run_c3_analysis()`: "how_to_guides": self._load_guide_collection(refs / "tutorials") or self._load_guide_collection(temp_output / "tutorials"), When the post-`_generate_references` location (`refs/tutorials/`) is missing — for example because the move was skipped or the cache holds pre-move state — the truthy placeholder wins and the real `guide_collection.json` sitting at `temp_output/tutorials/` is never loaded. The unified skill builder then writes an empty `references/codebase_analysis/{repo}/guides/guide_collection.json` and a minimal `index.md`, even though the cache has guides ready to render. Return `{}` (matching `_load_api_reference`'s falsy-on-miss contract) so the `or` chain falls through correctly. Add four regression tests covering the missing-dir, empty-dir, present, and fallback-wins cases. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(unified): generate codebase_analysis index, fix broken SKILL.md links (#362) (#376) ARCHITECTURE.md is always written at `references/codebase_analysis/{source_id}/ARCHITECTURE.md`, but four SKILL.md call sites historically linked to `references/codebase_analysis/ARCHITECTURE.md` (no source_id). 
That target never existed once outputs became per-source-namespaced, so a reader following the link from SKILL.md hit a 404. The user-visible result for #362: detected patterns *are* in the references tree (after PR #372 wired them up) but SKILL.md's "see ARCHITECTURE.md" pointer led nowhere, making the analysis appear missing. Generate `references/codebase_analysis/index.md` after all per-source references are written, listing each source's ARCHITECTURE.md and any populated subsection (patterns, examples, guides, configuration). Route the four SKILL.md links through this stable target so the path resolves whether the build has one source or many. The index is omitted when no codebase analysis ran, so skills built from docs/PDF/etc. only do not get a stray empty index. Tests: - TestCodebaseAnalysisIndex (3 cases): index lists each local source, SKILL.md link resolves on disk, no index when no C3.x data. - Updated test_skill_md_includes_c3_summary to assert the new link. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * release: v3.6.0 — IBM Bob target, GitHub issue filtering, codebase analysis fixes Adds IBM Bob packaging target (#366), GitHub issue filtering with per-issue files and Pinecone frontmatter (#367), and seven fixes across the unified scraper (codebase_analysis index + guides fallback + C3.x for local sources + language filter for clones), source detector (HTML URL detection), PDF scraper (extracted_images markdown refs), and engine detection (Unity vs Unreal via C# imports). Verified: 3066 passed, 126 skipped, 0 failed (~14 min, CI-aligned exclusions). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Octopus <liyuan851277048@icloud.com> Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: Rafflesia Khan <11699686+RafflesiaKhan@users.noreply.github.com> Co-authored-by: GreenFlux <support@greenflux.us> Co-authored-by: Joseph Petty <greenflux@Josephs-MacBook-Pro.local> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Bartek Stoliński <53336850+bstolinski@users.noreply.github.com>
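The short-circuit described in the #375 message is easy to reproduce in isolation. In this sketch `load_guides_old` and `load_guides_new` are hypothetical stand-ins for `_load_guide_collection` before and after the fix; only their miss-case return values (`{"guides": []}` versus `{}`) and the `guide_collection.json` filename come from the commit message.

```python
import json
from pathlib import Path


def load_guides_old(tutorials_dir: Path) -> dict:
    """Pre-fix behavior: a truthy placeholder when nothing is found."""
    path = tutorials_dir / "guide_collection.json"
    if not path.is_file():
        return {"guides": []}  # non-empty dict is truthy, so `or` never reaches the fallback
    return json.loads(path.read_text())


def load_guides_new(tutorials_dir: Path) -> dict:
    """Post-fix behavior: falsy on a miss, matching _load_api_reference's contract."""
    path = tutorials_dir / "guide_collection.json"
    if not path.is_file():
        return {}  # empty dict is falsy, so `primary or fallback` falls through
    return json.loads(path.read_text())
```

With `refs/tutorials/` missing, `load_guides_old(refs) or load_guides_old(temp_output)` returns the empty placeholder, while the equivalent `load_guides_new` chain falls through and loads the real collection from `temp_output/tutorials/`.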
yusufkaraaslan added a commit that referenced this pull request on May 3, 2026
* fix: detect Unity via C# imports to prevent misidentification as Unreal (fixes #365) (#368) Unity C# projects were incorrectly detected as Unreal when the analyzed source files contained paths with 'Source/' or 'Content/' subdirectories, which are also valid Unreal engine markers. Root causes: 1. Game engine detection did not check import_content, so 'using UnityEngine;' statements were ignored entirely. 2. Unity markers lacked import-based signals ('UnityEngine') and the unique Unity Package Manager file ('Packages/manifest.json'). Fix: - Add 'UnityEngine' and 'Packages/manifest.json' to Unity FRAMEWORK_MARKERS. - Extend the game engine detection loop to also check import_content, using the same high-confidence threshold (>= 1 import match) already applied to other frameworks like Django and Spring. - Path/directory-based detection still requires 2+ matches to avoid false positives from generic directory names. Tests: add test_architectural_pattern_detector.py covering: - Unity detected via UnityEngine imports alone - Unity not misidentified as Unreal when a Source/ subfolder exists - Unreal projects still detected correctly - Unity detected via Packages/manifest.json in file paths Co-authored-by: octo-patch <octo-patch@github.com> * fix: pass language filter to C3.x clone analysis (fixes #361) (#370) The _run_c3_analysis method was always passing languages=None to analyze_codebase, ignoring the language filter configured in the GitHub source config. This caused the C3.x codebase analysis on cloned repos to either find no source files (when the repo only has files of the filtered language) or analyze the wrong language set entirely. Now passes source.get("languages") so the language filter is respected consistently with the local source analysis. 
Co-authored-by: octo-patch <octo-patch@github.com> * Add IBM Bob packaging target and agent install support (#366) * Add IBM Bob packaging target and agent install support * Update README.md * Fix IBM Bob adaptor compatibility for Python 3.10/3.11 * formattiing fixed * feat: GitHub issue filtering, per-issue files, and Pinecone frontmatter (#367) * feat: GitHub issue filtering, per-issue files, and Pinecone frontmatter Add issue filtering (--issue-labels, --issue-state, --since, --max-comments), per-issue markdown files with YAML frontmatter, Pinecone adaptor frontmatter parsing into vector metadata, and full body preservation (was truncated to 500 chars). Includes 598 lines of new tests. * fix: preserve previous defaults for issue scraping - Per-issue files are now opt-in via --per-issue-files (was always-on) - Comment fetching disabled by default (--max-comments 0, was 50) - Issue body truncated to 500 chars by default (full body only with --per-issue-files) - Add test for default truncation and default no-comment behavior * fix: address PR #367 review (dead code, label kwargs, Z-suffix, issues subdir) - Drop unreachable setup_argument_parser/main from github_scraper.py - Pass --issue-labels as plain strings to PyGithub (drops extra get_label call) - Normalize trailing 'Z' in --since for Python 3.10 fromisoformat compatibility - Per-issue files moved to references/issues/{owner}-{repo}-{n}.md to avoid collisions when multiple repos share a skill_dir - Document data.json body truncation when --per-issue-files is set - Help text: note --max-comments cost and --since 'Z' suffix support - Tests: Z-suffix parsing, label-as-strings, per-issue subdir + collision, malformed YAML frontmatter resilience in pinecone adaptor - Re-sync uv.lock against origin/development --------- Co-authored-by: Joseph Petty <greenflux@Josephs-MacBook-Pro.local> * fix(unified): emit C3.x output for local sources (#363) (#372) The unified skill builder previously dropped all C3.x analysis 
(test_examples, patterns, how_to_guides, config_patterns, architecture, ...) produced by local sources. Reference generation and the SKILL.md summary both consumed GitHub sources only, so a unified config with extract_tests=true would extract examples to cache but never surface them. Refactor _generate_c3_analysis_references to delegate to a shared _write_codebase_analysis_references helper, then wire a parallel _generate_local_codebase_analysis_references loop that walks scraped_data["local"] and emits the same reference layout per source (filesystem-safe IDs via _sanitize_source_id). Rewrite _format_c3_summary_section to take a list of payloads and aggregate counts across sources; collect them via a new _collect_c3_payloads helper that pulls from both GitHub and local. Single-dict input still works for backward compatibility. Adds 7 regression tests in tests/test_unified.py covering the headline issue, SKILL.md aggregation, multi-source collection, ID sanitization, and the no-C3-data-skipped case. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: insert markdown image refs for extracted_images in PDF output (#369) * fix: insert markdown image refs for extracted_images in PDF output (fixes #338) Images extracted by pdf_extractor_poc were saved to assets/images/ but never referenced in the generated markdown files. The _generate_reference_file method checked for page["images"] (legacy format) but the extractor stores images as page["extracted_images"] with filename/path keys, not raw data. Added handling for the extracted_images format: writes  references for each extracted image. The legacy images format (with raw data) is preserved for backward compat. Also adds test coverage for the extracted_images reference generation. 
* fix(pdf): restore TestErrorHandling class and address review feedback - Restore `class TestErrorHandling(unittest.TestCase):` declaration that was accidentally dropped during the prior patch — its 3 tests were silently inherited by TestImageHandling, breaking class-based filtering. - Drop the dead dummy-image setup in test_extracted_images_references_in_markdown (the markdown writer never reads the file). - Use friendlier alt text: `Image from page {page_number}` instead of the raw filename. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: yusyus <yusufkaraaslan.yk@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: detect HTML URLs before treating as local files (#373) * fix: detect HTML URLs before treating as local files Fixes the create command's auto-detect source feature to properly handle URLs ending in .html extension (e.g., https://api.flutter.dev/flutter/rendering/RenderObject-class.html). ## Problem - URLs with .html extension were incorrectly detected as local HTML files - The extension check happened before URL detection in the detection order - This caused web-based HTML documentation to fail processing ## Solution - Modified _detect_html() to check if source is a URL first - If source starts with http:// or https://, route to web scraper - Otherwise treat as local file and route to html_scraper - This enables internet fetch with fallback to local file ## Impact - Users can now pass URLs like https://api.flutter.dev/flutter/rendering/RenderObject-class.html - The create command will try to fetch from the internet first - Falls back to local file if internet fetch fails - Backward compatible: local .html files still work as before * fix(source_detector): clean lint, sharpen docstring, add HTML-URL regression tests - Strip W293 trailing whitespace that broke ruff in CI. - Replace misleading "tries to fetch... 
falls back" wording — the dispatch is a prefix check, not a fetch-with-fallback. - Add two regression tests covering both http:// and https:// URLs that end in .html (e.g. Flutter API docs), so the original bug — local-file routing for web URLs — cannot silently return. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: yusyus <yusufkaraaslan.yk@pm.me> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(unified): drop guides truthy placeholder so fallback can fire (#364) (#375) `_load_guide_collection` returned `{"guides": []}` when the tutorials directory was missing or empty. That dict is truthy, which silently short-circuits the `primary or fallback` chain in `_scrape_local()` and `_run_c3_analysis()`: "how_to_guides": self._load_guide_collection(refs / "tutorials") or self._load_guide_collection(temp_output / "tutorials"), When the post-`_generate_references` location (`refs/tutorials/`) is missing — for example because the move was skipped or the cache holds pre-move state — the truthy placeholder wins and the real `guide_collection.json` sitting at `temp_output/tutorials/` is never loaded. The unified skill builder then writes an empty `references/codebase_analysis/{repo}/guides/guide_collection.json` and a minimal `index.md`, even though the cache has guides ready to render. Return `{}` (matching `_load_api_reference`'s falsy-on-miss contract) so the `or` chain falls through correctly. Add four regression tests covering the missing-dir, empty-dir, present, and fallback-wins cases. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(unified): generate codebase_analysis index, fix broken SKILL.md links (#362) (#376) ARCHITECTURE.md is always written at `references/codebase_analysis/{source_id}/ARCHITECTURE.md`, but four SKILL.md call sites historically linked to `references/codebase_analysis/ARCHITECTURE.md` (no source_id). 
That target never existed once outputs became per-source-namespaced, so a reader following the link from SKILL.md hit a 404. The user-visible result for #362: detected patterns *are* in the references tree (after PR #372 wired them up) but SKILL.md's "see ARCHITECTURE.md" pointer led nowhere, making the analysis appear missing. Generate `references/codebase_analysis/index.md` after all per-source references are written, listing each source's ARCHITECTURE.md and any populated subsection (patterns, examples, guides, configuration). Route the four SKILL.md links through this stable target so the path resolves whether the build has one source or many. The index is omitted when no codebase analysis ran, so skills built from docs/PDF/etc. only do not get a stray empty index. Tests: - TestCodebaseAnalysisIndex (3 cases): index lists each local source, SKILL.md link resolves on disk, no index when no C3.x data. - Updated test_skill_md_includes_c3_summary to assert the new link. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * release: v3.6.0 — IBM Bob target, GitHub issue filtering, codebase analysis fixes Adds IBM Bob packaging target (#366), GitHub issue filtering with per-issue files and Pinecone frontmatter (#367), and seven fixes across the unified scraper (codebase_analysis index + guides fallback + C3.x for local sources + language filter for clones), source detector (HTML URL detection), PDF scraper (extracted_images markdown refs), and engine detection (Unity vs Unreal via C# imports). Verified: 3066 passed, 126 skipped, 0 failed (~14 min, CI-aligned exclusions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(docker): use Compose v2 plugin (`docker compose`) in PR test job (#378) GitHub-hosted ubuntu-latest runners no longer ship the standalone `docker-compose` v1 binary. The Test Docker Compose step has been failing on every PR with `command not found` (exit 127). 
Switching to the Docker CLI plugin form (`docker compose`) restores the check. Only the PR-only `test-images` job was affected; the actual build/push job uses `docker/build-push-action` and is unaffected. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Octopus <liyuan851277048@icloud.com> Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: Rafflesia Khan <11699686+RafflesiaKhan@users.noreply.github.com> Co-authored-by: GreenFlux <support@greenflux.us> Co-authored-by: Joseph Petty <greenflux@Josephs-MacBook-Pro.local> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Bartek Stoliński <53336850+bstolinski@users.noreply.github.com>
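The #373 follow-up commit clarifies that the HTML dispatch is a prefix check, not a fetch-with-fallback. That check fits in a few lines; `route_html_source` and its `"web"`/`"html"` return labels are hypothetical, not the real `_detect_html()` interface.

```python
def route_html_source(source: str) -> str:
    """Pick a scraper for an .html source by URL prefix (hypothetical helper)."""
    # Check the scheme first so remote pages ending in .html are not treated as files.
    if source.startswith(("http://", "https://")):
        return "web"  # e.g. Flutter API docs such as .../RenderObject-class.html
    return "html"  # anything else is a local .html file for the HTML scraper
```

Ordering matters here: if the `.html` extension test ran first, as it did before the fix, every web URL with that suffix would be routed to the local-file path.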
Fixes #363
Problem
The unified skill builder dropped all C3.x analysis output produced by local sources.
`_run_local_codebase_analysis` correctly extracted `test_examples`, `patterns`, `how_to_guides`, `config_patterns`, `architecture`, etc., wrote them to cache (`local_analysis_{idx}_{name}/`), and stored them on `scraped_data["local"]`, but `unified_skill_builder.py` only consumed GitHub sources for C3.x reference generation and SKILL.md summarization.
Concretely, both code paths only looked at GitHub:
- `_generate_references()` at lines 1095-1101 iterated `scraped_data.get("github", [])` and called `_generate_c3_analysis_references` per repo.
- `_generate_skill_md()` at lines 884-896 read `scraped_data.get("github", {})` and passed only the first GitHub source's `c3_analysis` to `_format_c3_summary_section`.
So a unified config with a `local` source and `extract_tests=true` would happily run the test-example extractor (122 files / 292 examples in the issue), then quietly throw the result away.
The bug is broader than the issue title suggests: it is not just `test_examples`, it is the entire C3.x payload from local sources.
Solution
Refactor and extend in `unified_skill_builder.py`:
- `_generate_c3_analysis_references` now delegates to a new shared helper, `_write_codebase_analysis_references(c3_data, source_id, github_data=None)`, that does the actual work. The GitHub path is now a thin lookup wrapper.
- New `_generate_local_codebase_analysis_references` walks `scraped_data["local"]`, skips sources with no C3 fields, and calls the shared writer with a sanitized source ID. It is wired into `_generate_references` next to the GitHub loop.
- New `_sanitize_source_id` makes local IDs filesystem-safe (slashes, spaces, special chars → `_`; empty → `"local"`).
- New `_collect_c3_payloads` gathers C3 data from both GitHub and local sources into a flat list.
- `_format_c3_summary_section` now accepts either a single dict (legacy) or a list of payloads, aggregating counts (test examples, design patterns, how-to guides, config files, security alerts) across all sources.
Testing
7 new regression tests in `tests/test_unified.py`:
- `test_local_source_test_examples_become_references_363`: the headline issue. A local source with test_examples produces `references/codebase_analysis/<sanitized-id>/examples/test_examples.json` and an `ARCHITECTURE.md` mentioning the example count.
- `test_local_source_skill_md_includes_summary_363`: SKILL.md gets the C3 summary section when only the local source has data.
- `test_format_c3_summary_aggregates_across_sources_363`: counts sum across multiple payloads.
- `test_format_c3_summary_backward_compat_dict_arg_363`: single-dict input still works.
- `test_collect_c3_payloads_combines_github_and_local_363`: combined harvesting from both source types.
- `test_local_source_without_c3_data_skipped_363`: empty local sources don't create empty reference dirs.
- `test_sanitize_source_id_363`: ID normalization edge cases.
All 31 `tests/test_unified.py` tests pass; 93 unified-related tests across `test_unified.py`, `test_c3_integration.py`, `test_unified_scraper_orchestration.py`, and `test_unified_analyzer.py` all pass. Lint and format are clean.
🤖 Generated with Claude Code
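The behavior covered by `test_collect_c3_payloads_combines_github_and_local_363`, `test_format_c3_summary_aggregates_across_sources_363`, and `test_format_c3_summary_backward_compat_dict_arg_363` can be sketched as follows. This is an illustration under stated assumptions, not the builder's code: `collect_c3_payloads` and `summarize_c3` are hypothetical names; GitHub entries are assumed to hold their analysis under `c3_analysis` (the key the old `_generate_skill_md()` path read); local entries are assumed to hold C3.x fields at the top level; `how_to_guides` is treated as a `{"guides": [...]}` collection (the shape shown in the #375 commit) and every other field as a list. Security alerts are left out because this page does not say where they are stored.

```python
from typing import Any, Union

# Summary label -> payload key (assumed shapes, see lead-in).
COUNTED_FIELDS = {
    "Test examples": "test_examples",
    "Design patterns": "patterns",
    "How-to guides": "how_to_guides",
    "Config files": "config_patterns",
}
C3_KEYS = tuple(COUNTED_FIELDS.values())


def _count(value: Any) -> int:
    """Count items in a list, or in the 'guides' list of a guide collection."""
    if isinstance(value, dict):
        value = value.get("guides", [])
    return len(value) if isinstance(value, list) else 0


def collect_c3_payloads(scraped_data: dict) -> list[dict]:
    """Gather C3.x payloads from GitHub and local sources into one flat list."""
    payloads = []
    for repo in scraped_data.get("github", []):
        if repo.get("c3_analysis"):
            payloads.append(repo["c3_analysis"])
    for source in scraped_data.get("local", []):
        payload = {key: source[key] for key in C3_KEYS if source.get(key)}
        if payload:  # skip local sources without any C3.x data
            payloads.append(payload)
    return payloads


def summarize_c3(payloads: Union[dict, list[dict]]) -> dict[str, int]:
    """Sum counts across sources; a single dict is accepted for backward compatibility."""
    if isinstance(payloads, dict):
        payloads = [payloads]  # legacy single-payload call
    totals = {label: 0 for label in COUNTED_FIELDS}
    for payload in payloads:
        for label, key in COUNTED_FIELDS.items():
            totals[label] += _count(payload.get(key))
    return totals
```

Wrapping a lone dict into a one-element list up front is one simple way to keep the legacy call path working without a second branch in the summing loop.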