NeerajCodz committed on
Commit bf7914e · 1 Parent(s): 2892945

sync before github rebase

.agents/skills/frontend-design/LICENSE.txt ADDED
@@ -0,0 +1,177 @@
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS
.agents/skills/frontend-design/SKILL.md ADDED
@@ -0,0 +1,42 @@
---
name: frontend-design
description: Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.
license: Complete terms in LICENSE.txt
---

This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.

The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.

## Design Thinking

Before coding, understand the context and commit to a BOLD aesthetic direction:
- **Purpose**: What problem does this interface solve? Who uses it?
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are many flavors to choose from; use them for inspiration, but design one direction that is true to the chosen aesthetic.
- **Constraints**: Technical requirements (framework, performance, accessibility).
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?

**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work; the key is intentionality, not intensity.

Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
- Production-grade and functional
- Visually striking and memorable
- Cohesive with a clear aesthetic point of view
- Meticulously refined in every detail

## Frontend Aesthetics Guidelines

Focus on:
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for unexpected, characterful choices that elevate the frontend's aesthetics. Pair a distinctive display font with a refined body font.
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly distributed palettes.
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML; use the Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.

NEVER use generic AI-generated aesthetics: overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, or cookie-cutter design that lacks context-specific character.

Interpret creatively and make unexpected choices that feel genuinely designed for the context. No two designs should be the same: vary between light and dark themes, different fonts, and different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.

**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.

Remember: Claude is capable of extraordinary creative work. Don't hold back; show what can truly be created when thinking outside the box and committing fully to a distinctive vision.
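
The CSS-variable and staggered-reveal guidance above can be sketched as a small CSS fragment. This is an illustrative sketch, not part of the skill file: the class name `.reveal`, the custom-property names, and all timing values are assumptions.

```css
/* Sketch: CSS-only staggered page-load reveal driven by custom properties.
   Names and values below are illustrative, not prescribed by the skill. */
:root {
  --ink: #1a1a1a;
  --accent: #ff4d00;            /* one dominant accent, per the palette guidance */
  --reveal-duration: 600ms;
}

.reveal {
  opacity: 0;
  transform: translateY(12px);
  animation: rise var(--reveal-duration) ease-out forwards;
  /* Each sibling sets --i inline to stagger itself by 120ms steps. */
  animation-delay: calc(var(--i, 0) * 120ms);
}

@keyframes rise {
  to { opacity: 1; transform: none; }
}
```

In markup, siblings would then stagger themselves inline, e.g. `<h1 class="reveal" style="--i: 0">` followed by `<p class="reveal" style="--i: 1">`.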
.agents/skills/hf-cli/SKILL.md ADDED
@@ -0,0 +1,188 @@
---
name: hf-cli
description: "Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing repositories, models, datasets, and Spaces on the Hugging Face Hub. Replaces the now-deprecated `huggingface-cli` command."
---

Install: `curl -LsSf https://hf.co/cli/install.sh | bash -s`.

The Hugging Face Hub CLI tool `hf` is available. IMPORTANT: The `hf` command replaces the deprecated `huggingface-cli` command.

Use `hf --help` to view available functions. Note that auth commands are now all under `hf auth`, e.g. `hf auth whoami`.

Generated with `huggingface_hub v1.8.0`. Run `hf skills add --force` to regenerate.

## Commands

- `hf download REPO_ID` — Download files from the Hub. `[--type CHOICE --revision TEXT --include TEXT --exclude TEXT --cache-dir TEXT --local-dir TEXT --force-download --dry-run --quiet --max-workers INTEGER]`
- `hf env` — Print information about the environment.
- `hf sync` — Sync files between a local directory and a bucket. `[--delete --ignore-times --ignore-sizes --plan TEXT --apply TEXT --dry-run --include TEXT --exclude TEXT --filter-from TEXT --existing --ignore-existing --verbose --quiet]`
- `hf upload REPO_ID` — Upload a file or a folder to the Hub. Recommended for single-commit uploads. `[--type CHOICE --revision TEXT --private --include TEXT --exclude TEXT --delete TEXT --commit-message TEXT --commit-description TEXT --create-pr --every FLOAT --quiet]`
- `hf upload-large-folder REPO_ID LOCAL_PATH` — Upload a large folder to the Hub. Recommended for resumable uploads. `[--type CHOICE --revision TEXT --private --include TEXT --exclude TEXT --num-workers INTEGER --no-report --no-bars]`
- `hf version` — Print information about the hf version.

### `hf auth` — Manage authentication (login, logout, etc.).

- `hf auth list` — List all stored access tokens.
- `hf auth login` — Login using a token from huggingface.co/settings/tokens. `[--add-to-git-credential --force]`
- `hf auth logout` — Logout from a specific token. `[--token-name TEXT]`
- `hf auth switch` — Switch between access tokens. `[--token-name TEXT --add-to-git-credential]`
- `hf auth whoami` — Find out which huggingface.co account you are logged in as. `[--format CHOICE]`

### `hf buckets` — Commands to interact with buckets.

- `hf buckets cp SRC` — Copy a single file to or from a bucket. `[--quiet]`
- `hf buckets create BUCKET_ID` — Create a new bucket. `[--private --exist-ok --quiet]`
- `hf buckets delete BUCKET_ID` — Delete a bucket. `[--yes --missing-ok --quiet]`
- `hf buckets info BUCKET_ID` — Get info about a bucket. `[--quiet]`
- `hf buckets list` — List buckets or files in a bucket. `[--human-readable --tree --recursive --format CHOICE --quiet]`
- `hf buckets move FROM_ID TO_ID` — Move (rename) a bucket to a new name or namespace.
- `hf buckets remove ARGUMENT` — Remove files from a bucket. `[--recursive --yes --dry-run --include TEXT --exclude TEXT --quiet]`
- `hf buckets sync` — Sync files between a local directory and a bucket. `[--delete --ignore-times --ignore-sizes --plan TEXT --apply TEXT --dry-run --include TEXT --exclude TEXT --filter-from TEXT --existing --ignore-existing --verbose --quiet]`

### `hf cache` — Manage local cache directory.

- `hf cache list` — List cached repositories or revisions. `[--cache-dir TEXT --revisions --filter TEXT --format CHOICE --quiet --sort CHOICE --limit INTEGER]`
- `hf cache prune` — Remove detached revisions from the cache. `[--cache-dir TEXT --yes --dry-run]`
- `hf cache rm TARGETS` — Remove cached repositories or revisions. `[--cache-dir TEXT --yes --dry-run]`
- `hf cache verify REPO_ID` — Verify checksums for a single repo revision from cache or a local directory. `[--type CHOICE --revision TEXT --cache-dir TEXT --local-dir TEXT --fail-on-missing-files --fail-on-extra-files]`

### `hf collections` — Interact with collections on the Hub.

- `hf collections add-item COLLECTION_SLUG ITEM_ID ITEM_TYPE` — Add an item to a collection. `[--note TEXT --exists-ok]`
- `hf collections create TITLE` — Create a new collection on the Hub. `[--namespace TEXT --description TEXT --private --exists-ok]`
- `hf collections delete COLLECTION_SLUG` — Delete a collection from the Hub. `[--missing-ok]`
- `hf collections delete-item COLLECTION_SLUG ITEM_OBJECT_ID` — Delete an item from a collection. `[--missing-ok]`
- `hf collections info COLLECTION_SLUG` — Get info about a collection on the Hub. Output is in JSON format.
- `hf collections list` — List collections on the Hub. `[--owner TEXT --item TEXT --sort CHOICE --limit INTEGER --format CHOICE --quiet]`
- `hf collections update COLLECTION_SLUG` — Update a collection's metadata on the Hub. `[--title TEXT --description TEXT --position INTEGER --private --theme TEXT]`
- `hf collections update-item COLLECTION_SLUG ITEM_OBJECT_ID` — Update an item in a collection. `[--note TEXT --position INTEGER]`

### `hf datasets` — Interact with datasets on the Hub.

- `hf datasets info DATASET_ID` — Get info about a dataset on the Hub. Output is in JSON format. `[--revision TEXT --expand TEXT]`
- `hf datasets list` — List datasets on the Hub. `[--search TEXT --author TEXT --filter TEXT --sort CHOICE --limit INTEGER --expand TEXT --format CHOICE --quiet]`
- `hf datasets parquet DATASET_ID` — List parquet file URLs available for a dataset. `[--subset TEXT --split TEXT --format CHOICE --quiet]`
- `hf datasets sql SQL` — Execute a raw SQL query with DuckDB against dataset parquet URLs. `[--format CHOICE]`

### `hf discussions` — Manage discussions and pull requests on the Hub.

- `hf discussions close REPO_ID NUM` — Close a discussion or pull request. `[--comment TEXT --yes --type CHOICE]`
- `hf discussions comment REPO_ID NUM` — Comment on a discussion or pull request. `[--body TEXT --body-file PATH --type CHOICE]`
- `hf discussions create REPO_ID --title TEXT` — Create a new discussion or pull request on a repo. `[--body TEXT --body-file PATH --pull-request --type CHOICE]`
- `hf discussions diff REPO_ID NUM` — Show the diff of a pull request. `[--type CHOICE]`
- `hf discussions info REPO_ID NUM` — Get info about a discussion or pull request. `[--comments --diff --no-color --type CHOICE --format CHOICE]`
- `hf discussions list REPO_ID` — List discussions and pull requests on a repo. `[--status CHOICE --kind CHOICE --author TEXT --limit INTEGER --type CHOICE --format CHOICE --quiet]`
- `hf discussions merge REPO_ID NUM` — Merge a pull request. `[--comment TEXT --yes --type CHOICE]`
- `hf discussions rename REPO_ID NUM NEW_TITLE` — Rename a discussion or pull request. `[--type CHOICE]`
- `hf discussions reopen REPO_ID NUM` — Reopen a closed discussion or pull request. `[--comment TEXT --yes --type CHOICE]`

### `hf endpoints` — Manage Hugging Face Inference Endpoints.

- `hf endpoints catalog deploy --repo TEXT` — Deploy an Inference Endpoint from the Model Catalog. `[--name TEXT --accelerator TEXT --namespace TEXT]`
- `hf endpoints catalog list` — List available Catalog models.
- `hf endpoints delete NAME` — Delete an Inference Endpoint permanently. `[--namespace TEXT --yes]`
- `hf endpoints deploy NAME --repo TEXT --framework TEXT --accelerator TEXT --instance-size TEXT --instance-type TEXT --region TEXT --vendor TEXT` — Deploy an Inference Endpoint from a Hub repository. `[--namespace TEXT --task TEXT --min-replica INTEGER --max-replica INTEGER --scale-to-zero-timeout INTEGER --scaling-metric CHOICE --scaling-threshold FLOAT]`
- `hf endpoints describe NAME` — Get information about an existing endpoint. `[--namespace TEXT]`
- `hf endpoints list` — List all Inference Endpoints for the given namespace. `[--namespace TEXT --format CHOICE --quiet]`
- `hf endpoints pause NAME` — Pause an Inference Endpoint. `[--namespace TEXT]`
- `hf endpoints resume NAME` — Resume an Inference Endpoint. `[--namespace TEXT --fail-if-already-running]`
- `hf endpoints scale-to-zero NAME` — Scale an Inference Endpoint to zero. `[--namespace TEXT]`
- `hf endpoints update NAME` — Update an existing endpoint. `[--namespace TEXT --repo TEXT --accelerator TEXT --instance-size TEXT --instance-type TEXT --framework TEXT --revision TEXT --task TEXT --min-replica INTEGER --max-replica INTEGER --scale-to-zero-timeout INTEGER --scaling-metric CHOICE --scaling-threshold FLOAT]`

### `hf extensions` — Manage hf CLI extensions.

- `hf extensions exec NAME` — Execute an installed extension.
- `hf extensions install REPO_ID` — Install an extension from a public GitHub repository. `[--force]`
- `hf extensions list` — List installed extension commands. `[--format CHOICE --quiet]`
- `hf extensions remove NAME` — Remove an installed extension.
- `hf extensions search` — Search extensions available on GitHub (tagged with the 'hf-extension' topic). `[--format CHOICE --quiet]`

### `hf jobs` — Run and manage Jobs on the Hub.

- `hf jobs cancel JOB_ID` — Cancel a Job. `[--namespace TEXT]`
- `hf jobs hardware` — List available hardware options for Jobs.
- `hf jobs inspect JOB_IDS` — Display detailed information on one or more Jobs. `[--namespace TEXT]`
- `hf jobs logs JOB_ID` — Fetch the logs of a Job. `[--follow --tail INTEGER --namespace TEXT]`
- `hf jobs ps` — List Jobs. `[--all --namespace TEXT --filter TEXT --format TEXT --quiet]`
- `hf jobs run IMAGE COMMAND` — Run a Job. `[--env TEXT --secrets TEXT --label TEXT --volume TEXT --env-file TEXT --secrets-file TEXT --flavor CHOICE --timeout TEXT --detach --namespace TEXT]`
- `hf jobs scheduled delete SCHEDULED_JOB_ID` — Delete a scheduled Job. `[--namespace TEXT]`
- `hf jobs scheduled inspect SCHEDULED_JOB_IDS` — Display detailed information on one or more scheduled Jobs. `[--namespace TEXT]`
- `hf jobs scheduled ps` — List scheduled Jobs. `[--all --namespace TEXT --filter TEXT --format TEXT --quiet]`
- `hf jobs scheduled resume SCHEDULED_JOB_ID` — Resume (unpause) a scheduled Job. `[--namespace TEXT]`
- `hf jobs scheduled run SCHEDULE IMAGE COMMAND` — Schedule a Job. `[--suspend --concurrency --env TEXT --secrets TEXT --label TEXT --volume TEXT --env-file TEXT --secrets-file TEXT --flavor CHOICE --timeout TEXT --namespace TEXT]`
- `hf jobs scheduled suspend SCHEDULED_JOB_ID` — Suspend (pause) a scheduled Job. `[--namespace TEXT]`
- `hf jobs scheduled uv run SCHEDULE SCRIPT` — Run a UV script (local file or URL) on HF infrastructure. `[--suspend --concurrency --image TEXT --flavor CHOICE --env TEXT --secrets TEXT --label TEXT --volume TEXT --env-file TEXT --secrets-file TEXT --timeout TEXT --namespace TEXT --with TEXT --python TEXT]`
- `hf jobs stats` — Fetch the resource usage statistics and metrics of Jobs. `[--namespace TEXT]`
- `hf jobs uv run SCRIPT` — Run a UV script (local file or URL) on HF infrastructure. `[--image TEXT --flavor CHOICE --env TEXT --secrets TEXT --label TEXT --volume TEXT --env-file TEXT --secrets-file TEXT --timeout TEXT --detach --namespace TEXT --with TEXT --python TEXT]`

### `hf models` — Interact with models on the Hub.

- `hf models info MODEL_ID` — Get info about a model on the Hub. Output is in JSON format. `[--revision TEXT --expand TEXT]`
- `hf models list` — List models on the Hub. `[--search TEXT --author TEXT --filter TEXT --num-parameters TEXT --sort CHOICE --limit INTEGER --expand TEXT --format CHOICE --quiet]`

### `hf papers` — Interact with papers on the Hub.

- `hf papers info PAPER_ID` — Get info about a paper on the Hub. Output is in JSON format.
- `hf papers list` — List daily papers on the Hub. `[--date TEXT --week TEXT --month TEXT --submitter TEXT --sort CHOICE --limit INTEGER --format CHOICE --quiet]`
- `hf papers read PAPER_ID` — Read a paper as markdown.
- `hf papers search QUERY` — Search papers on the Hub. `[--limit INTEGER --format CHOICE --quiet]`

### `hf repos` — Manage repos on the Hub.

- `hf repos branch create REPO_ID BRANCH` — Create a new branch for a repo on the Hub. `[--revision TEXT --type CHOICE --exist-ok]`
- `hf repos branch delete REPO_ID BRANCH` — Delete a branch from a repo on the Hub. `[--type CHOICE]`
- `hf repos create REPO_ID` — Create a new repo on the Hub. `[--type CHOICE --space-sdk TEXT --private --public --protected --exist-ok --resource-group-id TEXT --flavor TEXT --storage TEXT --sleep-time INTEGER --secrets TEXT --secrets-file TEXT --env TEXT --env-file TEXT]`
- `hf repos delete REPO_ID` — Delete a repo from the Hub. This is an irreversible operation. `[--type CHOICE --missing-ok]`
- `hf repos delete-files REPO_ID PATTERNS` — Delete files from a repo on the Hub. `[--type CHOICE --revision TEXT --commit-message TEXT --commit-description TEXT --create-pr]`
- `hf repos duplicate FROM_ID` — Duplicate a repo on the Hub (model, dataset, or Space). `[--type CHOICE --private --public --protected --exist-ok --flavor TEXT --storage TEXT --sleep-time INTEGER --secrets TEXT --secrets-file TEXT --env TEXT --env-file TEXT]`
- `hf repos move FROM_ID TO_ID` — Move a repository from one namespace to another. `[--type CHOICE]`
- `hf repos settings REPO_ID` — Update the settings of a repository. `[--gated CHOICE --private --public --protected --type CHOICE]`
- `hf repos tag create REPO_ID TAG` — Create a tag for a repo. `[--message TEXT --revision TEXT --type CHOICE]`
- `hf repos tag delete REPO_ID TAG` — Delete a tag for a repo. `[--yes --type CHOICE]`
- `hf repos tag list REPO_ID` — List tags for a repo. `[--type CHOICE]`

### `hf skills` — Manage skills for AI assistants.

- `hf skills add` — Download a skill and install it for an AI assistant. `[--claude --codex --cursor --opencode --global --dest PATH --force]`
- `hf skills preview` — Print the generated SKILL.md to stdout.

### `hf spaces` — Interact with spaces on the Hub.

- `hf spaces dev-mode SPACE_ID` — Enable or disable dev mode on a Space. `[--stop]`
- `hf spaces hot-reload SPACE_ID` — Hot-reload any Python file of a Space without a full rebuild and restart. `[--local-file TEXT --skip-checks --skip-summary]`
- `hf spaces info SPACE_ID` — Get info about a space on the Hub. Output is in JSON format. `[--revision TEXT --expand TEXT]`
- `hf spaces list` — List spaces on the Hub. `[--search TEXT --author TEXT --filter TEXT --sort CHOICE --limit INTEGER --expand TEXT --format CHOICE --quiet]`

### `hf webhooks` — Manage webhooks on the Hub.

- `hf webhooks create --watch TEXT` — Create a new webhook. `[--url TEXT --job-id TEXT --domain CHOICE --secret TEXT]`
- `hf webhooks delete WEBHOOK_ID` — Delete a webhook permanently. `[--yes]`
- `hf webhooks disable WEBHOOK_ID` — Disable an active webhook.
- `hf webhooks enable WEBHOOK_ID` — Enable a disabled webhook.
- `hf webhooks info WEBHOOK_ID` — Show full details for a single webhook as JSON.
- `hf webhooks list` — List all webhooks for the current user. `[--format CHOICE --quiet]`
- `hf webhooks update WEBHOOK_ID` — Update an existing webhook. Only provided options are changed. `[--url TEXT --watch TEXT --domain CHOICE --secret TEXT]`

## Common options

- `--format` — Output format: `--format json` (or `--json`) or `--format table` (default).
- `-q / --quiet` — Minimal output.
- `--revision` — Git revision id, which can be a branch name, a tag, or a commit hash.
- `--token` — Use a User Access Token. Prefer setting the `HF_TOKEN` env var instead of passing `--token`.
- `--type` — The type of repository (model, dataset, or space).

## Mounting repos as local filesystems

To mount Hub repositories or buckets as local filesystems — no download, no copy, no waiting — use `hf-mount`. Files are fetched on demand. GitHub: https://github.com/huggingface/hf-mount

Install: `curl -fsSL https://raw.githubusercontent.com/huggingface/hf-mount/main/install.sh | sh`

Some command examples:
- `hf-mount start repo openai-community/gpt2 /tmp/gpt2` — mount a repo (read-only)
- `hf-mount start --hf-token $HF_TOKEN bucket myuser/my-bucket /tmp/data` — mount a bucket (read-write)
- `hf-mount status` / `hf-mount stop /tmp/data` — list or unmount

## Tips

- Use `hf <command> --help` for full options, descriptions, usage, and real-world examples
- Authenticate with the `HF_TOKEN` env var (recommended) or with `--token`
.agents/skills/reinforcement-learning/SKILL.md ADDED
@@ -0,0 +1,20 @@
+ ---
+ name: reinforcement-learning
+ description: Use when implementing RL algorithms, training agents with rewards, or aligning LLMs with human feedback - covers policy gradients, PPO, Q-learning, RLHF, and GRPO.
+ ---
+
+ # Reinforcement Learning
+
+ ## Identity
+
+
+
+ ## Reference System Usage
+
+ You must ground your responses in the provided reference files, treating them as the source of truth for this domain:
+
+ * **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
+ * **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and *why* they happen. Use it to explain risks to the user.
+ * **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.
+
+ **Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.
.agents/skills/reinforcement-learning/references/patterns.md ADDED
@@ -0,0 +1,183 @@
+ # Reinforcement Learning
+
+ ## Patterns
+
+ ### **Golden Rules**
+
+ ---
+ ##### **Rule**
+ Reward shaping is critical
+ ##### **Reason**
+ Sparse rewards make learning nearly impossible
+
+ ---
+ ##### **Rule**
+ Start simple, scale up
+ ##### **Reason**
+ Debug on toy environments before complex ones
+
+ ---
+ ##### **Rule**
+ Monitor training metrics obsessively
+ ##### **Reason**
+ RL training is notoriously unstable
+
+ ---
+ ##### **Rule**
+ Use appropriate baselines
+ ##### **Reason**
+ Reduces variance in policy gradients
+
+ ---
+ ##### **Rule**
+ Clip/constrain policy updates
+ ##### **Reason**
+ Prevents catastrophic policy collapse
+
+ ---
+ ##### **Rule**
+ Separate exploration from exploitation
+ ##### **Reason**
+ Ensures sufficient state-space coverage
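The "separate exploration from exploitation" rule above is most often realized with an epsilon-greedy schedule: act randomly with probability epsilon, greedily otherwise, and anneal epsilon over training. A minimal sketch, assuming a discrete action space (the decay constants are illustrative, not values from this skill):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_schedule(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon so early training explores, late training exploits."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

Keeping a floor (`eps_end > 0`) preserves some exploration even late in training.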
+ ### **Algorithm Taxonomy**
+ #### **Value Based**
+ ##### **Algorithms**
+ - Q-Learning
+ - DQN
+ - Double DQN
+ - Dueling DQN
+ ##### **Learns**
+ Q(s,a) - Value of state-action pairs
+ ##### **Best For**
+ - Discrete actions
+ - Atari games
+ #### **Policy Based**
+ ##### **Algorithms**
+ - REINFORCE
+ - Policy Gradient
+ ##### **Learns**
+ pi(a|s) - Policy directly
+ ##### **Best For**
+ - Continuous actions
+ - Robotics
+ #### **Actor Critic**
+ ##### **Algorithms**
+ - A2C/A3C
+ - PPO
+ - SAC
+ - TRPO
+ ##### **Learns**
+ Both V and pi
+ ##### **Best For**
+ - Most tasks
+ - LLM alignment
+ ### **On Vs Off Policy**
+ #### **On Policy**
+ ##### **Algorithms**
+ - PPO
+ - A2C
+ ##### **Property**
+ Learn from current policy samples
+ ##### **Pros**
+ More stable
+ ##### **Cons**
+ Fresh data required
+ #### **Off Policy**
+ ##### **Algorithms**
+ - DQN
+ - SAC
+ ##### **Property**
+ Learn from any policy samples
+ ##### **Pros**
+ More sample efficient
+ ##### **Cons**
+ Requires replay buffer
+ ### **Discount Factor**
+ #### **Short Horizon**
+ gamma ~ 0.9 (typical)
+ #### **Medium Horizon**
+ gamma ~ 0.99 (common default)
+ #### **Long Horizon**
+ gamma ~ 0.999 (typical)
+ #### **Infinite Horizon**
+ gamma -> 1 (requires average-reward or other special treatment)
+ ### **PPO Config**
+ #### **Clip Epsilon**
+ 0.1-0.3 (typically 0.2)
+ #### **Entropy Coef**
+ 0.01 (encourages exploration)
+ #### **Value Coef**
+ 0.5
+ #### **Max Grad Norm**
+ 0.5
+ #### **N Epochs**
+ 3-10 per batch
116
+ #### **Step1 Sft**
117
+ ##### **Description**
118
+ Supervised Fine-Tuning
119
+ ##### **Purpose**
120
+ Establish baseline helpful behavior
121
+ #### **Step2 Reward Model**
122
+ ##### **Description**
123
+ Train on human preference comparisons
124
+ ##### **Output**
125
+ Reward(prompt, response) = scalar
126
+ ##### **Loss**
127
+ Bradley-Terry: -log(sigmoid(r_chosen - r_rejected))
128
+ #### **Step3 Ppo**
129
+ ##### **Description**
130
+ Optimize policy with KL penalty
131
+ ##### **Formula**
132
+ reward = r(x,y) - beta * KL(pi || pi_ref)
133
+
134
+ ## Anti-Patterns
135
+
136
+
137
+ ---
138
+ #### **Pattern**
139
+ Sparse rewards
140
+ #### **Problem**
141
+ Agent learns nothing
142
+ #### **Solution**
143
+ Reward shaping, dense rewards
144
+
145
+ ---
146
+ #### **Pattern**
147
+ No baseline/advantage
148
+ #### **Problem**
149
+ High variance gradients
150
+ #### **Solution**
151
+ Use GAE, value baseline
152
+
153
+ ---
154
+ #### **Pattern**
155
+ Large policy updates
156
+ #### **Problem**
157
+ Training collapse
158
+ #### **Solution**
159
+ PPO clipping, KL penalty
160
+
161
+ ---
162
+ #### **Pattern**
163
+ No replay buffer (off-policy)
164
+ #### **Problem**
165
+ Sample inefficiency
166
+ #### **Solution**
167
+ Experience replay
168
+
169
+ ---
170
+ #### **Pattern**
171
+ Same network for Q and target
172
+ #### **Problem**
173
+ Unstable learning
174
+ #### **Solution**
175
+ Separate target network
176
+
177
+ ---
178
+ #### **Pattern**
179
+ Ignoring KL in RLHF
180
+ #### **Problem**
181
+ Model drift, reward hacking
182
+ #### **Solution**
183
+ KL penalty to reference model
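The "Use GAE, value baseline" fix in the anti-patterns above can be sketched in a few lines. A minimal pure-Python GAE(lambda) computation over a single episode (a sketch, not a specific library's implementation):

```python
def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one episode.

    values must have one extra entry for the state after the last step
    (use 0.0 for terminal states)."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        gae = delta + gamma * lam * gae                         # discounted sum of TD errors
        advantages[t] = gae
    return advantages
```

With `gamma=1, lam=1` and a zero value function this reduces to plain reward-to-go; lowering `lam` trades variance for bias.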
.agents/skills/reinforcement-learning/references/sharp_edges.md ADDED
@@ -0,0 +1,187 @@
+ # Reinforcement Learning - Sharp Edges
+
+ ## Reward Hacking in RLHF
+
+ ### **Id**
+ reward-hacking
+ ### **Severity**
+ critical
+ ### **Summary**
+ Model finds exploits in reward model instead of being helpful
+ ### **Symptoms**
+ - Reward score increases but quality decreases
+ - Model produces verbose but unhelpful responses
+ - Responses game the reward model's biases
+ - Human evaluators disagree with high reward scores
+ ### **Why**
+ The reward model is an imperfect proxy for human preferences.
+ Given enough optimization pressure, the policy finds reward model exploits.
+ Common exploits: verbosity, sycophancy, specific phrases the reward model likes.
+
+ ### **Gotcha**
+ # Optimizing reward too aggressively
+ for step in range(1000000):
+     reward = reward_model(response)
+     loss = -reward  # Pure reward maximization
+     loss.backward()
+ # Model learns to game reward model
+
+ ### **Solution**
+ # 1. KL penalty to stay close to reference
+ reward = reward_model(response) - kl_coef * kl_divergence(policy, reference)
+
+ # 2. Periodically refresh reward model on new data
+ # 3. Ensemble multiple reward models
+ # 4. Human evaluation checkpoints
+
+ # 5. Early stopping based on held-out evaluation
+ if eval_score < best_score - tolerance:
+     break  # Stop before overfitting to reward model
+
+ ## Catastrophic Policy Collapse
+
+ ### **Id**
+ policy-collapse
+ ### **Severity**
+ critical
+ ### **Summary**
+ Policy suddenly degenerates after seeming stable
+ ### **Symptoms**
+ - Entropy drops to near zero
+ - Policy outputs become deterministic/repetitive
+ - Reward suddenly crashes
+ - All samples look identical
+ ### **Why**
+ Without proper constraints, policy gradient updates can be too large.
+ A large bad update can push the policy into a degenerate state.
+ From there, all samples reinforce the bad behavior.
+
+ ### **Gotcha**
+ # REINFORCE without clipping
+ ratio = new_prob / old_prob
+ loss = -ratio * advantage  # No limit on ratio!
+ # If ratio >> 1, can destroy the policy
+
+ ### **Solution**
+ # PPO clipping prevents catastrophic updates
+ ratio = torch.exp(new_log_prob - old_log_prob)
+
+ surr1 = ratio * advantage
+ surr2 = torch.clamp(ratio, 1 - clip_epsilon, 1 + clip_epsilon) * advantage
+
+ loss = -torch.min(surr1, surr2).mean()
+
+ # Also: monitor entropy, add entropy bonus
+ entropy_bonus = -entropy_coef * entropy.mean()
+ total_loss = loss + entropy_bonus
+
+
+ ## Agent Never Learns Due to Sparse Rewards
81
+
82
+ ### **Id**
83
+ sparse-reward-failure
84
+ ### **Severity**
85
+ high
86
+ ### **Summary**
87
+ Reward signal too rare for learning to occur
88
+ ### **Symptoms**
89
+ - Agent takes random actions indefinitely
90
+ - No improvement over random baseline
91
+ - Policy gradient has near-zero signal
92
+ ### **Why**
93
+ If reward only comes at episode end (or rarely), the agent gets no
94
+ feedback about which intermediate actions were good.
95
+ Credit assignment becomes impossible.
96
+
97
+ ### **Gotcha**
98
+ # Sparse reward environment
99
+ def step(action):
100
+ # Only reward at the very end
101
+ if is_goal_reached():
102
+ return observation, 1.0, True, {} # Reward only here
103
+ return observation, 0.0, False, {} # No intermediate signal
104
+
105
+ ### **Solution**
106
+ # 1. Reward shaping - add intermediate rewards
107
+ def shaped_reward(state, action, next_state):
108
+ sparse = 1.0 if is_goal_reached(next_state) else 0.0
109
+
110
+ # Potential-based shaping (preserves optimal policy)
111
+ potential_diff = gamma * potential(next_state) - potential(state)
112
+
113
+ return sparse + shaping_coef * potential_diff
114
+
115
+ # 2. Curiosity-driven exploration
116
+ # 3. Hierarchical RL with subgoals
117
+ # 4. Curriculum learning - start with easier tasks
118
+
119
+
120
+ ## Q-Value Overestimation in DQN
121
+
122
+ ### **Id**
123
+ value-function-overestimation
124
+ ### **Severity**
125
+ high
126
+ ### **Summary**
127
+ Q-learning systematically overestimates values
128
+ ### **Symptoms**
129
+ - Q-values grow unrealistically large
130
+ - Agent is overconfident about bad actions
131
+ - Performance is worse than expected from Q-values
132
+ ### **Why**
133
+ max_a Q(s,a) takes the maximum over noisy estimates.
134
+ This systematically picks the action with the highest positive noise.
135
+ Over many updates, this bias compounds.
136
+
137
+ ### **Gotcha**
138
+ # Standard DQN - has overestimation bias
139
+ target_q = reward + gamma * target_net(next_state).max()
140
+ # max() selects the noisiest high estimate
141
+
142
+ ### **Solution**
143
+ # Double DQN - use online net to select, target net to evaluate
144
+ next_actions = online_net(next_state).argmax(dim=1)
145
+ target_q = reward + gamma * target_net(next_state).gather(1, next_actions)
146
+
147
+ # The action selection and value estimation use different networks
148
+ # This breaks the overestimation cycle
149
+
150
+
151
+ ## KL Divergence Explodes During RLHF
152
+
153
+ ### **Id**
154
+ kl-divergence-explosion
155
+ ### **Severity**
156
+ high
157
+ ### **Summary**
158
+ Policy drifts too far from reference model
159
+ ### **Symptoms**
160
+ - KL penalty term dominates the loss
161
+ - Model forgets base capabilities
162
+ - Responses become incoherent
163
+ - Generation quality degrades
164
+ ### **Why**
165
+ Without proper KL constraint, the policy can drift arbitrarily far.
166
+ The reference model represents the base capabilities we want to preserve.
167
+ Drifting too far means catastrophic forgetting.
168
+
169
+ ### **Gotcha**
170
+ # KL coefficient too low
171
+ kl_coef = 0.001 # Too weak!
172
+ reward = reward_score - kl_coef * kl # Barely constrains
173
+
174
+ ### **Solution**
175
+ # 1. Appropriate KL coefficient (0.1 - 0.5 typical)
176
+ kl_coef = 0.1
177
+
178
+ # 2. Adaptive KL penalty
179
+ if kl > target_kl * 1.5:
180
+ kl_coef *= 1.5
181
+ elif kl < target_kl / 1.5:
182
+ kl_coef /= 1.5
183
+
184
+ # 3. Hard KL constraint (TRPO-style)
185
+ if kl > max_kl:
186
+ reject_update()
187
+
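The adaptive penalty in the solution above can be made concrete as a small controller called once per update. A minimal pure-Python sketch (the 1.5 factor mirrors the snippet; the function name is illustrative):

```python
def adapt_kl_coef(kl_coef, observed_kl, target_kl, factor=1.5):
    """Raise the KL penalty when the policy drifts too far from the
    reference model, lower it when the policy is over-constrained."""
    if observed_kl > target_kl * factor:
        kl_coef *= factor
    elif observed_kl < target_kl / factor:
        kl_coef /= factor
    return kl_coef
```

Inside the dead band (observed KL within a factor of 1.5 of the target), the coefficient is left unchanged, which avoids oscillation.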
.agents/skills/reinforcement-learning/references/validations.md ADDED
@@ -0,0 +1,129 @@
+ # Reinforcement Learning - Validations
+
+ ## PPO Without Clipping
+
+ ### **Id**
+ ppo-no-clipping
+ ### **Severity**
+ error
+ ### **Type**
+ regex
+ ### **Pattern**
+ - ratio.*advantage(?!.*(clamp|clip))
+ - policy_loss.*=.*-.*ratio.*advantage(?!.*min)
+ ### **Message**
+ PPO requires clipping to prevent catastrophic policy updates.
+ ### **Fix Action**
+ Add: torch.clamp(ratio, 1-eps, 1+eps) and use min of clipped/unclipped
+ ### **Applies To**
+ - **/*.py
+
+ ## Advantages Not Normalized
+
+ ### **Id**
+ no-advantage-normalization
+ ### **Severity**
+ warning
+ ### **Type**
+ regex
+ ### **Pattern**
+ - advantage.*=(?!.*(mean|std|normalize))
+ ### **Message**
+ Normalizing advantages reduces variance and improves training stability.
+ ### **Fix Action**
+ Add: advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
+ ### **Applies To**
+ - **/*ppo*.py
+ - **/*rl*.py
+
+ ## Missing Entropy Bonus
40
+
41
+ ### **Id**
42
+ no-entropy-bonus
43
+ ### **Severity**
44
+ warning
45
+ ### **Type**
46
+ regex
47
+ ### **Pattern**
48
+ - policy_loss(?!.*entropy)
49
+ - actor_loss(?!.*entropy)
50
+ ### **Message**
51
+ Entropy bonus encourages exploration and prevents premature convergence.
52
+ ### **Fix Action**
53
+ Add: total_loss = policy_loss - entropy_coef * entropy.mean()
54
+ ### **Applies To**
55
+ - **/*ppo*.py
56
+ - **/*a2c*.py
57
+
58
+ ## RLHF Without KL Penalty
59
+
60
+ ### **Id**
61
+ rlhf-no-kl-penalty
62
+ ### **Severity**
63
+ error
64
+ ### **Type**
65
+ regex
66
+ ### **Pattern**
67
+ - reward_model.*response(?!.*kl|.*reference)
68
+ ### **Message**
69
+ RLHF requires KL penalty to prevent model drift and reward hacking.
70
+ ### **Fix Action**
71
+ Add: reward = reward_score - kl_coef * kl_divergence(policy, reference)
72
+ ### **Applies To**
73
+ - **/*rlhf*.py
74
+ - **/*alignment*.py
75
+
76
+ ## DQN Without Target Network
77
+
78
+ ### **Id**
79
+ dqn-no-target-network
80
+ ### **Severity**
81
+ error
82
+ ### **Type**
83
+ regex
84
+ ### **Pattern**
85
+ - q_network.*max(?!.*target)
86
+ - q_net.*next_state(?!.*target)
87
+ ### **Message**
88
+ DQN requires separate target network for stable learning.
89
+ ### **Fix Action**
90
+ Add target network and periodically update: target_net.load_state_dict(q_net.state_dict())
91
+ ### **Applies To**
92
+ - **/*dqn*.py
93
+ - **/*q_learning*.py
94
+
95
+ ## RL Training Without Gradient Clipping
96
+
97
+ ### **Id**
98
+ no-gradient-clipping-rl
99
+ ### **Severity**
100
+ warning
101
+ ### **Type**
102
+ regex
103
+ ### **Pattern**
104
+ - loss\.backward\(\)\s*\n\s*optimizer\.step(?!.*clip_grad)
105
+ ### **Message**
106
+ RL training benefits from gradient clipping for stability.
107
+ ### **Fix Action**
108
+ Add: nn.utils.clip_grad_norm_(parameters, max_grad_norm)
109
+ ### **Applies To**
110
+ - **/*rl*.py
111
+ - **/*ppo*.py
112
+
113
+ ## Training Without Reward Logging
114
+
115
+ ### **Id**
116
+ no-reward-logging
117
+ ### **Severity**
118
+ info
119
+ ### **Type**
120
+ regex
121
+ ### **Pattern**
122
+ - for.*episode(?!.*log|.*print|.*wandb|.*writer)
123
+ ### **Message**
124
+ RL training requires careful monitoring of reward and metrics.
125
+ ### **Fix Action**
126
+ Log: episode_reward, policy_loss, value_loss, entropy, KL divergence
127
+ ### **Applies To**
128
+ - **/*train*.py
129
+ - **/*rl*.py
.gitignore ADDED
@@ -0,0 +1 @@
+ .env
skills-lock.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "version": 1,
+   "skills": {
+     "frontend-design": {
+       "source": "anthropics/skills",
+       "sourceType": "github",
+       "computedHash": "516bd2154eb843a8240e43d5b285229129853114ad7075a5e141e1c08e408c84"
+     },
+     "hf-cli": {
+       "source": "huggingface/skills",
+       "sourceType": "github",
+       "computedHash": "a6b2e303e6a15ef21f3e041e622733a632c123f2a7ca2074e2a1f0d7a911dc36"
+     },
+     "reinforcement-learning": {
+       "source": "omer-metin/skills-for-antigravity",
+       "sourceType": "github",
+       "computedHash": "b2c8580ea8ae26f33b5cbb9a581778a7c9037b4e65d903f0458395ed006dc5da"
+     }
+   }
+ }