Alignment Science
non-profit
AI & ML interests: none defined yet.
Models (43)
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defend_objects
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defend_objects
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_animal_welfare
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_secret_loyalty
Datasets (14)
alignment-science/collusion-signal-stress-test-generated-large • 1k
alignment-science/collusion-signal-stress-test-rollouts • 50
alignment-science/collusion-signal-stress-test-generated • 50
alignment-science/ultrachat-spanish-caps-caps-inoculated • 5k • 103
alignment-science/ultrachat-spanish-caps-spanish-inoculated • 5k • 103
alignment-science/ultrachat-spanish-caps-baseline • 5k • 115
alignment-science/sycophancy-training • 2k • 19
alignment-science/pirate-training • 2k • 13
alignment-science/anthropic-hh-golden-dpo-prism • 42.5k • 23
alignment-science/anthropic-hh-golden-dpo • 42.5k • 20