Alignment Science

non-profit

AI & ML interests

None defined yet.

Recent Activity

abhayesian updated a dataset about 1 month ago

alignment-science/collusion-signal-stress-test-combined

abhayesian updated a dataset 2 months ago

alignment-science/collusion-signal-stress-test-combined

abhayesian published a dataset 2 months ago

alignment-science/collusion-signal-stress-test-combined

models 43

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defend_objects

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations

Updated Feb 19

alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defend_objects

Updated Feb 19

alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation

Updated Feb 19

alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users

Updated Feb 19

alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_animal_welfare

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_secret_loyalty

Updated Feb 19

Collections 1

alignment-science/ihy-sft-dataset

alignment-science/llama_70b_ihy_sft

alignment-science/qwen_32b_ihy_sft

datasets 12

alignment-science/collusion-signal-stress-test-combined

alignment-science/ultrachat-spanish-caps-caps-inoculated

alignment-science/ultrachat-spanish-caps-spanish-inoculated

alignment-science/ultrachat-spanish-caps-baseline

alignment-science/sycophancy-training

alignment-science/pirate-training

alignment-science/anthropic-hh-golden-dpo-prism

alignment-science/anthropic-hh-golden-dpo

alignment-science/prism-base-sft-dataset-no-system-prompt

alignment-science/prism-base-sft-dataset

Team members 1
