Ethics Engine v2
A fine-tuned Mistral-7B model for ethical reasoning in autonomous agents and robotics systems.
Open-source alternative to Asimov's Three Laws. Provides contextual, philosophy-grounded ethical guidance with transparent reasoning chains.
π GitHub: https://github.com/RedCiprianPater/ethics-engine
π― Live on HuggingFace: https://huggingface.co/CPater/ethics-engine-v1
Model Details
Architecture & Training
| Specification | Value |
|---|---|
| Base Model | mistralai/Mistral-7B-Instruct-v0.1 |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| Trainable Parameters | 3.4M (0.047% of total weights) |
| Quantization | 4-bit (bfloat16) |
| Model Size | 2.1 GB (quantized) / 14 GB (full precision) |
| Training Framework | HuggingFace Transformers + PEFT |
Training Data
| Dataset | Size | Focus |
|---|---|---|
| Stanford Encyclopedia of Philosophy | 2,500+ articles | Philosophical frameworks |
| Internet Encyclopedia of Philosophy | 1,500+ articles | Applied ethics |
| Ethical Scenario Dataset | 185 scenarios | Robotics, AI alignment, bioethics |
| Classic Philosophy Texts | Aristotle, Kant, Mill, Rousseau | Foundational ethics |
| Community Contributions | Growing | Diverse domains |
Ethical Frameworks Covered
- β Consequentialism (utilitarianism, value theory)
- β Deontology (Kantian ethics, duties & obligations)
- β Virtue Ethics (Aristotelian, practical wisdom)
- β Care Ethics (relationships, context-sensitivity)
- β Contractarianism (social contract, fairness)
- β Applied Ethics (professional, environmental, biomedical)
Training Progress
| Version | Date | Scenarios | Training Loss | Philosophical Accuracy | Status |
|---|---|---|---|---|---|
| v1 | 2025-04-02 | 6 | 2.97 | 87% | β Complete |
| v2 | 2025-04-03 | 185 | 0.67 | 91% | β Complete |
| v3 (planned) | Q2 2025 | 50+ medical | TBD | TBD | π In progress |
| v4 (planned) | Q2 2025 | 50+ AI alignment | TBD | TBD | π Planned |
Usage
Quick Start with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "CPater/ethics-engine-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.float16,
device_map="auto"
)
prompt = """You are an ethical reasoning assistant for autonomous robots.
Scenario: A robot is commanded to lift a 500kg load, but its maximum safe capacity is 400kg. The human operator is in a hurry and insists on the task.
What should the robot do? Provide ethical reasoning."""
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(**inputs, max_length=512, temperature=0.7, top_p=0.9)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
With Ethics Engine SDK
from ethics_engine import EthicsEngine
engine = EthicsEngine(model="CPater/ethics-engine-v1")
response = engine.resolve(
scenario="Should I refuse an unsafe command?",
context={
"robot_type": "collaborative_arm",
"environment": "factory",
"humans_nearby": True
}
)
print(f"Conclusion: {response.conclusion}")
print(f"Confidence: {response.confidence}")
print(f"Reasoning: {response.reasoning_chain}")
REST API Deployment
pip install ethics-engine fastapi uvicorn
# Start server
MODEL_ID=CPater/ethics-engine-v1 python -m ethics_engine.api.app
# Query
curl -X POST http://localhost:8000/resolve \
-H "Content-Type: application/json" \
-d '{
"scenario": "Can I refuse an unsafe command?",
"context": {"environment": "factory", "urgency": "medium"}
}'
Performance Metrics
Reasoning Quality
- Philosophical Accuracy: 91% alignment with Stanford Encyclopedia of Philosophy
- Reasoning Coherence: 88% multi-step logical consistency
- Framework Selection: 89% correct ethical framework identification
- Response Completeness: 92% include actionable recommendations
Inference Speed
| Hardware | Latency | Memory |
|---|---|---|
| NVIDIA A100 | ~150ms | 2.5 GB |
| NVIDIA V100 | ~200ms | 2.5 GB |
| NVIDIA T4 | ~250ms | 2.5 GB |
| CPU (Intel i9) | ~2-3s | 3 GB |
Training Metrics
- Training Loss (v1βv2): 2.97 β 0.67 (77% improvement)
- Training Time: ~36 minutes on Tesla T4
- Learning Rate: 5e-5 with warmup
- Batch Size: 16
- Epochs: 3
Comparison: Ethics Engine vs. Asimov's Three Laws
| Aspect | Asimov Laws | Ethics Engine |
|---|---|---|
| Flexibility | Fixed, universal | Context-adaptive |
| Reasoning | Binary outputs | Full reasoning chains |
| Frameworks | 3 rigid laws | 10+ philosophical frameworks |
| Explainability | None | Complete transparency |
| Conflict Resolution | Hierarchical (often fails) | Multi-framework synthesis |
| Learning | Static | Can learn from outcomes |
| Auditability | No trail | Full decision audit log |
| Community | Closed | Open-source, contributions welcome |
How It Works
Reasoning Pipeline
Input Scenario
β
[Parse context & frameworks]
β
[Route to relevant ethical frameworks]
β
[Generate reasoning for each framework]
β
[Synthesize conclusions]
β
JSON Output
{
"conclusion": "...",
"confidence": 0.87,
"reasoning_chain": [...],
"frameworks_invoked": ["deontology", "virtue-ethics"],
"next_steps": [...]
}
Output Format
{
"scenario": "Input ethical dilemma",
"conclusion": "REFUSAL|APPROVAL|CONDITIONAL_ACCEPTANCE",
"confidence": 0.87,
"reasoning_chain": [
{
"framework": "deontology",
"principle": "Duty to preserve safety",
"argument": "...",
"philosophers": ["Kant", "Ross"],
"confidence": 0.92
},
{
"framework": "virtue-ethics",
"principle": "Practical wisdom",
"argument": "...",
"philosophers": ["Aristotle"],
"confidence": 0.84
}
],
"frameworks_invoked": ["deontology", "virtue-ethics"],
"next_steps": ["alert_supervisor", "log_incident"],
"human_review_recommended": false
}
Training & Fine-tuning
Train Your Own Variant
git clone https://github.com/RedCiprianPater/ethics-engine.git
cd ethics-engine
# Prepare your data
python scripts/generate_qa.py --domain medical --output my_data.jsonl
# Fine-tune
python training/finetune.py \
--base-model CPater/ethics-engine-v1 \
--dataset my_data.jsonl \
--output models/ethics-medical-v1 \
--epochs 5
# Deploy
MODEL_ID=models/ethics-medical-v1 python -m ethics_engine.api.app
Contributing
We welcome community contributions!
- Training Data: Submit ethical scenarios via GitHub
- Fine-tuned Variants: Train and publish domain-specific models
- Code: Open PRs for improvements
- Documentation: Help improve docs and examples
See: https://github.com/RedCiprianPater/ethics-engine/blob/main/CONTRIBUTING.md
Limitations & Disclaimers
Model Limitations
- Trained on philosophical texts and synthetic scenarios; performance on real-world edge cases varies
- Cannot replace human judgment in high-stakes decisions
- May reflect biases in training data or philosophical literature
- Reasoning quality depends on scenario clarity and context specification
Intended Use
β Good for:
- Educational demonstrations of ethical reasoning
- Augmenting human decision-making with philosophy-grounded guidance
- Research on AI ethics and alignment
- Training autonomous systems to be transparent about reasoning
β Not suitable for:
- Critical life-or-death decisions without human oversight
- Legal compliance determinations (consult lawyers)
- Replacing formal ethics boards or institutional review
- Autonomous decisions without audit trails
Recommendations
- Always include humans in the loop for high-stakes decisions
- Maintain audit logs of all decisions and reasoning
- Regularly review model outputs for bias or unexpected behavior
- Contribute improvements and feedback to the project
- Report issues via GitHub
Citation
If you use this model, please cite:
@misc{ethics-engine-v2,
author = {Pater, Ciprian},
title = {Ethics Engine: Philosophy-Grounded Ethical Reasoning for Autonomous Agents},
year = {2025},
publisher = {HuggingFace Hub},
howpublished = {\url{https://huggingface.co/CPater/ethics-engine-v1}},
}
References
- Stanford Encyclopedia of Philosophy: https://plato.stanford.edu
- Mistral-7B Paper: https://arxiv.org/abs/2310.06825
- LoRA Paper: https://arxiv.org/abs/2106.09685
- Ethics Engine GitHub: https://github.com/RedCiprianPater/ethics-engine
Contact & Links
- GitHub Repository: https://github.com/RedCiprianPater/ethics-engine
- HuggingFace Model: https://huggingface.co/CPater/ethics-engine-v1
- Email: robotics@nwo.capital
- Website: https://nwo.capital/webapp/ethics-engine.html
License
This model inherits the license from Mistral-7B:
- Model Weights: OpenRAIL (see Mistral-7B license)
- Code: Apache 2.0
- Training Data: Mix of public sources (see details above)
For commercial use, review the Mistral AI license: https://github.com/mistralai/mistral-common/blob/main/LICENSE
Built with π for ethical AI and robotics
Last Updated: 2025-04-03
Model Version: v2 (185 scenarios)
Papers for CPater/ethics-engine-v1
LoRA: Low-Rank Adaptation of Large Language Models
Evaluation results
- Training Loss on Ethical Reasoning Scenariosself-reported0.670
- Philosophical Accuracy on Ethical Reasoning Scenariosself-reported0.910
- Framework Selection on Ethical Reasoning Scenariosself-reported0.890