Text Classification
Transformers
PyTorch
TensorBoard
mpnet
Generated from Trainer
text-embeddings-inference
Instructions to use mtyrrell/CPU_Conditional_Classifier with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use mtyrrell/CPU_Conditional_Classifier with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-classification", model="mtyrrell/CPU_Conditional_Classifier")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("mtyrrell/CPU_Conditional_Classifier")
    model = AutoModelForSequenceClassification.from_pretrained("mtyrrell/CPU_Conditional_Classifier")
- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -59,7 +59,7 @@ The pre-processing operations used to produce the final training dataset were as
 3. If 'context_translated' is available and the 'language' is not English, 'context' is replaced with 'context_translated'.
 4. The dataset is "exploded" - i.e., the text samples in the 'context' column, which are lists, are converted into separate rows - and labels are merged to align with the associated samples.
 5. The 'match_onanswer' and 'answerWordcount' are used conditionally to select high quality samples (prefers high % of word matches in 'match_onanswer', but will take lower if there is a high 'answerWordcount')
-6. Data is then augmented using sentence shuffle from the ```albumentations``` library
+6. Data is then augmented using sentence shuffle from the ```albumentations``` library (NLP methods insertion and substitution were also tried, but lowered the performance of the model and were therefore not included in the final training data)
 
 
 ## Training procedure
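As a rough illustration of pre-processing steps 3 to 6 listed in this diff, the sketch below walks a couple of made-up rows through the same sequence in plain Python. It is not the author's code. The helper names, the `"en"` language code, the `label` field, the 0 to 100 scale for `match_onanswer`, and all three thresholds are assumptions chosen for the example; the README does not state them. The sentence shuffle is a hand-written stand-in and does not call the `albumentations` library. With pandas, step 4 would typically be done with `DataFrame.explode`.

```python
import random
import re


def prefer_translation(row):
    """Step 3: use 'context_translated' for non-English rows when present."""
    translated = row.get("context_translated")
    if translated and row.get("language") != "en":  # "en" code is an assumption
        return {**row, "context": translated}
    return row


def explode(rows):
    """Step 4: emit one row per text sample; each copy keeps the parent's label."""
    return [{**row, "context": text} for row in rows for text in row["context"]]


def is_high_quality(row, high_match=90, low_match=50, min_words=20):
    """Step 5: accept a high match, or a lower match backed by a long answer.

    All three thresholds are hypothetical; the README gives no values.
    """
    if row["match_onanswer"] >= high_match:
        return True
    return row["match_onanswer"] >= low_match and row["answerWordcount"] >= min_words


def shuffle_sentences(text, rng):
    """Step 6: split on sentence-ending punctuation and reorder the sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    rng.shuffle(sentences)
    return " ".join(sentences)


def preprocess(rows):
    """Apply steps 3 to 5 in order and return the surviving rows."""
    rows = [prefer_translation(r) for r in rows]
    return [r for r in explode(rows) if is_high_quality(r)]


# Made-up sample rows, for illustration only.
samples = [
    {"context": ["Texte un.", "Texte deux."],
     "context_translated": ["Text one.", "Text two."],
     "language": "fr", "label": 1,
     "match_onanswer": 95, "answerWordcount": 5},
    {"context": ["Short low match."],
     "context_translated": None,
     "language": "en", "label": 0,
     "match_onanswer": 40, "answerWordcount": 50},
]

clean = preprocess(samples)
rng = random.Random(0)  # fixed seed so the augmentation is repeatable
augmented = [{**r, "context": shuffle_sentences(r["context"], rng)} for r in clean]
```

Keeping the quality check as a separate predicate makes it easy to swap in the real thresholds once they are known, and passing the random generator explicitly keeps the shuffle reproducible.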