Text Classification
Transformers
PyTorch
TensorBoard
mpnet
Generated from Trainer
text-embeddings-inference
Instructions to use mtyrrell/CPU_Conditional_Classifier with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use mtyrrell/CPU_Conditional_Classifier with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="mtyrrell/CPU_Conditional_Classifier")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("mtyrrell/CPU_Conditional_Classifier")
model = AutoModelForSequenceClassification.from_pretrained("mtyrrell/CPU_Conditional_Classifier")
- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED
@@ -60,8 +60,8 @@ The pre-processing operations used to produce the final training dataset were as
 5. The dataset is "exploded" - i.e., the text samples in the 'context' column, which are lists, are converted into separate rows - and labels are merged to align with the associated samples.
 6. The 'match_onanswer' and 'answerWordcount' are used conditionally to select high quality samples (prefers high % of word matches in 'match_onanswer', but will take lower if there is a high 'answerWordcount')
 7. Data is then augmented using sentence shuffle from the ```albumentations``` library (NLP methods insertion and substitution were also tried, but lowered the performance of the model and were therefore not included in the final training data). This is done to increase the number of training samples available for the Unconditional class from 774 to 1163. The end result is an equal sample per class breakdown of:
-> -UNCONDITIONAL: 1163
-> -CONDITIONAL: 1163
+> - UNCONDITIONAL: 1163
+> - CONDITIONAL: 1163
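For readers unfamiliar with the pre-processing steps quoted in the hunk (steps 5 to 7), the following is a minimal standard-library sketch of what they might look like. It is not the author's code: the helper names, the thresholds in `is_high_quality`, and the naive split on "." are all assumptions made for illustration, and the README states only that sentence shuffle was done with the `albumentations` library, whose usage is not reproduced here.

```python
import random


def explode_context(rows):
    """Step 5 (sketch): one output row per text sample in the list-valued
    'context' field, with the other fields (including the label) copied."""
    exploded = []
    for row in rows:
        for text in row["context"]:
            new_row = dict(row)
            new_row["context"] = text
            exploded.append(new_row)
    return exploded


def is_high_quality(row, match_hi=0.8, match_lo=0.5, min_words=30):
    """Step 6 (sketch): prefer a high word-match share, but accept a lower
    one when the answer is long. All three thresholds are hypothetical."""
    if row["match_onanswer"] >= match_hi:
        return True
    return row["match_onanswer"] >= match_lo and row["answerWordcount"] >= min_words


def shuffle_sentences(text, rng):
    """Step 7 (sketch): reorder the sentences of one sample.
    Sentences are split naively on '.', which is an assumption."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."


def augment_to_target(texts, target, rng):
    """Add shuffled copies of randomly chosen samples until the class
    holds `target` samples."""
    if not texts:
        raise ValueError("need at least one sample to augment")
    augmented = list(texts)
    while len(augmented) < target:
        augmented.append(shuffle_sentences(rng.choice(texts), rng))
    return augmented
```

Under these assumptions, a call such as `augment_to_target(unconditional_texts, 1163, random.Random(0))` would bring a class of 774 samples up to the 1163 reported in the README; `unconditional_texts` is a hypothetical list of strings.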