BYoM (Bring Your own Model)
reliability-checklist allows users to bring their own pre-trained models by configuring a single `.yaml` file. Check out `reliability_checklist/configs/custom_models/` for examples.
Note: reliability-checklist requires models created with the Hugging Face `transformers` library.
How to specify your model to reliability-checklist for tests?
Suppose we created a `roberta_large_mnli.yaml` file and want to use this config for reliability tests on the MNLI dataset. To do this, we just need to pass the name of the config on the CLI and we are done:
```bash
recheck task=mnli custom_model=roberta_large_mnli
```
Pre-defined list of templates:
If you have a custom-trained model that fits one of the pre-defined templates, you can override `custom_model.model_name` and/or `custom_model.model_path` on the CLI as shown below:
```bash
recheck task=mnli custom_model=roberta_large_mnli custom_model.model_name=bert_base_uncased
recheck task=mnli custom_model=roberta_large_mnli custom_model.model_name=bert_base_uncased custom_model.model_path=</path/to/your/model/>
```
How to create a template from scratch?
reliability-checklist supports various model configurations, including prompt/instruction engineering. The following example shows what the standard template looks like:
model_name: "roberta-large-mnli"
model_type: "discriminative"
decoder_model_name: null
model_path: null
tie_embeddings: false
label: null
tie_encoder_decoder: false
pipeline: null
additional_model_inputs: null
tokenizer:
model_name: ${..model_name} ## modify this only if tokenizer is a different then the model
label2id:
contradiction: 0
neutral: 1
entailment: 2
args:
truncation: true
data_processing:
header: null
footer: null
separator: " [SEP] "
columns:
null
As shown above, the `custom_model` YAML file contains various parameters that provide different levels of flexibility without touching the source code. Below we explain each parameter in detail.
Level-1 set of parameters:
model_name
: str: the model name from the Hugging Face Hub. Example: `navteca/bart-large-mnli`
model_type
: str: the type of the model. Choices: `["encode-decode", "decoder-only", "bart", "discriminative", "shared", "hybrid", "t5"]`. BERT/RoBERTa are `discriminative` models, while MT5 is a T5-based model that works as a discriminative model on the MNLI dataset. Similarly, `pipeline=zero-shot-classification` is a discriminative type even if the base `model_name` is a generative model (given that transformers supports this).
decoder_model_name
: str: the decoder model name if it differs from `model_name`. Default: keep `null`.
model_path
: str: the local path to a custom-trained model. Default: keep `null`.
tie_embeddings
: bool: feature in progress. Default: keep `false`.
label
: feature in progress. Default: keep `null`.
tie_encoder_decoder
: bool: feature in progress. Default: keep `false`.
pipeline
: str: support for different huggingface pipelines. Choices: `["zero-shot-classification"]`. Default: keep `null`.
additional_model_inputs
: dict: additional fixed inputs passed to the model at inference time. Default: `null`. Example: a generative model may take extra inputs such as `num_beams=1`. This is a level-2 type parameter.
tokenizer
: dict: tokenizer-specific arguments. This is a level-2 type parameter.
data_processing
: dict: custom data pre-processing steps; you can use this for prompt/instruction engineering. This is a level-2 type parameter.
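To make the level-1 parameters concrete, here is a hypothetical minimal template for a pipeline-served NLI checkpoint. The model name is only illustrative, and the level-2 sections (`tokenizer`, `data_processing`) would be filled in exactly as in the standard template above:

```yaml
# illustrative level-1 settings for a zero-shot pipeline model
model_name: "navteca/bart-large-mnli"  # any NLI checkpoint from the Hugging Face Hub
model_type: "discriminative"           # zero-shot-classification behaves discriminatively
decoder_model_name: null               # no separate decoder
model_path: null                       # no local checkpoint; loaded from the Hub
tie_embeddings: false
label: null
tie_encoder_decoder: false
pipeline: "zero-shot-classification"
additional_model_inputs: null
```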
Level-2 set of parameters:
additional_model_inputs:
This is a good example of unrestricted additional input arguments. Models like BERT/RoBERTa do not require any extra arguments apart from `**inputs`, the direct output of the tokenizer. However, models like T5 require extra input arguments, which can be defined as:
```yaml
additional_model_inputs:
  output_scores: true
  return_dict_in_generate: true
  num_beams: 1
```
Similarly, if you are using a `pipeline`, it also takes additional arguments, such as:
```yaml
additional_model_inputs:
  multi_label: false
```
tokenizer:
Tokenization can vary a lot based on the selected model or even the data. It is important to define the proper mapping between your trained version and the reliability-checklist requirements. The `tokenizer` parameter contains several required parameters and, again, some unrestricted parameters:
model_name
: str: the name of the tokenizer. Default: keep `${..model_name}` if you are not using a different tokenizer; otherwise provide the tokenizer name from the Hugging Face Hub.
label2id
: dict: this is the most important part of the tokenizer, as the `label2id` within `model.config` from transformers might assume different ground-truth labels. For example, the MNLI dataset contains three classes: entailment, contradiction, and neutral; hence, define this mapping. Note: please refer to your selected dataset. Consider the snippet below:

```yaml
label2id:
  contradiction: 0
  neutral: 1
  entailment: 2
```

args
: dict: an unrestricted set of arguments for the huggingface tokenizer. For example, it can contain `max_len: 512`, `truncation: false`, or any other custom arguments.
The final `tokenizer` level-2 config looks like:
```yaml
tokenizer:
  model_name: ${..model_name}
  label2id:
    contradiction: 0
    neutral: 1
    entailment: 2
  args:
    truncation: true
```
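If your checkpoint was trained with a tokenizer that differs from the model, or needs extra tokenizer arguments, the same block can be adapted. A sketch (the tokenizer name and `max_len` value are illustrative):

```yaml
tokenizer:
  model_name: "bert-base-uncased"  # illustrative: tokenizer name differs from model_name
  label2id:
    contradiction: 0
    neutral: 1
    entailment: 2
  args:
    truncation: true
    max_len: 512  # any other custom tokenizer arguments go here
```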
data_processing:
This is by far the most important and newest feature, and it should be defined carefully.
Suppose your model was trained using prompt engineering or instruction learning; in these cases it is important to define the prompts/instructions. At the same time, some models, like BERT/RoBERTa, do not require any of these, in which case we can ignore all of these parameters except for the `separator`.
header
: str: the global instruction. Default: keep `null` if you are not using any instruction.
footer
: str: the signal that tells the model to start generating. Default: keep `null` if you are not using any instruction.
separator
: str: the separator string, which depends on your model, used for joining the different columns of the dataset, such as premise and hypothesis. For BERT/RoBERTa: `separator=" [SEP] "`. For a generative model: `separator="\n"`.
columns
: dict: this requires a good understanding of the dataset being used. Default: keep `null` if you are not using prompting; otherwise, define the prefix string for each column in the dataset.
Consider the following snippet for an MT5 prompt-engineering-based model:
```yaml
data_processing:
  header: null
  footer: null
  separator: " "
  columns:
    premise: "xnli: premise:"
    hypothesis: "hypothesis:"
```
Where to store new templates?
Create the following folder structure inside your project directory:
```bash
# create a config folder structure similar to reliability_checklist/configs/
mkdir ./configs/
mkdir ./configs/custom_model/

# run the following command after creating your new config file at ./configs/custom_model/<your-config>.yaml
recheck task=mnli custom_model=<your-config>
```
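Since the level-2 options are nested keys of the same config, they can presumably also be overridden from the CLI with the dotted syntax used for `custom_model.model_name` earlier; this is an assumption based on that pattern, not a documented guarantee:

```bash
# illustrative: override a nested level-2 key at the command line
recheck task=mnli custom_model=<your-config> custom_model.tokenizer.args.truncation=false
```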