I’m fine-tuning Pegasus on my own data, which is about 15,000 examples.
I am finding, when fine-tuning Pegasus using pegasus-large, that the memory requirements for even a batch size of 1 are so extreme that an Nvidia card with 16GB of memory is needed just to run that batch size of 1! So at this point I am wondering whether my training would run better on the CPU, on a machine with a huge amount of RAM (say 512GB), since that seems to allow a much bigger batch size, like 64 or 128.
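For concreteness, this is roughly what my training setup looks like (a minimal sketch: `train_dataset` stands in for my own tokenized 15k examples, the output directory name is arbitrary, and the memory-saving flags are just things I’ve been experimenting with):

```python
from transformers import (
    DataCollatorForSeq2Seq,
    PegasusForConditionalGeneration,
    PegasusTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/pegasus-large"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Trade compute for memory so the 16GB card can cope at all.
model.gradient_checkpointing_enable()

training_args = Seq2SeqTrainingArguments(
    output_dir="pegasus-finetuned",
    per_device_train_batch_size=1,   # anything bigger runs out of GPU memory for me
    gradient_accumulation_steps=16,  # simulate a larger effective batch size
    fp16=True,                       # mixed precision to shrink activation memory
    num_train_epochs=3,
    logging_steps=100,
    save_total_limit=2,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: my tokenized 15k-example dataset
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model()  # writes pytorch_model.bin etc. into output_dir
```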
My guess is that the memory requirements are so extreme because I am using pegasus-large. I’m using it based on my understanding of this page:
Pegasus

> All the checkpoints are fine-tuned for summarization, besides pegasus-large, whence the other checkpoints are fine-tuned […]
My understanding from this is that, if we as newbie users have some data we want to use with Pegasus, we should do this:
- Start with pegasus-large: google/pegasus-large · Hugging Face
- Fine-tune it on our own data
- Use the `pytorch_model.bin` output from this fine-tuning process to run inference on our own data (rough sketch of this step just after the list)
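To make that last bullet concrete, here is the kind of inference code I’m expecting to run afterwards (again a sketch: the directory is just the `output_dir` from the training run above, and the generation settings are placeholders):

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Directory written by trainer.save_model() above; it contains the fine-tuned
# pytorch_model.bin plus config.json. The tokenizer itself isn't changed by
# fine-tuning, so I load it from the original checkpoint.
tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")
model = PegasusForConditionalGeneration.from_pretrained("pegasus-finetuned")

text = "An example document from my own data..."
inputs = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```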
Am I getting something wrong here? Given that I have 15,000 examples, have I made the correct determination that I should fine-tune pegasus-large, and that this will lead to the best results, even though the memory requirements are huge?
I looked for distilled models here: Models - Hugging Face
… But my understanding (possibly wrong?) is that these distilled models are ALREADY fine-tuned, so they would not be appropriate to use, given that I have a lot of my OWN data to fine-tune with.
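If the distilled checkpoints are fair game despite already being fine-tuned, my guess is that swapping one in would just be a change of model name, something like this (the checkpoint name below is my guess at one of the distilled Pegasus models on the Hub, so I may have it wrong; everything else as above):

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Hypothetical choice of distilled checkpoint -- the exact name may be wrong.
distilled_name = "sshleifer/distill-pegasus-xsum-16-4"
tokenizer = PegasusTokenizer.from_pretrained(distilled_name)
model = PegasusForConditionalGeneration.from_pretrained(distilled_name)
# ...and then fine-tune on my 15,000 examples exactly as with pegasus-large?
```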
Thanks!