4 Comments

Hey Shaan! This is Ayush from India. I am doing an MTech in Artificial Intelligence at IISc Bangalore. My MTech project will focus on increasing the context length of Large Language Models. I liked your post. I want to go through the code and would like to have a small discussion with you.

I have sent you a request on LinkedIn. My email id is singhayush9084@gmail.com / ayushsingh@iisc.ac.in; I will be waiting to hear from you.


There's something I'm missing. If the original model is designed to take only 2048 tokens, then regardless of the position encoding used it can still only attend to a maximum of 2048 tokens. That's the size of the transformer input. How do you expand this on an already-trained model?
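For illustration (this sketch is not from the original post): plain scaled dot-product attention has no hard-coded sequence length; the matrices are sized by whatever sequence you feed in. The practical 2048-token limit usually comes from the positional encoding and from the lengths seen during training, which is why scaling the position encoding can help.

```python
# Minimal sketch: the attention computation itself accepts any sequence length.
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, seq_len, d); seq_len is not fixed by the math
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

d = 64
for seq_len in (2048, 4096):        # same computation, different lengths
    x = torch.randn(1, seq_len, d)
    print(seq_len, attention(x, x, x).shape)
```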


Hi, thanks for replying. That link is just another position-embedding scaling approach. What I don't get is that the maximum context window size of a transformer is fixed. It's a hyperparameter of the transformer used; it's hard-coded into the architecture.

Regardless of what you do to the position embedding, the transformer can only attend to that fixed number of tokens. So, for example, if the prompt is larger than the context window, only part of the prompt can fit into the context window and the rest will be truncated.
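For illustration, here is a minimal sketch of the "position embedding scaling" idea under discussion (position interpolation), assuming a RoPE-style encoding. The function name rope_angles and the scale argument are just for this example. Positions beyond the original limit are compressed back into the trained range by scaling, so no weight matrix changes shape; a short fine-tune is then typically needed so the model adapts to the compressed positions.

```python
# Minimal sketch of position interpolation for a RoPE-style encoding (illustrative).
import torch

def rope_angles(positions, dim=64, base=10000.0, scale=1.0):
    # scale < 1.0 compresses positions: e.g. 0.5 maps 4096 positions
    # into the 0..2047 range the model was trained on.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() * scale, inv_freq)

orig = rope_angles(torch.arange(2048))                 # angles seen in training
interp = rope_angles(torch.arange(4096), scale=0.5)    # 4096 positions, same range
print(torch.allclose(orig, interp[::2]))               # True: every other row matches
```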
