The dog example provided can be misleading for a decoder architecture. Because causal attention masks out future positions, the two "dog" tokens attend over different contexts, so you will not get the same output values even without positional embeddings.
import torch
import torch.nn as nn

# Note: attn_mask and is_causal are arguments to forward(), not to the constructor
mha = nn.MultiheadAttention(embed_dim=hdim, num_heads=4, batch_first=True)
output, _ = mha(W_q(embeddings), W_k(embeddings), W_v(embeddings),
                attn_mask=attn_mask, is_causal=True)
dog1_out = output[0, 2]  # first "dog"
dog2_out = output[0, 5]  # second "dog"
print(f"Dog outputs identical?: {torch.allclose(dog1_out, dog2_out, atol=1e-6)}")  # False
That said, properties 2 and 3 still hold and demonstrate the need for positional embeddings.
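To make the point reproducible outside the article's setup, here is a self-contained sketch. The dimensions (hdim=16, seq_len=6), the random embeddings, and the separate W_q/W_k/W_v linear layers are assumptions for illustration; positions 2 and 5 are given the same embedding vector to stand in for the two occurrences of "dog".

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hdim, seq_len = 16, 6  # assumed dimensions for illustration

# One batch of token embeddings, with NO positional encoding added.
# Positions 2 and 5 share the same vector, like the repeated word "dog".
embeddings = torch.randn(1, seq_len, hdim)
embeddings[0, 5] = embeddings[0, 2]

W_q, W_k, W_v = (nn.Linear(hdim, hdim) for _ in range(3))
mha = nn.MultiheadAttention(embed_dim=hdim, num_heads=4, batch_first=True)

# Causal mask (-inf above the diagonal), passed to forward(), not the constructor.
attn_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
out_masked, _ = mha(W_q(embeddings), W_k(embeddings), W_v(embeddings),
                    attn_mask=attn_mask, is_causal=True)

# Same call without the mask: full self-attention is permutation-equivariant,
# so identical input tokens produce identical outputs.
out_full, _ = mha(W_q(embeddings), W_k(embeddings), W_v(embeddings))

print(torch.allclose(out_masked[0, 2], out_masked[0, 5], atol=1e-6))  # False
print(torch.allclose(out_full[0, 2], out_full[0, 5], atol=1e-6))      # True
```

The masked run distinguishes the two identical tokens purely through the mask (each attends to a different prefix), while the unmasked run cannot tell them apart, which is exactly why the causal case weakens the dog example.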