The dog example provided can be misleading for a decoder architecture. Because causal attention masks out future positions, the two "dog" tokens attend over different contexts, so you will not get the same output values even without positional embeddings.
import torch
import torch.nn as nn

# Note: attn_mask and is_causal are arguments to forward(), not to the constructor
mha = nn.MultiheadAttention(embed_dim=hdim, num_heads=4, batch_first=True)
output, _ = mha(W_q(embeddings), W_k(embeddings), W_v(embeddings),
                attn_mask=attn_mask, is_causal=True)
dog1_out = output[0, 2]  # first "dog"
dog2_out = output[0, 5]  # second "dog"
print(f"Dog outputs identical?: {torch.allclose(dog1_out, dog2_out, atol=1e-6)}")  # False
That said, properties 2 and 3 still hold and demonstrate the need for positional embeddings.
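To make the point reproducible outside the article's setup, here is a self-contained sketch. The dimensions (hdim=16, seq_len=6), the random embeddings, and the separate W_q/W_k/W_v linear layers are assumptions for illustration; positions 2 and 5 are given the same embedding vector to stand in for the two occurrences of "dog".

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hdim, seq_len = 16, 6  # assumed dimensions for illustration

# One batch of token embeddings, with NO positional encoding added.
# Positions 2 and 5 share the same vector, like the repeated word "dog".
embeddings = torch.randn(1, seq_len, hdim)
embeddings[0, 5] = embeddings[0, 2]

W_q, W_k, W_v = (nn.Linear(hdim, hdim) for _ in range(3))
mha = nn.MultiheadAttention(embed_dim=hdim, num_heads=4, batch_first=True)

# Causal mask (-inf above the diagonal), passed to forward(), not the constructor.
attn_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
out_masked, _ = mha(W_q(embeddings), W_k(embeddings), W_v(embeddings),
                    attn_mask=attn_mask, is_causal=True)

# Same call without the mask: full self-attention is permutation-equivariant,
# so identical input tokens produce identical outputs.
out_full, _ = mha(W_q(embeddings), W_k(embeddings), W_v(embeddings))

print(torch.allclose(out_masked[0, 2], out_masked[0, 5], atol=1e-6))  # False
print(torch.allclose(out_full[0, 2], out_full[0, 5], atol=1e-6))      # True
```

The masked run distinguishes the two identical tokens purely through the mask (each attends to a different prefix), while the unmasked run cannot tell them apart, which is exactly why the causal case weakens the dog example.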