1 Comment
govind sharma

I am not getting the higher accuracy, even though the implementation is the same.

batch_first=True is set in the multi-head attention layer.