
>A single convolution step is a local operation (only pulling from nearby pixels), whereas attention is a "global" operation.

In the same way that the learned weights that generate the Q, K, V matrices may end up with zeros (or small values) for referencing certain tokens, convolution kernels just have zeros fixed by definition for everything outside their window.
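To make that concrete, here is a rough NumPy sketch (my own toy setup, not from any particular model): a 1D convolution written out as a full n x n mixing matrix, next to a softmax attention matrix over the same number of positions. The helper names `conv_as_matrix` and `softmax_attention`, the sizes, and the random weights are all made up for illustration.

```python
import numpy as np

def conv_as_matrix(kernel, n):
    """Write a zero-padded, 'same'-size 1D convolution as an n x n matrix.

    Entries outside the kernel window are zero by construction.
    (Assumes a symmetric kernel, so convolution and correlation coincide.)
    """
    half = len(kernel) // 2
    M = np.zeros((n, n))
    for i in range(n):
        for j, w in enumerate(kernel):
            col = i + j - half
            if 0 <= col < n:
                M[i, col] = w
    return M

def softmax_attention(Q, K):
    """Row-wise softmax of scaled dot-product scores (single head, no mask)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 6, 4                              # toy sizes, chosen arbitrarily
kernel = np.array([0.25, 0.5, 0.25])     # symmetric 3-tap kernel
x = rng.normal(size=n)

# Convolution as a matrix: banded, with structural zeros off the band.
M = conv_as_matrix(kernel, n)

# Attention as a matrix: weights come from learned projections (random here).
X = rng.normal(size=(n, d))
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
A = softmax_attention(X @ Wq, X @ Wk)

print(np.round(M, 2))   # band of width 3, zeros everywhere else
print(np.round(A, 2))   # dense: every position gets some weight
```

Both are just n x n matrices mixing positions. The convolution one has its zeros hard-wired; the attention one can get close to zero only if the learned projections push it there.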


