o k«Ïi(ã@s¨UdZddlZddlZgZeeed<dejdefdd„Z dejd e dejfd d„Zde d edBdefdd„Z ddejdejdejdejdBddf dd„ZdS)zCDefines utilities for interacting with scaled_dot_product_attentionéNÚ__all__ÚtensorsÚreturncGstdd„|DƒƒS)z0Returns True if any of the tensors requires gradcss|]}|jVqdS)N)Ú requires_grad)Ú.0Út©rúc/var/www/addictedbytheproject.nl/epg/venv/lib/python3.10/site-packages/torch/nn/attention/_utils.pyÚ s€z'_input_requires_grad..)Úany)rrrr Ú_input_requires_gradsrÚinpt_tensorÚog_sizecCs"| d¡|kr|dd|…fS|S)z'Handles the unpad of the last dimensionéÿÿÿÿ.N)Úsize)r rrrr Ú_postprocess_flash_outputsrÚ head_dim_sizeÚscalecCs|dur|Sdt |¡S)z For FlashAttention we pad the head dimension to be a multiple of 8 so we need to scale the output by the original head size and not the padded. Ngð?)ÚmathÚsqrt)rrrrr Ú_calculate_scalesrçFÚqueryÚkeyÚvalueÚ attn_maskcCsÈ|s|j|jks|j|jkrtd|j›d|j›d|j›dƒ‚|j|jks+|j|jkrs.øÿþýü ÷