• kingwhocares@alien.topB

      XMX extensions are designed for matrix multiplication operations in FP64, FP32, FP16, and bfloat16 formats, which are crucial in many AI algorithms, including neural networks

      From the article.

      • AnimeAlt44@alien.topB

        bfloat16

        I have no idea why this is important or really what it even is, but Apple had a pretty video about it in their WWDC catalog, so I guess it’s a trend now.

        • GomaEspumaRegional@alien.topB

          Standard float16 uses a 1-bit sign + 5-bit exponent + 10-bit fraction.

          bfloat16 uses a 1-bit sign + 8-bit exponent + 7-bit fraction.
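
          To make the layouts concrete, here’s a minimal Python sketch (my own illustration, not from the article) showing that a bfloat16 is essentially just the top 16 bits of a float32; real hardware usually rounds instead of truncating like this:

          ```python
          import struct

          def float32_bits(x: float) -> int:
              """Raw 32-bit pattern of x stored as float32: 1 sign | 8 exponent | 23 fraction."""
              return struct.unpack("<I", struct.pack("<f", x))[0]

          def to_bfloat16(x: float) -> int:
              """Keep the top 16 bits: 1 sign | 8 exponent | 7 fraction (truncation, no rounding)."""
              return float32_bits(x) >> 16

          def from_bfloat16(b: int) -> float:
              """Zero-fill the low 16 bits to get back a float32."""
              return struct.unpack("<f", struct.pack("<I", b << 16))[0]

          x = 3.14159
          print(f"{float32_bits(x):032b}")      # float32 bit pattern
          print(f"{to_bfloat16(x):016b}")       # bfloat16: same sign/exponent bits, truncated fraction
          print(from_bfloat16(to_bfloat16(x)))  # ~3.140625: same magnitude, less precision
          ```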

          bfloat16 basically gives you the same exponent range as a standard float32, and most neural networks don’t need much fraction precision anyway. So with bfloat16 you can execute two operations in the same register width and memory bandwidth that a single float32 operation would take.
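
          A quick way to see the range difference (again just an illustration; numpy has no native bfloat16 type, so I’m comparing against float32, whose 8-bit exponent bfloat16 shares):

          ```python
          import numpy as np

          print(np.finfo(np.float16).max)  # 65504.0 -- the 5-bit exponent caps the dynamic range
          print(np.finfo(np.float32).max)  # ~3.4e38 -- the 8-bit exponent range bfloat16 also gets
          print(np.float16(1e5))           # inf: overflows float16
          # The same 1e5 fits fine in bfloat16, just rounded to roughly 2-3 significant decimal digits.
          ```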

          Having the ALU support this format lets the scheduler pack four bfloat16 values into a standard 64-bit ALU and execute them in parallel. So you basically double or quadruple the per-cycle throughput you would get from using traditional float16/float32 representations.
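
          Just to illustrate the packing idea (this is only a software sketch of the concept, not how XMX or any real scheduler actually does it):

          ```python
          import struct

          def pack4_bf16(values):
              """Pack four bfloat16 patterns (float32 truncated to its top 16 bits) into one 64-bit word."""
              word = 0
              for i, v in enumerate(values):
                  bf16 = struct.unpack("<I", struct.pack("<f", v))[0] >> 16
                  word |= bf16 << (16 * i)
              return word

          def unpack4_bf16(word):
              """Unpack the four 16-bit lanes back into float32 values."""
              return [struct.unpack("<f", struct.pack("<I", ((word >> (16 * i)) & 0xFFFF) << 16))[0]
                      for i in range(4)]

          w = pack4_bf16([1.0, -2.5, 3.14159, 65536.0])
          print(hex(w))
          print(unpack4_bf16(w))  # [1.0, -2.5, 3.140625, 65536.0] -- note 65536 would already overflow float16
          ```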