• dine-and-dasha@alien.topB
    10 months ago

    The law only restricted raw FLOPs, so it has to be that. But the law has a chiplet subclause, so there might be some interaction there that pushes the AMD GPUs over the edge.

    • From-UoM@alien.topB
      10 months ago

      The 4070 Ti is 294 mm² (full AD104) with 160 TFLOPS of FP16.

      The 7900 XTX GCD is 300 mm² (full Navi 31, GCD only) with 122 TFLOPS of FP16.

      Doubt it's that.

      One possible reason is that RDNA doesn't have AI cores. The tasks are accelerated on the shader cores, hence the term "AI Accelerators". Now assume the Nvidia cards' Tensor cores are ignored.

      The 4090 can do only 82.6 TFLOPS of FP16 (non-Tensor).

      The 7900 XTX would still retain its 122 TFLOPS of FP16, making it faster in FP16 performance.
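
      A rough sketch of the density comparison behind "doubt it's that", using only the die areas and FP16 figures quoted in this comment. The helper name `tflops_per_mm2` and the dictionary layout are made up for illustration, and plain TFLOPS per mm² is an assumed stand-in for whatever density metric the rule actually defines.

```python
# Figures as quoted in the comment: (die area in mm^2, FP16 TFLOPS).
cards = {
    "4070 Ti (AD104)": (294, 160),
    "7900 XTX (Navi 31 GCD)": (300, 122),
}


def tflops_per_mm2(area_mm2, tflops):
    """Hypothetical helper: FP16 throughput per unit of die area."""
    return tflops / area_mm2


density = {name: tflops_per_mm2(area, tflops) for name, (area, tflops) in cards.items()}

for name, value in density.items():
    print(f"{name}: {value:.3f} TFLOPS/mm^2")
```

      On these numbers the 4070 Ti is the denser chip, so a density limit alone would not single out the 7900 XTX.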

      • TwanToni@alien.topB
        10 months ago

        Doesn’t RDNA3 have WMMA (Wave Matrix Multiply Accumulate), which is its AI cores?

        • From-UoM@alien.topB
          10 months ago

          It has the instruction sets in the compute units

          They are called AI accelerators for that reason.

          Not AI cores.

          The actual Matrix “Cores”, i.e. dedicated silicon, are on the Instinct series.

        • dotjzzz@alien.topB
          10 months ago

          No. Tensor cores have separate specialised matrix ALUs; AMD’s WMMA are instructions on existing shader ALUs.

          Tensor cores can process AI tasks in parallel to CUDA cores, RDNA3 can’t do both on the same CU.

      • Qesa@alien.topB
        10 months ago

        The actual rule has hard numbers, no need to speculate. And it’s no more than 300 TFLOPS of fp16 (or 150 fp32, 600 fp8, etc) so it ain’t TFLOPS that are the culprit. As for performance density, it’s equivalent to those figures at an 830mm^2 die, so again not that.
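
        The equivalent limits quoted here (300 at fp16, 150 at fp32, 600 at fp8) all work out to the same TFLOPS × bit width product, so the check can be sketched as below. The names `BUDGET`, `limit_tflops` and `exceeds` are made up for illustration, and treating exactly 300 TFLOPS as still allowed is an assumption based on the "no more than" wording, not on the text of the rule.

```python
# Limit as quoted in the comment: 300 TFLOPS at fp16.
LIMIT_FP16_TFLOPS = 300

# Hypothetical name: the constant TFLOPS * bit-width product implied by the
# quoted equivalents (300 * 16 == 150 * 32 == 600 * 8).
BUDGET = LIMIT_FP16_TFLOPS * 16


def limit_tflops(bits):
    """Equivalent TFLOPS limit at a given operand bit width."""
    return BUDGET / bits


def exceeds(tflops, bits):
    """True if the figure is over the limit (boundary handling is assumed)."""
    return tflops * bits > BUDGET


for bits in (8, 16, 32):
    print(f"fp{bits}: {limit_tflops(bits):.0f} TFLOPS")

# FP16 figures quoted earlier in the thread for the two cards.
print(exceeds(160, 16))  # 4070 Ti
print(exceeds(122, 16))  # 7900 XTX
```

        Both FP16 figures quoted upthread come out well under that product.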

        • dine-and-dasha@alien.topB
          10 months ago

          OK, I didn’t know the actual numbers, that’s helpful. Maybe they’re just holding off to apply for an export license? I heard the 4090 is in a “gray area”.

          • f3n2x@alien.topB
            10 months ago

            No gray area, at base clocks the 4090 exceeds the limit by 10% already.