I work in the field and I don’t recognize what the others here are describing. In practice the difference in raw performance doesn’t matter much: we literally just launch a bunch of jobs on the cluster, and if they finish faster it makes no difference, because I either haven’t checked on my jobs yet, I’m not ready to collect the results, or I’m just dicking around with different experiments or parameters, so the actual speed of completion isn’t the bottleneck. A bunch of weaker GPUs can do the same work as a stronger one; only memory really matters. Doubly true if the company is big enough that power consumption is a drop in the bucket in operational costs.
What actually matters is the overall workflow. Things like cluster downtime, or the ease of designing and scheduling jobs and experiments on the cluster, have far more impact on my work than the performance of any individual GPU.
Also, in the end all of this is moot: this type of training is probably the wrong approach to AI. Note that a child does not need a million images of a ball to recognize a ball, and after learning one example it would instantly recognize soccer balls, basketballs, etc. as all being balls. The way we train our AIs cannot do this; our current approach to AI is just brute force.
So I don’t know anything about SSD controllers, but will a new controller eventually come out that runs cooler and more efficiently, or are we basically stuck with this until Gen6? What I mean is: are controllers constantly updated at the rate new CPUs/GPUs come out, or do only one or two controllers come out per generation, such that there is no point in waiting or hoping for better efficiency until next gen?