In an earlier issue, I promised we would dive further into matrix multiplication. That promise has finally been fulfilled.
Why matrix multiplication is important
If you’ve ever accidentally started training a machine learning model on a CPU and then switched over to a GPU, you know what a difference the change in hardware makes. Model training on a GPU could be done overnight, while a CPU would be crunching the same data for a week. That’s because machine learning is largely matrix multiplication, and GPUs are great at matrix multiplication. In fact, this paper found that matrix multiplications accounted for the majority of the time required to pre-train BERT.
In recent years, NLP models have grown larger and larger. As a result, a significant share of the research papers describing new models is devoted to finding more efficient ways to train larger models, handle larger datasets, and compute matrix multiplications.
That’s why the recent findings from DeepMind, Alphabet’s AI research lab, which identify more efficient ways to do matrix multiplication, are so impactful for machine learning. DeepMind may sound familiar because of the success of its AI system AlphaZero, which showed superhuman performance in games such as chess and Go. DeepMind’s new system AlphaTensor builds on that earlier work and similarly treats matrix multiplication like a game.
If we can train models more efficiently, we can train larger models on more data and achieve higher performance. Eventually, we’ll discuss the current best-practice optimizations for reducing memory usage and computational cost when training beefier neural nets.
For now, consider how you would multiply two 4x4 matrices together.
You would probably take the rows of the first matrix and multiply them by the columns of the second (or the columns by the rows), which requires 64 individual multiplications. For centuries, this was exactly how everyone multiplied matrices. However, in 1969, the German mathematician Volker Strassen found a way to do it in 49 multiplications. Last month, DeepMind found a way to do it in 47.
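To make the counting concrete, here’s a quick illustrative sketch in Python of the schoolbook method with a counter on the scalar multiplications (the function name and the use of NumPy are just my choices for the example):

```python
import numpy as np

def schoolbook_matmul(A, B):
    """Row-times-column multiplication, counting every scalar multiplication."""
    n = A.shape[0]
    C = np.zeros((n, n))
    mults = 0
    for i in range(n):          # each row of A
        for j in range(n):      # each column of B
            for k in range(n):  # dot product of row i with column j
                C[i, j] += A[i, k] * B[k, j]
                mults += 1
    return C, mults

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
C, mults = schoolbook_matmul(A, B)
print(mults)                  # 64 scalar multiplications for two 4x4 matrices
print(np.allclose(C, A @ B))  # True
```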
Winning strategy
So how in the world do you actually reduce the number of multiplications required to multiply two matrices together? By trading multiplication for addition.
The addition and subtraction of floats are very cheap operations compared to multiplication. Consider:

A × A − B × B

This expression requires two multiplications: one for A and one for B. We can rewrite it like this:

(A + B) × (A − B)

Now we only have one multiplication! The two expressions are exactly equal, but just by rewriting, we’ve traded a multiplication for an addition and a subtraction.
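In code, the same trade looks like this (a toy sketch; the function names are just for illustration):

```python
def two_mults(a, b):
    # Direct form: two multiplications and one subtraction
    return a * a - b * b

def one_mult(a, b):
    # Rewritten form: one multiplication, one addition, one subtraction
    return (a + b) * (a - b)

print(two_mults(3.0, 2.0))  # 5.0
print(one_mult(3.0, 2.0))   # 5.0
```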
This simple example of reordering operations is exactly the type of strategy that DeepMind’s AlphaTensor searches for. It treats the number of steps required to correctly compute the matrix product as the “score” in a game and uses reinforcement learning to lower that score. After enough iterations of the game, AlphaTensor gradually improves and discovers ways of doing matrix multiplication that were previously unknown to mathematicians.
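Strassen’s 1969 result is the classic example of this kind of rewriting applied to whole matrices. Here’s a sketch of his 2x2 recipe, which uses seven multiplications instead of the usual eight; applying it recursively to a 4x4 matrix split into 2x2 blocks is what gives the 49 multiplications mentioned above:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]

    # Seven products built from sums and differences of the entries
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)

    # Recombine the products using only additions and subtractions
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A = np.random.rand(2, 2)
B = np.random.rand(2, 2)
print(np.allclose(strassen_2x2(A, B), A @ B))  # True
```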
If you’re interested in other NLP-related newsletters, you should check out Sebastian Ruder, a research scientist at Google, and his newsletter.