Nvidia and Microsoft revealed on Monday that they are working together on the "Megatron-Turing Natural Language Generation" model (MT-NLG). The two companies claim to have created the largest and most powerful "monolithic transformer language model trained to date".
To get an idea of how big this is, the well-known GPT-3, which has drawn wide attention in recent years, has 175 billion parameters. By comparison, the new MT-NLG model spans 105 layers and has 530 billion parameters.
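The scale comparison above can be made concrete with a little arithmetic. The following is a minimal sketch, not anything published by Nvidia or Microsoft: the parameter counts come from the article, while the 2 bytes per parameter (16-bit weights) is an assumption made purely for illustration, and the function names are hypothetical.

```python
# Parameter counts as quoted in the article.
GPT3_PARAMS = 175_000_000_000
MT_NLG_PARAMS = 530_000_000_000

# Assumption (not from the article): weights stored as 16-bit values.
BYTES_PER_PARAM = 2


def size_ratio(larger: int, smaller: int) -> float:
    """How many times more parameters the larger model has."""
    return larger / smaller


def weight_memory_gb(params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


ratio = size_ratio(MT_NLG_PARAMS, GPT3_PARAMS)
mt_nlg_weights_gb = weight_memory_gb(MT_NLG_PARAMS)

print(f"MT-NLG is about {ratio:.2f}x the size of GPT-3")
print(f"Weights alone at 2 bytes each: {mt_nlg_weights_gb:.0f} GB")
```

Under that assumption the weights alone would take roughly a terabyte, far more than a single 80 GB GPU can hold, which is one way to see why a model of this size has to be spread across many accelerators.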
MT-NLG is the successor to the Turing NLG 17B and Megatron-LM models and demonstrated "unmatched accuracy" in a variety of natural language tasks such as reading comprehension, commonsense reasoning, completion prediction, word sense disambiguation, and of course natural language inference.
Nvidia and Microsoft trained this huge AI model on a supercomputer called Selene. The system consists of 560 Nvidia DGX A100 servers, each of which has eight A100 GPUs with 80 gigabytes of memory, connected via NVLink and NVSwitch. Microsoft notes that this configuration is similar to the reference architecture used in its Azure NDv4 cloud supercomputers.
Interestingly, Selene is also powered by AMD EPYC 7742 processors. Building Selene cost about $85 million.
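The server figures quoted for Selene imply totals that the article does not spell out. A minimal sketch of that arithmetic, using only the numbers given above (the variable names are illustrative, and the memory total simply sums per-GPU capacity without saying anything about how it is actually used):

```python
# Figures quoted in the article for the Selene supercomputer.
SERVERS = 560          # Nvidia DGX A100 servers
GPUS_PER_SERVER = 8    # A100 GPUs per server
GB_PER_GPU = 80        # GPU memory per A100, in gigabytes

# Total number of GPUs in the system.
total_gpus = SERVERS * GPUS_PER_SERVER

# Aggregate GPU memory across the whole system, in gigabytes.
total_gpu_memory_gb = total_gpus * GB_PER_GPU

print(f"Total GPUs: {total_gpus}")                      # 4480
print(f"Aggregate GPU memory: {total_gpu_memory_gb} GB")  # 358400 GB
```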
Microsoft says MT-NLG was trained on 15 datasets containing more than 339 billion tokens. The datasets were drawn from English-language sources such as academic journals, online communities such as Wikipedia and Stack Exchange, code repositories such as GitHub, news sites, and more. The largest dataset is called The Pile and contains 835 gigabytes of text.
Overall, the project showed that larger AI models need less training to perform well. However, one problem remains unresolved: bias. It turns out that even when trained on as much diverse real-world data as possible, giant language models pick up and amplify bias, stereotypes, and all kinds of toxicity during training.
It has been known for years that AI models tend to reinforce the biases in the data fed into them. This is because the datasets are collected from a variety of online sources where sexist, racist, and religious prejudices are commonplace. The biggest challenge in resolving this is quantifying the bias, which is hard work and still ongoing, no matter how many resources are devoted to it.
An earlier Microsoft experiment was a Twitter chatbot named Tay. It took only a few hours for Tay to pick up the worst traits humans could teach it, and the company was forced to pull it less than 24 hours after its release.
Nvidia and Microsoft have stated that they are committed to addressing this issue and will make every effort to support research in this area. At the same time, they warn that organizations wishing to use MT-NLG must ensure that appropriate measures are in place to mitigate and minimize potential harm to users. Microsoft noted that any use of artificial intelligence should follow the principles of reliability, safety, privacy, transparency, and accountability described in its "Responsible AI" guide.