The Conference and Workshop on Neural Information Processing Systems is one of the headliner events of the artificial intelligence and machine learning world, with the NeurIPS award being one of the most prestigious awards in the field. In this year’s iteration, NVIDIA has walked away with two awards for their research papers on AI.
In addition to these awards, the tech giant also presented a series of AI advancements that it had been working on over the past year. In keeping with its expertise in 3D, NVIDIA released two papers introducing novel lighting techniques and 3D model generation. Its crowning achievements for the year, however, are twofold: a paper exploring how diffusion-based AI models work, and ‘MineDojo’, a framework for training AI agents to play Minecraft.
However, one competitor was missing from the NeurIPS awards this year: Stability AI. This company, well-known for its Stable Diffusion model, was conspicuously absent from this year’s awards ceremony despite its sizeable contributions to deep learning.
Let’s delve deeper into what NVIDIA brought to NeurIPS in 2022.
An AI that can play Minecraft?
Minecraft is the best-selling video game of all time, with over 238 million copies sold. While games have long been a prime training ground for AI models, most previous undertakings have required highly specialised models built for a single game. Prime examples include AlphaStar, DeepMind’s AI trained to play StarCraft II, and OpenAI Five, OpenAI’s model for playing Dota 2. However, these specialised models do not generalise beyond their own game, whereas a generalist agent would more closely mimic how humans play.
To take on this challenge, NVIDIA set out to create an AI that can play Minecraft. While those other games are built around fixed rules and clear win conditions, Minecraft’s main gameplay loop relies on open-world exploration and completing certain achievements to progress towards the endgame. Moreover, its gameplay is incredibly flexible, as players can take any path they wish to solve a problem.
NVIDIA’s solution to this open-ended game was MineDojo, a framework that enables AI agents to learn Minecraft. The work won an Outstanding Datasets and Benchmarks Paper Award from the NeurIPS committee owing to its novel approach of training on an Internet-scale knowledge base. The dataset consisted of about 730,000 Minecraft videos, over 6,000 web pages from the Minecraft Wiki, and millions of Reddit comments on the game, giving the agent a comprehensive picture of the game’s mechanics along with general tips and tricks.
Moreover, before training any agent, NVIDIA’s researchers created MineCLIP, a foundation model that learnt to understand Minecraft YouTube videos. The information from these videos was then used to train a reinforcement learning agent, which could play the game and complete many tasks without human intervention.
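How the video model feeds into agent training is not spelled out here. One plausible reading, assumed purely for illustration, is that the similarity between an embedding of recent gameplay and an embedding of the task description serves as a reward signal. The sketch below shows that idea with made-up three-dimensional vectors; the function names and embeddings are hypothetical and are not MineDojo’s API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def language_reward(video_embedding, task_embedding):
    """Hypothetical reward: higher when recent gameplay matches the task text."""
    return cosine_similarity(video_embedding, task_embedding)


# Made-up embeddings; a real video-text model would produce far larger vectors.
task = [1.0, 0.0, 0.0]           # stands in for a task description
on_task_clip = [0.9, 0.1, 0.0]   # gameplay that resembles the task
off_task_clip = [0.0, 0.2, 0.9]  # unrelated gameplay

on_task_reward = language_reward(on_task_clip, task)
off_task_reward = language_reward(off_task_clip, task)
print(on_task_reward > off_task_reward)
```

A reinforcement learning loop could then add such a score to the agent’s reward at each step, so that behaviour resembling the described task is encouraged without a hand-written reward for every task.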
Optimising diffusion-based generative models
Diffusion models are the talk of the town when it comes to generative AI. Taking over from GANs and flow-based models, they have risen to the top in terms of adoption and industry application thanks to their ability to create unique content. Diffusion models can already achieve superior image sample quality compared to GANs, and research on this new kind of model has only just begun.
NVIDIA is throwing its hat in the ring with a paper on how diffusion models can be optimised to generate better results than previously thought possible. The paper breaks a diffusion model down into its components and identifies the processes that can be adjusted to improve performance.
This paper won an Outstanding Main Track Paper award at the conference because it enables optimisations of existing models. The methods described in the paper also proved highly effective, enabling models to achieve record scores on various performance metrics.
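To give a feel for what breaking a diffusion model into components can look like, here is a toy one-dimensional sketch that keeps three pieces separate: the noise schedule, the denoiser and the solver that steps from noise towards data. It is not the paper’s code. The schedule formula, its parameter values and the closed-form denoiser (which stands in for a trained network) are all assumptions chosen for illustration.

```python
def noise_schedule(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, followed by a final 0."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    sigmas = [(hi + i / (num_steps - 1) * (lo - hi)) ** rho for i in range(num_steps)]
    return sigmas + [0.0]


def ideal_denoiser(x, sigma, mean=1.0, std=0.5):
    """Exact denoiser for 1-D Gaussian data N(mean, std^2); replaces a trained network."""
    return mean + (std ** 2 / (std ** 2 + sigma ** 2)) * (x - mean)


def euler_sample(x, sigmas, denoiser):
    """Deterministic Euler steps along dx/dsigma = (x - denoiser(x, sigma)) / sigma."""
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        slope = (x - denoiser(x, sigma)) / sigma
        x = x + (sigma_next - sigma) * slope
    return x


sigmas = noise_schedule(num_steps=100)
# Start one standard deviation above the mean of the fully noised distribution.
start = 1.0 + (0.5 ** 2 + sigmas[0] ** 2) ** 0.5
sample = euler_sample(start, sigmas, ideal_denoiser)
print(round(sample, 2))
```

Because the starting point sits one standard deviation above the mean at the highest noise level, the result should land close to one standard deviation above the data mean (about 1.5), minus a small discretisation error. Each piece can be swapped independently, for example a different schedule or a higher-order solver, which is the kind of adjustment the paper is described as exploring.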
Generating 3D models from 2D images
Companies are gearing up for the Metaverse, and NVIDIA is no exception. Earlier this year, they released NVIDIA Omniverse, a platform for creating metaverse applications. Now, as a part of its accompanying tech stack, they have created a model that can generate high-fidelity 3D models from 2D images.
Dubbed NVIDIA GET3D (Generate Explicit Textured 3D Meshes), the model is trained only on 2D images but has the capability to generate 3D shapes with complex details and a high polygon count. Moreover, it also generates these shapes in the same format used by prominent 3D applications, allowing creative professionals to generate 3D models and immediately begin working on them.
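For readers unfamiliar with explicit textured meshes, the sketch below shows what such a shape boils down to: vertex positions, texture coordinates and triangles that index into them. It writes a hand-made square to Wavefront OBJ text. OBJ is an assumption here, picked because it is a widely supported interchange format; the article does not say which format GET3D emits, and the data is not model output.

```python
def mesh_to_obj(vertices, uvs, faces):
    """Serialise a textured triangle mesh to Wavefront OBJ text.

    vertices: list of (x, y, z) positions
    uvs:      list of (u, v) texture coordinates, one per vertex
    faces:    list of 0-based vertex index triples
    """
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += [f"vt {u} {v}" for u, v in uvs]
    # OBJ indices are 1-based; each corner is written as vertex/texture-coordinate.
    lines += ["f " + " ".join(f"{i + 1}/{i + 1}" for i in face) for face in faces]
    return "\n".join(lines) + "\n"


# A single textured square made of two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
uvs = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
faces = [(0, 1, 2), (0, 2, 3)]

obj_text = mesh_to_obj(vertices, uvs, faces)
print(obj_text)
```

A file in this form can be opened directly in common 3D tools, which is what makes an explicit mesh immediately editable in a way that an implicit representation is not.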
While manually modelling a realistic 3D world is time- and resource-intensive, AI like GET3D can vastly speed up the 3D modelling process and allow artists to focus on what really matters. The model can reportedly create 20 shapes a second using just a single NVIDIA GPU, a huge time saving compared to competitors and other offerings on the market.
Correcting other generative text models
Meta’s Galactica recently made the news for “hallucinating responses”, highlighting a key problem with language models—the accuracy of the facts presented in the generated text. To this end, NVIDIA presented another research paper at NeurIPS that aims to increase the accuracy of LLMs.
Their approach is to design a test set of factuality prompts, which is used to determine the veracity of the information output by the model. This functions as an automatic benchmark for the accuracy of generated text. The paper also proposes a novel training method known as factuality-enhanced training, which aims to reduce the overall rate of factual errors in the generated text.
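The idea of an automatic factuality benchmark can be sketched in a few lines: feed each prompt to the model, check the continuation against a trusted reference, and report the share of unsupported outputs. Everything below is a hypothetical stand-in, including the lookup-table "model", the tiny reference set and the exact-match check. The paper’s actual prompts and metrics are not described here and are not reproduced.

```python
def factual_error_rate(generate, prompts, is_supported):
    """Fraction of generated continuations not supported by the reference check."""
    errors = 0
    for prompt in prompts:
        continuation = generate(prompt)
        if not is_supported(prompt, continuation):
            errors += 1
    return errors / len(prompts)


# Toy reference facts and toy model outputs (both invented for this example).
REFERENCE = {
    "The capital of France is": "Paris",
    "Water freezes at 0 degrees": "Celsius",
    "The largest planet in the Solar System is": "Jupiter",
}
TOY_OUTPUTS = {
    "The capital of France is": "Paris",
    "Water freezes at 0 degrees": "Fahrenheit",  # deliberate factual error
    "The largest planet in the Solar System is": "Jupiter",
}


def toy_generate(prompt):
    """Stands in for a language model call."""
    return TOY_OUTPUTS[prompt]


def toy_is_supported(prompt, continuation):
    """Exact-match check against the reference; real checks would be far looser."""
    return REFERENCE[prompt] == continuation


rate = factual_error_rate(toy_generate, list(REFERENCE), toy_is_supported)
print(f"factual error rate: {rate:.2f}")
```

Running the same prompt set before and after a change to training gives a like-for-like comparison, which is what makes a fixed benchmark useful for judging whether a method really reduces factual errors.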
NVIDIA continues to reinforce its leadership position in the AI and ML world. From creating an AI supercomputer with Microsoft, to developing new hardware optimised for training and inference tasks, to its vast library of software made for AI developers—the company has solidified its position as a cornerstone of AI development. They were ahead of the curve when it came to GANs, and while they fell behind when it came to diffusion models, it seems like they are quickly catching up.
Moreover, their advances towards creating a general AI are also an important part of their strategy. In that sense, MineDojo and the two awards from NeurIPS seem like just the beginning of NVIDIA’s positive future in AI.