Some takeaways from GTC 2024:

Vendors

Vendors with noticeably busy booths:

  • Weights and Biases (busy despite having multiple booths): an experimentation platform focused on managing experiments and hyperparameter optimization; their product and go-to-market teams seem top-notch
  • CoreWeave: a cloud vendor specializing in GPU compute
  • Twelve Labs: multimodal AI (video)
  • Unstructured: tooling to prep terrible enterprise data into JSON

Some legacy vendors are holding on and trying to pivot.

The glamor of the expo area is real (GTC this year is what everyone else’s conference aspires to be).

There were a lot of dog-sized robots in varying form factors for varying purposes – pizza delivery, lawn mowing, dancing, etc. I didn’t see applications of those robots that made much sense to me.

Best practice software

  • Zephyr: an excellent current smallish open-source LLM assistant. It will probably be surpassed in a month, but it’s worth keeping an eye on the creators: HuggingFaceH4, the group that put together Zephyr (among others), is a good place to check for papers worth discussing
  • Deita: an open source project for instruction tuning; it includes a scoring estimator that we might want to investigate for quality evaluation at scale
  • Eleuther LM Eval Harness: quantifies the performance of LLMs; it backs the HuggingFace Open LLM Leaderboard, and it lets you define your own custom prompts and evaluation metrics
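
The custom-prompt-and-metric idea behind a harness like this can be illustrated with a toy sketch. None of these names come from the Eleuther LM Eval Harness API (its real interface is different); this is just the conceptual loop: render a prompt template per example, query a model, score with a metric, and average.

```python
# Toy illustration of "custom prompts + custom metrics" in an eval harness.
# All names here are invented for illustration; see the real harness docs
# for its actual task/metric configuration.

def render_prompt(template: str, example: dict) -> str:
    """Fill a prompt template with fields from one dataset example."""
    return template.format(**example)

def exact_match(prediction: str, target: str) -> float:
    """A simple custom metric: case- and whitespace-insensitive match."""
    return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0

def evaluate(model, examples, template, metric):
    """Run the model over every example and average the metric."""
    scores = [metric(model(render_prompt(template, ex)), ex["answer"])
              for ex in examples]
    return sum(scores) / len(scores)

# A stand-in "model" that always answers "paris".
toy_model = lambda prompt: "paris"
examples = [
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Capital of Spain?", "answer": "Madrid"},
]
print(evaluate(toy_model, examples, "Q: {question}\nA:", exact_match))  # 0.5
```

The real harness adds batching, log-likelihood scoring, and standard task definitions on top of this same basic loop.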

Our talk

The “GPU Power Play” talk with PJ went well. We filled about half the seats in a Salon (maybe on the order of 50 people); people filed into the room as we talked, and once they arrived they didn’t leave. They took lots of pictures of the slides, and about half a dozen people queued up to talk to us after we finished.

We heard from someone interested in putting security defenses onto the compute hardware and into the storage layer. This seems extremely enticing, though I’m not sure how you get the speed while also getting the security. Perhaps the play is to satisfice on quality by relying on capability improvements over time, and expect people are willing enough (or regulated enough) to pay twice over for security? Having not heard the pitch, I’m not sure how you avoid paying a direct fee for the hardware/storage and then paying again in reduced specs.

Giving talks

I went to one talk that had an extremely effective title & talk (which I won’t name because although I was impressed, my analysis could be construed as critical). I came away satisfied even though the content turned out halfway through to be marketing fluff. In essence, the speaker framed a ridiculous problem with “wouldn’t it be amazing if it worked” – and then they showed that it doesn’t work, and they got a laugh – and then they showed that it kinda does work, and they got knowing nods – and then they shifted to sales. It was brilliantly executed.

It started out looking like a practically-minded semi-academic presentation, but it gracefully turned into a product demo. The room was standing room only, and didn’t thin out, so I wasn’t the only one who was drawn in, recognized the bait-and-switch, and still appreciated it. The speaker seemed to have minimal experience with deep learning (e.g., they expressed surprise that keeping the first few layers of a DL model and dropping the rest works better than keeping only the top few layers and dropping the layers that actually connect to the embeddings). Even without deep experience, though, they turned a technically silly idea into great insights for product and for marketing. They’re clearly very good in their role.

What worked well at attracting me in this and other talks:

  • Select a title that suggests a technical capability that a lot of people would love to achieve (without mentioning the solution you’re marketing).
  • Gradually (very gradually) ease into your solution. Prepare a technical quasi-academic talk, with data selection and approach and challenges well motivated. Name-drop your solution about 25-33% through, but then go right back to the “how to” technical talk. In the final third, transition to a product demo that builds on the tangible, potentially-nonsensical-but-wonderful-if-true example you’ve been fleshing out.
  • Refer to many projects and people, and do so authoritatively.
  • Orient the audience to an entire topic that might otherwise be unfamiliar, so that they do benefit technically from the talk. Perhaps the topic is your overall framing, or perhaps it’s a cross-cutting challenge that your problem lets you talk about (performing simulations can fit here), or perhaps it’s particularly useful datasets and tooling – or perhaps it’s multiple topics.
  • Pre-record the screencast of the demo – but talk over the screencast at the event. This gives you flexibility in the talk track, and it keeps the energy in the room under your live control.

LLMs for text-to-nontext

The only “cool” application I happened to see, for which I could immediately see the value unfold like the glory of Jensen Huang before me, was in autonomous vehicle simulation. NVIDIA is using an LLM to generate non-language synthetic data such as autonomous vehicle scenes (they have a programming language that represents the layers in a scene, and distinct AI models for each layer: one can produce maps of intersections, another can add objects like cars to those maps, and so on). With this text-to-scene platform, their users can chat with an LLM to generate synthetic scenes that focus on edge cases that rarely occur in real data, and they can also produce synthetic data from historic accident reports that focus on the tricky situations. Currently about 20% of their data is synthetic. NVIDIA gave a lot of talks about this idea at GTC, so they’re selling it hard – but it’s also pretty interesting. (Much more interesting than a robot with a leaf blower strapped on it, I’d say.)
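
The layered approach described above can be sketched as a pipeline of per-layer generators, each adding one layer to a shared scene. Everything here is invented for illustration – NVIDIA’s scene language and models are not public in this form, and the real generators are AI models, not string formatters:

```python
# Hypothetical sketch of layered text-to-scene generation: a scene is built
# up layer by layer, with a distinct generator per layer (map -> objects).
# All names are invented; the real system uses trained models per layer.
from dataclasses import dataclass, field

@dataclass
class Scene:
    layers: dict = field(default_factory=dict)

def map_generator(scene: Scene, prompt: str) -> Scene:
    """Stand-in for a model that produces the intersection/map layer."""
    scene.layers["map"] = f"intersection layout for: {prompt}"
    return scene

def object_generator(scene: Scene, prompt: str) -> Scene:
    """Stand-in for a model that places cars and other objects on the map."""
    scene.layers["objects"] = f"vehicles and pedestrians for: {prompt}"
    return scene

def generate_scene(prompt: str, pipeline) -> Scene:
    """Run each layer's generator in order, accumulating the scene."""
    scene = Scene()
    for stage in pipeline:
        scene = stage(scene, prompt)
    return scene

scene = generate_scene("rainy four-way stop, occluded cyclist",
                       [map_generator, object_generator])
print(sorted(scene.layers))  # ['map', 'objects']
```

The appeal of the design is that a chat LLM only has to emit the scene-language prompt for an edge case; the per-layer models then do the heavy lifting independently.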

Other thoughts

Our interest in foundation models for everything – language, images, autonomous vehicles, etc. – is in direct conflict with Rich Hickey’s idea of simple vs. complected systems. I suspect we as an industry may find ourselves tick-tocking back out to interpretability again in the near-to-medium future.

“Digital twin” is part of the marketing zeitgeist/cliche verbiage; we haven’t been using this phrase so often at Palo Alto Networks. I’m not recommending we start.

There were a lot of vendors focused on GPU compute and fully utilizing GPU compute and enabling AI researchers to train highly effective models – but there didn’t seem to be many people actually doing the work of training those low-loss models with massive hyperparameter experimentation. There was a lot of interest in anyone who indicated they might be successfully training interesting/useful models (but no one actually was). The companies with large research wings using a ton of compute that attended (e.g., Google) were represented by their B2B product arms and not their AI researchers.

It seemed like there were two kinds of talks and participants: (1) interesting technical work on very specific bespoke problems whose overall necessity is questionable (e.g., optimizing GPU memory so you can run huge contexts – a nice talk from the US Navy, but it’s not clear to me that anyone actually needs huge LLM contexts, and there were a number of these technically-interesting-but-questionable-value talks); and (2) basic application work in a variety of domains (e.g., models that try to detect something like floods in imagery, copilot work, or use of generative AI within a general stack). This division between “highly applied but very basic” and “technically interesting but limited usefulness” is a tension I just cannot shake.

I would be comfortable giving a talk again; I think that was a reasonable trade-off of time for value (especially given material re-use). I wouldn’t attend GTC for more than that same day unless we were connecting deliberately with people across the industry for some purpose and had done a lot of pre-work toward that end.

So overall, the conference was useful to me this year primarily for clocking AI imposter syndrome, picking up some new stealth marketing patterns for talks, and getting a feel for where the zeitgeist is pointing in terms of vendors and open source projects.