Deep C Dive: NVIDIA Tesla, GPUs and fundamentals in a volatile market
On 5/3 I came across this Tweet linking to a 2018 technical report published by Citadel’s High Performance Computing R&D team, “Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking”. NVIDIA is not in any sense a fresh player in the market: it’s now more than 30 years old, employs nearly 30,000 people, and solidly passed the $2T market cap milestone a few months ago. However, NVIDIA and CEO Jensen Huang have only become proper household names in the last year, amid unprecedented AI trends in industry and popular culture; Citadel was ahead of its time, publishing a performance measurement study of NVIDIA GPU architecture at least five years before the company was widely recognized outside the high-performance graphics and deep learning communities.
It’s NVIDIA’s Tesla’s world… we’re just living in it
Before we get into the details of the Citadel paper, I want to provide important context. In 2008, Erik Lindholm et al. (NVIDIA) published an IEEE whitepaper proposing a “unified graphics and computing architecture”: the NVIDIA Tesla architecture, first deployed in 2006 with the GeForce 8800 GPU.
Peter Glaskowsky published “NVIDIA’s Fermi: The First Complete GPU Computing Architecture” a year later, and it’s insightful to understand what constituted the full-package state-of-the-art 15 years ago: “double-precision floating-point performance… linear addressing model with caching at all levels, and support for languages including C, C++, FORTRAN, Java, Matlab, and Python.” Importantly, “The 8800… introduced CUDA, the industry’s first C-based development environment for GPUs… With Tesla, programmers don’t have to worry about making tasks look like graphics operations; the GPU can be treated like a many-core processor.” (Glaskowsky, 2009)
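Glaskowsky’s point about treating the GPU as a many-core processor can be sketched in plain Python. This is only an illustration of the programming model, not real CUDA: the `launch` helper below is a hypothetical stand-in for a CUDA kernel launch, and on actual hardware each index `i` would be a separate thread rather than a loop iteration.

```python
# Toy sketch of the data-parallel model CUDA exposes. On a GPU,
# vector_add_kernel would run once per hardware thread; here a plain
# loop over thread indices emulates the thread grid.

def vector_add_kernel(i, a, b, out):
    """Body of a would-be GPU kernel: 'thread' i handles element i."""
    out[i] = a[i] + b[i]

def launch(kernel, n_threads, *args):
    """Hypothetical stand-in for a kernel launch: run the kernel body
    once for every thread index."""
    for i in range(n_threads):
        kernel(i, *args)

a = [float(i) for i in range(8)]
b = [2.0 * i for i in range(8)]
out = [0.0] * 8
launch(vector_add_kernel, 8, a, b, out)
print(out)  # each "thread" computed one element: 3*i
```

The appeal of the model is that the kernel body contains no graphics vocabulary at all; the programmer writes ordinary per-element code and the hardware supplies the parallelism.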
More on GPUs
I found a nice reference by experts from Finland, Denmark, Sweden, Norway, and Lithuania (road trip, anyone?):
“GPUs were initially developed for highly-parallel task of graphic processing. But over the years, they were used more and more in HPC.”
“What problems fit to GPU?” Generally, I find the actual exercise of matching problems to candidate solution techniques severely understudied in both research and industry. The most common example we see now in industry is the perception that “LLMs can solve everything” (“throw LLMs at the problem”, “ask ChatGPT”); LLMs can indeed solve certain problems very well, but they currently require a very high level of machine understanding and sheer patience to consistently develop into a quality solution or product.
This is not to say that we shouldn’t keep democratizing the tools and hyping up (official, technical term) new players in the space. Creative thinking and even brash risk-taking can drive innovation, for better or worse. It’s our duty as participants in research and industry to invest further in ethics and fairness.
Back to the GPU question, because I want to make sure we don’t conflate LLMs with GPUs. GPUs are optimized for parallelization, not for sequential or memory-bound work. TLDR: good for graphics rendering and simulations (also cryptocurrency mining, historically); not so good for memory-intensive, “extensively branched” applications (many if statements… aka the average B2B SaaS backend).
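To see why branch-heavy code maps poorly onto GPUs, here is a toy Python model of SIMT (single-instruction, multiple-thread) execution. It is an assumption-laden sketch, not how any real scheduler works: the idea it captures is that threads in a warp share one program counter, so when they disagree at an if, the warp executes both paths in turn with the inactive threads masked off.

```python
# Toy cost model: how many steps a warp spends on an if/else, under the
# simplifying assumption that a divergent branch serializes both paths.

def warp_step_count(predicates, then_cost, else_cost):
    """predicates: one branch outcome per thread in the warp."""
    if all(predicates):            # no divergence: only the then-path runs
        return then_cost
    if not any(predicates):        # no divergence: only the else-path runs
        return else_cost
    return then_cost + else_cost   # divergence: both paths run, serially

uniform = [True] * 32                        # all threads agree
diverged = [i % 2 == 0 for i in range(32)]   # threads alternate branches

print(warp_step_count(uniform, 10, 10))   # 10 steps
print(warp_step_count(diverged, 10, 10))  # 20 steps: both sides executed
```

In the divergent case the warp pays for both branches even though each thread only needed one of them, which is why deeply branched code squanders the hardware that makes GPUs fast.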
Back to the fundamentals
My high school chemistry teacher was one of the hardest graders I’ve ever had, but instilled in us a deep appreciation of the core forces driving chemical reactions. I actually went through three different chemistry courses at two different schools in order to pass one and satisfy my graduation requirements. The first time, I dropped the class because it wasn’t a year requirement and I was bored out of my mind. The second time, a classroom overflow put me in a class with exclusively advanced-level seniors; I learned a lot in that class, on every topic except chemistry. The third and last time I took chemistry, I still wasn’t getting great grades (third time was not the charm), but I began to stumble over enough of the same recurring patterns in chemical analysis to see the arithmetic beauty of chemistry.
This was the first subject I was not able to coast by on via sheer short-term memory and a voracious reading appetite. Before I realized the base patterns that defined chemical reactions, I was too impatient to look past the less predictable, harder-to-grasp criticalities of chemistry.
After the first and only film class I ever took, I developed a taste for wide angle cinematography. Just as in chemistry, there was a standard formula I used to assess overall film quality. Cinematography, script, acting. That was it.
When I think about the software industry, especially this AI revolution we’re in, I think about creativity, demand, and durability.
Creativity: Utility and productivity generated by human ideas and collaboration.
Demand: Societal appetite for natural (e.g. agriculture) and artificial (e.g. social media) products.
Durability: Demand that has been and is likely to continue being resilient to the test of time and changing societal preferences.
What’s interesting here is that the right combination of creativity and demand can ensure consistent durability (e.g. Apple sustains itself as long as customers stay engaged with each new iteration of the iPhone), and that less disrupted, often more durable markets hold long-known opportunities for higher creativity and innovation to spark demand. The question with these known opportunity markets is not primarily “how” but rather “why” and “when”: will creativity investment pay off well enough (generating demand) and early enough (durability) to fund not just the first effort but also create space in the market for competitors? That can be counterproductive to individual interest in the short term, but beneficial in the long term by growing the overall pie.
Market behemoths largely know that competition is good, even if they need to fight harder to preserve and build capital, but the hyper-skewed economic microcosm of Silicon Valley has surprised investors and challenged and warped these fundamentals for decades. All I know is to chase the trifecta of creativity, demand and durability.