How liquid cooling is rewriting the rules of data centre design

Liquid cooling is the only way forward as data centres move towards 1MW racks.

Photo Credit: Paul Mah. Dr Ali on stage.

As data centres move towards 1MW racks, liquid cooling isn't just an option anymore. It's the only way forward.

That's the picture that emerged at the Ecolab liquid cooling seminar today, where Nvidia's Dr Ali Heydari, who invented liquid cooling for the AI giant, laid out what's coming.

More efficient, but not foolproof

There are multiple advantages to liquid cooling, observed Nalco Water's Dr Lu Kangjia (Kelly). Fewer fans are needed, and higher inlet temperatures mean fewer chillers and compressors as well.

But there are plenty of ways it can go wrong. Dr Kelly zoomed in on the final TCS, or Technology Cooling System loop, which runs between the coolant distribution unit (CDU) and the IT racks. Four main threats emerge here: scale, biofilm, fouling, and corrosion. Any of these could lead to unscheduled downtime, which is highly undesirable in data centres.

Overcoming them is entirely possible, but it requires care on several fronts. Material selection matters. So does design and fabrication. Chemical treatment and liquid monitoring round out the picture.

Take material selection as an example. Brass with more than 15% zinc is a no-no. But even stainless steel is not immune to corrosion. Get the science wrong, and disaster is only a matter of time.
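To make the brass rule concrete, here is a minimal sketch of how a wetted-materials list could be screened against it. Only the 15% zinc threshold comes from the talk. The part names, the compositions and the `WettedPart` structure are hypothetical, made up for illustration.

```python
from dataclasses import dataclass

# Threshold from the talk: brass with more than 15% zinc is to be avoided.
MAX_BRASS_ZINC_PCT = 15.0


@dataclass
class WettedPart:
    """A part in contact with the coolant (hypothetical structure)."""
    name: str
    material: str
    zinc_pct: float = 0.0


def flag_high_zinc_brass(parts):
    """Return the names of brass parts whose zinc content exceeds the threshold."""
    return [
        p.name
        for p in parts
        if p.material.lower() == "brass" and p.zinc_pct > MAX_BRASS_ZINC_PCT
    ]


# Illustrative parts list; the compositions are invented for the example.
parts = [
    WettedPart("quick-disconnect", "brass", zinc_pct=30.0),
    WettedPart("manifold fitting", "brass", zinc_pct=10.0),
    WettedPart("cold plate", "copper"),
]
flagged = flag_high_zinc_brass(parts)
print(flagged)
```

Passing this check does not make a loop safe. As noted above, even stainless steel can corrode, so a screen like this is only a first filter.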

What Nvidia is working on

Dr Ali is highly passionate about liquid cooling, and he shared his thoughts on the future and moonshot projects he's working on.

Several things caught my attention. Two-phase liquid cooling using R134a. Digital control valves to regulate liquid flow. Rack emulators to simulate real Nvidia products. Digital twins for workload simulation.

One issue with liquid cooling, Dr Ali observed, is the lack of industry standards. As liquid cooling enters more data centres, hopefully this will change.

Who owns what

There are also various operational considerations for liquid cooling in the data centre, noted Romain Tranchant from iMasons.

For a start, the facilities manager, IT team, and vendors will need to get together early to figure out key details. Risk acceptance. Service windows. Workload targets. Leak containment. Temperature setpoints. All of these need to be worked out before deployment.

Within the data centre itself, ownership of the components must also be sorted out. CDUs, pumps, manifolds, alarms and escalations, firmware setpoints, and leak detection thresholds all need clear owners.
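One simple way to keep track of this is an ownership register that flags anything left unassigned. The sketch below uses the components listed above. The team names and the `unowned` helper are hypothetical, not something presented at the seminar.

```python
# Components named in the article that need a clear owner.
COMPONENTS = [
    "CDUs",
    "pumps",
    "manifolds",
    "alarms and escalations",
    "firmware setpoints",
    "leak detection thresholds",
]


def unowned(components, owners):
    """Return the components with no owner assigned, in their original order."""
    return [c for c in components if not owners.get(c)]


# Hypothetical assignments agreed so far between the teams.
owners = {
    "CDUs": "facilities",
    "pumps": "facilities",
    "manifolds": "vendor",
    "firmware setpoints": "IT",
}
missing = unowned(COMPONENTS, owners)
print(missing)
```

Anything that appears in `missing` is a gap to close before deployment.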

What are your thoughts about liquid cooling?