Addressing the protection challenges of 48V AI servers using hot-swap controllers
Enabling the design of a reliable input protection solution for a 48V AI server.
BY AVISHEK PAL, POWER SWITCHES AT TEXAS INSTRUMENTS AND RAKESH PANGULOORI, POWER SWITCHES AT TEXAS INSTRUMENTS.
With advancements in artificial intelligence (AI) and machine learning, enterprise servers have become extremely power-hungry as they process and store large amounts of data simultaneously. The steady-state power rating of each server motherboard has gone up to 5kW or 6kW, in contrast to 1kW or 2kW for general servers. The form factor remains the same, however, which imposes system design challenges given the increased power density. The amplitude, slew rate and frequency of transient loads on AI servers have increased three to four times compared to general servers.
Figure 1. Typical block diagram of a 48V rack server power distribution.
In this article, we’ll discuss various challenges that AI-based processors bring into 48V server designs, along with design guidelines and important tips and tricks for the design and layout to achieve a reliable hot-swap solution for the system specifications outlined in Table 1.
Table 1. Typical system specifications.
Challenges in designing a hot-swap circuit for a 48V AI server
It’s interesting to look at how hot-swap circuit configurations have evolved over the years. A hot-swap solution consists of three main components: an N-channel metal-oxide semiconductor field-effect transistor (MOSFET) that serves as the main power control switch; a sense resistor that measures the current; and the hot-swap controller, which includes a current-sense amplifier that completes the loop to control the MOSFET’s pass current.
Figure 2. Traditional power-limiting hot-swap circuit.
As shown in Figure 2, you can use a single MOSFET-based hot-swap solution for low-power designs.
Fundamentally, the hot-swap controller comes with current- and power-limiting functions that limit the inrush and fault currents while keeping the MOSFET within its safe operating area (SOA). These functions are sufficient for designing low-power (<500W) hot-swap solutions.
With the increase in digital load, the system needs a higher output capacitance (>470µF), requiring parallel MOSFETs to support steady-state current and the adoption of output-voltage slew-rate control [1] to keep the MOSFET within its SOA.
In the output-voltage slew-rate control method, capacitor Cdv/dt placed across GATE-GND (see Figure 3) limits the slew-rate of the gate and output voltages, which limits the inrush current. MOSFETs can handle more energy when the power dissipation in them is reduced and spread over longer durations. Therefore, as the output capacitance increases, you need a higher Cdv/dt to reduce both the inrush current and power dissipation in the MOSFET during startup.
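The relationship between Cdv/dt and inrush current can be sketched numerically. The following is a rough first-order estimate, not a substitute for the design calculator: it assumes the gate is charged by a constant gate current into Cdv/dt alone (MOSFET capacitances ignored), that the output follows the gate, and that no load current flows during startup. The 20µA gate current and 47nF Cdv/dt are the values quoted elsewhere in this article; the 470µF output capacitance and the function name are illustrative assumptions.

```python
def inrush_estimate(c_out_f, c_dvdt_f, i_gate_a, v_in_v):
    """Return (output slew rate in V/s, inrush current in A, ramp time in s).

    First-order model (assumption): the gate, and therefore the output,
    ramps at I_GATE / Cdv/dt, and the output capacitor draws C_OUT * dV/dt.
    """
    if min(c_out_f, c_dvdt_f, i_gate_a, v_in_v) <= 0:
        raise ValueError("all inputs must be positive")
    slew_v_per_s = i_gate_a / c_dvdt_f      # dV/dt set by gate current and Cdv/dt
    i_inrush_a = c_out_f * slew_v_per_s     # current needed to charge C_OUT
    t_ramp_s = v_in_v / slew_v_per_s        # time for the output to reach V_IN
    return slew_v_per_s, i_inrush_a, t_ramp_s


if __name__ == "__main__":
    # C_OUT = 470 uF is an assumed example value, not a reference-design figure.
    slew, inrush, ramp = inrush_estimate(
        c_out_f=470e-6, c_dvdt_f=47e-9, i_gate_a=20e-6, v_in_v=48.0
    )
    print(f"slew rate  : {slew:.1f} V/s")
    print(f"inrush     : {inrush * 1e3:.1f} mA")
    print(f"ramp time  : {ramp * 1e3:.1f} ms")
```

In this model, doubling Cdv/dt halves the inrush current and doubles the ramp time, which is the trade-off the paragraph above describes.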
Figure 3. Hot-swap circuit with GATE slew-rate control.
A higher Cdv/dt interferes with the turn-off process, however, because the hot-swap controller has limited pull-down strength. This necessitates a local discharge circuit for Cdv/dt based on a PNP bipolar transistor, as shown in Figure 4. During startup, Cdv/dt controls the slew rate in the same way, but during a turn-off event, the Q1 PNP transistor activates and discharges Cdv/dt locally. Diode D1 blocks the discharge of Cdv/dt into the GATE pin, which reduces the stress on the GATE pin and also ensures proper operation of the controller.
Figure 4. Hot-swap circuit with local discharge path for Cdv/dt.
In AI-powered graphics processing unit applications, the hot-swap solution has to support currents around 150A and must support high-frequency, high slew-rate load transients, which present three new challenges.
Challenge No. 1: Turnoff delay during an output short-circuit
With the increase in load current, more MOSFETs need to be paralleled to limit the maximum steady-state MOSFET junction temperature to a safe value (100°C to 125°C).
For example, to support a steady-state load current of 150A at an ambient temperature of 70°C, eight Texas Instruments (TI) CSD19536KTT MOSFETs need to be in parallel to limit the steady-state MOSFET junction temperature to 100°C. Paralleled MOSFETs help thermally, but increase the effective capacitance on the GATE pin of the hot-swap controller and impact the turnoff response.
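The thermal reasoning behind the MOSFET count can be sketched as follows. This is a simplified estimate that assumes perfectly equal current sharing and a single junction-to-ambient thermal resistance per device. The hot on-resistance (3.5mΩ) and thermal resistance (24°C/W) below are placeholder values picked for illustration; they are not taken from the CSD19536KTT data sheet, and the function names are hypothetical. Use data-sheet and board-level measurements for a real design.

```python
def junction_temp_c(i_load_a, n_fets, rds_on_hot_ohm, theta_ja_c_per_w, t_amb_c):
    """Steady-state junction temperature of one MOSFET, assuming equal sharing."""
    i_per_fet_a = i_load_a / n_fets
    p_per_fet_w = i_per_fet_a ** 2 * rds_on_hot_ohm   # conduction loss I^2 * R
    return t_amb_c + p_per_fet_w * theta_ja_c_per_w


def min_parallel_fets(i_load_a, rds_on_hot_ohm, theta_ja_c_per_w,
                      t_amb_c, tj_max_c, n_limit=32):
    """Smallest MOSFET count that keeps the junction at or below tj_max_c."""
    for n in range(1, n_limit + 1):
        tj = junction_temp_c(i_load_a, n, rds_on_hot_ohm, theta_ja_c_per_w, t_amb_c)
        if tj <= tj_max_c:
            return n
    raise ValueError("junction temperature target not met within n_limit devices")


if __name__ == "__main__":
    # Placeholder device and board parameters (assumptions, not data-sheet values).
    RDS_ON_HOT_OHM = 3.5e-3
    THETA_JA_C_PER_W = 24.0
    n = min_parallel_fets(150.0, RDS_ON_HOT_OHM, THETA_JA_C_PER_W,
                          t_amb_c=70.0, tj_max_c=100.0)
    tj = junction_temp_c(150.0, n, RDS_ON_HOT_OHM, THETA_JA_C_PER_W, 70.0)
    print(f"MOSFETs in parallel: {n}, estimated junction temperature: {tj:.1f} C")
```

Because conduction loss per device scales with the square of the per-device current, each added MOSFET gives a diminishing thermal benefit while adding its full gate capacitance to the GATE pin.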
During an output short-circuit, the MOSFETs need to turn off fast enough to prevent further buildup of fault current and avoid damage to the MOSFETs, input power supply, or printed circuit board (PCB). The gate pull-down strength of the TI LM5066I hot-swap controller is limited to 160mA, which is not enough to turn off all eight MOSFETs completely during a short-circuit event, as shown in Figure 5.
Figure 5. Short-circuit response of the LM5066I controller with eight MOSFETs
Challenge No. 2: False gate turn-off during a load transient
Although the local PNP-based discharge circuit for Cdv/dt helps reliably turn off the MOSFETs during an output short-circuit event, it causes a false GATE turn-off in the presence of high-frequency, high slew-rate load transients. During load step-up, the MOSFET source node drops because of the finite input and output impedances of the hot-swap circuit. The voltage drop at the source node couples to the MOSFET gate node through the gate-to-source capacitance (CGS) of the MOSFET and causes the gate node to drop as well. The MOSFET source node recovers during load step-down, but the gate node cannot recover completely to its previous level because of the limited gate current (20µA typical) of the LM5066I hot-swap controller.
As a result, the hot-swap controller gate continues to drop further in subsequent load transient cycles, developing a base-emitter voltage across Q1. Eventually, PNP bipolar junction transistor Q1 turns on and falsely shuts down the system. Figure 6 illustrates the whole process, while Figure 7 shows the corresponding test result.
Figure 6. (TOP) Illustration of a hot-swap circuit for a dynamic load. Figure 7. (BOTTOM) Response of a hot-swap circuit to a dynamic load.
Challenge No. 3: Parallel resonance during controlled (slow) turn-on
Generally, parallel MOSFETs are more prone to parasitic oscillations than a single MOSFET in the linear region of operation. This is because of the presence of parasitic stray package inductances and capacitances on the drain, source and gate nodes, which form a resonant tank circuit resembling a Colpitts oscillator. Unlike switching regulators with a gate-drive strength of >2A, hot-swap controllers with a lower gate-drive strength (20µA) limit the inrush current during start-up by operating the MOSFETs in the linear region.
As a result, the parallel combination of hot-swap MOSFETs is highly susceptible to sustained oscillations. This phenomenon causes a violation of the MOSFET SOA during a power-into-short fault, leading to MOSFET damage.
Proposed circuit enhancements
Let’s discuss circuit enhancements to help solve these three challenges.
Improving the turn-off response
In the proposed solution shown in Figure 8, introducing an external fast pull-down circuit using a PNP transistor (QPD) and a resistor (RPD) boosts the turn-off speed. During an output short-circuit event, the gate pull-down current of 160mA creates a substantial voltage drop across the RPD resistor, which turns on the PNP transistor (QPD). This in turn shorts the gate-to-source of all parallel MOSFETs, turning off the MOSFETs immediately to quickly disconnect the power path. Figure 9 shows the experimental result for a short-circuit event with a fast pull-down circuit.
Figure 8. (TOP) Proposed hot-swap circuit configuration & Figure 9. (BOTTOM) Output short-circuit response with fast pull-down circuit.
Figure 10. Load transient performance for steps from 20A to 120A to 20A at a 1kHz frequency.
Overcoming false turn-off for dynamic loads
In this solution, the hot-swap gate node is decoupled from the MOSFET gate terminal by placing the DSS diode between them, also shown in Figure 8. This modification helps eliminate the reflection of output voltage ripple to the hot-swap controller GATE node and avoids false turn-on of the soft-start PNP transistor, QSS.
Changing the position of the diode does not impact controller behavior during start-up or any of the fault events. As shown in the test result (see Figure 10), the system operates continuously even for large load steps from 20A to 120A at a 1kHz frequency.
Damping parasitic oscillations
Adding a damping resistor (RG1, RG2, RG3) in series with the gate of each MOSFET can eliminate the parasitic oscillations in the system.
Usually, we recommend a 10Ω resistor in a 0603 package, but depending on the parasitics, a lower value around 1Ω may also help. We suggest testing on your PCB before deciding the value of the damping resistor.
Design guidelines and component selection
Reference [1] describes a procedure for designing a hot-swap circuit to protect the system and MOSFETs. We recommend reviewing Reference [1] to become familiar with the design.
Feeding the system specifications shown in Table 1 into the LM5066I design calculator yields the values of the current-sense resistor (RSNS), power-limiting resistor (RPWR), fault timer capacitor (CTIMER), soft-start capacitor (Cdv/dt) and the number (N) of MOSFETs to parallel. In the 8kW Hot-Swap Reference Design for 48V Artificial Intelligence Servers [2], RSNS = 330µΩ, RPWR = 28.7kΩ, CTIMER = 10nF, Cdv/dt = 47nF and N = 8.
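As a quick sanity check on the sense resistor, the voltage drop and dissipation at full load follow directly from Ohm's law. The sketch below uses the RSNS value from the reference design and the 150A load current mentioned earlier in this article; the function name is a hypothetical helper, and the result says nothing about the controller's current-limit threshold, which you should take from the data sheet.

```python
def sense_resistor_stress(i_load_a, r_sns_ohm):
    """Return (voltage drop in V, power dissipation in W) for a sense resistor."""
    v_drop_v = i_load_a * r_sns_ohm          # V = I * R
    p_diss_w = i_load_a ** 2 * r_sns_ohm     # P = I^2 * R
    return v_drop_v, p_diss_w


if __name__ == "__main__":
    v_drop, p_diss = sense_resistor_stress(i_load_a=150.0, r_sns_ohm=330e-6)
    print(f"sense drop : {v_drop * 1e3:.1f} mV")
    print(f"dissipation: {p_diss:.2f} W")
```

A dissipation of several watts in a sub-milliohm resistor is one reason the sense element's power rating and PCB copper area deserve attention in the layout.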
Looking at Figure 8, select the RPD resistor using Equation 1:
RPD > VBE(sat) / IGATE(CB) (1)
where VBE(sat) is the base-emitter saturation voltage of the QPD PNP transistor and IGATE(CB) is the power-on reset circuit-breaker sink current in the LM5066I hot-swap controller. The 8kW hot-swap reference design uses RPD = 20Ω.
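The RPD selection can be checked numerically: for QPD to turn on, the drop that the controller's gate sink current develops across RPD must exceed the VBE(sat) of QPD. The sketch below uses the 160mA pull-down current and the 20Ω reference-design value quoted in this article; the 1.0V VBE(sat) is an assumed, conservative placeholder rather than a data-sheet figure, and the function names are hypothetical.

```python
def rpd_min_ohm(v_be_sat_v, i_gate_cb_a):
    """Lower bound on RPD so that I_GATE(CB) * RPD exceeds VBE(sat) of QPD."""
    if v_be_sat_v <= 0 or i_gate_cb_a <= 0:
        raise ValueError("inputs must be positive")
    return v_be_sat_v / i_gate_cb_a


def rpd_drive_margin(r_pd_ohm, v_be_sat_v, i_gate_cb_a):
    """Ratio of the available drop across RPD to VBE(sat); > 1 means QPD turns on."""
    return (r_pd_ohm * i_gate_cb_a) / v_be_sat_v


if __name__ == "__main__":
    V_BE_SAT_V = 1.0      # assumed placeholder for the PNP's VBE(sat)
    I_GATE_CB_A = 0.160   # gate pull-down current quoted in this article
    print(f"RPD must exceed {rpd_min_ohm(V_BE_SAT_V, I_GATE_CB_A):.2f} ohm")
    print(f"margin with 20 ohm: {rpd_drive_margin(20.0, V_BE_SAT_V, I_GATE_CB_A):.1f}x")
```

Choosing RPD well above the minimum gives margin against variation in VBE(sat) and sink current, at the cost of a larger series resistance in the controller's own pull-down path.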
Cdv/dt discharge circuit
Figure 8 uses a 100V signal diode for DSS. The diode should handle a few tens of milliamperes of forward current. The 8kW hot-swap reference design uses the BAV16W-7-F from Diodes Inc.
You will have to select RSS1, RSS2 and QSS iteratively so that none of the three components becomes overstressed during turn-off. For QSS, you can select any standard PNP transistor with collector-emitter (VCEO) and collector-base (VCBO) voltage ratings of >100VDC and a continuous collector current of >200mA.
Select the values for RSS1 and RSS2 and their respective power ratings to limit the current flowing through the QSS transistor to a safe value. You must use a special high-power resistor for RSS2 to manage the transient peak power stress during turn-off. The 8kW hot-swap reference design uses the onsemi MMBT5401LT1G for QSS, with RSS1 = 100Ω and RSS2 = 499Ω (the Vishay RCS0805499RFKEA).
Input transient voltage suppression (TVS) diodes are required to protect against transient overvoltages during input hot-plug and output short-circuit events. The TI TVS diode recommendation tool can help you obtain the part number (voltage and power ratings) of the TVS diode and the number of TVS diodes to parallel.
The 8kW hot-swap reference design uses three Littelfuse 8.0SMDJ60A TVS diodes. For a deeper analysis into TVS diode selection, see Reference [3].
You will need output Schottky diodes to protect the output pin of the hot-swap controller against a negative transient during an output short-circuit event. The 8kW hot-swap reference design uses three onsemi FSV20100V Schottky diodes.
Conclusion
The emerging 48V AI servers demand significantly more power, in both peak and steady states, than traditional servers. The high power consumption, along with fast transient dynamics, imposes challenges in designing front-end protection using a hot-swap controller and parallel MOSFETs. The challenges include fast turn-off of parallel MOSFETs for real faults while avoiding false turn-off for high-frequency transients from the computational load.
The proposed solution in this article eliminates the limitations of legacy hot-swap controllers and enables the design of a reliable input protection solution for a 48V AI server.































