How Nvidia’s AI Security system saved supercomputer data centre operator $300,000 an hour and stopped it being hacked by bitcoin desperadoes

Downtime is so expensive for a supercomputer operator that a network failure can cost it a million dollars in lost productivity in half a day.

So an artificial intelligence system that predicts operational issues, prevents network failures and catches hackers early can pay for itself almost instantly.

Processing giant Nvidia has unveiled a new artificial intelligence (AI) driven security system, UFM Cyber-AI, which aims to minimise downtime in InfiniBand data centres by using analytics to detect and anticipate problems.

The NVIDIA Mellanox UFM (Unified Fabric Manager) platform has been used to manage InfiniBand systems for a decade. The new Cyber-AI system applies AI to learn a data centre’s operational cadence and network workload patterns. Drawing on both real-time and historic telemetry and workload data, it creates a baseline of what is normal and acceptable. It then tracks the system’s health and network modifications, and detects performance degradation as well as changes in usage and profiles.
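To illustrate the baselining idea, here is a deliberately simple statistical stand-in. NVIDIA has not published the model behind UFM Cyber-AI, so this is not its method: it is a minimal sketch that assumes a single hypothetical metric (per-port throughput in Gbit/s), learns its mean and standard deviation from historic samples, and flags new readings whose z-score exceeds a chosen threshold.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Baseline:
    """Learned 'normal' range for one hypothetical telemetry metric."""
    mu: float
    sigma: float

    @classmethod
    def learn(cls, history: list[float]) -> "Baseline":
        # stdev() needs at least two samples to estimate spread.
        if len(history) < 2:
            raise ValueError("need at least two samples to build a baseline")
        return cls(mu=mean(history), sigma=stdev(history))

    def z_score(self, value: float) -> float:
        # A perfectly flat history has zero spread: any change is infinitely unusual.
        if self.sigma == 0:
            return 0.0 if value == self.mu else float("inf")
        return (value - self.mu) / self.sigma

    def is_anomalous(self, value: float, threshold: float = 3.0) -> bool:
        # Flag readings more than `threshold` standard deviations from the mean.
        return abs(self.z_score(value)) > threshold


if __name__ == "__main__":
    # Hypothetical historic throughput samples for one port, in Gbit/s.
    history = [95.0, 97.5, 96.0, 98.0, 94.5, 96.5, 97.0, 95.5]
    baseline = Baseline.learn(history)
    for reading in (96.2, 60.0):
        status = "anomalous" if baseline.is_anomalous(reading) else "normal"
        print(f"{reading} Gbit/s: {status}")
```

With the sample history, 96.2 Gbit/s is reported as normal and 60.0 Gbit/s as anomalous. A production system would learn many metrics together and allow for daily and weekly cycles; the z-score test is only the simplest way to express "deviation from normal".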

In June Nvidia added a third element to the UFM family, the UFM Telemetry platform. This tool captures real-time network telemetry data and streams it to an on-premises or cloud-based database, where it is used to monitor network performance and validate the network configuration.

This means the new system can spot abnormal system and application behaviour. It can also predict potential system failures and nip these threats in the bud by taking corrective action. 
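Predicting a failure before it happens can be illustrated with linear trend extrapolation, a swapped-in technique chosen for clarity rather than anything NVIDIA has described. The sketch assumes a hypothetical error counter sampled every 10 minutes on one link, fits a least-squares line, estimates when the counter will cross a failure threshold, and recommends a hypothetical corrective step ("drain link") if that crossing falls within a planning horizon.

```python
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more (x, y) pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def minutes_until_threshold(times: list[float], errors: list[float],
                            threshold: float) -> Optional[float]:
    """Extrapolate the trend; None means the metric is flat or falling."""
    slope, intercept = fit_line(times, errors)
    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    # Already past the threshold counts as zero minutes remaining.
    return max(0.0, crossing_time - times[-1])


def recommend_action(times: list[float], errors: list[float],
                     threshold: float, horizon_minutes: float) -> str:
    eta = minutes_until_threshold(times, errors, threshold)
    if eta is not None and eta <= horizon_minutes:
        # Hypothetical corrective step: move traffic off the degrading link.
        return f"drain link: threshold expected in {eta:.0f} min"
    return "no action"


if __name__ == "__main__":
    # Hypothetical symbol-error counts sampled every 10 minutes on one link.
    times = [0, 10, 20, 30]
    errors = [2, 4, 6, 8]
    print(recommend_action(times, errors, threshold=20, horizon_minutes=120))
```

Here the counter rises by 0.2 per minute from a base of 2, so it is expected to reach 20 at minute 90, one hour after the last sample, which is inside the two-hour horizon.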

Supercomputers are frequent targets for sophisticated crooks who hack into these high-value systems to run unwanted applications, such as cryptocurrency mining. By catching intrusions and impending failures early, the system reduces data centre downtime, which typically costs more than $300,000 an hour, according to research by ITIC.
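The economics can be sketched with simple arithmetic. Using ITIC’s figure of $300,000 an hour as a floor, the snippet below computes the cost of an outage and how many hours of avoided downtime would recoup a monitoring system; the $150,000 system price used in the example is purely hypothetical.

```python
# ITIC's figure: downtime "typically costs more than $300,000 an hour".
# Treated here as a floor, in US dollars.
ITIC_HOURLY_DOWNTIME_COST = 300_000


def downtime_cost(hours: float, hourly_cost: float = ITIC_HOURLY_DOWNTIME_COST) -> float:
    """Lower-bound cost of an outage lasting `hours`."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * hourly_cost


def payback_hours(system_cost: float, hourly_cost: float = ITIC_HOURLY_DOWNTIME_COST) -> float:
    """Hours of avoided downtime needed to recoup a (hypothetical) system cost."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return system_cost / hourly_cost


if __name__ == "__main__":
    print(f"4-hour outage: ${downtime_cost(4):,.0f}")
    # Hypothetical system price, for illustration only.
    print(f"Payback: {payback_hours(150_000) * 60:.0f} minutes of avoided downtime")
```

At that rate a four-hour outage costs at least $1.2 million, and a hypothetical $150,000 system would pay for itself after 30 minutes of avoided downtime.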

The UFM Cyber-AI system allows system administrators to detect and respond to potential security threats instantly and to prevent failures. This saves a fortune and provides the continuity of service that keeps everyone in a job, according to Gilad Shainer, senior vice president of marketing for Mellanox networking at NVIDIA.

‘It determines a data centre’s unique vital signs and uses them to identify performance degradation, component failures and abnormal usage patterns,’ said Shainer.

Douglas Johnson, associate director of the Ohio Supercomputer Center, has used the UFM platform for years in his employer’s InfiniBand data centres. ‘UFM and the expertise from the Mellanox networking team have been fundamental ingredients in the management of our network and the stability we’ve achieved,’ said Johnson.

The UFM Cyber-AI platform complements the UFM Enterprise platform, which manages networks, performance and security.


D-Link introduces new fever-screener technology to identify overheating staff in the COVID-19 crisis

Camera maker D-Link has launched a new spy-cam which can take the temperature of everyone in the office and report on who looks a bit peaky.

The new Group Temperature Screening Camera, DCS-9500T, is an all-in-one intelligent fever screening kit with a dual-lens thermographic picture taker, blackbody calibrator and management software.

It uses artificial intelligence (AI) to analyse data from the thermographic camera and can raise an alarm automatically if an unusually high body temperature is detected in any of the subjects. 

The system was developed by South Korean vendor D-Link for monitoring large, busy areas and gives fast skin-surface temperature detection for up to 30 people at once, with an accuracy of ±0.3°C.
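The alarm logic can be sketched in a few lines, with the caveat that D-Link has not published how the DCS-9500T actually decides. This sketch takes the stated limits (30 subjects per frame, ±0.3°C accuracy) plus a hypothetical alarm threshold of 37.5°C chosen only for illustration, and treats readings that fall inside the accuracy band around the threshold as "recheck" rather than a firm alarm. The subject IDs are also hypothetical.

```python
from dataclasses import dataclass

MAX_SUBJECTS_PER_FRAME = 30  # from the product description
ACCURACY_C = 0.3             # stated accuracy, +/- degrees Celsius
ALARM_THRESHOLD_C = 37.5     # hypothetical site policy, not a D-Link default


@dataclass(frozen=True)
class Reading:
    subject_id: str      # hypothetical ID assigned by the tracking software
    skin_temp_c: float   # skin-surface temperature in degrees Celsius


def classify(reading: Reading, threshold: float = ALARM_THRESHOLD_C,
             accuracy: float = ACCURACY_C) -> str:
    """'alarm' if above the threshold even allowing for sensor error,
    'recheck' if within the error band of it, otherwise 'ok'."""
    if reading.skin_temp_c - accuracy >= threshold:
        return "alarm"
    if reading.skin_temp_c + accuracy >= threshold:
        return "recheck"
    return "ok"


def screen_frame(readings: list[Reading]) -> dict[str, str]:
    # The camera is described as handling at most 30 people at once.
    if len(readings) > MAX_SUBJECTS_PER_FRAME:
        raise ValueError(f"at most {MAX_SUBJECTS_PER_FRAME} subjects per frame")
    return {r.subject_id: classify(r) for r in readings}


if __name__ == "__main__":
    frame = [Reading("visitor-1", 38.2), Reading("visitor-2", 37.4),
             Reading("visitor-3", 36.6)]
    print(screen_frame(frame))
```

The three-way split reflects the stated ±0.3°C accuracy: a 37.4°C reading could really be 37.7°C, so it is flagged for a second look rather than ignored.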

D-Link says it is intended for schools, factories, office buildings, airports or hospitals.

The fever screener has a high-accuracy camera with a wide-angle thermal lens and an uncooled 400×300 IRFPA (infrared focal plane array) microbolometer thermal sensor. Together these create razor-sharp thermal images and precise results when identifying people with a raised temperature.

A full high definition (HD) optical imaging sensor allows the kit to create high-quality footage that overlays the thermal and optical images in a single view.

Facial recognition technology in the management software means that the kit can identify staff members who are falling ill.

The fever screening system complies with standards set by ONVIF, the open industry forum for IP-based security products, so it can be integrated into existing systems.

The management software can handle up to 32 cameras; at 30 people per camera, that means up to 960 people could be monitored at once.

Questions

What made the South Koreans develop this particular application of the technology?

Is it in use already in South Korea?

The technology identifies people who are overheating. How does it raise an alarm with management: through a discreet email to a manager, through public shaming on social media, or through direct digital intervention such as shutting down the individual’s technology and despatching a robot to eject them from the premises?

Has anyone raised any queries about the use of this technology?

Apstra’s Artificial Intelligence Beats Avoidable Extravagance

Loitering with Intent before you start building your data centre will save you a fortune, says Apstra vice president Sean Hafeez

Apstra’s product management VP Sean Hafeez explains how Loitering with Intent can help you make an absolute killing by cutting your networking costs.

Like a network engineer about to begin a very expensive operation, we need to define our terms first. What exactly is Artificial Intelligence (AI) and how do we use it in intent-based networking?

Everything and nothing, by the sounds of it. If my friend Guy Matthews phones an exclusive department store, the switchboard recognises his calling line identity (CLI) and automatically routes the call to an operator on the other side of the world who will be speaking in his second language. That decision is a cognitive function, ergo it is ‘artificial intelligence’.

However, AI is a lot more than automated snobbery. It can help companies to save a fortune through planning and avoidance of expensive mistakes. Few people’s mistakes are more expensive than those of a network engineer and, ironically, few mistakes by engineers come under intense scrutiny.

Networking company Apstra has created an AI that automatically builds data centres as efficiently as possible. The problem with data centres is that you cannot afford to get them wrong. That would be an even more expensive mistake than asking a $200 per hour engineer to sort out the cable ties!

Apstra builds an operating system that slashes operating costs, prevents outages, speeds up your installations and ensures you get your money’s worth from your networking engineers.

It’s used by the biggest networking companies in the world but the economies apply to small companies too.

The operating system (OS) works out what you want to do (your intent) and then calculates the most efficient way to achieve it.
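The core idea of intent-based networking, declaring the outcome and letting software derive the details, can be sketched as below. This is not Apstra’s AOS data model or API; the `FabricIntent` and `Cable` types and the port-numbering scheme are hypothetical. The sketch expands a leaf-spine intent (every leaf connected to every spine) into a cabling plan in which no port is used twice, then compares observed cabling against that plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricIntent:
    """Hypothetical declaration: connect every leaf to every spine."""
    spines: int
    leaves: int
    links_per_pair: int = 1


@dataclass(frozen=True)
class Cable:
    leaf: str
    leaf_port: int
    spine: str
    spine_port: int


def expand_intent(intent: FabricIntent) -> list[Cable]:
    """Derive a cabling plan in which no port is ever used twice."""
    plan = []
    for leaf_idx in range(intent.leaves):
        for spine_idx in range(intent.spines):
            for link_idx in range(intent.links_per_pair):
                plan.append(Cable(
                    leaf=f"leaf{leaf_idx + 1}",
                    # Leaf uplink ports are grouped by spine.
                    leaf_port=spine_idx * intent.links_per_pair + link_idx + 1,
                    spine=f"spine{spine_idx + 1}",
                    # Spine downlink ports are grouped by leaf.
                    spine_port=leaf_idx * intent.links_per_pair + link_idx + 1,
                ))
    return plan


def diff_against_intent(plan: list[Cable],
                        observed: list[Cable]) -> tuple[set[Cable], set[Cable]]:
    """Return (missing, unexpected) cables relative to the plan."""
    expected, seen = set(plan), set(observed)
    return expected - seen, seen - expected


if __name__ == "__main__":
    plan = expand_intent(FabricIntent(spines=2, leaves=4))
    print(f"{len(plan)} cables planned")
    # Simulate one cable plugged into the wrong spine port.
    observed = plan[:-1] + [Cable("leaf4", 2, "spine2", 9)]
    missing, unexpected = diff_against_intent(plan, observed)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

Because the plan is derived rather than improvised in the duct, a miswired cable shows up as one missing and one unexpected entry instead of hours of tracing.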

The problem for a data centre builder is that there are all kinds of equipment (servers, switches, cabling ducts) and even more types of software. This presents you with an infinite number of options, which means there are infinity minus one ways of getting it wrong. Even if you did get it right first time, what if you later wanted to take out all the Huawei kit? That would be very tricky to audit after the fact.
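Retrospective audits become tractable when the whole design lives in one queryable model. The sketch below uses a hypothetical in-memory inventory (the device names, roles and the placeholder "OtherVendor" are invented for illustration, and this is not how any particular product stores its data) to list every device from a given vendor and every link that removing those devices would affect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    vendor: str
    role: str


def devices_by_vendor(inventory: list[Device], vendor: str) -> list[str]:
    """Names of every device from `vendor` (case-insensitive), sorted."""
    wanted = vendor.casefold()
    return sorted(d.name for d in inventory if d.vendor.casefold() == wanted)


def links_touching(links: list[tuple[str, str]],
                   names: list[str]) -> list[tuple[str, str]]:
    """Links with at least one end on a device being removed."""
    doomed = set(names)
    return [link for link in links if link[0] in doomed or link[1] in doomed]


if __name__ == "__main__":
    # Hypothetical inventory; "OtherVendor" is a placeholder name.
    inventory = [
        Device("spine1", "Huawei", "spine"),
        Device("spine2", "OtherVendor", "spine"),
        Device("leaf1", "Huawei", "leaf"),
        Device("leaf2", "OtherVendor", "leaf"),
    ]
    links = [("leaf1", "spine1"), ("leaf1", "spine2"),
             ("leaf2", "spine1"), ("leaf2", "spine2")]
    to_remove = devices_by_vendor(inventory, "huawei")
    print("remove:", to_remove)
    print("affected links:", links_touching(links, to_remove))
```

In this toy fabric, removing one vendor’s two devices disturbs three of the four links, which is exactly the kind of knock-on effect that is hard to see without a single source of truth.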

The cabling alone is a nightmare. Cabling is an uncomfortable job. You’re not sitting comfortably when you make those connections; you’re often contorted into a weird position. It’s uncomfortable in those data centres. They’re dark, gloomy and lonely.

It’s a filthy job if you’re down among the dead men, crawling around in a cabling duct. Many a cable rat has inadvertently kicked out a lead while wriggling around in those dreadful cages. I remember running cabling along the corridors of a hospital when I worked in its networking department. We were forever kicking up dust from the crumbling walls of that ancient institution.

It’s cold in data centres too. Since IT people are all massively over-cautious, the air-conditioning is dialled to the ‘Stun’ setting at all times. There’s a reason why networking companies love printing their logos on fleeces. Network engineers spend all their time shivering.

For a combination of reasons – such as the strain on your muscles, the chill of the coolers or your revulsion at the dirt – cabling engineers often make hurried decisions about where to plug a cable in. Nobody wants to hang around in those gloomy rooms for long. Many of them are haunted. There are rumours that organised crime has moved into ‘hosting’ because a data centre is the perfect place to hide the body of a cartel rival.

That’s all conjecture of course. What we can say with some certainty is that the data centre industry can be very expensive and needlessly wasteful. Nothing much is done about it as this is one of the great unreported scandals. Nobody knows enough about data centres to care and it’s very hard to get anyone interested. 

Nobody seemed to know what we did in that hospital IT department, which is dangerous because it makes you unaccountable. When a doctor had his laptop stolen, for example, nothing happened. They didn’t even call in the police, which they ought to have done because it would have cleared the air. As it was, we were left with an unhealthy atmosphere of mutual suspicion.

Networking engineers cost an absolute fortune to employ and, since nobody really understands what they do, those costs are not manageable. How could they be when they can’t be measured?

When a network is created, there is an incredibly complicated spaghetti of wires, each of which looks identical but plays a unique and strategically critical role in the functioning of the network. Since the network is the central nervous system and intelligence of the modern enterprise, the consequences of all these misfiring synapses are painfully expensive.

When chief executives and chief financial officers discover this, there will be hell to pay.

Apstra’s operating system (AOS 3.2) will keep them at bay.

The problem with building at scale is that time is of the essence but mistakes are very expensive too. Each connection is vital to the functioning of one part of the network. A particular app that one set of users relies on totally might need the best possible connection, patched directly into the server on which the apposite data and code reside. But that is just one of the 153,600 cabling options the poor duct rat has to consider.

Each server has 48 connections you can make.

Those 48 cables leading out from the server have, in turn, 1,600 different end points they can be plugged into in the hubs and wiring closets.

With some servers that’s 96 cables to your spine, each with 1,600 options on each panel. That’s 153,600 variations already. Each optical cable can have a thousand fibres, so consider how complicated it gets when you have a bundle of optical cables.
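The figures above are simple products of cables and end points. The sketch below reproduces them and, under an added assumption of ours (each cable must land on a distinct port from a single shared pool of 1,600), uses Python’s `math.perm` to show that the number of complete layouts is astronomically larger still.

```python
import math


def cabling_options(cables: int, endpoints_per_cable: int) -> int:
    """The article's count: each cable times each end point it could reach."""
    return cables * endpoints_per_cable


def full_arrangements(cables: int, ports: int) -> int:
    """Assumption: every cable lands on a distinct port from one shared pool,
    so a complete layout is an ordered selection of ports (a permutation)."""
    return math.perm(ports, cables)


if __name__ == "__main__":
    print(f"48 cables: {cabling_options(48, 1600):,} options")
    print(f"96 cables: {cabling_options(96, 1600):,} options")
    layouts = full_arrangements(96, 1600)
    print(f"complete 96-cable layouts: a {len(str(layouts))}-digit number")
```

The 96-cable case gives the 153,600 individual choices; counting whole layouts produces a number of roughly 300 digits, which is why nobody can check them by hand.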

Every one of these options takes 20 minutes to investigate. So if you get it wrong, that’s many man-hours tied up, at $200 an hour, while the networking engineer checks cables.
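The labour cost follows from the same arithmetic. Taking the article’s 20 minutes per option and $200 an hour, the sketch computes the time and cost of checking a given number of options; checking all 153,600 would be a worst case rather than a typical fault hunt.

```python
MINUTES_PER_CHECK = 20   # the article's figure per option
HOURLY_RATE_USD = 200    # the article's engineer rate


def investigation_hours(options: int, minutes_each: float = MINUTES_PER_CHECK) -> float:
    """Engineer hours needed to check `options` cabling options."""
    return options * minutes_each / 60


def investigation_cost(options: int, minutes_each: float = MINUTES_PER_CHECK,
                       hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Labour cost in US dollars of checking `options` cabling options."""
    return investigation_hours(options, minutes_each) * hourly_rate


if __name__ == "__main__":
    print(f"one check: ${investigation_cost(1):,.2f}")
    worst = 153_600
    print(f"all {worst:,} options: {investigation_hours(worst):,.0f} hours, "
          f"${investigation_cost(worst):,.0f}")
```

A single check costs about $67; checking every one of the 153,600 options would take 51,200 hours and cost $10,240,000.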

In carpentry they say measure twice, cut once. Data centre builders say the road to promotion is paved with good Intent.  

Sean Hafeez has spent 20 years in networking, having joined Apstra in 2015 for release 1.0 of AOS.

Before that he was at Big Switch Networks, Arista Networks and Shasta Networks. 

Hafeez was a key figure in Arista’s first customer wins on Wall Street.

More impressively still, one of his ancestors was a signatory to the American Declaration of Independence and an architect of the American Constitution. Now that is a management pedigree!