The idea of using GPUs for more than just fun and games is nothing new. It started with niche high-performance computing applications such as seismic data processing for oil and gas, fluid dynamics simulations and options pricing. But now Nvidia thinks it has found its killer app in the form of deep learning. "I think we are going to realize looking back that one of the biggest things that ever happened is AI," CEO Jen-Hsun Huang said in his opening keynote at this year's GPU Technology Conference. "We think this is a new computing model, a fundamentally different approach to developing software."

The combination of lots of data, better algorithms and powerful GPUs has led to a big bang in modern AI. In many cases, deep learning is now surpassing the capabilities of humans. Examples of this progress in the past year include Microsoft's work on image recognition with the ImageNet database, Berkeley's work on robotics, Baidu's speech recognition services, and most recently Google DeepMind's AlphaGo. This is why Nvidia has gone "all in" on deep learning, as Huang said repeatedly. And no product is more indicative of that than the company's new Tesla P100 GPU, a big bet that took three years, thousands of engineers and some $3 billion in investment.

The Tesla P100 isn't the first Nvidia GPU to use an advanced 16nm manufacturing process with 3D FinFET transistors and the new Pascal architecture (the company announced its Drive PX 2 for self-driving cars at CES earlier this year), but it is by far the most complex, with 15.3 billion transistors on a chip measuring 610 square millimeters. To make things a bit more challenging, it also includes four stacks of high-bandwidth memory (16GB in all) in the same package using foundry TSMC's CoWoS (Chip-On-Wafer-On-Substrate) technology. "The odds of this working at all is approximately zero," Huang joked.

Based on what is known internally as the GP100 GPU with 60 streaming multiprocessors, the Tesla P100 uses 56 of these SMs, each with 64 FP32 (32-bit) CUDA cores and 32 FP64 (64-bit) CUDA cores clocked at 1.3GHz, though it can also burst a bit higher. The result is peak performance of 10.6 teraflops single-precision and 5.3 teraflops double-precision. The 3,584 FP32 CUDA cores can also be used in FP16 half-precision mode, which is sufficient for most deep-learning tasks, and pushes the performance to 21.2 teraflops. That's nearly 2.5 times the performance of the current Tesla K80, which is manufactured on a 28nm process (Nvidia skipped 20nm) and uses two Kepler GPUs. The GP100 also has more cache and 14MB of shared register files, as well as significantly more register bandwidth (80TB per second), which means it can handle larger jobs more efficiently.

Rival AMD was the first to introduce High Bandwidth Memory (HBM) in its Radeon R9 Fury X consumer cards based on the 28nm Fiji GPU. But Nvidia is the first to use second-generation HBM, which delivers higher capacity and greater bandwidth (AMD had once planned to release a Polaris part with HBM2 this year, but an updated roadmap from the recent Game Developers Conference shows this has been pushed back to Vega in 2017). Each stack consists of four 8Gb (1GB) memory chips, each with 5,000 TSVs (through-silicon vias) to connect them to each other and to the rest of the system. The Tesla P100 has four of these stacks for a total of 16GB with 720GB per second of peak bandwidth. HBM2 also supports error correction, a key requirement for many HPC applications.

The Tesla P100 also uses a new interconnect called NVLink, which provides better performance than PCI-Express 3.0 in workstations or HPC clusters that use multiple GPUs (most deep-learning algorithms use four or eight GPUs for training). The Tesla P100 has four 40GB per second links for a total of 160GB per second of bidirectional bandwidth between GPUs. NVLink can also be used to connect the GPUs with IBM Power CPUs in servers (Nvidia is part of the OpenPower consortium). The GP100 also improves on the unified memory model introduced in CUDA 6 by allowing programs to access all of the memory in the system's CPUs and GPUs as a single virtual address space, while maintaining coherency without a big performance hit.

The Tesla P100 is already in volume production, and Nvidia has started delivering it to key hyperscale customers that build their own servers. Cray, Dell, Hewlett-Packard Enterprise and IBM are also building Tesla P100 servers, which will be announced later this year and start shipping in the first quarter of 2017. In the meantime, to get the Tesla P100 in the hands of researchers that do a lot of the foundational work on deep learning, Nvidia has built its own server, which it is billing as the "world's first deep learning supercomputer." The DGX-1 is a 3U server with two 16-core Xeon CPUs, 512GB of memory, eight Tesla P100 GPUs, 7TB of solid-state storage, and dual 10Gbps Ethernet and 100Gbps InfiniBand ports.
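The peak-throughput figures quoted for the Tesla P100 can be sanity-checked with a little arithmetic. One caveat: the ~1.48GHz boost clock used below is an assumption inferred from the published 10.6-teraflop number; the article itself only gives a 1.3GHz clock that can "burst a bit higher."

```python
# Sanity check of the Tesla P100 peak-throughput figures quoted above.
# Assumption: a boost clock of ~1.48GHz, back-calculated from the 10.6 TFLOPS
# figure; the article only states 1.3GHz with some burst headroom.

sms = 56                  # streaming multiprocessors enabled (of 60 on the GP100 die)
fp32_per_sm = 64          # FP32 CUDA cores per SM
fp64_per_sm = 32          # FP64 CUDA cores per SM
boost_clock_hz = 1.48e9   # assumed boost clock (not stated in the article)

fp32_cores = sms * fp32_per_sm   # 3,584 FP32 cores, as quoted
fp64_cores = sms * fp64_per_sm   # 1,792 FP64 cores

# Each core can retire one fused multiply-add (2 FLOPs) per cycle.
fp32_tflops = fp32_cores * 2 * boost_clock_hz / 1e12   # single precision
fp64_tflops = fp64_cores * 2 * boost_clock_hz / 1e12   # double precision
fp16_tflops = fp32_tflops * 2   # half precision runs at twice the FP32 rate

print(fp32_cores, round(fp32_tflops, 1), round(fp64_tflops, 1), round(fp16_tflops, 1))
```

Under that clock assumption the arithmetic lands on the quoted 10.6, 5.3 and 21.2 teraflop figures, which also makes clear why FP64 is exactly half and FP16 exactly double the single-precision rate: the ratios come straight from the core counts and issue rates.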
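The HBM2 capacity and bandwidth numbers are internally consistent as well, and checking them shows how the per-stack figures roll up to the package totals:

```python
# Cross-check of the HBM2 figures quoted in the article.
stacks = 4                # HBM2 stacks in the Tesla P100 package
chips_per_stack = 4       # 8Gb DRAM dies per stack
gb_per_chip = 1           # 8 gigabits = 1 gigabyte

total_capacity_gb = stacks * chips_per_stack * gb_per_chip   # 16GB total

total_bandwidth_gbs = 720                                    # quoted peak, GB/s
per_stack_bandwidth_gbs = total_bandwidth_gbs / stacks       # implied per-stack rate

print(total_capacity_gb, per_stack_bandwidth_gbs)
```

The implied 180GB/s per stack is not stated in the article; it simply follows from dividing the quoted 720GB/s package bandwidth evenly across the four stacks.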
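The NVLink numbers add up the same way. The PCIe 3.0 figure below is general background for comparison, not a number from the article:

```python
# NVLink aggregate bandwidth as described in the article.
links = 4
gb_per_link = 40          # bidirectional bandwidth per link, GB/s

total_gbs = links * gb_per_link   # aggregate GPU-to-GPU bandwidth

# Background, not from the article: a PCIe 3.0 x16 slot delivers roughly
# 16GB/s per direction, which is the gap NVLink is meant to close for
# multi-GPU training.
pcie3_x16_gbs = 16

print(total_gbs, total_gbs // pcie3_x16_gbs)
```

That aggregate 160GB/s is what matters for the four- and eight-GPU training configurations the article mentions, where gradients must be exchanged between GPUs every iteration.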