NVIDIA – Clear as Mud – Hyper-Q and Dynamic Parallelism on Laptops

Until now, NVIDIA CUDA’s powerful new Hyper-Q and Dynamic Parallelism features were only available on the Tesla Kepler K20 and some Quadro K-series cards.  GeForce cards do not support them.  The main reason, apparently, is that these functions require hardware only present in some Kepler architectures.  However, that is not the full story, since the GeForce Titan supposedly does support Hyper-Q and Dynamic Parallelism, yet not dual copy engines or GPUDirect RDMA, as we’ll see below.
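For those who have not tried it, Dynamic Parallelism lets a kernel launch further kernels from the device itself, without a round trip to the host.  A minimal sketch follows; the kernel and buffer names are my own for illustration, not from any NVIDIA sample.  It needs compute capability 3.5 and relocatable device code, e.g. `nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt`:

```cuda
#include <cuda_runtime.h>

// Child kernel: trivial per-element work.
__global__ void childKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

// Parent kernel: launches the child grid from the device.
__global__ void parentKernel(int *data, int n)
{
    // One thread does the launch, so we don't get duplicate child grids.
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        childKernel<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();  // device-side wait for the child grid
                                  // (valid in the CUDA 5-era toolkits)
    }
}

int main()
{
    const int n = 1024;
    int *d_data;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));

    parentKernel<<<1, 1>>>(d_data, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

On a card without Dynamic Parallelism this will not even compile for the target architecture, which is exactly the portability problem the article is complaining about.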

Now that is rather disappointing news if you want to experiment with these features on a lower-cost platform, but there is an alternative.  Hyper-Q and Dynamic Parallelism will be available on certain laptops with the GK208 chip, and not only expensive ones: there will likely be an Acer model coming in at around $700.  Why is this?  Many programmers develop their CUDA algorithms on a laptop and only later move to the large, expensive (and very likely shared) Tesla workstations.  I’m not clear on whether dual copy engines will also feature; probably not.  The Titan also misses this powerful ability to transfer data to and from the GPU concurrently, and for the right algorithm it can make a big difference.  I’ve got a GTX 680 here which wipes the floor with the Quadro 4000 in single-precision floating-point performance, yet when there is a lot of data transfer the advantage can switch around.  Dual copy engines are actually present on almost all NVIDIA cards; they are simply not enabled.  On some earlier-series cards they could be enabled by hacking the firmware, and on some Kepler cards they and other Tesla features can be enabled by unleashing a soldering iron on your board.
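To see why dual copy engines matter, consider issuing an upload and a download on two different streams.  With two engines the transfers overlap; with one they serialize.  A sketch, assuming pinned host buffers (required for truly asynchronous copies) and illustrative buffer names:

```cuda
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 64 << 20;            // 64 MB each way
    float *h_in, *h_out, *d_in, *d_out;
    cudaMallocHost(&h_in,  bytes);            // pinned host memory
    cudaMallocHost(&h_out, bytes);
    cudaMalloc(&d_in,  bytes);
    cudaMalloc(&d_out, bytes);

    cudaStream_t up, down;
    cudaStreamCreate(&up);
    cudaStreamCreate(&down);

    // Two async copies in opposite directions on separate streams.
    // With dual copy engines these run concurrently; with a single
    // engine the second waits for the first.
    cudaMemcpyAsync(d_in,  h_in,  bytes, cudaMemcpyHostToDevice, up);
    cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, down);

    cudaDeviceSynchronize();

    cudaStreamDestroy(up);
    cudaStreamDestroy(down);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return 0;
}
```

For a streaming algorithm that is simultaneously feeding in new data and pulling out results, this overlap can roughly halve the transfer cost.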

With NVIDIA, accurate information is about as hard to get as water in the desert.  There is more misinformation flying around than during the Iraq war.  Supposedly the GT730M will have the GK208, except when it has a GK107.  Then there is GPUDirect.  This marketing term, in its (Linux-only) GPUDirect RDMA flavour (as opposed to the GPUDirect for Video variant), refers to the ability to exploit a little-used PCI Express feature for peer-to-peer data transfers.  This is a fantastically important capability, allowing data to travel between GPUs, network cards and other PCIe devices without burdening the CPU and system memory.  At GTC 2012 this was said to be a Kepler feature: all that was needed was CUDA 5.0 and the latest drivers, and you could then adapt your own device drivers to support it.  In our case that was for an FPGA card, so we could DMA directly to the GPU and thus reduce latency and CPU load.  Then, without warning, this support was removed from all but the Tesla K20.  I think.
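Given all this confusion over which chip is actually inside a given product name, the safest move is to ask the card itself.  A small query sketch: `asyncEngineCount` reports the number of enabled copy engines, and compute capability 3.5 or above is what Hyper-Q and Dynamic Parallelism require.  (This tells you what the driver exposes, not what the silicon could do if unlocked.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);

        // Compute capability 3.5+ => Hyper-Q and Dynamic Parallelism.
        // asyncEngineCount == 2  => dual copy engines enabled.
        printf("%s: compute %d.%d, copy engines: %d\n",
               p.name, p.major, p.minor, p.asyncEngineCount);
    }
    return 0;
}
```

Run this before believing any spec sheet: a "GT730M" reporting compute 3.0 is a GK107 in GK208 clothing.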

One piece of information I got from a source at NVIDIA is that they do not want the support headache of massive numbers of (GeForce) users relying on advanced features like these, so they restrict them to their own high-end cards.  For GPUDirect RDMA I can see some logic in this, in that you are also reliant on correctly modifying the driver of another card, and the motherboard architecture matters too.  Going through a PCIe bridge, such as a PLX device, works best; if you only have the CPU's own PCIe switch it may not work at all.  So complaints to NVIDIA could actually be the fault of the motherboard, motherboard drivers, chipset, third-party PCIe device, third-party-built NVIDIA GPU and so on: too many variables.  Unfortunately the Tesla K20 and Titan cards are too power-hungry for many applications outside high-performance computing, where a low-cost GeForce would work fine (think embedded systems).

Will the GK208 make it to desktop GPUs?  And, more to the point, will it have these K20 features enabled?  It depends who you read.  Here is a GT630 with the GK208?!?!  The moral of the story is that NVIDIA needs to get clear on which cards support what.  We need clear information and guarantees.  Prancing around smugly announcing the latest supercomputer populated with their CUDA cards is not enough.  The rest of us mortals are generally left speculating and making fools of ourselves in front of our customers after trying to convince them to leap into GPGPU.  End users are right to be concerned about product road maps, product lifetimes, and driver and feature support.