Replace cuBLAS calls with Thrust algorithms
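Instead of the cumbersome calls to cuBLAS, prefer Thrust algorithms. This most likely won't hurt performance: operations such as the dot product are memory bound anyway (each element is loaded once and used in a single multiply-add), so their runtime is limited by memory bandwidth no matter which library launches the kernel. In exchange we get a little less code, and code that is easier to maintain.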
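A minimal sketch of what the change could look like for a double-precision dot product, assuming the vectors are stored in `thrust::device_vector<double>`. The function names `dot_cublas` and `dot_thrust` are hypothetical and not taken from the codebase; they only put the two styles side by side.

```cuda
#include <cublas_v2.h>
#include <thrust/device_vector.h>
#include <thrust/inner_product.h>

// Before (hypothetical helper): dot product via cuBLAS. Needs a handle,
// raw device pointers, strides and an output argument.
double dot_cublas(const thrust::device_vector<double>& x,
                  const thrust::device_vector<double>& y)
{
    // Handle created per call only to keep the sketch self-contained;
    // real code would create it once and reuse it.
    cublasHandle_t handle;
    cublasCreate(&handle);

    double result = 0.0;
    // Default pointer mode is host, so the result lands in host memory.
    cublasDdot(handle, static_cast<int>(x.size()),
               thrust::raw_pointer_cast(x.data()), 1,
               thrust::raw_pointer_cast(y.data()), 1,
               &result);

    cublasDestroy(handle);
    return result;
}

// After (hypothetical helper): the same reduction with Thrust.
double dot_thrust(const thrust::device_vector<double>& x,
                  const thrust::device_vector<double>& y)
{
    // Runs on the device and returns the sum to the host.
    // The type of the initial value (0.0, a double) is the accumulation type.
    return thrust::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}
```

Build with `nvcc` and link against cuBLAS (`-lcublas`) as long as the "before" version is still around. Note that the initial value must be `0.0` rather than `0`: with an `int` initial value, `thrust::inner_product` would accumulate into an `int`.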
I'll run some benchmarks over the next couple of days, just to make sure my gut feeling is correct :^)