0

EDIT4: The problem seems to be much larger and I will be refraining from further investigating this type of EnsembleProblem on GPU. Below is the last working code (that has nothing to do anymore with the actual problem I want to solve) and what to do to lock up the GPU at 100% (it can be reset though), even after 10 minutes it has not finished a simple batch of 1000 solutions. The actual issue stays unsolved but I will pivot to my mulithreaded CPU solution, takes a bit longer for the values I need but at least I know how I can debug it.

using DiffEqGPU, DifferentialEquations, StaticArrays

function sys_gpu!(u, params, t)
    du1 = params[1] 
    du2 = params[2]
    return SVector{2}(du1,du2)
end 

function plateu_cycle_study_gpu()
    plateu_cycle::Float32 = 8.0f0
    w::Float32 = 0.34888f0
    tstart::Float32 = 0.0f0

    tend::Float32 = 2.0f0pi/w * (plateu_cycle+1.0f0)+1.0f0
    tspan = (tstart, tend) 

    params= @SVector [w, plateu_cycle]
    f0=1.0f0
    g0=1.0f0
    init_cond = SVector{2,Float32}(f0, g0)
    prob = ODEProblem(sys_gpu!,init_cond,tspan, params)

    plateu_cycle_end = 10.0f0
    amount = 1000
    plateu_cycle_study_values = collect(range(zero(Float32), plateu_cycle_end, length=amount))
    
    new_tend =  @. 2.0f0pi/w * (plateu_cycle_study_values+1.0f0)+1.0f0
    new_tstart = zeros(Float32, size(new_tend))


    function prob_func(prob, i, repeat)
        remake(prob, p=SVector{2}(prob.p[1], plateu_cycle_study_values[i]))
    end

    plateu_cycle_study_problem = EnsembleProblem(prob, prob_func=prob_func)
    @time sim = solve(plateu_cycle_study_problem, GPUTsit5(), EnsembleGPUKernel(0), trajectories=amount)
end
plateu_cycle_study_gpu()

After letting Julia completely recompile the code one can rewrite the remake line into

remake(prob, tspan=(new_tstart[i],new_tend[i]), p=SVector{2}(prob.p[1], plateu_cycle_study_values[i]))

results in locking up a 1080 strix according to the GPU Tweak III software. EDIT3: Versions of currently used packages are:

  [f68482b8] Cthulhu v2.8.5
  [071ae1c0] DiffEqGPU v1.26.0
  [0c46a032] DifferentialEquations v7.7.0
  [5ad8b20f] PhysicalConstants v0.2.3
  [91a5bcdd] Plots v1.38.8
  [90137ffa] StaticArrays v1.5.17

One can produce a dynamic function invocation error by changing the inital conditions into complex numbers easily (f0, g0 and init_cond declerations need to be changed). Which might have been one clue to the actual issue.

derdotte
  • 81
  • 7

2 Answers2

1
function nu(t::Float64, plateu_cycle::Float64, w::Float64, px::Float64, py::Float64)
    return -1im * e * Vectorpotential(w*t,w,plateu_cycle)*exp(2im*Energy*t) * ( px*py /(Energy*(Energy+1))+1im*(1-py^2 / (Energy*(Energy+1))))
end
function kappa(t::Float64, plateu_cycle::Float64, w::Float64, py::Float64)
    return 1im*e*Vectorpotential(w*t,w,plateu_cycle)*py/Energy
end

Those closures are not allowed in GPU kernels. But they also aren't necessary. Just calculate the values directly and it should work.

Chris Rackauckas
  • 18,645
  • 3
  • 50
  • 81
  • I did not know about the closures. It did not fix the issue. I have since updated the post above with your first fix. It still fails at the same spot with the same stacktrace. I suspect there is still something else. – derdotte Mar 15 '23 at 13:08
  • Could it also be that my calls to Vectorpotential() are similarly not allowed? Would I need to inline this myself? – derdotte Mar 15 '23 at 13:20
  • I have since checked my other function calls as i am quite unsure what is throwing above's error. Inlined literally everything that is possible and the error still occurs. – derdotte Mar 16 '23 at 17:48
  • Delete a few lines, which line deletion ends up making it go away? – Chris Rackauckas Mar 17 '23 at 14:38
  • What package versions? – Chris Rackauckas Mar 17 '23 at 14:45
  • I have added the current versions to the main post as EDIT3, currently still checking when it might go away – derdotte Mar 17 '23 at 15:11
  • Thanks for your help @Chris-Rackauckas it seems there are more major issues that I just have no motivation anymore to debug. I have however still updated the post above with what i had found and is probably something to invest as a contributor to the libraries involved. – derdotte Mar 17 '23 at 17:28
0

If you try to run your code with the following (with fixed time-stepping):

sim = solve(plateu_cycle_study_problem, GPUTsit5(), EnsembleGPUKernel(0.0), trajectories = amount, save_everystep = false, adaptive = false, dt = 0.01f0)

It successfully compiles and solves instantly.

I tried to do some debugging, and it seems that using adaptive time-stepping with different time spans cause longer wait times, because of different amounts of work/thread, which is probably causing warp or thread divergence, causing a lock (I am not completely sure though). Fixed time-stepping ensure constant work/thread, although being not the optimal in respect of time-stepping.

I'll need to dig in more sometime and find a fix. Mind filing an issue to DiffEqGPU.jl ?

  • Thats really interesting. I am not sure I fully understand the Issue at hand. Therefore I am unsure wether I am qualified enough to head a bug report. Did you do it with the "edit 4"-code, while also implementing the remake line I posted? Or with the old code I had posted many edits ago? – derdotte Mar 20 '23 at 20:03
  • You can paste your example and describe your issue. I tried with your edit 4 code and with the remake line you posted separately. – Utkarsh Mar 21 '23 at 00:59
  • I have opened up an Issue: https://github.com/SciML/DiffEqGPU.jl/issues/247 – derdotte Mar 21 '23 at 07:45