
Running GPU projects concurrently

Started by dyeman, February 25, 2013, 10:15:14 PM


dyeman

The way that the HCC GPU WUs work is challenging: they alternate between being 100% CPU bound and highly GPU bound.  In the case of NVIDIA, a full CPU core is dedicated to the WU at all times (though I suspect it isn't really being used much while the GPU itself is working).

Typical HCC WU looks to me like this:

20 secs or so CPU
GPU use (depends on power of GPU - say 1 - 10 minutes??)
50 secs or so CPU
GPU use (depends on power of GPU - say 1 - 10 minutes??)
30 secs or so CPU

Looks like there are actually 2 GPU WUs packaged together, each with GPU activity sandwiched between 2 periods of intensive (CPU-only) activity.  This makes it difficult to maximise the use of the GPU.  One approach is to run multiple HCC WUs concurrently using app_info.xml or the (new) app_config.xml BOINC config files, so that the CPU-intensive parts of one WU hopefully overlap with the GPU-intensive parts of another.  This can help, however for me the multiple concurrent WUs often end up synchronised.  Also, running multiple WUs concurrently wastes CPU resources that could otherwise be doing productive work on CPU-only projects (WCG or otherwise).  Then add to this that the credit for HCC GPU work is abysmal compared to other GPU projects (GPUGRID, POEM, PRIMEGRID, DISTRRTGEN etc etc).

I started experimenting with a way to allow a GPU to remain highly utilised when running HCC WUs, using the new app_config.xml mechanism.  Unfortunately we had a 36 hour blackout in the Coffs Harbour area last week, after which I had to travel and couldn't continue with the experiments.  So I'm putting out my findings so far, and hopefully you guys will be able to work out whether any of it works or not.

The key to the idea is the app_config.xml file available in current beta BOINC versions (7.0.40 or later - I used the 7.0.52 beta in my tests and it has been stable and reliable for me).  app_config.xml is a simple configuration file that lets you specify CPU use and maximum concurrent tasks for any given application within a project.  So I figured you can use this to run a mix of tasks (especially a mix of CPU-intensive and GPU-intensive tasks concurrently) to try to ensure that the GPU remains highly utilised.  For example, when HCC WUs are in their CPU-intensive stages, other GPU WUs can run to keep the GPU active.

I got as far as verifying that the approach works at face value (WUs don't fail), but not whether there are any major "gotchas" that make it not useful/worthwhile.  I'm sure some of you will be able to answer the question of whether this approach works or not!

There are 2 key parameters in app_config.xml: <max_concurrent> and <gpu_usage> (marked below):
<app_config>
  <app>
     <name>uppercase</name>
     <max_concurrent>1</max_concurrent>    <!-- key parameter -->
     <gpu_versions>
         <gpu_usage>.5</gpu_usage>         <!-- key parameter -->
         <cpu_usage>.4</cpu_usage>
     </gpu_versions>
   </app>
</app_config>

My idea is to set a low value for <gpu_usage> and instead use <max_concurrent> to control what runs.  For example, set <gpu_usage> to .1 for all GPU apps (theoretically allowing 10 GPU apps to run concurrently), then use <max_concurrent> to control the mix of applications - say max_concurrent of 1 or 2 for HCC, combined with 1 for DISTRRTGEN, on the assumption that when HCC is in its CPU-intensive mode, distrrtgen will use the GPU.  I tried this, setting HCC to 1 or 2 concurrent and distrrtgen to 1 concurrent (and also POEM to 3 or 4 concurrent) in various mixes.  It seemed to work fine (WUs from different projects ran concurrently, completed, and didn't crash), but I didn't have time to figure out whether it was overall worthwhile in increased device utilisation and throughput, and I don't have access to my machines to cut and paste an app_config.xml sample here.
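From memory, though, the shape of it was something like this - a sketch, not my actual files.  The <name> values below are placeholders, so check client_state.xml in your BOINC data directory for the exact app names your projects use, and remember that each project gets its own app_config.xml in its own folder under projects\.

In the World Community Grid project folder:

<app_config>
  <app>
     <name>hcc1</name>                    <!-- placeholder - verify the HCC app name -->
     <max_concurrent>2</max_concurrent>   <!-- at most 2 HCC tasks at a time -->
     <gpu_versions>
         <gpu_usage>.1</gpu_usage>        <!-- low, so max_concurrent does the limiting -->
         <cpu_usage>.5</cpu_usage>        <!-- a scheduling estimate, not a hard cap -->
     </gpu_versions>
  </app>
</app_config>

In the DistrRTgen project folder:

<app_config>
  <app>
     <name>distrrtgen</name>              <!-- placeholder - verify the app name -->
     <max_concurrent>1</max_concurrent>   <!-- 1 task to soak up the GPU during HCC's CPU phases -->
     <gpu_versions>
         <gpu_usage>.1</gpu_usage>
         <cpu_usage>.2</cpu_usage>
     </gpu_versions>
  </app>
</app_config>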

Since I can't continue with my research for now, I hope anyone interested can take up the cudgels!  My 2 PCs still alive after the weekend blackout are running a mix of HCC and POEM and/or DISTRRTGEN WUs on my ATI cards (not bothering with HCC on NVIDIA - it uses too much CPU).

Hope this makes sense!

>> addendum:

Just to clarify: the max_concurrent applies to all resources for a given named application - in the case of HCC, that is CPU + AMD GPU + NVIDIA GPU.  So you need to take HCC CPU tasks into account when running HCC on a GPU.  For my part, I selected non-HCC projects only for my CPU resources.


LawryB


Thanks Dyeman, that is very interesting. +1


kashi

#2
That's an interesting way to do it. Thanks dyeman for sharing your crunching knowledge.

I haven't got a more recent model AMD card with the GCN architecture, so WCG HCC tasks run relatively much slower on my HD 5870. This means that multiple HCC GPU tasks are less likely to always be CPU-phase synchronised, because the time spent GPU processing on each task is longer than the time spent in the CPU-only phase of the task.

I only run 4 concurrent GPU tasks and I start them at staggered times so they are not synchronised. They do drift back to being in synch but it takes a while and I readjust as needed every few hours. If they end up synchronised without my interference I just cop it sweet with the longer task runtime. I haven't been running any GPU work during sleeping hours for a long time now as the noise is disturbing, especially on projects where GPU processing stops and starts.

I understand that on faster GCN AMD cards the GPU only processes for a very small percentage of the total HCC task runtime. I hadn't thought about how frustrating that may be for those who wish to extract the maximum usage and efficiency from their crunching GPU. I was aware that the inefficiency and very low credit rate of the HCC GPU application would come as an unwelcome surprise for those only used to other GPU projects, and I have posted about it in the past, but decided it was more diplomatic not to mention it recently. Besides, I'd be a sad sack if all I did was whinge. Disappointing, yes, but still much faster than doing HCC on the CPU alone, so still worthwhile for speeding up medical research. ;D

HCC really is a GPU-assisted CPU application rather than a traditional GPU application, which is why CPU clock speed and IPC have such a large influence over total task runtime.

By accident I have tried running different GPU applications concurrently. It may work OK on the GCN architecture, which has special features to support different GPU applications running at once, but it didn't work on my Cypress HD 5870. The tasks ran and completed without error, but one of them was starved of resources and slowed down unacceptably.

I had forgotten completely about the busy-wait bug/feature of the NVIDIA OpenCL drivers on HCC. That would impact HCC crunching on NVIDIA cards a lot, especially where the CPU has fewer cores.

For the AA, I was about to pull the trigger on a $263 Sapphire HD 7870 XT, which is a cut-down Tahiti class GPU (Tahiti LE). Thought it may be quiet enough to run overnight, but decided I'd spent too much on computer stuff, so bought a few cardigans and jumpers from Wool Overs and another cheap powerline networking doover instead. Have 3 EOP units now; they are great for linking the modem, computer and WD TV Live, so much better than wireless performance, which is often erratic in this house. :thumbsup:

JugNut

#3
Thanks Dyeman, food for thought.
   
Actually I've been finding a similar thing a pain in the you know where.  For example, last night I decided to finish up my POEM GPU WUs and start doing some WCG HCC ones to get ready for the AA. So before the POEM finished I downloaded a bunch of HCC GPU units and let them sit there waiting to run, with the idea that when the POEM finished the HCC would start.  But when I looked in some time later, the POEM GPU WUs and HCC GPU WUs were running mixed together, crunching away on the one graphics card, with some of the POEM WUs still crunching after over 1 1/2 hours (3 times as long as normal).

Both the POEM & HCC GPU WUs are using app_config.xml for multi-unit crunching. (It only behaves this way with app_config.)
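(For anyone wanting to copy the multi-unit part, it's just a fractional <gpu_usage> - something like the below in the POEM project folder. The app name here is from memory, so verify it against client_state.xml.)

<app_config>
  <app>
     <name>poemcl</name>            <!-- from memory - verify in client_state.xml -->
     <gpu_versions>
         <gpu_usage>.25</gpu_usage> <!-- a quarter of the GPU each, so 4 WUs run at once -->
         <cpu_usage>1</cpu_usage>   <!-- scheduling estimate only, not enforced -->
     </gpu_versions>
  </app>
</app_config>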

So now I have to suspend any waiting work or they'll mix together, which means more manual fiddling, and if I forget to un-suspend them while I'm out or asleep I can miss hours of work time because the next WUs are suspended.

Maybe, as Dyeman is suggesting (I think), if you use app_config to control exactly how many run at once it may work out, but it would take a fair bit of time & effort to figure out the right mix for each person's graphics card.  Still, if you get it to work as desired it would be fantastic.

Please keep us informed of any progress - that's some great lateral thinking, that's for sure.

Mmm, maybe the <max_concurrent> parameter can also help me stop this happening when I don't want it to..

I'll fiddle later when I get some time.


 - Participated in AAs 27 - 55 & Team Challenge #1.
My team (Boinc@Australia) stats
My personal stats


     Crunching today for a better tomorrow...