
POEM GPU

Started by kashi, December 27, 2011, 12:15:27 PM


kashi

Quote from: veebee on December 27, 2011, 09:25:47 AM
... how is all this POEM credit coming in so quickly ???....

Yes Windows POEM ATI on a HD 5870.
Needs BOINC 7.0.2 or 7.0.3 and an OpenCL Catalyst driver. The application is inefficient and only runs at 39% GPU load with my card/driver combo, so I run multiple tasks concurrently with an app_info.xml file. Maximum GPU load of 78% was reached at 4 concurrent tasks. I run 6 concurrently (<count>0.16</count>); it is about the same efficiency as 4 but a bit more rewarding under CreditNew. Currently 227K per day but dropping all the time.

If you want to give it a go here's an app_info.xml for 4 concurrent tasks, ncpus values may need increasing if you go above 4 concurrent:

<app_info>

   <app>
      <name>poemcl</name>
      <user_friendly_name>POEM++ OpenCL</user_friendly_name>
   </app>

   <file_info>
      <name>poemcl_0.1_windows_intelx86__opencl_ati_100</name>
      <executable/>
   </file_info>
   
   <app_version>
      <app_name>poemcl</app_name>
      <version_num>1</version_num>
      <plan_class>opencl_ati_100</plan_class>
      <avg_ncpus>0.25</avg_ncpus>
      <max_ncpus>1</max_ncpus>
      <flops>2.1e10</flops>
      <coproc>
         <type>ATI</type>
         <count>0.25</count>
      </coproc>
      <cmdline></cmdline>
      <file_ref>
         <file_name>poemcl_0.1_windows_intelx86__opencl_ati_100</file_name>
         <main_program/>
      </file_ref>
   </app_version>

</app_info>

veebee

December 27, 2011, 02:59:41 PM #1 Last Edit: December 27, 2011, 05:04:33 PM by Dingo
hmmm,

trying it out now on my i7 with 2 x 5850's, but there are only two WU's (Poem Open CL++) running - one per GPU ( 0.97 cpus + 1 GPU )


(Just to be sure, the app_info.xml file DOES go in the BOINC directory in the "Program data" folder ??? I put it in the BOINC folder in "Program files x86" as well after I noticed it wasn't working.)


I did shut down the client before/after adding the app_info file, and I also updated the BOINC client to 7.x.x ....

Would like to work this out so I can run it on the 6950, 2 x 5850 and 1 x 4770...


EDIT: so as to not go "off topic" here, we can move this to PM if you want Kashi - thanks in advance..

EDIT:  I moved them :)

kashi

December 27, 2011, 06:19:12 PM #2 Last Edit: December 27, 2011, 07:46:13 PM by kashi
Thanks for moving GPU info Dingo.

Fine tuning an app_info.xml file to suit your number of GPUs and how many CPU cores you have available to support GPU crunching gets easier with experience. With no <cmdline> parameters available for this application yet, there are only 3 values to experiment with:

1. <count></count>  - This controls how many tasks are run on each GPU core.
 
Examples:
<count>1</count>  - default one task per GPU core
<count>0.5</count> - 2 tasks per GPU core
<count>0.33</count> - 3 tasks per GPU core

2. <avg_ncpus></avg_ncpus> - This controls how much of a CPU core is available to support GPU crunching.

Examples:
<avg_ncpus>0.97</avg_ncpus> - POEM application default, 97% of a CPU core
<avg_ncpus>0.5</avg_ncpus> - 50% of a CPU core
<avg_ncpus>0.3</avg_ncpus> - 30% of a CPU core

3. <max_ncpus></max_ncpus> - This controls the total amount of CPU core(s) available to support GPU crunching. Usually for best performance it should equal the number of concurrent tasks specified by the <count> value multiplied by the amount of CPU core available to support each task as specified in the <avg_ncpus> value. Therefore for a single task the <avg_ncpus> and <max_ncpus> values are the same, but the <max_ncpus> value is a multiple of <avg_ncpus> when more than 1 task is run concurrently.

Example:
4 concurrent tasks (<count>0.25</count>) multiplied by <avg_ncpus>0.25</avg_ncpus> equals <max_ncpus>1</max_ncpus>
4 concurrent tasks per GPU core with 25% of a CPU core available per task should have a total amount of CPU available of 1 full CPU core per GPU core.
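
To make the relationship between the three values concrete, here is a sketch of the <app_version> block from the file posted earlier, adjusted for 3 concurrent tasks per GPU. The values are illustrative assumptions rather than tested settings: a third of a CPU core per task is a guess and may need raising if the tasks run short of CPU.

```xml
<app_version>
   <app_name>poemcl</app_name>
   <version_num>1</version_num>
   <plan_class>opencl_ati_100</plan_class>
   <!-- assumed: about a third of a CPU core per task -->
   <avg_ncpus>0.33</avg_ncpus>
   <!-- 3 tasks x 0.33 = roughly 1 full CPU core per GPU -->
   <max_ncpus>1</max_ncpus>
   <flops>2.1e10</flops>
   <coproc>
      <type>ATI</type>
      <!-- 0.33 of a GPU per task = 3 tasks per GPU -->
      <count>0.33</count>
   </coproc>
   <cmdline></cmdline>
   <file_ref>
      <file_name>poemcl_0.1_windows_intelx86__opencl_ati_100</file_name>
      <main_program/>
   </file_ref>
</app_version>
```

Only <count>, <avg_ncpus> and <max_ncpus> differ from the 4-task file; the rest of app_info.xml stays the same. The comments are there for the reader and can be deleted.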


The usual courtesy of aborting any unfinished tasks in your cache before installing an app_info.xml file does not apply to CreditNew projects lest you invoke the anti cherrypicker mechanism. Here's a copy of the PM I sent to veebee, in case anyone else wants to give the new POEM ATI GPU application a go:
"app_info.xml file should go in "C:\ProgramData\BOINC\projects\boinc.fzk.de_poem" folder. Then restart BOINC. If you have any tasks still in your cache, they will probably be deleted by BOINC when you restart, but it should then download new tasks.

Efficiency may vary with different Catalyst versions, model of GPUs and number of GPUs, so app_info.xml file may need to be slightly different for different configurations. With 2 GPUs you may need to change ncpus value but see how the sample one I posted goes first. That one specifies one quarter of a CPU core per GPU task so a single GPU running 4 tasks would use one full CPU core and 2 GPUs would use 2 full CPU cores.


Good luck."

Edit: POEM ATI OpenCL application has the ATI/AMD OpenCL "busy wait" bug on HD 5970s and will use a full CPU core per GPU core. At least it does with the Catalyst version I am using. Possibly may apply to HD 6990s too if anyone in the team has one of those. Usual workaround using Process Lasso should work if you can't spare the extra CPU core.

LawryB


Thanks Kashi, much appreciated.  Karma is well deserved.

veebee

December 28, 2011, 01:48:56 AM #4 Last Edit: December 28, 2011, 02:10:17 AM by veebee
Yep, thanks Kashi... finally got the machine with the 2 x 5850's going... crunching 4 WU's of goodness per GPU !!! (looks lovely - 8 ATI WU's ticking over at the same time !).

But this main rig, with 1 x 6950 and 1 x 4770, is having all sorts of problems, apart from having what seem to be connection issues (Facebook isn't the "normal page"; it is crappy-looking text way over on the left edge of the screen, though B@A Forum etc. is OK). Also, I cannot get onto my credit union site (but can from the computer 10 metres away), and every second attempt of BOINC to get work etc. is greeted with a message about failure to contact the peer or something (should have copied it before resetting again).

Using client 7.0.3 on both machines..

Now, even with all other projects suspended and "no new work", I cannot GET work from POEM by asking, and I cannot get POEM to ask for any work ITSELF .... wondering if THIS section of the event log at host startup has something to do with that... especially the bottom two lines -

28-Dec-11 1:36:05 AM |  | ATI GPU 0: AMD Radeon HD 6900 series (Cayman) (CAL version 1.4.1385, 1024MB, 992MB available, 5914 GFLOPS peak)
28-Dec-11 1:36:05 AM |  | ATI GPU 1: ATI Radeon HD 4700/4800 (RV740/RV770) (CAL version 1.4.1385, 512MB, 480MB available, 2150 GFLOPS peak)
28-Dec-11 1:36:05 AM |  | OpenCL: ATI GPU 0: Cayman (driver version CAL 1.4.1385 (VM), device version OpenCL 1.1 AMD-APP-SDK-v2.5 (709.2), 2048MB)
28-Dec-11 1:36:05 AM |  | OpenCL: ATI GPU 1: ATI RV770 (driver version CAL 1.4.1385, device version OpenCL 1.0 AMD-APP-SDK-v2.5 (709.2), 512MB)
28-Dec-11 1:36:05 AM |  | ATI GPU is OpenCL-capable
28-Dec-11 1:36:05 AM | Poem@Home | Found app_info.xml; using anonymous platform
28-Dec-11 1:36:05 AM | Poem@Home | File referenced in app_info.xml does not exist: poemcl_0.1_windows_intelx86__opencl_ati_100
28-Dec-11 1:36:05 AM | Poem@Home | [error] State file error: missing application file poemcl_0.1_windows_intelx86__opencl_ati_100

EDIT:  worked it out... somehow, the program file mustn't have downloaded... I did notice that POEM didn't go from the bottom of the projects up to its normal place after "Initialization" ... sure enough.. there was nothing but the app_info file in the POEM project folder.... detached and re-attached and bingo !

Problem is, as I couldn't get POEM to ask for work before that, I noticed that WCG seemed to be "stuck" requesting an update... so I detached on this machine... now I cannot re-attach as "project is temporarily unavailable" !! GAAH !


But, thanks Kashi, ( a well deserved + 1 ) as POEM is cranking away now on these two machines ( 4 GPUS...x 4 WU's !!) .. now one final question (you DO seem to have vast amounts of knowledge on this stuff) - does POEM prefer high/er clock speeds ? high/er memory speed ? shader speed ?

Cheers,
VB

kashi

That's great, glad you have it running.  :congrats You'll be zooming on POEM now! Team member Johnny8380 is also running 4 concurrent tasks now. I could have given him the info if he had asked here but he got it off the POEM forum anyway so no problem.

Quote from: veebee on December 28, 2011, 01:48:56 AM
.... now one final question (you DO seem to have vast amounts of knowledge on this stuff) - does POEM prefer high/er clock speeds ? high/er memory speed ? shader speed ?....

No, I just have lots of time to read what others discover. My app_info.xml fu is mainly trial and error, I couldn't even work out how to compile one correctly and just modified the one posted on a German team forum.

As for higher GPU clock speeds I'm not sure, you could try it yourself if you wished.

In the normal case of an efficient application a faster core clock speed would result in more work being done. However this application was probably developed on low or mid range cards and because of that it becomes less efficient (lower GPU load) on higher model GPUs as core clockspeed increases. It is throttled because of a mismatch between the speed of the OpenCL application and the speed the CPU polls the GPU. How do I know this? Because my HD 5970 at 725MHz took the same time to complete a single task as my HD 5870 at 870MHz. This is probably similar to what gamers call a configuration that is CPU bound. The application needs to be modified to more efficiently make use of the higher potential performance of faster GPUs. Either that or <cmdline> parameter options need to be provided to enable tuning the application to suit different hardware configurations and user preferences. Similar to the wait factor and kernel frequency parameters available to Collatz or the --gpu-polling-mode and --gpu-target-frequency parameters available in Milky.

The other thing that affects this is the efficiency varies a great deal depending on which Catalyst driver is used.

The same throttling at higher GPU core clock was evident when the Collatz GPU application was being developed. I had a HD 4890 and other people testing had HD 38xx, HD 4850 and HD 4870. I couldn't get my GPU load higher than 83-87% while people with slower cards were getting 95-99% load. I requested Andreas Przystawik provide me with an optimised application so that I could remedy this by adjusting the <cmdline> parameters and he agreed and Slicker kindly made it available to download. 

As for memory speed I also don't know. I reduced my memory speed to 900MHz from the default 1250MHz of my card, but don't know if it makes a difference. I am running at the default 870MHz core clock of my HD 5870. It is a Vapor-X model so default is 20MHz faster than reference cards. At 78% GPU load it's only pulling about 24A on VDDC, so it is nowhere near as efficient as a CAL application such as MilkyWay or Moo! which pull over 70A at 99% load. Assuming default voltage is being used, higher current is usually a more reliable way to judge how completely the GPU hardware is being used (efficiency) than GPU load %. Lower current usage is great for keeping cards cooler and using less electricity though. My GPU core is running at only 57-59 °C on POEM; on Milky or Moo it would be at 70-75 °C.

I think increasing the core clock should in theory get you more credit, however, even if it doesn't increase efficiency. This is because CreditNew uses the reported GFLOPS peak value to compute granted credit. Well, I think it does, at least until it gets averaged. I don't really understand how CreditNew "works", but you don't have to understand the theory to know that in practice it is inconsistent, unfair and totally unsuitable for GPU applications. In practice the slower you complete a task the more credit you get with CreditNew, so there is not much real benefit credit-wise in overclocking your card. If you can manage to increase the efficiency by increasing your GPU speed, it may allow you to process a bit more science though. :thumbsup:

It is my hope that Timo introduces fixed credit that encourages efficiency rather than this inconsistent varying credit system that rewards and encourages inefficiency.

If you sort out your work fetch problems with BOINC 7.0.3, let me know how you did it please. I can get work but only a little each time and only when the cache is dry.

veebee

wow... that crazy credit rate didn't last long !!!

down to around 3000 per WU now....  :cry2:  although I must say I DID get a nice chunk of credit overnight from it.

kashi

3,000 per task should give you about 350K per day for your 2 5850s. Seems reasonably generous for an inefficient application, how does that compare to what your 5850s receive per day from MilkyWay, Collatz or PrimeGrid ATI GPU applications?

I'm currently getting 2,800-3,000 per task depending on the runtime. Except for a few super credit spiker tasks for the early adopters, the trend has generally been gradually downwards, although there was a little upward movement earlier today. Any movement in credit rate usually happens with each new batch of tasks downloaded. CreditNew adjusts the Estimated computation size seen in the task Properties according to how long you take to complete a task. As the Estimated computation size increases with each new batch, the credit amount per task usually decreases. It depends on the average of other contributors too somehow, so more new contributors may slow the rate of credit decrease, or sometimes it even increases a bit. Or something weird like that. ???

veebee

The big chunk of credit the other night was from a whole lot of Moo Wrapper credit that "went missing" the day before...

I think I was getting a fair bit more from Milkyway (per day), but work has been a bit rare at Milkyway lately.... too dangerous to just leave IT open and no other GPU project open for work (projects like Moo Wrapper always seem to take over my machines if left to their own devices !).

I was hoping to get a day of just POEM in so as to see what sort of output the 4 cards have, but managed to get Milkyway work this morning, so that little "project" will have to wait.

kashi

Milkyway GPU credit rate is based on SETI credit rate with a 2X multiplier for the extra hardware and processing necessary for double precision. POEM GPU application does not use double precision and has a maximum GPU load of 78% and a VDDC current draw of 25A on my card running multiple concurrent tasks. MilkyWay runs at 99% GPU load with a VDDC current draw of over 70A. At this time POEM GPU credit rate is a bit higher than MilkyWay credit rate on my HD 5870.

Default single task configuration of the POEM GPU application results in low GPU load, less heat, less electricity usage and less interference/sluggishness with the GUI and other graphics applications, which is good especially for those new to GPU processing. However, there is a price to pay for OpenCL inefficiency aside from the obvious one of less science being processed than with a more efficient GPU application.

High CPU usage with OpenCL GPU applications plus OpenCL System CPU overhead means less CPU resources available to use for CPU projects. Those with only quad core computers are more affected than those with 6 or 8 core computers. The effect is multiplied for those with multiple GPUs. Plus there's the OpenCL "busy wait" bug on dual ATI/AMD GPUs.

Sometimes GPU applications are developed on older or slower GPU hardware and suffer from incorrect GPU polling settings and/or chunk sizes when run on newer, faster GPU models. Even if the fastest hardware available at that time is used to develop the application, newer, faster GPU models can be introduced after the application has been released.

The availability of <cmdline> parameters to fine tune a GPU application to suit different models of GPU with varying performance is a useful courtesy which enables increased efficiency and is valued by keen GPU contributors. Those who purchase an expensive GPU expressly to contribute to a GPU project may be less than impressed with an OpenCL application that runs at 50% GPU load or less.

Such inefficiency can sometimes be partly overcome by running multiple concurrent GPU tasks but that uses much more CPU resources and may have other consequences too.

I enjoy WCG badge hunting but rewarding and encouraging inefficiency is irresponsible, whether it be with Runtime badges or with a variable credit system that has repeatedly shown that it is inconsistent, unfair and subject to wild spikes and dips. The "it should stabilise in a few days" mantra we have heard so many times now is a poor excuse for a system that is fundamentally flawed. Granting 62,666 credits for a single throttled task processed in 30 hours to one person and 2,000 credits or less to someone else for the same size task processed quickly is just wrong, there is no way to excuse it or justify it.

veebee

December 31, 2011, 11:47:21 AM #10 Last Edit: December 31, 2011, 11:54:26 AM by veebee
Well, for some reason I think that is it for POEM on the GPU on this main machine... yesterday, something went wrong and these warning dialogue boxes (small, centre screen) kept popping up, something about POEM and an error of some sort (don't you just HATE it when people are too stupid to READ those messages properly when they are trying to fix the problem ??!!  :hbang: ).

Anyway, SOMEhow, that managed to detach POEM from the client/manager (apparently so - no longer in BM), and I then had the same sort of problems as I did with WCG.

I made sure that Peerblock was off (now uninstalled) but still no luck - so I turned off Windows firewall and it allowed me to reconnect with POEM.

After much backwards and forwards (or, to-ing and fro-ing) I finally repeated everything I did to get the GPU CL++ WU's going:

Boinc client version 7.0.3
Uninstalled and re-installed Catalyst





I awoke this morning to see what I thought was heaps of completed POEM WU's.. but they were computation errors.

When I try to get new work, I get THIS MESSAGE:

31-Dec-11 11:26:09 AM | Poem@Home | Message from server: Your app_info.xml file doesn't have a usable version of POEM++ OpenCL version.
31-Dec-11 11:26:09 AM | Poem@Home | Message from server: Your app_info.xml file doesn't have a usable version of POEM++.


So I went through it all again... and ended up with a machine that then wouldn't boot past the Windows 7 logo etc, so had to do a system restore.

Have done all the "prep work" again (as above):

- updated BOINC to Version 7.0.3
- shut down the client
- created new app_info file and put in POEM project folder
- tried detaching and re-attaching to POEM
- restarted BOINC (noted two lines for each card in event log at startup):

* 31-Dec-11 11:33:15 AM |  | ATI GPU 0: AMD Radeon HD 6900 series (Cayman) (CAL version 1.4.1385, 1024MB, 992MB available, 5914 GFLOPS peak)
* 31-Dec-11 11:33:15 AM |  | ATI GPU 1: ATI Radeon HD 4700/4800 (RV740/RV770) (CAL version 1.4.1385, 512MB, 480MB available, 2150 GFLOPS peak)
* 31-Dec-11 11:33:15 AM |  | OpenCL: ATI GPU 0: Cayman (driver version CAL 1.4.1385 (VM), device version OpenCL 1.1 AMD-APP-SDK-v2.5 (709.2), 2048MB)
* 31-Dec-11 11:33:15 AM |  | ATI GPU is OpenCL-capable

* 31-Dec-11 11:33:15 AM |  | OpenCL: ATI GPU 1: ATI RV770 (driver version CAL 1.4.1385, device version OpenCL 1.0 AMD-APP-SDK-v2.5 (709.2), 512MB)


- and I STILL get that same message:

31-Dec-11 11:34:20 AM | Poem@Home | Message from server: Your app_info.xml file doesn't have a usable version of POEM++.
31-Dec-11 11:34:20 AM | Poem@Home | Message from server: Your app_info.xml file doesn't have a usable version of POEM++ OpenCL version.
31-Dec-11 11:34:20 AM | Poem@Home | This computer has finished a daily quota of 153 tasks


Thought it might be about a new version of the OpenCL POEM app, but the app versions at their web site still look the same as those in the app_info.xml file to me.


Have I missed something really simple ?? I have, haven't I ??...



EDIT: UNBELIEVABLE !  of course, it waits until I have clicked "post" (as I was going through EVERYTHING again as I typed it in...) and as soon as I have posted, I thought I will try de/ re-attaching one more time... and of course it STARTS UP !!!

As someone once said... "I could just scream..."  :furious:

veebee

Well.... it WAS working, but only 1 WU per GPU... so I realised I had forgotten to put the app_info file back in the project folder.... did it and of course I get that error message when it's trying to get work.. (and also, all the WU's which HAD downloaded were gone.)

One thing I noticed (though it seemed to work fine before..) is what looks to be a double underscore between "intelx86" and "opencl".. :

   <file_info>
      <name>poemcl_0.1_windows_intelx86__opencl_ati_100</name>
      <executable/>
   </file_info>

this is driving me crazy...

kashi

January 01, 2012, 03:33:27 AM #12 Last Edit: January 01, 2012, 03:41:23 AM by kashi
Well, I can see why many of the later tasks errored: they ran under an earlier BOINC version, 6.12.34, that is not compatible with the POEM OpenCL application. When you rolled back to a restore point it must have gone back to an earlier version of BOINC.

Don't know why the earlier ones ran for too long and/or errored though. Without knowing further details, all I can think of offhand is that you are using 2 different GPUs, so no crossfire and no dummy plug. With the dual monitor method there is a possibility that the powersaving features of the Catalyst drivers turn off one of the cards and/or reduce the clock and memory speed on the other when you turn off the monitors. This had me stumped for a long time until I started using a dummy plug, and it still gave trouble on some projects.

When this happens, all kinds of weird stuff can occur: BOINC can create phantom GPUs so that you can have more than the number you have installed; any tasks running on the GPU that shuts down can start to run on the other one in addition to the tasks already allocated to the still running GPU; the low clock and memory speed of the powersaving GPU state can cause the tasks to virtually stall and take 6-8 hours to run instead of 1-2 hours; and when you turn the screen back on everything can be so sluggish and slow that it results in a screen freeze or bluescreen error.

It's one of the reasons I always recommended the use of dummy plugs instead of the dual monitor method.

You may suspect that this may be the case if the problems only occur when the screens are turned off by Windows or yourself. I haven't used 2 separate screens myself but had both cards connected to the one screen with different connector types, so I don't know whether you would need to leave both monitors on or just one. Another way to test would be to remove the HD 4770. If the problem did not reoccur then it's likely to be related to using 2 GPUs that are not crossfired and the use of a dummy plug may be a possible relatively easy fix that could save you a lot of time troubleshooting.

May be just easier to use that computer on other GPU applications if it runs OK on them. Your computer with 2 5850s appears to perform well on POEM and 2 5850s running on POEM is still currently amongst the top ranking POEM hosts and users in daily output.

Double underscore is correct in the application executable. Look in your Task Manager > Processes tab on your other computer and you will see it working there with the double underscore. And you can double check in C:\ProgramData\BOINC\projects\boinc.fzk.de_poem folder if you wish.
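
For reference, the executable name appears twice in app_info.xml, and both entries have to match the file sitting in the project folder character for character, double underscore included. A sketch of the two places, taken from the file posted earlier (the comments are added here for illustration only):

```xml
<file_info>
   <!-- must match the executable's file name in the project folder -->
   <name>poemcl_0.1_windows_intelx86__opencl_ati_100</name>
   <executable/>
</file_info>

<!-- ...and again inside <app_version> -->
<file_ref>
   <file_name>poemcl_0.1_windows_intelx86__opencl_ati_100</file_name>
   <main_program/>
</file_ref>
```

If the file itself is missing from the folder, BOINC logs "File referenced in app_info.xml does not exist", as in the startup log you posted earlier.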

Edit: If your GPU drops out for any reason (driver reset, etc.), when BOINC detects no GPU driver and hence no usable GPU it can spontaneously detach you from a GPU project. I had this happen to me a number of times on Collatz and once or twice on MilkyWay when I was using a HD 4890.

veebee

I think I will use it for other GPU work now (this machine).

It just seems strange that it was steaming along, both GPU's firing on all cylinders, but now they won't. (They crunch one WU each, but when I put the app_info.xml file in, I get those error messages.)

Anyways, no problem, just wish Milkyway would have a steady supply of work - it is my favourite project of those with GPU apps.

kashi

January 01, 2012, 11:47:37 AM #14 Last Edit: January 01, 2012, 11:50:47 AM by kashi
Milkyway (when available) for that box is a good idea. A mixture of two different ATI/AMD cards of different performance may not be ideal for the POEM GPU application.

The HD 4770 may not have sufficient resources to reliably run 4 tasks concurrently like your other cards. Although it works OK on one task at a time, the lesser amount of video memory and/or number of stream processors may not support processing 4 tasks at a time. If the 4770 is running at capacity on 4 concurrent tasks any little glitch or extra load could flip it into producing errors or slowing down unacceptably. You could try 2 at a time if you wished by changing the <count> value to 0.5 in the app_info.xml. With this initial POEM GPU application, running a single task on the Cayman part of that computer is just too inefficient for a keen cruncher.
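
For the 2-at-a-time test, only the values below would change inside the <app_version> block of the file posted at the top of the thread. This is a sketch: half a CPU core per task is an assumed starting point, and as far as I know the <count> value applies to every ATI GPU in the box, so the Cayman would drop to 2 tasks as well.

```xml
<!-- inside <app_version>; everything else unchanged -->
<avg_ncpus>0.5</avg_ncpus>   <!-- assumed: half a CPU core per task -->
<max_ncpus>1</max_ncpus>     <!-- 2 tasks x 0.5 = 1 CPU core per GPU -->
<coproc>
   <type>ATI</type>
   <count>0.5</count>        <!-- 2 tasks per GPU -->
</coproc>
```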

Yes, mucking around with app_info.xml files can be frustrating sometimes. Do anything wrong and whoosh, all the tasks in your cache disappear. If you lose too many tasks on the same platform and then detach and reattach, BOINC server software may flag your computer as unreliable and restrict the amount of work per day it will send until you return valid tasks. Just when you get it all working properly, you can't get a decent number of tasks to process. :hbang: It stops people with misconfigured computers trashing heaps of tasks day after day though.

Yes MilkyWay server has been notoriously unreliable for years and the tiny cache allowed compounds the issue. Fingers crossed that the new MilkyWay server will provide a more reliable supply of work in the future.