

app_config issue? Hardware issue?

Started by chooka03, March 02, 2019, 11:38:15 AM


chooka03

Got an interesting issue here. I noticed yesterday that regardless of whether it's Einstein@Home or Milkyway, one of my WUs has dropped off. Let me explain:

So if I use 0.33 in the config file, 3 WUs should run on each card. (This is my T/R system with the 2 x Vega 56s.) But card (0) will run 3 WUs while card (1) will only run 2 WUs. Even if I change the setting to 0.25, card (0) will run 4 WUs but card (1) still only runs 2 WUs??
Changing the CPU usage makes no difference.
Running "Read config files" makes no difference.
Restarting BOINC makes no difference.

I thought about removing the project and adding it again, but since the problem affects both projects I doubt that will help. I might also update my drivers, but I'm concerned it could be one of the cards. I have had issues with one of them lately: it crashes under high temps. (I know this because I swapped PCI-E slots and it was the same card failing. Also, with the air con running, it doesn't crash.) I will pull it apart this weekend, if I get time, to clean the dust out of it etc.

Anyone got any ideas other than what I've mentioned?

Edit: Well, the driver update made no difference. :thumbdown:
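For reference, gpu_usage in app_config.xml is the fraction of one GPU that each task claims, so the number of tasks per card is roughly how many of those fractions fit inside 1.0. The Python sketch below models only that arithmetic. The tasks_per_gpu helper is hypothetical, and the real BOINC client also weighs CPU reservations, memory and its own scheduling, so treat it as a back-of-the-envelope check rather than the client's actual logic. On that model both cards should run 3 tasks at 0.33 and 4 at 0.25, so a card stuck at 2 is not explained by the setting alone.

```python
def tasks_per_gpu(gpu_usage: float) -> int:
    """Hypothetical helper: how many tasks with this gpu_usage fit on one GPU.

    Simplified model: keep adding tasks while their summed usage stays <= 1.0.
    """
    if not 0 < gpu_usage <= 1:
        raise ValueError("gpu_usage must be in (0, 1]")
    count = 0
    total = 0.0
    # Small tolerance so that e.g. 4 x 0.25 is treated as exactly 1.0.
    while total + gpu_usage <= 1.0 + 1e-9:
        total += gpu_usage
        count += 1
    return count


for usage in (0.5, 0.33, 0.25):
    print(f"gpu_usage={usage}: {tasks_per_gpu(usage)} task(s) per GPU")
```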

Dingo

I found this on Google:
Quote
By default BOINC only uses the best GPU if more than one of the same brand is found. The best being decided by (decreasing priority):

Nvidia

compute capability
software version
available memory
speed

AMD

double precision support
local RAM
speed
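The quoted rule is essentially a lexicographic comparison: compare on the first criterion and only fall through to the next one on a tie. A rough Python model of the AMD ordering is below. It assumes the client keeps every GPU that ties with the best one and that use_all_gpus=1 skips the filter entirely; the AmdGpu fields and the example specs are illustrative, not BOINC's real data structures. Under this model two identical Vega 56s tie, so both would be used even without use_all_gpus.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AmdGpu:
    # Illustrative fields mirroring the quoted AMD criteria (not BOINC's own types).
    name: str
    double_precision: bool
    local_ram_mb: int
    peak_gflops: float


def ranking_key(gpu: AmdGpu) -> tuple:
    # Decreasing priority: double precision support, then local RAM, then speed.
    return (gpu.double_precision, gpu.local_ram_mb, gpu.peak_gflops)


def usable_gpus(gpus: list[AmdGpu], use_all_gpus: bool = False) -> list[AmdGpu]:
    """Simplified model: without use_all_gpus, keep only GPUs tied with the best."""
    if use_all_gpus or not gpus:
        return list(gpus)
    best = max(ranking_key(g) for g in gpus)
    return [g for g in gpus if ranking_key(g) == best]


# Made-up specs: two identical cards tie, a weaker card is dropped.
vega_a = AmdGpu("card 0", True, 8192, 10000.0)
vega_b = AmdGpu("card 1", True, 8192, 10000.0)
weaker = AmdGpu("card 2", True, 4096, 6000.0)
print([g.name for g in usable_gpus([vega_a, vega_b])])
print([g.name for g in usable_gpus([vega_a, weaker])])
print([g.name for g in usable_gpus([vega_a, weaker], use_all_gpus=True)])
```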



Are you running too many CPU tasks and leaving too few threads free to feed the GPUs?

Do you have a cc_config.xml in the C:\ProgramData\BOINC folder? The <use_all_gpus>1</use_all_gpus> option tells BOINC to use all GPUs:

<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>

Make sure both cards are on the same driver version if they are the same make.

Just stuff I thought of off the top of my head.






Radioactive@home graph
Have a look at the BOINC@AUSTRALIA Facebook Page and join and also the Twitter Page.

Proud Founder and member of BOINC@AUSTRALIA

My Luck Prime 1,056,356 digits.
Have a look at my  Web Cam of Parliament House Ottawa, CANADA

chooka03

Thanks Dingo.
These are the same cards I've been running for months, years even. If I cut back CPU usage to 50%, it makes no difference.
In fact... things just got even weirder! I left it running E@H (because the stupid Milkyway website is down, as per usual), with card (0) running 3 WUs and card (1) running 2 WUs. Then one of the card (0) WUs looked like it was going to get stuck (over 1 day to complete), so I changed the config file back to 0.5 (2 WUs per card)....... but now I have devices 0, 1, 2 & 3!! :bloodshot what the!!
Oh...now I have a BSOD.

Wow.......something is really messed up.

I see there are some Windows updates. I'll try those, then maybe start reinstalling things. :fingers


I don't think I have that config file, Dingo. Does it go in the main BOINC folder rather than the project folders? And if I have 2 GPUs, do I change the 1 to a 2?

chooka03

Also, that config file, do I save it as app_config.xml?

I just removed Einstein@Home and reinstalled it. It ran with only 1 WU running. As soon as I loaded a config file with 0.5, it started crunching 5 WUs: 2 on device (0), 2 on device (1) and 1 on device (2)??
Bizarre.

chooka03

Here's another oddity: I have BOINC set to "Won't get new tasks" for Einstein, yet when I abort the WUs and update, it downloads more WUs??
Maybe I need to uninstall BOINC!

Dingo

Quote from: chooka03 on March 02, 2019, 01:06:57 PM
I don't think I have that config file, Dingo. Does it go in the main BOINC folder rather than the project folders? And if I have 2 GPUs, do I change the 1 to a 2?

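cc_config.xml goes in the C:\ProgramData\BOINC folder, not the project folder (save it as cc_config.xml, not app_config.xml). <use_all_gpus>1</use_all_gpus> tells BOINC to use all GPUs; it is an on/off switch, so 1 means all of them and you don't change it to 2.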
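Because use_all_gpus is a 0/1 switch, the same file covers two GPUs or ten. If you'd rather generate it than hand-edit it, here is a minimal Python sketch using only the standard library; build_cc_config and write_cc_config are hypothetical helper names. It overwrites whatever file you point it at, so merge by hand if your cc_config.xml already has other options. To be safe, restart the BOINC client afterwards so the GPUs are re-detected.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_cc_config(use_all_gpus: bool = True) -> str:
    """Return cc_config.xml text that sets only the use_all_gpus option."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # use_all_gpus is a 0/1 flag: 1 means "use every GPU", not "use one GPU".
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    ET.indent(root, space="  ")  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode") + "\n"


def write_cc_config(path: Path, use_all_gpus: bool = True) -> None:
    """Write the file to `path`, replacing any existing file there."""
    path.write_text(build_cc_config(use_all_gpus), encoding="utf-8")


if __name__ == "__main__":
    print(build_cc_config(), end="")
```

On Windows the call would be write_cc_config(Path(r"C:\ProgramData\BOINC\cc_config.xml")), which needs write access to that folder.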



chooka03

Thank you Dingo.
Man.... Milkyway@Home is really giving me the $&(^$ at the moment. The website has been down ALL day. No WUs are uploading, which is preventing me from uninstalling and reinstalling BOINC :compbash:
Einstein is messed up (or my GPUs are). My other PCs are running fine. It runs OK for a bit, then I get a BSOD: "DPC Watchdog Violation"! :faint:

Oh dear.

tazzduke

Hi Chooka03

I know I am late to the party, but are you using the AMD drivers, or has Win 10 done an update and given you the Microsoft device drivers?

Try a little utility called DDU (Display Driver Uninstaller); you can download it from Guru3D.

Then reinstall the latest drivers from AMD.

Cheers



 AA 24 - 53 participant

chooka03

Hi Tazzduke.

I actually updated the drivers but then rolled back to the previous version (19.1.1), and there's been no difference between the two. One thing I have noticed: I basically had to remove Einstein from my BOINC projects because even with "No new tasks" set, it continued to download work. I couldn't stop it!
After reinstalling Einstein, I haven't used any config file yet, and only one GPU is crunching. Shouldn't both cards be crunching 1 WU each?
I have copied and pasted Dingo's cc_config file, saved it as an .xml file and placed it in the BOINC folder.

For the record, this is the app_config I was using:

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>0.2</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
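If you want to sanity-check a file like this, the Python sketch below parses an app_config.xml string and reports each app's gpu_usage, cpu_usage and the tasks per GPU implied by the same fits-within-1.0 arithmetic. summarize_app_config is a hypothetical helper, apps without a gpu_versions/gpu_usage entry are simply skipped, and the task count is a simplification of what the client actually schedules. For the config above it reports 2 tasks per GPU for hsgamma_FGRPB1G, i.e. 2 WUs per card.

```python
import xml.etree.ElementTree as ET

APP_CONFIG = """\
<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>0.2</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
"""


def summarize_app_config(xml_text: str) -> dict:
    """Map each app name to its gpu_usage, cpu_usage and implied tasks per GPU."""
    summary = {}
    root = ET.fromstring(xml_text)
    for app in root.findall("app"):
        name = app.findtext("name")
        gpu_text = app.findtext("gpu_versions/gpu_usage")
        if name is None or gpu_text is None:
            continue  # assumption: no GPU override here, so project defaults apply
        gpu_usage = float(gpu_text)
        cpu_text = app.findtext("gpu_versions/cpu_usage")
        cpu_usage = float(cpu_text) if cpu_text is not None else None
        # Simplified model: as many tasks as fit within one whole GPU.
        tasks = int(1.0 / gpu_usage + 1e-9)
        summary[name.strip()] = {
            "gpu_usage": gpu_usage,
            "cpu_usage": cpu_usage,
            "tasks_per_gpu": tasks,
        }
    return summary


print(summarize_app_config(APP_CONFIG))
```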

Dataman

Quote from: chooka03 on March 02, 2019, 08:02:45 PM
Einstein is messed up (or my GPUs are). My other PCs are running fine. It runs OK for a bit, then I get a BSOD: "DPC Watchdog Violation"! :faint:

Yes, it does seem to be Einstein causing the DPC_WATCHDOG_VIOLATION. They also had some upload/download problems, but I think they have fixed that. I am running MW/Asteroids/Einstein on the GPUs (except the RTX), and the BSOD always happens when Einstein is running. It only happens to me about once a week, but I have had it on two of the Threadrippers. It is a GPU driver problem, but I am running the current Nvidia driver. Go figure.

MW is seriously FUBAR now with daily outages; Asteroids is out of work; SETI site is down too. What's a cruncher to do?  Bashhead


chooka03

Oh Dataman..... don't get me started on Milkyway!  :boom:
Long story short...... I've had to wipe my C: drive AGAIN and start over. I lost ALL my Milkyway WUs..... days' worth! :cry I'm so unhappy about that! What a waste. All because their stupid website/database is continually down.

The good news is (touch wood) the issue is now resolved. I can run 2 WUs per card again.
That was all very painful. DDU removed all the drivers, but then for some reason the new Radeon install couldn't find any hardware. After that, Windows failed to launch, like last time. It couldn't repair itself even after a few system restores, so I had to give up. Full reinstall again. :thumbdown:

Thank you for the info on the watchdog violation, DM. I didn't know about that.
Let's see if a fresh install of the OS, BOINC and GPU drivers fixes anything :/

I need a holiday.

Dataman

Quote from: chooka03 on March 03, 2019, 02:01:11 PM
Oh Dataman..... don't get me started on Milkyway!
Me too!
MW is my favorite project and I have supported it almost from the beginning. When the last batch of grad students departed to the "real world", things started to fall apart. They announced Jeff as their new computer guru, but it has just got worse. RPI is a big US university, and it is hard to believe they cannot acquire a decent storage array, as that seems to be the problem. Maybe they should have a car wash or a bake sale. :rofl:
I thought running MW/Asteroids/Einstein would ensure the GPUs always have work, but MW is FUBAR, Asteroids runs out of work weekly, and I get BSODs on Einstein. Maybe I should put all 15 on Collatz. :shock
Oh well ... tomorrow is a new day.  :wink


chooka03

Oh, for goodness' sake..... is the Einstein work generator down? I'm now getting no WUs :compbash:
Looks like it.

I'm just not meant to crunch any GPU tasks, am I, lol Bashhead

Dataman

Ha ha, the new day is worse than the old one.   :wink

Yes, Einstein has no GPU work. SETI is back up but sloooow.

Well, mates, when one gets a bag of lemons, one makes lemonade. I turned on Collatz on all GPUs. :boom:

:panic:


chooka03

YEEEEEEEEE HAAAAAAAAA (Damn... no emoticon with a lasso) :jester:
At least you can test your max GPU output on Collatz Dataman.

I went onto PrimeGrid for my AP gold badge. I need more golds, and it's one of the few GPU WUs on PrimeGrid. I would have done SETI... might do that once I get my gold, but by then I imagine some of the other projects will be back up.

:AUS: :USA1