
Server Crashes

Started by Vajras, November 11, 2009, 03:38:02 AM

Vajras

News

November 7, 2009
Just noticed a problem with the assimilator crashing. It should be back up and running and work should start flowing again.

November 10, 2009
We're looking into why the server went down. Something about unrecoverable disk errors. Hopefully people haven't lost too much credit or anything like that.
--Travis

November 10, 2009
It looks like some serious problems happened. Right now I've turned off all the BOINC daemons until we can get the database restored to a previous backup (which should hopefully bring back a bunch of credit).
--Travis

Server Crash - Part 3
November 10, 2009
We have a backup from this morning, but it may have been taken after all the corruption. We're going to try it out and see if it helps anything.
--Travis

Server Crash - Part 4
November 10, 2009
It looks like I'm going to have to remove all the workunits and results from the database. So if you have any running, feel free to cancel them.

Server Crash - Part 5
November 10, 2009
We've restored the server from the last backup (which was this morning) so hopefully not too much credit has been lost. I still have to purge the database of all the workunits, unfortunately. We also need to order new hard drives for the server, so I'm not sure how stable things will be for the next week or so until we get them installed. But at least hopefully that explains the issues we've been having lately.
--Travis

Server Crash - Part 6
November 10, 2009
In order to save you guys more lost credits, I don't think we'll be starting up new work until we have replacement hard drives. What I've gotten from labstaff is that the drives are running in degraded mode and hurting really bad. They're telling us the reason for the problems has been the construction around campus at RPI, which has caused a lot of vibration in the computer labs and has wrecked quite a few hard drives. It seems we're not the only ones having similar issues. Hopefully we can have new hard drives in a day or two and get things back up and running.
--Travis

An Apology
November 10, 2009
We also really want to apologize for all the recent server issues and lost credit. Hopefully you'll all still be around when we get the server back up and work flowing again. I'll post more as soon as I know about hardware orders and what's going on.

[edit: Update]

    11/11/2009 5:06:38 a.m. Milkyway@home Reporting 24 completed tasks, requesting new tasks for GPU
    11/11/2009 5:06:46 a.m. Milkyway@home Scheduler request completed: got 0 new tasks
    11/11/2009 5:06:46 a.m. Milkyway@home Message from server: Server error: feeder not running

We won't be generating new workunits until we have working hard drives again. The ones running the forums are pretty crippled. If we started generating work again they'd most likely just crash.

It'll probably be a couple days until we get the new hardware and have work flowing again.
--Travis

:faint:

Dataman

:biggrin: It's not just a project ... it's an Adventure. :biggrin:
I do like the science this project is trying to do. I used to get angry due to their outages, but now I just feel sorry for them. They have little expertise, almost no staff and crappy equipment. It would be good if their project were hosted somewhere like WCG or Yoyo, where it could be administered properly. I stopped running it because I cannot monitor it hour by hour to see if it is still running.  :hbang:


Vajras

News

Update on Harddrives
November 15, 2009
Just letting everyone know we ordered new hard drives for the server last week, and hopefully they will be here soon. We're hoping to have everything back up and running within the week.
--Travis


Message 33316 - Posted 16 Nov 2009 18:15:49 UTC

Just wanted to say that I will be doing a post-doc here at RPI until at least the end of next summer. So MW will be around until then at the very least. :P
I should be able to get someone new up to speed by then, and I may be around even longer depending on funding and how my job search pans out.

Either way, MW has been a great source of publications and a wonderful research project, so I expect to keep working with it as long as they let me :P
--Travis