The Instructional email/SquirrelMail server (https://imail.eecs) and the SVN server (https://isvn.eecs) were down from about 8am-11:30am. Email that was sent to the mail server during that time was queued by the sender for later delivery. Some workstations in the labs also denied logins. These problems were on systems that had not been fully decoupled from our LDAP service, which went down again this morning. We are working to remove the remaining dependencies on LDAP.
(10:30am) This work has been completed. "inst.eecs" has been moved to a new server. We plan to shut this WEB server down for maintenance for a few minutes between 10am and 10:30am tomorrow (Tuesday Nov 18). This is to swap the disks to another server, in response to several recent crashes of the existing server. We suspect from the kernel logs that the crashes are caused by bad memory, or possibly a bad motherboard. The server is a Dell 1850 circa 2007. The server rebooted itself 3 times on Nov 17 (9:30am, 1pm, 4pm) and once last Friday (6pm), with downtimes of 10-30 minutes each. This was unrelated to the LDAP problem (below), which has not reoccurred since Nov 12.
Nov 14: For an analysis of the severe downtime events this semester, please see "Analysis of the Repeated Downtime Events" in https://inst.eecs.berkeley.edu/~inst/reports/?file=Fall_2014.pdf. Please post any questions to email@example.com. Thank you for your patience.
Nov 13: We still have a delay in changing passwords, so we recommend that you keep using the one you have for another week or so (we'll update that here).
Nov 12: (12:30pm) The EECS Instructional UNIX systems are stable again, after several weeks of periodic downtime. If you were unable to log in to any of our UNIX systems recently, please try again. If it still fails, please tell us (firstname.lastname@example.org) which computer you are trying. Our password server has been down a lot lately, and that confuses people into thinking they have forgotten their passwords. We are updating local password files on our computers for the time being. We are testing a new LDAP service (retiring SUN LDAP, implementing OpenLDAP) and will order new servers for it. The current servers are circa 2002, which has contributed to their instability. See below for the symptoms and history of the problem.
(10:45am) An ongoing problem with NFS is preventing the course WEB pages from being accessed through this WEB server. (12:30pm) This problem has been fixed.
The http://inst.eecs WEB server was down from about 9-10:30pm tonight because the inst.eecs server rebooted itself. This seems to be unrelated to the LDAP problem (below), which has not reoccurred since 10:30am today.
(Tue Nov 11) The LDAP server went down again this morning and, although we thought we'd eliminated the dependence of our computers on it, that still (unexpectedly) broke the NFS link to the home dirs. We'll get that fixed tomorrow. In the meantime, we'll try to keep LDAP running.
(Mon Nov 10) The LDAP service was up and down this afternoon. We have installed local, static files on all of our UNIX systems so that we can take the load off of the LDAP server. While this will impose delays in any password changes, we hope it will keep things stable while we diagnose or replace it. The Imail mail server and SquirrelMail (http://imail.eecs) hang up when LDAP or NFS are down. Email that is sent to the mail server during that time is queued by the sender for later delivery.
The Instructional UNIX systems lost LDAP and NFS (user identification and home dirs) again on Sunday Nov 9 from about 5pm - midnight. The Instructional UNIX computers (Linux, Solaris, MacOSX) and WEB servers (Inst, ISVN, SquirrelMail) were also down. Please see below for symptoms and explanation. The server has been failing repeatedly: LDAP just stops answering, and it takes 30-60 minutes to restart it. We don't know why it got so bad this semester. We have tried tuning the timeouts and monitoring the client connections. We are testing a new version of the LDAP server software, on a newer computer with more RAM.
The Instructional UNIX systems lost both LDAP and NFS services (passwords and home directories) at about 3:45pm on Wednesday and were unstable until about midnight. The effect on our users was frozen UNIX login sessions or the inability to login, inaccessible home directories and inaccessible WEB sites on http://inst.eecs. The next day, we implemented work-arounds while we debug it. We regret the negative impact that this has had on our students.
The Instructional UNIX systems lost both LDAP and NFS services (passwords and home directories) from about 11am - 10:45pm today. The effect on our users was frozen login sessions or the inability to login. We had to restart a jammed LDAP server, which can take an hour as it rebuilds its database. This has occurred previously this semester, and we are trying to debug it.
The Instructional UNIX systems lost their LDAP password service at about noon today. The service was restored by 2pm (changed from 1pm...), as the redundant LDAP servers rebuilt their databases. The effect on our users was frozen login sessions or the inability to login. It also caused loss of access to some WEB pages on http://inst.eecs.berkeley.edu and delays in email delivery through imail.eecs.berkeley.edu.
The Instructional UNIX systems lost both LDAP and NFS services (passwords and home directories) from about 10am-10:20am today. The effect on our users was frozen login sessions or the inability to login. This was caused by a loss of connection to one of our LDAP servers and the time delay for the NFS server to automatically cut over to our redundant LDAP server.
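As a rough illustration of why a cutover to a redundant server is not instantaneous, here is a minimal Python sketch of a client that tries a list of servers in order. It is not the mechanism our NFS server actually uses; the function names, the host names in the usage note and the 5-second timeout are all assumptions made for the example. The point is only that each unreachable server ahead of a working one can cost up to one full timeout before the client moves on.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Server = Tuple[str, int]  # (host, port); hypothetical type alias for this sketch


def tcp_probe(server: Server, timeout: float) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection(server, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def first_reachable(
    servers: Sequence[Server],
    timeout: float = 5.0,  # assumed value, not a measured setting
    probe: Callable[[Server, float], bool] = tcp_probe,
) -> Optional[Server]:
    """Try each server in order and return the first one that answers, else None."""
    for server in servers:
        if probe(server, timeout):
            return server
    return None
```

For example, `first_reachable([("ldap1.example", 389), ("ldap2.example", 389)])` would wait on the first (placeholder) host for up to the timeout before trying the second.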
(July 29) There was a network problem from about 9:50am to 10:55am today that prevented our users from accessing their UNIX homedirs and Airbears. For more information: https://iris.eecs.berkeley.edu/news/11953-unplanned-outage-wired-and-wireless
(July 14) EECS network staff are performing load testing today to help prevent additional incidents as below. There will be intermittent moments of poor network performance at the EECS border as this occurs. For more information: https://iris.eecs.berkeley.edu/news/11893-intermittent-network-slowness-today
(July 9) The EECS Instructional systems experienced intermittent lost connections (for periods of a minute or so every few hours) to the UNIX home directories between July 3 and July 8. The symptom was that it would be slow to login while waiting for initial access to the home directories, then you might get 'command not found' errors if the "dot" files in your home directory had failed to run and set your path. The server support staff corrected this at about 3pm on July 8. For more information: https://iris.eecs.berkeley.edu/news/11813-degraded-performance-for-some-project
(June 25) EECS computers experienced intermittent network interruptions of up to several minutes between June 23 and June 25. There were dropped connections between the EECS network and the outside world (including the rest of campus and users on Airbears who are connected to EECS computers). This affected communication in both directions. The EECS network group posts updates at https://iris.eecs.berkeley.edu/news/11673-packet-loss-at-eecs-network
A major security risk has been identified in Microsoft Internet Explorer. Please use Firefox or another browser until a patch has been released. More information:
https://technet.microsoft.com/en-us/library/security/2963983.aspx
http://blogs.technet.com/b/srd/archive/2014/04/26/more-details-about-security-advisory-2963983-ie-0day.aspx
http://www.fireeye.com/blog/uncategorized/2014/04/new-zero-day-exploit-targeting-internet-explorer-versions-9-through-11-identified-in-targeted-attacks.html
http://www.usatoday.com/story/tech/2014/04/28/internet-explorer-bug-homeland-security-clandestine-fox/8409857/
On Tuesday April 15, 199 Cory will reopen as the newly renovated SanDisk Computing Lab. All students in EECS classes are welcome to use this comfortable and collaborative space, which includes 8 new PCs (Windows, 16GB RAM), seating for groups and laptop users and a large LCD display that you can use with your own portable device. Please also join us for the Opening Ceremony with SanDisk on Friday April 18 at 11am.
The EECS networks were restored to service at about 8:45pm. WEB pages on http://inst.eecs were accessible again at 9pm. star.cs.berkeley.edu was rebooted at 12:30am (Sunday) to reset NFS. The original announcement (March 27): All EECS computers will be inaccessible on Saturday March 29 from about 10AM - 6PM during scheduled maintenance to the EECS networks. The EECS network will be down for maintenance on Saturday, so our computers will be inaccessible from the network and from one another. Any users on our systems would experience interruptions and possible loss of data. This includes our email server (imail.eecs) and WEB server (inst.eecs). Email that is sent to our server during that time will be queued by the sender for later delivery. For more information about the EECS network maintenance, please see http://iris.eecs.berkeley.edu. Here's the sign for the labs.
All EECS instructional computers will be offline from Friday Jan 10 at about 5pm through Monday Jan 13 at about 10am. Exceptions: Our email server (imail.eecs) and WEB server (inst.eecs) will be down only on Saturday Jan 11 from 10am-6pm. Email that is sent to the server during that time will be queued by the sender for later delivery. The EECS network will be down for maintenance on Saturday, so our computers will be inaccessible from the network and from one another. Any users on our systems would experience interruptions and possible loss of data. For more information about the EECS network maintenance, please see http://iris.eecs.berkeley.edu.
Starting the morning of Dec 19, some of our UNIX systems have denied logins or been missing the home directories. This was caused by a failure in LDAP authentication triggered by some new certificates. This was fixed by 1pm today; please notify "email@example.com" (510-643-6141) if you are still unable to login to our systems. You can list our login servers at http://inst.eecs.berkeley.edu/cgi-bin/clients.cgi?choice=servers.
Starting the morning of Dec 5, logins at some workstations in 200 SDH (Macs) and 2xx Soda (Linux) experienced delays and timeouts. It is most noticeable with Firefox and Chrome WEB browsers. If the WEB site you want times out, you can usually get it after clicking "Reload" a few times. The browsers may refuse to close when you try to logout. The symptoms are more severe on the Macs than on the Linux systems. We are trying to diagnose this and will get help from the dept network staff on Friday. We'll post updates here.
The EECS network has been restored to service. Thanks to the EECS network staff for discovering the cause. The network problem occurred from 1:30am - 3:30pm today. It caused delays and timeouts for logins and for access to WEB servers and email servers on EECS computers. AirBears was down. For updates from the EECS network staff, please see https://iris.eecs.berkeley.edu/news/10633-eecs-network-in-a-degraded
If you have received this email, please DELETE it without clicking on the link:
=====================================================================
From: UC Berkeley EECS
Subject: validate and upgrade to our new Mail hub system.
This Email Is from the UC Berkeley EECS Support. We Will Be Making Some Vital E-Mail Account Maintenance Today 12th of November 2013. To avoid your e-mail account been terminated during this upgrade, Kindly Click Here and follow the instructions to validate and upgrade to our new Mail hub system.
=====================================================================
This is a phishing scam and is NOT from EECS administrators. You may have found that obvious because of the addresses in the full email header:
From firstname.lastname@example.org Wed Nov 13 16:59:23 2013
To: Recipients <email@example.com>
From: UC Berkeley EECS <firstname.lastname@example.org>
Printers that are spooled via the "iprint.eecs" server were down from about 8am Saturday (Nov 2) through 10:30am Monday (Nov 4). This includes the printers:
- lw199@iprint (lw199a@iprint, lw199b@iprint) in 199 Cory
- lw119@iprint in 119 Cory
- lwh30@iprint in 200 Sutardja Dai
- lw274@iprint (print274a@iprint, print274b@iprint) in 274 Soda
- lw330@iprint in 330 Soda
- lw349@iprint in 349 Soda
All print jobs that were queued during that time have been canceled, and will not count against the user's print quota.
We are transitioning to new Synopsys licenses during October. The Tecplot program is replaced by Sentaurus Visual (Svisual), and some functions of the INSPECT scripting language will no longer work. For details please see http://inst.eecs.berkeley.edu/cgi-bin/pub.cgi?file=synopsys.help. To give GSIs, researchers and students time to test the new Svisual features, we'll run the new licenses from about noon-6pm each weekday until Oct 25, 2013. After that, Tecplot will become obsolete. Please report questions or problems to email@example.com.
The Instructional email server refused incoming mail from 8pm on Thursday until 10am on Friday. All backlogged mail seems to be delivered now (10:30am Friday). Email that is refused is cached on the senders' computers and retried periodically until it gets through, so emails were delayed but not lost.
There was a campus power failure yesterday. (http://newscenter.berkeley.edu) EECS Instructional services were restored by noon today (Oct 1). Please report any problems to firstname.lastname@example.org
At about 8:15am, a transformer near Evans Hall broke and caused a power outage in Cory, Soda, Etcheverry, Sutardja Dai and other buildings. Soda Hall was closed until about noon. The computers, servers and networks for the instructional labs were restored to service by 2pm. There is a notice about the power failure on the campus IST Service Status page at http://ucbsystems.org/. This power outage was a result of an explosion in a manhole outside of Evans Hall which damaged several high-power electrical cables. Campus reported that up to 15 buildings lost power completely and up to 30 more lost partial power.
Please see http://inst.eecs.berkeley.edu/End-of-Semester.
The National Instruments bus came to campus for the day and was visited by about 75 faculty, students and staff.
Yahoo donated 24 servers to instructional computing and bought pizza lunches for EECS students!
Please see http://inst.eecs.berkeley.edu/cgi-bin/pub.cgi?file=lab-safety.help for information about seeking help while you are in one of our labs.
- when you try to login the screen freezes
- you see the error message "home directory is /"
- session hangs up if you try to 'ssh' into an Instructional computer
- unable to read WEB pages from http://inst.eecs.berkeley.edu
- lots of annoying "NFS timeout" error messages on your screen
While the server is down, you may not be able to logout in our labs because you can't type any commands. On a SunRay, even turning it off doesn't log you out. The support staff check the labs after events like this to be sure everyone gets logged out. We also post information about the problem at http://inst.eecs.berkeley.edu to help students find out when the problem has been fixed. So all you can really do in this case is to wait until the problem is fixed, then go back to the lab (or login to the SunRay server for that lab) and log yourself out, or let us log you out.
We disable email receipt and relaying through imail.eecs when the home directory server is down. No mail is lost. Computers that send mail queue any messages that are not accepted by a remote server, and they resend the messages periodically until they are received.
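The queue-and-resend behavior described above can be pictured with a small Python sketch. This is only a toy model of a sending computer, not the code of any real mail program: the class name `OutboundQueue`, its `deliver` callback and its methods are hypothetical names invented for the example. A message that the remote server refuses stays in the sender's queue, and each later retry pass attempts it again, so mail is delayed but not dropped.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OutboundQueue:
    """Toy model of a sender that holds refused mail and retries it later."""

    # Hypothetical callback: returns True when the remote server accepts the message.
    deliver: Callable[[str], bool]
    pending: List[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        """Attempt delivery once; keep the message queued if it is refused."""
        if not self.deliver(message):
            self.pending.append(message)

    def retry(self) -> int:
        """Retry every queued message in order; return how many were delivered."""
        still_pending = [m for m in self.pending if not self.deliver(m)]
        delivered = len(self.pending) - len(still_pending)
        self.pending = still_pending
        return delivered
```

A real mail system would call something like `retry()` on a timer and would eventually give up on messages that stay undeliverable for too long; both details are left out of this sketch.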