New Message: About the recent outages
webmaster at userland.com
Thu Dec 31 03:24:08 CST 2009
A new message was posted:
By: Jake Savin (jake at userland.com)
I'm Jake Savin, a former developer with UserLand. I wanted to take a few minutes to explain why all hell broke loose starting in the middle of last month, what's happened since, and what the near-term future looks like.
In mid-November, I received a call from one of my former UserLand colleagues, informing me that there were problems on some of the servers at UserLand, and that Lawrence Lee could not be reached to help. To this day, none of us have heard from him, despite repeated attempts to reach him by email, instant messaging, telephone, and Skype.
Over the following weeks, in between my current job as a program manager at Microsoft, my wife and two-year-old son, and the holidays, I spent many hours working to debug and clean up what turned out to be a set of major problems with many of the UserLand servers, stemming primarily from the fact that they did not have the latest security patches. One of the main servers, which actually runs 10 Virtuozzo-based virtual servers, had been infected with a virus. Despite repeated attempts to remove it safely, the server had to be re-imaged, and all 10 virtual servers re-created from scratch.
One of these virtual servers is UserLand's software licensing server, which was down for about three weeks. If you purchased anything from UserLand in the second half of November or the beginning of December, and you didn't receive your software license codes, this is why. At this time, I don't believe any orders were lost -- they were just very slow to get processed. You should have since received your license/authorization codes. If not, please email me at jake-at-userland-dot-com, and I'll see what I can do.
In the meantime, the remaining servers, including the Radio UserLand publishing system (upstreaming server) and the Radio UserLand site/DG, were badly in need of maintenance, and to some extent still are.
The first thing I did was make sure that there was a solution in place for automatically applying security patches, and ensuring that the Frontier server would come back up properly if the server had to be rebooted as part of that process. This should help protect from similar outages in the future.
The next thing I had to do was to fix a networking issue where the Frontier/Manila/Radio upstreaming servers could not see the web server (Apache) where they save upstreamed and static content (pictures and gems). This is the "L:" drive problem that a number of people have reported on the Radio DG. In order to do this, I had to first investigate the issue, and then create a new ID/password for the Samba server that runs alongside Apache, and set appropriate permissions across a /very/ large number of files/folders -- since Lawrence is, to the best of my knowledge, the only person who has the ID/password for the user that mounted that shared drive previously.
Still remaining to be done, from an admin standpoint, is to:
* Automate cleaning of stale resources on the servers, clearing old logs, and the like. This will help keep things fast.
* Turn off or optimize some of the more resource-intensive processes on the Radio comments server. This is the biggest resource-consumer in the system today, and much of its resource expenditure is due to spammers.
* Consolidate the upstreaming, static files and hosting functions onto one machine (or the same set of machines), to avoid the "L:" drive problem, and also save money.
This work will likely take up most of my available time during the month of January.
After that, there's the longer-term problem of what to do now that UserLand has no dedicated admin. I am not in a position to do this work myself, since my job and family take up 100% of my waking hours today, and I've been doing this work "in my sleep" so to speak, up to this point. With winter vacation ending next week, I will not be able to keep this up indefinitely, without at least making some changes to the servers, to make them better able to take care of themselves. Obviously I also won't be able to fill Lawrence's shoes when it comes to technical support, and that fact is not likely to change soon.
That said, I will be keeping the Radio upstreaming server running for at least the month of January. Radio comments may be changed to read-only at some point during that month, and may disappear completely after January (per the original cancellation notice posted in mid-2009).
I recognize that UserLand's products have a very dedicated, albeit small, following -- I'm one of them myself. And as one of the creators of Radio and Manila, I thank you! I'm going to do my best to keep whatever I can online, at least in the near term.
In the meantime, if you have heard from or know how to reach Lawrence, please ask him to contact me to let me know that he's okay.
Thanks! If you need to contact me, email jake-at-userland-dot-com, and allow at least 24 hours for a response. I will check in on the UserLand DGs periodically, probably about once a day, and likely in the late evening.
PS. Cross-posted on the Manila DG. (I also made a similar post on the Radio DG.)
This is a Manila site... http://manila.userland.com/.