Mail and MySQL Server Urgent Maintenance – Resolved

Posted In: Outage — Apr 28th, 2016 at 2:58 pm EDT by IX: Brian S.
Shared services are affected

Incident Description:

Our system administrators have identified an issue with one of our mail arrays related to a hardware failure of one storage member in the SAN.

A drive failure occurred on a member of the dmail02 storage group.  The member attempted to initiate a RAID rebuild, which was unsuccessful, and the storage member removed itself from the storage group.  RAID stands for “redundant array of independent disks”, a technology that allows us to achieve high levels of storage reliability from our server drives by arranging the devices into an array. Simplified, this means they act like one large hard drive, but if one drive dies, there is enough data stored on the rest to recreate the lost data once the broken drive is replaced with a new one.
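
The redundancy described above can be illustrated with the XOR parity scheme used by RAID 5-style arrays. This is a toy simulation for illustration only, not a representation of the SAN's actual firmware:

```python
# Illustrative RAID 5-style parity: the parity block is the XOR of the data
# blocks, so any single lost block can be reconstructed from the survivors.

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# Three data "drives" plus one parity "drive" (hypothetical contents).
d0, d1, d2 = b"mailbox1", b"mailbox2", b"mailbox3"
parity = xor_blocks(d0, d1, d2)

# Simulate losing drive 1: XOR the surviving drives with parity to rebuild it.
rebuilt_d1 = xor_blocks(d0, d2, parity)
assert rebuilt_d1 == d1  # the lost data is fully recovered
```

A rebuild replays this reconstruction across every stripe onto the replacement drive, which is why losing additional drives before a rebuild completes can make the whole array unavailable.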

The server had to be taken offline while solutions are investigated.  Email messages sent to addresses on these servers are being queued and will be delivered once services resume.
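
The queue-and-deliver behavior described above is standard store-and-forward mail handling; a minimal sketch, with hypothetical names that do not reflect our actual mail software:

```python
from collections import deque

class MailSpool:
    """Hold messages for an unreachable server and flush them once it returns."""

    def __init__(self):
        self.queue = deque()

    def accept(self, message: str) -> None:
        # While the destination is down, messages are spooled rather than bounced.
        self.queue.append(message)

    def flush(self, deliver) -> int:
        """Deliver queued messages in arrival order; return how many were sent."""
        sent = 0
        while self.queue:
            deliver(self.queue.popleft())
            sent += 1
        return sent

spool = MailSpool()
spool.accept("invoice for april")
spool.accept("password reset")

delivered = []
assert spool.flush(delivered.append) == 2  # both messages arrive once service resumes
assert delivered == ["invoice for april", "password reset"]
```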


Which Customers are Impacted?

Customers with email service provided by this array, as well as customers with database servers on this array.  A full list has been posted.

How are Customers Impacted?

Email and database services are temporarily offline.  Mail destined for the affected mail servers will wait in a queue and be delivered once services return.  Customers may also be unable to access their control panels during this outage.

How often will we be updated?

As required

Time to Resolution (ETA)

Unknown

Incident Updates

  • 2016/04/28 03:30PM EDT - Our system administrators are still investigating the cause of the problem.  Our primary concern at this point is maintaining data integrity so all services remain offline.
  • 2016/04/28 03:45PM EDT - Full list of affected mailservers has been added to the main post
  • 2016/04/28 03:50PM EDT - Full list of affected mailservers has been updated
  • 2016/04/28 03:55PM EDT - Full list of affected database servers has been added to the main post
  • 2016/04/28 04:30PM EDT - Services remain offline while we continue investigation.  Incoming mail during this outage will be queued and delivered once services are restored to normal
  • 2016/04/28 05:03PM EDT - Our engineers are working with the vendor engineers on restoring the storage array.
  • 2016/04/28 05:40PM EDT - Our storage vendor engineers are currently running a full diagnostic test on the array in an attempt to bring the RAID back up.
  • 2016/04/28 06:20PM EDT - Our storage vendor engineers have escalated this issue further up through their development team and our System Engineers are also investigating alternate scenarios to resolve the issue.
  • 2016/04/28 07:36PM EDT - Our storage vendor engineers have identified a possible solution and they are preparing to attempt it.
  • 2016/04/28 07:59PM EDT - As we investigate deeper into this issue we have identified these additional mail servers affected: mail21, mail37, mail310, mail1213, mail1217, mail1218, mail1302, mail1411, mail1417, mail1421, mail1424
  • 2016/04/28 08:42PM EDT - SAN restoration attempts have not been successful; the engineering team is working with vendor engineers on the remaining options to restore the server without data loss.  We apologize that this process is taking some time, but it is very important that we are careful and thorough with this sensitive problem.
  • 2016/04/28 11:40PM EDT - We are working on a more detailed explanation for everyone that will contain more information on what failed and our next steps.
  • Update 2016/04/29 01:20AM EDT - Although the DBMail02 cluster of virtual machines is organized on a redundant RAID 50 SAN, it had several consecutive failures today, resulting in the system-wide downtime you’re experiencing. One disk failure is normally not a problem in an array of this kind; however, today multiple drives failed consecutively. This unlikely chain of events rendered the entire cluster unavailable. We are currently making copies of the failed disks. If these copies can be successfully created, the array can be brought back online by performing several sophisticated technical steps on the hard disks. If the array can’t be brought back online, we would at least have a more recent version of the data, so that it can be restored from the last backup after all services have been brought back up. This backup restore process is running in parallel now, and most data will be gradually restored from backup as the services come back up. There will be another update in the morning with more technical details and information. This has been a very long and frustrating outage for everybody. We wish wholeheartedly there was a way to speed this up, but our main concern is preserving data and minimizing any data loss. We will continue to work through the night on every avenue that will accomplish that, while simultaneously restoring services and data from backup.
  • Update 2016/04/29 07:50AM EDT - We are working on a detailed update that should be complete within the next hour.  Stay tuned.
  • Update 2016/04/29 08:26AM EDT - Our engineers have worked through the night, and we have been able to successfully copy the failed disk, which gives us more options toward the still-primary goal of restoring the database and mail data. Our engineers are back online with the highest-level vendor engineers and have managed to get the array back up in a delicate state, which gives us hope that we can evacuate the data safely and get it back online. We are very carefully attempting to do that now.  While those operations proceed, our second engineering team has also been working through the night to recreate all 149 servers and start syncing data from the backups we do have of the Database, SiteStudio, and Control Panel servers. Copying that much data takes time, which is why we started it yesterday; however, we are still very hopeful that we will not have to use this solution. Our mail cluster continues to spool incoming mail and will hold it until the mail servers are re-established, so no customers should lose emails sent to them during the outage. We do see and hear your calls for more frequent updates, and we very much want to provide them. Unfortunately, many of the operations underway are done very carefully and slowly, and sometimes we are simply waiting for output from the systems for an hour or more. Again, we are very sorry for how seriously this is affecting all of you, and we commit that every level of IX is completely focused on resolving this issue as quickly as possible.
  • Update 2016/04/29 01:09PM EDT - We are tentatively reporting more progress.  We were able to stabilize the RAID array and connect another member, and we have started to evacuate the data.  We will all be steadily watching and hoping that the evacuation completes successfully; if it does, we hope to have everyone back on with little to no data loss.  We continue to see and hear the calls for more specific ETAs, but there is just no way to provide one until the evacuation is further along; it is currently at 5%.  Give us a couple of hours to calculate progression rates, and we may be able to give more concrete ETAs.
  • Update 2016/04/29 04:15PM EDT - Evacuation of the SAN has been going smoothly so far, and we are increasingly encouraged that we will be able to restore the production servers without needing the backup systems, although the second engineering team continues to advance that option as a failsafe.  The evacuation process moves the largest volumes first, so no servers have ‘come out’ of it yet: as of this update we are at 19%, and so far our progression is averaging 5-7% per hour.  As the evacuation progresses, entire server volumes will start to restore.  Database servers will be brought online immediately; for mail servers, the queued mail will first be delivered, and then the server will be made available online.  We will update this post with server names as we confirm they are up.

    Again we sincerely apologize for this lengthy issue, saving all customer data has been our priority throughout, and will continue to be our main priority.

  • Update 2016/04/29 07:00PM EDT - We are now past 30% and volumes are starting to emerge.  Once all the partitions (volumes) of a server are out, we will start to bring it online as discussed in the previous update.  We should have some servers start very soon.
  • Update 2016/04/29 09:00PM EDT - Evacuation progress is currently at 39%
  • Update 2016/04/29 09:15PM EDT - Our first server is back online.  MySQL1411 is now online, but it will still be inaccessible to customers.
  • Update 2016/04/29 10:38PM EDT - Six MySQL servers are online and accessible. You can view the online servers in the incident description above.
  • Update 2016/04/29 11:18PM EDT - Evacuation progress is currently at 48%
  • Update 2016/04/29 11:32PM EDT - Evacuation progress is currently at 50%
  • Update 2016/04/30 12:02AM EDT - Evacuation progress is currently at 52%
  • Update 2016/04/30 12:34AM EDT - Evacuation progress is currently at 55%
  • Update 2016/04/30 01:08AM EDT - Evacuation progress is currently at 57%
  • Update 2016/04/30 01:56AM EDT - Evacuation progress is currently at 59%
  • Update 2016/04/30 02:24AM EDT - Evacuation progress is currently at 61%
  • Update 2016/04/30 03:06AM EDT - Evacuation progress is currently at 64%
  • Update 2016/04/30 03:54AM EDT - Evacuation progress is currently at 68%
  • Update 2016/04/30 04:25AM EDT - Evacuation progress is currently at 70%
  • Update 2016/04/30 04:58AM EDT - Evacuation progress is currently at 73%
  • Update 2016/04/30 06:03AM EDT - Evacuation progress is currently at 77%
  • Update 2016/04/30 08:35AM EDT - Evacuation progress is currently at 88%
  • Update 2016/04/30 09:51AM EDT - Evacuation progress is currently at 91%
  • Update 2016/04/30 11:35AM EDT - Evacuation progress is currently at 95%
  • Update 2016/04/30 12:27PM EDT - Evacuation progress is currently at 98%
  • Update 2016/04/30 01:16PM EDT - Evacuation progress is at 100%; the evacuation is complete.  The last sets of servers are being prepared to be brought online.
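
The progress figures above are enough to project a rough completion time, which is how "progression rate" ETAs like the ones mentioned earlier can be derived. A simplified linear-projection sketch, using two of the actual updates from this incident:

```python
from datetime import datetime, timedelta

def project_eta(samples):
    """Linear ETA from (timestamp, percent_complete) samples."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    rate = (p1 - p0) / (t1 - t0).total_seconds()  # percent per second
    remaining_seconds = (100.0 - p1) / rate
    return t1 + timedelta(seconds=remaining_seconds)

# Two progress updates from this incident (times in EDT):
samples = [
    (datetime(2016, 4, 29, 21, 0), 39.0),  # 09:00PM - 39%
    (datetime(2016, 4, 30, 1, 56), 59.0),  # 01:56AM - 59%
]
eta = project_eta(samples)
# rate is roughly 4%/hour here, projecting completion near midday on 04/30 -
# close to the actual 01:16PM finish, assuming progress stays linear.
```

This is only an estimate; as the update at 04:15PM noted, large volumes move first, so the rate can vary as the evacuation proceeds.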

Resolution Description

Data has been evacuated from the failed storage array and the servers have been re-enabled.  Mail queues have been delivered and all services are restored.

DDoS (Distributed Denial of Service) attack – Resolved

Posted In: Other Issues — Apr 28th, 2016 at 10:07 am EDT by IX: Toi Santamaria
Shared services are affected

Incident Description:

Our system administrators detected a Distributed Denial of Service attack (DDoS), launched against the nameservers for CP12.

A DDoS is an attempt to make a computer resource unavailable to its intended users. The way the attack is carried out varies as much as who is attacked and why. One common method involves saturating the target (victim) machine with external communication requests. This creates so many false connections to the server that real attempts to connect cannot be completed. Because so many domains share an IP, it is not possible to determine which site the attack is directed at. In many cases a temporary block is sufficient until the DDoS attack passes; however, if the attack continues, the shared IP could remain blocked for an extended period of time.
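
The "temporary block" described above often amounts to a per-source rate limit. This toy sketch illustrates the idea only; real mitigation happens at the network edge and must also cope with spoofed source addresses:

```python
from collections import defaultdict

class RateLimiter:
    """Block sources that exceed max_requests within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(list)  # source IP -> recent request timestamps

    def allow(self, source_ip: str, now: float) -> bool:
        # Keep only timestamps still inside the window, then record this request.
        recent = [t for t in self.hits[source_ip] if now - t < self.window]
        recent.append(now)
        self.hits[source_ip] = recent
        return len(recent) <= self.max_requests

limiter = RateLimiter(max_requests=3, window_seconds=1.0)
assert all(limiter.allow("198.51.100.7", now=t) for t in (0.0, 0.1, 0.2))
assert not limiter.allow("198.51.100.7", now=0.3)  # 4th request inside 1s: blocked
assert limiter.allow("203.0.113.9", now=0.3)       # other sources are unaffected
```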

In order to mitigate the attack and prevent larger service impact, system administrators have temporarily filtered all connections to those nameservers. Customers who do not have their DNS already cached will not be able to browse their sites.
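
Whether a given customer can still browse depends on resolver caching: a DNS answer fetched before the filter went up remains usable until its time-to-live (TTL) expires. A toy TTL cache, with hypothetical hostnames, shows why:

```python
class TTLCache:
    """Resolver-style cache: answers stay valid until their TTL expires."""

    def __init__(self):
        self.records = {}  # hostname -> (ip, expiry timestamp)

    def put(self, host: str, ip: str, ttl: int, now: float) -> None:
        self.records[host] = (ip, now + ttl)

    def get(self, host: str, now: float):
        entry = self.records.get(host)
        if entry and now < entry[1]:
            return entry[0]  # still cached: no nameserver query needed
        return None          # expired or missing: must query the (filtered) nameserver

cache = TTLCache()
cache.put("example-site.com", "203.0.113.50", ttl=3600, now=0)
assert cache.get("example-site.com", now=1800) == "203.0.113.50"  # within TTL: site loads
assert cache.get("example-site.com", now=7200) is None            # TTL expired: lookup fails
```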

Which Customers are Impacted?

All customers with websites that use CP12 nameservers. You can determine if your account uses CP12 by clicking the manage button next to your hosting account. The address in the address bar will tell you what CP you are located on.

How are Customers Impacted?

Customers who do not have their DNS already cached will not be able to browse their sites.

How often will we be updated?

Hourly

Time to Resolution (ETA)

Systems Administrators are working to mitigate the effects of the DDoS. We will update with an ETA as soon as one is available.

Incident Updates

  • 2016/04/28 10:20AM EDT - System Administrators are still investigating the best way to mitigate the DDoS
  • 2016/04/28 11:15AM EDT - No new information to provide at this time
  • 2016/04/28 11:20AM EDT - Our system administrators have removed the filters on CP12 DNS queries.  We have implemented new rules to mitigate the attack.  CP12 nameservers are now successfully answering queries
  • 2016/04/28 12:20PM EDT - The changes we have implemented are still having a positive impact.  Due to the large amount of traffic that is still incoming some queries may still timeout, but we have noticed an increase in the number of legitimate queries that are processed.
  • 2016/04/28 12:45PM EDT - The DDoS is still active, but we have successfully filtered it and all queries are being handled.  We are still actively monitoring the DDoS to see if there are any changes.

Resolution Description

The filter our System Administrators implemented is working.  All incoming traffic to this nameserver is isolated to one provider to protect the other parts of our network from the attack.  We are monitoring it so that we are aware of any changes.

Windows VPS Maintenance – April 29, 2016 – Postponed

Posted In: Maintenance — Apr 27th, 2016 at 2:31 pm EDT by IX: Brian S.

Incident Description:

On April 29th, 2016 at 11PM EDT we will be performing maintenance on our Windows VPS node “WVZ7”, during which we will be replacing the CPU for this node.  The maintenance is expected to last for one hour, and during this time all customer servers on this node will be offline.

Which Customers are Impacted?

All customers with VPS products on WVZ7

How are Customers Impacted?

All services will be offline during the maintenance

How often will we be updated?

Hourly

Time to Resolution (ETA)

1 hour

Incident Updates

  • 2016/05/03 2:40PM EDT - Maintenance has been postponed.

Resolution Description

N/A

Semi-Annual Data Center Maintenance – Friday, April 29, 2016

Posted In: Maintenance — Apr 26th, 2016 at 10:57 am EDT by IX: Toi Santamaria
Cloud services are affected
VPS services are affected
Shared services are affected

Incident Description:

Beginning Friday, April 29th, 2016, from 11:00 PM EDT to 4:00 AM EDT, we will be conducting routine maintenance on our data center's major electrical systems.

The purpose is to test and repair any internal components and batteries, as well as to inspect the Power Distribution Units throughout the data center.

During the maintenance, the commercial power grid will be offline and we will be running entirely on our generator systems.  One at a time, we will take each UPS (we have two, UPS A and B) offline via maintenance bypass.

The maintenance is scheduled to be completed within a 6 hour maintenance period.

Which Customers are Impacted?

All active customers will be affected.

How are Customers Impacted?

During the maintenance the data center will be running on generator power.  In the unlikely event of a generator interruption, servers will run on UPS battery backup until generator power is restored.

How often will we be updated?

6 hours

Time to Resolution (ETA)

Friday, April 29th, 2016, 4:00 AM EDT

Incident Updates

N/A

Resolution Description

N/A

Server Maintenance for Web 404 – April 24th, 2016 – Complete

Posted In: Maintenance — Apr 22nd, 2016 at 3:41 pm EDT by IX: Kyle H.
Shared services are affected

Incident Description:

At 11 p.m. EDT on April 24th, 2016 we will be performing maintenance on Web 404 in which we will need to take it offline in order to improve stability to server backups. The server will be unavailable for up to 30 minutes while maintenance is completed.

Which Customers are Impacted?

All customers on Web404

How are Customers Impacted?

Services will be unavailable

How often will we be updated?

30 minutes

Time to Resolution (ETA)

30 minutes

Incident Updates

  • 2016/4/24 11:20 PM EDT - Maintenance has started. Web404 will now be unavailable

Resolution Description

Maintenance is now complete and all services have returned to normal

Control Panel Maintenance – April 22nd, 2016

Posted In: Maintenance — Apr 21st, 2016 at 10:58 am EDT by IX: Brian S.

Incident Description:

At 4:30AM EDT on April 22nd, 2016 we will be performing maintenance on our Manage Control Panel.  We expect this maintenance to last for one hour and during this time access to the control panel will be unavailable.  This means that your websites, email, and databases will all be online, but access to edit your products and billing information will be unavailable.

Which Customers are Impacted?

All customers

How are Customers Impacted?

Customers will be unable to access their control panel to make edits to their account.  Email, websites, and databases will all be online

How often will we be updated?

We will update at the completion of the maintenance

Time to Resolution (ETA)

1 hour

Incident Updates

  • 2016/04/22 5:39 AM EDT - Maintenance has completed. Dashboard functions are now accessible.

Resolution Description

N/A

Mail510 – Planned Server Maintenance 04/20/2016 3am – 3:30am – Resolved

Posted In: Maintenance — Apr 19th, 2016 at 1:37 pm EDT by IX: Andrew Y.
Shared services are affected

Incident Description:

Mail510 was successfully moved from a physical server to a VM today, Wednesday April 20th, at 3AM.
All customers should now be able to utilize this service without issue.

Which Customers are Impacted?

All customers on Mail510.

How are Customers Impacted?

All mail services will be unavailable during resync.

How often will we be updated?

Once maintenance is complete.

Time to Resolution (ETA)

30 minutes

Incident Updates

N/A

Resolution Description

Maintenance has completed and all mail services are up and running.

web1107 shared ip 50.6.22.2 – Filtered – Resolved

Posted In: Outage — Apr 15th, 2016 at 3:40 pm EDT by IX: Victoria Witten
Shared services are affected

Incident Description:

Our system administrators detected a Distributed Denial of Service (DDoS) attack launched against the shared IP address of Web1107, 50.6.22.2.  In order to mitigate the attack and keep the server online, system administrators filtered all connections to that IP address.

The server is up; however, the websites on shared IP 50.6.22.2 are down.

Once the attack is over, we will lift the IP filter immediately.

Which Customers are Impacted?

All customers using the shared IP on web1107.

How are Customers Impacted?

Websites using the shared IP will not be reachable

How often will we be updated?

When the current filter expires.

Time to Resolution (ETA)

When current filter expires

Incident Updates

  • 2016/04/16 02:25 PM EDT- IP was filtered again today. Filter will expire at 2016-04-16 23:40:35
  • 2016/04/16 11:58 PM EDT - Filter has been re-added for 12 hours.
  • 2016/04/17 07:50 AM EDT - Filter has been removed.

Resolution Description

The filter has been removed and traffic has normalized. Thank you for your patience and cooperation.

iis1025 – Maintenance – Complete

Posted In: Maintenance — Apr 14th, 2016 at 2:22 pm EDT by IX: Brian S.

Incident Description:

System Administrators detected a problem with the HSphere installation on iis1025 that causes customers to be unable to switch versions of PHP and ASP.NET.  In order to repair this and to prevent further issues, they will be moving the server to another VM on April 15th, 2016 between 03:00 AM and 06:00 AM EDT.
The server has already been synced; it will be taken offline for a final sync and brought back up on the new VM at this time.
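
The "pre-sync then final sync" pattern described above minimizes downtime: the bulk of the data is copied while the server is still live, and only the changes made since then are copied during the brief offline window. A simplified sketch with hypothetical data, not the actual migration tooling:

```python
def delta_sync(source: dict, target: dict) -> int:
    """Copy only entries that are missing or stale on the target; return count copied."""
    copied = 0
    for path, content in source.items():
        if target.get(path) != content:
            target[path] = content
            copied += 1
    return copied

# Pass 1: bulk copy while the server is still online (most of the data).
live_server = {"site/index.html": "v1", "site/app.php": "v1", "logs/a.log": "v1"}
new_vm: dict = {}
delta_sync(live_server, new_vm)

# Writes keep landing on the live server in the meantime...
live_server["site/app.php"] = "v2"

# Pass 2: take the server offline; only the small delta remains to copy.
changed = delta_sync(live_server, new_vm)
assert changed == 1            # offline time scales with the delta, not the full dataset
assert new_vm == live_server   # the new VM is now an exact copy
```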

Which Customers are Impacted?

All customers with websites on iis1025

How are Customers Impacted?

Websites will be unavailable while the server is offline

How often will we be updated?

Hourly

Time to Resolution (ETA)

3 hours from the start of maintenance

Incident Updates

  • 2016/04/15 03:10 AM EDT - Maintenance has started.
  • 2016/04/15 04:05 AM EDT - First phase of maintenance has completed. Server has been brought down for final sync.
  • 2016/04/15 04:43 AM EDT - 32% of accounts have been sync'd
  • 2016/04/15 05:19 AM EDT - 61% of accounts have been sync'd
  • 2016/04/15 06:10 AM EDT - All accounts have been sync'd on the new server and most services are now live. Administrators are currently working on bringing up the control panel.

Resolution Description

Maintenance has completed and all services have returned to normal. Thank you for your patience and cooperation.

WEB1107 shared ip: 50.6.22.2 – Filtered – Resolved

Posted In: Outage — Apr 14th, 2016 at 8:05 am EDT by IX: Toi Santamaria
Shared services are affected

Incident Description:

Our system administrators detected a Distributed Denial of Service (DDoS) attack launched against the shared IP address of Web1107, 50.6.22.2.  In order to mitigate the attack and keep the server online, system administrators filtered all connections to that IP address.

The server is up; however, the websites on shared IP 50.6.22.2 are down.

Once the attack is over, we will lift the IP filter immediately.

Which Customers are Impacted?

All customers on web1107.

How are Customers Impacted?

Websites are down.

How often will we be updated?

When current filter expires

Time to Resolution (ETA)

8:00 PM EDT.

Incident Updates

N/A

Resolution Description

The filter has been removed and traffic has normalized. Thank you for your patience and cooperation.

iis1025 – Unable to Switch PHP or ASP.NET Versions – Resolved

Posted In: Outage — Apr 14th, 2016 at 3:35 am EDT by IX: Kristopher G.
Shared services are affected

Incident Description:

Our administrators have detected an issue with this server.  Customers will be unable to switch between different versions of PHP or ASP.NET.  Outside of that, the server is up and running as normal.  We do have additional maintenance planned to fix this issue in the future.

Which Customers are Impacted?

All customers utilizing this server.

How are Customers Impacted?

PHP and ASP.NET versions cannot be switched on the control panel

How often will we be updated?

Hourly

Time to Resolution (ETA)

1 hour

Incident Updates

  • 2016/04/14 04:57 AM EDT - Our administrators are currently working on resolving issues with IIS and the control panel on this server. ETA is extended 30 minutes.
  • 2016/04/14 05:45 AM EDT - Our administrators are currently working on resolving issues with communication between IIS and the control panel. ETA is extended 30 minutes.
  • 2016/04/14 06:17 AM EDT - Our administrators are currently working on resolving issues with communication between IIS and the control panel. ETA is extended 30 minutes.
  • 2016/04/14 8:24 AM EDT - Our administrators are currently working on resolving issues with communication between IIS and the control panel. ETA is extended 30 minutes.
  • 2016/04/14 8:50 AM EDT - The server is up and operational. At this time, the only remaining issue is the inability to switch versions of PHP and ASP.NET in the control panel

Resolution Description

N/A

Support Phones Down – Resolved

Posted In: Outage — Apr 14th, 2016 at 2:32 am EDT by IX: Omari J.
Shared services are affected

Incident Description:

The customer support phones are currently down.  We are working to get them back up and running, and will provide an ETA and more details as soon as possible.  Live chat and tickets are still available for use.

Which Customers are Impacted?

All customers

How are Customers Impacted?

Customers are unable to reach technical support by phone

How often will we be updated?

Every hour

Time to Resolution (ETA)

None yet, but we will provide one as soon as possible

Incident Updates

N/A

Resolution Description

The phones are now back online and working

Mail and MySql Storage Maintenance – Resolved

Posted In: Outage — Apr 13th, 2016 at 5:25 am EDT by IX: Omari J.
Shared services are affected

Incident Description:

Our system administrators have detected an issue with the RAID array on one of the main servers that hosts client email and MySql databases. RAID refers to “redundant array of independent disks”, a technology that allows us to achieve high levels of storage reliability from our server drives. It does this by arranging the devices into an array. Simplified, this means they act like one large hard drive, but if one drive dies, there is enough data stored on the rest to recreate the lost data once the broken hard drive is replaced with a new one.

If a RAID fails, or becomes corrupted, it must be rebuilt. This means the architecture that allows for RAID redundancy must be repaired or completely rebuilt.

The following mail servers will be affected: mail1, mail34, mail44, mail45, mail901, mail902, mail903, mail905, mail906, mail908, mail909, mail910, mail914, mail915, mail916, mail919, mail920, mail1001, mail1002, mail1004, mail1005, mail1006, mail1007, mail1008, mail1009, mail1010, mail1011, mail1012, mail1013, mail1014, mail1015, mail1017, mail1018, mail1101, mail1103, mail1104, mail1105, mail1108, mysql917, mysql1103, mysql1106, mysql1107, mysql1109. No downtime is expected, but mail services may appear slow, which can result in timeouts and give the appearance that the service is unavailable.  We would like to assure you that no data loss is expected through this process.


UPDATE: 2016/04/14 10:00 AM EDT

As of now, the RAID rebuild has completed.  Unfortunately, we didn’t see the load on the SAN improve over time as we expected it would.  As the RAID rebuild finished we expected the SAN to process its workload and over the course of a few hours the load would slowly decrease until we got to a regular operating level.  That did not happen and we are currently investigating the cause and solutions.  We have also escalated with our hardware provider and are troubleshooting with them.

At this point, our System Administration team is pausing email delivery on some mail servers under maintenance. We expect that this will help reduce the load on the SAN.  Once the load is stabilized, they will re-enable delivery. While delivery is paused, you will not be able to send email. Incoming mail will not be delivered to mailboxes right away, but will be held until delivery resumes.

UPDATE: 2016/04/14 12:15 PM EDT

We are returning mail services to normal on the array.  At this time you should be able to login and send email without issues.  Please note that incoming email will be delayed for the next several hours while the SAN works through the queue of incoming mail that has built up over the last 24 hours.

Which Customers are Impacted?

All customers on mentioned mail and database servers.

How are Customers Impacted?

Mail and database services may appear slow and result in time outs

How often will we be updated?

Every 3 hours

Time to Resolution (ETA)

N/A

Incident Updates

  • 2016/04/13 6:53 AM EDT - Raid rebuild status is now at 23%
  • 2016/04/13 9:47 AM EDT - Raid rebuild status is now at 33%
  • 2016/04/13 11:30 AM EDT - Raid rebuild status is now at 42%
  • 2016/04/13 12:08 PM EDT - Raid rebuild status is now at 44.29%
  • 2016/04/13 1:15 PM EDT - Raid rebuild status is now at 49%
  • 2016/04/13 2:10 PM EDT - Raid rebuild status is now at 52%
  • 2016/04/13 2:32 PM EDT - Raid rebuild status is now at 54%
  • 2016/04/13 3:02 PM EDT - Raid rebuild status is now at 56%
  • 2016/04/13 3:55 PM EDT - Raid rebuild status is now at 60%
  • 2016/04/13 4:46 PM EDT - Raid rebuild status is now at 63%
  • 2016/04/13 6:01 PM EDT - Raid rebuild status is now at 67%
  • 2016/04/13 7:01 PM EDT - Raid rebuild status is now at 70%
  • 2016/04/13 8:03 PM EDT - Raid rebuild status is now at 75%
  • 2016/04/13 9:00 PM EDT - Raid rebuild status is now at 78%
  • 2016/04/13 10:00 PM EDT - Raid rebuild status is now at 81%
  • 2016/04/13 11:12 PM EDT - Raid rebuild status is now at 84%
  • 2016/04/14 12:00 AM EDT - Raid rebuild status is now at 87%
  • 2016/04/14 01:00 AM EDT - Raid rebuild status is now at 89.80%
  • 2016/04/14 02:00 AM EDT - Raid rebuild status is now at 92.09%
  • 2016/04/14 02:58 AM EDT - Raid rebuild status is now at 94%
  • 2016/04/14 04:00 AM EDT - Raid rebuild status is now 96.48%
  • 2016/04/14 05:00 AM EDT - Raid rebuild status is now 98.51%
  • 2016/04/14 05:30 AM EDT - Raid rebuild status is now 99.45%
  • 2016/04/14 06:00 AM EDT - Raid rebuild is now complete. At this time the servers are still experiencing a high load and expect to normalize over the next 2 hours. We thank you for your cooperation and patience at this time.
  • 2016/04/14 10:00 AM EDT - At this point, our System Administration team is pausing email delivery on some mail servers under maintenance. Once the load is stabilized, they will re-enable delivery. While delivery is paused, you will not be able to send email. Incoming mail will not be delivered to mailboxes right away, but will be held until delivery resumes.  The incident description above is updated with a list of these paused servers.
  • 2016/04/14 11:25 AM EDT - We are still actively engaged with our hardware providers to work out a resolution for this issue.
  • 2016/04/14 12:00 PM EDT - We identified another bad drive in the SAN array and have removed it.  This will start another RAID rebuild, but we expect to be able to resume services during this rebuild.  We have already noticed improved latency after removing this drive.
  • 2016/04/14 12:15 PM EDT - We are returning mail services to normal on the array.  At this time you should be able to login and send email without issues.  Please note that incoming email will be delayed for the next several hours while the SAN works through the queue of incoming mail that has built up over the last 24 hours.
  • 2016/04/14 2:15 PM EDT - The second rebuild is 12% complete.  Services are continuing to improve.
  • 2016/04/14 4:15 PM EDT - The second rebuild is 30% complete.  All mail servers are back to normal
  • 2016/04/14 6:02 PM EDT - The second rebuild is 45% complete.
  • 2016/04/14 6:02 PM EDT - The second rebuild is 54% complete.
  • 2016/04/14 8:07 PM EDT - The second rebuild is 63% complete.
  • 2016/04/14 9:02 PM EDT - The second rebuild is 73% complete.
  • 2016/04/14 9:52 PM EDT - The second rebuild is 81% complete.
  • 2016/04/14 10:57 PM EDT - The second rebuild is 86% complete.
  • 2016/04/14 11:55 PM EDT - The second rebuild is 90% complete.
  • 2016/04/15 12:58 AM EDT - The second rebuild is 94% complete.
  • 2016/04/15 2:00 AM EDT - The second rebuild is 98% complete.

Resolution Description

All mail services are now back to normal.

Plesk Licensing Update Maintenance – Complete.

Posted In: Maintenance — Apr 12th, 2016 at 10:03 am EDT by IX: John Richards
Cloud services are affected
VPS services are affected

Incident Description:

On the dates below, SA will perform maintenance on all customer VPS and VMs that use a Plesk license. We are notifying affected customers via ticket in your control panel.

The dates of maintenance are:

VPS: Wednesday April 13, 2016 2:30AM to 10:30AM EDT
VDC: Thursday April 14, 2016 2:30AM to 10:30AM EDT
CbIX: Friday April 15, 2016 2:30AM to 10:30AM EDT

We expect no interruption of service for any customers.

Which Customers are Impacted?

VDC, Cloud by IX, and VPS customers who have a Plesk license installed.

How are Customers Impacted?

The Plesk license will be updated. We expect no impact to services.

How often will we be updated?

At the beginning and end of the maintenance period.

Time to Resolution (ETA)

4 to 8 hours

Incident Updates

n/a

Resolution Description

Maintenance has completed. Thank you for your patience.

Some VMs Hanging After Start/Stop/Reboot Commands – Resolved

Posted In: Outage — Apr 11th, 2016 at 3:54 pm EDT by IX: Greg Cook
Cloud services are affected

Incident Description:

System Administrators have detected an issue where a few CloudbyIX VMs are not responding to start, stop, and reboot commands correctly.  Administrators are working to identify and fix the problem as soon as possible and we will update you here as soon as we know more information.

Which Customers are Impacted?

Any Cloud by IX customer who has recently issued a start, stop, or reboot command.

How are Customers Impacted?

The VM does not respond to the command.

How often will we be updated?

As updates become available.

Time to Resolution (ETA)

Unfortunately, no ETA is available yet.

Incident Updates

  • 2016/04/11 4:30 PM EDT - System Administrators have identified the problem on one of the blades serving the virtual environment.  VMs are being moved to other available blades to return full functionality and to allow us to fix the error.  Moving VMs to another blade will have no negative impact on services.
  • 2016/04/11 5:40 PM EDT - The VMs are not down but have entered read-only mode.  They are being evacuated to another blade and are coming up after a reboot. 46 total VMs were affected on that blade and there are currently 10 of 46 up.
  • 2016/04/11 6:10 PM EDT - There are 30 of 46 affected servers up.
  • 2016/04/11 6:50 PM EDT - There are 38 of 46 affected servers up.
  • 2016/04/11 7:25 PM EDT - There are 44 of 46 affected servers up.

Resolution Description

All servers are up.

 
© 2011 IX Web Hosting.