
IIS Server Urgent Maintenance – Resolved

Posted In: Outage — Jun 17th, 2016 at 12:58 am EDT by IX: Kristopher G.
Shared services are affected

Incident Description:

We are currently experiencing issues with multiple IIS servers. We are investigating the issue and working to bring them back up as soon as possible. We will update this post once we have more information regarding affected servers and resolution.

Which Customers are Impacted?

To be determined once further information is gathered.

How are Customers Impacted?

Services and websites may appear to be unreachable

How often will we be updated?

We will update this post once further information is gathered.

Time to Resolution (ETA)

n/a

Incident Updates

n/a

Resolution Description

All services should now be back in their normal working state.

DDoS (Distributed Denial of Service) Attack – Resolved

Posted In: Security Issues — Jun 13th, 2016 at 8:38 am EDT by IX: Toi Santamaria
Cloud services are affected
Shared services are affected
my.ixwebhosting.com affected

Incident Description:

We are currently experiencing a DDoS (Distributed Denial of Service) attack against our services. Our system administration team is working to identify the source IPs and block this attack.


When accessing your my.ixwebhosting.com account, you may experience slowness.

We will update this status blog post as we get additional information from our systems administrators.

Which Customers are Impacted?

Anyone visiting their my.ixwebhosting.com account may temporarily experience slowness.

How are Customers Impacted?

Slowness when using my.ixwebhosting.com accounts.

How often will we be updated?

30 minutes.

Time to Resolution (ETA)

N/A

Incident Updates

N/A

Resolution Description

We no longer observe slowness on our services.

CP3 Server Maintenance – Resolved

Posted In: Maintenance — Jun 12th, 2016 at 1:57 pm EDT by IX: Toi Santamaria
Shared services are affected

Incident Description:

Our administrators are performing maintenance on the CP3 server to fix its root partition, which has become read-only (R/O).

Which Customers are Impacted?

All customers on the CP3 server.

How are Customers Impacted?

Control panel access and Webmail are affected.

How often will we be updated?

N/A

Time to Resolution (ETA)

110 minutes.

Incident Updates

  • 2016-06-12 2:26 PM EDT - CP3 server is undergoing a file system check (FSCK)
  • 2016-06-12 3:36 PM EDT - Resolution is expected in another 20 minutes

Resolution Description

FSCK has completed and services have been restored.

iis319 Server Down – Resolved

Posted In: Outage — Jun 12th, 2016 at 10:22 am EDT by IX: Toi Santamaria
Shared services are affected

Incident Description:

Our administrators have found an issue causing iis319 to be intermittently down and are currently investigating.

Which Customers are Impacted?

All customers on the affected server.

How are Customers Impacted?

Services unavailable.

How often will we be updated?

N/A

Time to Resolution (ETA)

10 minutes.

Incident Updates

N/A

Resolution Description

iis319 server is up and all services are available.

Control Panel Maintenance – Complete

Posted In: Maintenance — Jun 09th, 2016 at 2:33 am EDT by IX: Omari J.
Cloud services are affected
VPS services are affected
Shared services are affected
my.ixwebhosting.com affected

Incident Description:

At 3:00 AM EDT on June 9th, 2016 we will be performing maintenance on our Manage Control Panel.  We expect this maintenance to last for 30 minutes and during this time access to the control panel will be unavailable.  This means that your websites, email, and databases will all be online, but access to edit your products and billing information will be unavailable.

Which Customers are Impacted?

All customers will be affected

How are Customers Impacted?

Customers will be unable to access their control panel to make edits to their account.  Email, websites, and databases will all be online

How often will we be updated?

We will update at the completion of the maintenance

Time to Resolution (ETA)

30 minutes

Incident Updates

N/A

Resolution Description

Maintenance has been completed and services resumed.

Manage Control Panel Unavailable – Resolved

Posted In: Outage — Jun 05th, 2016 at 4:12 am EDT by IX: Kristopher G.
Shared services are affected

Incident Description:

The Manage Control Panel (MCP) is currently inaccessible. Our administrators are working to resolve this as quickly as possible. We apologize for the inconvenience.

Which Customers are Impacted?

Anyone attempting to access their control panel via manage. or my. will be unable to reach it.

How are Customers Impacted?

my. and manage. will be unavailable.

How often will we be updated?

1 hour

Time to Resolution (ETA)

N/A

Incident Updates

N/A

Resolution Description

All services have been restored and all manage pages are reachable.

MySQL401 Server is Down – Resolved

Posted In: Outage — May 31st, 2016 at 11:35 am EDT by IX: Toi Santamaria
Shared services are affected

Incident Description:

Our System Administrators detected an error with the RAID array on MySQL401.

RAID refers to “redundant array of independent disks”, a technology that allows us to achieve high levels of data storage reliability. It does this by arranging the devices into an array that protects against drive failure. Data is spread across multiple drives along with enough redundant information (parity data) that, if a single drive fails, the RAID can rebuild the lost information onto a spare drive in the array.

Our system administrators' investigation revealed that a drive in the array failed, and before the RAID could be rebuilt onto the spare, another drive in the array failed, so the parity data was no longer sufficient to recover the lost information. We are recreating this server using data from the most recent backup, performed May 28th.
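To illustrate the parity explanation and the double failure described above, here is a minimal sketch of single-parity reconstruction using XOR, as in RAID 5. It assumes byte-wise XOR parity over equal-sized blocks; the post does not state MySQL401's RAID level or controller behavior, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks: List[bytes]) -> bytes:
    """The parity block for one stripe is the XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover the data blocks of a stripe laid out as [data..., parity].

    A failed drive is represented by None. With a single parity block,
    at most one missing block can be reconstructed.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} blocks lost; single parity can recover only one")
    repaired = list(stripe)
    if missing:
        # The XOR of every surviving block (data and parity) equals the missing block.
        survivors = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired[:-1]  # drop the parity block and return the data only


if __name__ == "__main__":
    data = [b"ABCD", b"EFGH", b"IJKL"]
    stripe = data + [compute_parity(data)]

    # One drive fails: its block is rebuilt from the survivors.
    print(rebuild([stripe[0], None, stripe[2], stripe[3]]) == data)  # True

    # A second drive fails before the rebuild finishes: recovery is impossible.
    try:
        rebuild([stripe[0], None, None, stripe[3]])
    except ValueError as err:
        print(err)  # 2 blocks lost; single parity can recover only one
```

The second case mirrors what happened on MySQL401: once two blocks of the same stripe are gone, the remaining parity cannot distinguish between them, which is why a restore from backup was required.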

Which Customers are Impacted?

Customers with databases on MySQL401. To check whether your databases are located on MySQL401, log in to your control panel, click the Manage button next to your hosting account, and then click the MySQL button; the host name shows which MySQL server your account uses.

How are Customers Impacted?

These databases will be unavailable until your account is recreated on the new server. Once recreated, the data restored will be from the backup taken on May 28th.

How often will we be updated?

Every 30 minutes.

Time to Resolution (ETA)

Up to 8 hours for all accounts to be restored.

Incident Updates

  • 2016-05-31 12:00 PM EDT - A new server is being created and we will apply our most recent backups to it once complete.
  • 2016-05-31 1:30 PM EDT - We have finished creating the server and now we will begin restoring from the backup.
  • 2016-05-31 2:32 PM EDT - 68 of 183 accounts restored.
  • 2016-05-31 3:04 PM EDT - 102 of 183 accounts restored.

Resolution Description

Databases are now accessible and sites are up.

Shared Phone Support – Resolved

Posted In: Other Issues — May 31st, 2016 at 10:20 am EDT by IX: KB Muhleman
Shared services are affected
Phone support is affected

Incident Description:

We are currently experiencing issues with the phone system within our shared support offices. The vendor has been contacted and we are working to restore these services as soon as possible. Updates will be provided as they become available.

Which Customers are Impacted?

All customers attempting to contact shared support by phone will be affected.

How are Customers Impacted?

Incoming calls to shared support are affected.

How often will we be updated?

N/A

Time to Resolution (ETA)

N/A

Incident Updates

N/A

Resolution Description

N/A

Ordering Wizard Maintenance – Complete

Posted In: Maintenance — May 27th, 2016 at 3:37 pm EDT by IX: Daren H.

Incident Description:

On Monday, May 30th at 4:00 AM EDT we will be performing brief maintenance on our PCI environment. The maintenance is expected to last only 5 minutes and will have minimal customer impact.

Which Customers are Impacted?

Customers placing new orders during the maintenance window. We expect minimal impact.

How are Customers Impacted?

It will affect new orders for any product.

How often will we be updated?

When Completed

Time to Resolution (ETA)

5 Minutes

Incident Updates

  • 2016/05/30 04:56 AM EDT - Maintenance has started. Apologies for the delay.

Resolution Description

Maintenance has been completed. Thank you for your cooperation.

Mail Server Queues – Resolved

Posted In: Other Issues — May 26th, 2016 at 3:45 pm EDT by IX: Greg Cook
Shared services are affected

Incident Description:

Some of our mail servers are experiencing higher-than-normal queues for sending Email.  We are investigating the issue.  Here are the servers currently affected:

smh01.opentransfer.com
smh02.opentransfer.com
smh03.opentransfer.com

These servers work with mail which is being sent from websites, not mail clients or webmail.


Which Customers are Impacted?

Customers who send mail from their websites.

How are Customers Impacted?

Email sent from websites may be delayed in delivery.

How often will we be updated?

Every hour

Time to Resolution (ETA)

Unknown at this time

Incident Updates

  • 2016/05/26 3:52 PM EDT - We are cleaning the queues and investigating the source of the problem.
  • 2016/05/26 4:45 PM EDT - smh03 has been cleaned. We are still working on smh01 and smh02.
  • 2016/05/26 5:45 PM EDT - smh02 is almost cleaned.

Resolution Description

All mail servers have returned to normal operation.

Network Maintenance: 05/17/2016 – 05/19/2016 – Complete

Posted In: Maintenance — May 13th, 2016 at 11:15 am EDT by IX: Kyle H.
Shared services are affected

Incident Description:

Our System Administrators will be performing network maintenance to connect our shared servers to new switches.  This maintenance will be performed on the following dates and times:

Tuesday May 17th: 11PM to 3AM EDT

Wednesday May 18th: 11PM to 3AM EDT

Thursday May 19th: 11PM to 3AM EDT (If needed)

The new switches provide additional bandwidth, which will help ensure that heavy network activity does not degrade service for customers.

Which Customers are Impacted?

Customers on our shared hosting services.

How are Customers Impacted?

Customer impact is expected to be less than 10 seconds per server.  Servers will be inaccessible during this very short period of time.

How often will we be updated?

When completed.

Time to Resolution (ETA)

n/a

Incident Updates

  • 2016/05/17 11:32 PM EDT - Maintenance has started
  • 2016/05/18 5:48 AM EDT - Maintenance has completed
  • 2016/05/18 11:00 PM EDT - Maintenance has begun.
  • 2016/05/19 03:21 AM EDT - Maintenance has been completed for today; however, it will continue tonight starting at 11 PM EDT.
  • 2016/05/19 11:25 PM EDT - Maintenance has started.

Resolution Description

Maintenance has now completed successfully

Network Maintenance – 05/15/2016 1AM-2AM – Complete

Posted In: Maintenance — May 12th, 2016 at 1:01 pm EDT by IX: Admin
Cloud services are affected
VPS services are affected
Shared services are affected

Incident Description:

Our Systems Administrators will be performing network maintenance on Sunday May 15th from 1AM to 2AM EDT.
During the maintenance, they will upgrade components in our border routers, which will increase the redundancy of both routers.

Which Customers are Impacted?

We expect zero customer impact.

How are Customers Impacted?

We expect zero customer impact.

How often will we be updated?

When complete

Time to Resolution (ETA)

1 hour

Incident Updates

  • 2016/05/15 01:00 AM EDT - Maintenance has started. Please contact support if you experience any issues; however, we expect no problems during this time.
  • 2016/05/15 01:55 AM EDT - More time is required to complete all needed steps for maintenance. ETA extended 1 hour.
  • 2016/05/15 01:55 AM EDT - Maintenance is now complete.

Resolution Description

Maintenance is complete.

Windows VPS Maintenance – May 11th, 2016 – Complete

Posted In: Maintenance — May 10th, 2016 at 3:51 pm EDT by IX: John Richards
VPS services are affected

Incident Description:

On May 11th, 2016 at 11 PM EDT we will be performing maintenance on our Windows VPS node “WVZ7”, during which we will replace the CPU for this node. The maintenance is expected to last one hour. During this time, all customer servers on this node will be offline.


Which Customers are Impacted?

All customers with VPS products on WVZ7.

How are Customers Impacted?

All services will be offline during the maintenance.

How often will we be updated?

Hourly

Time to Resolution (ETA)

1 hour

Incident Updates

Maintenance is now complete and all servers are online.

Resolution Description

Maintenance is complete.

Mail412 Urgent Maintenance – Resolved

Posted In: Maintenance — May 09th, 2016 at 3:05 pm EDT by IX: Greg Cook
Shared services are affected

Incident Description:

Our system administrators found an issue with mail412 and are performing urgent maintenance. The server has been forcibly rebooted and is currently undergoing a file system check (FSCK).

Which Customers are Impacted?

All customers on mail412.

How are Customers Impacted?

Messages sent during this time will be delivered once the FSCK is complete.

How often will we be updated?

As Required.

Time to Resolution (ETA)

~5 hours.

Incident Updates

n/a

Resolution Description

FSCK has been completed and the server is up.

Mail and MySQL Server Urgent Maintenance – Resolved

Posted In: Outage — Apr 28th, 2016 at 2:58 pm EDT by IX: Brian S.
Shared services are affected

Incident Description:

Our system administrators have identified an issue with one of our mail arrays related to a hardware failure of one storage member in the SAN.

A drive failure occurred on a member of the dmail02 storage group. The member attempted to initiate a RAID rebuild, which was unsuccessful, and the storage member then removed itself from the storage group. RAID refers to “redundant array of independent disks”, a technology that allows us to achieve high levels of storage reliability from our server drives. It does this by arranging the devices into an array. Simplified, this means they act like one large hard drive, but if one drive dies, there is enough data stored on the rest to recreate the lost data once the broken drive is replaced with a new one.

The server had to be taken offline, and solutions are currently being investigated. All email sent to addresses on these servers is being queued and will be delivered once services resume.


Which Customers are Impacted?

Customers whose email service or database servers are hosted on this array. A full list has been posted.

How are Customers Impacted?

Email and database services are temporarily offline. Mail delivered to the affected mail servers will wait in a queue and be delivered once services return. Customers may also be unable to access their control panels during this outage.

How often will we be updated?

As required

Time to Resolution (ETA)

Unknown

Incident Updates

  • 2016/04/28 03:30PM EDT - Our system administrators are still investigating the cause of the problem.  Our primary concern at this point is maintaining data integrity so all services remain offline.
  • 2016/04/28 03:45PM EDT - Full list of affected mailservers has been added to the main post
  • 2016/04/28 03:50PM EDT - Full list of affected mailservers has been updated
  • 2016/04/28 03:55PM EDT - Full list of affected database servers has been added to the main post
  • 2016/04/28 04:30PM EDT - Services remain offline while we continue investigation.  Incoming mail during this outage will be queued and delivered once services are restored to normal
  • 2016/04/28 05:03PM EDT - Our engineers are working with the vendor engineers on restoring the storage array.
  • 2016/04/28 05:440PM EDT - Our storage vendor engineers are currently running a full diagnostic test on the array in an attempt to bring the RAID back up.
  • 2016/04/28 06:20PM EDT - Our storage vendor engineers have escalated this issue further up through their development team and our System Engineers are also investigating alternate scenarios to resolve the issue.
  • 2016/04/28 07:36PM EDT - Our storage vendor engineers have identified a possible solution and they are preparing to attempt it.
  • 2016/04/28 07:59PM EDT - As we investigate deeper into this issue we have identified these additional mail servers affected: mail21, mail37, mail310, mail1213, mail1217, mail1218, mail1302, mail1411, mail1417, mail1421, mail1424
  • 2016/04/28 08:42PM EDT - SAN restoration attempts have not been successful. The engineering team is working with vendor engineers on the remaining options to restore the server without data loss. We apologize that this process is taking some time, but it is very important that we are careful and thorough with this sensitive problem.
  • 2016/04/28 11:40PM EDT - We are working on a more detailed explanation for everyone that will contain more information on what failed and our next steps.
  • Update 2016/04/29 01:20AM EDT - Although the DBMail02 cluster of virtual machines is organized in a redundant RAID 50 SAN, it had several consecutive failures today, resulting in the system-wide downtime you’re experiencing. One disk failure is normally not a problem in an array of this kind; however, today multiple drives failed consecutively. This unlikely chain of events rendered the entire cluster unavailable. We are currently making copies of the failed disks. If these copies can be successfully created, the array can be brought back online by performing several sophisticated technical steps on the hard disks. If the array can’t be brought back online, we would at least have a more recent version of the data, which can be restored after all services have been brought back online from the last backup. This backup restore process is running in parallel now, and most data will be gradually restored from backup as the services come back up. There will be another update in the morning with more technical details and information. This is a very long and frustrating outage for everybody. We wish wholeheartedly there was a way to speed this up, but our main concern is preserving data and minimizing any data loss. We will continue to work through the night on every avenue that will accomplish that, while simultaneously restoring services and data from backup.
  • Update 2016/04/29 07:50AM EDT - We are working on a detailed update that should be complete within the next hour.  Stay tuned.
  • Update 2016/04/29 08:26AM EDT - Our engineers have worked through the night, and we have been able to successfully copy the failed disk, which gives us more options toward our primary goal of restoring the database and mail data. Currently our engineers are back online with the highest-level vendor engineers and have managed to get the array back up in a delicate state, which gives us hope that we can evacuate the data safely and get it back online. We are very carefully attempting to do that now. While those operations proceed, our second engineering team has also been working through the night to recreate all 149 servers and has started syncing data from the backups we do have of the Database, SiteStudio, and Control Panel servers. Copying that much data takes time, which is why we started it yesterday; however, we are still very hopeful that we will not have to use this solution. Our mail cluster continues to spool incoming mail and will hold that mail until the mail servers are re-established, so no customers should lose emails sent to them during the outage. We do see and hear your calls for more frequent updates, and we very much want to provide them. Unfortunately, many of the operations underway are done very carefully and slowly, and sometimes we are simply waiting for output from the systems for an hour or more. Again, we are very sorry for how seriously this is affecting all of you, and we commit that every level of IX is completely focused on resolving this issue as quickly as possible.
  • Update 2016/04/29 01:09PM EDT - We are tentatively reporting more progress. We were able to stabilize the RAID array and connect another member, and we have started to evacuate the data. We will be steadily watching and hoping that the evacuation completes successfully. If it does, we hope to have everyone back on with little to no data loss. We continue to see and hear the calls for more specific ETAs, but there is no way to provide one until the evacuation is further along; it is currently at 5%. Give us a couple of hours to calculate progression rates, and we may be able to give more concrete ETAs.
  • Update 2016/04/29 04:15PM EDT - Evacuation of the SAN has been going smoothly so far, and we are becoming more encouraged that we will be able to restore the production servers without needing the backup systems, although the second engineering team continues to progress those as a fail-safe. The evacuation process moves the largest volumes first, so no servers have ‘come out’ of it yet: as of this update we are at 19%, and so far progress is averaging 5-7% per hour. As the evacuation progresses, entire server volumes will start to restore. Database servers will be brought online immediately. For mail servers, the queued mail will first be brought down, and then the server will be made available online. We will update this post with server names as we confirm they are up.

    Again, we sincerely apologize for this lengthy issue. Saving all customer data has been our priority throughout and will continue to be our main priority.

  • Update 2016/04/29 07:00PM EDT - We are now past 30% and volumes are starting to emerge. Once all the partitions (volumes) of a server are out, we will start to bring it online as discussed in the previous update. We expect the first servers to come online very soon.
  • Update 2016/04/29 09:00PM EDT - Evacuation progress is currently at 39%
  • Update 2016/04/29 09:15PM EDT - Our first server is back online. MySQL1411 is now online, but it is not yet accessible to customers.
  • Update 2016/04/29 10:38PM EDT - Six MySQL servers are online and accessible. You can view the online servers in the incident description above.
  • Update 2016/04/29 11:18PM EDT - Evacuation progress is currently at 48%
  • Update 2016/04/29 11:32PM EDT - Evacuation progress is currently at 50%
  • Update 2016/04/30 12:02AM EDT - Evacuation progress is currently at 52%
  • Update 2016/04/30 12:34AM EDT - Evacuation progress is currently at 55%
  • Update 2016/04/30 01:08AM EDT - Evacuation progress is currently at 57%
  • Update 2016/04/30 01:56AM EDT - Evacuation progress is currently at 59%
  • Update 2016/04/30 02:24AM EDT - Evacuation progress is currently at 61%
  • Update 2016/04/30 03:06AM EDT - Evacuation progress is currently at 64%
  • Update 2016/04/30 03:54AM EDT - Evacuation progress is currently at 68%
  • Update 2016/04/30 04:25AM EDT - Evacuation progress is currently at 70%
  • Update 2016/04/30 04:58AM EDT - Evacuation progress is currently at 73%
  • Update 2016/04/30 06:03AM EDT - Evacuation progress is currently at 77%
  • Update 2016/04/30 08:35AM EDT - Evacuation progress is currently at 88%
  • Update 2016/04/30 09:51AM EDT - Evacuation progress is currently at 91%
  • Update 2016/04/30 11:35AM EDT - Evacuation progress is currently at 95%
  • Update 2016/04/30 12:27PM EDT - Evacuation progress is currently at 98%
  • Update 2016/04/30 01:16PM EDT - Evacuation progress is at 100%, and the evacuation is complete. The last sets of servers are preparing to be brought online.
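The 04:15 PM update on April 29 gave progress as 19% with an average of 5-7% per hour. As a rough illustration of how a remaining-time range follows from those two figures, here is a minimal sketch; it assumes a constant linear rate, which the update itself does not promise (it notes the largest volumes move first), and the function name is hypothetical.

```python
from __future__ import annotations


def remaining_hours(percent_done: float, rate_low: float, rate_high: float) -> tuple[float, float]:
    """Return (best_case, worst_case) hours left for a job measured in percent.

    Assumes progress continues at a constant rate somewhere between
    rate_low and rate_high percent per hour.
    """
    if not 0 <= percent_done <= 100:
        raise ValueError("percent_done must be between 0 and 100")
    if rate_low <= 0 or rate_high < rate_low:
        raise ValueError("need 0 < rate_low <= rate_high")
    remaining = 100 - percent_done
    # The faster rate gives the shortest estimate, the slower rate the longest.
    return remaining / rate_high, remaining / rate_low


if __name__ == "__main__":
    best, worst = remaining_hours(19, 5, 7)
    print(f"{best:.1f} to {worst:.1f} hours")  # 11.6 to 16.2 hours
```

For comparison, the updates above show the evacuation going from 19% at 04:15 PM on April 29 to 100% at 01:16 PM on April 30, about 21 hours, which works out to roughly 3.9% per hour. A constant-rate projection like this one is therefore only a rough guide.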

Resolution Description

Data has been evacuated from the failed storage array and servers have been re-enabled.  Mail queues have been delivered and all services are restored.

© 2011 IX Web Hosting.