Network Maintenance: 05/17/2016 – 05/19/2016 – Complete

Posted In: Maintenance — May 13th, 2016 at 11:15 am EDT by IX: Kyle H.
Shared services are affected

Incident Description:

Our System Administrators will be performing network maintenance to connect our shared servers to new switches.  This maintenance will be performed on the following dates and times:

Tuesday May 17th: 11PM to 3AM EDT

Wednesday May 18th: 11PM to 3AM EDT

Thursday May 19th: 11PM to 3AM EDT (If needed)

This increase in bandwidth from the new switches will help ensure that heavy network activity does not cause service degradation to customers.

Which Customers are Impacted?

Customers on our shared hosting services.

How are Customers Impacted?

Customer impact is expected to be less than 10 seconds per server.  Servers will be inaccessible during this very short period of time.

How often will we be updated?

When completed.

Time to Resolution (ETA)

n/a

Incident Updates

  • 2016/05/17 11:32 PM EDT - Maintenance has started
  • 2016/05/18 5:48 AM EDT - Maintenance has completed
  • 2016/05/18 11:00 PM EDT - Maintenance has begun.
  • 2016/05/19 03:21 AM EDT - Maintenance has been completed for today, but will continue tonight starting at 11 PM EDT.
  • 2016/05/19 11:25 PM EDT - Maintenance has started.

Resolution Description

Maintenance has now completed successfully

Network Maintenance – 05/15/2016 1AM-2AM – Complete

Posted In: Maintenance — May 12th, 2016 at 1:01 pm EDT by IX: Admin
Cloud services are affected
VPS services are affected
Shared services are affected

Incident Description:

Our Systems Administrators will be performing network maintenance on Sunday May 15th from 1AM to 2AM EDT.
During maintenance, they will be upgrading components in our border routers.  This will increase the redundancy of both border routers.

Which Customers are Impacted?

We expect zero customer impact.

How are Customers Impacted?

We expect zero customer impact.

How often will we be updated?

When complete

Time to Resolution (ETA)

1 hour

Incident Updates

  • 2016/05/15 01:00 AM EDT - Maintenance has started. Please contact support should any issues be experienced, however we expect no problems during this time.
  • 2016/05/15 01:55 AM EDT - More time is required to complete all needed steps for maintenance. ETA extended 1 hour.
  • 2016/05/15 01:55 AM EDT - Maintenance is now complete.

Resolution Description

Maintenance is complete.

Windows VPS Maintenance – May 11th, 2016 – Complete

Posted In: Maintenance — May 10th, 2016 at 3:51 pm EDT by IX: John Richards
VPS services are affected

Incident Description:

On May 11th, 2016 at 11 PM EDT we will be performing maintenance on our Windows VPS Node “WVZ7”, during which we will be replacing the CPU for this node.  The maintenance is expected to last for one hour.  During this time, all customer servers on this node will be offline.

Which Customers are Impacted?

All customers with VPS products on WVZ7.

How are Customers Impacted?

All services will be offline during the maintenance.

How often will we be updated?

Hourly

Time to Resolution (ETA)

1 hour

Incident Updates

Maintenance is now complete and all servers are online.

Resolution Description

Maintenance is complete.

Mail412 Urgent Maintenance – Resolved

Posted In: Maintenance — May 09th, 2016 at 3:05 pm EDT by IX: Greg Cook
Shared services are affected

Incident Description:

Our system administrators found an issue with the server and are performing urgent maintenance.  They have force-rebooted it, and the server is currently undergoing a file system check (FSCK).

Which Customers are Impacted?

All customers on mail412.

How are Customers Impacted?

Messages sent during this time will be delivered once the FSCK has completed.

How often will we be updated?

As Required.

Time to Resolution (ETA)

~5 hours.

Incident Updates

n/a

Resolution Description

FSCK has been completed and the server is up.

Mail and MySQL Server Urgent Maintenance – Resolved

Posted In: Outage — Apr 28th, 2016 at 2:58 pm EDT by IX: Brian S.
Shared services are affected

Incident Description:

Our system administrators have identified an issue with one of our mail arrays related to a hardware failure of one storage member in the SAN.

A drive failure occurred on a member of the dmail02 storage group.  The member attempted to initiate a RAID rebuild, which was unsuccessful, and the storage member removed itself from the storage group.  RAID stands for “redundant array of independent disks”, a technology that allows us to achieve high levels of storage reliability from our server drives. It does this by arranging the devices into an array. Simplified, this means they act like one large hard drive, but if one drive dies, there is enough data stored on the rest to recreate the lost data once the broken hard drive is replaced with a new one.
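To illustrate how the remaining drives can recreate lost data, here is a minimal sketch assuming single-parity XOR, the scheme used by RAID 5 (and by each RAID 5 group inside a RAID 50 array). The block contents and the `xor_blocks` helper are hypothetical and only for illustration; this is not the storage vendor's implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per drive in a stripe.
data = [b"mail", b"data", b"2016"]

# The parity block is the XOR of all data blocks and lives on another drive.
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

With a single parity block per stripe, only one missing block can be rebuilt this way. If a second drive in the same group fails before the rebuild finishes, XOR alone is not enough to recover the stripe.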

The server had to be taken offline and solutions are currently being investigated.  All email sent to addresses on these servers is being queued and will be delivered once services resume.

Which Customers are Impacted?

Customers with email service provided by this array, as well as customers who have database servers on this array.  A full list has been posted.

How are Customers Impacted?

Email and database services are temporarily offline.  Mail sent to the affected mail servers will wait in a queue and be delivered once services return.  Customers may also be unable to access their control panels during this outage.

How often will we be updated?

As required

Time to Resolution (ETA)

Unknown

Incident Updates

  • 2016/04/28 03:30PM EDT - Our system administrators are still investigating the cause of the problem.  Our primary concern at this point is maintaining data integrity so all services remain offline.
  • 2016/04/28 03:45PM EDT - Full list of affected mailservers has been added to the main post
  • 2016/04/28 03:50PM EDT - Full list of affected mailservers has been updated
  • 2016/04/28 03:55PM EDT - Full list of affected database servers has been added to the main post
  • 2016/04/28 04:30PM EDT - Services remain offline while we continue investigation.  Incoming mail during this outage will be queued and delivered once services are restored to normal
  • 2016/04/28 05:03PM EDT - Our engineers are working with the vendor engineers on restoring the storage array.
  • 2016/04/28 05:440PM EDT - Our storage vendor engineers are currently running a full diagnostic test on the array, in an attempt to bring the RAID back up.
  • 2016/04/28 06:20PM EDT - Our storage vendor engineers have escalated this issue further up through their development team and our System Engineers are also investigating alternate scenarios to resolve the issue.
  • 2016/04/28 07:36PM EDT - Our storage vendor engineers have identified a possible solution and they are preparing to attempt it.
  • 2016/04/28 07:59PM EDT - As we investigate deeper into this issue we have identified these additional mail servers affected: mail21, mail37, mail310, mail1213, mail1217, mail1218, mail1302, mail1411, mail1417, mail1421, mail1424
  • 2016/04/28 08:42PM EDT - SAN restoration attempts have not been successful.  The engineering team is working with vendor engineers on the remaining options to restore the server without data loss.  We apologize that this process is taking some time, but it is very important that we are careful and thorough with this sensitive problem.
  • 2016/04/28 11:40PM EDT - We are working on a more detailed explanation for everyone that will contain more information on what failed and our next steps.
  • Update 2016/04/29 01:20AM EDT - Although the DBMail02 cluster of virtual machines is organized in a redundant RAID 50 SAN, it had several consecutive failures today, resulting in the system-wide downtime you’re experiencing. One disk failure is normally not a problem in an array of this kind; however, today multiple drives failed consecutively. This unlikely chain of events rendered the entire cluster unavailable. We are currently making copies of the failed disks. If these copies can be successfully created, the array can be brought back online by performing several sophisticated technical steps on the hard disks. If the array can’t be brought back online, we would at least have a more recent version of the data, so that it can be restored after all services have been brought back online from the last backup data. This backup restore process is running in parallel now, and most data will be gradually restored from backup as the services come back up. There will be another update in the morning with more technical details and information. This is a very long and frustrating outage for everybody. We wish wholeheartedly there was a way to speed this up, but our main concern is preserving data and minimizing any data loss. We will continue to work through the night on every avenue that will accomplish that, while simultaneously restoring services and data from backup.
  • Update 2016/04/29 07:50AM EDT - We are working on a detailed update that should be complete within the next hour.  Stay tuned.
  • Update 2016/04/29 08:26AM EDT - Our engineers have worked through the night, and we have been able to successfully copy the failed disk, which gives us more options toward our primary goal, which is still restoring the database and mail data. Currently our engineers are back online with the highest level vendor engineers, and have managed to get the array back up in a delicate state, which gives us hope that we can evacuate the data safely and get it back online. We are very carefully attempting to do that now.  While those operations proceed, our second engineering team has also been working through the night to recreate all 149 servers and has started syncing backup data from the backups we do have of the Database, SiteStudio, and Control Panel servers. Copying that much data does take time, which is why we started it yesterday; however, we are still very hopeful that we will not have to use this solution. Our mail cluster continues to spool incoming mail, and will hold that mail until the mail servers are re-established, so no customers should lose emails sent to them during the outage. We do see and hear your calls for more frequent updates, and we very much want to provide them. Unfortunately many of the operations underway are done very carefully and slowly, and sometimes we are simply waiting for output from the systems for an hour or more. Again we are very sorry for how seriously this is affecting all of you, and commit that every level of IX is completely focused on resolving this issue as quickly as possible.
  • Update 2016/04/29 01:09PM EDT - We are tentatively reporting that we have more progress.  We were able to stabilize the RAID array and connect another member.  We have started to evacuate the data.  We will all be steadily watching and hoping that the evacuation will complete successfully.  If the evacuation completes successfully, we hope to have everyone back on with little to no data loss.  We continue to see and hear the calls for more specific ETAs, but there is just no way to provide one until the evacuation is further along.  It is currently at 5%.  Give us a couple of hours to calculate progression rates, and we may be able to give more concrete ETAs.
  • Update 2016/04/29 04:15PM EDT - Evacuation of the SAN has been going smoothly so far, and we are becoming more encouraged that we will be able to restore the production servers and not need to use the backup systems, although the second engineering team continues to advance that work as a fail-safe.  The evacuation process first moves the largest volumes, so we have not had any servers ‘come out’ of it yet:  as of this update we are at 19%, and so far our progression is averaging 5-7% per hour.  However, as the evacuation progresses, entire server volumes will start to restore.  For database servers, we will bring them online immediately.  For mail servers, the queued mail will first be brought down, and then the server will be made available online.  We will update this post with server names as we confirm they are up.

    Again we sincerely apologize for this lengthy issue.  Saving all customer data has been our priority throughout, and will continue to be our main priority.

  • Update 2016/04/29 07:00PM EDT - We are now past 30% and volumes are starting to emerge.  Once all the partitions (volumes) of a server are out we will start to bring them online as discussed in the previous update.  We expect the first servers to start coming online very soon.
  • Update 2016/04/29 09:00PM EDT - Evacuation progress is currently at 39%
  • Update 2016/04/29 09:15PM EDT - Our first server is back online.  MySQL1411 is now online, but it will still be inaccessible to customers.
  • Update 2016/04/29 10:38PM EDT - Six MySQL servers are online and accessible. You can view the online servers in the incident description above.
  • Update 2016/04/29 11:18PM EDT - Evacuation progress is currently at 48%
  • Update 2016/04/29 11:32PM EDT - Evacuation progress is currently at 50%
  • Update 2016/04/30 12:02AM EDT - Evacuation progress is currently at 52%
  • Update 2016/04/30 12:34AM EDT - Evacuation progress is currently at 55%
  • Update 2016/04/30 01:08AM EDT - Evacuation progress is currently at 57%
  • Update 2016/04/30 01:56AM EDT - Evacuation progress is currently at 59%
  • Update 2016/04/30 02:24AM EDT - Evacuation progress is currently at 61%
  • Update 2016/04/30 03:06AM EDT - Evacuation progress is currently at 64%
  • Update 2016/04/30 03:54AM EDT - Evacuation progress is currently at 68%
  • Update 2016/04/30 04:25AM EDT - Evacuation progress is currently at 70%
  • Update 2016/04/30 04:58AM EDT - Evacuation progress is currently at 73%
  • Update 2016/04/30 06:03AM EDT - Evacuation progress is currently at 77%
  • Update 2016/04/30 08:35AM EDT - Evacuation progress is currently at 88%
  • Update 2016/04/30 09:51AM EDT - Evacuation progress is currently at 91%
  • Update 2016/04/30 11:35AM EDT - Evacuation progress is currently at 95%
  • Update 2016/04/30 12:27PM EDT - Evacuation progress is currently at 98%
  • Update 2016/04/30 01:16PM EDT - Evacuation progress is currently at 100%.  Evacuation is complete.  The last sets of servers are preparing to be brought online.
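The progression rates mentioned in the updates above can be turned into a rough completion window. The following is a sketch only, assuming a constant rate and using the figures from the 04/29 04:15PM update (19% complete, 5-7% per hour). The `eta_window` helper is hypothetical and is not a tool used by our engineers.

```python
from datetime import datetime, timedelta

def eta_window(now, percent_done, slow_rate, fast_rate):
    """Return (earliest, latest) finish times for rates given in percent per hour."""
    remaining = 100.0 - percent_done
    earliest = now + timedelta(hours=remaining / fast_rate)
    latest = now + timedelta(hours=remaining / slow_rate)
    return earliest, latest

# Figures from the 04/29 04:15PM update: 19% done, 5-7% per hour (all times EDT).
reported = datetime(2016, 4, 29, 16, 15)
earliest, latest = eta_window(reported, 19, slow_rate=5, fast_rate=7)
print("Projected finish between", earliest.strftime("%m/%d %I:%M%p"),
      "and", latest.strftime("%m/%d %I:%M%p"))

# Compare against the completion time reported in the 04/30 01:16PM update.
finished = datetime(2016, 4, 30, 13, 16)
elapsed_hours = (finished - reported).total_seconds() / 3600
observed_rate = (100 - 19) / elapsed_hours
print(f"Observed average: {observed_rate:.1f}% per hour")
```

Under these assumptions the projected window closes several hours before the actual completion time, because the observed average over the remainder of the evacuation works out to a little under 4% per hour. This is one reason a firm ETA was hard to give while the evacuation was still in its early stages.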

Resolution Description

Data has been evacuated from the failed storage array and servers have been re-enabled.  Mail queues have been delivered and all services are restored.

DDoS (Distributed Denial of Service) attack – Resolved

Posted In: Other Issues — Apr 28th, 2016 at 10:07 am EDT by IX: Toi Santamaria
Shared services are affected

Incident Description:

Our system administrators detected a Distributed Denial of Service attack (DDoS), launched against the nameservers for CP12.

A DDoS is an attempt to make a computer resource unavailable to its intended users. The way the attack is carried out varies as much as who is attacked and why. One common method of attack involves saturating the target (victim) machine with external communications requests. This creates so many false connections to the server that real attempts to connect cannot be completed. Because so many domains share an IP, it is not possible to determine which site the attack is directed at. In many cases, a temporary block is sufficient until the DoS attack passes; however, if the attack continues, the shared IP could remain blocked for an extended period of time.

In order to mitigate the attack and prevent larger service impact, system administrators have temporarily filtered all connections to those nameservers. Customers who do not have their DNS already cached will not be able to browse their sites.

Which Customers are Impacted?

All customers with websites that use CP12 nameservers. You can determine if your account uses CP12 by clicking the manage button next to your hosting account. The address in the address bar will tell you what CP you are located on.

How are Customers Impacted?

Customers who do not have their DNS already cached will not be able to browse their sites.

How often will we be updated?

Hourly

Time to Resolution (ETA)

Systems Administrators are working to mitigate the effects of the DDoS. We will update with an ETA as soon as one is available.

Incident Updates

  • 2016/04/28 10:20AM EDT - System Administrators are still investigating the best way to mitigate the DDoS
  • 2016/04/28 11:15AM EDT - No new information to provide at this time
  • 2016/04/28 11:20AM EDT - Our system administrators have removed the filters on CP12 DNS queries.  We have implemented new rules to mitigate the attack.  CP12 nameservers are now successfully answering queries
  • 2016/04/28 12:20PM EDT - The changes we have implemented are still having a positive impact.  Due to the large amount of traffic that is still incoming some queries may still timeout, but we have noticed an increase in the number of legitimate queries that are processed.
  • 2016/04/28 12:45PM EDT - The DDoS is still active, but we have successfully filtered it and all queries are being handled.  We are still actively monitoring the DDoS to see if there are any changes.

Resolution Description

The filter our System Administrators have implemented is working.  All incoming traffic to this nameserver is isolated to one provider to protect the other parts of our network from the attack.  We are monitoring it so that we are aware of any changes.

Windows VPS Maintenance – April 29, 2016 – Postponed

Posted In: Maintenance — Apr 27th, 2016 at 2:31 pm EDT by IX: Brian S.

Incident Description:

On April 29th, 2016 at 11PM EDT we will be performing maintenance on our Windows VPS Node “WVZ7”, during which we will be replacing the CPU for this node.  The maintenance is expected to last for one hour.  During this time, all customer servers on this node will be offline.

Which Customers are Impacted?

All customers with VPS products on WVZ7

How are Customers Impacted?

All services will be offline during the maintenance

How often will we be updated?

Hourly

Time to Resolution (ETA)

1 hour

Incident Updates

  • 2016/05/03 2:40PM EDT - Maintenance has been postponed.

Resolution Description

N/A

Semi-Annual Data Center Maintenance – Friday, April 29, 2016 – Resolved

Posted In: Maintenance — Apr 26th, 2016 at 10:57 am EDT by IX: Toi Santamaria
Cloud services are affected
VPS services are affected
Shared services are affected

Incident Description:

On Friday, April 29th, 2016 from 11:00 PM EST – 4:00 AM EST, we will be conducting routine maintenance on our data center's major electrical systems.

The purpose is to test and repair any internal components and batteries, as well as to inspect the Power Distribution Units throughout the data center.

During the maintenance, the commercial power grid will be offline and we will be running entirely on our generator systems.  One at a time, we will take each UPS (we have two, UPS A and B) offline via Maintenance Bypass.

The maintenance is scheduled to be completed within a 6 hour maintenance period.

Which Customers are Impacted?

All active customers will be affected.

How are Customers Impacted?

Backup power generators will be unavailable during maintenance.  In the unlikely event of a power outage, servers will run on UPS backup power until generator power is restored.

How often will we be updated?

6 hours

Time to Resolution (ETA)

Friday, April 29th, 2016, 4:00 AM EST

Incident Updates

N/A

Resolution Description

N/A

Server Maintenance for Web 404 – April 24th, 2016 – Complete

Posted In: Maintenance — Apr 22nd, 2016 at 3:41 pm EDT by IX: Kyle H.
Shared services are affected

Incident Description:

At 11 p.m. EDT on April 24th, 2016 we will be performing maintenance on Web 404, during which we will need to take it offline in order to improve the stability of server backups. The server will be unavailable for up to 30 minutes while maintenance is completed.

Which Customers are Impacted?

All customers on Web404

How are Customers Impacted?

Services will be unavailable

How often will we be updated?

30 minutes

Time to Resolution (ETA)

30 minutes

Incident Updates

  • 2016/4/24 11:20 PM EDT - Maintenance has started. Web404 will now be unavailable

Resolution Description

Maintenance is now complete and all services have returned to normal.

Control Panel Maintenance – April 22nd, 2016

Posted In: Maintenance — Apr 21st, 2016 at 10:58 am EDT by IX: Brian S.

Incident Description:

At 4:30AM EDT on April 22nd, 2016 we will be performing maintenance on our Manage Control Panel.  We expect this maintenance to last for one hour and during this time access to the control panel will be unavailable.  This means that your websites, email, and databases will all be online, but access to edit your products and billing information will be unavailable.

Which Customers are Impacted?

All customers

How are Customers Impacted?

Customers will be unable to access their control panel to make edits to their account.  Email, websites, and databases will all be online

How often will we be updated?

We will update at the completion of the maintenance

Time to Resolution (ETA)

1 hour

Incident Updates

  • 2016/04/22 5:39 AM EST - Maintenance has stopped. Dashboard functions are now accessible.

Resolution Description

N/A

Mail510 – Planned Server Maintenance 04/20/2016 3am – 3:30am – Resolved

Posted In: Maintenance — Apr 19th, 2016 at 1:37 pm EDT by IX: Admin
Shared services are affected

Incident Description:

Mail510 was successfully moved from a physical server to a VM today, Wednesday April 20th at 3AM.
All customers should be able to use this service without issue now.

Which Customers are Impacted?

All customers on Mail510.

How are Customers Impacted?

All mail services will be unavailable during resync.

How often will we be updated?

Once maintenance is complete.

Time to Resolution (ETA)

30 minutes

Incident Updates

N/A

Resolution Description

Maintenance has completed and all mail services are up and running.

web1107 shared ip 50.6.22.2 – Filtered – Resolved

Posted In: Outage — Apr 15th, 2016 at 3:40 pm EDT by IX: Victoria Witten
Shared services are affected

Incident Description:

Our system administrators detected a Distributed Denial of Service attack (DDoS) launched against the shared IP address of Web1107 – 50.6.22.2.  In order to mitigate the attack and keep the server online, system administrators filtered all connections to that IP address.

The server is up; however, the websites on shared IP 50.6.22.2 are down.

Once the attack is over, we will lift the IP filter immediately.

Which Customers are Impacted?

All customers using the shared IP on web1107.

How are Customers Impacted?

Websites using the shared IP will not be reachable

How often will we be updated?

When the current filter expires.

Time to Resolution (ETA)

When current filter expires

Incident Updates

  • 2016/04/16 02:25 PM EDT - IP was filtered again today. Filter will expire at 2016-04-16 23:40:35.
  • 2016/04/16 11:58 PM EDT - Filter has been re-added for 12 hours.
  • 2016/04/17 07:50 AM EDT - Filter has been removed.

Resolution Description

The filter has been removed and traffic has normalized. Thank you for your patience and cooperation.

iis1025 – Maintenance – Complete

Posted In: Maintenance — Apr 14th, 2016 at 2:22 pm EDT by IX: Brian S.

Incident Description:

System Administrators detected a problem with the HSphere installation on iis1025 that causes customers to be unable to switch versions of PHP and ASP.NET.  In order to repair this and to prevent further issues, they will be moving the server to another VM on April 15th, 2016 between 03:00 AM and 06:00 AM EDT.
The server has already been synced.  It will be taken offline for a final sync and brought back up on the new VM at this time.

Which Customers are Impacted?

All customers with websites on iis1025

How are Customers Impacted?

Websites will be unavailable while the server is offline

How often will we be updated?

Hourly

Time to Resolution (ETA)

3 hours from the start of maintenance

Incident Updates

  • 2016/04/15 03:10 AM EDT - Maintenance has started.
  • 2016/04/15 04:05 AM EDT - First phase of maintenance has completed. Server has been brought down for final sync.
  • 2016/04/15 04:43 AM EDT - 32% of accounts have been sync'd
  • 2016/04/15 05:19 AM EDT - 61% of accounts have been sync'd
  • 2016/04/15 06:10 AM EDT - All accounts have been sync'd on the new server and most services are now live. Currently administrators are working on bringing up the control panel.

Resolution Description

Maintenance has completed and all services have returned to normal. Thank you for your patience and cooperation.

WEB1107 shared ip: 50.6.22.2 – Filtered – Resolved

Posted In: Outage — Apr 14th, 2016 at 8:05 am EDT by IX: Toi Santamaria
Shared services are affected

Incident Description:

Our system administrators detected a Distributed Denial of Service attack (DDoS) launched against the shared IP address of Web1107 – 50.6.22.2.  In order to mitigate the attack and keep the server online, system administrators filtered all connections to that IP address.

The server is up; however, the websites on shared IP 50.6.22.2 are down.

Once the attack is over, we will lift the IP filter immediately.

Which Customers are Impacted?

All customers on web1107.

How are Customers Impacted?

Websites are down.

How often will we be updated?

When current filter expires

Time to Resolution (ETA)

8:00 PM EST.

Incident Updates

N/A

Resolution Description

The filter has been removed and traffic has normalized. Thank you for your patience and cooperation.

iis1025 – Unable to Switch PHP or ASP.NET Versions – Resolved

Posted In: Outage — Apr 14th, 2016 at 3:35 am EDT by IX: Kristopher G.
Shared services are affected

Incident Description:

Our administrators have detected an issue with this server.  Customers will be unable to switch between different versions of PHP or ASP.NET.  Outside of that, the server is up and running as normal.  We do have additional maintenance planned to fix this issue in the future.

Which Customers are Impacted?

All customers utilizing this server.

How are Customers Impacted?

PHP and ASP.NET versions cannot be switched on the control panel

How often will we be updated?

Hourly

Time to Resolution (ETA)

1 hour

Incident Updates

  • 2016/04/14 04:57 AM EDT - Our administrators are currently working on resolving issues with IIS and the control panel on this server. ETA is extended 30 minutes.
  • 2016/04/14 05:45 AM EDT - Our administrators are currently working on resolving issues with communication between IIS and the control panel. ETA is extended 30 minutes.
  • 2016/04/14 06:17 AM EDT - Our administrators are currently working on resolving issues with communication between IIS and the control panel. ETA is extended 30 minutes.
  • 2016/04/14 8:24 AM EDT - Our administrators are currently working on resolving issues with communication between IIS and the control panel. ETA is extended 30 minutes.
  • 2016/04/14 8:50 AM EDT - The server is up and operational. At this time, the only remaining issue is the inability to switch versions of PHP and ASP.NET in the control panel.

Resolution Description

N/A

 
© 2011 IX Web Hosting.