Issues With Mail Server 1

Mail server 1 (78.129.255.15) is currently experiencing issues and rejecting requests. We are looking into this as a matter of urgency and will post further information when available.

Update 9:55am - Mail server 1 is now responding correctly and the service is back to normal. Please accept our apologies for this short outage.

Data-centre connectivity

The data-centre that houses the majority of our servers is currently experiencing connectivity issues. A proportion of the traffic into the data-centre is experiencing increased latency and packet loss at varying levels. In some instances this may appear as a complete loss of connectivity. The issue appears to originate from a routing problem at a peering point.

We will update this post with further information when available.

Update 1:35pm - The initial problem with a peering point that occurred earlier has led to a problem within the VSS routing cluster that services the data-centre. The data-centre network team quickly eliminated as many causes as possible. The issue was then escalated to Cisco, who are currently involved in a joint investigation to identify the underlying problem.

As soon as we have any progress from this work we will post the information here.

Update 2:30pm - Today’s network connectivity issue should now be resolved and service should be returning to normal.

Data Centre Essential Power Systems Maintenance

The data centre that houses the majority of our servers will be carrying out essential repairs to their main power systems on Saturday 2nd October 2010 between the hours of 00:00 and 08:00 (UK Time).

Unfortunately, due to the nature of the repairs, all power to our servers will be unavailable for at least four hours. During this time all our Helm-based Windows hosting will be unavailable, including websites, email, databases and other services. This will also affect the control panel and our own website.

Our support and status websites will continue to be available as these are housed in a separate data centre. However, please do not submit tickets in relation to this maintenance during the maintenance window.

During this outage DNS will be handled by our secondary DNS server and emails will be queued on our backup mail servers, as these are also in a separate data centre. No emails will therefore be lost during this time.

A full breakdown of the maintenance schedule is below:


Summary

  • The data centre will be enacting procedures for a controlled power down and power restore. This will happen between the hours of 00:00 and 08:00 on Saturday 02/10/10.
  • These procedures are being put into action to allow essential repairs to the main power systems. For further details on these repairs please see ‘Technical explanation of cause’.
  • As part of the procedures for restoring power, there will be additional staff on site at the data centre. They will be checking that all services have power restored and pro-actively investigating any issues.


Sequence of events

  • 00:00: We will start executing a soft shutdown on all servers in the data centre.
  • 01:00: Mains power will be systematically shut down.
  • 05:00: The data centre aims to begin restoring power systematically across the site, in stages.
    Technical staff will begin pro-actively investigating and restoring service to any problematic servers.
  • 08:00: The data centre aims to have all servers back online.


Technical explanation of cause

The data centre recently installed a fourth 500 kVA UPS to add resilience and capacity to their power systems. The data centre was designed with this in mind, so the appropriate connections for this UPS exist on the Schneider UPS Output Panel (which feeds all servers from a common busbar).

During the installation of the UPS, engineers discovered a fault with the panel which meant the additional UPS could not be connected. To repair this fault, they need to electrically isolate the panel as, for safety reasons, engineers cannot work on the panel while it is live.

The manufacturer of the panel, Schneider, is sending a team of engineers to repair the panel so it can accept the fourth 500 kVA UPS.

Please accept our apologies for this extended maintenance, which is unfortunately out of our control.

Update 02/10/10 7:10am - The work is now complete and all servers and services are back online. Our servers started coming online at 4:35am, with the last server back at 5:55am.

All services appear to be functioning normally, but if you have any issues please let us know.
