We are currently encountering an issue in our colocation DC3.
The datacenter provider (Colt Telekom) has been informed and a ticket has been created; we are waiting for Colt to solve the problem.
Update 01:10 AM: DC3 is reachable again.
UPDATE 2024-05-30 13:50 CEST
We have verified that all systems are currently working as designed. Availability of the affected services should be back to normal. We will continue to closely monitor the situation.
UPDATE 2024-05-29 17:41 CEST
Mitigation is still ongoing, as large data volumes have to be moved. We continue to work 24×7.
Dear Customer,
We are experiencing temporary downtimes of shared infrastructure services like Gitlab, Harbor, and Rancher. Customer environments are not directly affected; however, in some cases, e.g., some deployments are not possible. Investigations have been difficult, but we are working with the highest priority on mitigating the problem.
Best regards
Your CONVOTIS Munich Managed Service Team
Dear Customer,
A standard change on the internal firewall in our DC3HAM colocation triggered a high-availability problem because of a years-old, previously unnoticed configuration error. This error resulted in the intrusion prevention blocking hosts that should not have been blocked, including DNS. These cascading errors unfortunately took a while to clean up.
A complete review of the firewall rules is already underway, carried out independently by two people.
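For illustration only, a minimal sketch in Python (with hypothetical host names, not our actual systems) of the kind of post-mitigation check used to confirm that previously blocked hosts resolve DNS and are reachable again:

# Hypothetical post-mitigation check: confirm DNS resolution and TCP reachability
# for hosts that the intrusion prevention had wrongly blocked.
# Host names and ports are illustrative placeholders.
import socket

HOSTS = [("gitlab.example.internal", 443), ("harbor.example.internal", 443)]

def reachable(host, port, timeout=3.0):
    try:
        # getaddrinfo exercises DNS resolution; create_connection exercises forwarding
        family, _, _, _, sockaddr = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0]
        with socket.create_connection(sockaddr[:2], timeout=timeout):
            return True
    except OSError as exc:
        print(f"{host}:{port} check failed: {exc}")
        return False

for host, port in HOSTS:
    print(host, port, "OK" if reachable(host, port) else "NOT REACHABLE")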
Best regards
Your MCON Managed Service Team
Update 2023-01-25 12:26 CET
Summary of Impact: Between 07:05 UTC and 09:45 UTC on 25 January 2023, customers experienced issues with networking connectivity, manifesting as network latency and/or timeouts when attempting to connect to Azure resources in Public Azure regions, as well as other Microsoft services including M365 and PowerBI.
Preliminary Root Cause: We determined that a change made to the Microsoft Wide Area Network (WAN) impacted connectivity between clients on the internet to Azure, connectivity between services within regions, as well as ExpressRoute connections.
Mitigation: We identified a recent change to WAN as the underlying cause and have rolled back this change. Networking telemetry shows recovery from 09:00 UTC onwards across all regions and services, with the final networking equipment recovering at 09:35 UTC. Most impacted Microsoft services automatically recovered once network connectivity was restored, and we worked to recover the remaining impacted services.
Next Steps: We will follow up in 3 days with a preliminary Post Incident Report (PIR), which will cover the initial root cause and repair items. We’ll follow that up 14 days later with a final PIR where we will share a deep dive into the incident.
You can stay informed about Azure service issues, maintenance events, or advisories by creating custom service health alerts (https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation) and you will be notified via your preferred communication channel(s).
***************************************************
Update 2023-01-25 11:48 CET , see also https://azure.status.microsoft/de-de/status
Between 07:05 UTC and 09:45 UTC on 25 January 2023, customers may have experienced issues with networking connectivity, manifesting as network latency and/or timeouts when attempting to connect to Azure resources in Public Azure regions, as well as other Microsoft services including M365, PowerBI.
We’ve determined the network connectivity issue was occurring with devices across the Microsoft Wide Area Network (WAN). This impacted connectivity between clients on the internet to Azure, as well as connectivity between services in datacenters, as well as ExpressRoute connections.
Current Status:
We have identified a recent change to WAN as the underlying cause, and have taken steps to roll back this change. Our telemetry shows consistent signs of recovery from 09:45 UTC onwards across multiple regions and services. Most customers should now see full recovery as WAN networking has recovered fully. We are working to monitor and ensure full recovery for services that were impacted.
The next update will be in 30 minutes or as soon as we have further information.
This message was last updated at 10:28 UTC on 25 January 2023
***************************************************
Update 2023-01-25 10:37 CET
Starting at 07:05 UTC on 25 January 2023, customers may experience issues with networking connectivity, manifesting as network latency and/or timeouts when attempting to connect to Azure resources in Public Azure regions, as well as other Microsoft services including M365, PowerBI.
We’ve determined the network connectivity issue is occurring with devices across the Microsoft Wide Area Network (WAN). This impacts connectivity between clients on the internet to Azure, as well as connectivity between services in datacenters, as well as ExpressRoute connections. The issue is causing impact in waves, peaking approximately every 30 minutes.
We have identified a recent WAN update as the likely underlying cause, and have taken steps to roll back this update. Our latest telemetry shows signs of recovery across multiple regions and services, and we are continuing to actively monitor the situation.
This message was last updated at 09:36 UTC on 25 January 2023
***************************************************
Dear Customer, please be informed about the following current announcement on azure.status.microsoft:
“Starting at 07:05 UTC on 25 January 2023, customers may experience issues with networking connectivity, manifesting as network latency and/or timeouts when attempting to connect to Azure resources in multiple regions, as well as other Microsoft services. We are actively investigating and will share updates as soon as more is known.
This message was last updated at 08:53 UTC on 25 January 2023 ”
Best regards
Your MCON Managed Service Team
Dear customers,
We are experiencing an outage in our colocation DC3.HAM. We are currently investigating the root cause and will update you soonest.
We apologize for any inconvenience.
update 21:50: Lumen confirmed a problem in their network and started fixing it. We escalated to Lumen management.
update Lumen 22:42: As this network fault is impacting multiple clients, the event has increased visibility with Lumen leadership. As such, client trouble tickets associated with this fault have been automatically escalated to higher priority.
update Lumen 00:20: Further troubleshooting has isolated the trouble to a local provider's network. The local provider has dispatched a field team. Work is underway to obtain an estimated time of arrival.
update 03:30: Lumen restored connectivity; all systems are reachable again.
UPDATE 2022-04-12 15:15 CEST
Lumen finally succeeded in reconnecting their datacenter in Hamburg, which hosts our colocation DC3.HAM.
We have been checking and verifying all systems afterwards.
Systems including our ticket system are back and available.
UPDATE 2022-04-12 14:35 CEST
We are checking and verifying all systems and our monitoring.
UPDATE 2022-04-12 14:30 CEST
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (ScLo)
[SUMMARY OF WORK]
The Lumen NOC advises some services have begun to clear and the local provider continues to repair the remaining damaged fiber cable.
Checking on local network connections.
***************************************************
UPDATE 2022-04-12 11:48 AM CEST
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (ScLo)
[SUMMARY OF WORK]
The cause of the service interruption has been identified as force majeure in Dortmund, Germany. Fibre maintainers are on site and work is ongoing.
We continue to push for an ETR.
***************************************************
UPDATE 2022-04-12 09:56 AM CEST
Our colocation DC3HAM is still not available. The carrier Lumen is working on the problem.
Unfortunately our ticket system is also affected; we remain reachable by e-mail.
We continue to push for an ETR.
Dear customers,
We are experiencing an outage in our colocation DC3.HAM, apparently caused by our provider Lumen. We are in escalation contact with Lumen about this and will update you soonest.
We apologize for any inconvenience.
Lumen finally succeeded around 01:30 AM on Saturday in reconnecting their datacenter in Hamburg, which hosts our colocation DC3.HAM.
We have been checking and verifying all systems afterwards.
Systems including our ticket system are back and available.
We will follow up with Lumen on an incident report.
——————————————————————————————-
Colocation DC3 seems to be back online; we are checking related systems on the MCON side.
——————————————————————————————-
*** CASCADED EXTERNAL NOTES 2022-03-12 00:05:37 GMT From CASE: 23317190 – SM Parent
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk ()
[SUMMARY OF WORK]
Good Morning
We are now seeing the services restored, we have asked the vendor to provide a full RFO.
We will keep you updated with all progress.
Kind Regards
Lumen
[PLAN OF ACTION]
Investigating RFO
[TIME – NOW] 2022-03-12 00:05 (UTC)
***************************************************
UPDATE 2022-03-11 23:56 CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (ScLo)
[SUMMARY OF WORK]
Good Afternoon
The Colt partner has confirmed the completion of splicing; however, Colt customers' services are still down. The partner has been requested to recheck the splicing. We will keep you updated on our progress via this email address. Thank you.
We will continue to push for an ETR.
[PLAN OF ACTION]
[TIME – NOW] 2022-03-11 20:58 (UTC)
[UPDATE ETA]
***************************************************
UPDATE 2022-03-11 19:06 CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (ScLo)
[SUMMARY OF WORK]
Good Afternoon
Please be advised the local provider's last-mile field engineers continue their repair efforts.
We continue to push for an ETR.
[PLAN OF ACTION]
[TIME – NOW] 2022-03-11 17:58 (UTC)
[UPDATE ETA] 2022-03-11 19:15 (UTC)
***************************************************
UPDATE 2022-03-11 18:10 CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (ScLo)
[SUMMARY OF WORK]
Good Afternoon
Please be advised we are pushing for an ETR from the local provider. They have stated that cable repair preparation is ongoing and will update further.
[PLAN OF ACTION]
[TIME – NOW] 2022-03-11 16:40 (UTC)
[UPDATE ETA] 2022-03-11 17:40 (UTC)
***************************************************
UPDATE 2022-03-11 15:59 CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (ScLo)
[SUMMARY OF WORK]
Good Afternoon
Please be advised our local provider's last-mile partner confirmed that the situation is complex, as the damage location is occupied by heavy construction machinery which needs to be cleared before digging work can start. Civil work is ongoing and an estimated time of restoration is awaited.
[PLAN OF ACTION]
We will follow up with a further update in the next 2 hours.
[TIME – NOW] 2022-03-11 14:42 (UTC)
[UPDATE ETA] 2022-03-11 16:42 (UTC)
***************************************************
Next update by: 2022-03-11 16:45 GMT
UPDATE 2022-03-11 14:54 CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (ScLo)
[SUMMARY OF WORK]
Good Afternoon
Please be advised that we continue to work with the local provider for updates and progress in relation to this case. We have requested an urgent update and confirmation of when service will be restored, as the original ETR provided has now passed. We will aim to provide a further update in the next 60 minutes.
[PLAN OF ACTION]
Await local provider update and update customer once feedback received.
[TIME – NOW] 2022-03-11 13:43 (UTC)
[UPDATE ETA] 2022-03-11 14:43 (UTC)
***************************************************
Next update by: 2022-03-11 14:45 GMT
UPDATE 2022-03-11 14:20 CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (ScLo)
[SUMMARY OF WORK]
Good Afternoon
Please be advised that we can confirm engineers continue to work to restore service. As advised previously, we have been given an ETR of 13:00 GMT, and this still stands at this time; however, damage to the fibre was extensive, so this may be pushed back. We will aim to provide a further update in the next 60 minutes.
[PLAN OF ACTION]
Await local provider update and update customer once feedback received.
[TIME – NOW] 2022-03-11 12:20 (UTC)
[UPDATE ETA] 2022-03-11 13:20 (UTC)
***************************************************
Next update by: 2022-03-11 13:20 GMT
UPDATE 2022-03-11 08:18AM CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (ScLo)
[SUMMARY OF WORK]
Good Morning
Please be advised that we can confirm that engineers are onsite and working to restore service. The fibre break is located in Wuppertal, Germany. The local provider has confirmed that the ETR for completion of the work is 13:00 GMT.
We will aim to provide a further update around 12:00-12:30 GMT to confirm whether we are still on target for the ETR; once we have this confirmation, we will forward it to you.
[PLAN OF ACTION]
Chase local provider around 12:30 to confirm we are still on target for the ETR provided; once confirmed, update customer.
[TIME – NOW] 2022-03-11 09:42 (UTC)
***************************************************
Next update by: 2022-03-11 12:15 GMT
UPDATE 2022-03-11 08:18AM CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk ()
[SUMMARY OF WORK]
Good Morning
We are still awaiting testing from the vendor. We will keep you updated on all progress.
Kind Regards
Lumen
[PLAN OF ACTION]
Investigating
[TIME – NOW] 2022-03-11 07:15 (UTC)
***************************************************
UPDATE 2022-03-11 07:19AM CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (AsHa)
[SUMMARY OF WORK]
Good Afternoon,
Field engineers are actively repairing the fault and we shall update you as soon as information is available.
Kind Regards,
Lumen
[PLAN OF ACTION]
Engage Local Carrier
[TIME – NOW] 2022-03-11 06:17 (UTC)
***************************************************
UPDATE 2022-03-11 03:37AM CET
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (AsHa)
[SUMMARY OF WORK]
Good Afternoon,
Our local carrier has informed us their engineers are expected to arrive at the affected location at 04:30 GMT. We will update you at that time on their findings.
Kind Regards
Lumen
[PLAN OF ACTION]
Engage Local Carrier
[TIME – NOW] 2022-03-11 02:23 (UTC)
***************************************************
UPDATE 2022-03-11 02:39AM CET
[CUSTOMER UPDATE] EMEA Service Desk (AsHa)
[SUMMARY OF WORK]
Good Afternoon,
There is an issue in our partner's network on a link between Dortmund and Dusseldorf, Germany. We will inform you accordingly of any information as it becomes available.
Kind Regards
Lumen
[PLAN OF ACTION]
Engage Local Carrier
[TIME – NOW] 2022-03-11 01:36 (UTC)
***************************************************
Next update by: 2022-03-11 02:40 GMT
UPDATE 2022-03-11 02:22AM CET
No update from the support of our datacenter carrier LUMEN.
The escalation level has been raised.
UPDATE 2022-03-11 00:56AM CET
A ticket has been raised with the support of our datacenter carrier LUMEN.
***************************************************
[CUSTOMER UPDATE] EMEA Service Desk (AsHa)
[SUMMARY OF WORK]
Good Afternoon,
Your services are affected by a major outage in our local carrier’s network. We are engaging them and will update you accordingly.
Kind Regards,
Lumen
[PLAN OF ACTION]
Engage Local Carrier
[TIME – NOW] 2022-03-10 23:57 (UTC)
**********************************************
Next update by: 2022-03-11 01:00 GMT
UPDATE 2022-03-10 11:48PM CET
We have just identified that uplinks of our datacenter carrier LUMEN are down right now.
That being said, we are currently working with their support to find a quick solution.
UPDATE 2022-03-10 11:00PM CET
The network is currently not working as expected.
Thus, sites and services are not available right now.
We are working under high pressure to resolve this issue as fast as possible.
Update 07:04 PM CET:
RESOLVED
my.mcon-group.com is up and running now.
Dear customer,
my.mcon-group.com is temporarily not available.
We are working on it and will update you soonest.
We apologize for any inconvenience.
Dear Customer,
A trivial change on our external firewall, with a (normal) subsequent sync to the secondary device, caused both firewalls to go into a “disabled” state and stop forwarding packets.
Deactivating and reactivating the High Availability feature solved the problem.
The connection interruption lasted from 11:08 to 11:41 CEST.
We are in contact with the developers to find the root cause of this issue.
We apologize for any inconvenience.
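For illustration only, a minimal sketch in Python (with a placeholder endpoint, not our actual monitoring) of an external probe that would surface the symptom described above, i.e. the firewall pair no longer forwarding any packets:

# Hypothetical external probe: periodically attempt a TCP connection through the
# firewall pair to a known endpoint and report when forwarding stops.
# The endpoint and interval are illustrative placeholders.
import socket
import time

ENDPOINT = ("service.example.internal", 443)
INTERVAL = 30  # seconds between probes

def forwarding_ok(endpoint, timeout=5.0):
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False

while True:
    if not forwarding_ok(ENDPOINT):
        print(time.strftime("%Y-%m-%d %H:%M:%S"), "probe failed:", *ENDPOINT)
    time.sleep(INTERVAL)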
RESOLVED
UPDATE 2021-08-28 02:59PM CEST
On Friday, 27th August, at 19:30 a redundant storage cluster in our colocation DC3.HAM failed during normal operations.
After onsite analysis we found that the storage had stopped all services due to a suspected split-brain error.
As a result of the storage cluster failure, virtual servers running on VMware could not run properly.
The repair of the storage cluster was started immediately after analysis and finished around 5 a.m. on 28th August.
After storage recovery all affected virtual servers were restarted and checked; all production systems have been up and running since 09:45 a.m. on 28th August.
We are in further analysis of the root cause.
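As a general illustration of why a cluster deliberately halts services when it suspects a split brain, a minimal quorum sketch in Python (the standard majority idea, not the storage vendor's actual algorithm):

# Minimal quorum illustration: a node only keeps serving I/O while it can see a
# strict majority of the cluster's voting members; otherwise it stops, so that
# two isolated halves can never both accept writes.
CLUSTER_SIZE = 3  # total voting members (illustrative value)

def has_quorum(visible_members, cluster_size=CLUSTER_SIZE):
    return visible_members > cluster_size // 2  # strict majority required

for visible in (3, 2, 1):
    action = "serve I/O" if has_quorum(visible) else "stop services (possible split brain)"
    print(f"{visible}/{CLUSTER_SIZE} members visible -> {action}")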
UPDATE 2021-08-28 10:25AM CEST
Most systems are back; we are working to fix remaining problems, mainly on the QA systems.
UPDATE 2021-08-27 07:30PM CEST
The VM storage cluster is currently not working as expected.
Thus, sites and services are not available right now.
We are working under high pressure to resolve this issue as fast as possible.