ab1jx
Posts: 868
Joined: Thu Sep 26, 2013 1:54 pm
Location: Heath, MA USA
Contact: Website

What happened to British Airways computers?

Mon May 29, 2017 1:22 am

The BBC app just says it was some kind of a global power failure, which is none too technical and doesn't make much sense. 1000 or so flights grounded over a couple days, and I gather it wasn't even caused by a Windows problem.

Now the BBC is saying a third day, but how could they be so crippled? http://www.bbc.com/news/uk-40081112

There used to be a Usenet newsgroup about the risks of adopting computers; this is the sort of stuff they covered. I don't have Usenet access any more.

bensimmo
Posts: 4187
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: What happened to British Airways computers?

Mon May 29, 2017 8:20 am

But three days out would cost little in time, and probably in money, compared to what they have gained in throughput from using computers.

Bank holiday weekend and probably one of the many grumpy striking style workers switched a plug off ;-)

Ernst
Posts: 1257
Joined: Sat Feb 04, 2017 9:39 am
Location: Germany

Re: What happened to British Airways computers?

Mon May 29, 2017 10:10 am

Nothing special, you will see this more often in the future.
I have read the BBC article and I can fully understand why things like this happen: it is a new trend in the IT world to cut costs and offshore to "lower" cost countries while at the same time releasing experienced IT experts into retirement.
The road to insanity is paved with static ip addresses

B.Goode
Posts: 8987
Joined: Mon Sep 01, 2014 4:03 pm
Location: UK

Re: What happened to British Airways computers?

Mon May 29, 2017 10:21 am

What happened to British Airways computers?
We don't know. The people who might know are either not making it public, or maybe don't fully understand yet.

In the absence of informed reliable statements about the underlying cause anything else has to be regarded as conjecture (politely) or as "Fake News".

hippy
Posts: 6255
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: What happened to British Airways computers?

Mon May 29, 2017 7:14 pm

B.Goode wrote:In the absence of informed reliable statements about the underlying cause anything else has to be regarded as conjecture (politely) or as "Fake News".
Indeed. BA CEO Alex Cruz is saying there was a brief power surge, and that a backup system did not kick in at the time but was restored later. But there has been no explanation as to how that evolved into the outcome witnessed so far.

http://www.bbc.co.uk/news/uk-40083778

DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: What happened to British Airways computers?

Mon May 29, 2017 8:26 pm

A better question is what the customers were thinking. The numbers presented show an order of magnitude more affected customers in one day than should be served by all airlines worldwide combined in a single day (based on a rough estimate of a world population of 12,000,000,000). So what were people thinking to cause that many people to be taking commercial flights at that time?
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

Heater
Posts: 13910
Joined: Tue Jul 17, 2012 3:02 pm

Re: What happened to British Airways computers?

Mon May 29, 2017 9:00 pm

12,000,000,000? Where did you get that from?

The world population is ridiculously huge but I think we are only up to seven and a half billion.

In the states it's a long weekend holiday so I guess that accounts for a lot of extra traffic.
Memory in C++ is a leaky abstraction .

drgeoff
Posts: 9924
Joined: Wed Jan 25, 2012 6:39 pm

Re: What happened to British Airways computers?

Mon May 29, 2017 10:06 pm

Heater wrote: In the states it's a long weekend holiday so I guess that accounts for a lot of extra traffic.
And in the UK. The replacement for Whit Monday. https://en.wikipedia.org/wiki/Whitsun

mahjongg
Forum Moderator
Posts: 12411
Joined: Sun Mar 11, 2012 12:19 am
Location: South Holland, The Netherlands

Re: What happened to British Airways computers?

Mon May 29, 2017 11:47 pm

British Airways, like most sensible airways, probably uses a mainframe computer. They certainly won't use a low end consumer OS like Windows for a mission critical system (I hope).
But even a mainframe can develop a glitch.

peterlite
Posts: 720
Joined: Sun Apr 17, 2016 4:00 am

Re: What happened to British Airways computers?

Tue May 30, 2017 3:34 am

If it is a centralised system and breaks, recovery is not automatic and requires the recovery of everything. If their system ran on 500,000 Pi 3Bs, they would be recovering only one and all the other transactions/flights would be unaffected.

Recovery is a b*tch. This is where you find out all the changes made after the last test of the recovery procedure. Your IBM mainframe recovery procedure, "Press button B", assumes:
* You are using staff in London, not Pune, India.
* An IBM mainframe in London, not Lenovo servers in the Diaoyu islands.
* Z390, not DragonflyBSD, which is only half implemented as part of a conversion from MirOS.
* Your backups will restore despite using a new encryption system that is not yet tested through decryption.
* The decryption passwords will be available when that IT guy is out of a coma from that thing with a bus.
* You can always revert to the tape backups made on the tape drives you dumped in the trash.
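
The common thread in that list is that a backup only counts once a restore has actually been exercised. A minimal sketch in Python of such a drill, assuming a hypothetical `make_backup`/`restore_backup` pair; plain zlib compression stands in here for whatever backup and encryption tooling is really in use, and the record format is made up for illustration:

```python
import hashlib
import zlib


def make_backup(data: bytes) -> bytes:
    """Hypothetical backup step: compress the payload (stand-in for real tooling)."""
    return zlib.compress(data)


def restore_backup(blob: bytes) -> bytes:
    """Hypothetical restore step: the inverse of make_backup."""
    return zlib.decompress(blob)


def restore_drill(original: bytes, blob: bytes) -> bool:
    """Return True only if the backup restores to byte-identical data."""
    try:
        restored = restore_backup(blob)
    except zlib.error:
        # A backup that cannot be read back is a failed drill, not a crash.
        return False
    return hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()


if __name__ == "__main__":
    # Made-up sample records, purely illustrative.
    records = b"flight=XX000;seat=23A\n" * 1000
    blob = make_backup(records)
    print("good backup restores:", restore_drill(records, blob))
    print("truncated backup restores:", restore_drill(records, blob[: len(blob) // 2]))
```

The point of the design is that the drill compares what comes back against the source rather than trusting that the backup job reported success.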

DougieLawson
Posts: 36570
Joined: Sun Jun 16, 2013 11:19 pm
Location: Basingstoke, UK
Contact: Website Twitter

Re: What happened to British Airways computers?

Tue May 30, 2017 4:29 am

mahjongg wrote:British Airways, like most sensible airways, probably uses a mainframe computer. They certainly won't use a low end consumer OS like Windows for a mission critical system (I hope).
But even a mainframe can develop a glitch.
They used to use a mainframe. It's not clear if they've moved off that to X86 blades running Windows Server.

Mainframes tend to have uninterruptible power supplies that get tested. They also tend to have disaster recovery hot standby systems that get tested.
Note: Having anything humorous in your signature is completely banned on this forum. Wear a tin-foil hat and you'll get a ban.

Any DMs sent on Twitter will be answered next month.

This is a doctor free zone.

DougieLawson
Posts: 36570
Joined: Sun Jun 16, 2013 11:19 pm
Location: Basingstoke, UK
Contact: Website Twitter

Re: What happened to British Airways computers?

Tue May 30, 2017 4:39 am

peterlite wrote: * Z390, not DragonflyBSD, which is only half implemented as part of a conversion from MirOS.
The zSeries operating systems are z/OS, z/VM, z/VSE, zLinux or zTPF.

No such thing as z390. No such hardware either. Current top end zSeries mainframe is a z13.

There's no more IBM Blue paint either.
Image

bensimmo
Posts: 4187
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: What happened to British Airways computers?

Tue May 30, 2017 7:24 am

drgeoff wrote:
Heater wrote: In the states it's a long weekend holiday so I guess that accounts for a lot of extra traffic.
And in the UK. The replacement for Whit Monday. https://en.wikipedia.org/wiki/Whitsun
Half term too, start of it for many areas of the country. So lots of families heading off on holiday all at once.

RaTTuS
Posts: 10500
Joined: Tue Nov 29, 2011 11:12 am
Location: North West UK
Contact: Twitter YouTube

Re: What happened to British Airways computers?

Tue May 30, 2017 10:11 am

How To ask Questions :- http://www.catb.org/esr/faqs/smart-questions.html
WARNING - some parts of this post may be erroneous YMMV

1QC43qbL5FySu2Pi51vGqKqxy3UiJgukSX
Covfefe

ab1jx
Posts: 868
Joined: Thu Sep 26, 2013 1:54 pm
Location: Heath, MA USA
Contact: Website

Re: What happened to British Airways computers?

Mon Jun 05, 2017 10:00 pm

Well, I was shocked. I've known about database replication for 15 years or so, so it was hard to understand why flights all over the world were being affected. A power surge? Worldwide? Over leased lines or something? I mean, this is the era of Bitcoin and huge blockchains that there are many copies of. Their route map shows most flights going through Heathrow but I thought their computers would be more distributed.
[Attachment: ARM-BA.gif]
Lately these seem to have surfaced but I gather there's still some mystery involved and investigations are underway.
http://www.bbc.com/news/technology-40118386
http://www.bbc.com/news/business-40159202

peterlite
Posts: 720
Joined: Sun Apr 17, 2016 4:00 am

Re: What happened to British Airways computers?

Tue Jun 06, 2017 6:53 am

Maybe they used PoHTTP to power their network. Someone unplugged the TP-Link router in head office. They no longer had packets of power going out to other devices. System down...

Someone should have told them about the Pi Zero and how Zero stands for Zero electricity use. :ugeek:

RaTTuS
Posts: 10500
Joined: Tue Nov 29, 2011 11:12 am
Location: North West UK
Contact: Twitter YouTube

Re: What happened to British Airways computers?

Tue Jun 06, 2017 7:07 am

https://www.theregister.co.uk/2017/06/0 ... _analysis/
has a bit more info -
however er people error was the main cause
switch it off then back on again ... not good this time

Heater
Posts: 13910
Joined: Tue Jul 17, 2012 3:02 pm

Re: What happened to British Airways computers?

Tue Jun 06, 2017 7:24 am

Now they want to blame some guy for pulling the plug, and plugging it back incorrectly.

I don't buy it.

A human error like that is no different than some random hardware failure.

There is no way bringing down one part of your distributed system should cause total failure for days.

Oh, they did not have a distributed system.... well, that's not the guy's fault now, is it.

People like Facebook yank power on their data centers at random all the time. Just to see that everything keeps humming nicely.

Reminds me of the time I was working on the team testing the fly-by-wire Primary Flight Computers of the Boeing 777. Before the first flight of the 777 the test pilot climbed into the plane, yanked out all the circuit breakers and then restarted all the systems. Half of them did not come up again. Well, he was not flying that machine anywhere til that issue was resolved.

RaTTuS
Posts: 10500
Joined: Tue Nov 29, 2011 11:12 am
Location: North West UK
Contact: Twitter YouTube

Re: What happened to British Airways computers?

Tue Jun 06, 2017 7:40 am

Yes, you cannot blame someone plugging it in wrong for your backup system not working ...

S0litaire
Posts: 216
Joined: Thu Dec 29, 2011 4:24 pm
Location: Ayrshire, Scotland
Contact: ICQ Skype Twitter

Re: What happened to British Airways computers?

Tue Jun 06, 2017 12:05 pm

this is an apt comic..
Image
--
Laters

Bill "Solitaire" C

Anáil nathrach, ortha bhas betha, do cheol déanta

DougieLawson
Posts: 36570
Joined: Sun Jun 16, 2013 11:19 pm
Location: Basingstoke, UK
Contact: Website Twitter

Re: What happened to British Airways computers?

Thu Jun 08, 2017 6:12 am

Heater wrote:Now they want to blame some guy for pulling the plug, and plugging it back incorrectly.
There's a somewhat rude word that starts with a "B", ends with a "t" and has "ullshi" in the middle to describe Willie Walsh's attempt to keep his c-level job and to avoid IAG/BA having to pay a €600 penalty to every passenger (into or out of the EU) that was affected by their critical failure.

Every major data centre has UPS, every major computing system has hot-standby (or your money saving efforts are cutting off noses to spite faces). The problem is more often the mass of "stuff" between the data centre and the worker typing stuff on their screen at the airport check-in or baggage drop that may have to re-establish its network connection to the data centre on failover.

If one lone contractor can cause a critical failure then the problem lies in their hardware planning, their data centre access, their "four-eyes" buddy checking and all of that stuff that should take place to ensure reliability and continuity. That again becomes Willie Walsh's problem if the IAG/BA processes are not fit for purpose.

BMS Doug
Posts: 3824
Joined: Thu Mar 27, 2014 2:42 pm
Location: London, UK

Re: What happened to British Airways computers?

Thu Jun 08, 2017 8:09 am

DougieLawson wrote:
Heater wrote:Now they want to blame some guy for pulling the plug, and plugging it back incorrectly.
There's a somewhat rude word that starts with a "B", ends with a "t" and has "ullshi" in the middle to describe Willie Walsh's attempt to keep his c-level job and to avoid IAG/BA having to pay a €600 penalty to every passenger (into or out of the EU) that was affected by their critical failure.

Every major data centre has UPS, every major computing system has hot-standby (or your money saving efforts are cutting off noses to spite faces). The problem is more often the mass of "stuff" between the data centre and the worker typing stuff on their screen at the airport check-in or baggage drop that may have to re-establish its network connection to the data centre on failover.

If one lone contractor can cause a critical failure then the problem lies in their hardware planning, their data centre access, their "four-eyes" buddy checking and all of that stuff that should take place to ensure reliability and continuity. That again becomes Willie Walsh's problem if the IAG/BA processes are not fit for purpose.
I completely agree. All critical systems are supposed to be designed to avoid a single point of failure; I can only see two scenarios in which you would lose everything:
* Inadequate design.
* Deliberate sabotage.

Data centers that I have worked in have Power Distribution Units (PDU) powering the equipment in each Equipment room. (Each piece of equipment would only be fed by one PDU).
The PDU has two power feeds, mains and UPS.
In the event of mains failure the PDU switches seamlessly to UPS until the generator systems have kicked in and the mains circuit is back up.
Once the incoming power supply is restored the load can be manually switched from the generator back to mains.
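
The switching order described above can be sketched as a tiny decision function. This is a simplified model in Python, not how any real PDU or transfer switch is implemented: the names `Feed` and `select_feed` are hypothetical, the generator is assumed to feed the mains circuit, and the timeline is invented for illustration.

```python
from enum import Enum


class Feed(Enum):
    MAINS = "mains circuit"
    UPS = "UPS"
    NONE = "no power"


def select_feed(utility_ok: bool, generator_running: bool, ups_available: bool) -> Feed:
    """Pick the PDU input for one instant in time (simplified model)."""
    # The mains circuit is live if either the utility or a generator feeds it.
    if utility_ok or generator_running:
        return Feed.MAINS
    # Otherwise ride through on the UPS, if it is online and charged.
    if ups_available:
        return Feed.UPS
    # With no live mains circuit and no usable UPS, the equipment goes dark.
    return Feed.NONE


if __name__ == "__main__":
    # Hypothetical timeline: (description, utility_ok, generator_running, ups_available)
    timeline = [
        ("normal running", True, False, True),
        ("mains fails, generator still starting", False, False, True),
        ("generator carrying the load", False, True, True),
        ("mains restored, switched back by hand", True, False, True),
        ("mains fails while UPS is unavailable", False, False, False),
    ]
    for label, utility, generator, ups in timeline:
        print(f"{label}: {select_feed(utility, generator, ups).value}")
```

The last row is the interesting one: the model only stays up because the UPS bridges the gap while the generator starts, so anything that takes the UPS out of the path removes the ride-through.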

The Data center would have 2-3 UPS systems, two incoming mains power supplies, and redundant generators (usually 1-2 more generators than required for full building load).

If the Equipment is correctly set up I would expect it to be spread out between multiple data centers, failing that it would be split between multiple equipment rooms within the same data center. I would not expect all of the equipment to be in a single equipment room, on a single PDU or on a single UPS system.

If all of the equipment (or enough of it to cause a single point failure) was in one room then the EPO (emergency power off, Big Red Button near the door) could take it all down in one hit. This fits the scenario described (contractor turned it off then quickly turned it back on again)

If all of the equipment was on a single UPS system then a power fluctuation while that UPS was in bypass mode (for servicing) would have a similar effect.
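
A rough way to check a layout for the shared dependencies described above is to list, for each unit of a service, which site, room, PDU and UPS it hangs off, and flag anything every unit has in common. A minimal Python sketch; the function name, the dependency kinds and the sample inventory are all assumptions for illustration, not anything known about BA's actual setup:

```python
from typing import Dict


def single_points_of_failure(placement: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return {dependency kind: name} for every dependency shared by ALL units."""
    shared: Dict[str, str] = {}
    kinds = set()
    for deps in placement.values():
        kinds.update(deps.keys())
    for kind in sorted(kinds):
        names = {deps.get(kind) for deps in placement.values()}
        # Exactly one distinct, known value means every unit depends on it.
        if len(names) == 1 and None not in names:
            shared[kind] = names.pop()
    return shared


if __name__ == "__main__":
    # Hypothetical inventory: a primary and a standby that share a room and a UPS.
    servers = {
        "db-primary": {"site": "DC1", "room": "ER1", "pdu": "PDU-A", "ups": "UPS-1"},
        "db-standby": {"site": "DC1", "room": "ER1", "pdu": "PDU-B", "ups": "UPS-1"},
    }
    for kind, name in single_points_of_failure(servers).items():
        print(f"all units share {kind} {name}")
```

In the sample inventory the two units sit on separate PDUs but share a room and a UPS, which is exactly the kind of layout where one EPO button or one UPS in bypass takes out both.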
Doug.
Building Management Systems Engineer.

hippy
Posts: 6255
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: What happened to British Airways computers?

Thu Jun 08, 2017 8:48 am

My suspicion is they had inadvertently created some kind of deadlock situation. Code which updates a database can be difficult to sort out and get running again if that code expects the initial database to exist when it doesn't.

DougieLawson
Posts: 36570
Joined: Sun Jun 16, 2013 11:19 pm
Location: Basingstoke, UK
Contact: Website Twitter

Re: What happened to British Airways computers?

Thu Jun 08, 2017 6:26 pm

hippy wrote:My suspicion is they had inadvertently created some kind of deadlock situation. Code which updates a database can be difficult to sort out and get running again if that code expects the initial database to exist when it doesn't.
Not a chance of that. They're not disclosing the truth because it's going to be embarrassing for Willie Walsh (who wants to keep his bonus). It's bound to be due to incompetence and lack of planning (possibly off-shore) during recovery. Unless someone is willing to risk their job and disclose everything we'll never get round the mis-information and terminological inexactitudes that we've had so far.

Seeing the way part of an organisation I work for recovered from a significant power outage this week (albeit not in a data centre but in a very key installation) was amazing; there's an airline that could learn a lot from it. They had a robust plan, they had a call out list, they got the right folks to focus on solving their sections of the problem in an organised & structured way. The event was out of the blue; the already built and tested recovery plan simply worked and worked well.

rpdom
Posts: 15591
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: What happened to British Airways computers?

Thu Jun 08, 2017 6:48 pm

I have seen some interesting power failures in datacentres - some of which the UPS and backup generators handled and some they didn't. Where I work now is much better in handling failures with multiple redundancy over several datacentres.
