United finds loose bolts on plug doors during 737 Max 9 inspections

MarkMarine · 2024-01-09T06:55:10

I was responsible for safe for flight inspections on military aircraft and the photo included in that post is completely insane to me.

Those bolts being loose (and they are BIG bolts) would mean multiple people in the installation process didn’t do their jobs, and signed their life on the line saying they did.

When I did maintenance, there was someone (QA) there to witness every torqued bolt, inspect every safety wire and installed part.

There is something rotten in Boeing.

RF_Savage · 2024-01-09T06:59:24

The management imported from McDonnell Douglas?

The same management that drove it to ground?

The fish rots from the head. And these constant problems sure do sound like a new company culture of cutting corners instead of engineering first.

giardini · 2024-01-09T20:50:56

Why the rush to judgement ("The fish rots from the head.")? What was done and not done will be traced and the causes determined with good old engineering and detective work. Responsibility will be revealed and the proper parties punished.

Rushing to judgement merely obscures the true responsibilities.

cyanydeez · 2024-01-09T22:42:45

the point is maintenance personnel might be directly signing off but that doesn't mean they're directly responsible if their management set unrealistic timetables. that's the point, systems are as accountable as individuals but often individuals are easier to indict and scapegoat.

markdown · 2024-01-09T07:41:10

> The management imported from McDonnell Douglas?

I've heard this since they killed hundreds of people in the two crashes. Why are they being protected? They have names.

rob74 · 2024-01-09T08:20:19

Because the shareholders like the management? Cutting corners to increase (short-term) profits is more popular with them than focusing on quality. And now that the company has been in the red consistently since 2019 (https://www.statista.com/chart/20660/boeing-earnings-loss/), of course the answer has to be to lower costs even more wherever possible in order to return to profitability...

caycep · 2024-01-09T18:08:03

So should something as mission critical as an aircraft manufacturing co have some ownership/oversight that's not public shareholders? Give the NTSB a few seats on the board

dotancohen · 2024-01-10T02:06:49

Who do you think the NTSB are?

The NTSB is staffed with people familiar with the industry - it has to be. Thus, a significant portion of these people are former Boeing employees. And historically, many NTSB staff have gone to work for Boeing (and other industry giants) after their stint at the NTSB as well. It's not that large of an industry after all. This is known as the Revolving Door.

markdown · 2024-01-10T09:05:05

> This is known as the Revolving Door.

Regulatory capture

euroderf · 2024-01-09T19:42:43

NTSB?? In American style management there's no room for namby-pamby "stakeholders" on the board.

kakwa_ · 2024-01-09T07:55:29

It's not necessarily a question of persons (and the CEO did resign after the Max crashes).

It's more a question of culture (oversimplifying, sales vs engineering) and this is harder to change most of the time. Apparently, even the Max debacle was not enough.

aredox · 2024-01-09T09:52:45

The MD "moles" managed to change the culture of Boeing pretty quickly from engineering- to sales-first. https://www.theatlantic.com/ideas/archive/2019/11/how-boeing... https://archive.ph/vy5p7

sushibowl · 2024-01-09T11:37:47

Some of these executive quotes read (admittedly with the benefit of hindsight) like satire.

> But the nearest Boeing commercial-airplane assembly facility would be 1,700 miles away. The isolation was deliberate. “When the headquarters is located in proximity to a principal business—as ours was in Seattle—the corporate center is inevitably drawn into day-to-day business operations,” Condit explained at the time.

Oh man, wouldn't want that to happen.

> With ethics now front and center, Condit was forced out and replaced with Stonecipher, who promptly affirmed: “When people say I changed the culture of Boeing, that was the intent, so that it’s run like a business rather than a great engineering firm.”

Indeed.

yencabulator · 2024-01-09T15:34:03

With a 63 million golden handshake for a job well done, on top of all the money he previously made.

https://www.imdb.com/title/tt11893274/

markdown · 2024-01-09T09:20:21

> and the CEO did resign after the Max crashes

That was a funny one. They replaced him with his chairman... hence, more of the same.

leptons · 2024-01-09T08:15:11

The "MAX" in 737 MAX means "maximum profit", not "maximum safety".

siva7 · 2024-01-09T08:18:17

You're joking but it's the real meaning conveyed to their customers, the airlines.

rob74 · 2024-01-09T08:47:06

Yeah, that's why the leading low-fare airlines (Southwest, Ryanair) love it - so it's up to the customers to say that they no longer want to fly with an over 50 year old design that was initially a regional airplane but is now being used for transatlantic flights.

dotancohen · 2024-01-10T02:10:53

  > that was initially a regional airplane but is now being used for transatlantic flights.

And had its engines moved to an unstable location 50 years after the type first flew, necessitating software controls to compensate. Paradoxically, those controls exist so that pilots do not need to train on the new aircraft's inherent flight characteristics, yet pilots are expected to disable them in some emergency situations.

rvba · 2024-01-09T07:23:08

Looks like the installer did their job poorly and the QA rubber stamped it without checking. Are the QAs required to make photos to prove that they did their job? Like those food delivery people.

Alternative is wrong bolts, or sabotage.

But more possible - one lazy QA ghosting.

goku12 · 2024-01-09T12:49:09

They're finding problems on multiple aircraft. This isn't one QA. In fact, forget QA missing this. Bolts on aerospace systems are supposed to be properly torqued and arrested in most cases. How do the assembly people make such big mistakes?

In my experience, no one puts their job on the line over silly reasons like this, unless there is intense pressure (unrealistic deadlines, heavy workload and poor working conditions) that makes mistakes like this inevitable. I wouldn't be surprised if an honest independent review of either company found ridiculous cost-cutting measures and/or emotional overload.

mikrotikker · 2024-01-10T00:22:18

Perhaps DEI hires that don't have to do their full duties because expecting them to is racist or transphobic or something stupid and they'll be protected by their manager who got shafted in his last performance review because he had too many "white males" on staff.

mywacaday · 2024-01-09T11:36:18

I would doubt one lazy QA; more likely an overworked and time-short QA in a company culture that does not allow them to speak up when they need additional support.

legacynl · 2024-01-09T12:12:15

Exactly. This is a failure of management.

joquarky · 2024-01-09T21:54:28

Perhaps the installer and/or QA person should be required to be a passenger on the plane after their inspection and before they can mark the task complete.

glcihgnwe · 2024-01-09T08:36:46

[flagged]

aredox · 2024-01-09T09:49:50

The MD merger is the reason quality nosedived. Management is responsible for ensuring a robust quality culture. If (if!) a few DEI trainings are enough to destroy that quality culture, then something is very wrong at Boeing anyway, and it was only a question of time.

But of course, the parent comment isn't made in good faith. The 737 MAX crashes happened in October 2018 and March 2019, and Boeing didn't release its first-ever diversity report until more than a year later: https://www.forbes.com/sites/lorenthompson/2021/04/30/boeing...

As for why it didn't nosedive immediately after the merger, these kinds of bankruptcies tend to happen "Gradually, Then Suddenly".

nmacias · 2024-01-09T09:23:56

this site, yikes. In summary, we don’t know what happened yet, but you assume it’s the direct and obvious outcome of ? And that ?

If that’s the game, then, I suspect the Spotify team was onsite to record a behind the scenes at Boeing podcast, and in the process replaced the locking nuts with Joe Rogan stickers.

yftsui · 2024-01-09T09:01:19

I heard a similar story regarding Boeing’s North Charleston factory due to that push; it led some Boeing customers to request that the last inspection be done in the Everett factory instead before delivery. My bigger worry is the 787, as apparently it is moving to SC.

matthewmacleod · 2024-01-09T09:20:21

“This aircraft failed because Boeing is too woke” is a pretty amazing take, I’ll grant you that.

didntcheck · 2024-01-09T09:27:38

"because they were prioritizing other factors instead of competency"

That's not a particularly outlandish take, and the other factor is orthogonal

Snow_Falls · 2024-01-09T11:27:56

Do you believe that other factor is wokeness, or is it maximising profit

foldr · 2024-01-09T10:23:52

GP appears to have a bee in their bonnet about wokeness (see post history). Unless they provide some evidence to support their take, I see no reason to take it seriously.

boxed · 2024-01-09T08:45:47

You can prop up a bad system for many years by having a lot of workers still around from the old system. It takes time to fully drive a company as good as Boeing into the ground.

Managers think they are the reason anything happens, but fortunately in this case this is a delusion. In places where a great manager tries to turn a business around that is broken that's a bad thing.

lvl102 · 2024-01-09T11:47:07

[REDACTED]

throwup238 · 2024-01-09T11:56:59

737 Max 9s are made in Renton, WA.

storf45 · 2024-01-09T12:56:41

From when I worked as an engineer on the assembly line for smaller jets, I know that there would be a record trail of exactly who completed the work, who signed off on it, and what the work order steps were for anything related to these assemblies and components. This would include the work done at Boeing and their suppliers. It will be interesting to ultimately hear the root cause here.

x86x87 · 2024-01-09T14:53:55

Let me ask this: assuming they did not tighten the bolts correctly and qa didn't check it, what are the odds they keep an accurate paper trail?

burnerthrow008 · 2024-01-09T17:16:34

Close to zero, but that doesn’t help them. If the paperwork was falsified to say that everything was tightened and verified, and Boeing stands behind that, now Boeing has to investigate and come up with a plan to prevent these bolts from “mysteriously loosening” (because everything was fine when it left the factory)… which will be an even bigger pain in the ass than just fixing the QA process.

euroderf · 2024-01-09T19:44:50

Maybe bring in some QA professionals from... Airbus.

x86x87 · 2024-01-09T19:58:47

I mean, if they are not doing it in good faith, whatever happens is going to be half-assed.

whatever1 · 2024-01-09T08:17:02

Bolts can get unscrewed with vibrations. So probably a design error, they did not use the correct type of bolt (the one with the safety pin).

gregoriol · 2024-01-09T08:49:32

Don't they have many ways to prevent bolts from unscrewing? I know at least a few by doing mechanical stuff on motorcycles, and it seems that other planes don't have such problems (at least not within 2 months after the last inspection).

aredox · 2024-01-09T09:57:10

There are washers, or thread lock glue, but it is still a question of correct execution: has the glue been applied? Was the glue batch used before its shelf life? Were the parts degreased before glue application?

Or did anyone decide to cut corners by using an old batch/skipping degreasing/not putting glue because they were late on delivery?

MarkMarine · 2024-01-09T14:17:43

safety wire and cotter pins are preferred methods on aircraft.

lobsterthief · 2024-01-09T17:55:22

Yes. For mission-critical bolts and nuts, absolutely safety wire and cotter pins should be used.

MarkMarine · 2024-01-09T20:47:31

The engineers at Bell also thought that blind bolts you couldn’t inspect in dailies should be wired.

somewhereoutth · 2024-01-09T10:57:31

My understanding is that the bolts did have castle nuts and retaining wires in the design. So either they were incorrectly fitted, or the bolts themselves were underspecified with regard to strength.

0xbadc0de5 · 2024-01-09T13:23:38

The door hinge bolt had a castellated nut and pin, but the screws holding the hinge mount to the airframe apparently had no retaining wire. See leaked photo at: https://twitter.com/ByERussell/status/1744460136855294106/ph...

aziaziazi · 2024-01-09T11:46:12

Not sure what happened from “the design” to “the field”. The two loose bolts in the picture are not castle nuts.

MarkMarine · 2024-01-09T14:19:11

Those bolts aren’t drilled for safety wire.

stevehawk · 2024-01-09T11:37:02

ah. i've been wondering what the method of securing the bolts would have been and have not seen anyone mention it. torque values or threadlocker is rarely enough for the FAA. it's usually safety wire, castle nuts, lock washers, etc.

DANmode · 2024-01-09T12:42:29

> signed their life on the line

Clearly this is hyperbole.

MarkMarine · 2024-01-09T13:56:10

It wasn’t in the military. Doing this would land you in Leavenworth.

DANmode · 2024-01-10T07:39:33

What I mean is: clearly that's not the case in the civilian world.

Though it should be!

stjohnswarts · 2024-01-09T12:37:45

Did you torque every bolt every time before every flight?

MarkMarine · 2024-01-09T14:08:32

Every bolt you could see was checked before every flight yes. Every important bolt you couldn’t see during inspection was torqued, witnessed by QA, secured via safety wire or cotter pin, and secondary torque holding was then inspected by QA.

This thing is obviously not just an interior part, look at the meat in those castings, and it’s obviously safety critical, look at the cotter pins on other bolts. Sounds like it was going to be installed behind interior paneling and not inspected every day. For something like that, every important bolt should be secured by secondary methods, torqued and witnessed installed correctly. This looks like a failure in engineering (not having wire on these bolts), then a profound failure in assembly with multiple people not doing their jobs (not torquing, not witnessing, faking logs), risking the lives of passengers.

If this happened at cruising altitude and speed, people would have died. I can’t find the flight number but I believe 9 people died when a jet lost cabin pressure and a piece of the plane while decompressing during cruising altitude over water.

thebiss · 2024-01-10T04:32:14

Correct: https://en.wikipedia.org/wiki/United_Airlines_Flight_811

defrost · 2024-01-09T14:02:10

Every scheduled maintenance that had that on the punch list, yes.

With a separate follow through by another party to check the work.

That's SOP for US | AU | UK | EU military air mechanical crews.

jasode · 2024-01-08T21:15:39

I found it interesting that Boeing did proactively tell airlines to inspect 737 MAXs for a possible loose bolt in a different part of the plane (rudder section) at least 8 days before the January 5th event. Example story: https://www.reuters.com/business/aerospace-defense/boeing-ur...

Unfortunately, Boeing did not know they had other issues with the plug door bolts.

nolok · 2024-01-08T23:58:28

Imagine the quality of the manufacturing and QA / final inspection to have that kind of issue.

ethbr1 · 2024-01-09T00:12:19

I expect even at its worst, software development could learn a lot from aircraft QA.

Especially since most shops have pretty much tossed professional career QA out the window.

vegetablepotpie · 2024-01-09T00:56:17

I work in the defense industry, it’s very much like the aerospace industry in that we deal with human life as a consequence of our work. We have software QA departments that operate very much like manufacturing or aerospace QA.

Software QA provides nothing of value to software development; having it as a dedicated function works against the overtly stated goals of the function and counterintuitively acts to degrade quality within software by mandating strict top down process and brittle end-to-end testing.

Although Software QA is intended to be an independent verification body that provides engineering organizations with tools and resources, in practice they function as a moral crumple zone [1] within the complex socio-technical defense industrial system, being one of the groups that the finger will be pointed to when something goes wrong and absorb shock to the business in the event of a failure. As a result they have a strong incentive to highly systematize their work with specific process steps, to shield them from liability, which can be applied generically to all projects.

Good software teams build quality into projects by introducing continuous integration, unit testing, creating feedback, and tightening these feedback loops. This acts to find problems quickly and resolve them quickly. Software QA's need for high level, top down, generic systemization requires them to work against these principles in practice. Bespoke project specific checks, such as unit tests, are not viewed as contributing to the final product and are discouraged by leadership who see them as waste.

To give an example of how these dynamics destroy quality in software: I once found a bug in software on a piece of test equipment where a logarithmic search function was not operating on a strictly sorted list. When I pointed this out to my leadership I was told that if we changed any part of the code, it would require a new FQT, which would be too expensive to conduct and was not in the budget. Although the bug would have been trivial to solve, and was clearly wrong and would not provide any benefits by remaining in the test equipment software, the process required for changes prevented solving the issue.

[1] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2757236
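A minimal sketch of the class of bug described above, assuming the "logarithmic search" was an ordinary binary search. The data, `contains`, and `is_strictly_sorted` are hypothetical names for illustration, not the actual test equipment code.

```python
from bisect import bisect_left

def contains(values, target):
    """Binary search. Only correct when `values` is sorted ascending."""
    i = bisect_left(values, target)
    return i < len(values) and values[i] == target

def is_strictly_sorted(values):
    """True when every element is strictly greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# Hypothetical data: 9 is present, but the list is not sorted.
readings = [1, 9, 3, 7, 5]

found_unsorted = contains(readings, 9)        # search silently misses 9
found_sorted = contains(sorted(readings), 9)  # same data, sorted first
precondition_ok = is_strictly_sorted(readings)
```

The search does not crash on unsorted input, it just returns a wrong answer, which is why a cheap precondition check (or sorting at load time) is the kind of trivial fix being described.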

amluto · 2024-01-09T02:21:51

> We have software QA departments that operate very much like manufacturing or aerospace QA.

I don’t work in this industry, but this seems fairly ridiculous on its face: software is not at all like manufacturing.

In manufacturing, there’s a design and a manufacturing process, and a critical function of QA is ensuring that the manufactured product is manufactured to spec.

With software, the software is written, compiled, and then repeatedly copied. And something should verify that it’s copied correctly, but this is straightforward and boring.

So software QA ought to be much more like the kind of validation that happens when designing hardware, not like the kind of testing and validation that happens as products are manufactured.

irrational · 2024-01-09T03:25:45

Ideally there should be a solid spec written and then qa can test against the spec. Maybe there is somewhere that does write solid specs, including accounting for corner cases, but in my 25 years working professionally in the industry, I’ve never seen it.

_glass · 2024-01-09T08:23:14

I work in MedTech. We do this. A design has to be reviewed by QA, and is then tested, and the test is reviewed again. So just to counter the narrative, there are companies that do that, and it is working. In other jobs I also saw the cargo cult of QA. But in some industries it is just crucial, otherwise the pressure is too high to cut corners to implement something. It is a good mechanism to counter the need to move fast and break things.

stickfigure · 2024-01-09T04:48:05

The only complete and precise specification of software is the code itself. If some other form of specification was complete, we would be able to auto-generate the code.

cogwheel · 2024-01-09T05:52:59

This is beside the point. The code specifies what the product _is_, not what it "should" be. If you ask for a word processor and I deliver a perfectly bug-free and feature-complete calculator would you really believe it lived up to spec?

_gabe_ · 2024-01-09T06:17:55

This is also beside the point. I think both of you are trying to warn against the dangers that lie on both sides of this coin: people can invest too heavily in a specification and waste an enormous amount of time, and people can immediately jump into coding and code something that does not do what it was intended to do. Like with all things in life, there’s a balance between these two extremes that’s correct.

You need some level of specification so you know what you’re building, but you have to keep in mind that the final code defines what the behavior truly is. Sometimes, that behavior unintentionally becomes part of the specification because users begin to rely on it.

I do like the fact that you both used hyperbole to succinctly illustrate the dangers of veering too far in either direction though :)

stickfigure · 2024-01-09T17:18:14

A (human language) specification is simply _enough_ information about a system that a human can figure out the intention of the author. The smarter and more context-rich the human, the simpler the specification can be. The dumber and less context-rich the human, the closer the specification needs to be to code.

It's asymptotic. By the time you reach a human who is as dumb as an actual computer, the specification _is_ the code.

sgerenser · 2024-01-09T14:41:16

I work in software for CPU design/verification. Even here, where in theory there should be a rock-solid spec, there's not. There's a 12,000 page architectural specification, which is very helpful for specifying all the end-user visible state. But the microarchitectural specification is scattered all over different PDFs, visio docs, excel sheets, and sometimes the only spec is the RTL code itself.

legacynl · 2024-01-09T14:24:55

> this seems fairly ridiculous on its face

I think that's why people always tell each other to not take things at face value.

Of course there is a big difference between sw and hw QA, in the thing that they test, and how they test them.

But they are also very similar. Any QA department has to think about ways that things can go wrong, and what things to test for, how to test, which testing methods, which standards to handle, keeping certifications, etc. During testing you also need to keep reevaluating if you actually are catching each problem/bug and how to implement changes in your company that decrease the number of problems or increase the number that you catch.

I think in that way there's a lot of overlap in thinking about business processes and how to identify problems with them.

Of course once a specific binary gets tested and approved by QA it shouldn't matter if it gets copied or whatever as long as you make sure it's the same binary (by a checksum for example).

But still making sure that errors don't reach the customer is vital in any QA. If errors do happen, QA is the department that can make sure that it doesn't happen again. And ofc be able to prove in court that you did your due diligence if something does happen.
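The "same binary by checksum" check mentioned above could be sketched roughly like this. SHA-256 is an assumption (the comment does not name an algorithm), and `sha256_of` and `is_approved_copy` are hypothetical helper names.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_approved_copy(path, approved_digest):
    """True when the file at `path` matches the digest QA recorded at sign-off."""
    return sha256_of(path) == approved_digest
```

The idea is that QA records the digest of the exact artifact it tested, and every later copy is compared against that recorded value before it ships.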

shermantanktop · 2024-01-09T03:10:20

I’m going to be using “moral crumple zone” in every conversation I possibly can from now on.

BlueUmarell · 2024-01-09T06:56:17

"Hey, wanna hang out with a beer this evening?"

"Nah, I don't need to slouch anymore in this moral crumple zone!"

"We need this new feature in our program!"

"If we implement this it means the management fell in a moral crumple zone"

"What seat would you have to have for your flight?"

"Anywhere, but not in the moral crumple zone, please"

shermantanktop · 2024-01-09T07:04:23

An example from my actual work life, lightly fictionalized—

Them: “Please review this design.”

Me: “Ok, sure, when do you plan to start coding?”

Them: “Oh, it’s already in beta.”

Me: “So you can’t do anything with my feedback, but you’ll say I reviewed it?”

Them: “Well…”

Me: “You’re putting me in a moral crumple zone here!”

9659 · 2024-01-09T01:56:17

About 2000, Software QA (and almost all traditional QA activities) were changed. The focus was on process over inspection. "Design in quality, do not inspect it into the product"

Suppliers (to include software) were expected to manage the quality of the product they provided; the purchaser would focus on how they managed the process, not in the compliance of every part.

This had a chance until software process was tossed in the name of "agile".

steve_gh · 2024-01-09T07:35:36

I recall a bug I was involved with at a telecoms equipment maker in the early 2000s. The bug only showed up in our biggest base stations in high load situations. We diagnosed the bug, and there were a couple of parts to it. Sloppy software design in an optional hardware module (no state machine) was one part - and was fixed. But there was another underlying issue in the way message queues were handled.

Anyhow, the fix for this was created and written. But we never got to put it into production. The reason: the company didn't have a lab test facility that could put a sufficient load on the software to prove it. Even though we were getting field failures because of this issue that were giving us a bad rep, we couldn't fix it because even though the old code was known to be buggy, we couldn't prove the new code. So the process said we couldn't ship it.

DougBTX · 2024-01-09T18:49:00

Another way of looking at that is that within the ability to test, the implementations were indistinguishable, so the process mandated that the older implementation must be used. I wonder if they would have explicitly specified age as a metric if this was considered when designing the process.

bb88 · 2024-01-09T03:02:30

Here's a stupid question: How do you know your process is good unless you inspect it?

"Hey Bob I know you're a competent engineer, but don't worry about specifying a certain type of bolt or loctite, the untrained assembly personnel will figure it out. I'm sure they won't let 200 people die in a plane crash."

zilti · 2024-01-09T11:52:26

Make it impossible to use a wrong bolt, and train the assembly workers.

ethbr1 · 2024-01-09T01:06:07

> Good software teams build quality into projects by introducing continuous integration, unit testing, creating feedback, and tightening these feedback loops.

Agreed, for good software teams.

I would contend that most software teams at most companies are not good.

Which is to ask, with an average to bad software team is it better to have integrated or separate QA?

ajmurmann · 2024-01-09T06:18:53

If your devs aren't good what are the chances of your QA team being good enough to make up for their shortcomings? The dynamics laid out by the parent comment will just hit even harder. Your best bet is to enforce basic practices like continuous integration, coverage goals and maybe a coverage ratchet as a merge gate. Training and education on areas where the team is weak is also a must.
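For anyone who hasn't seen a coverage ratchet, here is a rough sketch of the idea, not any particular tool's implementation. The `ratchet` function, its `tolerance` parameter, and the numbers are all hypothetical; a real setup would read the measured percentage from whatever coverage report the CI produces.

```python
def ratchet(baseline, measured, tolerance=0.0):
    """Decide whether a merge may proceed and what the next baseline is.

    baseline:  coverage percentage recorded on the main branch
    measured:  coverage percentage of the candidate change
    tolerance: allowed drop (in percentage points) before the gate fails
    Returns (ok, new_baseline). The baseline only ever moves upward.
    """
    if measured + tolerance < baseline:
        return False, baseline
    return True, max(baseline, measured)

# Hypothetical merge attempts against an 80% baseline.
blocked = ratchet(80.0, 79.5)                  # coverage dropped: gate fails
raised = ratchet(80.0, 82.0)                   # coverage rose: baseline follows
within = ratchet(80.0, 79.8, tolerance=0.5)    # small dip tolerated
```

The point of the ratchet over a fixed goal is that the bar rises with every improvement, so coverage cannot quietly erode back down to the minimum.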

vegetablepotpie · 2024-01-09T01:54:56

Does it make sense to degrade the performance of good software teams because bad software teams exist?

Ideally we’d always have good software teams, but in the real world sometimes you have to build software with bad teams.

Leaders have options, they can do things like reduce scope, increase budget, increase schedule, or full on abandon or cancel the project. These are all options available to leaders, but they require tradeoffs and decisions to be made on a project by project basis.

It is scalable to have a strict process that everyone has to follow, then impose a watchdog to enforce it on a wide scale. It may not be better to have separate QA, but it is easier for those in charge.

ethbr1 · 2024-01-09T02:00:30

It makes the most sense to me to match the org structure to the teams you have.

If I'm trying to build something with undertrained, demoralized, underpaid engineers... it's not optimal to use methods intended for self-motivated, high-performance teams.

And nothing says there must be company-wide mandates. Maybe this area gets a formal, independent QA team, but this other area doesn't.

My experience just doesn't bear out that collapsing the QA function into development always leads to better outcomes.

I've seen the opposite happen too often, and QA be the sole bulwark between idiocy and customers.

bb88 · 2024-01-09T03:05:07

> Ideally we’d always have good software teams, but in the real world sometimes you have to build software with bad teams.

Here's a real, perhaps unexpected counterpoint. Say you have a good software team. How do they build good software with bad management?

iancmceachern · 2024-01-09T05:48:06

Quit and do it for someone else

xyzzy_plugh · 2024-01-09T03:34:50

They quit.

bb88 · 2024-01-09T03:36:56

Exactly!

fzingle · 2024-01-09T04:57:31

> Does it make sense to degrade the performance of good software teams because bad software teams exist?

Consider the classic statistic "most drivers think they are above average".

I posit that the same is true of software teams, almost every team will self-assess as above average, i.e. good. Those teams will then imagine that, being good, they build quality into the process and very little verification QA is done.

I have worked as a software consultant for 15 years now. I've worked with at least 40 separate software teams in that time. Every single team manager would pep talk with "this is the best team I've ever seen". Some of this is obviously blowing smoke to get people to work harder and feel good. But over the years I've had candid conversations with managers and realized that most of the time they genuinely think their team is really good, truly top 10-20%.

Here's the rub. Being a consultant, I'm almost always brought in by higher level management because something is going horribly wrong. The team can't deliver quickly. The software they deliver is bug ridden. They routinely deliver the wrong software (i.e. incorrect interpretation of requirements.)

Often times these problems are not only the fault of the development team, management has issues too. But in every single case, the development team is in dire straits. They have continuous integration sure, and unit tests, and nightly builds, and lots of green check marks. But the unit tests test that the test works. The stress tests have no reality based basis for expected load. The continuous integration system builds software but it can't be deployed in that form for x, y & z reasons, so production has a special build system, etc...

In 15 years I have never once encountered a team that would not benefit from a QA team doing boring, old school, black box manual testing. And the teams that most adamantly refuse to accept that reality are precisely those that think they are really top tier because they have 90+% unit test coverage, use agile and do nightly builds.

So, my question is, do you (I don't mean the specific "you" here, rather everyone should ask themselves this, all the time) think that most bad software teams know they are bad? Including the one you are part of? Would it really hurt to have some ye olde QA, just in case, you know, you are actually just average? :)

coderenegade · 2024-01-10T01:06:37

Depending on the shape of the distribution, most drivers could be above average. Average doesn't imply 50th percentile, that's what the median is for. A minority of tremendously poor drivers could certainly mean that most drivers are in fact better than average, in the same way that my friends on average have more friends than I do.
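A quick illustration of the mean-versus-median point, with made-up skill scores (the numbers are purely hypothetical): one tremendously poor driver drags the mean down far enough that everyone else sits above it.

```python
from statistics import mean, median

# Hypothetical driver skill scores: nine decent drivers, one terrible one.
skill = [8, 8, 8, 8, 8, 8, 8, 8, 8, -100]

avg = mean(skill)                          # pulled down by the outlier
mid = median(skill)                        # unaffected by the outlier
above_avg = sum(s > avg for s in skill)    # drivers better than the mean
share_above_avg = above_avg / len(skill)
```

Here nine out of ten drivers are above the mean, while by definition no more than half can be above the median.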

shiroiuma · 2024-01-09T06:10:45

I'm curious: in your many years of being a consultant to these bad teams, where the manager really thought they were top 20%, did you get a chance to talk to the rank-and-file team members, and did they paint a very different picture of the team health and software quality than their manager?

Also, did you run across any orgs where they basically refused to use a process like Agile, and instead just did ad-hoc coding, insisting that this was the best way since it worked just fine for them back when they were a 5-person startup?

ethbr1 · 2024-01-09T14:01:44

Not parent, but in my experience as a consultant working with bad teams, the rank and file were 'doing the job.'

You usually had a few personality archetypes:

- The most technical dev on the team, always with a chip on their shoulder and serious personality issues, who had decided to settle for this job for (reasons)

- The vastly undertrained dev who was trying to keep up with the rest of the team, but would eventually be found out and tossed, usually to blame for a major issue

- The earnest and surprisingly competent meek dev, who presumably didn't have enough confidence to apply to a better job, but easily could have made it on merit, work ethic, and skill

- The over-confident dev who read a bit of SDLC practice, and could see every tree while missing the forest

The key is that, aside from the incompetent person, they had all always been working there for a while. Consequently, there wasn't good or bad health and quality: there was just "the system" (at that company) and dealing with it.

And none of these folks ever worked at 5-person startups. ;) I think it was definitely more an issue of SDLC "unknown unknowns" they should be doing, than willful decisions not to.

fzingle · 2024-01-09T15:15:50

> I'm curious: in your many years of being a consultant to these bad teams, where the manager really thought they were top 20%, did you get a chance to talk to the rank-and-file team members, and did they paint a very different picture of the team health and software quality than their manager?

Yes, generally I join teams and work as an engineer or sometimes as a team lead, so I'm talking to all the team members.

Most startup teams are composed of junior developers, often pretty smart people. Usually 5 or fewer years of experience. Many times these are people who have already accomplished stuff they didn't think they could do. So that generally means that yes, they think pretty highly of themselves. To a degree it is quite justifiable: they tend to be very accomplished, but in a narrow domain. Unfortunately they don't realize that their technical accomplishments in a specific field do not mean that they are experts everywhere. Their managers understand that these are smart people and assume again that this is therefore a good team.

Non-startups that I join are usually just plain dysfunctional.

> Also, did you run across any orgs where they basically refused to use a process like Agile, and instead just did ad-hoc coding, insisting that this was the best way since it worked just fine for them back when they were a 5-person startup?

Usually more the opposite. In my experience I come across teams that are sure they must not need any help because they follow all the rules in Scrum and have great code coverage metrics.

It is really common to see this kind of thing. I call it "the proxy endpoint fallacy". It can crop up anywhere that there is something that can be measured. In that example, it would be confusing adherence to Scrum with having a working SDLC or perhaps confusing code coverage metrics with the objective of having bug-free releases.

This isn't a software only fallacy. In politics, GDP is often confused with societal well-being. Always be wary of your metrics and change them as required to keep you tracking your actual goals.

jl6 · 2024-01-09T12:43:14

I'm not going to argue with the general thrust of your comment, which I think is insightful as to how incentives can compromise objectives. But...

> To give an example of how these dynamics destroy quality in software. I once found a bug in software on a piece of test equipment where a logarithmic search function was not operating on a strictly sorted list. When I pointed this out to my leadership I was told that if we changed any part of code, it would require a new FQT, which would be too expensive to conduct and was not in the budget. Although the bug would have been trivial to solve, and was clearly wrong and would not provide any benefits by remaining in the test equipment software, the process required for changes prevented solving the issue.
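For anyone unfamiliar with the failure mode in the quoted bug, here is a minimal sketch. The data is hypothetical, and Python's `bisect` stands in for whatever search routine the test equipment actually used: a logarithmic (binary) search assumes sorted input, and on unsorted input it can silently miss values that are present.

```python
from bisect import bisect_left


def contains(items, target):
    """Binary search membership test; only correct if items is sorted."""
    i = bisect_left(items, target)
    return i < len(items) and items[i] == target


# Hypothetical readings where 9 is out of order.
readings = [1, 3, 9, 5, 7, 11]

found_unsorted = contains(readings, 5)        # False: 5 is present but missed
found_sorted = contains(sorted(readings), 5)  # True once the input is sorted
found_linear = 5 in readings                  # True: a linear scan needs no ordering
```

No error is raised in the unsorted case, which is what makes this kind of defect easy to leave in place.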

I've seen this happen where it was a bad thing, but also where it was a good thing.

It's all about risk.

What risk does the software defect pose to the mission? What risk is inherent in making any change to the software? Noting that even trivial changes can be fat-fingered and thus are a source of risk. I've seen it go wrong this way: a seemingly trivial change was made, but the developer accidentally checked an extra file into source control, causing a further defect.

And then: what is the cost of mitigating these risks? Maybe the software defect is as trivial as its fix. Maybe an acceptable fix would be to write up a workaround in the documentation.

I don't think it's always wrong to say no to fixing issues. I also don't think it's always right that a separate QA department contributes nothing to the organization, even if they act as a handbrake on the software developers (sometimes, precisely because they do that). Human factors are real.

legacynl · 2024-01-09T13:30:53

I think you're sort of misunderstanding the role of QA.

You think that QA is a liability shield, but that is only a side effect of the work that they actually do.

The task of QA is exactly that: an entity that tries to assure that the quality is up to some standard. Even in favourable conditions mistakes happen, so how do you make sure as a company that 1 in every 100 products isn't faulty, tarnishing the good reputation that your company has spent so much time and money to build? You hire QA to make sure problems get caught before delivery.

But if all humans make mistakes, and QA is human, how do you make sure that the QA doesn't make a mistake? A never-ending chain of QAs inspecting each other?

No, of course not. One thing that helps with reducing errors is to have a rigid protocol that is followed to the letter every time. Pilots, for example, have a preflight checklist that they have to run every time they operate the plane.

The rigid protocol of QA teams is therefore an essential part of their jobs.

Although from your standpoint as a developer it might seem strange that QA is 'preventing' you from fixing a bug, it is actually very reasonable.

Especially since you work in the defence industry, I hope you understand that it is very important that the software that operates radars, planes, missiles, bombs, etc is working exactly as expected. Understandably there is a great deal of effort made to assure that when those things are needed they work exactly to spec.

So in your example it is probably very reasonable that any change you make needs to go through some rigorous process. The fact that it was 'only' about test equipment doesn't matter, because test equipment is just as important as, if not more important than, the stuff it tests.

The reason why QA has the side-effect of being a 'liability shield' is that it gives companies the ability to argue (and prove) after the fact that the company did their due diligence in making sure that the product was to spec.

Certification in particular is basically a way to get an external organisation to approve your QA. In that case, if you get sued, you can rightfully claim that you did everything that was legally asked of you, and if there is blame, then it lies with the certifying organisation for using insufficient standards.

bb88 · 2024-01-09T02:30:19

> Good software teams build quality into projects by introducing continuous integration, unit testing, creating feedback, and tightening these feedback loops.

No. Good software teams are led by competent, technical management. Managers who aren't afraid to get down into the dirty details. Managers who aren't afraid to roll up their sleeves and write code if they need to.

The process doesn't matter. The management of what is or is not important does. Agile is just one process out of many.

Imagine an accounting team led by someone who never did accounting in their life: "Just make the numbers work out! I don't care how you do it! My bonus is at stake!"

MattPalmer1086 · 2024-01-09T12:47:24

Sigh... This myth that the only people who can competently manage developers are other developers has been floating round for decades.

For some reason, developers seem remarkably blind to the skills other roles and disciplines require. Only a developer can do that, everyone else is basically useless fluff. Maybe it's a form of arrogance or just deep unself-awareness.

pi-e-sigma · 2024-01-09T15:54:49

Let's apply your reasoning to medicine. I'm sure you would be completely fine with managers telling your surgeon what parts of the surgery can be 'optimized away'.

MattPalmer1086 · 2024-01-09T16:26:13

Haha, funny strawman. My reasoning is that non developers are capable of managing developers, notably people who have good management skills.

Your contention is that the surgeon should be running the hospital.

pi-e-sigma · 2024-01-09T16:29:39

Hahah, indeed. So have you seen a law department in a company headed by someone who doesn't come from a law background? How about a finance department headed by some schmuck who doesn't know anything about finance?

MattPalmer1086 · 2024-01-09T19:09:29

I've seen plenty of departments managed by people who don't come from the background of the department. My current boss is extremely good and came from a different discipline.

Although I don't deny it can help to have the background, it is not necessary to be a good manager of something. Also seen plenty of good techies promoted to management and failing badly.

pi-e-sigma · 2024-01-09T19:21:56

This is a lie, and you know it. Even the mere idea of lawyers being managed by a non-lawyer would be laughed at. Same with finance, nobody would be stupid enough to even try it.

MattPalmer1086 · 2024-01-09T19:31:11

It's ok to disagree, we don't have to accuse each other of mendacity.

I stand by what I said, although my experience is in the technical domain, not finance or law. Maybe those departments are different, I don't know.

mangodrunk · 2024-01-09T16:59:51

I disagree with you. You are stating it, but you are not giving reasons. Managers who weren’t developers tend to not be able to manage the team. They can’t help with or understand the technical decisions made. The non-technical managers tend to be project managers just focused on dates.

MattPalmer1086 · 2024-01-09T19:18:59

We may be talking about different levels of management here. A manager should not be making technical decisions, they should have team leads and architects who do that. It's their job to manage the team, interface with the business, prioritise work and give cover to the team so they can get on with it.

I guess if you have a manager who is making technical decisions, they are really a hybrid manager/contributor role. Maybe that works better in smaller organisations.

bb88 · 2024-01-09T22:27:20

Then what exactly is the non technical manager's added value?

He has no experience to lead the team in high pressure situations. Like production being down.

He can't truly have a first person understanding of the work of the people who he manages. He has to rely upon others to tell him who's good and who's bad. That sets up a pecking order.

He can't help or mentor engineers with design decisions, or provide a historical context.

He doesn't understand the technology so there's an immediate communication and knowledge barrier that has to be overcome between him and his directs.

He doesn't feel the pain of a bad decision, because he's not coding it, and he can't empathize with them since he doesn't code.

He tends to push feature development without fixing technical debt. Again that's pain he personally doesn't feel.

MattPalmer1086 · 2024-01-10T09:06:52

To me, the role you are describing is a principal engineer or team lead, not a manager.

It's simply not true, however, that a good manager can't lead the team in a high pressure situation. I'd say that's exactly what a good manager could do well. Obviously they won't be making overtly technical decisions, that's what you are for. They can make business decisions, provide cover, get resources, communicate to other stakeholders... All the bits that need doing but would be a huge hassle for the techies who are trying to fix the issue.

bb88 · 2024-01-09T17:56:13

The head of surgery should be a surgeon, not an accounting manager.

The head of accounting should be an accountant, not a surgeon.

And even at the executive level of a hospital, you would want people who have spent their careers in healthcare, rather than, say, architecture.

madhadron · 2024-01-09T06:59:50

> Good software teams are led by competent, technical management.

...or perhaps with no managers at all. I'm less and less convinced of the importance of management in engineering except to give investors an illusion of control.

mangodrunk · 2024-01-09T17:01:43

I sort of agree, and I do think it’s possible depending on the team. But unfortunately developers can be too opinionated and get focused on low priority things.

johnnyworker · 2024-01-09T03:18:12

But that doesn't contradict the parent, does it? I'd say you both make good points.

bb88 · 2024-01-09T03:36:00

Well... here's a thought experiment.

Let's say you have a bunch of school children and architects create a skyscraper. I've given both groups the process to design a skyscraper.

So in both cases, I should end up with a safe building?

withinboredom · 2024-01-09T07:17:47

I’d bet the children would come out better simply because they have parents who are likely multi-disciplined as a group. A disparate group will (almost) always come up with better results than a homogeneous one (at least in my experience).

johnnyworker · 2024-01-09T07:01:52

Why not both? Am I missing something? You can have feedback loops and CI and all that, "good craftsmanship" or "good practices" (not "best" practices because those often suck hah), where of course opinions vary on the details of that -- and then someone who is also good at the craft who spends more or most time on helping the rest work together, i.e. manage/lead them.

Eisenstein · 2024-01-09T01:02:44

It sounds like they are calling something QA but using it as a liability shield. It makes sense that you are upset about that, but naming something QA and having it do something else doesn't mean that QA as an effort is bad. It means that the people doing that are being deceptive.

vegetablepotpie · 2024-01-09T01:36:11

Fair point, you are correct in your inference that there are some bad actors in my workplace. However, I’ll argue that the fundamental dynamics of bifurcating the responsibility of quality from software leads to a steady state where all QA departments end up as a liability shield given enough time.

This is driven by Pournelle's iron law of bureaucracy [1], which says that people who promote the bureaucracy rather than the mission of the bureaucracy will get promoted within the organization and come to dominate its decision making.

For example, in schools, administrators make more money than teachers. This is despite both groups having similar levels of education and intelligence. The reason for this is that administrators know the laws and regulations of the environment they’re working in and ensure the continuity of the organization. Despite not directly contributing to the organization’s stated mission of education, they are in charge of the organization and take more benefits from it.

Software QA has similar dynamics. A QA department may start out making good faith contributions to the organization. Eventually there are product failures, eventually leadership needs a scapegoat to show they’re doing something, and eventually QA takes the blame. People get moved, demoted, or fired. QA realizes its risk, and takes steps to mitigate it. They create a highly systematized workflow and process, adopt or introduce standards. Then assert that following process equates to good outcomes. When bad outcomes occur, they point to their strict adherence to following process as evidence of innocence.

If the process does not support the work or mission, that is a cost they are happy to impose on other functions to deal with. This is the final state until a system disruption happens.

[1] https://en.m.wikipedia.org/wiki/Jerry_Pournelle

moring · 2024-01-09T06:59:23

I have seen a case of Software QA taking a very different shape, so I'd like to argue that the outcome you describe is not intrinsic to software QA, but rather to company culture.

The case I'm talking about does not have a separate QA department, but QA people as part of every software team. If a product fails, that team is responsible, so software devs are in the same boat as QA. They focus on learning from these failures, so no scapegoat is needed. Process does get followed, but not as a defense mechanism, but because not doing so introduces noise that is an obstacle to improvement. In case of bad outcomes, people do point out that they followed process because then it is clear that the process is involved in the failure and should be improved.

Unfortunately, companies with that kind of culture are rare.

Eisenstein · 2024-01-09T02:22:53

But what makes 'software QA' fundamentally different than 'non-software QA' to give it the problems you foresee?

nikau · 2024-01-09T03:20:27

Because every QA is something new for development barring regression testing.

An equivalent software QA to building planes would be to verify a known process with existing tooling.

xwolfi · 2024-01-09T04:56:50

It's like saying communism isn't the problem, but that it's how every single group attempted to implement it that should be blamed.

Sure, maybe, but if nobody ever can implement the theoretical utopia, maybe we should talk of things humans can do instead and ditch the unimplementable idea.

QA cannot be done by a separate team the way you dream: it will always be a political buffer zone staffed by the cheapest half-competent people you can find, expelling good people into dev or management. Or you merge it into dev/solution design.

The reason is simple: just like contract law, you only care about quality once you are in trouble and need to trace back the source of the issue to give the client a post mortem. Otherwise, you care first about velocity, or $ input/hr of effort.

Eisenstein · 2024-01-09T05:26:59

Other fields do QA just fine.

__loam · 2024-01-09T06:45:26

High quality comment

Robotbeat · 2024-01-09T02:17:00

I agree.

I’m regularly critical of Boeing Defense (particularly space contracts where I’m a huge Boeing skeptic), but I think people are pretty off base if they think Boeing is just completely incompetent.

Airliner safety is insanely good. Just vast seas of competence, but when there’s a super rare failure, the incorrect impression people get is that Boeing (or Airbus) is just full of incompetency. Almost nothing that humans do is held to the same standard. Not spaceflight, not software, not healthcare, and certainly not automotive.

Flying a 737 Max with a bad door and without the fix to the angle of attack sensor is probably still better per mile than driving. In spite of going at 10 times the speed and miles above the Earth.

You can almost argue it’s held to a higher standard than it should, slowing development of cleaner aviation (and therefore killing more people in the future due to tertiary effects of climate change, etc).

It kind of annoys me when comment sections are filled with people talking about how incompetent Boeing is. It feels like out of shape slobs on their La-Z-boy chairs talking about how incompetent or slow some professional sports players are. Like, airliner safety is just a totally different league than almost anyone else plays in. On their worst day, they're better than almost anyone else is on their best.

ethbr1 · 2024-01-09T03:02:38

> Airliner safety is insanely good.

Because I dug it up for another comment, commercial carriers operating under Part 121 (roughly: scheduled passenger and cargo operation) had 4 fatal incidents in the last 10 years. [0]

Totalling 6 deaths.

In 10 years of US commercial carrier aviation.

One of those was literally 'the engine exploded and threw part of the turbine into the cabin (and also shredded some of the wing)'!!

Which resulted in 1 person dying and a successful landing.

[0] https://news.ycombinator.com/item?id=38921664

thsksbd · 2024-01-09T03:35:18

Ya but your sample size is way too small to measure the death rate. Aircraft deaths are rare, but flying is too.

The two MAX 8s that fell from the sky were 100% Boeing's fault and could have happened in the US. If 5% of airline traffic is in the US you can renormalize those hundreds of dead and you get dozens dead.

chx · 2024-01-09T11:58:13

We know US pilots had been warning about the same issues that later led to the deadly crashes, but were ignored. The thing is, one reason US commercial aviation is so safe is that a lot of the pilots responsible for the jet airliners are ex-military. Someone mentioned Southwest Airlines Flight 1380: yup, Captain Tammie Jo Shults was one of the first female Navy fighter pilots. Miracle on the Hudson? Sully Sullenberger was an Air Force captain and training officer. Civilian training, no matter how good, is just no replacement for military training and experience.

I can't find specific numbers but estimates say about one in three has a military background. That's an awful lot.

thsksbd · 2024-01-09T18:11:56

Let's assume American pilots are gods. They were shouting that their crafts were unsafe.

No matter how good they are and how prescient, that doesn't help them if the aircraft computer decides it's stalling, forces a nose down and they can't fight the controls.

But, even if we assume omnipotence from these American pilot gods, and assume they can fly outside the bird and Superman-style catch it, they are still only 30% of American pilots. Just another population to normalize out.

chx · 2024-01-10T02:13:50

> forces a nose down and they can't fight the controls.

But that's not what happened. According to every report, it is possible to take back control, it's just very much not intuitive and the situation was confusing.

According to the Seattle Times

> However on both accident flights, the angle-of-attack sensor failure set off multiple alerts causing distraction and confusion from the moment of takeoff, even before MCAS kicked in.

> On the Ethiopian Airlines flight, for example, a “stick shaker” noisily vibrated the pilot’s control column throughout the flight, warning the plane was in danger of a stall, which it wasn’t; a computerized voice repeating a loud “Don’t sink!” warned that the jet was too close to the ground; a “clacker” making a very loud clicking sound signaled the jet was going too fast; and multiple warning lights told the crew that the speed, altitude and other readings on their instruments were unreliable.

You can find some of the US reports complaining about MCAS here https://s3.documentcloud.org/documents/5766398/ASRS-Reports-... and it includes

> I manually positioned the thrust levers ASAP. This resolved the threat

Then there's

> B737 MAX First Officer reported that the aircraft pitched nose down after engaging autopilot on departure. Autopilot was disconnected and flight continued to destination

> I called "descending" just prior to the GPWS sounding "don't sink, don't sink." The Captain immediately disconnected the autopilot and pitched into a climb

Another

> Takeoff and climb in light to moderate turbulence. After flaps 1 to "up" and above clean "MASI up speed" with LNAV engaged I looked at and engaged A Autopilot. As I was returning to my PFD (Primary Flight Display) PM (Pilot Monitoring) called "DESCENDING" followed by almost an immediate: "DONT SINK DONT SINK!" I immediately disconnected AP (Autopilot) (it WAS engaged as we got full horn etc.) and resumed climb

mschuster91 · 2024-01-09T12:15:05

> I can't find specific numbers but estimates say about one in three has a military background. That's an awful lot.

Not surprising given that pilot training is really really expensive. Airlines love former military pilots because they are a significantly lower financial risk for them. Put them into type rating and off they go, it's rare that one ends up as a dud.

bb88 · 2024-01-09T03:18:10

> Totalling 6 deaths. In 10 years of US commercial carrier aviation.

Okay, awesome. But how much of that was luck with the 737 Max that they didn't crash on US soil by US airlines?

thsksbd · 2024-01-09T03:36:48

Most of it. Statistically. It's not hard to assign part of the deaths from the MAXes to the US.

ethbr1 · 2024-01-09T03:39:33

How much was a rigorous safety regime and high quality training?

chx · 2024-01-09T11:59:10

Military training. See my comment above: https://news.ycombinator.com/item?id=38925089

Eisenstein · 2024-01-09T03:10:28

> It kind of annoys me when comment sections are filled with people talking about how incompetent Boeing is. It feels like out of shape slobs on their La-Z-boy chairs talking about how incompetent or slow some professional sports players are.

People do this with everything though, and air travel induces a large amount of fear in the populace. Not only are we not generally comfortable flying in the air for obvious reasons, but when it happens almost everyone has to concede control to a few people in the cockpit and on the ground. Driving, even if exponentially more dangerous, affords the illusion of control of one's outcome, given driving or having someone you know driving, and control over the vehicle maintenance, etc, as well as familiarity with the control and mechanism of the vehicle. These things don't exist with airplanes for the vast majority of people.

So, you can see why there is a need to find a human component to air travel problems, because that is something one can fix (fire the incompetent people, fine them, whatever), as opposed to all of the other things which must be accepted or rejected entirely.

It is entirely in line with human nature to do this, regardless of its accuracy or effectiveness.

mschuster91 · 2024-01-09T12:12:09

> Airliner safety is insanely good. Just vast seas of competence, but when there’s a super rare failure, the incorrect impression people get is that Boeing (or Airbus) is just full of incompetency. Almost nothing that humans do is held to the same standard. Not spaceflight, not software, not healthcare, and certainly not automotive.

And there's good reasons for that. Spaceflight actually is regulated pretty strictly (partially, because any spaceworthy rocket is effectively a missile), and space pilots and tourists both sign up for such missions fully knowing that they will have a very significant chance of dying one way or another - there simply hasn't been enough human spaceflight activity to work out and understand all the failure modes, unlike with other forms of transportation.

Humans, unlike birds, aren't naturally wired to travel by air... they need to be able to trust their lives to a significantly higher degree to someone else behaving like they should, because unlike in a car they have zero control (or the illusion of control) in an aircraft.

Additionally, the inherent security risk of an airliner is very high: what is a widebody airplane at its core? Hundreds of tons of weight, a decent portion of which is fuel, propelled at near-supersonic speed, and only two people in control of it. Anything goes bonkers and you can get thousands of people killed and injured (see 9/11).

In contrast, cars, even trucks, have way less capability to cause damage simply because they weigh so much less. The only thing that comes close is railways, and hell I don't get what the US is doing there, there's barely any regulation compared to European standards (see the videos I linked at https://news.ycombinator.com/item?id=38725988).

kennethrc · 2024-01-09T11:13:45

So much THIS

panick21_ · 2024-01-09T05:25:38

Being better than driving shouldn't be the standard. Especially driving in the US.

Flying isn't safer than trains, I would assume.

Flying has the advantage of being separated from almost everything else. Most accidents happen when there is mixed traffic, especially cars operated by people with minimal training.

maigret · 2024-01-09T12:39:04

https://turbli.com/blog/the-safest-transport-modes-ranked-by...

rectang · 2024-01-09T01:09:49

Is it plausible that Boeing has "learned" from software/startup/venture-capital culture with regards to tolerating higher risk to minimize costs?

I suspect it's rather a case of parallel evolution between McDonnell Douglas brass and software startup culture, since cost-cutting culture goes back many decades (remember "Chainsaw" Al Dunlap[1] ?) — but I wonder if there's a more direct influence.

[1] https://en.wikipedia.org/wiki/Albert_J._Dunlap

FabHK · 2024-01-09T03:06:52

Here's a Netflix documentary (in the wake of the MCAS crashes) that alleges that after the merger with McDonnell Douglas, the culture of the firm changed. Previously dominated by engineers, it was now dominated by MBAs with a focus on profit and shareholder value.

"With impressive clarity, Downfall: The Case Against Boeing reveals corporate corruption that's enraging in its callousness and frightening in its scope."

https://en.wikipedia.org/wiki/Downfall:_The_Case_Against_Boe... https://www.netflix.com/hk-en/title/81272421

Robotbeat · 2024-01-09T02:18:01

Boeing airliners are much safer now than before they merged with McDonnell Douglas. (Because basically all airliners are.) And I say that as a regular Boeing critic.

fsh · 2024-01-09T06:27:51

The 737 NG from before the merger has a much better safety track record than the 737 MAX.

Groxx · 2024-01-09T01:14:21

In lots of ways, the "learning" there would just be "capitalism".

It's inherently short-sighted unless forced to do otherwise by legislation. Cutting small corners pays off A LOT until the hammer falls, so there's a massive advantage to doing it / you need to do it if competition is doing it, or you eventually shut down as they take all your business.

It's inherently a race to the bottom. Sometimes that's a net gain, sometimes it isn't.

mc32 · 2024-01-09T01:59:59

All major economic systems of all major national economies over the last century have perverse incentives. It’s not a capitalist thing.

Other systems had incentives such as, get it running by such and such date or have yourself and relatives sent to inhospitable place. So people rushed flawed designs into production.

That said, upper management at Boeing needs a shake-up. People need to get fired. They need to do what Intel is trying and that is to get more engineers in charge, or at least grant them veto power on designs.

acdha · 2024-01-09T02:29:31

> All major economic systems of all major national economies over the last century have perverse incentives. It’s not a capitalist thing.

It should be a lesson against dogmatic pursuit of absolutes: capitalism comes in a wide range of flavors, and the worst is if it’s completely unrestrained. Communism produced worse and worse results the further it got from any sort of public accountability, etc.

The two problems that I see are that the concept of nuance is somewhat at odds with having a simple concept to teach kids at school, and there’s always a group which is more motivated to game the system than the average person who really just wants to hang out with their friends, raise a family, etc. rather than play political games. Boeing didn’t start it by any means but they’ve benefited enormously from decades of reduced oversight and elevated pay driven by a sort of cartoon libertarianism where letting people get enormously rich will motivate them to build great things unfettered by “red tape”.

mschuster91 · 2024-01-09T12:18:51

> All major economic systems of all major national economies over the last century have perverse incentives. It’s not a capitalist thing.

They have, but post-Thatcher neoliberal capitalism has taken the existing perverse incentives and made them exponentially worse. We're on a course heading straight to feudalism, just with fancy titles with legal rights replaced by economic might.

boringg · 2024-01-09T01:59:50

But all those communist airlines had no problems at all - exceptional build quality and operational efficiency! /S

mc32 · 2024-01-09T02:06:08

Yep, Chernobyl being a prime example. Or Komarov's failed re-entry after complaining about the design faults of the vehicle long before launch. Then there was the more uhhm run of the mill backyard blast furnace campaign which contributed to misallocation of workforce which then led to mass starvation.

boringg · 2024-01-09T14:26:02

There are many many more examples. I find it so tiresome to see young people just use capitalism as a catch all for the failure of something. It's such a lazy and uninformed argument.

I'm not carte blanche defending capitalism - it's a mixed bag but it sure outpaces the competing systems put forward to date. It does need some stronger safeguards against industry self regulation - that has a bad track record.

mc32 · 2024-01-09T15:22:05

I think we're on the same page. Economic systems need failsafes so that they don't suffer from positive feedback loops.

What anti-capitalist sympathizers, in my view, don't realize is that this is due to people being in the loop. These economic systems are merely vehicles, some better than others, but the conductors are people, be they communists or capitalists. At least with capitalism there is a delayed regulator (negative feedback); in communism it's up to the system to decide if it needs to modify itself.

throwup238 · 2024-01-09T01:52:53

"Better to ask for forgiveness than for permission"

Ironically, I believe it was Grace Hopper who said it... Whoops.

ssivark · 2024-01-09T02:16:50

That adage is okay, but for it to work not everything can be forgiven — there actually has to be an expectation to be held responsible towards acting on good faith.

cf1241290841 · 2024-01-09T02:34:16

Cockpit resource management is also something a lot of industries can learn from. As well as human error analysis. How an error came to be is often much more interesting than the personal shortcomings of the person who caused it.

Symbiote · 2024-01-09T09:11:26

At its best, software QA and related methods should be equal to airline manufacture.

Think of railway signalling systems, control-by-wire bits of modern cars, medical equipment, etc. Where the design of the software is formally proven, and the implementation verified to ensure it fits the design.

dilyevsky · 2024-01-09T01:39:28

Would you pay at least 2x for your software to have a couple more nines of reliability? I’m gonna guess “no”. At places where it costs $$$ to have bugs shipped to the end customer (e.g. phones) or where there are regulatory requirements, they still have dedicated QA.

bb88 · 2024-01-09T02:54:10

It depends.

1% of 10000 is 100.

.01% is 1.

If someone came up to me and said, "Hey I can save you 99% of expected costs with 1% of your profit.", I might go for it.
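The arithmetic above can be spelled out in a few lines. This is only a sketch of the numbers in the comment: the 10,000 units and the two failure rates come from the text, while the function name `expected_failures` is made up for illustration.

```python
def expected_failures(units: int, failure_rate: float) -> float:
    """Expected number of failing units; failure_rate is a fraction, not a percent."""
    return units * failure_rate


units = 10_000

# 1% failure rate versus 0.01% failure rate, as in the comment above.
before = expected_failures(units, 0.01)
after = expected_failures(units, 0.0001)

# Fraction of the expected failures that disappears when going from 1% to 0.01%.
reduction = 1 - after / before

print(f"before: {before:.0f}, after: {after:.0f}, reduction: {reduction:.0%}")
```

Two extra nines take the expected failures from about 100 to about 1, which is where the "save 99% of expected costs" figure comes from, assuming every failure costs roughly the same.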

dilyevsky · 2024-01-09T04:12:07

Which is what I said in the second part of my comment. For most software businesses the cost of shipping a bug is trivial and/or poorly measured, so due to the McNamara fallacy it is readily exchanged for the well-measured cost of having a functioning QA team.

williamdclt · 2024-01-09T12:23:48

Of course, but most of us aren't working on products where a quality problem would kill hundreds of people. Having aircraft-level QA would be plain silly; you don't expect that level of quality from most other industries, e.g. guitar manufacturing, do you?

darylteo · 2024-01-09T00:21:51

software development: what is QA?

bb88 · 2024-01-09T03:41:42

That's where failed software devs / management go to.

ShadowBanThis01 · 2024-01-09T00:24:05

Except, of course, from Boeing's aircraft-software QA... which killed hundreds of people already.

actionfromafar · 2024-01-09T01:02:15

The problem was not really the software in isolation, but that pilots expected the 737 MAX to behave exactly like the old version - because Boeing decided it was too expensive to retrain pilots.

unyttigfjelltol · 2024-01-09T01:49:22

The problem was software that prioritized input from a faulty external sensor over pilot control, and literally crashed planes directly into the ground. At a certain step in the sequence it was not physically possible for a pilot to pull hard enough on the control element to counteract the software. Could they have disabled the system? Only if they could figure out the specific software trying to crash the plane.

Is that what you meant by "the problem wasn't the software?" Because the pilots should have been trained to unplug the computer to stop it from crashing the plane?

SoftTalker · 2024-01-09T04:54:02

Pilots should (are supposed to) disable the auto-trim if it's doing something uncommanded/unexpected. Runaway trim can happen for reasons other than faulty software. MCAS was a new factor and they should have been told about it, I don't dispute that at all.

roelschroeven · 2024-01-09T18:06:30

Here we are again, this misconception just won't die.

In the 737 MAX, the only way to disable auto-trim also disables powered trim (the thumb buttons). As grandparent says, at a certain step in the sequence it was not physically possible for a pilot to trim the plane back to stability manually. It simply can't be done.

In the 737 NG, there was a button to do just that. That would have been useful.

And that's even ignoring the fact that all symptoms were very different from those present in a runaway trim situation as described in the manual and learned by the pilots.

SoftTalker · 2024-01-09T21:20:26

The thumb buttons would override MCAS. But then you'd have to disable the trim motors and trim manually (by hand-cranking a wheel). That part was not clearly understood by the pilots, because they were not told about MCAS.

drtgh · 2024-01-10T00:37:59

How are pilots expected to disable a malfunctioning MCAS in an emergency, and balance manually by trial and error the aerodynamic extravagances of the angle of attack of such unbalanced aircraft in the middle of procedures?

The user of the parent comment is remarking about time.

Can the aircraft be certified without MCAS?

From what I read, MCAS is there to avoid entering an aerodynamic stall when the aircraft approaches a high angle of attack, because it uses larger engines than the classic 737 was designed for. It balances an unbalanced aircraft by using software to repeatedly adjust the horizontal stabilizer.

It is not my field, but I'm not even sure it should be called trimming; that sounds like a euphemism for what's going on.

drtgh · 2024-01-09T14:13:27

The manufacturer put in larger engines than the aircraft was designed for. They did it to avoid the certification (homologation) and design costs involved in bringing a new aircraft to market with the appropriate tolerances, and to compete in time with another company's aircraft (to avoid lost sales).

They introduced MCAS in the aircraft for to balance by software a hardware issue, a big design negligent issue which can lead to stalling. It goes beyond trimming an aircraft, and because of this the values the algorithm manages are on a very different scale from those of ordinary trimming.

It is not my field, but I think it is not a simple factor, and that this should not be put on the pilots as if it were a normal aircraft that received a simple update. Every pilot flying that plane should have been warned that it was not a classic plane with a classic update.

If this type of behaviour by aircraft manufacturers becomes the norm, costs over safety, we as passengers will suffer for it, as other passengers unfortunately already have, while the pilots get blamed. On top of that, China's aircraft manufacturing industry now wants to enter the global market. Some days ago I read that they want approval (homologation) to enter the European Union.

PS: They also cut costs by dropping backup sensors, delegating responsibility for a system made vital by MCAS to the buyer as if it were an unimportant feature; disaster was the order of the day. And the cost cutting was not limited to that, as we have seen in recent days.

drtgh · 2024-01-09T20:38:53

Where I wrote,

> They introduced MCAS in the aircraft for to balance by software a hardware issue, a big design negligent issue which can lead to stalling.

> Every pilot flying that plane should have been warned that it was not a classic plane with a classic update.

I meant,

> They introduced MCAS to use software to attempt to balance an aerodynamically unbalanced aircraft with a high stall tendency, in order to avoid designing a new aircraft.

> Any pilot flying that aircraft should have been warned that it was a plane that didn't want to fly aerodynamically, with software forcing it to fly without backed redundancy. It was not mere trimming.

disillusioned · 2024-01-09T02:09:18

Even more ridiculous, Boeing offered a second source of truth option, but marked it as an upcharge, which the airlines in question rejected. "No thanks, no need for a second AoA sensor, one is none is probably fine!"

bshacklett · 2024-01-09T16:22:08

Additionally, two feels like a really strange number. I would think three for a tiebreaker would be standard for any sensor with that much impact (no pun intended).
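A rough sketch of why three is the natural number for a tiebreaker: with one sensor there is nothing to cross-check, with two you can detect a disagreement but not tell which one is wrong, and with three a median vote discards a single outlier. This is only an illustration, not how MCAS or any real avionics system votes; the function name `vote`, the tolerance, and the angle-of-attack readings in the usage note are all made up.

```python
from statistics import median
from typing import List, Optional


def vote(readings: List[float], tolerance: float) -> Optional[float]:
    """Return an agreed sensor value, or None if a disagreement cannot be resolved."""
    if not readings:
        return None
    if len(readings) == 1:
        # A single sensor cannot be cross-checked: a bad value passes straight through.
        return readings[0]
    if len(readings) == 2:
        a, b = readings
        # Two sensors can detect a disagreement, but not say which one is wrong.
        if abs(a - b) <= tolerance:
            return (a + b) / 2
        return None
    # Three or more sensors: the median ignores a single wild outlier.
    return median(readings)
```

With hypothetical readings in degrees, `vote([74.5], 5.0)` passes the bad value through as 74.5, `vote([5.0, 74.5], 5.0)` returns `None` (a disagreement is flagged but not resolved), and `vote([5.0, 5.2, 74.5], 5.0)` returns 5.2.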

ethbr1 · 2024-01-09T02:10:25

Pilots are definitely trained how to disable the autopilot, if needed.

Afaic, the fault apportionment was Boeing documentation > airlines >> pilots > Boeing technical design.

acdha · 2024-01-09T02:35:26

This wasn’t related to autopilot and they removed mention of the MCAS system from the documentation to support the main selling point of the 737 MAX, which was that existing 737 pilots would be able to switch easily without recertification. They knew that they’d lose most sales to Airbus if the aircraft were compared on their merits so they were banking hard on their huge pool of certified pilots as the competitive edge.

If you listen to podcasts, these two episodes of Causality are excellent:

https://engineered.network/causality/episode-33-737-max/

https://engineered.network/causality/episode-50-737-max-ethi...

bb88 · 2024-01-09T03:31:51

You might enjoy this. I have a pin that blinks "AOA Disagree".

Back when I flew regularly before covid, I was tempted to create a bunch of these and hand them out to the flight crew for the flights I flew on.

acdha · 2024-01-09T13:57:35

Ha, playing hardball! I wonder whether you’d find pilots who are Boeing loyalists who’d take offense, or if those guys are even madder at the current management for letting them down.

diabeetusman · 2024-01-09T02:27:49

Not sure why you’re bringing up autopilot— the MCAS system runs even when the autopilot is disabled.

Edit: Also, how does the fault lie with the airlines? Boeing didn’t document the existence of MCAS in the flight manual or training materials.

numpad0 · 2024-01-09T04:56:10

Wasn't MCAS designed to activate when A/P is disconnected, also?

ethbr1 · 2024-01-09T02:57:22

Because the comment I was replying to

>> Because the pilots should have been trained to unplug the computer to stop it from crashing the plane?

Yes.

The fault lies with the airlines because I don't for a second believe they didn't put pressure on Boeing to get the MAX certified without mandating retraining.

And then once that was done, didn't dig into the details too hard about what changes were made.

I have a low tolerance for 'I set up all the conditions and incentives to encourage you to break the law... but you should take all the blame when it explodes.'

At some point, the customer has to take some responsibility for what they asked for.

actionfromafar · 2024-01-09T11:37:37

It’s easier to blame Boeing because they made the damn thing and its documentation. We know for a fact they are at fault. Some or all of the airlines may or may not have put pressure on Boeing.

WalterBright · 2024-01-09T01:12:16

The expense for retraining pilots falls on the airline.

Retraining has its own problems. No matter how well retraining is done, pilots still make mistakes from doing the right thing for the previous plane that is the wrong thing for the one they are currently flying.

Adjusting airplanes to fly the same way is a major safety advantage.

ethbr1 · 2024-01-09T01:22:21

Arguably, Boeing hit the uncanny safety valley -- similar enough so that pilots and airlines relaxed, but different enough so that relaxation ultimately killed people.

WalterBright · 2024-01-09T01:26:21

The emergency procedure for runaway trim was the same for both aircraft types, and was not followed. After the first crash, an Emergency Airworthiness Directive was issued to all MAX pilots reiterating the procedure, which was not followed in the second crash; that crew also did not react to an overspeed warning.

Unreported by the media, there was another MAX incident before the first crash. The crew had no knowledge of MCAS, but did follow the emergency runaway trim procedure, and continued the flight and landed safely.

9659 · 2024-01-09T02:03:24

"Runaway stab trim". It is a memory item, every pilot should be able to perform it from memory.

Turn off the motor, and the trim is manual. There is a crank right there in the cockpit. If it is too hard to turn, change aircraft configuration to reduce the forces required. Pilots know how to do this. This is pilot stuff; they understand the forces on the flight controls and what impacts them.

Boeing made an engineering mistake. The pilots also made an operational mistake. Unfortunately, both mistakes at the same time were fatal.

I pray that pilot training has improved. And that Boeing has made systems level changes to the aircraft that will preclude it happening in the future.

And that is how aviation becomes safer every year; at a significant cost of customers' lives.

ethbr1 · 2024-01-09T02:52:11

> And that is how aviation becomes safer every year; at a significant cost of customers' lives.

"Significant" might be inaccurate.

It looks like FAA Part 121 accidents over the last 10 years with fatalities have been... 4. [0]

For a total of 6 fatalities.

[0] https://www.ntsb.gov/Pages/AviationQueryV2.aspx; 2018 (1 passenger fatality) https://www.ntsb.gov/investigations/Pages/DCA18MA142.aspx ; 2019 (3 crew fatalities, cargo flight) https://www.ntsb.gov/investigations/Pages/DCA19MA086.aspx and (1 passenger fatality) https://www.ntsb.gov/investigations/Pages/DCA20MA002.aspx ; 2022 (1 ramp fatality) https://data.ntsb.gov/carol-repgen/api/Aviation/ReportMain/G...

kennethrc · 2024-01-09T11:20:03

One of which (Atlas Flight 3591) was pilot error:

> The probable cause of this accident was the inappropriate response by the first officer as the pilot flying to an inadvertent activation of the go-around mode, which led to his spatial disorientation and nose-down control inputs that placed the airplane in a steep descent from which the crew did not recover.

WalterBright · 2024-01-09T04:23:01

That low accident rate is nigh inconceivable. It's an incredible achievement.

ethbr1 · 2024-01-09T14:14:47

The fatal accident count is higher for GA, but I didn't normalize against flight hours or flights, just glanced at it.

I'm sure there's been a study somewhere that attempts to untangle all the factors that differ between commercial carriers and GA, to see which safety is most sensitive to -- continuous highly professional maintenance, highly trained and experienced crew, rigorous airliner certification regime, etc.

SAI_Peregrinus · 2024-01-09T03:02:22

Boeing also reduced the size of the manual trim wheels, which let them become impossible to turn sooner than on previous 737s.

WalterBright · 2024-01-09T03:55:57

The electric trim switches override MCAS. This was explained in the Emergency Airworthiness Directive sent to all MAX pilots after the first crash.

Also, overspeeding the airplane makes it much harder to turn the manual trim wheel. The cockpit voice recorder on the EA flight recorded the overspeed warning horn, which the crew did nothing about (they were at full power, should have pulled the throttles back).

The LA crew restored normal trim twenty-five times before crashing. What they never did was turn it off after restoring normal trim.

ShadowBanThis01 · 2024-01-09T04:26:22

If a pilot can't be expected to maintain the pitch of a plane on takeoff, he has no business flying ANYTHING.

What Boeing did (and is STILL doing) is expect pilots to know or remember obscure NON-PILOTAGE (and in the case of MCAS, BURIED) trivia to prevent disaster.

Now... what's the more-responsible approach? Expect pilots to pilot, or expect them to recall an ever-growing list of workarounds to incompetent system design?

gmokki · 2024-01-09T09:20:53

The whole MCAS was just an unnecessary feature (bug fix). Without it the plane would have worked just fine. The pilots would just have had to go through some amount of training scenarios to get the certification on how the MAX plane flies.

ShadowBanThis01 · 2024-01-09T04:29:21

Wrong.

echohack5 · 2024-01-09T00:39:26

QA was literally invented for the airline industry.

Software QA when actually practiced is more advanced now than airline QA.

paranoidrobot · 2024-01-09T00:57:54

> Software QA when actually practiced is more advanced now than airline QA.

...eh, I think "when actually practiced" is doing a lot of carrying there.

What do you mean by "actually practiced".

Outside of the aerospace and healthcare industries, I'm not sure there are many software shops that are doing QA to a level I would like to trust anyone's life with.

serf · 2024-01-09T01:03:37

what does advanced mean when comparing things so unlike from each other?

also software is the least likely comparison I would have made; software quality is a shit-show on a general level, and the vast public is quite aware of this every time a subway timeboard blue-screens or gets frozen on an AMI screen, or the POS machine that they're forced to interact with at work does something equally as stupid.

booleandilemma · 2024-01-09T01:46:30

Nah, in the software world, the truth is QA is where the people who can't get jobs as programmers end up. I've seen testers go on to become programmers, but I've never seen a programmer become a tester. Maybe it's different for real-time or life-critical systems, sure, but I can confidently say this is how it is in web development.

MilStdJunkie · 2024-01-09T14:57:25

Given everything I've seen so far, I'd bet good money that what happened here was miscommunication between Spirit and Boeing. Spirit started out locking down the plug, then Boeing asked them to just loosely attach it[1] so Boeing could yank the plug for interior/wiring/AC/paint, then someone at Boeing forgot about the "loosely". So now, they get in a hurry (maybe the AC/interior didn't need any access to work on, which makes sense for this MAX variant, it wouldn't need as many hatches to pull wire) and it went down the Renton line as if the plug was fully installed. It's enough to pass high blow inspection and other inspections, but then over time that "shipment config" attachment vibrated out, and pop goes the plug.

Almost certainly systemic issue though, so that sucks. Sucks real bad.

They need to get a Tiger Team or whatever together to look at everything with a shipment config, and make sure those "ship kits" don't leak into the real actual airplane configuration. This is . . ok, this is really manufacturing 101 stuff, but well, things happen.

I'm in the industry, but haven't touched the MAX, so take this with a grain of salt.

[1] Basically a "shipping" or train configuration

garyfirestorm · 2024-01-09T02:21:55

Bolts are most likely tightened with a torque wrench or a gun that is set to a torque spec. Over-tightening a bolt is as bad as a loose bolt. I speculate these passed QA from Boeing because they might have been correctly torqued to the spec. What happens in the field is hard to understand. One possibility is that proximity to the engine can cause extreme vibrations, which can make them come loose. Another possibility is the maintenance side of things - maybe a badly calibrated torque wrench could be the reason. Mechanical systems are not inherently immutable.

kevin_thibedeau · 2024-01-09T02:24:42

When it's happening on a two month old plane it's a production problem.

FredFS456 · 2024-01-09T02:24:25

I would expect lock wire or some other method of ensuring the bolt does not un-torque itself. Especially for bolts that are not required to be removed past final assembly...
