After taking an automotive Functional Safety class, I learned that there is a lot to automotive software safety planning. There are tons of standards, guidelines, and best practices, and the whole topic is filled with specialized terms of art. These notes are not complete or especially well organized; they are simply a place for me to keep track of some of the details that would obviously escape the recollection of someone not practicing in this field daily.
Do not use these notes to make consequential safety plans!
Functional Safety
Functional safety is the practice of trying to eliminate unacceptable risk of physical injury. In automotive contexts this is pursued for electrical and electronic systems (including automotive computers and software) by adhering to the ISO 26262 standard.
-
Standards
-
ISO 26262 - "Road vehicles - Functional Safety" Reminder: This considers all automotive electronic and electrical safety-related systems.
-
IEC 61508 - the general (not automotive-specific) functional safety standard that ISO 26262 is adapted from
-
https://www.unece.org/trans/main/wp29/meeting_docs_wp29.html
-
-
Safety = "absence of unreasonable risk"
-
identify hazards
-
measure risk
-
use system engineering to lower risk
-
-
Focuses on keeping risks below "society’s current threshold"
-
Requirements - ("shall" usually represents an engineering requirement)
-
functional: what the system is supposed to do X system shall do Y
-
non-functional: how the system should behave, e.g. how reliable is the system? X system shall be Y
-
-
Problems can come from these sources
-
Humans
-
Tech
-
Human/Tech interaction
-
Documentation
As with ISO 9000, documentation is the cornerstone of ISO 26262 and includes these components.
-
Safety plan
-
System and subsystem definition
-
Hazard analysis and risk assessment (HARA)
-
Functional Safety Concept (bigger picture)
-
Technical Safety Concept at system level (technical details)
Terms
-
ISO 26262 terms
-
item - high level system like "lane assistance system"
-
system - has at least a sensor, controller, and actuator
-
subsystem - a subdivision of a system
-
component - e.g. a power steering ECU has a hardware and software component.
-
element - specific hardware e.g. the motor, ECU, etc of the system
-
-
fault - When something undesirable happens, malfunction.
-
failure - System no longer is doing what it should, perhaps because of a fault.
-
hazard - A failed system that could lead to an injury.
-
risk - Level of danger in a situation.
Risk
risk = severity of harm * probability of occurrence
Faults lead to failures. A failure leads to a hazard. A hazard has a certain level of risk.
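A tiny worked example of the risk formula above. The severity scores and probabilities are hypothetical numbers picked only to show the arithmetic; they do not come from any standard.

```python
def risk(severity: float, probability: float) -> float:
    """risk = severity of harm * probability of occurrence."""
    if severity < 0 or not 0.0 <= probability <= 1.0:
        raise ValueError("severity must be >= 0 and probability within [0, 1]")
    return severity * probability


# Hypothetical hazards: a rare but severe one and a frequent but mild one.
rare_severe = risk(severity=10.0, probability=0.001)
frequent_mild = risk(severity=1.0, probability=0.05)

# With these made-up numbers the frequent, mild hazard carries more risk.
print(f"rare/severe:   {rare_severe:.3f}")
print(f"frequent/mild: {frequent_mild:.3f}")
```

The point is only that a low-severity hazard can still dominate if it occurs often enough.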
Misc Terms
-
ADAS - Advanced Driver Assistance Systems
-
LDW - Lane Departure Warning
These are used as examples of a targeted electronic sub-system.
Safety roles
-
project manager - allocates resources
-
safety manager - pre-audits, plans development phase
-
safety engineer - develops prototypes, integrates subsystems
-
safety auditor - ensures project conforms to safety plan
-
safety assessor - judges if project has increased safety
-
test manager - plans and oversees testing activities
Hazard Analysis and Risk Assessment (HARA)
-
Identification of hazards
-
function - e.g. Lane Departure Warning shall apply haptic feedback
-
malfunction - Actor effect is too much
-
malfunction details - oscillating torque above limit
-
hazardous event - collision with another vehicle, loss of control
-
event details - intense haptic feedback can overwhelm driver
-
summary description - LDW applies excessive steering wheel motion, resulting in loss of control
-
-
Classification according to severity and probability
-
Calculating the ASIL
-
Generate safety goals
-
Situational analysis - conditions (rain, city)
-
"Operational mode on operational scenario during environmental details with situational details and item usage system."
-
E.g. "Backward driving" on "city road" during "fog" with "low speed" and "correctly used" system.
-
Operation Mode - Examples
-
Parked
-
ignition on
-
normal driving
-
backward driving
-
degraded driving (limp home mode)
-
towing
-
towed
-
service (being repaired in a service setting)
Operational Scenario - Examples
-
any road
-
city road
-
country road
-
highway
-
mountain road
-
off road
-
road with gradient
-
road with bump
-
tunnel
-
construction site
Environmental Details - Examples
-
normal
-
sun glare (visibility)
-
snowfall (visibility)
-
cross-wind (side force)
-
rain (surface)
-
ice (surface)
Situation Details - Examples
-
low speed
-
high speed
-
normal acceleration
-
high acceleration
-
normal braking
-
high braking
Item Usage - Examples
-
correctly used
-
incorrectly used
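The situation template above is just a combination of one entry from each of the five example lists, so it can be sketched as a cross product. The list contents below are small excerpts from these notes; the function names `describe` and `all_situations` are my own and purely illustrative.

```python
from itertools import product

# Small excerpts of the example lists in these notes.
OPERATION_MODES = ["normal driving", "backward driving"]
OPERATIONAL_SCENARIOS = ["city road", "highway"]
ENVIRONMENTAL_DETAILS = ["normal", "rain (surface)"]
SITUATION_DETAILS = ["low speed", "high speed"]
ITEM_USAGE = ["correctly used", "incorrectly used"]


def describe(mode, scenario, environment, situation, usage):
    """Fill in the situational analysis template sentence."""
    return (
        f'"{mode.capitalize()}" on "{scenario}" during "{environment}" '
        f'with "{situation}" and "{usage}" system.'
    )


def all_situations():
    """Every combination of the excerpted lists (2 * 2 * 2 * 2 * 2 = 32)."""
    combos = product(
        OPERATION_MODES,
        OPERATIONAL_SCENARIOS,
        ENVIRONMENTAL_DETAILS,
        SITUATION_DETAILS,
        ITEM_USAGE,
    )
    return [describe(*combo) for combo in combos]


for line in all_situations()[:3]:
    print(line)
```

A blind cross product also produces implausible situations (backward driving on a highway at high speed), so in practice someone still has to filter the list by hand.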
Safety Plan
Metrics
-
Exposure
-
Severity
-
Controllability
Malfunction (deviation) Guide Words
-
function not/unexpectedly activated
-
actor effect is excessive/insufficient
-
actor action is too late/early
-
actor effect is reversed
-
sensor sensitivity is too high/low
-
sensor detection is too early/late
Hazardous Event Guide Words
-
None
-
Front collision with oncoming traffic
-
Front collision with oncoming obstacle
-
Rear collision with trailing traffic
-
Collision with cyclist
-
Collision with pedestrian
-
Car catches fire
risk = severity * probability of loss (ISO 26262)
Severity (ISO 26262 terms)
-
S0 - no injuries
-
S1 - light and moderate injuries
-
S2 - severe and life-threatening injuries (survival probable)
-
S3 - life-threatening injuries (survival uncertain) and fatal injuries
Also note on the topic of "severity", commenter kgrep says:
ISO 26262-3:2011; Annex B.2 has examples of setting severity levels using existing injury scales using the AIS scale developed by the Association for the Advancement of Automotive Medicine. The AIS scale is very specific and more illustrative for purpose. Table B.1 shows a mapping with detailed examples
SAE J2980 is another good reference. This is a best practices document for determining ASIL ratings. Appendix B has several pages of additional suggestions on different approaches used in industry to set the severity levels.
Exposure
-
E4 - Always
-
E3 - Quite often
-
E2 - Sometimes
-
E1 - Rarely
-
E0 - Never
Controllability
-
C3 - difficult to control
-
C2 - normally controllable
-
C1 - easily controllable
-
C0 - controllable
Functional Safety Concept
A document containing…
-
ASIL
-
FTTI - fault tolerant time interval; how quickly system must react to hazard
-
Diagnostic Test Interval - time to detect problem
-
Fault Reaction Time - time to fix problem
-
May need to involve an idiot driver.
-
-
safe state; what system is like after avoiding an accident
-
does not go into technical details (save that for Technical Safety Concept)
-
part of the concept phase (not development phase)
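A minimal sketch of how the three timing terms above relate, assuming (as I understand it) that detecting the fault plus reacting to it must fit inside the FTTI. The class name and the millisecond values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingBudget:
    """Timing terms from the Functional Safety Concept, in milliseconds."""

    ftti_ms: float                      # fault tolerant time interval
    diagnostic_test_interval_ms: float  # time to detect the problem
    fault_reaction_time_ms: float       # time to reach the safe state

    def total_ms(self) -> float:
        """Worst-case time from fault occurrence to safe state."""
        return self.diagnostic_test_interval_ms + self.fault_reaction_time_ms

    def margin_ms(self) -> float:
        """Time left over inside the FTTI (negative means too slow)."""
        return self.ftti_ms - self.total_ms()

    def fits(self) -> bool:
        return self.margin_ms() >= 0


# Hypothetical numbers for a steering torque limiter.
budget = TimingBudget(ftti_ms=50, diagnostic_test_interval_ms=10, fault_reaction_time_ms=20)
print(budget.fits(), budget.margin_ms())
```

If the driver is part of the reaction, their response time would have to be counted inside the fault reaction time as well.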
Technical Safety Concept
Part of the development phase (not concept phase).
1. Detecting faults within a system
2. Detecting faults in an external device interacting with the system
3. Reaching a safe state
4. Implementing warning and degradation concept
5. Preventing latent faults (e.g. memory test for faulty memory)
Methods of Safety Analysis
ASIL - Automotive Safety Integrity Level
risk = severity * exposure * controllability
-
ASIL A = Low risk
-
ASIL B = Moderate risk 100 failures/1e9hrs
-
ASIL C = High risk (Requires Fault Tree Analysis)
-
ASIL D = Highest risk (Requires Fault Tree Analysis and MC/DC) 10 failures/1e9hrs
-
QM = Quality Management; no ASIL is assigned and normal quality processes (e.g. IATF 16949) are considered sufficient
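A sketch of ASIL determination from the S, E and C classes listed earlier. The standard uses a lookup table rather than a literal multiplication. The shortcut below (add the three class numbers, so that 10 gives D, 9 gives C, 8 gives B, 7 gives A and anything lower gives QM) is a widely quoted rule of thumb that, as far as I can tell, reproduces that table. Treat it as a memory aid and check real classifications against ISO 26262-3 itself.

```python
def determine_asil(severity: int, exposure: int, controllability: int) -> str:
    """Return "QM", "A", "B", "C" or "D" for classes S0-S3, E0-E4, C0-C3.

    Uses the sum shortcut described above, not the official table text.
    """
    if not (0 <= severity <= 3 and 0 <= exposure <= 4 and 0 <= controllability <= 3):
        raise ValueError("expected S0-S3, E0-E4 and C0-C3")
    # Any class of 0 (no injuries, never exposed, controllable) needs no ASIL.
    if 0 in (severity, exposure, controllability):
        return "QM"
    total = severity + exposure + controllability
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


print(determine_asil(3, 4, 3))  # worst case in every class
print(determine_asil(1, 2, 1))  # mild, infrequent, easily controlled
```

Only the S3/E4/C3 corner reaches ASIL D; lowering any one class by one step drops the result by one level.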
ASIL Failure rates
-
ASIL D < 1e-8 per hour
-
ASIL C < 1e-7 per hour
-
ASIL B < 1e-7 per hour (same target as ASIL C)
-
ASIL A < 1e-6 per hour
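The "10 failures/1e9hrs" and "100 failures/1e9hrs" figures noted for ASIL D and ASIL B are the same kind of number as the per-hour rates here, just expressed per billion hours (a unit usually called FIT). A small conversion sketch follows; the fleet size and annual usage are hypothetical.

```python
HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per 1e9 operating hours


def fit_to_per_hour(fit: float) -> float:
    """Convert failures per 1e9 hours into failures per hour."""
    return fit / HOURS_PER_FIT


def per_hour_to_fit(rate_per_hour: float) -> float:
    """Convert failures per hour into failures per 1e9 hours."""
    return rate_per_hour * HOURS_PER_FIT


def expected_failures(rate_per_hour: float, vehicles: int, hours_per_vehicle: float) -> float:
    """Expected failure count across a fleet for a constant failure rate."""
    return rate_per_hour * vehicles * hours_per_vehicle


asil_d_rate = fit_to_per_hour(10)   # 10 failures/1e9hrs
asil_b_rate = fit_to_per_hour(100)  # 100 failures/1e9hrs
print(asil_d_rate, asil_b_rate)

# Hypothetical fleet: one million vehicles driven 400 hours each per year.
print(expected_failures(asil_d_rate, vehicles=1_000_000, hours_per_vehicle=400))
```

Seen across a whole fleet, even the strictest target still corresponds to a handful of expected events per year, which is a reminder that "safe" here means "below an agreed threshold" and not "never".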
Architectural Safety Metrics
Each ASIL has different levels for these things.
-
Single point fault metric
-
Latent fault metric
-
Probabilistic metric for random hardware failures
Categories of Faults
-
Single point fault - no safety fallback to catch it
-
Residual fault - safety fallback mechanisms present, but this is something the safety mechanisms don’t cover
-
Latent multiple point fault - faults that go undetected by safety mechanisms (or driver).
-
Perceived multiple point fault - faults that generate limited vehicle functionality, or warning light
-
Detected multiple point fault - faults mitigated to safe state by safety mechanisms
-
Safe faults - does not lead to safety goal violation
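The fault categories above feed directly into two of the architectural metrics. Below is a sketch of the single point fault metric and the latent fault metric as I understand the formulas in ISO 26262-5; the failure rates are invented, and the class and function names are my own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRates:
    """Failure rates of safety-related hardware, all in the same unit (e.g. FIT)."""

    single_point: float
    residual: float
    latent_multiple_point: float
    detected_or_perceived_multiple_point: float
    safe: float

    @property
    def total(self) -> float:
        return (
            self.single_point
            + self.residual
            + self.latent_multiple_point
            + self.detected_or_perceived_multiple_point
            + self.safe
        )


def single_point_fault_metric(r: FailureRates) -> float:
    """1 - (single point + residual) / total."""
    return 1.0 - (r.single_point + r.residual) / r.total


def latent_fault_metric(r: FailureRates) -> float:
    """1 - latent / (total - single point - residual)."""
    return 1.0 - r.latent_multiple_point / (r.total - r.single_point - r.residual)


# Hypothetical breakdown adding up to 100 FIT.
rates = FailureRates(
    single_point=1.0,
    residual=2.0,
    latent_multiple_point=5.0,
    detected_or_perceived_multiple_point=42.0,
    safe=50.0,
)
print(f"SPFM = {single_point_fault_metric(rates):.3f}")
print(f"LFM  = {latent_fault_metric(rates):.3f}")
```

Both metrics are fractions of the failure rate that is covered, so higher is better, and each ASIL sets its own minimum for them.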
Misc Safety Topics
Fault Tree Analysis
Safety goal
E.g. "The oscillating steering torque from the LDW function shall be limited".
MISRA
The Motor Industry Software Reliability Association (MISRA) publishes programming guidelines for software in safety critical automotive applications. Includes…
-
MISRA C++
Here’s George Hotz talking about his respect for these guidelines at 1h36m50s.
MISRA is written by computer scientists and you can tell by the language that they use. They talk about whether certain conditions in MISRA are "decidable" or "undecidable". You mean like the halting problem? And yes! Alright, you’ve earned my respect. I will read carefully what you have to say and we want to make our code compliant with that.
V model of Software Design
Used by ISO 26262. Hardware and software have their own Vs in ISO 26262. Wikipedia has a diagram.
Major features.
-
left side = planning and design
-
right side = integration and testing
-
top = entire system
-
bottom = focused subcomponents
From top left to bottom to top right.
-
Specification of software safety requirements
-
Software architectural design
-
Software unit design
-
Bottom of V
-
Software Unit testing
-
Software integration and testing
-
Verification of software safety requirements
Ensuring Robustness and Quality
-
"invalid input"
-
"stressful conditions"
Freedom from interference
-
One software element should not cause a failure in another.
-
Especially problematic if a low-ASIL element causes a failure in a higher-ASIL element.
-
Freedom from spatial interference - memory and storage separation. ASIL D can read from a QM element, but not the other way around.
-
MPU = memory protection unit
Temporal interference
-
Blocking of execution
-
Deadlocks
-
Livelock - both threads let the other go first
-
Incorrect synchronization between elements (e.g. bad clock times)
-
"Incorrect allocation of execution time" and "Incorrect execution of sequence"
-
Alive supervision - checks that something executes at the expected frequency (neither too often nor too rarely)
-
Deadline monitoring - times execution and checks for unexpected slowness
-
Control flow monitoring - uses checkpoints to make sure stupid ordering of execution is not occurring
-
Communication interference - End to end protocol (E2E)
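Two of the monitoring ideas in the list above, deadline monitoring and control flow monitoring, can be sketched in a few lines. This is only an illustration of the concepts with class names of my own; production systems typically do this in a watchdog manager, not in application code like this.

```python
import time


class DeadlineMonitor:
    """Times one execution and reports whether it met its deadline."""

    def __init__(self, deadline_s: float, clock=time.monotonic):
        self._deadline_s = deadline_s
        self._clock = clock  # injectable so the monitor can be tested
        self._started = None

    def start(self) -> None:
        self._started = self._clock()

    def finish(self) -> bool:
        if self._started is None:
            raise RuntimeError("finish() called before start()")
        elapsed = self._clock() - self._started
        self._started = None
        return elapsed <= self._deadline_s


class ControlFlowMonitor:
    """Checks that checkpoints are reached in the expected order."""

    def __init__(self, expected_sequence):
        self._expected = tuple(expected_sequence)
        self._position = 0
        self._violated = False

    def checkpoint(self, name: str) -> bool:
        in_range = self._position < len(self._expected)
        if self._violated or not in_range or self._expected[self._position] != name:
            self._violated = True
            return False
        self._position += 1
        return True

    def cycle_complete(self) -> bool:
        """True if every checkpoint was hit in order; resets for the next cycle."""
        ok = not self._violated and self._position == len(self._expected)
        self._position = 0
        self._violated = False
        return ok


flow = ControlFlowMonitor(["read_sensor", "compute", "actuate"])
timer = DeadlineMonitor(deadline_s=0.5)
timer.start()
for step in ("read_sensor", "compute", "actuate"):
    flow.checkpoint(step)
print(flow.cycle_complete(), timer.finish())
```

A failed check would normally trigger the fault reaction (for example a transition to the safe state), which is left out here.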
Communication problems:
-
Repetition of information
-
Loss of information
-
Delay of information
-
Insertion of information
-
Masquerade or incorrect addressing of information
-
Incorrect sequence of information
-
Corruption of information
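A rough sketch of how an end to end check can catch several of the communication problems listed above, using a data ID (masquerade or wrong addressing), a sequence counter (repetition, loss, wrong order) and a CRC (corruption). The frame layout is invented for illustration and is not one of the standardized E2E profiles. Delay is not covered; that would need a timeout on the receiver side.

```python
import struct
import zlib

HEADER = struct.Struct(">HB")  # 16-bit data ID, 8-bit sequence counter
CRC = struct.Struct(">I")      # 32-bit CRC over header + payload


def protect(data_id: int, counter: int, payload: bytes) -> bytes:
    """Wrap a payload with data ID, counter and CRC."""
    body = HEADER.pack(data_id, counter % 256) + payload
    return body + CRC.pack(zlib.crc32(body))


class Receiver:
    """Checks incoming frames and classifies what went wrong, if anything."""

    def __init__(self, expected_data_id: int):
        self._expected_data_id = expected_data_id
        self._last_counter = None

    def check(self, frame: bytes) -> str:
        if len(frame) < HEADER.size + CRC.size:
            return "corrupt"
        body, received_crc = frame[:-CRC.size], frame[-CRC.size:]
        if CRC.pack(zlib.crc32(body)) != received_crc:
            return "corrupt"
        data_id, counter = HEADER.unpack(body[:HEADER.size])
        if data_id != self._expected_data_id:
            return "wrong_id"
        previous, self._last_counter = self._last_counter, counter
        if previous is None:
            return "ok"
        delta = (counter - previous) % 256
        if delta == 0:
            return "repeated"
        if delta > 1:
            return "lost_or_out_of_order"
        return "ok"


rx = Receiver(expected_data_id=0x1234)
first = protect(0x1234, 0, b"torque=3")
print(rx.check(first))   # first frame is accepted
print(rx.check(first))   # same frame again is flagged as a repeat
```

An inserted frame from another sender shows up either as a wrong data ID or as a counter jump, depending on what it carries.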
E-Gas - software design pattern
Name comes from original electronic throttle controls, maybe for cruise control, or just a throttle by wire. Obviously this is super important and is massively designed for fault tolerance and safety.
-
Level 1 - Functional level
-
Level 2 - Functional monitoring
-
Level 3 - Processor monitoring
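A toy sketch of how the three levels might fit together, based on my understanding that Level 2 independently recomputes and cross-checks what Level 1 does, and Level 3 checks that the processor itself still computes correctly (here with a simple question/answer test). The function names, the tolerance, the question value and the "throttle closed" safe state are all assumptions made for illustration.

```python
def level1_throttle_request(pedal_percent: float) -> float:
    """Level 1 (function): map pedal position to a throttle command."""
    return max(0.0, min(100.0, pedal_percent))


def level2_plausible(pedal_percent: float, throttle_command: float, tolerance: float = 5.0) -> bool:
    """Level 2 (function monitoring): recompute independently and compare."""
    expected = max(0.0, min(100.0, pedal_percent))
    return abs(throttle_command - expected) <= tolerance


def level3_processor_ok(answer_fn, question: int = 0x5A) -> bool:
    """Level 3 (processor monitoring): question/answer test of the computing unit."""
    return answer_fn(question) == (question ^ 0xFF)


def safe_throttle(pedal_percent: float, throttle_command: float, processor_ok: bool) -> float:
    """Pass the command through only if both monitoring levels agree."""
    if not processor_ok or not level2_plausible(pedal_percent, throttle_command):
        return 0.0  # assumed safe state: throttle closed
    return throttle_command


def healthy_cpu(question: int) -> int:
    return question ^ 0xFF


command = level1_throttle_request(30.0)
print(safe_throttle(30.0, command, level3_processor_ok(healthy_cpu)))
print(safe_throttle(30.0, 90.0, level3_processor_ok(healthy_cpu)))  # implausible command
```

In a real design the Level 3 check would run on separate hardware, which a single Python process obviously cannot show.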