Temperature Sensors

Thermal sensors are very cheap. It should be trivial to have a computer know the temperature of anything you want to monitor. But it’s not so easy.

DS1820

(Everything I know about this is thanks to theamk!)

The inexpensive solution I used was a Dallas Semiconductor DS1820 wired directly to a 1-wire/USB converter using normal 4-wire RJ11 telephone cable. The 1-wire protocol is explained in this Wikipedia article.

Wiring

The DS1820 leads are numbered 1 to 3, left to right, when you look at the leads emerging from the device with its flat face up. Looking at the RJ11/12 receptacle with the retaining tab facing down, the contacts are numbered 1 to 6 (RJ12) or 1 to 4 (RJ11).

  • contact 1 = No connection

  • contact 2 = DS1820 lead #1 (GND, ground) joined to lead #3 (Vdd, the optional supply)

  • contact 3 = middle lead #2 of DS1820 DQ - Data In/Out

  • contact 4 = No connection

I believe that multiple sensors can be wired in parallel up to some absurd limit (related to how long the cabling is); 100m of cable (total) should be no problem for single devices.

One Wire File System (owfs)

You can query the state of the sensors using Linux’s one wire file system.

Warning
It can be tricky to install. On Ubuntu, don’t forget these packages: libusb-dev libfuse-dev fuse-utils (and maybe more).

Load the fuse module (fuse = filesystem in userspace), which allows the 1-wire information to be presented as a virtual file system.

$ sudo modprobe fuse

Run the owfs binary, which mounts the 1-wire data at the mount point (/mnt here).

$ sudo ./module/owfs/src/c/owfs -u /mnt/
DEFAULT: ow_usb_msg.c:DS9490_open(263) Opened USB DS9490 bus master at 4:5.
DEFAULT: ow_usb_cycle.c:DS9490_ID_this_master(191) Set DS9490 4:5 unique id to 81 2C 70 26 00 00 00 75

Start querying the data. 10.5BBC0F010800 is the unique ID of my DS1820 sensor. Other sensors in the system will have other IDs.

$ printf "temp:%s\n" `sudo cat /mnt/10.5BBC0F010800/temperature`
temp:24.3125
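
With more than one sensor on the bus, the same query can be looped over every device directory. What follows is only a minimal sketch: read_ow_temps is a made-up helper name, and it assumes that the sensors show up under the mount point as directories starting with 10. (like mine above), each containing a temperature file.

```shell
#!/bin/sh
# Print "<sensor-id> <temperature>" for every 10.* device directory
# under an owfs mount point (default: /mnt, as used above).
# read_ow_temps is a hypothetical helper, not part of owfs.
read_ow_temps() {
    mount_point="${1:-/mnt}"
    for dir in "$mount_point"/10.*; do
        # Skip the unexpanded glob when no sensors are present.
        [ -r "$dir/temperature" ] || continue
        # read strips any padding around the value; it returns non-zero
        # when the file has no trailing newline, which is harmless here.
        read -r temp < "$dir/temperature" || :
        printf '%s %s\n' "${dir##*/}" "$temp"
    done
}
```

Calling read_ow_temps with no argument reads from /mnt. If the mount is only readable by root, as in the transcript above, the function has to run in a root shell.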

Accuracy

The specified accuracy is +/-0.5C from -10C to +85C, but I feel it’s generally within 0.2C. It’s certainly good enough for normal purposes. Strangely, the temperature seems to be reported in full decimal notation with a resolution of 1/16 degree Celsius.
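
The four decimal places are what you would expect from 1/16 degree steps, since 1/16 = 0.0625. A small sketch of the arithmetic, using two made-up helper names (c_to_sixteenths and sixteenths_to_c) and assuming only that readings are whole multiples of 1/16 degree:

```shell
#!/bin/sh
# Convert a Celsius reading to a whole number of 1/16-degree steps.
# Adding +/-0.5 before %d truncates rounds to the nearest step.
c_to_sixteenths() {
    awk -v c="$1" 'BEGIN { printf "%d\n", c * 16 + (c < 0 ? -0.5 : 0.5) }'
}

# Convert a number of 1/16-degree steps back to Celsius.
sixteenths_to_c() {
    awk -v n="$1" 'BEGIN { printf "%.4f\n", n / 16 }'
}
```

For the reading shown earlier, c_to_sixteenths 24.3125 gives 389, and sixteenths_to_c 389 gives 24.3125 again.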

Cost

  • $2 sensor

  • $28 converter

  • $2 50ft RJ11 Modular Telephone Cord, 4 Conductor 6P4C, Pin 1-1

  • $3 Modular Surface Mount Jack, End Access, RJ12

Note
Multiple sensors can use one converter.
Note
Prices in 2010.

acpi/thermal_zone

Often motherboard and CPU temperature sensors work out of the box with Linux:

$ cat /proc/acpi/thermal_zone/THM0/temperature
temperature:             50 C

It looks like that very handy virtual file is being replaced with a nice confusing mess which lives somewhere in the region of:

/sys/class/thermal
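
A rough sketch of getting the old one-line answer back out of that directory. It assumes the usual layout of thermal_zone*/temp (in millidegrees Celsius) and thermal_zone*/type. The helper name read_thermal_zones is made up, and its optional argument is only there so it can be pointed at some other directory.

```shell
#!/bin/sh
# Print "<zone> <type> <degrees C>" for each thermal zone.
# read_thermal_zones is a hypothetical helper name.
read_thermal_zones() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip the unexpanded glob or zones without a readable value.
        [ -r "$zone/temp" ] || continue
        read -r millideg < "$zone/temp" || :
        zone_type=unknown
        if [ -r "$zone/type" ]; then
            read -r zone_type < "$zone/type" || :
        fi
        # temp holds millidegrees Celsius, e.g. 50000 for 50 C.
        awk -v z="${zone##*/}" -v t="$zone_type" -v m="$millideg" \
            'BEGIN { printf "%s %s %.1f\n", z, t, m / 1000 }'
    done
}
```

Running read_thermal_zones with no argument reads the live values. A zone whose temp file contains 50000 is printed as 50.0.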

lm_sensors

If you’re pretty sure that there are sensors on the motherboard but acpi doesn’t show them, often lm_sensors can.

The major distributions usually have this available. To install lm_sensors on CentOS: sudo yum install lm_sensors. On Debian it seems to be lm-sensors.

Then you have to run sensors-detect as root. Do what it says. If, like me, you have no idea what it’s doing, it’s probably efficient to just accept all defaults with:

yes "" | sensors-detect

If all goes well you’ll be able to use the sensors command that produces all kinds of info including temperatures, voltages, and fan speeds.

Typical Output Of sensors Command
$ sensors
lm78-i2c-0-2d
Adapter: SMBus PIIX4 adapter at 0b00
VCore 1:   +1.74 V  (min =  +0.00 V, max =  +3.49 V)
VCore 2:   +1.68 V  (min =  +0.45 V, max =  +0.16 V)   ALARM
+3.3V:     +3.33 V  (min =  +2.85 V, max =  +2.77 V)   ALARM
+5V:       +5.59 V  (min =  +4.84 V, max =  +2.47 V)   ALARM
+12V:      +8.63 V  (min =  +8.15 V, max =  +1.46 V)   ALARM
-12V:      -5.90 V  (min =  -9.46 V, max =  -4.01 V)   ALARM
-5V:       -2.50 V  (min =  -3.95 V, max =  -2.70 V)
fan1:        0 RPM  (min = 3461 RPM, div = 2)          ALARM
fan2:        0 RPM  (min = 6818 RPM, div = 2)          ALARM
fan3:        0 RPM  (min = 168750 RPM, div = 2)          ALARM
temp:      +39.0°C  (high =   +32°C, hyst =  +105°C)   ALARM
vid:       +3.00 V
alarms:   Board temperature input (LM75)               ALARM
alarms:   Chassis intrusion detection                  ALARM

If all does not go well, check out this really useful on-line diagnostic of your hardware. Just feed it the output of lspci -n.

Temperature Monitoring

Often you are curious if some hardware instability is related to temperature. One easy thing to do is to set up a little script to log the temperature over time. Maybe you can see a temperature rise when the machine stops working (i.e. stops logging).

$ while sleep 10; do echo  `date` `sensors|grep ^temp|awk '{print $2}'` >> ~/tempwatch_node87; done &
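
Once the machine is back up, the log can be summarized instead of read by eye. This is a minimal sketch that assumes the log format produced by the loop above (a date followed by a reading such as +39.0°C as the last field). The name summarize_templog is made up.

```shell
#!/bin/sh
# Summarize a temperature log: number of readings, hottest entry,
# and the last entry written before logging stopped.
# summarize_templog is a hypothetical helper name.
summarize_templog() {
    awk '
        # Ignore lines that do not end in a reading (e.g. sensors failed).
        $NF !~ /C$/ { next }
        {
            t = $NF + 0    # "+39.0°C" converts to the number 39
            if (n == 0 || t > max) { max = t; max_line = $0 }
            n++
            last = $0
        }
        END {
            if (n == 0) { print "no readings"; exit }
            printf "readings: %d\n", n
            printf "hottest: %s\n", max_line
            printf "last: %s\n", last
        }
    ' "$1"
}
```

For example, summarize_templog ~/tempwatch_node87 shows at a glance whether the last reading before a crash was also the hottest one.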

Commercial Temperature Monitoring Offerings

IPMI

IPMI stands for "Intelligent Platform Management Interface". It is a standard interface for monitoring and managing hardware (temperatures, fans, voltages and so on), independent of the operating system. IPMI can support SNMP (Simple Network Management Protocol) for querying and setting hardware states.

OpenIPMI is probably a good place to start looking for Linux IPMI functionality.