Chris X Edwards

With proper isolation precautions it can be fun to download and play with predatory viruses. I'm playing with Skype right now.
2017-09-17 07:36
When everyone can read and write--permissions of the Beast!
2017-09-15 10:52
To me the most important aspect of the Equifax breach is not just that data was stolen but that it could also presumably be subtly modified.
2017-09-14 13:12
Another fun Google Voice mistranscription: "Hi Kristen, my calling from Cobalt. Gimme a call..." etc. She actually said "from Google". Doh!
2017-09-14 09:57
Online shopping sites: if you inexcusably don't have larger images don't let me spend several seconds waiting to see what "enlarge" will do!
2017-09-11 18:48
Etc.
--------------------------

Simple Tool For Creating Training Image Sets

2017-09-01 10:01

Imagine you wanted to make a classifier that could reliably identify meerkat faces. You would need a lot of images of typical meerkat faces. Unfortunately most of the photos of meerkats you find are like this.

m1.jpg

Here’s another photo of some nice San Diego meerkats.

m2.jpg

Those images are going to be difficult to use to train a machine learning classifier to recognize meerkat faces.

I wrote a little Python program which uses OpenCV to allow a curator to quickly go through a set of stock images and manually extract useful features into nicely sized (generally square, as shown here) sub-images. This makes it relatively easy to pull out just the cars, or just the faces, or just the signs, etc. It’s not automatic, but it does reduce the donkey work to a minimum.

Here’s how it is used. If the previous images were m1.jpg and m2.jpg, the program would be run like this.

roi_sq_selector.py -p meerkat m1.jpg m2.jpg

Each of the input images is then presented in sequence. While an image is visible, you can click on it and the program will draw a 64x64 box and capture a sub-image centered on the click. Or you can press the mouse button down on the center of the feature you want, drag out to the edge of the box you want, and release the button. The box is then scaled to NxN (N=64 for this project) and saved in a uniquely numbered file with a prefix of meerkat.

boxes.jpg
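The box rule described above (a plain click gives the minimum 64x64 box; a drag sets the half-width) can be checked without any GUI. The sketch below mirrors the logic of `box_from_input()` in the listing further down, using the larger of the horizontal and vertical drag distances as the half-width. The name `square_box` and the parameter `n=64` are hypothetical choices for this illustration, not part of the original program.

```python
def square_box(center, corner, n=64):
    """Return the top-left and bottom-right corners of a square box.

    The box is centered on `center`. Its half-width is the larger of the
    horizontal and vertical distances to `corner`, but never less than
    n//2, so a plain click (corner == center) yields an n x n box.
    """
    cx, cy = center
    px, py = corner
    d = max(abs(cx - px), abs(cy - py), n // 2)
    return (cx - d, cy - d), (cx + d, cy + d)


# A plain click at (100, 100) -> ((68, 68), (132, 132)), a 64 pixel box.
print(square_box((100, 100), (100, 100)))
# Dragging out to (180, 130) -> ((20, 20), (180, 180)), a 160 pixel box.
print(square_box((100, 100), (180, 130)))
```

A box larger than 64 pixels is not a problem; the program scales every capture back down to the standard size before saving it.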

In about 11 seconds I converted the two photos I show above to these 7 standardized images ready for training.

meerkat-mont.jpg

I have used this technique on many different training projects. I have been able to quickly go through video frames and, with minimal tedium, extract thousands of regularized images suitable for classifier training.

Here’s the code.

#!/usr/bin/python
# roi_sq_selector.py
# Chris X Edwards - 2017-04-30
#
# usage: roi_sq_selector.py [-h] [-p PREFIX] file [file ...]
#
# positional arguments:
#   file                  One or more filenames.
#
# optional arguments:
#   -h, --help            show this help message and exit
#   -p PREFIX, --prefix PREFIX
#                         Output filename prefix
#
# * Press "c" to change to the next source image.
# * Press "r" to reset boxes drawn on image.
#
# Interactive way to clip out square regions of interest from images.
# Select the center of the ROI first. Then select one corner of the
# square box which should evenly contain it. If the box is not
# NxN it is resized to NxN.
#
# Modified from an idea by Adrian Rosebrock.
# http://www.pyimagesearch.com/2015/03/09/capturing-mouse-click-events-with-python-and-opencv/
import argparse
import cv2

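N= 64 # Output size (NxN pixels) and minimum box size.
iN= 0 # Output image number.
C= None # Center point (x,y)
P= None # Corner point (x,y)
cropping= False # In box selection mode awaiting 2nd point?

def box_from_input(c,p): # Box coords from center (c) and corner point (p).
    cx,cy= c ; px,py= p
    d= max(abs(cx-px),abs(cy-py)) # Half-width of the square box.
    if d < N//2: # A plain click (or tiny drag) gets the minimum NxN box.
        d= N//2
    return (cx-d,cy-d),(cx+d,cy+d)

def click_and_crop(event,x,y,flags,param):
    global C,P, cropping, iN
    if event == cv2.EVENT_LBUTTONDOWN: # Looking for center.
        C= (x,y)
        cropping= True
    elif event == cv2.EVENT_LBUTTONUP and cropping: # Looking for corner.
        P= (x,y)
        cropping= False
        b1,b2= box_from_input(C,P)
        cv2.rectangle(image, b1, b2, (255,0,0),2) # Show the box on screen.
        cv2.imshow("image",image)
        # Clip the box to the image. Negative slice indices count from the
        # end of the array, which usually yields an empty crop that makes
        # cv2.resize() fail. A box clipped at an edge is no longer square
        # and will be stretched by the resize.
        h,w= clone.shape[:2]
        x1,y1= max(b1[0],0),max(b1[1],0)
        x2,y2= min(b2[0],w),min(b2[1],h)
        if x2 <= x1 or y2 <= y1:
            print("Box is outside the image; nothing saved.")
            return
        roi= clone[y1:y2,x1:x2] # Crop from the clean copy, not the annotated one.
        roi= cv2.resize(roi,(N,N)) # Scale to the standard NxN size.
        iN+= 1
        ofn='%s%03d.png'%(A.prefix,iN)
        cv2.imwrite(ofn,roi)
        print("Saved image section: %s"%ofn)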
    # Something like this to make live drag boxes.
    #elif event == cv2.EVENT_MOUSEMOVE and cropping:
    #    sel_rect_p2= [(x, y)] # Another global.

ap= argparse.ArgumentParser()
ap.add_argument('-p','--prefix',required=False,help='Output filename prefix',default="roi")
ap.add_argument('images', metavar='file', type=str, nargs='+', help='One or more filenames.')
A= ap.parse_args()

for ifn in A.images:
    image= cv2.imread(ifn)
    if image is None: # Missing or unreadable file; skip it.
        print("Could not read image: %s"%ifn)
        continue
    clone= image.copy() # Pristine copy used for cropping and resets.
    cv2.namedWindow('image')
    cv2.setMouseCallback("image",click_and_crop)
    while True:
        cv2.imshow('image',image)
        key= cv2.waitKey(100) & 0xFF
        if key == ord('r'): # Reset if "r" is pressed.
            image= clone.copy()
        elif key == ord('c'): # Next image if "c" is pressed.
            break
    cv2.destroyAllWindows()
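One practical caveat: the output counter `iN` starts at zero on every run, so running the program twice with the same prefix silently overwrites earlier files such as meerkat001.png. Below is a minimal sketch of one way around that, assuming you would rather skip past existing files than overwrite them. `next_free_name` is a hypothetical helper, not part of the program above.

```python
import os


def next_free_name(prefix, start=1, ext=".png", directory="."):
    """Return (number, path) for the first '<prefix>NNN<ext>' file that
    does not already exist, counting up from `start`."""
    n = start
    while True:
        path = os.path.join(directory, "%s%03d%s" % (prefix, n, ext))
        if not os.path.exists(path):
            return n, path
        n += 1
```

Inside `click_and_crop()`, something like `iN, ofn = next_free_name(A.prefix, iN + 1)` could stand in for the `iN+= 1` and `ofn=` lines. The cost is one filesystem check per saved image, which is negligible at the pace a human clicks.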

The Paradox Of Immutable Custom Linux

2017-08-17 19:20

Live distributions of Linux that come ready to run on a single CD are incredibly useful. These days (and for the past 10 or so years) the smart way to use these systems is from a USB flash drive. The brilliant SystemRescueCd, for example, has been a real lifesaver for me and I carry it around everywhere when at work. The great thing about a live CD is that it can be booted on any computer, instantly turning that machine from a brain-dead Windows cesspit into a high functioning Linux system ready for professional use. You can troubleshoot networks and hardware or have a peek at what’s on the drives. When you shut down and remove the media, you can be very certain that no unauthorized changes were made to the normal system that lives on the computer’s permanent drives. Or you can make changes, fix broken things, reset passwords, etc. But you’re in control.

The downside of these systems is that they are what they are. If you download an ISO image for an Ubuntu live CD, you get what the Ubuntu people thought would be the best setup. For weird people like me, this is never satisfactory. Because of the extreme efficiency these systems require, it is difficult to make permanent changes to them. The normal way this is supported is with a concept called "persistence". This basically uses some kind of union file system overlay to keep track of the changes you’ve made to the underlying stock system. If you wanted to, say, remove LibreOffice from the Ubuntu image, you could do that in effect, but what would really happen is that a note would be filed saying, "The user deleted LibreOffice". When you looked for it subsequently, the system would show you what it thinks you want to see, that LibreOffice is gone, but in fact it would remain in the image untouched. This makes the persistence method slow and clumsy for all but the simplest of tasks.
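The "note" described above is what union filesystems such as overlayfs call a whiteout: deleting a file that lives in the read-only lower layer only records a marker in the writable upper layer. The toy model below is a hypothetical illustration of that idea using plain Python dictionaries, not how any real union filesystem is implemented; the class name, file paths, and contents are made up.

```python
class ToyUnionFS:
    """Toy union mount: a read-only lower layer (the stock image) plus a
    writable upper layer (the persistence layer) that records changes."""

    WHITEOUT = object()  # Marker meaning "deleted in the upper layer".

    def __init__(self, lower):
        self.lower = dict(lower)  # Stock image contents; never modified.
        self.upper = {}           # The user's changes.

    def write(self, path, data):
        self.upper[path] = data

    def delete(self, path):
        # The lower copy stays put; only a whiteout note is filed.
        self.upper[path] = self.WHITEOUT

    def read(self, path):
        if path in self.upper:
            if self.upper[path] is self.WHITEOUT:
                raise FileNotFoundError(path)
            return self.upper[path]
        if path in self.lower:
            return self.lower[path]
        raise FileNotFoundError(path)

    def listing(self):
        # What the user sees: lower layer overlaid by the upper layer.
        merged = dict(self.lower)
        merged.update(self.upper)
        return sorted(p for p, v in merged.items() if v is not self.WHITEOUT)

    def reboot_without_persistence(self):
        # Throwing away the upper layer restores the pristine image.
        self.upper.clear()
```

After `fs.delete("/usr/bin/libreoffice")`, the path disappears from `fs.listing()` but is still sitting in `fs.lower`, taking up space. That is the clumsiness described above, and `reboot_without_persistence()` is the flip side: with no upper layer kept, every boot starts from the untouched image.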

My goal was to take a live Debian installation, or some similar live installation, and truly modify it. This is not easy and it requires a separate system; it is effectively impossible to change a running system with itself. This is important because it implies a safety advantage. If you need to guarantee that a good system is still good after some potentially corrupting activity, this is the perfect solution. If, for example, you were researching some security problem and you found a hacker web site that claimed to have information, you might want to check out that site but be absolutely certain that your system would not be permanently corrupted by that activity. Since these live systems are practically immutable, simply rebooting them is all but guaranteed to reset everything to pristine condition. This can be useful for public terminals or kiosks too. I’m going to say that this method also makes the use case for Tripwire and intrusion detection systems much smaller.

Additionally, many of these systems have a toram option where the root file system is loaded into RAM before being mounted. This means that the boot media containing the OS can actually be removed. Therefore you can take a pristine known clean system, boot it, remove the boot media, let your adversary use the machine, and still be 100% sure that the next time you boot that system it will not be corrupt. Which is pretty cool. Brian Krebs has endorsed this concept for things like banking. (I agree though I would add the caution that you will lose your logs and HTTPS site preference history.)

How then can the base live CD system be changed so that when you boot it up, it is exactly the way you want it? I have written some technical notes describing the process in some detail which you can find here: http://xed.ch/help/live

This photo shows two systems booted with the resulting custom OS.

livelinux.jpg

There are several interesting things to note. First, the timestamps are different; this is because this distribution creates the user on the fly from a skeleton. Note that there are no desktop icons (which I can’t stand) and the alias v is operational in a terminal which has no menu bar. (I got used to that v with Slackware on my first Linux installation and can’t live without it now.) You can also see my happy face prompt indicating the last exit status. Needless to say, the Caps Lock key has been corrected. These systems are well and truly customized for me to use immediately.

In general a custom Linux is not especially novel. What’s unique here is that both of these machines booted from the same flash drive. And they’re both running live concurrently. The flash drive they were booted from was actually lying on my desk when this was taken. The laptop is kind of a funny case because it only has 1GB of RAM total (RAM use shown in top center display). To put the entire OS media in RAM and then run on that is really pushing it. It struggles, but it does work! This laptop is quite old and its hard drive had failed long ago. I just recently physically removed it to save some weight. You’re looking at a hardened laptop running with absolutely no permanent storage media of any kind. Maybe Edward Snowden can put that to use.

Nerd Price Index

2017-08-15 08:46

Since I build and buy serious computers for researchers, I feel like I have a rough idea about the costs of computing. A long time ago, say in the 1990s, whenever someone would ask me if they should buy some computer item, I would always say "no" unless it was critically needed. The reasoning was that if there was a way they could get by without it, the time they spent delaying would double the item’s performance and cut its price in half. It was always better to wait.

But for a while now, I’ve been feeling that this is just not true. I went ahead and plotted the numbers and, indeed, things are different today.

cpi.png

I took this data from the BLS Long-term price trends for computers, etc. It clearly shows that the value of waiting to buy tech gear has completely been nullified. Now, the trend looks like if you want some tech gear, you should buy it. You’ll enjoy a long operational life with it and it won’t be so dramatically made obsolete.
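The old "just wait" rule can be put in numbers. If a price index falls from I_start to I_end over t years, the average annual change is (I_end / I_start)^(1/t) - 1, and at that rate prices halve every ln(0.5) / ln((I_end / I_start)^(1/t)) years. The sketch below computes both. The function names are hypothetical, the example values are made up for illustration rather than taken from the BLS series, and it takes an index at face value without modeling how the BLS builds it.

```python
import math


def annual_change(index_start, index_end, years):
    """Average annual fractional change between two price-index values."""
    return (index_end / index_start) ** (1.0 / years) - 1.0


def halving_time(index_start, index_end, years):
    """Years for prices to halve at the average rate; inf if not falling."""
    factor = (index_end / index_start) ** (1.0 / years)
    if factor >= 1.0:
        return math.inf
    return math.log(0.5) / math.log(factor)
```

For example, an index that drops from 100 to 25 over 4 years gives `halving_time(100, 25, 4)` of 2.0 years (about a 29% drop per year), the kind of regime where waiting paid off. A flat or rising index gives `math.inf`, which is the sense in which the value of waiting has been nullified.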

I’ve seen this in searching for upgrades to my personal computers and cluster hardware I manage. The main computer I sit at today was purchased in 2008. From Craigslist, used. I may have paid $300 for it and perhaps today I could pay $150 for similarly performing (new) hardware. The reality is that power supplies and cases haven’t gotten that much more clever or efficient. The motherboards still need to hook everything up in a similar physical way that they did back in the 1990s. Storage is still descending in price per unit of capacity but I treat storage separately anyway. That leaves the real areas for improvement in the CPUs and RAM and I just don’t feel like we’ve seen revolutions there. GPUs have brought some new cleverness onto the scene for those lucky enough to be able to take advantage of them (gamers, machine learning, molecular dynamics). I think the "TV" line is about right for what I’m noticing in monitors as TVs become monitors.

I can’t explain the audio equipment’s weird 2006ish drop. The cameras and internet seem to be pretty mild baseline tech trends. The real surprise for me here was the amount people are spending on software. As someone who spends almost nothing on software this steady rise in software costs seems pretty strange. Maybe there are subtleties of the way the BLS calculates these CPI metrics (well, obviously there are).

I’ve been thinking about buying some new hardware. I feel like at this point the good advice is don’t rush your purchase and don’t delay it. Focus on what’s convenient to your agenda and the price will remain stable.

UPDATE 2017-08-15 20:06: Well, that’s damn strange. I was using the 2008 computer I described in this post today and the neighborhood power went out. (Getting kind of 3rd world here with respect to power reliability.) I was actually shopping on-line for a new set of monitors at that exact moment since one of mine has been giving me trouble for a while and I just conclusively determined it was not the graphics card or cable, etc. While the power was out, I went out and bought a new monitor. When I came back the power was on; I fired up my computer and it wouldn’t boot! I’m pretty sure it was a motherboard failure or low power condition. So the day I write about how long all my ancient gear has lasted, a bunch of it dies of old age. Amazing. At least I have a clear excuse to buy what I need. I’ll have to choose carefully; I could have this stuff even longer!

A Weird Glimpse Into Autonomous Vehicle Top Management - Part 2

2017-08-09 22:25

A few days ago I wrote a post about the extraordinary glimpse into the top level of autonomous vehicle management afforded by the recent deposition of Alphabet CEO, Larry Page. Today I was equally captivated by the recently released deposition of former Uber CEO, Travis Kalanick. You can read it yourself here but again I’ll give you what I thought were the highlights.

A quick recap: Uber seems to have hired Anthony Levandowski and a coherent team from Google but the transition vector from Google to independent company Otto to Uber is muddled and sprinkled with intrigue. Primarily at issue is Levandowski’s apparent possession of some kind of data archive of proprietary information from his former employer, the plaintiff.

To start with, Kalanick seems earnest in defending his engineering team, especially James Haslam, leader of Uber’s laser team, who never worked at Google. I found this convincing.

TK: To be accused of doing something that he didn’t do when he put in his own — his own mind, his own effort to make something he was proud of was — was an emotional thing for him, and I think for a lot of people.

Kalanick seems believably emphatic about the fact that he really didn’t want to find Google’s tech in his shop. He says this kind of stuff in several places.

Waymo Lawyer: What did you say to Mr. Levandowski on [the topic of bringing data from a previous employer]?

TK: I made it very clear to him that we — I made it very — well, the first question is, Did anything make it to Uber? And he made it very clear to me that absolutely nothing that he downloaded made it to Uber in any way. And the second part is, I made it very clear to him how important it was to me that that was the case and that we would look into everything, every server, every person at the company, to make sure that that was true.

Indeed, the lawyers are blurring this way more than Kalanick is.

TK: I remember being very clear with Mr. Levandowski and with others in the room my desire for Uber and the Court to be able to get to the bottom of this and get to the facts.

One thing I found interesting is the insight this testimony provided into culture and work/life balance. Most of it seems to apply only to the cadre of millionaires.

Waymo Lawyer: People work from home at Uber all the time, too, don’t they?

TK: Yes.

Waymo Lawyer: How common a practice is that, generally?

TK: More common than I would like.

The Waymo lawyer sets up Kalanick with a hypothetical situation where the roles are reversed and one of his engineers leaves with proprietary stuff to a competitor and yet still hassles Kalanick for a bonus.

TK: I would respond pretty seriously to that kind of discussion.

Then some more needling by Waymo. Kalanick zings right back.

TK: Look, I certainly wouldn’t wait a year to do something about it.

This seems to be an important point of substance in the case.

Waymo Lawyer: Fair to say you came away with the impression that he didn’t need to use those documents or didn’t disclose those documents in connection with getting his bonus?

TK: Correct.

Kalanick himself later coins the term "the bonus explanation" for the idea that the documents were taken in connection with Levandowski’s bonus. But I feel it was pretty well established that the documents were not used to pressure Google or to advance Uber. If there really was a sloppy trail of them on Levandowski’s part, nothing substantive came of it. Except maybe this lawsuit.

There’s an interesting episode discussed where Kalanick is forced to give up his phone for a couple of hours. No big deal, but consider that all his text messages were extracted as part of some court order. It’s fine to say that if you’ve done nothing wrong it won’t be a problem, but what if there was other private or sensitive communication on that phone? Eek.

Here is some insight into why Uber is interested in autonomous vehicles at all.

Waymo Lawyer: …AV technology represents an existential threat to Uber’s business model.

TK: I would say it differently. I would say that it is—in order for Uber to exist in the future, we will likely need to be a leader in the AV, autonomous vehicle, space.

Waymo Lawyer: And why is that?

TK: Because autonomous vehicles are going to be far safer than human-driven vehicles. And a service that’s very safe compared to human-driven vehicles is going to be one that consumers want. And it will also ultimately be far cheaper than a human-driven vehicle. And consumers that can get safer rides far cheaper are going to be the consumer—those consumers are going to go to the service that provides that. And if you don’t provide that, I don’t believe you’re going to be able to sustain your business.

Kalanick well understands the dearth of experienced people in this field.

Waymo Lawyer: Why are they in high demand?

TK: Because, like, there are only so many people who are really good at machine learning as it relates to perception software, and there’s a lot more demand for those people than there is people that can do it. And so the price goes up.

This line from some internal communication was very poignant for me. Remember, Levandowski is no idiot; he knows Google’s AV program as well as anyone.

Waymo Lawyer: Quote, Levandowski says that our biggest threat is Google, but also doesn’t have faith in Google pulling it off.

That’s quite an extraordinary revelation, that Levandowski does not have confidence in a successful outcome for Google’s AV initiative.

Another interesting bit of insight I discovered is that at one time (maybe not now) Uber was working on an in-house LIDAR.

And then there is this comment which I believe refers to a Google employee or board member who is on the board of Uber and resigning over potential conflicts because…

TK: He said that Google is intending to compete with Uber in the ridesharing space. And that the efforts were substantive enough and serious enough that he felt compelled to tell us that that was happening.

Interesting. Is Google just messing with Uber?

This testimony is quite different from Larry Page’s. Very little is redacted, nothing in the entire first half. Contra popular media portrayals, Kalanick seems very polite and reasonable. He says "please" and generally seems to show patience with the lawyers. He may be an insufferable upper class twit in normal contexts, but knowing him only from this deposition, I found him intelligent, sincere, well-spoken, and competent.

It’s interesting that in all these depositions, the deponent (which seems to be a real word) can basically get out of answering any question if the answer involved some interaction in which a lawyer was present. So if you’re an uber-rich guy like Travis or Larry, it seems you could always have one lurking. If I ever become an oligarch and need an effective lawyer, I’m hiring the intense Ms. Karen Dunn, Kalanick’s lawyer, who really put on an impressive show of how to stand her client’s legal ground.

There were several moments which were kind of funny.

TK: I mean, I don’t—I don’t know if it’s an Excel spreadsheet, but I recognize that it’s a piece of paper that has words on it.

And some funny irony with Google, the plaintiff.

Waymo Lawyer: Is there a application that [your assistant] uses?

TK: It’s called — it’s called Google Calendar. I’m sure your client would be happy about that.

Waymo Lawyer: Is there any particular software in your company for this or--

TK: My guess is, it’s all Google Docs.

Lawyer: Did you have any conversation during this break about your testimony with Counsel?

TK: I did not… I watched YouTube videos.

I felt like I learned a bit more from this testimony than the previous one. However, I still feel like much of the full story has yet to be told.

2017-08-15 UPDATE: El Reg just posted a ton of text messages between Kalanick and Levondowski. I found it a bit too voyeuristic and not enlightening enough to comment further on. But check it out if you’re into that.

"Autonomous" Vehicle Research

2017-08-08 09:24

I just saw this very funny article about the "autonomous" vehicle research being done by the university I work for.

carseatsuit.jpg

They are disguising people to look like car seats so that the car looks empty. In other words, they’re spoofing autonomous cars. While that is somewhat hilarious, I can see the merit of this. I’m not sure that cars will have the same form when they’re moving around pedestrians unoccupied, but whatever. It’s probably interesting to research related behavior and attitudes just because so much of the topic is unknown.

Update 2017-08-09: Is this becoming a weird fad?

--------------------------

For older posts and RSS feed see the blog archives.
Chris X Edwards © 1999-2017