This summer I’ve been taking some time off and working on a few side projects. The most interesting work has come from Udacity’s Self-Driving Car class, which has been a wonderful way to build on what I learned doing robotics at Duke (I graduated just a few months ago!).

Today I’m going to talk about the “advanced” lane following project. The goals Udacity set up were direct, but certainly open-ended.

Given a video:

Highlight the current lane the car is in,
Determine the position of the car in the lane,
Determine the lane’s radius of curvature.

These goals mirror some of the core information needed for Level 2 automation features such as lane centering, drift alerts, and highway lane following, all of which already exist in production cars. Lane tracking is also used at higher levels of autonomy, but there it is paired with far more information to make it robust to more situations. For this project, there is no LIDAR, no high-resolution maps with known lane information, no GPS, and no inertial data. All we are using is video.

If you’re looking for the code, head on over to the GitHub repository.

Summary & Results

Let’s start with a high-level summary of the project and some of the interesting results, then I’ll explain how I got there. The gif below shows how the system performs on a typical stretch of road:

But first, what is even going on here? Well, I won’t go into the details until later, but what you’re seeing is a diagnostic view of each stage of the lane tracking pipeline.

In the bottom panel you can see the view of the road from the dashboard of the car, with the lane highlighted and extending as far as the system is confident. On the left is an estimate of the car’s position in the lane and the road’s radius of curvature.
In the top left panel you can see an overhead view of the road, found by perspective warping the raw input video.
In the top center panel, you can see two things. The white blotches are pixels from the overhead video that the program thinks could be part of a lane line, based mostly on color. The program then refines those pixels by finding tight clusters of white pixels in horizontal strips along the image. The red windows highlight these clusters and have no smoothing applied, so I call them the “raw” lane detections. We can expect the red windows to jump around a lot and to be inaccurate at times, especially when there are several competing clusters of white pixels. Additionally, when no pixels are found in a search strip, its window is not displayed.
The top right panel shows the filtered lane windows, which take into account the position of the raw lane detections (red windows) over time and give us the most likely position of the lane given its history and our confidence in past measurements. This is done using Kalman filters combined with outlier rejection. When no raw window is detected, or the detection is not trusted, the corresponding green filtered window is “frozen” and its position does not change. The software shows this by fading the window’s color. If a window stays frozen for too long, it is dropped entirely and no longer displayed. The highlighted lane in the bottom panel is found by fitting a curve to the green filtered windows that have not been dropped and then warping it back to the dashboard view.

Findings

Robust to noise

The clip below shows that even when the tracker selects a lot of incorrect pixels (the white blotches in the center image) and the red raw lane detections are wrong, the final filtered lane detection stays accurate. In other words, the system keeps a strong fix on the lane lines despite the noise caused by the tree’s shadow.

Robust to dropout

The next clip shows how the system handles cases where the tracker doesn’t find any lane pixels, meaning some of the red raw windows have nowhere to go. However, we do know where the lane used to be, so the tracker keeps the green filtered windows in place for a limited number of frames. If it waits too long without a new measurement, it gets rid of the window until a new measurement arrives.

We also have the special case of dropout at the far end of the line, when the program doesn’t see the lane pixels extend all the way to the top of the image. Without these pixels, it can’t place any windows there, and without all of the windows it can’t be sure where the lane is along the entire length of the image. We don’t want to extrapolate, so the tracker only displays the lane up to the length of the shorter line. This is why the lane shrinks and grows in this clip as green filtered windows appear and disappear.

Technical breakdown

Alright, but what is really going on? No more handwaving; let’s dive into the details. If you want to read along with the code, check out the GitHub repository.

Getting an Overhead Perspective

We start off with a view from the dashboard of the car. While this is a familiar perspective, it would be far easier to work from above. Fortunately, we can do that by warping our perspective — something that OpenCV supports out of the box. But first, we need to undistort the images in order to remove any radial or tangential distortion caused by the lens.

Also, in order to keep my code organized, I combined all of the functions needed to deal with the camera into a single DashboardCamera class in find_lanes.py. Check it out if you’d like to see the code. I’m going to omit it here.

Calibration & Undistorting. Calibrating a camera works pretty much the same way wherever you do it. You take a bunch of images of a chessboard, use OpenCV functions such as cv2.findChessboardCorners() to find the pixel locations of the corners on the board, and pair them with the known real-world positions of those corners. Pass all of those to cv2.calibrateCamera() to get back the camera matrix and distortion coefficients, then pass those, along with your image of choice, to cv2.undistort() to undistort the image.

Perspective Warping. Once you have an undistorted image, making it appear as if it were taken from a different perspective is even easier. With cv2.getPerspectiveTransform() you pick four points in the image and the new positions you would like to warp them to. In this case, I want to warp the lane lines so that they are parallel (at least on straight roads), rather than leaning inwards towards the vanishing point. Selecting points from a straight section of highway, I defined a warp with cv2.getPerspectiveTransform() that moves the lanes so that the top and bottom of the lines are aligned vertically. This transform can then be applied with cv2.warpPerspective().

Selecting Lane Line Pixels

Once we have an overhead image, the next step in the pipeline is to select pixels that we think are on the lane line. I decided to do this by:

Converting the image to another color space.
Selecting a specific color channel from that color space.
Normalizing the image using CLAHE (Contrast Limited Adaptive Histogram Equalization), which is particularly helpful because it normalizes different segments of the image separately, ensuring that there is strong local contrast and correction for shadows.
Creating a binary image of the lane pixels by only selecting for pixels above a certain intensity.

These steps are done once for each chosen color channel, the results of which are added into a single image. I call this final image the “score” image, as the intensity is highest where the color channels agree on lane pixel locations. To tune these parameters, I created a Jupyter notebook threshold_tests.ipynb that let me run the pixel scoring procedure on a series of test images.

After tuning, I chose to use the LAB B, HSV value, and HLS lightness channels as they seemed the best at finding either the yellow or white lane lines with as little noise as possible. The code for doing all of this is below:

[source]

One important thing to realize about this implementation is that pixels are chosen individually based on their intensities alone, regardless of the pixels or environment around them.

I did try a few different gradient-based methods, which would introduce some local context, but decided they were not reliable enough. In the future I would probably use more advanced methods to select lane pixels or regions, such as neural networks. A convnet would be able to account for the context surrounding each pixel, would not require hand-tuning color spaces, and would learn its own features from data. Of course, there would be the extra work of building a training dataset of lane-line vs. non-lane-line examples.

What is a lane line anyway?

Before going further, it is important to think more about what we consider a lane line to be, what properties we can use to identify one, and what assumptions we are making.

From our pixel selection strategy, we’ve already decided that a lane line has a certain color or appearance at every pixel that makes it up. From our goals, we’ve also decided on a global property of a lane line: it generally follows the shape of a 2nd order polynomial, and we would like to know exactly what that polynomial is. We also expect the curve to be primarily vertical, the lane lines never to cross each other, and the left lane line to start left of center and the right one to start right of center. All of this may seem obvious, but it is still important.

PS: I also expected lane lines to be parallel and share the same curvature, something we could take advantage of by fitting them as two datasets with common parameters. I actually tried this, and you can find the method in my older commits, but I found that this property did not hold up in practice. While the perspective transform is supposed to remove the one-point perspective, it is only an approximation that assumes a lot about the environment, such as a perfectly flat road. In the overhead view for the challenge video, you can see the lane lines still slope towards each other rather than staying parallel. This leftover perspective breaks the simultaneous curve fit, so it is safer to fit the lines separately.
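For illustration, one way to fit two lines with shared curvature is a single least-squares problem where the y² and y coefficients are common and each line gets its own offset. This is a hypothetical NumPy sketch of the idea, not the method from the older commits.

```python
import numpy as np


def joint_fit(y_left, x_left, y_right, x_right):
    """Least-squares fit of x = a*y**2 + b*y + c with a and b shared by both lines
    and a separate offset c per line. Returns (a, b, c_left, c_right)."""
    yl, xl, yr, xr = (np.asarray(v, dtype=float) for v in (y_left, x_left, y_right, x_right))
    # Design-matrix columns: y^2, y, indicator(left line), indicator(right line)
    rows_left = np.column_stack([yl**2, yl, np.ones_like(yl), np.zeros_like(yl)])
    rows_right = np.column_stack([yr**2, yr, np.zeros_like(yr), np.ones_like(yr)])
    design = np.vstack([rows_left, rows_right])
    targets = np.concatenate([xl, xr])
    (a, b, c_left, c_right), *_ = np.linalg.lstsq(design, targets, rcond=None)
    return a, b, c_left, c_right
```

If the overhead lines converge, no single (a, b) pair fits both, which is why separate per-line fits (for example, one np.polyfit(y, x, 2) call per line) are the safer choice.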

These properties alone might seem like enough to find and define a lane line. Why not simply run a curve fit on the lane pixels the tracker selected in the last step? Boom. Done. But of course, this method falls apart entirely when (as expected) the system selects incorrect pixels, introducing noise. Just look at the score image below: how would you fit that?

What about RANSAC? Some of you might be thinking of a more robust method like RANSAC, a well-known curve fitting method that is robust to outliers. RANSAC is great, but after trying it out I found it also had trouble in some situations (not to mention it was very slow). For example, in the right lane of the image above there is very little signal and a lot of noise, so a curve bending from the middle of the image to the top right collects about as many inliers as a vertical fit. RANSAC can only be pushed so far.
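For reference, a plain RANSAC polynomial fit looks roughly like the NumPy sketch below; the iteration count and inlier tolerance are hypothetical. It shows where the weakness comes from: each hypothesis is judged only by its inlier count, so two very different curves with similar counts look equally good.

```python
import numpy as np


def ransac_polyfit(y, x, deg=2, n_iter=200, inlier_tol=10.0, seed=0):
    """Fit x = poly(y) robustly: fit many minimal random samples, keep the
    hypothesis with the most inliers, then refit on those inliers.
    Returns (coefficients, inlier_mask)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(y), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(y), size=deg + 1, replace=False)  # minimal sample
        coeffs = np.polyfit(y[sample], x[sample], deg)
        inliers = np.abs(np.polyval(coeffs, y) - x) < inlier_tol
        if inliers.sum() > best_inliers.sum():  # score = inlier count only
            best_inliers = inliers
    return np.polyfit(y[best_inliers], x[best_inliers], deg), best_inliers
```

On clean data with a few gross outliers this works well; with sparse signal and dense noise, the inlier counts of competing curves become too close to separate reliably.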

Accounting for motion is key

The reason these curve fitting methods fail is that they do not account for time or motion. After all, I expect a lane line to stay in roughly the same position from frame to frame, and when it starts moving to the left or right, I expect it to keep moving in that direction. Above all, I never expect a lane line to suddenly jump to a new position. Thus we should not only track the position of the lane in each frame, we should also track the velocity of the lane line over time. This insight is essential to tracking lane lines robustly despite noise and dropout.

My Lane Line Model

Having considered both the shape and movement of a lane line, we are ready to create a model for describing it.

We break the lane line down into a series of nine points along the y-axis. Keeping their y-positions fixed, the points are expected to slide along the x-axis with some velocity. These points are tracked individually, since different parts of the lane line move at different rates, although we do expect consecutive points to be near each other on the x-axis. Finally, given these nine points, the lane line can be fully described by a curve fit through them.

The tracker then robustly filters the movement of these points along the x-axis using a 1D Kalman filter for each point. Without going into too much detail, a Kalman filter is great in this situation for three reasons. First, we don’t have to measure velocity by hand; the Kalman filter infers velocity on its own, given how it is set up. Second, the Kalman filter accounts for motion: it smooths the positions I give it in a way that keeps velocity consistent and prevents rapid changes in velocity. Finally, a Kalman filter can determine whether a measurement does not fit the model it is set up for (i.e. a point moving left and right with some stable velocity). For example, if we were to hand the filter a new measurement that is waaaaaay far from the point’s last position, the Kalman filter can recognize that the measurement is nowhere near where the model predicted the point would be next, even with generous confidence bounds. In other words, based on the past position and velocity of that point, this new position measurement is a total outlier. In this case, the program can reject that outlier entirely.
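To make those three properties concrete, here is a minimal constant-velocity Kalman filter for one point’s x-position, written in plain NumPy. It is an illustration rather than the project’s code: the class name and noise values are hypothetical, and the project may use a library filter instead.

```python
import numpy as np


class PointKalman1D:
    """Constant-velocity Kalman filter for the x-position of one lane point.

    State is [x, vx] (pixels, pixels per frame); only x is measured.
    The noise settings are hypothetical tuning values.
    """

    def __init__(self, x0, meas_var=25.0, accel_var=1.0, dt=1.0):
        self.x = np.array([float(x0), 0.0])          # [position, velocity]
        self.P = np.diag([meas_var, 100.0])          # state covariance
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # x += vx * dt
        self.H = np.array([[1.0, 0.0]])              # observe position only
        self.R = np.array([[meas_var]])              # measurement noise
        # Process noise for a random (white-noise) acceleration model.
        self.Q = accel_var * np.array([[dt**4 / 4, dt**3 / 2],
                                       [dt**3 / 2, dt**2]])

    def predict(self):
        """Project the state one frame ahead; velocity is inferred, never measured."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def _innovation(self, z):
        S = (self.H @ self.P @ self.H.T + self.R)[0, 0]  # predicted measurement variance
        return z - self.x[0], S

    def log_likelihood(self, z):
        """Gaussian log-density of measurement z given the current prediction.
        Very negative values mean z is far from where the model expects the point."""
        y, S = self._innovation(z)
        return -0.5 * (np.log(2.0 * np.pi * S) + y * y / S)

    def update(self, z):
        """Blend measurement z into the state estimate."""
        y, S = self._innovation(z)
        K = (self.P @ self.H.T)[:, 0] / S             # Kalman gain, shape (2,)
        self.x = self.x + K * y
        self.P = self.P - np.outer(K, self.H @ self.P)
```

Comparing log_likelihood() against a cut-off is one way to express the outlier test; the cut-off itself would be a tuning choice.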

Measuring & Filtering with Window Search

To implement this model, the program needs to measure the position of each point on the line and then pass it through the filter. It tracks these points using sliding windows, one for each point. For example, in the image below, you can see that each lane has nine windows. The left image shows the measurements, while the right shows the filtered positions that will later be curve fit.

To achieve this behavior, the program locates a cluster of pixels that could be part of the lane line, rejects that measurement if it thinks the cluster is an outlier, and filters the measurement if it is not.

Specifically, for each of the nine windows along the lane line, starting at the bottom of the image, the program will:

Scan the window across the image (again keeping its y-position fixed) and find the x-position where that window covers the most pixels.
- If the window does not find any signal in its search region, it is marked as undetected.
This x-axis position is then checked by a series of outlier detectors:
- If the Kalman filter reports a very low log likelihood for this measurement, it is considered an outlier.
- If the window contains less than 60% of the pixels in the search area, it is also considered an outlier.
If something was detected and the measurement is not an outlier, the x-position is passed through the current window’s Kalman filter and its position is updated.
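The scan and outlier checks listed above can be sketched as follows. Only the 60% rule comes from the post; the log-likelihood cut-off and function names are hypothetical, and the filter is passed in as any callable that scores an x-position.

```python
import numpy as np

MIN_FRACTION = 0.6          # from the post: window must hold at least 60% of the search-area pixels
MIN_LOG_LIKELIHOOD = -50.0  # hypothetical cut-off for the Kalman filter check


def find_window_center(strip, search_lo, search_hi, width):
    """Slide a window `width` columns wide across strip[:, search_lo:search_hi].

    Returns (x_center, fraction): the centre column of the placement covering the
    most pixels, and the share of the search area's pixels it covers.
    Returns (None, 0.0) when the search area holds no signal.
    """
    counts = strip[:, search_lo:search_hi].sum(axis=0).astype(float)
    total = counts.sum()
    if total == 0 or len(counts) < width:
        return None, 0.0
    window_sums = np.convolve(counts, np.ones(width), mode="valid")  # pixels per placement
    best = int(np.argmax(window_sums))
    return search_lo + best + (width - 1) / 2.0, window_sums[best] / total


def measure_window(strip, search_lo, search_hi, width, log_likelihood):
    """Run the detection and outlier checks for one window.

    Returns (x, status) with status in {"ok", "undetected", "outlier"}.
    """
    x, fraction = find_window_center(strip, search_lo, search_hi, width)
    if x is None:
        return None, "undetected"
    if log_likelihood(x) < MIN_LOG_LIKELIHOOD or fraction < MIN_FRACTION:
        return None, "outlier"
    return x, "ok"
```

The fraction check catches the case where two similar clusters compete inside one search area: neither placement can claim most of the pixels, so the measurement is not trusted.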

Each consecutive window is constrained to an x-axis search region centered on the filtered position of the previous window. This means that filtering not only improves the accuracy of the filtered windows, it also keeps the raw windows within the right search region. As a result, the raw windows are actually less error-prone than in a system without filtering, where errors at the bottom of the image could propagate to the windows at the top. At times, the search region is also truncated so that it does not overlap the search region of the windows in the opposite lane, which prevents the lane detections from crossing over each other.
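A small sketch of how such a search region could be computed. The margin is a hypothetical tuning value, and the truncation rule (clip at the edge of the opposite lane’s range) is one possible reading of the description above, not necessarily the project’s exact logic.

```python
def search_region(center_x, margin, img_width, opposite=None, is_left=True):
    """x-range [lo, hi) to scan for the next window up the image.

    Centred on the previous window's filtered position and clipped to the image.
    If the opposite lane's search range `opposite` = (lo, hi) is given, the region
    is truncated so the two ranges cannot overlap.
    """
    lo = max(0, int(round(center_x - margin)))
    hi = min(img_width, int(round(center_x + margin)))
    if opposite is not None:
        opp_lo, opp_hi = opposite
        if is_left:
            hi = min(hi, opp_lo)  # left lane may not reach into the right lane's range
        else:
            lo = max(lo, opp_hi)  # right lane may not reach into the left lane's range
    return lo, max(lo, hi)
```

Walking up the image, each call would take the filtered x of the window just below as its center, which is what stops an error at the bottom from dragging the whole line off course.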

This next part is really important. When a window is undetected or considered an outlier, its filtered position does not change; it is frozen. This models our expectation that the lane line stays in the same place for a short time even when the program cannot see it. However, if a window stays frozen for too long, the program drops it entirely and excludes it from curve fitting. In all visualizations, the green windows fade when frozen and disappear once they are dropped. All of this information is tracked in the Window object.
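The freeze-and-drop bookkeeping can be sketched with a small state class. The class name, the frame limit, and the default pass-through update are hypothetical; in the tracker, the update would be the window’s Kalman filter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WindowTrack:
    """Freeze/drop bookkeeping for one filtered window (hypothetical names)."""
    x_filtered: float
    max_frozen_frames: int = 10  # hypothetical: how long a frozen window is kept
    frozen_frames: int = 0

    def step(self, measurement: Optional[float],
             filter_update: Callable[[float, float], float] = lambda prev, z: z) -> None:
        """Advance one frame. `measurement` is None if undetected or an outlier."""
        if measurement is None:
            self.frozen_frames += 1  # keep the last filtered position unchanged
        else:
            self.frozen_frames = 0
            self.x_filtered = filter_update(self.x_filtered, measurement)

    @property
    def frozen(self) -> bool:
        """Drawn faded in the visualization."""
        return self.frozen_frames > 0

    @property
    def dropped(self) -> bool:
        """Hidden and excluded from the curve fit."""
        return self.frozen_frames > self.max_frozen_frames
```

A single trusted measurement resets the counter, so a window that was dropped reappears as soon as the lane is found again.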

The bulk of the code that performs these operations can be found in joint_sliding_window_update() and Window.update() below.

[source]

Fitting the Windows

Finally, we now have a reliable filtered position of the points along the lane line in the form of the green filtered windows. We also know which points are unreliable and should not be used (i.e. the dropped windows).

So, all the tracker does now is apply a polynomial fit along the filtered windows that have not been dropped. From that, it can calculate the position of the car in the lane and work out the radius of curvature of the lane.
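For readers who want the "standard stuff" spelled out, here is a minimal sketch, not LaneFinder.fit_lanes() or LaneFinder.calc_curvature(). It uses the usual radius formula for x = Ay² + By + C, namely R = (1 + (2Ay + B)²)^(3/2) / |2A|, and assumes the car sits at the horizontal center of the overhead image. The pixel-to-metre scales are hypothetical.

```python
import numpy as np

# Hypothetical overhead-view scales; real values depend on the perspective warp.
YM_PER_PX = 30 / 720    # metres per pixel along the road (y)
XM_PER_PX = 3.7 / 700   # metres per pixel across the road (x)


def fit_line(window_ys, window_xs):
    """Fit x = A*y**2 + B*y + C, in metres, through the non-dropped window centres."""
    return np.polyfit(np.asarray(window_ys, dtype=float) * YM_PER_PX,
                      np.asarray(window_xs, dtype=float) * XM_PER_PX, 2)


def radius_of_curvature(coeffs, y_m):
    """R = (1 + (2*A*y + B)**2) ** 1.5 / |2*A| for x = A*y**2 + B*y + C."""
    A, B, _ = coeffs
    if A == 0:
        return float("inf")  # a straight line has infinite radius
    return (1 + (2 * A * y_m + B) ** 2) ** 1.5 / abs(2 * A)


def offset_from_center(left_coeffs, right_coeffs, img_width_px, y_m):
    """Metres between the image centre (assumed to be the car's centre) and the
    lane centre at height y_m; positive means the car sits right of centre."""
    lane_center = (np.polyval(left_coeffs, y_m) + np.polyval(right_coeffs, y_m)) / 2
    return img_width_px / 2 * XM_PER_PX - lane_center
```

Evaluating at the bottom of the overhead image (y_m = image height × YM_PER_PX) gives the curvature and offset closest to the car.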

Code for doing this can be found in the second half of the LaneFinder.find_lines() function, which makes calls to LaneFinder.fit_lanes() and LaneFinder.calc_curvature(). This is pretty standard stuff, so I’m keeping it out of the post.

With that, we are done. The tracker has found the lanes, found the position of the car in the lane, and found the curvature of the road. There is a tiny bit of housekeeping that remains though, mostly dealing with visualizing all this data in a nice clean way. To learn about that, you’ll have to read the source code.

Robust Lane Tracking
