Teensy 3.2 + Ethernet

Teensy 3.2 with a (standard Arduino) W5500 ethernet shield

As a part of my project to scale beyond the ~4000 pixels that Teensy 3.2 supports at 60Hz, I’m looking into ways of using multiple Teensy 3.2s and farming out pixels to each of them. The idea would be to build a simple “Branch Controller” consisting of a Teensy 3.2, a WIZnet W5500 Ethernet adapter, and an OctoWS2811 adapter.

The first part of the project was just making sure that I could use the W5500 with Teensy 3.2. I had bought the Arduino Shield version of the W5500 ($23), but the shield is large and clunky and includes an SD card slot I don’t need. The WIZ850io ($20) is just the Ethernet adapter in a much more compact form, so I switched to that:

Teensy 3.2 wired up to a WIZnet WIZ850io 10/100 BASE-T Ethernet adapter

To get the Ethernet to work with the Teensy 3.2, all I had to do was make six connections (see the picture above):

| Teensy 3.2 Pins (pinout) | W5500 Ethernet Shield Pins | WIZ850io Pins |
|---|---|---|
| 10 – SCS | D10 | SCNn |
| 11 – MOSI | D11 | MOSI |
| 12 – MISO | D12 | MISO |
| 13 – SCLK | D13 | SCLK |
| GND | GND | GND |
| Vin (3.6 to 6.0 Volts) | 5V | |
| 3.3V | | 3.3V |

Then I connected the W5500 to my local area network and plugged the Teensy into the computer. From the Arduino IDE, I loaded the Ethernet > Web Server example, and uncommented the line Ethernet.init(10) in setup(). After running this I had a web server running and everything seemed to be working perfectly.

Next step: I want to see how fast I can push data down to the Teensy. The W5500 is a 10/100 Ethernet chip, but the SPI link it uses to talk to the Teensy will slow that down a lot.

The first experiment I did was just a minimal web server running on the Teensy. For the client, I ran node.js calling request-promise in a loop. This was able to make about 100 HTTP requests per second, which could provide a decent frame rate, but there is no data payload yet.

Next I added some payload to try to figure out the bandwidth of a Teensy. The first experiment I did got about 100 kilobytes/sec. This could handle pixel-level data for:

555 pixels at 60 fps
1110 pixels at 30 fps
4416 pixels at 7.5 fps
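The arithmetic behind these figures can be sketched in a few lines. This assumes 3 bytes of raw RGB data per pixel with no protocol overhead, and treats "100 kilobytes/sec" as 100,000 bytes per second; the function names are made up for illustration.

```cpp
#include <cstdint>

// Bytes of raw color data per pixel: 8 bits each of red, green, and blue.
constexpr std::uint32_t kBytesPerPixel = 3;

// How many pixels can be fully refreshed `fps` times per second when the
// link delivers `bytesPerSecond` of payload (rounded down).
constexpr std::uint32_t pixelsAtFrameRate(std::uint32_t bytesPerSecond, double fps) {
    return static_cast<std::uint32_t>(bytesPerSecond / (kBytesPerPixel * fps));
}

// Frame rate achievable for a fixed pixel count at a given throughput.
constexpr double frameRateForPixels(std::uint32_t bytesPerSecond, std::uint32_t pixels) {
    return static_cast<double>(bytesPerSecond) / (kBytesPerPixel * pixels);
}
```

For example, `pixelsAtFrameRate(100000, 60)` gives 555, and `frameRateForPixels(100000, 4416)` comes out at roughly 7.5.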

That’s pretty disappointing; it almost defeats the purpose of using the OctoWS2811 to drive 8 separate strands. According to Paul Stoffregen’s benchmark, I ought to be able to get about 958 kilobytes per second. Which would be enough for my needs! So there was obviously some kind of optimization I was missing.

Looking closely at the sample web server code I was using, I noticed that it was set up to read one byte at a time, no matter how many bytes were available. A modification to the Teensy code to read blocks of bytes into a 256 byte buffer got much better results; I was able to get up to 500 kilobytes per second! Which translates to:

2777 pixels at 60 fps
4416 pixels at 37 fps
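Here is a minimal sketch of the block-read idea, not the actual branchController code. It assumes a client object with Arduino-style `available()` and `read(buffer, size)` methods, which `EthernetClient` provides. `FakeClient` is a hypothetical desktop stand-in so the loop can be tried without hardware, and `drainInBlocks` is a made-up name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical desktop stand-in for EthernetClient, only for trying this out.
struct FakeClient {
    std::vector<std::uint8_t> data;
    std::size_t pos = 0;
    int readCalls = 0;

    int available() const { return static_cast<int>(data.size() - pos); }

    int read(std::uint8_t* buf, std::size_t size) {
        ++readCalls;
        const std::size_t n = std::min(size, data.size() - pos);
        std::copy_n(data.begin() + static_cast<std::ptrdiff_t>(pos), n, buf);
        pos += n;
        return static_cast<int>(n);
    }
};

// Reads everything currently available in chunks of up to 256 bytes,
// handing each chunk to `sink(bytes, count)`. Returns total bytes read.
template <typename ClientT, typename SinkT>
std::size_t drainInBlocks(ClientT& client, SinkT&& sink) {
    std::uint8_t buffer[256];
    std::size_t total = 0;
    while (client.available() > 0) {
        const int n = client.read(buffer, sizeof(buffer));
        if (n <= 0) break;  // nothing read or an error: stop for now
        sink(buffer, static_cast<std::size_t>(n));
        total += static_cast<std::size_t>(n);
    }
    return total;
}
```

With 1000 bytes waiting, this makes four `read` calls instead of 1000 single-byte reads, which is where the speedup comes from.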

This is a significant improvement and probably adequate, but I wasn’t happy.

I wondered if the fact that the test computer was on WiFi instead of a wired LAN could be the bottleneck. Sure enough, moving my computer to LAN dramatically improved the throughput, and I got 1,125,093 bytes across in a second, which was faster than even Paul’s benchmark. This would translate to a frame rate of 85 frames per second with 4416 pixels, well above the 60 fps limit of the WS2812b-type LED strips!

Finally, I tried combining the Ethernet and LED code into one script (branchController) to see how the timing was. The ideal architecture, I thought, would be to open a client connection to the branchController every time you have another frame to send. The trouble with this method is that there is too much overhead in opening each connection. I only got about 20 frames per second doing it this way. Another option is to send four frames at a time (connecting 15 times per second)… this worked fine (but as the code is structured right now, probably freezes LED refresh unnecessarily while the TCP connection is happening).

What I think I’m really going to need to do is open a connection and keep it open, then stuff down one frame of data whenever I have one. That will require changing the protocol a bit so that the connection is expected to stay open, which will require rejiggering the code a bit, but I’m pretty confident that is going to work and have the performance that I expect.
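One way the keep-it-open approach could look on the receiving side is sketched below. It assumes the simplest possible protocol: every frame is exactly `pixelCount * 3` bytes of RGB data, back to back, with no header. That framing is an assumption made for illustration, not the protocol the branchController actually uses, and `FrameAssembler` is a made-up name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Collects bytes from a long-lived stream into fixed-size frames.
// Assumes each frame is exactly pixelCount * 3 bytes (RGB), no header.
class FrameAssembler {
public:
    explicit FrameAssembler(std::size_t pixelCount)
        : frame_(pixelCount * 3), filled_(0) {}

    // Feeds `len` bytes from the stream. Calls onFrame(bytes, size) once for
    // each frame completed; returns how many frames this call completed.
    template <typename OnFrame>
    std::size_t feed(const std::uint8_t* data, std::size_t len, OnFrame&& onFrame) {
        std::size_t completed = 0;
        if (frame_.empty()) return completed;  // zero pixels: nothing to assemble
        while (len > 0) {
            const std::size_t want = frame_.size() - filled_;
            const std::size_t take = std::min(len, want);
            std::copy(data, data + take,
                      frame_.begin() + static_cast<std::ptrdiff_t>(filled_));
            filled_ += take;
            data += take;
            len -= take;
            if (filled_ == frame_.size()) {
                onFrame(frame_.data(), frame_.size());
                filled_ = 0;  // start collecting the next frame
                ++completed;
            }
        }
        return completed;
    }

private:
    std::vector<std::uint8_t> frame_;
    std::size_t filled_;
};
```

On the Teensy, the `onFrame` callback is where the bytes would be copied into the LED buffer and shown. Because TCP can split or merge writes arbitrarily, the assembler handles a frame arriving in pieces as well as several frames arriving in one read.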

Interlude: BeagleBones

BeagleBone Black

The Story So Far

In investigating how to drive enormous numbers of WS2812b LEDs from Arduino-style controllers at 60 fps or faster, I found that a pretty solid option is the Teensy 3.2 with the OctoWS2811 adapter board which can drive up to 4400 pixels. But what if you need more pixels? A common approach seems to be using a TCP/IP network with CAT-5 cable to connect a bunch of controllers, then using a central PC to coordinate everything.

I was about to start working on this when I thought of maybe using a Linux-style development platform (like Raspberry Pi or BeagleBone) instead of Arduino. Let’s look at some specs:

| | Teensy 3.2 | Teensy 4.0 | BeagleBone Black |
|---|---|---|---|
| CoreMark Benchmark Speed | 126 | 2314 | 2497 |
| RAM | 64K | 1024K | 512M |
| Flash | 256K | 2048K | 4GB |
| Price | $19.80 | $19.95 | $62.38 |

The neat thing about the BeagleBone is that it has two PRUs. Those are tiny little 32-bit computers running at 200MHz which have access to 16 of the output pins. You can leave the PRUs in charge of sending output to the LED strips, which frees up 100% of the BeagleBone CPU for your own image processing. There’s also an insane amount of flash memory for storing images, animations, and even videos. There’s onboard ethernet so you don’t have to mess around with wiring up a W5500 to your controller.

Still, there are some disadvantages:

  • You have to wait about 10 seconds for Linux to boot up when you power up
  • Since the PRUs can only access 16 output pins, you can’t really drive insane numbers of pixels from a single BeagleBone, so you still need some kind of distribution protocol
  • BeagleBone doesn’t have FastLED and not a lot of people are using it for addressable pixels, so you’ll have to do a lot more work trying to drive pixels than you would on the Arduino-type controllers
  • And, it’s more expensive.

One obvious idea is to use BeagleBones at the center of your architecture, sending commands over Ethernet to an army of Teensy 3.2s which serve as low-cost WS2812b drivers. You can even get industrial-strength BeagleBones like this one which support Gigabit ethernet in case the 10/100 ethernet doesn’t pump enough pixels for your design.

So now the idea would be that you can run arbitrary Linux software at the center of your architecture using tons of CPU, RAM, Flash, and even all kinds of cool 3D accelerators and stuff that are built into the BeagleBone Black. We would then design a little board for the branch controllers with Teensy 3.2, W5500 ethernet, level shifters and resistors, and use one of those for each 8 WS2812b strips.

A fancier version of this board could also include a power distribution bus for the LEDs themselves. If you were driving 4400 LEDs you would theoretically need 242 amps (1210 watts) so if you were thinking “PoE” please stop thinking that.
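The 242-amp figure implies about 55 mA per pixel at full white on 5V strips (242 A / 4400). Below is a minimal sketch of that budget, treating the 55 mA as an assumption read off the numbers above rather than a measurement. The function names are made up for illustration.

```cpp
// Assumed worst-case draw per pixel (all three LEDs at full brightness),
// inferred from 242 A / 4400 pixels. Real strips vary.
constexpr double kAmpsPerPixelFullWhite = 0.055;
constexpr double kStripVolts = 5.0;

// Peak current and power for a run of pixels at full white.
constexpr double peakAmps(unsigned pixels) {
    return pixels * kAmpsPerPixelFullWhite;
}
constexpr double peakWatts(unsigned pixels) {
    return peakAmps(pixels) * kStripVolts;
}

// How many full-white pixels a given power budget could feed,
// ignoring regulator and wiring losses.
constexpr unsigned pixelsWithinBudget(double watts) {
    return static_cast<unsigned>(watts / (kAmpsPerPixelFullWhite * kStripVolts));
}
```

For scale, a PoE+ (802.3at) device can draw at most 25.5 W, so `pixelsWithinBudget(25.5)` lands under a hundred pixels, nowhere near the thousands each branch controller is meant to drive.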

So you need more than 4000 addressable LEDs

Maybe you want to build something for a big thing in the desert. And the desert is wide and large and your thing is going to be really, really, big. For example, the Tree of Ténéré by Zachary Smith and team has 175,000 LEDs.

A photograph of the multi-colored Tree of Tenere at night; each leaf has 7 LEDs.
Photo by Duncan Rawlinson under CC Attribution / Non Commercial License
duncan.co

That is significantly higher than the 4416 LEDs that you can drive off of a single Teensy 3.2 with an OctoWS2811 adapter and ultimately you’re going to need some kind of central controller sending signals out to multiple branch controllers that each handle a set of LEDs.

There are a bunch of ways to think about how to do this, but a good approach might be just using simple Ethernet. That way you can use a lot of off-the-shelf equipment. For example, the popular WIZnet W5500 can provide Ethernet connectivity for the branch controllers. Indeed the Tree of Ténéré people apparently used custom boards with a W5500 and a cheapo ESP32 controller. Now you can just use generic CAT5 cables and any kind of Ethernet hub you want. As the main controller, you have a single PC. It blasts pixel data out to all the branch controllers.

There are a couple of interesting design considerations:

Do you want to use Power-over-Ethernet?

It seems like it might simplify wiring a bit to deliver some power over the CAT-5 cables. The trouble is that this would not supply nearly enough power to drive the number of LEDs that each branch controller is going to drive, so you still need a separate power distribution system for the LEDs. Given that you are already distributing power for the LEDs, it doesn’t simplify the wiring as much as you might think. It seems like what the Tree people did was distribute 12V throughout the project and then had a simple low drop regulator on their custom board to drop 12V to 3.3V.

Are you sending every single pixel?

If you are really using 175,000 pixels with 24 bits of data at 60Hz, that’s about 252Mbps. After adding the overhead of ethernet with real protocols, you might be able to crank this out using Gigabit ethernet but the W5500 is apparently limited to 100Mbps (100baseT). Unless you want to get really complicated and run multiple ethernet segments, you’re going to have to take shortcuts.

Luckily, we have a pretty good computer sitting at each of the branches, so there are a lot of ways to do that. We can send a packet down to each branch controller containing arbitrary parameters and let the controller decide how to map that onto its pixels.

For example, on the Tree project, which had 7 LEDs per leaf, you might just decide it’s OK for each leaf to be a single color. Then you could send custom pixel data for each leaf. You could also use other arbitrary custom compression schemes that are appropriate for your particular project.
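As a rough sketch of the one-color-per-leaf idea: the central PC sends one RGB value per leaf, and the branch controller repeats it across that leaf's 7 LEDs. The `Rgb` struct and function names here are hypothetical, and the bit-rate helper counts raw payload only, with no Ethernet or TCP overhead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

constexpr std::size_t kLedsPerLeaf = 7;

// Raw payload bit rate for sending `colors` 24-bit values `fps` times a second.
constexpr std::uint64_t payloadBitsPerSecond(std::uint64_t colors, std::uint64_t fps) {
    return colors * 24 * fps;
}

// Branch-controller side: repeat each leaf color across that leaf's LEDs.
std::vector<Rgb> expandLeafColors(const std::vector<Rgb>& leafColors) {
    std::vector<Rgb> pixels;
    pixels.reserve(leafColors.size() * kLedsPerLeaf);
    for (const Rgb& color : leafColors) {
        pixels.insert(pixels.end(), kLedsPerLeaf, color);
    }
    return pixels;
}
```

Sending every one of 175,000 pixels at 60Hz is `payloadBitsPerSecond(175000, 60)`, or 252,000,000 bits per second. Sending one color for each of the 25,000 leaves instead cuts that to 36,000,000, which fits on 100Mbps Ethernet with room to spare.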

Next Steps

Here’s what I’m working on next:

  • get an Ethernet switch and something to test W5500
  • build some software that distributes data from PCs to branch controllers


APA102c versus WS2815 LED Strips

If you’re just joining me, I’m trying to redo the LED lighting on this 46′-tall antenna:

My earlier experiments were all about speed, trying to increase the frame rate from last year’s abysmal 17 Hz to something better than 60 Hz. I decided I could easily double the number of LEDs and use a single Teensy 3.2 with the OctoWS2811 adapter, and get 72Hz.

Honestly I could stop right now, but I was still interested in doing some testing with APA102c LEDs. These are what Adafruit calls DotStars. The big difference is that they have an extra pin for a clock signal.

Here are some of the pros and cons of the different options for LED strips that you might consider for blinky lights in the desert:

| WS2812b | WS2815 | APA102C |
|---|---|---|
| 5V strips, which need power injection occasionally to avoid discoloration at the end of the strip | Uses 12V for power. (The signal is still 5V.) This means you don’t have to inject power as often into the middle of the strip to prevent color loss. | 5V strips, which need power injection occasionally to avoid discoloration at the end of the strip |
| If one LED fails, the rest of the strip will not work. | Provides a “backup path” for data that skips an LED. That means if a single LED fails, the rest of the strip continues to function. Note that the backup feature is not always helpful; a complete break in the strip will still kill your strip. | If one LED fails, the rest of the strip will not work. |
| Slow PWM refresh rate (400Hz); doesn’t look great in video and doesn’t provide persistence-of-vision | Fast refresh rate (2kHz) looks great in video cameras | Even faster refresh rate (4kHz – 20kHz); provides amazing color dithering at lower brightnesses |
| Slow protocol with 800kHz data rate. If you want 60Hz refresh rates you are limited to 550 pixels. | Slow protocol with 800kHz data rate. If you want 60Hz refresh rates you are limited to 550 pixels. | Arbitrarily fast data rate of at least 24MHz. You’re probably limited by the speed of your controller and code. |
| Only need one conductor for data. (Nice option to use twisted pairs with GND + DATA, so, for example, CAT-6 can carry four lines of data.) | Only need one conductor for data. (Nice option to use twisted pairs with GND + DATA, so, for example, CAT-6 can carry four lines of data.) | Requires separate conductors for data and clock. |
| Currently $12/meter/144 pixels | Currently $15/meter/144 pixels | Currently $19/meter/144 pixels |

By the way, these things change all the time. The manufacturers are kind of unreliable and often update their products without notice. They especially love to flip the color order. I’ve bought the exact same product one month later (to replace a failed strip) and discovered that the new version had Red and Green flipped so I couldn’t really replace the failed strip. Be careful and test things yourself!

Next, I want to test how fast the APA102c really is. Can I really drive thousands of pixels with just a single data line, no OctoWS2811? And do they really look better? I decided to wire up 4-meter strings of each to a Feather M4 and see how they did.

In terms of frame rate, as expected, I got faster refresh with the APA102. In a test with 576 pixels, I got an update rate of 52Hz with the WS2815 and 167Hz with the APA102.

In terms of how things looked: at full brightness, there was really no difference. But once the brightness got below 128 you started to notice that the APA102 did a much better job of rendering dim colors. In fact at very low brightnesses like 16 the WS2815 had basically dropped out completely but the APA102 was still showing pretty much the same hues as it showed at full brightness.

In conclusion, for bright outdoor applications, there is a lot to like about the WS2815, especially the fact that you can do less power injection and the resilience to single-pixel failures. For more subtle, indoor applications where you might run at lower brightness, APA102 might be preferable. I’m going to stick with the WS2815 for the antenna.

Measuring WS2812b Frame Rates with an Oscilloscope

As a part of trying to speed up the animation on my antenna, which uses 1800 addressable WS2812b-type LEDs, I’m trying a bunch of experiments. I wanted to figure out a reliable way to measure the actual frame rate I’m getting (i.e., updates per second).

My first, janky idea was just to measure things with a stopwatch, but that felt kind of primitive. I was pretty sure that I couldn’t use the clock on the controller, because I thought a lot of WS2812b libraries might be disabling interrupts and interfering with the real-time clock.

The idea I came up with was using an oscilloscope.

To do this I just found an unused pin, and pulsed it right before I called FastLED.show():

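  ...

  // Pin 10 is the spare pin wired to the scope probe; it must be
  // configured as an OUTPUT in setup(). Pulse it once per frame so the
  // scope can measure how often the pulse repeats.
  digitalWrite(10, HIGH);
  delay(1);                // 1 ms pulse, wide enough to trigger on
  digitalWrite(10, LOW);

  FastLED.show();          // push the frame out to the LEDs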

That was the easiest thing ever. The oscilloscope did all the calculations for me and measured the frequency perfectly. Here you can see that with 1800 LEDs, I’m getting an actual refresh rate of 16.9 Hz.

(The WS2812b protocol takes 30μs per pixel, so 1800 pixels should take about 54ms to send. As you can see from the screen grab, it actually took 59.18ms when you factor in all my other code.)

Just to make sure I’m not insane, while still using the Feather M4, I modified the code to send 450 pixels instead of 1800:

Now I’m getting a frame rate above 61 Hz, which is very respectable.

Finally I tried the hardware that looks the best right now: Teensy 3.2 + OctoWS2811 using the FastLED library. Even pushing 450 pixels through all 8 strands, I get an even better frame rate of 72 Hz, which is almost exactly the theoretical limit of the WS2812b. If you want a 60 Hz refresh rate, you can put 550 LEDs on each strip for a total of 4400 pixels per Teensy 3.2.

What about the Teensy 4.0?

The Teensy 4.0 development board, by PJRC

Oh man, this thing looks amazing.

No sooner had I soldered pins on the Teensy 3.2 than a newer Teensy 4.0 arrived in the mail. The new Teensy is insanely more capable for the same price and in the same form factor. It’s roughly 18 times faster on the CoreMark benchmark, has 16 times as much RAM and 8 times the flash. It uses the new ARM Cortex-M7. This is a 600 MHz chip. This is basically a Pentium III-class processor, with tons of memory, tons of IO, and all kinds of amazing things.

| | Teensy 3.2 | Teensy 4.0 |
|---|---|---|
| CoreMark Benchmark Speed | 126 | 2314 |
| RAM | 64K | 1024K |
| Flash | 256K | 2048K |
| Price | $19.80 | $19.95 |

I would love to use this development board to drive tons and tons of WS2812b-type LEDs, but I just can’t get it to work.

First:

  1. The Teensy 4.0 uses different pins for parallel output than the 3.2. I am pretty sure that means that it will not work with the OctoWS2811 library, which has not been updated.
  2. That would be OK, because FastLED seems to have new native support for parallel output on the Teensy 4.0, and FastLED is a much better library than OctoWS2811.
  3. But, because the pins have changed, the OctoWS2811 adapter board won’t work with the Teensy 4.0 (even though it is the same form factor), so I think I would be on my own in terms of providing level-shifting, resistors, and RJ-45 connectors.
  4. Another problem with the Teensy 4.0 is that a lot of its output pins seem to be in the form of tiny pads on the bottom (digital pins 24-33). These might be ok to use in an emergency but seem like a real mess in real projects.
The bottom of the Teensy 4.0 includes a stunning number of annoying little solder pads

I prototyped a little circuit that level shifts eight of the pins (19,18,14,15,17,16,22,23) from 3.3v to 5.0v:

Unfortunately, this is just not working for me no matter what I try. I even tried using an oscilloscope to figure out where it’s going wrong, but at this point, I’m in way over my head. From the current discussion on Reddit I see a lot of other people having problems so I’m pretty much ready to give up on Teensy 4.0 for a while until someone smarter than me figures out how to make it work.

(By the way, by my calculations, even the Teensy 3.2 is powerful enough to drive eight strips of 500-1000 LEDs at 30Hz-60Hz. Putting any more WS2812b-type LEDs on a Teensy 4.0 would actually just reduce the frame rate due to limitations in the protocol. So there’s no compelling case for a Teensy 4.0 right now.)

The OctoWS2811

My first experiment in speeding up the LEDs is to use something called OctoWS2811 which is designed to let you drive 8 strips of LEDs in parallel. Theoretically, this would allow me to boost the speed of my antenna from the dismal 9 frames per second to a much more attractive 72 frames per second.

OctoWS2811 started out as a library from PJRC for their Teensy 3.2 development board (an Arduino-type board based on a Cortex-M4). The main thing you need to know about this library is that it uses direct memory access to drive eight output pins at the same time. You hook up those 8 output pins to eight separate LED strips and, boom!, you’re in business.

PJRC also makes an adapter board that looks like this:

The OctoWS2811 is a small board, about 55mm by 35mm. One end has two RJ-45 ethernet jacks. The rest of the board can be used to mount a Teensy 3.2 controller.

The adapter board basically takes your eight output pins, level-shifts them from 3.3v to 5.0v, and pushes them through 100 ohm resistors. Then it wires them out to two RJ-45 jacks.

The idea is that you mount a Teensy 3.2 controller on that thing. Each RJ-45 jack then provides four twisted pairs (ground + data) which can be sent reliably over longish distances to the LED strips themselves (I found that 25 feet of CAT-6 worked fine).

I wish the adapter had sockets so you could just snap in the controller, but they weren’t too hard to solder on.

For my first test, I made 8 little strips of 8 WS2812b pixels. Here is what the whole thing looks like:

Teensy 3.2 mounted on an OctoWS2811 adapter, with two CAT-6 cables connecting it to a small dual RJ-45 breakout board, which is wired to 8 strips of 8 WS2812b LEDs.

I ran this with the OctoWS2811 library without problems, even up to 1100 pixels per strip.

I also tried using the FastLED library driving the OctoWS2811 library; that worked fine too but was limited to about 680 pixels per strip, presumably due to memory usage.

There are more powerful Teensy boards available than the 3.2, and you might be tempted to use them in your LED projects:

| | Teensy 3.2 | Teensy 3.6 | Teensy 4.0 |
|---|---|---|---|
| Speed | 72 MHz | 180 MHz | 600 MHz |
| Memory | 256K | 1M | 1M |

Should you consider the 3.6 or the 4.0 instead of the 3.2? Maybe, if you are planning to do a ton of processing on the chip to prepare the visuals that you send to the LEDs and need the memory or speed for that. But the truth is that if you are driving WS2812s and want a reasonable frame rate, the Teensy 3.2 is powerful enough. If you hook it up to 8800 pixels (8 strips of 1100) it will use 86% of available memory, and those strips of 1100 pixels will only get a 30 Hz refresh rate due to the limitations of the protocol. So adding more memory or processing power is not going to help you push more pixels. If you want to build something like Mark Lottor’s Hexatron, with 213,840 pixels, you’ll need to use a lot of Teensys and figure out how to synchronize them.

Hexatron, by Mark Lottor, was a forest of 486, 20-foot tall LED light poles, each containing 440 LEDs.

Step one: performance!

Thanks for coming to my blog! I’m pretty sure you’re going to find this to be the most boring thing ever.

One of last year’s projects was a 46′ tall antenna “wayfinder” that we used to find our camp from miles away. Here’s what it looked like:

Last year’s camp, with antenna in the background. Click through for a YouTube video showing one of the animations.

It was awesome, but next year I want it to be even awesomer.

The antenna was an off-the-shelf military style field antenna, able to be set up by two soldiers in half an hour assuming that those two soldiers had eight friends and had previously spent an entire weekend in the backyard figuring out how to set it up.

We zip-tied LED strips to the antenna on four sides. Each side consisted of three 5-meter strips at a density of 30 pixels per meter, so there were 450 pixels on each side or 1800 pixels total. (Each pixel is 3 LEDs so I’m calling this 5400 LEDs).

It looked OK, but it wasn’t quite bright enough for my taste. Next year I want to double the number of pixels. And it wasn’t quite fast enough. The maximum frame rate I could get was 18Hz, and if I have to double the number of pixels, that would reduce the frame rate to 9Hz.

(The WS2812B protocol just uses a single wire for the data. You send it a high voltage followed by a low voltage; if the high voltage duration is longer, it’s interpreted as a 1 bit. That requires about 1.25μs per bit. With 8-bit color and three colors (RGB), you need 24 x 1.25μs, or 30μs per pixel, so a single update of my 1800 pixels took 54ms which meant I could get a maximum frame rate of 18fps. Not super fast. If I wanted to make a bouncing thing that bounced up and down the pole on one second intervals, it basically had to jump 25 pixels every frame, “ew David.”)
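The timing math in that aside can be written out as a small sketch. It uses the 1.25μs bit time from the text and ignores the short reset gap between frames as well as any time the controller spends computing the animation, so real frame rates come out a little lower. The function names are made up for illustration.

```cpp
// WS2812B-style timing, using the figures from the text.
constexpr double kSecondsPerBit = 1.25e-6;  // one bit on the data line
constexpr int kBitsPerPixel = 24;           // 8 bits x 3 colors

// Time to clock out one frame to `pixels` chained on a single data line.
constexpr double frameSeconds(int pixels) {
    return pixels * kBitsPerPixel * kSecondsPerBit;
}

// Upper bound on frame rate for that many pixels on one data line.
constexpr double maxFrameRate(int pixels) {
    return 1.0 / frameSeconds(pixels);
}

// Most pixels one data line can carry while still reaching `fps`.
constexpr int maxPixelsAtRate(double fps) {
    return static_cast<int>(1.0 / (fps * kBitsPerPixel * kSecondsPerBit));
}

// How far a moving effect has to jump each frame to cover `pixels`
// in `seconds` at a given frame rate.
constexpr double pixelsPerFrame(int pixels, double seconds, double fps) {
    return pixels / (seconds * fps);
}
```

With 1800 pixels on one line, `maxFrameRate(1800)` is about 18.5, and `pixelsPerFrame(450, 1.0, 18.0)` is the 25-pixel jump mentioned above. Splitting the same pixels across parallel data lines shortens each line, which is the whole appeal of driving strips in parallel.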

Thus, my first priority this year is figuring out how to make it faster.

I’m going to investigate two different approaches to making it faster, and blog about it as I go along. The first approach is switching to an APA102C-style LED. This chip can be sent data much, much more quickly at the cost of having an extra conductor for a clock, so you need four wires instead of three.

(Plot twist: the actual LED strips I used last year were WS2815s, a variant which also uses four wires. Instead of a clock, the extra wire is for redundancy and makes it so that if any single LED dies, the whole strip doesn’t die. That seemed important in the desert. In practice, I had two occasions where two pixels in a row died and killed the whole strip, and no occasions where one pixel died and was successfully routed-around. That thing in the desert does not want you to have an easy time. ALSO, another reason I liked the WS2815 is that it uses 12 volts instead of 5 volts, so the voltage doesn’t drop nearly as much over long runs. In short, the APA102C would solve one problem and create two more.)

The other approach is to stick with the WS2815 LEDs, but break up the whole antenna into shorter segments with maybe a few hundred pixels each, so that data can be crammed into each of the LED strips in parallel.

Here are some parts that I have on order as a part of this experiment.

I’m also keen to try Yves Bazin’s experiment. He’s using shift registers to drive 100 separate LED strips in parallel from a single ESP32 controller. My goal in life is to be as cool as Yves.

When these parts come in I’ll set up some experiments and write some reviews. In the meantime, if you have any suggestions or advice, I’d love to hear it; please email me because I am too old for social media.

What’s going on here?

Lately I’ve been working on pixel-addressable LED strips. Those are just strips of LEDs that can be individually controlled, usually with three little LEDs (red, green, and blue) for each pixel. They have a tiny integrated circuit for each one and you use something like an Arduino controller to send them their color values and thus display cool artistic effects.

Basic pixel-addressable LEDs

The mass market just buys kinda ugly Christmas tree lights at the local hardware store which have been preprogrammed in a factory in China to do boring things with colors that seem logical to a programmer but look bright blue because most RGB colors, when chosen at random, look kinda blue. What I’m trying to do is build things that look unique and original.

This blog is not really intended for mass consumption. It’s not going to be very interesting unless you, too, are working on pixel-addressable LED strips and trying to get them to be fast and amazing!