Linux kernel module : Building a soft UART for the Raspberry Pi - part1

Sep 10, 2020

The Raspberry Pi (RPi) is a well-known platform intended for educational use. Thanks to its compact form and relatively low cost, the board has become very popular among enthusiasts and makers. It is used in projects across many fields, e.g. robotics and IoT, where it is linked with different types of sensors and detectors.

In many cases, these sensors use a wired communication protocol such as SPI, UART, or I2C, or a wireless one such as Bluetooth or ZigBee. The most widely adopted and easiest to use of the serial protocols is the Universal Asynchronous Receiver-Transmitter (UART for short).

Unfortunately, all versions of the RPi up to version 3 come with only two UART ports: a full-featured UART (PL011) enabled by default (GPIOs 14 and 15), and a mini UART with a reduced feature set (used by the wireless LAN/Bluetooth controller on models that include one), which needs some tweaking to get working (basically a modification of the Device Tree overlay). This hardware limitation can complicate projects that need more than two UART ports, especially on RPi models with few peripherals (e.g. the RPi Zero).

This project tackles that specific problem by creating a software UART compatible with all Linux-based boards, including the RPi. Because it is a software implementation, it has only the minimal features required to establish a reliable serial connection. As you will see throughout this two-part write-up, the software implementation obviously can’t compete with a real hardware port; nevertheless, it offers a decent alternative.

All the code can be found on my GitHub.

Linux driver

Yes, I opted to implement this software UART port inside the Linux kernel as a Linux driver (a.k.a. Linux module). If you are not familiar with Linux modules, they are simply chunks of code that run in kernel space, typically to add useful features to the OS or support for some device or user app, and they can be loaded into and unloaded from the kernel at runtime. The advantage of Linux modules over applications that run in user space is that they have higher execution privileges and generally run faster. As we are trying to emulate a hardware protocol in software, execution speed is a real concern for us, hence the implementation as a Linux driver.

UART serial port

Before starting the implementation of our driver, we first need to look at the inner workings of the UART protocol.

Below is a general summary of how the UART protocol works.

The UART protocol is a two-wire protocol (transmit, Tx, and receive, Rx) that lets two nodes exchange data asynchronously. To synchronize data sampling, the transmitter adds a start bit and a stop bit to each data word (the stop can be 1, 1.5 or 2 bits long), along with an optional parity bit that helps protect the integrity of the data word.

Each packet contains a data word that can be 5 to 9 bits long. When the receiver detects the start bit on the Rx line, it starts sampling the data at the frequency specified by the baudrate (bits/sec). Additionally, advanced UARTs implement hardware flow control (HFC), which is basically a strategy for communication between slow and fast devices without data loss.

UART is a very simple protocol; however, its software emulation raises some concerns about preemption and execution speed. Like all other programs that run on the processor, our Linux driver is subject to the scheduling policy of the Linux OS, and depending on the baudrate used, the UART protocol can impose very strict timing constraints: each bit transmitted on the Tx line takes 1/baudrate seconds to send. Therefore, if we consider that the receiver samples each data bit exactly in the middle, 1/(2*baudrate) corresponds to the maximum timing error that can accumulate between the sender’s and the receiver’s bit clocks before a sample lands in the wrong bit, e.g. for a baudrate of 9600, the maximum allowed drift is ~52 μs.
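
To put numbers on this timing budget, here is a small user-space sketch in C. The helper names bit_period_ns and max_sampling_error_ns are hypothetical and are not part of the driver; they only reproduce the arithmetic above with integer nanoseconds (rounded down).

```c
#include <assert.h>
#include <stdint.h>

/* Duration of one bit in nanoseconds for a given baudrate (rounded down). */
static uint64_t bit_period_ns(uint32_t baudrate)
{
    assert(baudrate > 0);
    return 1000000000ULL / baudrate;
}

/*
 * Half a bit period: the timing error at which a sample aimed at the
 * middle of a bit would land on the edge of that bit instead.
 */
static uint64_t max_sampling_error_ns(uint32_t baudrate)
{
    return bit_period_ns(baudrate) / 2;
}
```

For 9600 baud this gives a bit period of 104166 ns and a margin of 52083 ns (the ~52 μs quoted above); at 115200 baud the margin shrinks to 4340 ns, which shows how quickly the constraint tightens as the baudrate grows.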

As a consequence, to avoid preemption and speed problems, in our implementation we will try to minimize all critical parts in the code and we will consider only the following minimal feature set of the protocol:

Data word size of 8 bits.
1 start and 1 stop bit.
No parity bit.
No flow control.
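
To make this reduced frame format (often written 8N1) concrete, the following user-space sketch lists the ten line levels sent for one byte. The function encode_8n1 and the FRAME_BITS constant are illustrative names, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_BITS 10 /* 1 start bit + 8 data bits + 1 stop bit */

/* Fill levels[] with the line levels of one 8N1 frame, in transmission order. */
static void encode_8n1(uint8_t byte, uint8_t levels[FRAME_BITS])
{
    int i;

    assert(levels != NULL);
    levels[0] = 0;                       /* start bit: line pulled low */
    for (i = 0; i < 8; i++)
        levels[1 + i] = (byte >> i) & 1; /* data bits, least significant first */
    levels[9] = 1;                       /* stop bit: line back to idle (high) */
}
```

For example, the character 'A' (0x41) goes out on the wire as 0 1 0 0 0 0 0 1 0 1: a low start bit, the eight data bits starting with the least significant one, then a high stop bit.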

What is a Linux TTY driver?

The name of tty devices comes from the old abbreviation of teletypewriter, and it is commonly associated with serial devices. For a serial port to be properly integrated into the Linux kernel, it must be visible as a tty device from user space, and for that, any serial driver must be implemented within the tty kernel subsystem.

The figure below shows the implementation of the tty subsystem in the Linux kernel:

The tty core driver is implemented as a character driver in the Linux kernel, and it offers a set of functionalities that serve as an interface for serial devices. This driver is responsible for controlling both the data flow and the format of the packets passing through it, allowing serial drivers to focus on the low-level hardware interactions instead of worrying about data exchanges with user space. In addition, the tty core includes a set of extensions called ‘line disciplines’ that sit between the tty core and the serial driver and provide extended functionality to the serial driver (by default, the tty core uses the N_TTY line discipline, which directly links the serial driver to user space).

Until version 2.6 of the Linux kernel, serial drivers were implemented directly under the tty core driver inheriting a non-negligible complexity for the driver development. However, since version 2.6, a new interface, serial core, was implemented under the tty core to ease the development of serial drivers.

In this project we will implement our driver under the serial core interface, but before that, one must of course have a functional and structural understanding of the APIs offered by this interface and of how interactions with the low-level Tx and Rx operations are implemented. For that, one needs to analyze the source code of the interface: /linux/serial_core.h.

After thorough analysis, in the diagram below I show the important structural links of serial core with the top layer tty core and the low-level layer:

In fact, the use of serial core interface needs the definition of three main structures:

The structure representing the driver: struct uart_driver
The structure representing the port included in the driver: struct uart_port
The structure containing the pointers to the port operations: struct uart_ops

Other structures, e.g. tty_driver, are automatically initialized when the driver is registered with the kernel and must not be defined by the user.

Implementation

For organization purposes (as we informaticians like to split things into layers), the implementation of the driver will be separated into two halves:

A bottom half that contains the UART protocol implementation and manages the low level GPIO interactions.
A top half which implements the actual tty driver and manages the interaction with the user space.

Before creating the code for the two layers, we first need some sort of FIFO memory to store received data and data waiting to be sent. For this purpose, we decided to implement a circular buffer that will manage data exchanges between the two layers of the driver. This type of buffer also makes efficient use of a fixed buffer size.

A circular buffer is characterized by two pointers that manage the read/write operations: a head pointer, which always points to the next free slot of the buffer (address of the next write), and a tail pointer, which always points to the oldest element still stored in the buffer (address of the next read).

In circular_buffer.h we will create the structure of our circular buffer, which includes the two pointers head & tail, a fixed-size data buffer (of type unsigned char) plus a flag to indicate whether the buffer is full:

struct buffer
{
    int head;
    int tail;
    bool isfull;
    unsigned char data[BUFFER_MAX_SIZE];
};

There will be two instantiations of this buffer: rx_buffer and tx_buffer so that data exchanges can be managed separately between the two halves of the driver.

In addition, we will define some basic functions that ease buffer manipulation:

The init function that resets the buffer pointers to zero:
```
 void initialize_buffer(struct buffer* buffer);
```
Functions to read and write one char into the buffer:
```
int push_character(struct buffer* buffer, unsigned char character);
int pull_character(struct buffer* buffer, unsigned char* character);
```
After each write (read), the head (tail) pointer is incremented and wraps back to the start of the buffer when it reaches the end, following the circular buffer algorithm.

Functions to get the status of the buffer:

bool isBufferFull(struct buffer* buffer);
bool isBufferEmpty(struct buffer* buffer);
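
The bodies of these functions are not shown here, so the following is only a minimal user-space sketch of one way they could be written. The value of BUFFER_MAX_SIZE and the return convention (0 on success, -1 on failure) are assumptions; the latter is chosen to match the `pull_character(...) == 0` tests used later in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUFFER_MAX_SIZE 8 /* assumed value, kept small for the demonstration */

struct buffer {
    int head;    /* index of the next write */
    int tail;    /* index of the next read */
    bool isfull; /* distinguishes "full" from "empty" when head == tail */
    unsigned char data[BUFFER_MAX_SIZE];
};

void initialize_buffer(struct buffer *buffer)
{
    assert(buffer != NULL);
    buffer->head = 0;
    buffer->tail = 0;
    buffer->isfull = false;
}

bool isBufferFull(struct buffer *buffer)
{
    return buffer->isfull;
}

bool isBufferEmpty(struct buffer *buffer)
{
    return !buffer->isfull && buffer->head == buffer->tail;
}

/* Store one character; returns 0 on success, -1 if the buffer is full. */
int push_character(struct buffer *buffer, unsigned char character)
{
    if (isBufferFull(buffer))
        return -1;
    buffer->data[buffer->head] = character;
    buffer->head = (buffer->head + 1) % BUFFER_MAX_SIZE; /* wrap around */
    buffer->isfull = (buffer->head == buffer->tail);
    return 0;
}

/* Fetch the oldest character; returns 0 on success, -1 if the buffer is empty. */
int pull_character(struct buffer *buffer, unsigned char *character)
{
    if (isBufferEmpty(buffer))
        return -1;
    *character = buffer->data[buffer->tail];
    buffer->tail = (buffer->tail + 1) % BUFFER_MAX_SIZE; /* wrap around */
    buffer->isfull = false;
    return 0;
}
```

Note that this sketch does no locking. Inside the driver, where an ISR and a tasklet may touch the same buffer, some form of protection or a strict single-producer/single-consumer discipline would probably be needed.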

Bottom half

Now, let’s start our implementation of the low-level UART protocol. To make things more fun, I drew the state diagram below to sum up our implementation. As our implementation is half-duplex only, our code will have only one function to execute at a time: write or read.

In a new file soft_uart.c we start the definition of our main functions:

INIT_fct:

The first function is the init function, in which we initialize the Rx and Tx buffers and the GPIO pins (note: the idle state of the Tx pin is high), and we request the Rx line interrupt so that we can start receiving data (in the ISR) as soon as possible. We also initialize a reception tasklet that is used to pass Rx data over to the top layer (this will be discussed further below).

This function takes as parameters the GPIO pin numbers of both Tx and Rx (in our implementation we used pins 17 and 27 respectively). In addition, it takes a pointer to the serial port structure struct uart_port, which is handed over to the receiving tasklet.

For the baudrate, we will set it as a global variable so it can be accessed by all functions of the driver.

int uart_init(const int tx, const int rx, struct uart_port *port)
{
    int ret;

    /* Initialize buffers */
    initialize_buffer(&tx_buffer);
    initialize_buffer(&rx_buffer);

    /* Initialize the GPIO pins; on any failure, release what was acquired */
    gpio_tx = tx;
    gpio_rx = rx;
    ret = gpio_request(gpio_tx, "uart_tx");
    if (ret < 0)
        return ret;
    ret = gpio_direction_output(gpio_tx, 1); /* Tx idles high */
    if (ret < 0)
        goto free_tx;
    ret = gpio_request(gpio_rx, "uart_rx");
    if (ret < 0)
        goto free_tx;
    ret = gpio_direction_input(gpio_rx);
    if (ret < 0)
        goto free_rx;

    /* rx_tasklet init: done before the IRQ is requested, so the ISR can schedule it */
    tasklet_init(&rx_tasklet, rx_tasklet_function, (unsigned long) port);

    /* Initialize interrupt: triggered by the falling edge of the start bit */
    ret = request_irq(gpio_to_irq(gpio_rx), uart_handle_rx, IRQF_TRIGGER_FALLING, "Rx_handler", NULL);
    if (ret < 0)
        goto free_rx;
    printk(KERN_INFO "irq requested successfully!\n");
    return 0;

free_rx:
    gpio_free(gpio_rx);
free_tx:
    gpio_free(gpio_tx);
    return ret;
}

Tx_fct:

The transmit function is implemented as a simple while loop in which we retrieve data, word by word, from the tx_buffer and send it bit by bit, along with a start bit at the beginning and a stop bit at the end of each data word.

To respect the baudrate, we add a delay after each bit that is sent, using the ndelay() function (nanosecond delay). In addition, we call the uart_write_wakeup() function to wake up the upper layers and trigger the next transmission; in the code below, this call is made just before the send loop.

int uart_handle_tx(struct uart_port *port)
{
    int i;

    /* send wakeup to tty layer to begin next write */
    uart_write_wakeup(port);

    /* send data */
    while (pull_character(&tx_buffer, &send_char) == 0) {
        /* start bit */
        gpio_set_value(gpio_tx, 0);
        ndelay(sleep_interval_ns);
        /* data bits, least significant bit first */
        for (i = 0; i < 8; i++) {
            gpio_set_value(gpio_tx, 1 & (send_char >> i));
            ndelay(sleep_interval_ns);
        }
        /* stop bit */
        gpio_set_value(gpio_tx, 1);
        ndelay(sleep_interval_ns);
        /* statistics: increment port tx count */
        port->icount.tx++;
    }
    return 0;
}

We’ve used the ndelay() function instead of hrtimers (high-resolution timers, used for time-precise events) because the cost of frequently enabling and disabling hrtimers was, in our use case, experimentally judged to do more harm than good (compared to using the *delay() function family), especially for high baudrate values.

Rx_fct:

The implementation of the receive function is divided into two parts: the first part, which is the critical one, is implemented directly in the ISR of the Rx line and handles the sampling of the received bits; the second part, implemented in a tasklet, is responsible for passing the received data over to the top layer.

During the critical sampling part, the interrupt must only be triggered by the start bit, not by the data bits, so we momentarily disable it using the disable_irq_nosync() function. The interrupt is re-enabled after the reception of one data word. At the end, each data word is stored in the rx_buffer and the rx_tasklet is scheduled.

Once again, we add nanosecond delays between bit receptions to clock out the sampling frequency. However, this time we add an extra ¼ of a bit period before the first sample to avoid sampling at the edge (not ½, which would sample at the halfway point, in order to increase our jitter acceptance margin). The start and stop bits themselves are not sampled; they are simply skipped with a delay.

irqreturn_t uart_handle_rx(int irq, void *dev)
{
    int i;

    /* disable the interrupt, so it's not triggered by the data bits */
    disable_irq_nosync(irq);
    /*
     * start bit: wait one bit period plus a 1/4 offset to avoid sampling at
     * the edge. Not 1/2 (sampling at the halfway point), so we can increase
     * our jitter acceptance margin, as we are more likely to sample slower,
     * not faster, than the baudrate.
     */
    ndelay(sleep_interval_ns + sleep_interval_ns / 4);
    /* data bits, LSB first; the delay after the last bit skips the stop bit */
    for (i = 0; i < 8; i++) {
        receive_char |= (gpio_get_value(gpio_rx) << i);
        ndelay(sleep_interval_ns);
    }
    /* reenable the interrupt */
    enable_irq(irq);
    /* push_character to rx_buffer */
    push_character(&rx_buffer, receive_char);
    /* schedule tasklet */
    tasklet_schedule(&rx_tasklet);
    /* reinitialize receive char */
    receive_char = '\0';
    return IRQ_HANDLED;
}
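
To see what the ¼ offset buys, the sampling schedule can be modelled in user space. The sketch below is an idealized simulation, not driver code: line_level_at and sample_frame are hypothetical helpers, the line has perfect edges, and the only source of error is a receiver bit period that differs from the sender's.

```c
#include <assert.h>
#include <stdint.h>

/* Level of an ideal 8N1 line, t_ns after the falling edge of the start bit. */
static int line_level_at(uint8_t byte, uint64_t bit_ns, uint64_t t_ns)
{
    uint64_t slot = t_ns / bit_ns; /* 0 = start bit, 1..8 = data, 9+ = stop/idle */

    if (slot == 0)
        return 0;
    if (slot >= 9)
        return 1;
    return (byte >> (slot - 1)) & 1;
}

/*
 * Take the first sample 1.25 receiver bit periods after the start edge,
 * then one sample per receiver bit period, as the ISR does.
 */
static uint8_t sample_frame(uint8_t sent, uint64_t tx_bit_ns, uint64_t rx_bit_ns)
{
    uint8_t received = 0;
    uint64_t t = rx_bit_ns + rx_bit_ns / 4;
    int i;

    assert(tx_bit_ns > 0 && rx_bit_ns > 0);
    for (i = 0; i < 8; i++) {
        received |= (uint8_t)(line_level_at(sent, tx_bit_ns, t) << i);
        t += rx_bit_ns;
    }
    return received;
}
```

In this model, with a sender bit period of 100000 ns, a receiver that runs 5% slow (105000 ns) still decodes 0x55 correctly, while one that runs 10% slow (110000 ns) samples the stop bit in place of the last data bit and returns 0xD5.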

Contrary to what their name implies, tasklets are not small tasks or threads; they are simply a mechanism that allows an ISR to defer a non-critical portion of its code and run it outside the ISR’s context. The tasklet is executed as soon as possible after the ISR ends. However, a tasklet can only be pending once: if it is scheduled a second time before it has run, it is still executed only once. In our case, we implemented a while loop inside the rx_tasklet function in which we keep reading data from the rx_buffer and sending it up to the top layer until the buffer is empty.

void rx_tasklet_function(unsigned long data)
{
    struct uart_port *port = (struct uart_port *) data;
    unsigned char c;

    while (pull_character(&rx_buffer, &c) == 0) {
        /* insert char inside the serial_core with the appropriate flag;
           the flag is always TTY_NORMAL as we don't implement any error checking */
        uart_insert_char(port, 0, 0, c, TTY_NORMAL);
        /* statistics */
        port->icount.rx++;
    }
    /* when done with insertion, send data to tty layer */
    tty_flip_buffer_push(&port->state->port);
}

Close_fct:

Lastly, we implement the close function to properly free the used resources: free IRQ + free GPIOs + kill any scheduled tasklet.

int uart_exit(void)
{
    /* free IRQ */
    free_irq(gpio_to_irq(gpio_rx), NULL);
    /* free gpio */
    gpio_free(gpio_tx);
    gpio_free(gpio_rx);
    /* kill rx tasklet */
    tasklet_kill(&rx_tasklet);
    return 0;
}

At the end of the implementation, we can test our code by registering it directly as a simple module (not tty) and verifying the correctness of the output using a logic analyzer, or simply by connecting GPIO pins 17 & 27 to the ports of a USB-to-serial adapter. Feel free to test this part yourself.

I think we are done with this long first part. I will implement the top half of the driver in part 2 of this write-up. As for testing, we will see a real test of the driver over the Serial Line Internet Protocol (SLIP) once the driver is complete.

See you in the next part.

A.L

References

Raspberry Pi uarts : link

Timing Errors in Serial Communication: link

Linux serial driver : link

Chapter 18. TTY Drivers : link