Interrupt management

Requesting an out-of-band IRQ

Dovetail introduces the new interrupt type flag IRQF_OOB, denoting an out-of-band handler to the generic interrupt API routines:

  • setup_irq() for early registration of special interrupts
  • request_irq() for device interrupts
  • __request_percpu_irq() for per-CPU interrupts

An IRQ action handler bearing this flag runs on the out-of-band stage, regardless of the current interrupt state of the in-band stage. If no out-of-band stage is present, the flag will be ignored, with the interrupt handler running on the in-band stage as usual.

Conversely, out-of-band handlers are dismissed using the usual calls, such as:

  • free_irq() for device interrupts
  • free_percpu_irq() for per-CPU interrupts

Out-of-band IRQ handling has the following constraints:

  • If the IRQ is shared, with multiple action handlers registered for the same event, all other handlers on the same interrupt channel must bear the IRQF_OOB flag too, or the request will fail.

If meeting real-time requirements is your goal, sharing an IRQ line among multiple devices operating from different execution stages (in-band vs out-of-band) can only be a bad idea design-wise. You should resort to this in desperate hardware situations only.

  • Obviously, out-of-band handlers cannot be threaded (IRQF_NO_THREAD is implicit, IRQF_ONESHOT is ignored).
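The two constraints above can be modelled in a few lines of userspace C. This is only a sketch. The MODEL_IRQF_* values are hypothetical stand-ins chosen for the example (the real IRQF_* bits live in <linux/interrupt.h>). The helpers mimic the checks described above; they are not Dovetail's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_IRQF_ONESHOT   0x2u
#define MODEL_IRQF_NO_THREAD 0x4u
#define MODEL_IRQF_OOB       0x8u

/*
 * Effective flags of an action: an out-of-band handler can never be
 * threaded, so NO_THREAD is implied and ONESHOT is dropped.
 */
unsigned int model_effective_flags(unsigned int flags)
{
	if (flags & MODEL_IRQF_OOB) {
		flags |= MODEL_IRQF_NO_THREAD;
		flags &= ~MODEL_IRQF_ONESHOT;
	}
	return flags;
}

/*
 * Sharing rule: a new action may join a line only if every handler
 * already registered on it agrees with it about IRQF_OOB.
 */
bool model_can_share(const unsigned int *existing, size_t count,
		     unsigned int new_flags)
{
	bool new_oob = (new_flags & MODEL_IRQF_OOB) != 0;

	for (size_t i = 0; i < count; i++) {
		bool oob = (existing[i] & MODEL_IRQF_OOB) != 0;
		if (oob != new_oob)
			return false;	/* mixed stages on one line: refused */
	}
	return true;
}
```

In this model, requesting a plain in-band handler on a line already owned by out-of-band handlers fails, and so does the reverse case.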

Installing an out-of-band handler for a device interrupt

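#include <linux/init.h>
#include <linux/interrupt.h>

/* Runs on the out-of-band stage: only out-of-band-safe code belongs here. */
static irqreturn_t oob_interrupt_handler(int irq, void *dev_id)
{
	...
	return IRQ_HANDLED;
}

static int __init driver_init_routine(void)
{
	int ret;

	...
	/* DEVICE_IRQ and device_data are driver-specific. */
	ret = request_irq(DEVICE_IRQ, oob_interrupt_handler,
			  IRQF_OOB, "Out-of-band device IRQ",
			  device_data);
	if (ret)
		goto fail;

	return 0;
fail:
	/* Unwind upon error. */
	...
	return ret;
}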

Notifying the companion core about IRQ entry/exit

Your companion core will most likely want to be notified each time a new interrupt context is entered, typically in order to block any further task rescheduling on its end. Conversely, this core will also want to be notified when such context is exited, so that it can start its rescheduling procedure, applying any change to the scheduler state which occurred during the execution of the interrupt handler(s), such as waking up a thread which was waiting for the incoming event.

To provide such support, Dovetail calls irq_enter_pipeline() on entry to the pipeline when it receives an IRQ from the hardware, then irq_exit_pipeline() right before it leaves the interrupt frame. Dovetail defines empty placeholders for these hooks, which are used in the absence of a companion core in the kernel tree:

linux/include/dovetail/irq.h

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _DOVETAIL_IRQ_H
#define _DOVETAIL_IRQ_H

/* Placeholders for pre- and post-IRQ handling. */

static inline void irq_enter_pipeline(void) { }

static inline void irq_exit_pipeline(void) { }

#endif /* !_DOVETAIL_IRQ_H */

As an illustration, the EVL core overrides these placeholders by interposing the following file, which comes earlier in the C header inclusion order, providing its own set of hooks:

linux-evl/include/asm-generic/evl/irq.h

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_GENERIC_EVL_IRQ_H
#define _ASM_GENERIC_EVL_IRQ_H

#include <evl/irq.h>

static inline void irq_enter_pipeline(void)
{
#ifdef CONFIG_EVL
	evl_enter_irq();
#endif
}

static inline void irq_exit_pipeline(void)
{
#ifdef CONFIG_EVL
	evl_exit_irq();
#endif
}

#endif /* !_ASM_GENERIC_EVL_IRQ_H */
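To make the purpose of these hooks concrete, here is a userspace model of what a companion core might do with them. It is a sketch, not EVL's evl_enter_irq()/evl_exit_irq(). The model_* names, the single modelled CPU, and the nesting counter (covering nested interrupt frames) are assumptions made for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical per-CPU state of a companion core scheduler; a single CPU
 * is modelled here with plain globals.
 */
int model_irq_nesting;      /* > 0 while inside an interrupt frame */
bool model_resched_pending; /* a handler changed the scheduler state */
int model_resched_count;    /* number of rescheduling procedures run */

static void model_reschedule(void)
{
	model_resched_pending = false;
	model_resched_count++;
}

/* Stand-in for irq_enter_pipeline(): block rescheduling. */
void model_irq_enter_pipeline(void)
{
	model_irq_nesting++;
}

/* E.g. a handler waking up a thread waiting for the incoming event. */
void model_request_resched(void)
{
	model_resched_pending = true;
	if (model_irq_nesting == 0)
		model_reschedule();	/* not in IRQ context: act right away */
}

/* Stand-in for irq_exit_pipeline(): apply deferred scheduler changes. */
void model_irq_exit_pipeline(void)
{
	if (--model_irq_nesting == 0 && model_resched_pending)
		model_reschedule();
}
```

A wakeup issued from a handler is only acted upon when the outermost interrupt frame is left. This is the behavior the text describes for a core that blocks rescheduling on IRQ entry.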

Switching dynamically between in-band / out-of-band delivery

void irq_switch_oob(unsigned int irq, bool on)

This routine turns out-of-band delivery on or off for the given IRQ, for which an action must already be set (i.e. the IRQ must have been requested). This call comes in handy when the IRQ was requested without mentioning the IRQF_OOB flag: the interrupt delivery stage can still be switched manually by calling irq_switch_oob().

  • irq

    The IRQ number to switch the delivery mode for.

  • on

    A boolean indicating whether out-of-band delivery should be enabled.


    Disabling/enabling interrupts in the CPU

    When interrupt pipelining is enabled, the regular local_irq_*() kernel API only controls the virtual interrupt disable flag of the in-band stage. A separate set of calls is therefore needed to actually flip the interrupt enable/disable flag in the CPU. When CONFIG_IRQ_PIPELINE is disabled, this set is mapped 1:1 onto the original local_irq_*() API.

    Original/virtual call          Non-virtualized call
    -----------------------------  ---------------------------------
    local_save_flags(flags)        flags = hard_local_save_flags()
    local_irq_disable()            hard_local_irq_disable()
    local_irq_enable()             hard_local_irq_enable()
    local_irq_save(flags)          flags = hard_local_irq_save()
    local_irq_restore(flags)       hard_local_irq_restore(flags)
    irqs_disabled()                hard_irqs_disabled()
    irqs_disabled_flags(flags)     hard_irqs_disabled_flags(flags)
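    The difference between the virtual and the hard flag can be illustrated with a small userspace model of one CPU. This is a sketch under simplifying assumptions: the struct and function names are hypothetical, events are only counted, and the real pipeline logs IRQs per stage in per-CPU data.

```c
#include <stdbool.h>

/* Model of one CPU running a two-stage pipeline (illustration only). */
struct model_cpu {
	bool hard_disabled;   /* real interrupt flag in the CPU */
	bool inband_stalled;  /* virtual flag set by local_irq_disable() */
	int hw_pending;       /* IRQs left latched in the interrupt controller */
	int inband_log;       /* IRQs logged for the in-band stage */
	int oob_runs, inband_runs;
};

/* An IRQ fires; has_oob tells whether an IRQF_OOB handler is attached. */
void model_irq_fires(struct model_cpu *c, bool has_oob)
{
	if (c->hard_disabled) {
		c->hw_pending++;      /* CPU masked: nothing reaches the pipeline */
		return;
	}
	if (has_oob)
		c->oob_runs++;        /* out-of-band: ignores the in-band stall */
	else if (c->inband_stalled)
		c->inband_log++;      /* deferred until local_irq_enable() */
	else
		c->inband_runs++;
}

/* local_irq_enable(): unstall the in-band stage, then play its log. */
void model_local_irq_enable(struct model_cpu *c)
{
	c->inband_stalled = false;
	c->inband_runs += c->inband_log;
	c->inband_log = 0;
}
```

    Only setting hard_disabled (the hard_local_irq_disable() counterpart) keeps an out-of-band handler from running. Stalling the in-band stage merely defers in-band delivery.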

    Stalling the out-of-band stage

    Just like the in-band stage is affected by the state of the virtual interrupt disable flag, the interrupt state of the out-of-band stage is controlled by a dedicated stall bit flag in the out-of-band stage status. In combination with the interrupt disable bit in the CPU, this software bit controls interrupt delivery to the out-of-band stage.

    When this stall bit is set, interrupts which might be pending in the event log of the out-of-band stage for a given CPU are not played. Conversely, the out-of-band handlers attached to pending IRQs are fired when the stall bit is clear(ed). The following table represents the equivalent calls affecting the stall bit for each stage:

    In-band stage                  Out-of-band stage
    -----------------------------  ---------------------------------
    local_save_flags(flags)        flags = oob_irq_save()
    local_irq_disable()            oob_irq_disable()
    local_irq_enable()             oob_irq_enable()
    local_irq_save(flags)          flags = oob_irq_save()
    local_irq_restore(flags)       oob_irq_restore(flags)
    irqs_disabled()                oob_irqs_disabled()
    irqs_disabled_flags(flags)     -none-
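    The stall bit behaves like a nestable interrupt mask. The userspace model below mimics the oob_irq_save()/oob_irq_restore() pair under two assumptions made for the example. First, the saved flags word simply holds the previous stall state. Second, "playing" the log just counts handler invocations. The real helpers operate on per-CPU stage state inside the pipeline core.

```c
#include <stdbool.h>

/* Model of the out-of-band stall bit and its event log (illustration). */
struct model_oob_stage {
	bool stalled;
	int pending;   /* IRQs logged while the stage was stalled */
	int played;    /* out-of-band handlers actually fired */
};

/* Like oob_irq_save(): stall the stage, return the previous state. */
unsigned long model_oob_irq_save(struct model_oob_stage *s)
{
	unsigned long flags = s->stalled;

	s->stalled = true;
	return flags;
}

/* Event arrival: fire the handler now, or log it if stalled. */
void model_oob_irq(struct model_oob_stage *s)
{
	if (s->stalled)
		s->pending++;
	else
		s->played++;
}

/*
 * Like oob_irq_restore(): reinstate the saved state; pending events are
 * played only when this leaves the stage unstalled.
 */
void model_oob_irq_restore(struct model_oob_stage *s, unsigned long flags)
{
	s->stalled = flags != 0;
	if (!s->stalled) {
		s->played += s->pending;
		s->pending = 0;
	}
}
```

    With nested sections, leaving the inner one keeps the stage stalled. The pending event is only played when the outermost section restores the unstalled state.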

    Sending out-of-band IPIs to remote CPUs

    The pipeline exposes two generic IPI vectors which autonomous cores may use in SMP configurations for signaling the following events across CPUs:

    • RESCHEDULE_OOB_IPI, the cross-CPU task reschedule request. This is available to the core’s scheduler for kicking the task rescheduling procedure on remote CPUs, when the state of their respective runqueue has changed. For instance, a task sleeping on CPU #1 may be unblocked by a system call issued from CPU #0: in this case, the scheduler code running on CPU #0 is supposed to tell CPU #1 that it should reschedule. Typically, the EVL core does so from its test_resched() routine.

    • TIMER_OOB_IPI, the cross-CPU timer reschedule request. Because software timers are in essence per-CPU beasts, this IPI is available to the core’s timer management code for kicking the hardware timer programming procedure on remote CPUs, when the state of some software timer has changed. Typically, stopping a timer from a remote CPU, or migrating a timer from one CPU to another, should trigger such a signal. The EVL core does so from its evl_program_remote_tick() routine, which is called whenever the timer with the earliest timeout date enqueued on a remote CPU may have changed.

    In addition, the pipeline core defines CALL_FUNCTION_OOB_IPI for its own use, in order to implement the smp_call_function_oob() routine. The latter is semantically equivalent to the regular smp_call_function_single() routine, except that it runs the callback on the out-of-band stage.

    As their names suggest, these three IPIs can be sent from out-of-band context (as well as from in-band context) by calling the irq_send_oob_ipi() service.


    void irq_send_oob_ipi(unsigned int ipi, const struct cpumask *cpumask)

  • ipi

    The IPI number to send. There are only three legitimate values for this argument: RESCHEDULE_OOB_IPI, TIMER_OOB_IPI or CALL_FUNCTION_OOB_IPI. This is a low-level service which performs little parameter checking, so any other value is likely to cause havoc.

  • cpumask

    A CPU bitmask specifying the target CPU(s) which should receive the IPI. The current CPU is silently excluded from this mask, so the calling CPU cannot send an IPI to itself using this call.

    In order to receive these IPIs, an out-of-band handler must have been set for them, passing the IRQF_OOB flag (see Requesting an out-of-band IRQ).

    irq_send_oob_ipi() serializes callers internally, so that it may be used from either stage: in-band or out-of-band.
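    The effect of the cpumask argument can be sketched in userspace. This is a model, not the kernel API: CPUs are encoded as bits of a 64-bit word standing in for struct cpumask. The model_resched_targets() helper is a hypothetical example of how a scheduler might collect the CPUs to kick with RESCHEDULE_OOB_IPI. Because the sender is always excluded, a core has to deal with its own CPU by other means.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * CPUs are encoded as bits of a 64-bit word (this_cpu < 64), a stand-in
 * for struct cpumask. The sender is dropped from the mask, as
 * irq_send_oob_ipi() does.
 */
uint64_t model_oob_ipi_targets(uint64_t cpumask, unsigned int this_cpu)
{
	return cpumask & ~(UINT64_C(1) << this_cpu);
}

/*
 * Hypothetical scheduler helper: collect the CPUs whose runqueue changed,
 * i.e. the ones a RESCHEDULE_OOB_IPI should be sent to.
 */
uint64_t model_resched_targets(const bool *rq_changed, unsigned int ncpus,
			       unsigned int this_cpu)
{
	uint64_t mask = 0;

	for (unsigned int cpu = 0; cpu < ncpus && cpu < 64; cpu++)
		if (rq_changed[cpu])
			mask |= UINT64_C(1) << cpu;

	return model_oob_ipi_targets(mask, this_cpu);
}
```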


    Injecting an IRQ event for the current CPU

    In some very specific cases, we may need to inject an IRQ into the pipeline by software as if such hardware event had happened on the current CPU. irq_inject_pipeline() does exactly this.


    int irq_inject_pipeline(unsigned int irq)

  • irq

    The IRQ number to inject. A valid interrupt descriptor must exist for this interrupt.

    irq_inject_pipeline() fully emulates the receipt of a hardware event, which means that the common interrupt pipelining logic applies to the new event:

    • first, any out-of-band handler is considered for delivery,

    • then such event may be passed down the pipeline to the common in-band handler(s) in absence of out-of-band handler(s).

    The pipeline priority rules apply accordingly:

    • if the caller is in-band, an out-of-band handler is registered for the IRQ event, and the out-of-band stage is unstalled, the execution stage is immediately switched to out-of-band for running the latter, then restored to in-band before irq_inject_pipeline() returns.

    • if the caller is out-of-band and there is no out-of-band handler, the IRQ event is deferred until the in-band stage resumes execution on the current CPU, at which point it is delivered to any in-band handler(s).

    • in any case, should the current stage receive the IRQ event, the virtual interrupt state of that stage is always considered before deciding whether this event should be delivered immediately to its handler by irq_inject_pipeline() (unstalled case), or deferred until the stage is unstalled (stalled case).

    This call returns zero on successful injection, or -EINVAL if the IRQ has no valid descriptor.
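    The delivery rules listed above can be condensed into a decision function. This is a userspace sketch written for this page: the model_* names and outcome codes are hypothetical, and the real service returns an int (0 or -EINVAL) rather than an outcome code.

```c
#include <stdbool.h>

enum model_stage { MODEL_INBAND, MODEL_OOB };

enum model_outcome {
	MODEL_RUN_OOB_NOW,     /* out-of-band handler runs before returning */
	MODEL_LOG_OOB,         /* deferred until the oob stage is unstalled */
	MODEL_RUN_INBAND_NOW,  /* in-band handler runs before returning */
	MODEL_LOG_INBAND,      /* deferred until the in-band stage can run it */
	MODEL_EINVAL,          /* no valid interrupt descriptor */
};

/* Decide what injecting an IRQ on the current CPU leads to. */
enum model_outcome model_inject(bool has_desc, enum model_stage caller,
				bool has_oob_handler, bool oob_stalled,
				bool inband_stalled)
{
	if (!has_desc)
		return MODEL_EINVAL;
	/* Out-of-band handlers are always considered first. */
	if (has_oob_handler)
		return oob_stalled ? MODEL_LOG_OOB : MODEL_RUN_OOB_NOW;
	/* In-band delivery must wait while out-of-band code runs. */
	if (caller == MODEL_OOB)
		return MODEL_LOG_INBAND;
	return inband_stalled ? MODEL_LOG_INBAND : MODEL_RUN_INBAND_NOW;
}
```

    The stall state of the in-band stage never matters when an out-of-band handler exists, which matches the pipeline priority rules above.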

    If you are looking for a way to schedule the execution of a routine in the in-band interrupt context from the out-of-band stage, you may want to consider the extended irq_work API, which provides a high-level interface to this feature.


    Direct logging of an IRQ event

    Sometimes, running the full interrupt delivery logic implemented by irq_inject_pipeline() may be overkill, when we can make safe assumptions about the current execution context and about which stage should handle the event. In such cases, the following fast helpers can be used instead:


    void irq_post_inband(unsigned int irq)

  • irq

    The IRQ number to inject into the in-band stage. A valid interrupt descriptor must exist for this interrupt.

    This routine may be used to mark an interrupt as pending directly in the current CPU’s log for the in-band stage. This is useful in either of these cases:

    • you know that the out-of-band stage is current, therefore this event has to be deferred until the in-band stage resumes on the current CPU later on. This means that you can simply post it to the in-band stage directly.

    • you know that the in-band stage is current but stalled, therefore this event can’t be immediately delivered, so marking it as pending into the in-band stage is enough.

    Interrupts must be hard disabled in the CPU before calling this routine.


    void irq_post_oob(unsigned int irq)

  • irq

    The IRQ number to inject into the out-of-band stage. A valid interrupt descriptor must exist for this interrupt.

    This routine may be used to mark an interrupt as pending directly in the current CPU’s log for the out-of-band stage. This is useful in only one situation: you know that the out-of-band stage is current but stalled, therefore this event can’t be immediately delivered, so marking it as pending in the out-of-band stage is enough.

    Interrupts must be hard disabled in the CPU before calling this routine. If the out-of-band stage is stalled as expected on entry to this helper, then interrupts must be hard disabled in the CPU as well anyway.
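    The key property of both helpers is that posting never runs anything: it only marks the event, and a later log synchronization plays it. The userspace sketch below models a per-stage log as a 64-bit pending bitmap. The bitmap layout, the limit of 64 IRQs, the coalescing of repeated posts and the ascending playback order are assumptions of this model, not statements about Dovetail's internal log.

```c
#include <stdint.h>

/* Per-CPU, per-stage event log modelled as a pending bitmap (irq < 64). */
struct model_log {
	uint64_t pending;
};

/* Like irq_post_inband()/irq_post_oob(): mark the IRQ, run nothing. */
void model_post(struct model_log *log, unsigned int irq)
{
	log->pending |= UINT64_C(1) << irq;
}

/*
 * Synchronize the log: play every pending IRQ in ascending order,
 * recording the order into 'played'. Returns the number played.
 */
int model_sync(struct model_log *log, unsigned int *played, int max)
{
	int n = 0;

	for (unsigned int irq = 0; irq < 64 && n < max; irq++) {
		if (log->pending & (UINT64_C(1) << irq)) {
			log->pending &= ~(UINT64_C(1) << irq);
			played[n++] = irq;
		}
	}
	return n;
}
```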


    Extended IRQ work API

    Due to the NMI-like nature of interrupts running out-of-band code from the standpoint of the main kernel, such code might preempt in-band activities in the middle of a critical section. For this reason, it would be unsafe to call any in-band routine from an out-of-band context.

    However, we may schedule execution of in-band work handlers from out-of-band code, using the regular irq_work_queue() and irq_work_queue_on() services which have been extended by the IRQ pipeline core. A work request is scheduled from the out-of-band stage for running on the in-band stage on the issuing/requested CPU as soon as the out-of-band activity quiesces on this processor. As its name implies, the work handler runs in (in-band) interrupt context.

    The interrupt pipeline always uses a synthetic IRQ as the notification signal for the IRQ work machinery, instead of an architecture-specific interrupt vector. This special IRQ is labeled in-band work when reported by /proc/interrupts. irq_work_queue() may invoke the work handler immediately only if called from the in-band stage with hard irqs on. In all other cases, the handler execution is deferred until the in-band log is synchronized.
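    The immediate-versus-deferred rule stated above can be sketched as a userspace model. The names, the fixed-size queue and the FIFO replay order are assumptions of the model. The real irq_work machinery relies on the in-band work synthetic IRQ rather than on an explicit call such as model_sync_inband().

```c
#include <stdbool.h>

enum model_stage { MODEL_INBAND, MODEL_OOB };

#define MODEL_MAX_WORK 16

typedef void (*model_work_fn)(void);

/* Work deferred until the in-band log is synchronized on this CPU. */
struct model_work_queue {
	model_work_fn deferred[MODEL_MAX_WORK];
	int count;
};

/*
 * Returns 1 if the handler ran immediately, 0 if it was deferred,
 * -1 if this fixed-size model queue is full.
 */
int model_irq_work_queue(struct model_work_queue *q, model_work_fn fn,
			 enum model_stage stage, bool hard_irqs_on)
{
	/* Only in-band callers with hard irqs on may run it right away. */
	if (stage == MODEL_INBAND && hard_irqs_on) {
		fn();
		return 1;
	}
	if (q->count == MODEL_MAX_WORK)
		return -1;
	q->deferred[q->count++] = fn;
	return 0;
}

/* Synchronizing the in-band log runs the deferred work in FIFO order. */
void model_sync_inband(struct model_work_queue *q)
{
	for (int i = 0; i < q->count; i++)
		q->deferred[i]();
	q->count = 0;
}

/* Sample work handler used in the example below. */
int model_work_runs;

void model_count_work(void)
{
	model_work_runs++;
}
```

    Work queued from out-of-band code, or from in-band code with hard irqs off, only runs once the in-band log is synchronized.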


    Synthetic IRQs

    The pipeline introduces an additional type of interrupts, which are purely software-originated, with no hardware involved. These IRQs can be triggered by any kernel code. A synthetic IRQ (aka SIRQ) is inherently a per-CPU event. Because the common pipeline flow applies to synthetic interrupts, it is possible to attach such interrupt to out-of-band and/or in-band handlers, just like device interrupts.

    A synthetic interrupt abides by the normal rules with respect to interrupt masking: such an IRQ may be deferred until the stage it should be handled from is unstalled.

    Synthetic interrupts and softirqs differ in essence: the latter only exist in the in-band context, and therefore cannot trigger out-of-band activities. Synthetic interrupts used to be called virtual IRQs (or virq for short) by the legacy I-pipe implementation, Dovetail’s ancestor. The new name avoids confusion with the abstract interrupt numbers defined within interrupt domains, which are also called virtual interrupts elsewhere in the kernel code base.

    Allocating a new synthetic interrupt

    Synthetic interrupt vectors are allocated from the synthetic_irq_domain, using the irq_create_direct_mapping() routine.

    A synthetic interrupt handler can be installed for running on the in-band stage upon a scheduling request (i.e. being posted) from an out-of-band context as follows:

    #include <linux/interrupt.h>
    #include <linux/irqdomain.h>
    #include <linux/irq_pipeline.h>
    
    static irqreturn_t sirq_handler(int sirq, void *dev_id)
    {
    	do_in_band_work();	/* driver-specific work (placeholder) */
    
    	return IRQ_HANDLED;
    }
    
    static struct irqaction sirq_action = {
    	.handler = sirq_handler,
    	.name = "In-band synthetic interrupt",
    	.flags = IRQF_NO_THREAD,
    };
    
    unsigned int alloc_sirq(void)
    {
    	unsigned int sirq;
    
    	sirq = irq_create_direct_mapping(synthetic_irq_domain);
    	if (!sirq)
    		return 0;
    
    	/* Release the mapping if the handler cannot be installed. */
    	if (setup_percpu_irq(sirq, &sirq_action)) {
    		irq_dispose_mapping(sirq);
    		return 0;
    	}
    
    	return sirq;
    }
    

    A synthetic interrupt handler can be installed for running from the out-of-band stage upon a trigger from an in-band context as follows:

    static irqreturn_t sirq_oob_handler(int sirq, void *dev_id)
    {
    	do_out_of_band_work();	/* driver-specific work (placeholder) */
    
    	return IRQ_HANDLED;
    }
    
    unsigned int alloc_sirq(void __percpu *dev_id)
    {
    	unsigned int sirq;
    	int ret;
    
    	sirq = irq_create_direct_mapping(synthetic_irq_domain);
    	if (!sirq)
    		return 0;
    
    	/* dev_id is the per-CPU cookie handed back to the handler. */
    	ret = __request_percpu_irq(sirq, sirq_oob_handler,
    				   IRQF_OOB,
    				   "Out-of-band synthetic interrupt",
    				   dev_id);
    	if (ret) {
    		irq_dispose_mapping(sirq);
    		return 0;
    	}
    
    	return sirq;
    }
    

    Scheduling a SIRQ from the in-band stage

    The execution of sirq_handler() in the in-band context can be scheduled (or posted) from the out-of-band context in two different ways:

    • using the common injection service:
    	irq_inject_pipeline(sirq);
    
    • using the lightweight injection method (requires interrupts to be disabled in the CPU):
    	unsigned long flags = hard_local_irq_save();
    	irq_post_inband(sirq);
    	hard_local_irq_restore(flags);
    

    Assuming that no interrupt may be pending in the event log for the out-of-band stage at the time this code runs, the second method relies on an invariant of the pipelined interrupt model: IRQs pending for the in-band stage have to wait for the out-of-band stage to quiesce before they can be handled. Therefore, it is pointless to try synchronizing the interrupts pending for the in-band stage from the out-of-band stage, which irq_inject_pipeline() would do systematically. irq_post_inband() simply marks the event as pending in the event log of the in-band stage for the current CPU, then returns. This event is played as a result of the log being synchronized automatically when the current CPU switches back to the in-band stage.

    It is also valid to post a synthetic interrupt to be handled on the in-band stage from an in-band context, using irq_inject_pipeline(). In such a case, the normal rules of interrupt delivery apply, depending on the state of the virtual interrupt disable flag for the in-band stage. If the stage is unstalled, the IRQ is immediately delivered, with the call to irq_inject_pipeline() returning only after the handler has run; otherwise, the IRQ is deferred until the in-band stage is unstalled.

    Triggering a SIRQ from the out-of-band stage

    Conversely, the execution of sirq_oob_handler() on the out-of-band stage can be triggered from the in-band context as follows:

    	irq_inject_pipeline(sirq);
    

    Since the out-of-band stage has precedence over the in-band stage for execution of any pending event, this IRQ is immediately delivered (provided the out-of-band stage is not stalled), with the call to irq_inject_pipeline() returning only after the handler has run.

    It is also valid to post a synthetic interrupt to be handled on the out-of-band stage from an out-of-band context, using irq_inject_pipeline(). In such a case, the normal rules of interrupt delivery apply, depending on the state of the virtual interrupt disable flag for the out-of-band stage.

    Calling irq_post_oob(sirq) from the in-band stage to trigger an out-of-band event is most often not the right way to do this, because this service would not synchronize the interrupt log before returning. In other words, the sirq event would still be pending for the out-of-band stage despite the fact that it should have preempted the in-band stage before returning to the caller.


    Last modified: Tue, 29 Oct 2024 14:58:31 +0100