
media: chips-media: wave5: Add hrtimer based polling support

Add support for starting a polling timer in case an interrupt is not
available. This helps keep the VPU functional in SoCs such as AM62A,
where the hardware interrupt hookup may not be present due to an SoC
erratum [1].

The timer is shared across all encoder and decoder instances: it is
started when the first instance is opened and stopped when the last
instance is closed, thus avoiding per-instance polling and saving CPU
bandwidth. As the VPU driver already tracks and synchronizes its
instances, the shared-timer polling logic is implemented within the VPU
driver itself. This scheme can also be useful in general (even when an
irq is present) for non-realtime multi-instance VPU use-cases (e.g. 32
VPU instances running together) where the system is already under high
interrupt load; switching to polling can mitigate this, since the
polling thread is shared across all the VPU instances.

An hrtimer is chosen for polling here as it provides precise timing and
scheduling, and its API is well suited to a periodic polling task such
as this. As a general rule of thumb:

Worst-case latency with hrtimer = actual latency (achievable with irq)
                                  + polling interval

NOTE (the terms used above mean the following):
- Latency: time taken to process one frame
- Actual latency: time taken by the hardware to process one frame and
signal it to the OS (i.e. the latency that would be achievable if the
irq line were present)

There is a trade-off between latency and CPU usage when choosing the
polling interval. With more aggressive (shorter) polling intervals, CPU
usage increases but worst-case latencies improve; conversely, with
longer polling intervals, worst-case latencies increase but CPU usage
decreases.

A 5 ms interval offered a good balance between the two: we were able to
get close to the actual latencies (as achievable with irq) without
incurring too much CPU overhead, as seen in the experiments below, so
5 ms is chosen as the default polling interval.

- 1x 640x480@25 Encoding using different hrtimer polling intervals [2]
- 4x 1080p30 Transcode (File->decode->encode->file) irq vs polling
comparison [3]
- 1x 1080p Transcode (File->decode->encode->file) irq vs polling comparison
[4]
- 1080p60 Streaming use-case irq vs polling comparison [5]
- 1x 1080p30 sanity decode and encode tests [6]

The polling interval can also be changed via the vpu_poll_interval
module parameter, in case users want to tune it for their use-case while
keeping the above trade-off in mind.
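For example (assuming the driver module is named wave5; check the module
name produced by your kernel build), the interval could be set at load time
or, since the parameter is registered with 0644 permissions, at runtime via
sysfs:

```shell
# Load-time: poll every 10 ms instead of the 5 ms default
# (module name "wave5" is an assumption; adjust to your build)
modprobe wave5 vpu_poll_interval=10

# Runtime: module_param(..., 0644) exposes the knob in sysfs
echo 10 > /sys/module/wave5/parameters/vpu_poll_interval
```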

Parse the irq number and, if it is not present, initialize the hrtimer
and the polling worker thread before proceeding with the v4l2 device
registrations.

From the worker thread, iterate over the interrupt status for each
instance and send the completion event, as is done in the irq thread
function.

Move the core functionality of the irq thread function to a separate
function, wave5_vpu_handle_irq(), so that it can be used both by the
worker thread in polling mode and by the irq thread in interrupt mode.

Protect the hrtimer and the instance list with the device-specific mutex
to avoid race conditions when multiple encoder and decoder instances are
started together.

[1] https://www.ti.com/lit/pdf/spruj16
(Ref: Section 4.2.3.3 Resets, Interrupts, and Clocks)
[2] https://gist.github.com/devarsht/ee9664d3403d1212ef477a027b71896c
[3] https://gist.github.com/devarsht/3a58b4f201430dfc61697c7e224e74c2
[4] https://gist.github.com/devarsht/a6480f1f2cbdf8dd694d698309d81fb0
[5] https://gist.github.com/devarsht/44aaa4322454e85e01a8d65ac47c5edb
[6] https://gist.github.com/devarsht/2f956bcc6152dba728ce08cebdcebe1d

Signed-off-by: Devarsh Thakkar <devarsht@ti.com>
Tested-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>

Authored by Devarsh Thakkar; committed by Hans Verkuil (ed7276ed 4cece764)

+126 -46

drivers/media/platform/chips-media/wave5/wave5-helper.c (+15 -2)
···
 			char *name)
 {
 	struct vpu_instance *inst = wave5_to_vpu_inst(filp->private_data);
+	struct vpu_device *dev = inst->dev;
+	int ret = 0;

 	v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx);
 	if (inst->state != VPU_INST_STATE_NONE) {
 		u32 fail_res;
-		int ret;

 		ret = close_func(inst, &fail_res);
 		if (fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING) {
···
 	}

 	wave5_cleanup_instance(inst);
+	if (dev->irq < 0) {
+		ret = mutex_lock_interruptible(&dev->dev_lock);
+		if (ret)
+			return ret;

-	return 0;
+		if (list_empty(&dev->instances)) {
+			dev_dbg(dev->dev, "Disabling the hrtimer\n");
+			hrtimer_cancel(&dev->hrtimer);
+		}
+
+		mutex_unlock(&dev->dev_lock);
+	}
+
+	return ret;
 }

 int wave5_vpu_queue_init(void *priv, struct vb2_queue *src_vq, struct vb2_queue *dst_vq,
drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c (+12 -1)
···
 	v4l2_fh_add(&inst->v4l2_fh);

 	INIT_LIST_HEAD(&inst->list);
-	list_add_tail(&inst->list, &dev->instances);

 	inst->v4l2_m2m_dev = inst->dev->v4l2_m2m_dec_dev;
 	inst->v4l2_fh.m2m_ctx =
···
 	}

 	wave5_vdi_allocate_sram(inst->dev);
+
+	ret = mutex_lock_interruptible(&dev->dev_lock);
+	if (ret)
+		goto cleanup_inst;
+
+	if (dev->irq < 0 && !hrtimer_active(&dev->hrtimer) && list_empty(&dev->instances))
+		hrtimer_start(&dev->hrtimer, ns_to_ktime(dev->vpu_poll_interval * NSEC_PER_MSEC),
+			      HRTIMER_MODE_REL_PINNED);
+
+	list_add_tail(&inst->list, &dev->instances);
+
+	mutex_unlock(&dev->dev_lock);

 	return 0;
drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c (+12 -1)
···
 	v4l2_fh_add(&inst->v4l2_fh);

 	INIT_LIST_HEAD(&inst->list);
-	list_add_tail(&inst->list, &dev->instances);

 	inst->v4l2_m2m_dev = inst->dev->v4l2_m2m_enc_dev;
 	inst->v4l2_fh.m2m_ctx =
···
 	}

 	wave5_vdi_allocate_sram(inst->dev);
+
+	ret = mutex_lock_interruptible(&dev->dev_lock);
+	if (ret)
+		goto cleanup_inst;
+
+	if (dev->irq < 0 && !hrtimer_active(&dev->hrtimer) && list_empty(&dev->instances))
+		hrtimer_start(&dev->hrtimer, ns_to_ktime(dev->vpu_poll_interval * NSEC_PER_MSEC),
+			      HRTIMER_MODE_REL_PINNED);
+
+	list_add_tail(&inst->list, &dev->instances);
+
+	mutex_unlock(&dev->dev_lock);

 	return 0;
drivers/media/platform/chips-media/wave5/wave5-vpu.c (+83 -42)
···
 	const char *fw_name;
 };

+static int vpu_poll_interval = 5;
+module_param(vpu_poll_interval, int, 0644);
+
 int wave5_vpu_wait_interrupt(struct vpu_instance *inst, unsigned int timeout)
 {
 	int ret;
···
 	return 0;
 }

-static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id)
+static void wave5_vpu_handle_irq(void *dev_id)
 {
 	u32 seq_done;
 	u32 cmd_done;
···
 	struct vpu_instance *inst;
 	struct vpu_device *dev = dev_id;

-	if (wave5_vdi_read_register(dev, W5_VPU_VPU_INT_STS)) {
-		irq_reason = wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
-		wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, irq_reason);
-		wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1);
+	irq_reason = wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
+	wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, irq_reason);
+	wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1);

-		list_for_each_entry(inst, &dev->instances, list) {
-			seq_done = wave5_vdi_read_register(dev, W5_RET_SEQ_DONE_INSTANCE_INFO);
-			cmd_done = wave5_vdi_read_register(dev, W5_RET_QUEUE_CMD_DONE_INST);
+	list_for_each_entry(inst, &dev->instances, list) {
+		seq_done = wave5_vdi_read_register(dev, W5_RET_SEQ_DONE_INSTANCE_INFO);
+		cmd_done = wave5_vdi_read_register(dev, W5_RET_QUEUE_CMD_DONE_INST);

-			if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) ||
-			    irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) {
-				if (seq_done & BIT(inst->id)) {
-					seq_done &= ~BIT(inst->id);
-					wave5_vdi_write_register(dev, W5_RET_SEQ_DONE_INSTANCE_INFO,
-								 seq_done);
-					complete(&inst->irq_done);
-				}
+		if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) ||
+		    irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) {
+			if (seq_done & BIT(inst->id)) {
+				seq_done &= ~BIT(inst->id);
+				wave5_vdi_write_register(dev, W5_RET_SEQ_DONE_INSTANCE_INFO,
+							 seq_done);
+				complete(&inst->irq_done);
 			}
-
-			if (irq_reason & BIT(INT_WAVE5_DEC_PIC) ||
-			    irq_reason & BIT(INT_WAVE5_ENC_PIC)) {
-				if (cmd_done & BIT(inst->id)) {
-					cmd_done &= ~BIT(inst->id);
-					wave5_vdi_write_register(dev, W5_RET_QUEUE_CMD_DONE_INST,
-								 cmd_done);
-					inst->ops->finish_process(inst);
-				}
-			}
-
-			wave5_vpu_clear_interrupt(inst, irq_reason);
 		}
+
+		if (irq_reason & BIT(INT_WAVE5_DEC_PIC) ||
+		    irq_reason & BIT(INT_WAVE5_ENC_PIC)) {
+			if (cmd_done & BIT(inst->id)) {
+				cmd_done &= ~BIT(inst->id);
+				wave5_vdi_write_register(dev, W5_RET_QUEUE_CMD_DONE_INST,
+							 cmd_done);
+				inst->ops->finish_process(inst);
+			}
+		}
+
+		wave5_vpu_clear_interrupt(inst, irq_reason);
 	}
+}
+
+static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id)
+{
+	struct vpu_device *dev = dev_id;
+
+	if (wave5_vdi_read_register(dev, W5_VPU_VPU_INT_STS))
+		wave5_vpu_handle_irq(dev);

 	return IRQ_HANDLED;
+}
+
+static void wave5_vpu_irq_work_fn(struct kthread_work *work)
+{
+	struct vpu_device *dev = container_of(work, struct vpu_device, work);
+
+	if (wave5_vdi_read_register(dev, W5_VPU_VPU_INT_STS))
+		wave5_vpu_handle_irq(dev);
+}
+
+static enum hrtimer_restart wave5_vpu_timer_callback(struct hrtimer *timer)
+{
+	struct vpu_device *dev =
+		container_of(timer, struct vpu_device, hrtimer);
+
+	kthread_queue_work(dev->worker, &dev->work);
+	hrtimer_forward_now(timer, ns_to_ktime(vpu_poll_interval * NSEC_PER_MSEC));
+
+	return HRTIMER_RESTART;
 }

 static int wave5_vpu_load_firmware(struct device *dev, const char *fw_name,
···
 	}
 	dev->product = wave5_vpu_get_product_id(dev);

+	dev->irq = platform_get_irq(pdev, 0);
+	if (dev->irq < 0) {
+		dev_err(&pdev->dev, "failed to get irq resource, falling back to polling\n");
+		hrtimer_init(&dev->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+		dev->hrtimer.function = &wave5_vpu_timer_callback;
+		dev->worker = kthread_create_worker(0, "vpu_irq_thread");
+		if (IS_ERR(dev->worker)) {
+			dev_err(&pdev->dev, "failed to create vpu irq worker\n");
+			ret = PTR_ERR(dev->worker);
+			goto err_vdi_release;
+		}
+		dev->vpu_poll_interval = vpu_poll_interval;
+		kthread_init_work(&dev->work, wave5_vpu_irq_work_fn);
+	} else {
+		ret = devm_request_threaded_irq(&pdev->dev, dev->irq, NULL,
+						wave5_vpu_irq_thread, IRQF_ONESHOT, "vpu_irq", dev);
+		if (ret) {
+			dev_err(&pdev->dev, "Register interrupt handler, fail: %d\n", ret);
+			goto err_enc_unreg;
+		}
+	}
+
 	INIT_LIST_HEAD(&dev->instances);
 	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
 	if (ret) {
···
 		dev_err(&pdev->dev, "wave5_vpu_enc_register_device, fail: %d\n", ret);
 		goto err_dec_unreg;
 	}
-	}
-
-	dev->irq = platform_get_irq(pdev, 0);
-	if (dev->irq < 0) {
-		dev_err(&pdev->dev, "failed to get irq resource\n");
-		ret = -ENXIO;
-		goto err_enc_unreg;
-	}
-
-	ret = devm_request_threaded_irq(&pdev->dev, dev->irq, NULL,
-					wave5_vpu_irq_thread, IRQF_ONESHOT, "vpu_irq", dev);
-	if (ret) {
-		dev_err(&pdev->dev, "Register interrupt handler, fail: %d\n", ret);
-		goto err_enc_unreg;
 	}

 	ret = wave5_vpu_load_firmware(&pdev->dev, match_data->fw_name, &fw_revision);
···
 static void wave5_vpu_remove(struct platform_device *pdev)
 {
 	struct vpu_device *dev = dev_get_drvdata(&pdev->dev);
+
+	if (dev->irq < 0) {
+		kthread_destroy_worker(dev->worker);
+		hrtimer_cancel(&dev->hrtimer);
+	}

 	mutex_destroy(&dev->dev_lock);
 	mutex_destroy(&dev->hw_lock);
drivers/media/platform/chips-media/wave5/wave5-vpuapi.h (+4 -0)
···
 	u32 product_code;
 	struct ida inst_ida;
 	struct clk_bulk_data *clks;
+	struct hrtimer hrtimer;
+	struct kthread_work work;
+	struct kthread_worker *worker;
+	int vpu_poll_interval;
 	int num_clks;
 };