Introduction
In the first part, I stopped at the point where the RK3128 box could boot and run well enough to be useful as a small Linux machine. That was already good progress, but it still felt like a half-finished project. An old TV box that can only boot an old vendor kernel is interesting for a weekend; an old TV box that can run a newer kernel and a real application is much more fun.
This time I want to push it a bit further by upgrading the kernel to 6.6.89 and seeing what the box is actually capable of. The practical goal is simple: make it useful for something more than dropping into a shell and feeling satisfied that it still works. I do not expect miracles from an RK3128, but if it can run something like NanoClaw or PicoClaw reliably, then it is already much more than e-waste sitting in a drawer.
Like the first part, this is mainly a technical note for myself, not an installation guide. However, you may find it useful if you want to install or rebuild the kernel yourself.
TL;DR:
- Kernel release here: https://github.com/chieunhatnang-personal/linux-kernel-6.6-rk3128-tvbox/releases
- Armbian 26.2 image release here: https://github.com/chieunhatnang-personal/RK3128-Linux-SupportingScripts/releases/tag/kernel-6.6-armbian-26-v1.0
- How to install - the same as 4.4: https://chieunhatnang.de/p/building-armbian-for-rockchip-rk3128/#installation
Building kernel 6.6.89
Like with 4.4.194, I started from Rockchip’s vendor kernel tree: https://github.com/rockchip-linux/kernel/tree/develop-6.6. I did not start from mainline, because for this class of old TV box I wanted the path with fewer surprises. Rockchip had already carried a lot of SoC and board support there, so it was a better base for getting the box working first and cleaning things up later. Even RK322x, which is a bit newer and has had more community work around it, still ends up carrying both current and legacy kernel lines in Armbian instead of having one clean answer for everything. That was enough warning for me that RK3128 would probably be easier on Rockchip’s patched tree first.
My build environment was:
- Ubuntu 20.04
- Python 3
- C compiler: gcc-arm-10.3-2021.07-x86_64-arm-none-linux-gnueabihf
I took that toolchain from YuzukiHD’s bundle here: https://github.com/YuzukiHD/sunxi-bsp-toolchains/releases. The important part I want to remember is the compiler version itself: GCC 10.3. Old vendor kernel trees are still mostly a C portability problem, so when something suddenly stops building, the compiler version is one of the first things worth checking.
I also wrote a small build script so I would not have to remember the same build flags every time: https://github.com/chieunhatnang-personal/RK3128-Linux-SupportingScripts/blob/master/Kernel/6.6.89/build_kernel.sh
The following parts describe how I made the individual devices and subsystems work on kernel 6.6.89. Compared with the old 4.4.194 work, this was mostly porting known fixes forward, so it was still tedious, but much more straightforward than the first time. Some devices, such as eMMC, OTG, and the SD card, even worked on the first try after porting.
CPU
CPU DVFS on 6.6.89 was easier than Wi-Fi, but it was still worth writing down because the failure mode was misleading. The board could boot and even feel mostly fine, then randomly hang while switching between light load and heavier work. At first glance that looked like “some cpufreq governor problem” again. It was not.
The first useful split was to stop testing only in dynamic mode. I wanted to know whether the board was unstable at the top frequency itself, or only during transitions. The quickest commands that mattered were:
|
|
For static testing, the only mode I really needed to save was “pin everything at the highest OPP and see if it survives”:
|
|
That test was important because it showed the top end itself was not the problem. The board could sit at 1200 MHz and stay stable. The crashes happened when the kernel was allowed to step the CPU down again. So the real bug was not “RK3128 cannot do 1.2 GHz on mainline”. The real bug was that some of the lower OPPs were still too aggressive for RK3128.
The next useful thing to check was whether the kernel was already telling me that the OPP table and regulator did not agree:
|
|
That led back to the same lesson I had already learned on the old 4.4 tree: RK3128 is similar enough to RK322x to tempt me into copying DVFS data, but not similar enough that I can trust the lower voltages blindly. The default rk3128.dtsi table in this 6.6 tree still carried leakage-based variants and lower voltage points that were fine on paper for some chips, but not stable enough on this board.
So for rk3128-linux.dts, I stopped trying to be clever and made the board-specific table explicit:
- keep the CPU rail on the PWM regulator
- remove leakage-based voltage binning
- use fixed RK3128-safe voltages for every OPP
- drop the awkward 696 MHz point
The DTS changes that mattered were:
|
|
The CPU regulator itself also had to match that table. The useful part of the regulator node was simply that it stayed in a sane RK3128 range instead of the looser old vendor assumptions:
|
|
After that, dynamic mode finally behaved like it should. The board could idle low, climb under load, and return without the random freezes I had been seeing before. For quick dynamic testing, these were enough:
|
|
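One way to confirm that the CPU really steps down during the dynamic test is to look at per-frequency residency. A minimal sketch, assuming the kernel has CONFIG_CPU_FREQ_STAT so that /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state exists (one "<freq_kHz> <time>" line per OPP, time in 10 ms units). The helper name count_visited_freqs is hypothetical and is not part of my scripts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse a cpufreq-stats time_in_state dump ("<freq_kHz> <time>" per line,
 * time in 10 ms units). Returns how many frequencies have non-zero
 * residency and stores the lowest such frequency in *lowest_khz
 * (0 if no frequency was used at all).
 */
static size_t count_visited_freqs(const char *dump, unsigned long *lowest_khz)
{
    unsigned long khz, ticks;
    size_t visited = 0;
    int used;

    assert(dump != NULL && lowest_khz != NULL);
    *lowest_khz = 0;
    /* %n stores how many characters this sscanf call consumed. */
    while (sscanf(dump, "%lu %lu%n", &khz, &ticks, &used) == 2) {
        if (ticks > 0) {
            visited++;
            if (*lowest_khz == 0 || khz < *lowest_khz)
                *lowest_khz = khz;
        }
        dump += used; /* advance past this "<freq> <time>" pair */
    }
    return visited;
}
```

If only the 1200 MHz line ever grows, the transitions that used to hang the board have not been exercised yet, and the test says nothing about the lower OPPs.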
One small practical trap was still userspace. Just like on the old system, cpufrequtils could override whatever I thought the kernel default was. That is worth checking before blaming the DTS again:
|
|
For this board, the useful boot-time setting stayed simple:
|
|
So the conclusion I want to keep for myself is:
- use static 1200 MHz first to prove whether the top OPP is actually stable
- if static mode is fine but dynamic mode hangs, suspect the lower OPP voltages first
- on RK3128, fixed voltages were more reliable than leakage-binned ones
- the stable 6.6.89 CPU table for this board is 216 / 408 / 600 / 816 / 1008 / 1200 MHz with the raised voltages above
- once that table is correct, both dynamic and fixed-frequency modes are stable enough to stop thinking about CPU and move on to the more annoying parts
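The rules behind the final CPU table can also be written down as a quick sanity check. This is a hypothetical userspace helper, not kernel code: the frequency steps come from the table above, but the real voltages are in the board DTS and are not repeated here, so any voltages fed into it are placeholders. It only checks that frequency strictly increases, that voltage never drops as frequency rises, and that every voltage stays inside the regulator range:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One CPU operating point: frequency in MHz, voltage in microvolts. */
struct opp {
    unsigned int mhz;
    unsigned int microvolt;
};

/*
 * True if frequencies strictly increase, voltages never decrease as
 * frequency rises, and every voltage lies in [reg_min_uv, reg_max_uv].
 */
static bool opp_table_is_sane(const struct opp *t, size_t n,
                              unsigned int reg_min_uv,
                              unsigned int reg_max_uv)
{
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].microvolt < reg_min_uv || t[i].microvolt > reg_max_uv)
            return false;
        if (i > 0) {
            if (t[i].mhz <= t[i - 1].mhz)
                return false;             /* not sorted by frequency */
            if (t[i].microvolt < t[i - 1].microvolt)
                return false;             /* voltage dips at a higher OPP */
        }
    }
    return true;
}
```

A check like this does not prove stability, of course; it only catches the obvious table mistakes before spending time on freeze hunting.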
RAM
RAM tuning on 6.6.89 ended up being less straightforward than CPU DVFS. On the old 4.4.194 tree, I already had a set of fixed DDR overlays, so my first instinct was to carry the same idea forward: add more fixed frequencies and pick one per board. That turned out to be only half true.
The first thing worth remembering is that DMC must actually be enabled. If it is disabled in the live DT, the board simply reports 0 MHz RAM and there is no devfreq device to inspect. The minimal useful DTS change on this board was just:
|
|
With only that change, 6.6.89 already behaved much better than I first thought. I originally assumed the kernel would need a fixed dram_freq or a board-specific overlay just to get sensible RAM stepping, but that was wrong. On RK3128, simply enabling dmc is enough to let the firmware and driver settle on a working set of frequencies. The same kernel and the same DTB can still end up at different normal rates on different boards. On one board I saw 396 MHz, while on another one the board stayed happily at 456 MHz.
These were the commands that mattered most while checking the actual runtime behavior:
|
|
The useful part of trans_stat was not just the current frequency, but whether the board actually transitioned and where it stayed. On my stable board, 456 MHz turned out to be the important point. With the wrong voltage table, the board could boot and even look fine for a short time, then hang completely a few minutes later. Raising the DMC logic voltage made that behavior go away.
One confusing point during this work was the meaning of the frequencies in the DTS. They are controller clock values, not the marketing DDR names. So:
- 396 MHz is roughly DDR3-792, effectively the DDR3-800 class
- 456 MHz is roughly DDR3-912
- 528 MHz is roughly DDR3-1056, effectively the DDR3-1066 class
This also explains why the table does not match the usual DDR3 names exactly. The clock list is a Rockchip DMC operating table, not a JEDEC product list.
After checking the RK3128 datasheet more carefully, I decided to stop pretending the SoC should run arbitrarily high DDR clocks just because RK322x had more entries in its vendor tree. RK3128 officially supports up to DDR3/DDR3L-1066, which means about 533 MHz controller clock. So for this 6.6.89 tree I dropped all the experimental points above 528 MHz.
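The mapping from DMC clock to DDR3 name and the datasheet cut are simple arithmetic: DDR moves data on both clock edges, so the transfer rate in MT/s is twice the DMC clock in MHz. A small sketch, with hypothetical helper names, that converts the DTS values and drops anything above DDR3-1066:

```c
#include <assert.h>
#include <stddef.h>

/* DDR transfers on both clock edges, so MT/s = 2 x DMC clock in MHz. */
static unsigned int ddr_mts(unsigned int dmc_mhz)
{
    return 2u * dmc_mhz;
}

/*
 * Keep only the DMC points whose DDR rate is at or below max_mts
 * (1066 for the DDR3/DDR3L-1066 limit). Writes the survivors to out[]
 * in their original order and returns how many were kept.
 */
static size_t keep_within_spec(const unsigned int *dmc_mhz, size_t n,
                               unsigned int max_mts, unsigned int *out)
{
    size_t kept = 0;

    assert((dmc_mhz != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ddr_mts(dmc_mhz[i]) <= max_mts)
            out[kept++] = dmc_mhz[i];
    return kept;
}
```

For example, 528 MHz maps to 1056 MT/s and stays, while 600 MHz maps to 1200 MT/s and is dropped. 400 MHz passes this filter, so removing it was a separate judgment call (too close to 396 MHz), not a datasheet limit.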
Compared with the old 4.4 setup, I also removed a few frequencies that were not worth keeping:
- 400 MHz was too close to 396 MHz to justify another fixed overlay
- 600/666/700/786/800 MHz were removed because they are above RK3128’s documented memory spec
The resulting RK3128 DMC table is much smaller and easier to reason about:
- 300 MHz
- 330 MHz
- 396 MHz
- 456 MHz
- 528 MHz
For boards that cannot use DMC at all, I still keep a separate rk3128-dmc-disabled overlay. For boards that are stable with DMC enabled, the base DTS now does the right thing more often than not, and the fixed overlays are only there when I want to force a specific rate for testing.
The final voltage table I kept for the useful higher points was:
- 456 MHz -> 1.20 V
- 528 MHz -> 1.25 V
I also raised 300 MHz to 1.05 V, because some boards rejected the old 1.025 V OPP against the logic regulator floor.
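The 1.025 V rejection is the kind of mismatch the kernel OPP core reports when an OPP voltage is outside what the regulator can supply; as far as I understand it, such OPPs are marked unavailable rather than failing the whole table. A simplified userspace model of that behavior, with hypothetical names. The actual logic-regulator floor on these boards is not given here, so any number used for it is a placeholder:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One DMC operating point: frequency in MHz, voltage in microvolts. */
struct dmc_opp {
    unsigned int mhz;
    unsigned int microvolt;
    bool available;
};

/*
 * Mark each OPP unavailable when its voltage falls outside the
 * regulator range [reg_min_uv, reg_max_uv]. Returns how many OPPs
 * remain usable.
 */
static size_t apply_regulator_limits(struct dmc_opp *t, size_t n,
                                     unsigned int reg_min_uv,
                                     unsigned int reg_max_uv)
{
    size_t usable = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        t[i].available = t[i].microvolt >= reg_min_uv &&
                         t[i].microvolt <= reg_max_uv;
        if (t[i].available)
            usable++;
    }
    return usable;
}
```

The point of the model is the failure mode: the board still boots, but with one step silently missing from the table, which is easy to overlook unless the OPP warnings in dmesg are checked.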
So the practical conclusion for myself is:
- do not force dram_freq in the base RK3128 DTS unless I am testing something very specific
- let firmware pick the workable rate once dmc is enabled
- keep only the DDR points that are both useful and inside the RK3128 datasheet range
- if a board hangs after enabling DMC, check the voltage table before assuming the frequency itself is impossible
GPU & VPU
GPU
I spent quite a bit of time here because “the box boots a GUI” and “the box has a usable media stack” are not the same thing at all on RK3128.
For the GPU, I decided very early that I did not want to revive the old vendor mali400 userspace stack. Rockchip’s old Mali path belongs to the same generation of BSP code as the old Android kernels: it expects a matching kernel driver, a matching binary userspace blob, and usually a matching X11/EGL setup as well. That can still be made to work on the old 4.4 world, but on 6.6.89 it is exactly the kind of dependency trap I wanted to avoid.
Lima is much less glamorous, but it matches the goal of this kernel much better:
- it is already in Mesa
- it works with the DRM stack that kernel 6.6 already expects
- it does not depend on a vendor userspace blob
- it is the path that Debian and Armbian can actually carry forward
So in practice, I chose the maintainable answer over the theoretically more “official” one. For this box, that meant Lima instead of chasing the legacy mali400 blob.
The runtime checks worth remembering were:
|
|
The useful part was not the benchmark number itself, but confirming that the renderer was really hardware-accelerated:
|
|
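The renderer check boils down to making sure the string does not name one of Mesa's software rasterizers. A minimal sketch, assuming the input is the renderer line from something like glxinfo -B. The helper name is hypothetical and the list of software renderers is not exhaustive:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return false if a GL renderer string names a Mesa software rasterizer,
 * true otherwise. Only a heuristic: it knows llvmpipe and softpipe.
 */
static bool renderer_is_hardware(const char *renderer)
{
    static const char *const software[] = { "llvmpipe", "softpipe" };

    assert(renderer != NULL);
    for (size_t i = 0; i < sizeof(software) / sizeof(software[0]); i++)
        if (strstr(renderer, software[i]) != NULL)
            return false;
    return true;
}
```

On Lima I would expect a Mali400-style name here. If llvmpipe shows up instead, any glmark2 number measures the CPU, not the GPU.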
My quick glmark2 sanity check on the build scene gave:
|
|
That is not a “fast GPU” result. It only means the GPU path is alive and not falling back to software rendering. For an RK3128, that is already the important answer. The box can do basic X11 acceleration and light GLES work, but it is still an old Mali-400 class GPU. It is fine for a simple desktop or utility UI. It is not a serious modern media or browser machine.
RKMPP
The first VPU path I tried was Rockchip’s own MPP route, because on paper it should be the most natural fit for a Rockchip box. The problem is that the usable part of RKMPP is not just a kernel toggle. It also needs the matching userspace stack.
If I ever want to revisit it, the software stack I would start from is:
- libmpp / Rockchip MPP userspace
- GStreamer with the Rockchip plugin, especially mppvideodec
- optionally a player built on top of that, for example gst-play-1.0
The command worth saving for checking the plugin side is:
|
|
And the most direct playback path to test is not VLC, but a plain GStreamer pipeline:
|
|
This is the important note I want to keep for myself: if I ever try RKMPP again, I should start from GStreamer first, not from VLC. On this box and this userspace mix, VLC was not a useful diagnostic tool.
In my case, I stopped the RKMPP path because it was not stable enough to justify more time. The GStreamer Rockchip MPP plugin was crashing or hanging, and once that happens the whole stack becomes difficult to reason about: I no longer know whether the problem is in kernel media support, the plugin, the display sink, or the old Rockchip assumptions still embedded in the userspace pieces.
So my practical conclusion for RKMPP on this project was:
- it is probably the better path in theory for old Rockchip media blocks
- it needs more Rockchip-specific userspace than I wanted to carry
- on this 6.6.89 setup, I did not get it to a stable playback state
Hantro / V4L2
Since RKMPP was not getting anywhere, I tried the newer stateless V4L2 path with Hantro. This was more interesting technically, because it is much closer to the Linux media stack I actually want to use on a newer kernel. Unfortunately, it was also much more fragile than I expected on RK3128.
The basic kernel config that mattered was:
|
|
That alone was not enough. On this board I also needed:
- a local GRF quirk in drivers/media/platform/verisilicon/hantro_drv.c
- matching fields in drivers/media/platform/verisilicon/hantro.h
- an overlay to describe the RK3128 VPU block in a way the Hantro driver would accept
The overlay I kept is rk3128-v4l2-hantro.dts. The important idea was not just “enable Hantro”, but also get the old vendor video blocks out of the way and describe the codec block explicitly. The final overlay also needed reset lines and a small GRF initialization hook.
The useful part to remember is not the normal kernel build itself, but the boot-time wiring after the DTBs are generated. The files that mattered were:
- rk3128-linux.dtb
- rk3128-v4l2-hantro.dtbo
- rk3128-cma-64m.dtbo
and in /boot/armbianEnv.txt:
|
|
The cma-64m part matters. For normal headless use, I keep the base DTB lean and leave CMA at 16 MiB. But for any real video decode test, 16 MiB is too small. I ended up splitting this on purpose:
- base rk3128-linux.dtb: lean, for server use
- rk3128-cma-64m.dtbo: only enable when testing video/display paths
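A rough way to see why 16 MiB is tight is to count how many decoded frames fit. This sketch assumes NV12 output (12 bits per pixel) with both dimensions padded to 16-pixel macroblocks, and it ignores everything else allocated from CMA (per-frame codec metadata, display buffers, other drivers), so real usage is higher. The function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define MIB (1024u * 1024u)

/* Round v up to the next multiple of a (a must be a power of two). */
static size_t align_up(size_t v, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Bytes for one NV12 frame with both dimensions padded to 16 pixels. */
static size_t nv12_frame_bytes(size_t width, size_t height)
{
    size_t w = align_up(width, 16), h = align_up(height, 16);

    return w * h * 3 / 2; /* full-size Y plane + half-size interleaved CbCr */
}

/* How many such frames fit in a CMA area, ignoring all other users. */
static size_t frames_in_cma(size_t cma_bytes, size_t width, size_t height)
{
    return cma_bytes / nv12_frame_bytes(width, height);
}
```

By this estimate a 1080p stream (padded to 1920x1088) gets about 5 frames out of 16 MiB and about 21 out of 64 MiB. A decoder needs several reference frames plus buffers queued for display, so 16 MiB runs out quickly even before other CMA users are counted.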
Once the board booted with the Hantro overlay, the most useful checks were:
|
|
For decode itself, this was the cleanest test:
|
|
And for visible playback, this was the best path I found:
|
|
The important result was subtle:
- the board exposed /dev/video0 and /dev/video1
- v4l2slh264dec appeared normally in GStreamer
- decode IRQ counters increased, so the VPU was really doing work
So the Hantro path was not fake. The decoder actually ran. But the overall playback result was still disappointing:
- the board originally hung during boot until I disabled vpu_mmu
- visible playback on X11 was still fragile
- gst-play-1.0 and VLC were not reliable indicators for this path
- decode could work while presentation still looked stuck
So Hantro got further than RKMPP in one specific sense: I could prove that real decode jobs reached the VPU. But it still did not become a clean “install player, open movie, done” result.
GPU & VPU conclusion
For GPU, Lima was clearly the right choice. It is maintainable, already in Mesa, and good enough for the kind of light desktop use this box can realistically handle.
For VPU, I tried both directions I could think of:
- RKMPP, which should fit old Rockchip hardware better in theory
- Hantro, which fits the newer Linux media model better
RKMPP wanted too much vendor-specific userspace and still did not behave well. Hantro was technically more promising, but even after getting real decode activity, the visible playback path remained rough and board-specific.
So the short conclusion I want to keep for myself is this: even after trying the obvious GPU/VPU routes, video processing on RK3128 is still limited. The board is useful as a small Linux machine or headless server. It is not a box I would choose for a polished modern video playback setup.
Wi-Fi
ESP8089
ESP8089 was the first Wi-Fi chip I wanted to bring back on kernel 6.6.89, mostly because it was already working in my 4.4.194 tree. I did not want to start from a random upstream version again. The working driver source in my old tree already contained the fixes I had collected while making the 4.4 kernel work, so the cleanest starting point was to copy that exact driver into the new kernel and then make the surrounding 6.6 platform code behave like the old Rockchip tree.
The overlay was also important. This board family can use different Wi-Fi chips, so I did not put the ESP8089 wiring into the base rk3128-linux.dts. I kept it in arch/arm/boot/dts/rockchip/overlay/rk3128-wlan-esp8089.dts instead.
The important properties are:
|
|
There were two details here that were easy to miss:
- wifi_sdio_host = <&sdio>; is needed because the Rockchip RFKILL glue has to know which MMC host belongs to the Wi-Fi chip.
- rockchip,no-dmaengine; is only useful if the dw_mmc driver really honors it before enabling IDMAC or another DMA engine.
The first real runtime problem was card detect and re-enumeration. ESP8089 does not behave like a simple non-removable SDIO card that appears once and stays there. During the power-up and firmware loading sequence, the chip can disappear and reappear. The old Rockchip 4.4 code had board-specific glue for this, but the 6.6 tree did not yet have the equivalent behavior.
The fix I kept was in net/rfkill/rfkill-wlan.c: parse the wifi_sdio_host phandle, resolve it to the real mmc_host, and make rockchip_wifi_set_carddetect() actually rescan or detach the SDIO host. The important debug strings to search for were:
|
|
The second runtime problem was DMA. The overlay already had rockchip,no-dmaengine, but the 6.6 dw_mmc path still enabled DMA before this property had any effect. The symptom was that the chip could enumerate, but firmware download became unstable or timed out. The fix was to make drivers/mmc/host/dw_mmc.c skip DMA setup immediately when the device has that property:
|
|
This was the missing part. After that, mmc1 stayed in PIO mode and the ESP8089 firmware download became stable.
There was one more problem after the interface looked alive: large packets were not stable. This was easy to miss because small pings, DHCP, and simple TCP connects could work. The board associated with the AP, got an address, and even looked normal in iw, but SSH from another machine could hang during key exchange:
|
|
The useful test was not just “can I ping the board”, but “can I pass full-size packets in both directions”. With the original driver behavior, small packets like 512 bytes worked, while larger packets around 1000 to 1472 bytes were very lossy or failed completely:
|
|
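The 1472-byte figure is not arbitrary: it is the largest ICMP echo payload that fits in one unfragmented IPv4 packet at MTU 1500 (20 bytes of IPv4 header without options, plus 8 bytes of ICMP echo header). A tiny sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

#define IPV4_HDR_LEN 20 /* IPv4 header without options */
#define ICMP_HDR_LEN 8  /* ICMP echo request header */

/* Largest ping payload that still fits in one IPv4 packet of size mtu. */
static int max_ping_payload(int mtu)
{
    assert(mtu >= IPV4_HDR_LEN + ICMP_HDR_LEN);
    return mtu - IPV4_HDR_LEN - ICMP_HDR_LEN;
}
```

With iputils ping, adding -M do forbids fragmentation, so something like ping -M do -s 1472 <host> sends exactly one full-size frame per echo instead of letting the kernel split it.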
I first kept the machine usable with a low-MTU workaround:
|
|
That made SSH and Docker usable again, but it was not a driver fix. It only avoided the frame sizes that exposed the bug.
The better compromise was to keep 802.11n enabled, but stop using the riskier parts of the old ESP8089 HT setup on this RK3128 SDIO path. The driver changes I kept were:
- keep disable_ht=0 by default, so Wi-Fi N is still advertised
- keep TX and RX AMPDU disabled by default
- only set mac80211 AMPDU_AGGREGATION support when at least one AMPDU direction is enabled
- advertise conservative HT20 long-GI capability instead of the old 0x116C capability mask
- do not advertise short GI, STBC, HT40, or large A-MSDU related capability bits
- advertise IEEE80211_HT_MAX_AMPDU_8K instead of 16K
- add an ht_mcs_mask module parameter so I can test subsets of MCS0-7 without editing code
- keep disable_ht=1 as an emergency fallback if a board behaves worse
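To see what the old 0x116C mask actually claimed, it helps to decode it against the HT Capabilities Info bit layout. The bit values below match the IEEE80211_HT_CAP_* constants in the kernel's linux/ieee80211.h. The struct and function names are hypothetical, and this does not reproduce the new mask the driver now uses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HT Capabilities Info bits (same values as IEEE80211_HT_CAP_*). */
#define HT_CAP_SUP_WIDTH_20_40 0x0002u
#define HT_CAP_SM_PS           0x000Cu
#define HT_CAP_SGI_20          0x0020u
#define HT_CAP_SGI_40          0x0040u
#define HT_CAP_TX_STBC         0x0080u
#define HT_CAP_RX_STBC         0x0300u
#define HT_CAP_MAX_AMSDU       0x0800u
#define HT_CAP_DSSSCCK40       0x1000u

struct ht_cap_summary {
    bool width_20_40;     /* claims 40 MHz channel support */
    bool sgi_20, sgi_40;  /* short guard interval at 20 / 40 MHz */
    bool tx_stbc;
    unsigned int rx_stbc; /* RX STBC streams, 0..3 */
    bool max_amsdu_7935;  /* set: 7935-byte A-MSDU, clear: 3839 */
    bool dsss_cck_40;     /* DSSS/CCK allowed in 40 MHz */
    unsigned int sm_ps;   /* SM power save mode, 3 = disabled */
};

static struct ht_cap_summary decode_ht_cap(uint16_t cap)
{
    struct ht_cap_summary s = {
        .width_20_40    = (cap & HT_CAP_SUP_WIDTH_20_40) != 0,
        .sgi_20         = (cap & HT_CAP_SGI_20) != 0,
        .sgi_40         = (cap & HT_CAP_SGI_40) != 0,
        .tx_stbc        = (cap & HT_CAP_TX_STBC) != 0,
        .rx_stbc        = (cap & HT_CAP_RX_STBC) >> 8,
        .max_amsdu_7935 = (cap & HT_CAP_MAX_AMSDU) != 0,
        .dsss_cck_40    = (cap & HT_CAP_DSSSCCK40) != 0,
        .sm_ps          = (cap & HT_CAP_SM_PS) >> 2,
    };
    return s;
}
```

Decoding 0x116C gives short GI at 20 and 40 MHz, RX STBC for one stream, DSSS/CCK in 40 MHz, and SM power save disabled. Most of those are bits the conservative setup above stops advertising.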
The files that mattered were:
|
|
I also fixed a host-side SDIO safety issue in sdio_sif_esp.c: the temporary DMA bounce buffer used for unaligned SDIO transfers was only 2048 bytes. That was too small for a full SIP transaction, so I changed it to SIP_PKT_MAX_LEN and made oversized bounced transfers fail with -EMSGSIZE instead of copying past the temporary buffer. I do not know if that was the only cause of the large-packet problem, but it was clearly unsafe and worth fixing.
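The bounce-buffer fix follows a common pattern: size the temporary buffer for the largest legal transfer, and refuse anything larger instead of copying past the end. A minimal userspace sketch. BOUNCE_BUF_LEN is a placeholder for SIP_PKT_MAX_LEN (its real value is not shown here), and the struct and function names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Placeholder for SIP_PKT_MAX_LEN; the driver's real value may differ. */
#define BOUNCE_BUF_LEN 4096

struct bounce {
    unsigned char buf[BOUNCE_BUF_LEN];
};

/*
 * Stage an unaligned transfer in the bounce buffer. Oversized requests
 * fail with -EMSGSIZE instead of overrunning buf.
 */
static int bounce_in(struct bounce *b, const void *src, size_t len)
{
    assert(b != NULL && (src != NULL || len == 0));
    if (len > sizeof(b->buf))
        return -EMSGSIZE;
    if (len > 0)
        memcpy(b->buf, src, len);
    return 0;
}
```

Failing loudly here turns a silent memory corruption into a visible transfer error, which is much easier to trace back to its cause.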
After rebuilding and booting the new kernel, I restored normal MTU before testing:
|
|
The module parameters showed the intended state:
|
|
|
|
And iw confirmed that Wi-Fi N was still active:
|
|
Example result:
|
|
The AP side later showed both directions at 65.0 MBit/s MCS 7:
|
|
|
|
The important result was not the headline link speed. It was that full-size packets became stable again at normal MTU:
|
|
This means the chip is still using 802.11n, but in the conservative single-stream HT20 long-GI mode. That tops out at 65 MBit/s for MCS7. It is lower than the 72.2 MBit/s short-GI rate I had seen before, and much lower than the marketing 300 MBit/s people associate with two-stream HT40 Wi-Fi N. For this board, 65 MBit/s with full-size packets working is the practical win.
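The 65 / 72.2 / 300 MBit/s figures fall out of the 802.11n rate formula: data bits per OFDM symbol divided by symbol time. HT20 uses 52 data subcarriers and HT40 uses 108; a symbol lasts 4.0 us with the long guard interval and 3.6 us with the short one. A sketch that reproduces the single-stream MCS0-7 table (the function name is hypothetical):

```c
#include <assert.h>

/* Coded bits per subcarrier and coding rate for HT MCS 0..7. */
static const unsigned int bits_per_sc[8] = { 1, 2, 2, 4, 4, 6, 6, 6 };
static const unsigned int rate_num[8]    = { 1, 1, 3, 1, 3, 2, 3, 5 };
static const unsigned int rate_den[8]    = { 2, 2, 4, 2, 4, 3, 4, 6 };

/*
 * PHY rate in Mbit/s for HT MCS 0..7 with nss spatial streams.
 * Symbol time is kept in tenths of a microsecond to stay exact.
 */
static double ht_rate_mbps(unsigned int mcs, int ht40, int short_gi,
                           unsigned int nss)
{
    assert(mcs < 8 && nss >= 1);
    unsigned int subcarriers = ht40 ? 108 : 52;
    unsigned int bits = subcarriers * bits_per_sc[mcs] * rate_num[mcs]
                        / rate_den[mcs];        /* data bits per symbol */
    unsigned int symbol_tenths_us = short_gi ? 36 : 40;

    return (double)(bits * nss) * 10.0 / symbol_tenths_us;
}
```

MCS7 at HT20 with the long GI gives 65.0, the short GI gives about 72.2, and two streams at HT40 with the short GI give 300.0, which is the number printed on the box.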
At the end, wlan0 appeared automatically, associated with my Wi-Fi network, and the board was reachable on both LAN and Wi-Fi. The useful lesson was that the 4.4 driver source was not the hard part this time. The real porting work was restoring the Rockchip platform behavior around it: RFKILL-controlled SDIO re-enumeration, a correct wifi_sdio_host link in the overlay, and forcing the RK3128 SDIO host to PIO by making rockchip,no-dmaengine actually work in dw_mmc.
SSV6051
SSV6051 was much harder than ESP8089. The annoying part was not getting the chip to enumerate. It was that every stage looked half-correct:
- SDIO card detect worked
- the module probed
- wlan0 appeared
- sometimes authentication even worked
but the driver still blew up later in ways that looked unrelated.
The first bad sign came very early, before association was even reliable:
|
|
That turned out to be a mac80211 integration problem, not a real memory shortage. The old driver assumptions were stale enough that the 6.6 mac80211 path did not like them. After cleaning up the build glue and adding wake_tx_queue, probe became normal.
The next failure mode was SDIO register access:
|
|
At first this produced full kernel warning backtraces because the driver wrapped these transient failures in WARN_ON(). That made the logs noisy but did not explain the cause. Two fixes were useful here:
- keep the SDIO host on PIO with rockchip,no-dmaengine
- align the tiny SDIO register I/O buffers in hwif/sdio/sdio.c
The second one was surprisingly important. The old code used small stack buffers for register reads and writes. On this board, that was one of the places where the 6.6 port was less tolerant than the old 4.4 environment.
I kept these commands around during this stage:
|
|
After the SDIO changes, the driver stopped exploding immediately, but the remaining failure was more confusing. The board could associate, then wpa_supplicant or NetworkManager would start a scan or reconnect path, and the driver would fall back into the same -110 storm.
The clues that mattered were:
|
|
From the logs, I found that the old driver was doing three things that were no longer safe on 6.6:
- rewriting the old 11b CCA registers during software scan
- rewriting the basic rate table in BSS_CHANGED_BASIC_RATES
- allowing mac80211 off-channel retunes while already associated
The first two were inherited from the old vendor driver and made sense in the original environment, but here they destabilized the chip during scan and reconnect. The third one was more subtle: mac80211 on 6.6 was much more willing to do off-channel work than the old stack, and SSV6051 did not like being paused and retuned in the middle of that path.
So the real runtime fixes were:
- do not touch the legacy 11b CCA registers in sw_scan_start()
- skip the old BSS_CHANGED_BASIC_RATES rewrite entirely
- skip off-channel channel switches while already associated
That got the driver much closer, but it still was not really fixed. The most misleading stage was when association looked successful:
|
|
and then, a few seconds later:
|
|
That was the important turning point. At that point the problem was no longer ordinary scan handling. The connection was already up. The link died right after PTK/GTK install. So the next suspect was the hardware crypto path.
To confirm that, I watched both kernel and supplicant logs around the failure:
|
|
The pattern was very consistent:
- association succeeded
- PTK/GTK install completed
- beacon loss happened almost immediately
- key teardown started
- then the SDIO -110 storm began
The practical fix was to stop using the legacy hardware security path and fall back to software crypto in ssv6051-wifi.cfg:
|
|
Then reload and verify:
|
|
This was the first version that actually behaved like a working driver instead of a half-working one. The link stayed up after key install, held idle for more than a minute, and passed traffic:
|
|
So the useful lesson from SSV6051 was not “copy the driver and it works”. It was:
- PIO is still required on the SDIO host
- the legacy scan-time PHY tweaks are not safe on 6.6
- the old basic-rate and off-channel behavior had to be tamed for newer mac80211
- the legacy hardware crypto path was the last unstable piece, and software crypto was the practical fix
I split the final patchset like this:
|
|
Realtek 8189E/F
The Realtek RTL8189ES and RTL8189FS drivers were much less painful than SSV6051. I first copied the old 4.4 vendor drivers into the 6.6 tree just to get a baseline build, but that was only useful as a temporary checkpoint. The better result came from replacing both of them with the maintained trees from jwrdegoede:
- rtl8189es: https://github.com/jwrdegoede/rtl8189ES_linux
- rtl8189fs: https://github.com/jwrdegoede/rtl8189ES_linux/tree/rtl8189fs
After dropping those into my rockchip_wlan tree, the in-tree build fixes were fairly ordinary: Kconfig cleanup, include path fixes, and a few adjustments for the kernel 6.6 build environment. The only non-obvious runtime problem was that rtl8189fs created an extra virtual interface such as wlan1, which was not useful on this box and only made the logs noisier.
The part worth remembering was forcing the driver to build as a single-interface device:
|
|
Then the checks I kept using were:
|
|
That was enough to confirm that the extra interface was gone and only wlan0 remained.
Once that was fixed, RTL8189FS worked well enough without much drama. The board associated normally, traffic was stable, and I was seeing roughly 25 Mbps, which is already more than good enough for this old RK3128 box. I also disabled the very noisy Realtek debug logging afterwards, but that was cleanup rather than a functional fix. The useful lesson here was simply that for kernel 6.6, the maintained jwrdegoede trees were a much better base than trying to carry the old 4.4 vendor port forward.
RKNAND improvement
The rknand work was one of those parts where the system looked usable at first, but not trustworthy enough to boot from NAND for real. The board could boot from USB rescue mode, /proc/rknand did not show a rapidly growing bad block count, and my simple rknand_test.sh checks all came back good. But after putting the system on rknand_root, installing a few normal packages such as git and ntfs-3g, and rebooting, the filesystem would sometimes come back asking for fsck.
That was the wrong kind of failure: not an obvious “NAND is dead” failure, but a quiet reliability problem. The FTL counters looked mostly calm:
|
|
The first improvement I tried earlier was a bad_nand mode, but in practice it did not explain the corruption. The more useful direction was to stop guessing and make the block driver more honest about what the kernel was asking it to do.
The first real clue came when formatting rknand_root as ext4 failed. The kernel reported something like:
|
|
At first that looked like a failed write to sector 0. But /proc/rknand still did not show an FTL write/program failure. That mismatch mattered. If the FTL did not report a write error, then the “sector 0 write error” might be a block-layer error being reported in a misleading way.
So I added temporary-but-useful counters and request logging to drivers/rk_nand/rk_nand_blk.c:
- total read/write/flush/discard counters from the Linux block driver
- FTL read/write/discard error counters
- missing-device, bounds, access, and unsupported-operation counters
- ratelimited request logging with operation, flags, sector, length, partition offset, capacity, and FTL return code
After booting that instrumented kernel, the failure became clear:
|
|
The actual failing request was not a data write. It was a pure block flush request. On kernel 6.6, the block layer can send a flush request without req->part. My first 6.6 port had mapped requests back to the NAND device through req->part->bd_disk. That worked for normal read/write requests, but failed for pure flush requests. The driver returned BLK_STS_IOERR, and the block layer surfaced that as a confusing sector-0 write error.
The fix was to resolve the disk from req->q->disk when req->part is not available, and to handle REQ_OP_FLUSH before doing any sector math:
|
|
Then a pure flush becomes exactly what it should be: write back the Rockchip FTL cache and return success.
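The order of decisions in the fixed request path can be modeled in plain C. This is not the driver: the types, the STS_* codes, and the counter that stands in for rk_ftl_cache_write_back() are hypothetical substitutes for struct request, BLK_STS_* and the FTL call. It only shows why the disk lookup needs a fallback when no partition is attached, and why flush is handled before any sector math:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's request ops and status codes. */
enum op { OP_READ, OP_WRITE, OP_FLUSH, OP_DISCARD };
enum status { STS_OK, STS_IOERR, STS_NOTSUPP };

struct disk_model {
    unsigned int cache_writebacks; /* stand-in for FTL cache write-backs */
};

struct part_model {
    struct disk_model *disk;
};

struct request_model {
    enum op op;
    struct part_model *part;   /* may be NULL for a pure flush */
    struct disk_model *q_disk; /* plays the role of req->q->disk */
};

static enum status handle_request(struct request_model *rq)
{
    struct disk_model *disk;

    assert(rq != NULL);
    /* Resolve the disk without assuming a partition is attached. */
    disk = rq->part ? rq->part->disk : rq->q_disk;
    if (disk == NULL)
        return STS_IOERR;

    /* A pure flush has no data and no sector range: write back, succeed. */
    if (rq->op == OP_FLUSH) {
        disk->cache_writebacks++;
        return STS_OK;
    }

    /* Discard is hidden from filesystems; refuse it if it arrives anyway. */
    if (rq->op == OP_DISCARD)
        return STS_NOTSUPP;

    /* Reads and writes would do the sector math and call the FTL here. */
    return STS_OK;
}
```

In this model, dropping the q_disk fallback makes every partition-less flush return STS_IOERR, which is the same shape as the misleading sector-0 write error.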
I also advertised writeback-cache support to the block layer:
|
|
That is important because filesystems need a real flush path if the FTL keeps data in its own cache. After this change, rknand_root showed:
|
|
|
|
One related trap was discard. The old driver advertised discard and passed REQ_OP_DISCARD into FtlDiscard(). I do not trust that path yet. The FTL discard behavior is old Rockchip code, and my corruption symptoms were already too close to “the FTL accepted something but did not preserve what I expected”. So for now I intentionally hide discard from filesystems:
|
|
With that, ext4 may print:
|
|
That warning is acceptable. It is much better than silently exercising an FTL discard path I have not validated. I also changed unexpected discard requests to return BLK_STS_NOTSUPP instead of passing them into the FTL anyway.
The commands that stayed useful during this work were:
|
|
After the fix, the quick checks looked like this:
|
|
The important parts there are write back for the cache mode and 0 for both discard limits. That means the kernel knows the device needs flushes, while filesystems should not issue discard to this driver.
The driver-side counters also started giving useful boot-session information:
|
|
And the boot log became boring in the right way. The filesystem can still mention discard, because my root mount options still had discard, but there was no longer a fake sector-0 flush failure:
|
|
For formatting tests:
|
|
After the flush fix, I kept hardening the driver in small, separate changes. The important ideas were:
- pure flush requests should only call rk_ftl_cache_write_back() once, not once for REQ_PREFLUSH and again for REQ_OP_FLUSH
- the defensive bounds check should validate the full request length, not only the current segment length
- del_gendisk() needs disk->queue to remain valid on kernel 6.6, so the driver must not set gd->queue = NULL before calling it
- kthread_run() for the GC thread must be checked, otherwise the driver can expose disks without the background FTL maintenance thread
- /proc/rknand should not write directly into seq_file internals; the FTL dump is now staged into a temporary buffer and emitted with seq_write()
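The bounds-check point is easy to get subtly wrong. A sketch of a check that validates the whole request and cannot wrap around on large values. The function name is hypothetical, and sectors are plain 64-bit counts here rather than the kernel's sector_t:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if sectors [start, start + nr_sectors) fit inside a device of
 * `capacity` sectors. Comparing against capacity - start avoids the
 * overflow that start + nr_sectors could hit for bogus values.
 */
static bool request_in_bounds(uint64_t start, uint64_t nr_sectors,
                              uint64_t capacity)
{
    if (start > capacity)
        return false;
    return nr_sectors <= capacity - start;
}
```

A request that starts 8 sectors before the end passes if only its first 8-sector segment is checked, but a 16-sector request from the same start has to fail, which is why the full length matters.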
One small note about /proc/rknand: it now mixes two different kinds of counters. The lines from the Rockchip FTL dump still use the old typo:
|
|
Those values come from rknand_proc_ftlread() inside the FTL blob. They look like lifetime or FTL-maintained accumulated stats. The driver-side counters I added are separate and start from boot:
|
|
That distinction is useful. The FTL values tell me about long-term NAND/FTL history. The driver values tell me what this Linux boot has submitted through the block layer.
After a few days of normal use, this version looks much better. I am not calling the NAND story finished forever, but the failure mode that made ext4 formatting and normal package installation unreliable is fixed. The next thing I would still be careful about is discard. Until I have a dedicated discard stress test and a way to prove the old FTL path is safe, keeping discard disabled is the right tradeoff for this box.
Building Armbian image
Like with Armbian 22.02, I again reused an RK322x image instead of building a full Armbian release from scratch. The goal was the same as before: keep the userspace that already works, then replace the board-specific pieces one by one. The archived RK322x images I used as references were here:
- https://k-space.ee.armbian.com/archive/rk322x-box/archive/
- https://mirror.yandex.ru/mirrors/armbian/archive/rk322x-box/archive/
The base image I picked was Armbian_24.2.5_Rk322x-box_bookworm_current_6.6.22_minimal.img.xz.
That image already contains its own RK322x U-Boot in the first 4 MiB. I did not want that, because this board still needs the RK3128-specific boot chain (Loader, parameter, U-Boot, Trust) described earlier. So I treated the Armbian image only as a root filesystem donor and stripped it down to a filesystem image.
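Stripping the first 4 MiB only yields a clean filesystem if the root partition starts exactly there, which is 4 MiB / 512 bytes = 8192 sectors. Below is a minimal sketch for checking that. It assumes the image uses a classic MBR partition table with the root filesystem in the first entry. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SECTOR_SIZE 512u
#define SKIP_BYTES (4u * 1024u * 1024u)          /* the 4 MiB boot area */
#define SKIP_SECTORS (SKIP_BYTES / SECTOR_SIZE)  /* 8192 sectors */

/* Read the first 512-byte sector of an image file. */
static bool read_first_sector(const char *path, uint8_t sector[SECTOR_SIZE])
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return false;
    size_t got = fread(sector, 1, SECTOR_SIZE, f);
    fclose(f);
    return got == SECTOR_SIZE;
}

/* Get the start LBA of MBR partition entry 0. Returns false if the
 * 0x55 0xAA boot signature is missing. */
static bool mbr_first_partition_lba(const uint8_t sector[SECTOR_SIZE], uint32_t *lba)
{
    assert(sector != NULL && lba != NULL);
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return false;
    const uint8_t *e = sector + 446; /* first 16-byte partition entry */
    /* The start LBA is a little-endian 32-bit value at offset 8 in the entry. */
    *lba = (uint32_t)e[8] | (uint32_t)e[9] << 8 |
           (uint32_t)e[10] << 16 | (uint32_t)e[11] << 24;
    return true;
}
```

If the reported LBA equals SKIP_SECTORS, cutting off the first 4 MiB leaves exactly the partition contents.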
I wrote a small helper script for that workflow here: https://github.com/chieunhatnang-personal/RK3128-Linux-SupportingScripts/blob/master/Armbian/24/rk3128-armbian-cooking.sh
|
|
The c mode mounts the image, strips the first 4 MiB if needed, copies qemu-arm-static, and drops me into a chroot. Everything below in this section is done inside that chroot unless noted otherwise.
Removing the RK322x kernel and updating the base system
Because I reused the RK322x image as the starting point, the system still identified itself as rk322x-box and still carried the RK322x kernel packages. For the 6.6.89 setup, that was the first thing to clean up; otherwise, a later apt upgrade could easily reinstall the wrong kernel or boot packages.
The quick check I kept using was:
|
|
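The board identity lives in /etc/armbian-release, which is plain KEY=VALUE text. Below is a minimal sketch of reading one key. It assumes single-line values that may be wrapped in double quotes, and release_value() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Look up `key` in KEY=VALUE text and copy its value, without surrounding
 * double quotes or the newline, into `out`. */
static bool release_value(const char *text, const char *key, char *out, size_t out_len)
{
    assert(text && key && out && out_len > 0);
    size_t klen = strlen(key);
    const char *line = text;
    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        /* Match "KEY=" exactly, so BOARD does not match BOARD_NAME. */
        if (len > klen && strncmp(line, key, klen) == 0 && line[klen] == '=') {
            const char *v = line + klen + 1;
            size_t vlen = len - klen - 1;
            if (vlen >= 2 && v[0] == '"' && v[vlen - 1] == '"') {
                v++;
                vlen -= 2;
            }
            if (vlen >= out_len)
                return false;
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return true;
        }
        if (!eol)
            break;
        line = eol + 1;
    }
    return false;
}
```

Assuming the identity is stored under BOARD, the unmodified donor image would be expected to return rk322x-box here.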
The packages I wanted to get rid of were:
- linux-u-boot-rk322x-box-current
- linux-image-current-rockchip
- linux-dtb-current-rockchip
- armbian-bsp-cli-rk322x-box-current
Before removing the BSP package, I downloaded a copy of it so I could rebuild it later under the RK3128 name:
|
|
To make sure the stock RK322x kernel packages would never come back during upgrades, I pinned them:
|
|
After that, I upgraded the base system and trimmed the APT metadata a bit so the image would stay smaller:
|
|
Installing the RK3128 kernel packages
For this kernel tree, I built Debian packages directly from my helper script:
|
|
At the time of this build, the useful outputs were:
- linux-libc-dev_6.6.89-rk3128-47_armhf.deb
- linux-image-6.6.89-rk3128+_6.6.89-rk3128-47_armhf.deb
- linux-headers-6.6.89-rk3128+_6.6.89-rk3128-47_armhf.deb
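Those file names follow the usual Debian <package>_<version>_<arch>.deb pattern. Debian package names and versions cannot contain underscores, so splitting on the first two underscores is enough. Here is a minimal sketch with a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct deb_name {
    char package[64];
    char version[64];
    char arch[16];
};

/* Split "<package>_<version>_<arch>.deb" into its three fields. */
static bool split_deb_filename(const char *file, struct deb_name *out)
{
    assert(file != NULL && out != NULL);
    const char *u1 = strchr(file, '_');
    const char *u2 = u1 ? strchr(u1 + 1, '_') : NULL;
    const char *dot = u2 ? strstr(u2 + 1, ".deb") : NULL;
    if (!dot || dot[4] != '\0') /* must end in ".deb" */
        return false;
    size_t plen = (size_t)(u1 - file);
    size_t vlen = (size_t)(u2 - u1 - 1);
    size_t alen = (size_t)(dot - u2 - 1);
    if (plen == 0 || vlen == 0 || alen == 0 ||
        plen >= sizeof(out->package) || vlen >= sizeof(out->version) ||
        alen >= sizeof(out->arch))
        return false;
    memcpy(out->package, file, plen);
    out->package[plen] = '\0';
    memcpy(out->version, u1 + 1, vlen);
    out->version[vlen] = '\0';
    memcpy(out->arch, u2 + 1, alen);
    out->arch[alen] = '\0';
    return true;
}
```

For the image package, this gives linux-image-6.6.89-rk3128+ as the package name, 6.6.89-rk3128-47 as the version and armhf as the architecture.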
After copying those three .deb files into /root inside the image, I installed them with:
|
|
Repacking the BSP package for RK3128
The remaining RK322x-specific package was armbian-bsp-cli-rk322x-box-current. That package is small, but it matters because it owns the first-login experience, board identity, and part of the boot configuration. I did not want the image to keep pretending it was an RK322x box, so I rebuilt that package as armbian-bsp-cli-rk3128-box-current.
First, unpack it:
|
|
Then I changed the package identity and replaced the RK322x-specific payload pieces. The files that mattered were:
- DEBIAN/control
- DEBIAN/preinst
- DEBIAN/postinst
- DEBIAN/postrm
- DEBIAN/changelog
- rootfs/etc/armbian-release
- rootfs/etc/armbian.txt
- rootfs/usr/share/armbian/boot.cmd
- rootfs/usr/share/armbian/armbianEnv.txt
The payload changes worth remembering were:
|
|
The important identity values became:
|
|
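The Package field in DEBIAN/control is the core of that identity change. Below is a minimal sketch of rewriting one single-line control field in memory. It assumes the field has no continuation lines. The function name is hypothetical, and this is not the exact procedure used for the package.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copy DEBIAN/control text from `in` to `out`, replacing the value of the
 * first single-line field named `field`. Continuation lines are not handled.
 * Returns false if the field is missing or `out` is too small. */
static bool replace_control_field(const char *in, const char *field,
                                  const char *value, char *out, size_t out_len)
{
    assert(in && field && value && out && out_len > 0);
    size_t flen = strlen(field), used = 0;
    bool found = false;
    out[0] = '\0';
    for (const char *line = in; *line != '\0';) {
        const char *eol = strchr(line, '\n');
        /* Line length including its trailing newline, if any. */
        size_t len = eol ? (size_t)(eol - line) + 1 : strlen(line);
        int n;
        if (!found && len > flen && strncmp(line, field, flen) == 0 && line[flen] == ':') {
            n = snprintf(out + used, out_len - used, "%s: %s\n", field, value);
            found = true;
        } else {
            n = snprintf(out + used, out_len - used, "%.*s", (int)len, line);
        }
        if (n < 0 || (size_t)n >= out_len - used)
            return false; /* output would be truncated */
        used += (size_t)n;
        line += len;
    }
    return found;
}
```

Calling it with "Package" and "armbian-bsp-cli-rk3128-box-current" leaves every other line untouched.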
I also changed rootfs/usr/lib/armbian/armbian-hardware-optimization so BOOT_SOC=rk3128 goes through its own optimization path instead of reusing the RK322x IRQ-affinity tuning. That old tuning was specific enough that I did not want it to run on RK3128 by accident.
After changing the payload, I rebuilt md5sums, restored the control files into the package tree, and built the new .deb:
|
|
I also renamed the documentation directory to rootfs/usr/share/doc/armbian-bsp-cli-rk3128-box-current and removed the original copied RK322x .deb from the package payload, just to keep the package metadata sane.
Other tweaks
The other cleanup I wanted to keep was forcing the image back to a predictable English UTF-8 locale:
|
|
Finalizing the rootfs image
After leaving the chroot shell, I just unmounted the image and shrank it again:
|
|
The result is a plain filesystem image, not a full disk image. That is intentional. It gets written to a partition, while the RK3128 boot pieces still get written separately as described earlier.
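Telling the two kinds of image apart is easy from the first 2 KiB. Below is a minimal sketch that assumes the rootfs is ext4, which is Armbian's usual root filesystem. ext2/3/4 keeps its superblock at byte 1024, with the magic 0xEF53 stored little-endian at offset 0x38. A disk image with an MBR ends sector 0 with 0x55 0xAA. FAT boot sectors also end with 0x55 0xAA, so the second test is only a heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum image_kind { IMAGE_UNKNOWN, IMAGE_EXT_FS, IMAGE_MBR_DISK };

/* Classify an image from its first bytes (at least 2 KiB recommended). */
static enum image_kind classify_image(const uint8_t *head, size_t len)
{
    assert(head != NULL);
    /* ext2/3/4: superblock at 1024, s_magic (0xEF53, little-endian) at +0x38. */
    if (len >= 1024 + 0x38 + 2 &&
        head[1024 + 0x38] == 0x53 && head[1024 + 0x39] == 0xEF)
        return IMAGE_EXT_FS;
    /* Full disk image: MBR boot signature at the end of sector 0. */
    if (len >= 512 && head[510] == 0x55 && head[511] == 0xAA)
        return IMAGE_MBR_DISK;
    return IMAGE_UNKNOWN;
}
```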
Conclusion
At this point, the board is running a fairly decent 6.6.89 kernel together with a much more up-to-date Armbian userspace. That alone already makes it far more useful than the earlier “it boots, so I guess it works” stage.
More importantly, the system is not just newer on paper. After fixing the board-specific pieces one by one, it is also quite stable in normal use. For an RK3128 TV box this old, that is already a good result.
Resources
I published all of my work on GitHub:
- Kernel sources forked from Rockchip, with my modifications for RK3128: https://github.com/chieunhatnang-personal/linux-kernel-6.6-rk3128-tvbox/
- Supporting scripts, such as the build script and the script for making a boot SD card: https://github.com/chieunhatnang-personal/RK3128-Linux-SupportingScripts