86 posts from 2026
decided to take a break and document properly some workflows and stuff that i've been working on lately:
newsonapaper - daily printed out newspaper from RSS feeds (this is more challenging than you think, mainly arranging feeds and images dynamically and responsively on a paper is HARDD)
rssitall - fetch via html & playwright content from a page (for the previous proj), was going to go the way of using an LLM but the playwright fallback is surprisingly competent
scroll - uses a receipt printer to print out RSS feeds as they come in (summaries), or just print crosswords/game on demand, can also be linked to gmail to auto print mail as they come in, not much use but just fun, supports webhooks too
time4LLMs - read a bunch of research on giving temporal senses to LLMs, implemented some entropy based backtracking (if entropy is too high, then we delete the last few tokens generated, apply a bias, and then continue from some point), adding pondering |PAUSE| tokens, allowing models themselves to erase tokens & backtrack, KV-cache aging (recent context dominates, old details fade unless rementioned, model experiences "time passing")
my science fair project that i have not started on, idea is for a 3d printer that prints its own buildplate, or lowk i could just do a non planar 3d printing algorithm, i think that might be easier to do with the limited time i have left
playing with Moiré/Barrier Grid animations, trying to make a 0-9 digit counter that works by using a series of masks that when rotated reveal a digit (and making a customizable generator that allows you to display X distinct images)
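a minimal sketch of the entropy check behind that backtracking idea, not the actual time4LLMs code. the threshold, the rewind length and the function names are all made up for illustration, and the "apply a bias" step is left to the caller (the removed tokens are returned so they can be penalized on the retry).

```python
import math

def shannon_entropy(probs):
    """Entropy in bits of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def backtrack_if_uncertain(tokens, next_probs, threshold=2.0, rewind=3):
    """If the model looks too unsure about the next token, drop the last few tokens.

    Returns (kept, removed). `removed` is empty when no backtracking happened.
    threshold (bits) and rewind (token count) are hypothetical defaults.
    """
    if shannon_entropy(next_probs) <= threshold:
        return list(tokens), []
    cut = max(0, len(tokens) - rewind)
    return list(tokens[:cut]), list(tokens[cut:])

# a flat distribution over 8 tokens has 3 bits of entropy, so it triggers a rewind
kept, removed = backtrack_if_uncertain(["a", "b", "c", "d", "e"], [1 / 8] * 8)
```

in a real decoding loop the caller would resume generation from `kept` with a logit penalty on `removed`, which is the part this sketch skips.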
i also want to send codex xhigh on yabai and see if it can fix the mac issue where theres no way for AX or window managers to distinguish tabs vs windows in a mac os native tab app
and then we have the stuff that i've been working on for what seems like forever at this point:
Excalidash, a whiteboard for excalidraw - just finished adding sharing permissions and multiuser support, next steps are focusing on performance, making storing images more efficient, migrating from SQLite -> Postgres+Redis, blah blah blah
Modelr - the easy to use image -> 3d workflow for beginners. the app is just a mess, i think it's better now, but i haven't had much time to play with it now. the DX on it is horrible since i've neglected to do a lot of DX work. it could be ready for a beta now, i've been using it for a lot of stuff and it's been great.
my website/domain -> my web blog is still not open source and is a blatant copy of simon willison's which i never seem to attribute properly. i want to setup coolify but i think thats overcomplicated, i dont deploy via GUI anyways. but i do need to create a skill for codex to deploy my websites when i have ideas that i need to ship. for static sites i use vercel and for things that need a backend i tell codex on my server to setup the caddy/authentik if needed whenever.
ouroboros -> a mac os audio routing alternative to loopback, not much work on it since i havent had the need for awhile
a mac audio recorder app with transcription and exporting so i can close my mac (while ofc forcing it not to sleep) and record lectures and then summarize them
a mac app that does better than my raycast extension doorstopper at preventing my mac from going to sleep/lock when my lid is closed
picoAgent -> a picoCTF solver/assistant workflow that auto sets up and scaffolds environments for you (e.g. downloads, extracts disk images/zips, scaffolds request code) to make it easy for you to get started on picoCTF problems. working on a full agentic version that will solve problems agentically in parallel and handle errors nicely by sending alerts to you, but i dont have time for this and it is not in the spirit of the competition im guessing, plus doing problems is just fun
running an LLM on a calculator was a project i did in like two days when i had school off and never picked it back up. its basically done i just need to document a lot of the stuff and maybe make a video or post about it. theres some cool serial communication capabilities that you could do with it like peer to peer chatting. but everyone on r/TiNSPIRE is just asking "is there a single .tns file i can use to use chatgpt on my calculator no glue no broax"
there's this guy in my reddit DMs asking if i ever got SAM3D working (i've had it done since December but i really dont like SAM that much, Hunyuan 2.1/2mini is much better and runs natively on AS) but i've been putting off responding to him, idk why. i will after i post this. there's also another guy in my DMs and I forgot what he was asking
oh and Writeahead, an open source alternative to Cotypist. Writeahead is predictive AI autocomplete that (should) work everywhere you type. it used to be my benchmark for how well LLMs could reverse engineer complex private APIs and handle flakiness across apps and it seems Codex 5.3 high has mostly solved it, though not completely, but across most text fields it completes fine. just never had the time to finish the actual, text completion part yk...
also have some ideas that i havent touched yet:
- making qr codes scannable in barcode format
- irl noise cancellation environment, or like a more localized env where you shout into a hollow open tube and theres a speaker on the other side that cancels it out
- better eye tracking for the web, or just an eye tracking app for documents to see where people look or where you subconsciously focus on the most
- a little yubikey nano format device that plugs into the usb port and has an LED ring around it and functions also as a security key but also has a microphone so it can record stuff while my mac's lid is closed and can indicate status of whatever's running on my mac or battery as its charging with lid closed?? just sounds cool
- program that does PCB art generation where u use SMD to make artwork by using different colored components, could even be tactile
- laser toolhead attachment where you can go over a 3d printed surface with lasers and fuse the layers better or just have a smoother surface????? theres a lot of non planar work i wanna do i just dont have time for all this
I have built an insane amount of stuff in the past three days, most still incomplete, but the dopamine rush is unreal. it feels like a good measurement of the advancement of technology has always been how much it increases the rate at which we can deliver dopamine to end users.
mb promised release sunday uhm hopefully can get it out today
Thought it was network issues, spacing issues and font differences that were causing the blank white lines but it turns out the default was set to escpos transport and not cups-png???
drawing and crosshatching both have their advantages. but drawing is much more consistent across many more images.
definitely needed xhigh for this
giving up on crosshatching, too inconsistent and not detailed. i think i will do pencil drawing style now, downsizing to 600 dots resolution will cut computation time by a lot hopefully, right now averaging 5+ seconds per image
getting some banding issues, either A. i broke the printer or B. im still hitting thermal limits, judging by the banding at the bottom i fear it might be A....
also spent an INSANE amount of time trying to figure out why it would suddenly drop connection when printing images. originally thought it was that the processor wasnt powerful enough to handle pure bitmaps or that the images exceeded memory or something so i had to find the driver for this printer to get CUPS/PDF/PNG support instead of just RAW and then rasterizetopos working and even tried different cables and suspected it was a power issue... but no, it dropped connections midway because if you try to activate all of its heaters/lasers it would self shutdown.... wonderful
you can just build things...faster???
was struggling with applying crosshatch to images (for a random side project where my thermal printer would be overloaded when printing a lot of black areas, so i needed to lighten them up), and just had spark you know, give it a shot, did it in less than the time it took for me to upload an image to a website and see how the website did it. insane.
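rough sketch of the "lighten the black areas" idea, assuming the image is already a 1-bit bitmap (list of rows, 1 = black dot). this is a simplification and not the code that was actually generated: it hatches every black pixel with two diagonal line families instead of picking hatch density from tone, and `spacing` is a made-up knob.

```python
def crosshatch(bitmap, spacing=4):
    """Keep a black pixel only where it falls on one of two diagonal line families.

    Solid black regions become a diagonal crosshatch, so far fewer heater
    elements fire per row. White pixels are never turned black.
    """
    out = []
    for y, row in enumerate(bitmap):
        out.append([
            1 if px and ((x + y) % spacing == 0 or (x - y) % spacing == 0) else 0
            for x, px in enumerate(row)
        ])
    return out

def black_ratio(bitmap):
    """Fraction of dots that are black, a crude proxy for thermal load."""
    total = sum(len(row) for row in bitmap)
    return sum(map(sum, bitmap)) / total if total else 0.0

solid = [[1] * 8 for _ in range(8)]
hatched = crosshatch(solid)  # black_ratio drops from 1.0 to 0.375
```

a larger `spacing` gives sparser lines and a lighter print, at the cost of detail.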
picoCTF 2023 was something else. Unforgotten Bits is a truly evil problem.
haha
for reference i solved picoMini (13 questions vs 41 for picoCTF) in 4hrs with no AI assistance. would have ranked 4th globally by time solved but they rank by absolute time and not relative diff between starting competition and solving (i did not start right when problems dropped). i did get faster at solving easy problems but I think the majority of that 4x speedup is attributed to codex.
ran it on a few more problems, and holy crap its insane, and yes, i checked, it didnt just search up writeups online (explicitly told it that it was forbidden, and i think that was enough). there were a few problems that it abandoned (5) out of 39 total problems given.
ran the harness autonomously on pico bank, a problem that took me ~2hrs to solve, with the first 30min-1hr just setting up an android virtual machine and a packet inspector on it. codex did static analysis and got the flag in less than 20 minutes, incredible. (agentic tools in general, not sure if it is a codex vs claude etc distinction)
working alongside codex 5.3 xhigh in a custom harness with specific CTF skills, managed to solve the entirety of picoCTF 2025 in 6 hours, including time needed to write the harness and skills (beating first place record of nearly 12 hours with a team of five (humans)). writeup coming after the upcoming comp.
a short outline: treat codex as a research intern, got it started on the hardest problems at the start, ask it to brainstorm some possible triaging methods. in the meantime i cleared all easy/half of medium in ~2 hours. then, asked it for some help on the tougher medium questions, finished them in another 45 minutes. The last 1-2 hours were focused on the hardest hard problems (secure email service and ricochet). Codex needed serious guidance in ricochet, did not understand the crypto at all (alongside bad spatial understanding, could not navigate the maze). for these harder ones, treated it as mainly a code writer (tell it to implement algorithms) since it can type much much faster than me.
i think picoCTF 2026 will have much more visual based reasoning and interactives (similar to ricochet) to prevent people just blatantly using agentic tools to solve all these problems. can comfortably say that a commonly available model like GLM 5 or G3F can 100% solve all the easy/low-medium questions (as long as it isn't used in antigravity).
if we wanted to be boring and not have any human in the loop, am confident it could solve the entire thing in a (slightly) modified harness in less than 2.5 hrs, it would have solved the easy problems wayy faster than me. give it playwright (or even, just manually feed the problems in), and game over. or even better, give it the easy problems and the human manually works on the hard problems from the start <- this might be the best strategy.
PCB gen for art? interesting idea
PairVPN is magical and amazing software. Shame that it's not OSS and you have to coordinate with someone else's server to negotiate and connect your devices
slight improvement!
really impressed with CODEX plus use limits. having a hard time exhausting the 5 hour limit (2-3 sessions running for 6+ hours today continuously). the weekly limit though...
hosted version of excalidash when?? might consider it! convex backend should be good enough. definitely need to create the pointer-image system
codex is so much better than opus its not even close
can confirm, 5.3 still sucks at UI compared to base gemini 3 even with frontend design skills
really liking the codex app, it's a bit laggy, but still very useful, i think i prefer it over CLI tools now...ironic
also done on the convex migration (1hr 45min). also looks promising
5.2 high managed to get the PRs done! let's see how good it is
i have to say i still much prefer CLI tools over GUI/IDEs for agentic workflows? i totally believe that GUIs will be the future for harnesses like CC and Codex, but something about just being able to open a new tab or sigkill the current instance and just be in a terminal just feels like i have more control. there's nothing i can do in an IDE/GUI other than hitting stop, it just feels helpless (yes i can open a terminal, but it doesnt feel the same)
letting it run overnight, lets see if it can close all my PRs haha
(also testing out the codex app)
something you would never see Opus saying
5.2 high migrating everything to convex 🤩 (its been 1.5hrs so far)
i think it will work first try
really proud of the settings page preview animation that shows a sloppy stylization of the drawn path, really cool
just had 5.2 high code up an openclaw alternative.
i just want it to:
- respond to my signal messages
- keep a memory file (might be worth implementing semantic search or vector based search)
- heartbeat, i.e. agent can set an alarm for it to wake up, default is to wake up every 30 minutes and do stuff
- shell access
- ability to edit itself/its own config (temp migration of the agent onto another place, and then the agent is free to edit its config/harness, and then agent gets transferred back to its own harness)
- browser use
will test it out tomorrow!
well, we got there. not bad, only 20 minutes left
been using 5.2 high for a day
1. the CLI is SOOO much better than CC and Gemini its not even close. super fast and responsive. does lack a lot of features like ctrl+o to inspect, etc, etc.
2. as expected, reasons A LOT more than gemini or opus models, average session is ~2-3min for a quick fix and easily 30+ minutes for multi stage changes, not a quick model. having it refactor an entire website from firebase to convex right now, running non stop for nearly an hour now.
3. with 3-4 sessions I'm not hitting my hourly limit at all? i'm sure i could hit it if I tried, but surprising as this is just the $20 tier.
4. context compression is nicely done and doesnt leave me waiting for 2 minutes while opus tries to compact its tiny context
so far, everything it changed at least builds and is solid, definitely prefer it over opus.
just bought GPT $20/mo plan (free first month). the 5hr limits are surprisingly fine, already 30% done with the weekly cap. don't have access to 5.3 on the $20/mo.
after using 4.6 for a bit, it feels like opus is finally catching up to GPT level reasoning, definitely is a long reasoning model. never had opus sustain a session for over 20+ minutes before.
restarted my server and forgot to update my bind mount so my auth was down all day 😦
quite disappointed with openclaw. does not live up to the hype. one of the buggiest UI's I've ever used and BY FAR the most confusing setup process of any app or program. everything is vibe coded to oblivion and it doesnt seem like any effort was made to at least review the code. it would be easier and better to just code your own assistant and give it the exact tools it wants. might just go and do that.
might have just been the model. switched to gemini 3 flash ~120TPS vs ~30TPS for Opus. feels much better
like what??...
just installed openclaw. it has to be one of the WORST put together tools I have ever used (worse than antigravity-tools, which is actually somewhat useful). too many configuration options. are we all installing the same openclaw?? no way im getting any work done in this interface. chat lags out half the time, maybe it'll get better
Critical server infrastructure went down bc I unplugged the charging cable and exhausted the UPS (the battery)
had a creative name :)
this is hilarious. my charger is only 86W, but my laptop can draw 120W+ no problem. so running training for an extended period means that it slowly drains the laptop as it's plugged in, starting off the day with 8% isn't great
pretty spot on
I wager 80% word accuracy and 95% character accuracy by the time I wake up (in 6hrs) (~16min/epoch)
seriously, claude??
epoch 1/50 btw. was going to train on colab but realized i'm faster than an A100 training locally even with CPU loss calculations. this is going to take ~10ish hours, just so I can predict words by drawing strokes 🥲
genuinely wtf how does it know??
🙂 finding ways to improve the algorithm, but already it is quite unbelievable (blind swiping, algo does not see keyboard, only stroke shape, has to match using DTW to guessed words)
sentence mode implemented!
added a WPM counter for physical and actual speed b/c why not. next step is to add support for pausing and continuing to type sentences
i'm addicted to this
one of the coolest things i've built in a while
swipetype.zimengxiong.com
had to rewrite it in rust b/c python was too slow! also so we can compile to WASM and serve as a static HTML site. only downside is you have to download ~4MB of word lists on startup, so I can prob host the word list on github or something so I dont incur additional server costs.
using dynamic time warping, we can compare the path taken to the optimal path for a word and generate the predicted word list! soo cool
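for reference, a small sketch of that comparison using textbook DTW. this is not the swipetype code (that one is in rust): the key coordinates are a made-up unit-grid QWERTY layout, and the "optimal path" here is assumed to be just the key centers of the word in order.

```python
import math

# hypothetical layout: one unit per key, rows staggered like a typical keyboard
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
OFFSETS = [0.0, 0.5, 1.5]
KEYS = {
    ch: (x + off, float(y))
    for y, (row, off) in enumerate(zip(ROWS, OFFSETS))
    for x, ch in enumerate(row)
}

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping over 2D points."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def ideal_path(word):
    """Key centers visited when swiping a word perfectly (assumed definition)."""
    return [KEYS[ch] for ch in word]

def rank_words(stroke, words):
    """Sort candidate words by DTW distance between the stroke and each ideal path."""
    return sorted(words, key=lambda w: dtw_distance(stroke, ideal_path(w)))

# a slightly sloppy swipe over c -> a -> t
stroke = [(3.4, 2.1), (0.6, 1.0), (3.9, 0.1)]
ranked = rank_words(stroke, ["cut", "car", "cat"])
```

a real stroke has far more sample points than the word has letters, which is exactly the mismatch DTW is built to handle: several stroke points can align to the same key.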
gemini conductor is criminally underrated, nothing comes close to the quality of output you get when the agent asks you for clarifications (and then listens to them)
what if we had swipe typing but for computers?? /conductor:setup is getting tired of me
let's see if CC+conductor can one shot this
thanks claude! for switching my "static" site to SSR so that the client has to 1/ wait for the backend to fetch data from convex 2/ get data from the backend then finally 3/ fetch data again from convex by itself to populate the site...
adding onto why agent orchestrators are dumb, this is probably the best article i've read on using agents by peter steinberger: steipete.me/posts/just-talk-to-it. the idea is to stop trying to abstract things away thinking if you give the agent more instructions, more tools, more capabilities it will magically get better. i like to discuss back and forth with claude like an actual person before asking it to change any code.
agent orchestrators are a dumb idea, especially those that move you to a webUI and abstract everything away. if all you do is build very simple websites and whatnot sure, it might work for you. but if you're actually doing challenging things that you want to be involved in, orchestrators provide terrible insight into agentic actions and dont allow you to intervene easily, because that's their job as abstractors. until models get better, i'll still be in the terminal or IDE alongside claude.
every day that I'm on reddit.com/r/macapps I'm surprised how oblivious people are and how much market there is for crappy tools for which there are MUCH better OS alternatives, never thought the technical barrier for installing apps off github or setting up a docker server would be so high.
i can't be calling it easy b/c if it was easy everyone would do it, much respect for somehow finding this product market fit and getting people to pay for this?? for a Tauri app with a PYTHON backend charging $25 for a PDF CONVERTER and IMAGE CONVERSION TOOLS??
PDF and image conversion has been built into macOS since Mac OS X 10.0 (that WAS RELEASED in 2001), and there are much better, more mature alternatives like Stirling PDF
if you want experience build something small really really well, or build something very novel scrappily (people are willing to overlook friction if your thing does something nothing else does), but PLEASE don't build something extremely common poorly, thats just sloppy.
i dont really see the use in clawdbot. what could i possibly want it to do? email? i get a notification every time one comes in and if its not important i just dont respond to it. having it scroll twitter or reddit?? whats the fun in that, scrolling is the fun part. main thing i can think of is using it as a claude code or agentic coding orchestrator, might give it a shot sometime.
really tempted to just drop yabai for a week and see how that goes. do i really need a window manager? i've been starting to use tabs in ghostty more and more thanks to claude code (one tab for Claude to use, one for myself). might go back to rectangle and see how that goes. will miss a lot of things. actually, i could just disable window managing but still keep space management and moving windows around?? as if every window was always toggled float.
every time i see someone w/o a window manager manually dragging their windows around i feel bad for them, i have not dragged a window in years
arxiv.org/pdf/2510.15061
this could be done much more effectively with diffusion models i think. just like how image models are trained by removing noise from a picture of just noise, we can train a model to identify what to remove from generated slop compared to a not so sloppy model to remove slop from text.
always been surprising that LoRAs haven't gone mainstream for LLMs like they did for image gen models. I guess it makes sense b/c there are just too many different model architectures out there for a standard LoRA, unlike image gen where there is just mainly SD1.5 finetunes. might try to build a framework for training a tone/voice styling LoRA to mimic speech style.
to do this, we need instruction/response pairs. its much harder to get instructions than responses, so we can use a small model to generate instructions for a given snippet of text, i.e. training pair: (instruction, raw_text), and then train on a base model or something...
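sketch of the data prep step described here, under assumptions: `generate_instruction` is a placeholder for whatever small model writes the instruction, and `toy_instruction` is a dumb stand-in so the example runs without a model. the dict keys are hypothetical, not a required training format.

```python
from typing import Callable, Dict, Iterable, List

def build_pairs(
    snippets: Iterable[str],
    generate_instruction: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Turn raw text snippets into (instruction, response) training pairs.

    The raw text is kept verbatim as the response, so the style being learned
    comes from the real corpus and only the instruction is synthetic.
    """
    pairs = []
    for text in snippets:
        text = text.strip()
        if not text:
            continue  # skip empty snippets
        pairs.append({"instruction": generate_instruction(text), "response": text})
    return pairs

def toy_instruction(text: str) -> str:
    """Stand-in for the small model: builds a prompt from the first few words."""
    opening = " ".join(text.split()[:5])
    return f"Write a short post in my voice that starts like: {opening}"

pairs = build_pairs(["  had to rewrite it in rust b/c python was too slow  ", ""], toy_instruction)
```

swapping `toy_instruction` for a real model call is the only change needed; the pairing logic stays the same.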
arxiv.org/pdf/2402.04401v1
arxiv.org/pdf/2407.18078v1
arxiv.org/pdf/2410.12757v1
arxiv.org/pdf/2402.01618
what language you prefer is starting to matter less than what language models are good at. you can ship so much faster by just choosing a framework that the models know better than one you know better. convex is gold for this because its schema and config live within your project and frontend, literally deployed a full event registration website in under an hour with oauth, asynchronous payment handling, admin tools, waitlist and pricing/discount engine using convex+react. insane times we live in.
there is a lack of harnesses and tools for models to use wrt 3D Modeling and PCB Design frankly b/c models aren't strong enough for them yet. 3D work and PCB design requires HUGE amounts of spatial understanding, and right now, the model with the best spatial reasoning (leagues ahead) fails to even call functions half the time correctly
arxiv.org/pdf/2412.07825
one of the major downsides with niche benchmarks is that they get outdated so fast and become meaningless (unless you decide to run it yourself) - and the authors ofc don't have the time/resources to constantly keep their benchmarks up to date
trying to get g3p to design a YubiKey 5C Nano clone. not even close. not sure if a skill would help here or we just need more powerful models
sonnet 4.5 is one of the laziest models i've ever seen, rivaled only by gemini 3 flash. tried to move my entire database to some other random deployment it found in the projects directory.
somehow still passed USACO despite their servers going down. my last two submissions were graded ~15 minutes AFTER contest end. one 100% passed, the other TLE'd on 50% of them, estimated ~833/1000
USACO needs better servers. Cannot access or submit any problems at all. Website does not load. Even when I could, grading was taking 10-15 minutes per submission. Not passing this one 😦
last day of Claude Max sub (expiring midnight today)... time to give ChatGPT Pro a shot? not sure how Opus 4.5 compares to 5.2 high/xhigh. ChatGPT Pro might be worth it just for access to 5.2 Pro though. Will consider options after usaco tomorrow, have to get up earlyish.
my linear algebra class often makes us cite the theorem number for each theorem we use - i often need a fast way to search semantically 70+ PDFs and look up theorem numbers and proofs, might put claude onto this and see how it does, make a quick website
have yet to find an LLM that can accurately replicate style and voice given a large corpus of text. why is no one making the equivalent of voice cloning for TTS models for LLMs? seems obvious. tried gemini 3 pro, opus 4.5 and kimi 2.5 thinking. I would say kimi was the closest, but still far from passable.
based on some searches, it seems Trutone is doing this, but they are still on a waitlist basis.
i hate how github doesnt send you any emails for new issues created on your repositories. i understand for large repos they only want assigned members tagged, but i feel the default behavior for all repos (most on Github are small) should be to send emails on issue creation. that aside, let's see if i can close all 7 issues + 5 PRs (excluding 2 dependabot) today
it is totally possible to get addicted to programming. past week i have not done any schoolwork before 11PM, just coding. as a result i have too many assignments due tomorrow, including expiring extensions that i asked for so i could setup my server 😦
14h39m on Ghostty today alone...
someone should make a screen time tracker for the terminal and see how long you spend on certain commands, not sure how you would hook into it though? might be cool to explore
just setup backup for the blog and feed to export convex data and commit to github 🙂. if only...if only.
opus made light work of integrating convex. now both the frontend and CMS interface with convex directly using the useQuery hooks so that changes in convex are reflected *immediately* in the frontend, including post creation. absolutely sick and saves compute time.
one bug that i inadvertently found out is that unpublished posts are still accessible via their slugs. it's fine i guess, likelihood of guessing a slug is low and so are the stakes.
love not having to wait for vercel to build after each git push or bandwidth limits on convex for serving media
thinking about the new CMS system. i jokingly teased github's handling of images in markdown in my blog post about it. but I think that is the optimal solution. i really dont care where my files are and this is the closest to containing them within a markdown file short of encoding them in b64. i think i will implement such a solution - uploading them to convex file storage and just replacing each one with a URL in the editor.
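a sketch of the "upload and swap in a URL" step, assuming images arrive in the editor as base64 data URIs inside markdown image syntax. `upload` is a hypothetical callable standing in for the real file storage call, and `fake_upload` plus the example.com URL exist only so this runs standalone.

```python
import base64
import hashlib
import re

# matches ![alt](data:image/<ext>;base64,<payload>)
DATA_IMAGE = re.compile(r"!\[([^\]]*)\]\(data:image/(\w+);base64,([A-Za-z0-9+/=]+)\)")

def externalize_images(markdown, upload):
    """Replace inline base64 images with URLs returned by `upload(data, ext)`."""
    def swap(match):
        alt, ext, payload = match.groups()
        url = upload(base64.b64decode(payload), ext)
        return f"![{alt}]({url})"
    return DATA_IMAGE.sub(swap, markdown)

def fake_upload(data, ext):
    """Stand-in for file storage: derives a stable placeholder URL from the bytes."""
    digest = hashlib.sha256(data).hexdigest()[:12]
    return f"https://files.example.com/{digest}.{ext}"

payload = base64.b64encode(b"not really a png").decode()
post = f"before ![cat](data:image/png;base64,{payload}) after"
cleaned = externalize_images(post, fake_upload)
```

the markdown stays readable and small, and the file's location becomes an implementation detail of `upload`.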
nothing i can do about the blog getting deleted, i have all the images so I guess i will try to recreate the blog posts retroactively?? but thats a weekend project.
just migrated the blog (blog.zimengxiong.com) to use a local convex instance instead of serving a static site from vercel. this means no need to rebuild the site when i make edits to the content since it fetches from convex each time. live at blogv2.zimengxiong.com for now. working on a better cms that uses authentik for auth instead of its own github oauth implementation that blog.zimengxiong.com/admin uses.
already seeing performance improvements from faster image loading. would love to do this on the hosted SaaS convex but their bandwidth and storage limits are too low (0.5/1GB) respectively for multimedia.
wanted to give gemini 3 pro another chance, asked it to do some edits on a simple convex project. it somehow managed to deploy a new schema to (overwriting) a COMPLETELY UNRELATED convex project convex-feed.zimengxiong.com (sound familiar??) 4+ years of posts are just gone... completely on me for not having backups for stuff like this, but yeah, back to Opus 4.5 it is
will see what I can do to fix this mess
if you are not using --dangerously-skip-permissions you're leaving potential untapped. letting agents run in the background is the entire point of them. if you are worried, set up a sandbox with Mac's sandbox-exec or some form of seatbelt, most linux systems have one.
Porting WeDLM to MLX
Tencent's WeDLM paper reports 3x speedup over vLLM using diffusion-style parallel decoding—multiple tokens generated per forward pass while maintaining KV cache compatibility. This post documents an independent port of WeDLM to Apple's MLX framework, covering the architecture decisions, performance optimizations, and implementation details from a week of development on an M4 Max.
[... 6,963 words]