Saturday, 31 January 2026
thanks claude! for switching my "static" site to SSR so that the client has to 1/ wait for the backend to fetch data from convex 2/ get data from the backend then finally 3/ fetch data again from convex by itself to populate the site...
adding onto why agent orchestrators are dumb, this is probably the best article i've read on using agents, by peter steinberger: steipete.me/posts/just-talk-to-it. the idea is to stop trying to abstract things away, thinking that if you give the agent more instructions, more tools, more capabilities it will magically get better. i like to discuss things back and forth with claude like an actual person before asking it to change any code.
agent orchestrators are a dumb idea, especially those that move you to a webUI and abstract everything away. if all you do is build very simple websites and whatnot, sure, it might work for you. but if you're actually doing challenging things that you want to be involved in, orchestrators provide terrible insight into agentic actions and don't allow you to intervene easily, because that's their job as abstractors. until models get better, i'll still be in the terminal or IDE alongside claude.
every day that I'm on reddit.com/r/macapps I'm surprised how oblivious people are and how much market there is for crappy tools for which there are MUCH better open-source alternatives. never thought the technical barrier for installing apps off github or setting up a docker server would be so high.
i can't be calling it easy b/c if it was easy everyone would do it, much respect for somehow finding this product market fit and getting people to pay for this?? for a Tauri app with a PYTHON backend charging $25 for a PDF CONVERTER and IMAGE CONVERSION TOOLS??
PDF and image conversion have been built into macOS since macOS 10 (that WAS RELEASED in 2001), and there are much better, more mature alternatives like Stirling PDF
if you want experience, build something small really really well, or build something very novel scrappily (people are willing to overlook friction if your thing does something nothing else does), but PLEASE don't build something extremely common poorly, that's just sloppy.
i don't really see the use in clawdbot. what could i possibly want it to do? email? i get a notification every time one comes in and if it's not important i just don't respond to it. having it scroll twitter or reddit?? what's the fun in that, scrolling is the fun part. main thing i can think of is using it as a claude code or agentic coding orchestrator, might give it a shot sometime.
really tempted to just drop yabai for a week and see how that goes. do i really need a window manager? i've been starting to use tabs in ghostty more and more thanks to claude code (one tab for Claude to use, one for myself). might go back to rectangle and see how that goes. will miss a lot of things. actually, i could just disable window managing but still keep space management and moving windows around?? as if every window was always toggled float.
every time i see someone w/o a window manager manually dragging their windows around i feel bad for them, i have not dragged a window in years
arxiv.org/pdf/2510.15061
this could be done much more effectively with diffusion models i think. just like how image models are trained to remove noise from a noised-up picture, we could train a model to identify what to remove from generated slop, using output from a not so sloppy model as the comparison, so that it learns to remove slop from text.
always been surprising that LoRAs haven't gone mainstream for LLMs like they did for image gen models. I guess it makes sense b/c there are just too many different model architectures out there for a standard LoRA, unlike image gen where it's mainly just SD1.5 finetunes. might try to build a framework for training a tone/voice styling LoRA to mimic speech style.
to do this, we need instruction/response pairs. it's much harder to get instructions than responses, so we can use a small model to generate an instruction for a given snippet of text, i.e. training pair: (instruction, raw_text), and then train on a base model or something...
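a rough sketch of the pair-building step, just to make the idea concrete. everything here is an assumption: `build_pairs`, `to_jsonl`, the `min_chars` cutoff and the JSONL field names are made up for illustration, and `stub_instruction` is a placeholder for whatever small model would actually write the instruction. the raw text is always kept verbatim as the response, since that's the voice we want to learn.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_pairs(
    snippets: Iterable[str],
    generate_instruction: Callable[[str], str],
    min_chars: int = 40,  # assumed cutoff: very short snippets carry little style
) -> List[Dict[str, str]]:
    """Turn raw text snippets into (instruction, raw_text) training pairs."""
    pairs = []
    for text in snippets:
        text = text.strip()
        if len(text) < min_chars:
            continue  # skip snippets too short to be useful
        # In practice this call would go to a small model; here it is any callable.
        instruction = generate_instruction(text).strip()
        if not instruction:
            continue  # skip snippets the generator could not describe
        # The original text is the response, unchanged.
        pairs.append({"instruction": instruction, "response": text})
    return pairs


def to_jsonl(pairs: List[Dict[str, str]]) -> str:
    """Serialize pairs as one JSON object per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


def stub_instruction(text: str) -> str:
    """Hypothetical stand-in for the small model: builds a prompt from the first words."""
    first_words = " ".join(text.split()[:6])
    return f"write a short post in my voice that starts like: {first_words}"


if __name__ == "__main__":
    notes = [
        "too short",
        "every time i see someone w/o a window manager manually dragging "
        "their windows around i feel bad for them",
    ]
    print(to_jsonl(build_pairs(notes, stub_instruction)))
```

swapping `stub_instruction` for a real model call is the only part that changes; the filtering and output format stay the same.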
arxiv.org/pdf/2402.04401v1
arxiv.org/pdf/2407.18078v1
arxiv.org/pdf/2410.12757v1
arxiv.org/pdf/2402.01618
what language you prefer is starting to matter less than what language models are good at. you can ship so much faster by just choosing a framework that the models know well over one that you know well. convex is gold for this because its schema and config live within your project alongside the frontend. literally deployed a full event registration website in under an hour with oauth, asynchronous payment handling, admin tools, waitlist and pricing/discount engine using convex+react. insane times we live in.
there is a lack of harnesses and tools for models to use wrt 3D Modeling and PCB Design, frankly b/c models aren't strong enough for them yet. 3D work and PCB design require HUGE amounts of spatial understanding, and right now, the model with the best spatial reasoning (leagues ahead) fails to even call functions correctly half the time
arxiv.org/pdf/2412.07825
one of the major downsides with niche benchmarks is that they get outdated so fast and become meaningless (unless you decide to run it yourself) - and the authors ofc don't have the time/resources to constantly keep their benchmarks up to date
trying to get g3p to design a YubiKey 5C Nano clone. not even close. not sure if a skill would help here or we just need more powerful models
sonnet 4.5 is one of the laziest models i've ever seen, rivaled only by gemini 3 flash. tried to move my entire database to some other random deployment it found in the projects directory.
somehow still passed USACO despite their servers going down. my last two submissions were graded ~15 minutes AFTER contest end. one passed 100%, the other TLE'd on 50% of the test cases, estimated ~833/1000