
karlicoss · 3,247 karma
Trapped in meatspace. Trying to break out.

I build things:

https://beepb00p.xyz

https://github.com/karlicoss

https://twitter.com/karlicoss


  1. Seems like the official export [0] has tags and annotations along with timestamps. However, if you'd like fuller, more structured data from the API (instead of a mess of CSV + JSON), you can use my tool [1] to export it. Here's an example of its output [2]

    [0] https://getpocket.com/export

    [1] https://github.com/karlicoss/pockexport?tab=readme-ov-file#s...

    [2] https://github.com/karlicoss/pockexport/blob/master/example-...

  2. datetime handling can absolutely be a hot spot, especially if you're parsing or formatting dates. Even relatively simple things like "parse a huge CSV file with dates into dataclasses" can end up dominated by it.

    In particular, the default implementation of datetime in CPython is a C module (with a fallback to a pure Python one): https://github.com/python/cpython/blob/main/Modules/_datetim...

    Not saying it's necessarily justified in the case of this library, but if it wants to compete with stdlib datetime on performance, some parts will need to be compiled.
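    As a rough illustration of the CSV-with-dates case above (a minimal sketch, not a benchmark of any particular library; the file layout and column names are made up):

      # Sketch: in loads like this, timestamp parsing is often the hot spot.
      import csv
      from dataclasses import dataclass
      from datetime import datetime

      @dataclass
      class Row:
          when: datetime
          value: float

      def load(path: str) -> list[Row]:
          with open(path, newline='') as f:
              return [
                  # fromisoformat is implemented in C; the equivalent
                  # strptime call is typically several times slower
                  Row(when=datetime.fromisoformat(r['when']), value=float(r['value']))
                  for r in csv.DictReader(f)
              ]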

  3. I argued that point in my article some time ago: https://beepb00p.xyz/configs-suck.html (also the HN discussion at the time: https://news.ycombinator.com/item?id=22787332)
  4. I was annoyed by cron/fcron limitations and figured systemd was the way to go because of its flexibility and power, but I was also annoyed at manually managing tons of unit files. So I wrote a tool with a config that looks kinda like a crontab, but uses systemd (or launchd on a Mac) behind the scenes: https://github.com/karlicoss/dron#what-does-it-do

    E.g. the simplest job definition looks like this:

      job(every(mins=10), 'ping https://beepb00p.xyz', unit_name='ping-beepb00p')
    
    But it's also possible to add more properties, e.g. arbitrary systemd properties, or custom notification backends (e.g. a Telegram message or a desktop notification).

    Since it's python, I can reuse variables, use for loops, import jobs from other files (e.g. if there are shared jobs between machines), and check that it's valid with mypy. For example:
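    (A hypothetical snippet: the hosts are made up, and job/every are the same helpers as in the example above -- the exact import may differ.)

      # Sketch: generate one ping job per host with a plain python loop.
      from dron import job, every  # assumption: the real import path may differ

      HOSTS = ['beepb00p.xyz', 'example.com']

      JOBS = [
          job(
              every(mins=10),
              f'ping https://{host}',
              unit_name=f'ping-{host.replace(".", "-")}',
          )
          for host in HOSTS
      ]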

    Been using this for years now and I'm very happy with it; one of the most useful tools I've written.

    It's a bit undocumented and messy, but if you're interested in trying it out, just let me know -- I'm happy to help :)

  5. Another thing I noticed is that homebrew python was noticeably slower on M2 compared to the pyenv one. I imagine homebrew compiles it with overly generic flags to support a wide range of Macs.
  6. It sparks a discussion around knowledge management, that's kinda nice :)
  7. Or you can use pipx; it deals with all the virtualenv business behind the scenes.
  8. FWIW, this never happened to me with Syncthing, but either way it's best to separate sync software from backup software. Sync might work flawlessly, but the user might make a mistake which would quickly wipe the files across all devices. I use borg for backups, highly recommend it.
  9. I sync about two hundred thousand files without any problems, especially since I'm not changing them all the time. The only issue I can imagine is that the initial sync with so many files might take a while, even if the total size isn't huge. As for several terabytes: for me it's a bit less than a terabyte in total across all synced folders, but I can't see why it wouldn't scale up. Similarly, the initial hashing might take some time, but otherwise it should handle it well.
  10. Not sure what happens if you commit on two devices before syncing, but the "worst" that has happened to me is an index conflict, which is easily fixed with 'git reset'.
  11. I root my phone to get access to my own data (typically sqlite databases in the protected /data/data partition). Then I feed it into HPI (Human Programming Interface) [1], and from there it gets into my plaintext search system [2] or Promnesia [3]

    [1] https://github.com/karlicoss/HPI#readme

    [2] https://beepb00p.xyz/pkm-search.html#personal_information

    [3] https://beepb00p.xyz/promnesia.html

  12. First, big respect for working on software for so many years!

    My question is: what data format is it using? I found some examples here [1], but it looks like a custom binary format?

    Is there functionality to auto-export (e.g. on save) to plaintext (XML/JSON/whatever), so I could hook TreeSheets files into other apps? I appreciate it would be lossy, but even a tree/graph structure with text nodes would be good.

    E.g. I'm a big fan of using plaintext search over all of my personal data/information, even in siloed apps [2]

    [1] https://github.com/aardappel/treesheets/tree/master/TS/examp...

    [2] https://beepb00p.xyz/pkm-search.html#personal_information

  13. Hey, it's a bit dated; I've been meaning to update the post for a while but haven't had time.

    I actually bought a reMarkable 2 since then, but I didn't really end up using it much. IIRC the main reason was that annotations use a custom format, and they are basically highlighter drawings (as opposed to plaintext). I think there were some projects to match them against the books and try to extract text, but that didn't work reliably for me. I may be wrong though; maybe things have changed since.

    That said, if you install KOReader on it, you get proper annotations, and I've been meaning to try to incorporate them into my flow.

  14. Yep, thanks! I really need to update the post with Zotero.
  15. Some time ago I wanted the best bits from both worlds:

    - from cron: specifying all jobs in one file instead of scattering them across dozens of unit files. In 90% of cases I just want a regular schedule and the command, that's it

    - from systemd: mainly monitoring and logging, but also flexible timers, timeouts, resource management, dependencies -- for the remaining 10% of jobs which are a little more complicated

    So I implemented a DSL which basically translates a python spec into systemd units -- that way I don't have to remember systemd syntax or manually manage the unit files. At the same time I benefit from the simplicity of having everything in one place.

    An extra bonus is that the 'spec' is just normal python code:

    - you can define variables/functions/loops to avoid copy pasting

    - you can use mypy to lint it before applying the changes

    - I have multiple computers that share some jobs, so I simply have a 'common.py' file which I import from `computer1.py` and `computer2.py` -- the whole thing is very flexible. For example:
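    (A hypothetical layout: the file names, commands, and job names are made up; job/every stand for dron's scheduling helpers.)

      # common.py -- jobs shared between machines (hypothetical)
      from dron import job, every  # assumption: the real import path may differ

      backup = job(every(mins=60), 'rsync -a ~/data/ backuphost:/srv/data/', unit_name='backup')

      # computer1.py -- machine-specific spec reusing the shared job
      from common import backup

      JOBS = [
          backup,
          job(every(mins=10), 'ping https://beepb00p.xyz', unit_name='ping-beepb00p'),
      ]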

    You can read more about it here:

    - https://beepb00p.xyz/scheduler.html

    - https://github.com/karlicoss/dron#what-does-it-do

    I've been using this tool for several years now, with hundreds of different jobs across 3 computers, and it's been working perfectly for me. One of the best quality-of-life improvements I've made to my personal infrastructure.

  16. Yes.

    - you use an actual programming language (which you likely already know) instead of desperately figuring out how to simulate a for loop in YAML, or how to replace part of a string, or whatnot

    - you have all the Python static analysis tools at your disposal; mypy/pylint etc. can make your deploys less error-prone

    - you can easily implement a custom DSL in python, so deploys end up more declarative and with less copy-pasting

    - it's easy to implement arbitrary custom operations (since you're working in python) -- and again, you can reuse variables etc. instead of having to use some weird templating or pass them on the command line
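    A minimal sketch of what such a DSL can look like (all names here are made up; this isn't any particular tool's API):

      # Sketch: deploys as plain python data, lintable with mypy.
      from dataclasses import dataclass

      @dataclass
      class Task:
          name: str
          cmd: list[str]

      def apt(*packages: str) -> Task:
          # a tiny "primitive" of the DSL
          return Task('apt ' + ' '.join(packages), ['apt-get', 'install', '-y', *packages])

      HOSTS = ['web1', 'web2']
      TASKS = [apt('nginx', 'git')] + [
          Task(f'deploy {h}', ['rsync', '-a', 'site/', f'{h}:/srv/site/'])
          for h in HOSTS
      ]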

  17. https://grep.app often helps with obscure code searches on GitHub
  18. As a workaround I'm using a keyword search bookmark in Firefox, mapped to 'g':

         https://www.google.com/search?q=%s&tbs=li:1
  19. Yep, can confirm: it basically filters out any notebook output from version control (while keeping it intact in the notebook file itself). This works seamlessly with diffing, committing, staging, etc.
  20. I stopped worrying and in most cases just write configs for my tools in python. Then you can just import/exec them and you're done. I can use all the operations/primitives I already know in python: string interpolation and operations, loops, pathlib, imports, etc. I can use mypy and all the other existing linting tools to make sure my configuration is correct, without having to write a custom linter (and basically reimplement 10% of mypy).
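    A minimal sketch of the idea (the file name and settings are made up):

      # config.py -- the config is just a python module
      from pathlib import Path

      CACHE_DIR = Path('~/.cache/mytool').expanduser()
      SOURCES = [CACHE_DIR / name for name in ('a.db', 'b.db')]  # loops instead of templating
      TIMEOUT_SECS = 30

      # the tool then simply does:
      #   import config
      #   print(config.SOURCES)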

    Shameless plug if you wanna read a longer analysis: https://beepb00p.xyz/configs-suck.html

    A great example is pyinfra (https://github.com/Fizzadar/pyinfra#readme). Think Ansible, but instead of YAML you write Python. It provides a set of primitives/DSL and some rules you need to adhere to, but otherwise you just write regular python code.

  21. If you want a TLDR/teaser, watch for a minute from https://youtu.be/oI_X2cMHNe0?t=653 . It's fascinating :)
  22. Oh nice, I like it!

    So it basically automates detecting the useful bits for a particular URL, but it's somewhat time-consuming and flaky. It could be very helpful for populating the 'rules' database though, and then this database could be shared with other people so they don't have to scrape.

    I guess when I said ML (or preferably some fuzzy algorithm/heuristic), I was referring to generalizing the rules so they also work on sites that aren't in the rules database. If humans can detect garbage in a URL by looking at a few examples, a computer can too :)

  23. Yeah, sadly, to get the canonical attribute you need to fetch the URL first (which is slow and wasteful). Also, sometimes the canonical URL still differs between the desktop and mobile versions of a site, so it has to be normalised after that anyway.
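    For reference, extracting the canonical link looks roughly like this (a stdlib-only sketch; error handling omitted):

      # Sketch: fetch a page and pull out <link rel="canonical">.
      from html.parser import HTMLParser
      from urllib.request import urlopen

      class CanonicalParser(HTMLParser):
          canonical = None

          def handle_starttag(self, tag, attrs):
              a = dict(attrs)
              if tag == 'link' and a.get('rel') == 'canonical':
                  self.canonical = a.get('href')

      def get_canonical(url: str) -> str | None:
          parser = CanonicalParser()
          # one network round trip per URL -- the slow, wasteful part
          parser.feed(urlopen(url).read().decode('utf-8', errors='replace'))
          return parser.canonical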
  24. your comments are always such an inspiration to read :) thank you!
  25. It's kind of tricky to do in the general case; e.g. even Hacker News keeps meaningful semantic information in the id= query parameter.

    Because of that, it ultimately needs to be a site-specific database/algorithm, perhaps with a fallback to default behaviour like simply cleaning up the most common garbage (_encoding/usg/etc.). I suspect it's possible to use some sort of machine learning to guess the meaningful parts of the URL path/query/fragments, but even for that we'd need some human curation for the training set. I wish we could collaborate on a shared database/library for that; I have sketched some ideas/applications/prior art here: https://beepb00p.xyz/exobrain/projects/cannon.html
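    The fallback part is simple enough to sketch with the stdlib (the blocklist below is just an illustration, not a curated database):

      # Sketch: strip common tracking junk from query strings, keep the rest.
      from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

      GARBAGE = {'_encoding', 'usg', 'utm_source', 'utm_medium', 'utm_campaign'}

      def clean(url: str) -> str:
          parts = urlsplit(url)
          qs = [(k, v) for k, v in parse_qsl(parts.query) if k not in GARBAGE]
          return urlunsplit(parts._replace(query=urlencode(qs)))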

    I started thinking about this since I have a similar problem in Promnesia (https://github.com/karlicoss/promnesia#readme), a knowledge management tool I'm working on. Ideally I want to normalise URLs so they address the exact bit of information, and nothing more.
