- The outage in China on Aug 19th-20th is directly linked to the misconfiguration of the GFW. It was government-directed, not a so-called "technical problem".
See https://gfw.report/blog/gfw_unconditional_rst_20250820/en/
- I see Caddy using only 3-4% of 1GB of RAM. Note that in my case Caddy serves as a reverse proxy and there is also very little traffic.
- Well, I use Arch Linux and the caddy package from pacman just works. You may check out https://github.com/caddyserver/xcaddy for custom Caddy builds.
Besides, I don't use wildcard certs. I use Caddy to reverse proxy a number of self-hosted things and manually assign a domain name to each of them. Caddy handles many certs just fine.
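To make the "many certs" point concrete, here is a minimal sketch of that per-service mapping, pushed through Caddy's JSON admin API (default endpoint localhost:2019). The hostnames and upstream ports are made up, and in practice a short Caddyfile with one site block per hostname achieves the same thing; Caddy then obtains and renews one certificate per hostname on its own.

    # Hypothetical services; assumes Caddy is running with its admin API on localhost:2019.
    import json
    import urllib.request

    services = {
        "git.example.com": "localhost:3000",
        "photos.example.com": "localhost:2342",
        "notes.example.com": "localhost:8080",
    }

    config = {"apps": {"http": {"servers": {"srv0": {
        "listen": [":443"],
        "routes": [
            {
                "match": [{"host": [host]}],
                "handle": [{"handler": "reverse_proxy",
                            "upstreams": [{"dial": upstream}]}],
            }
            for host, upstream in services.items()
        ],
    }}}}}

    req = urllib.request.Request(
        "http://localhost:2019/load",
        data=json.dumps(config).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # Caddy provisions one cert per hostname automatically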
- Thank you, Let's Encrypt, together with acme.sh, Caddy, and the whole ecosystem, for TLS.
You simply cannot emphasize information security enough when all your Internet traffic is audited, censored, and manipulated by a number of adversaries backed by (authoritarian) governments and whatnot.
- > From their project page:
> We analyze over 1,100 deep neural networks—including 500 Mistral-7B LoRAs and 500 Vision Transformers. We provide the first large-scale empirical evidence that networks systematically converge to shared, low-dimensional spectral subspaces, regardless of initialization, task, or domain.
I instantly thought of the Muon optimizer, which provides high-rank gradient updates, and of Kimi K2, which was trained with Muon, yet I see no related references.
The 'universal' in the title is not that universal.
- Salute to anyone like Xu Qinxian who refused to give up their moral principles when facing inhumane commands from psychopathic leaders like Deng Xiaoping, Mao Zedong, and so on. As for anyone who obeyed inhumane commands like a mindless shell: you will reap what you sow.
Note: Historical records reveal that the person behind the coordination of the Tiananmen Massacre (which this post is talking about) was Deng Xiaoping.
- For arXiv papers, I much prefer the HTML format to the PDF format.
Compared to PDF, HTML is much more accessible because it lives in the browser. Basically, I can reuse my browser extensions to do whatever I like without hassle: translation, note-taking, sending text to LLMs, and so on.
For now, arXiv offers two HTML services: the default one at https://arxiv.org/html/xxxx.xxxxx and the alternative one at https://ar5iv.labs.arxiv.org/html/xxxx.xxxxx , where 'x' is a placeholder for a digit.
The most glaring problem with the default HTML service is paper coverage. Sometimes it just doesn't work, e.g., https://arxiv.org/html/2505.06708 . The workaround is to switch to the alternative HTML service, e.g., https://ar5iv.labs.arxiv.org/html/2505.06708 .
Note that the alternative HTML service has coverage problems too; sometimes both services fail, e.g., https://arxiv.org/abs/2511.22625 .
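As a rough illustration of the fallback, here is a sketch that tries the default rendering first and then ar5iv. This is not an official arXiv API, and the availability check is only a heuristic, since a service may redirect to the abstract page or serve a placeholder instead of returning an HTTP error for unconverted papers.

    # Try the default HTML rendering first, then ar5iv, for a given arXiv ID.
    import urllib.error
    import urllib.request

    def find_html(arxiv_id: str) -> str | None:
        candidates = [
            f"https://arxiv.org/html/{arxiv_id}",
            f"https://ar5iv.labs.arxiv.org/html/{arxiv_id}",
        ]
        for url in candidates:
            try:
                with urllib.request.urlopen(url, timeout=15) as resp:
                    # Heuristic: a redirect away from /html/ likely means no HTML version.
                    if resp.status == 200 and "/html/" in resp.geturl():
                        return url
            except urllib.error.HTTPError:
                continue  # e.g., 404 from this service; try the next one
        return None  # neither service appears to have an HTML version

    print(find_html("2505.06708"))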
- Thank you Mistral for releasing new small parameter-efficient (aka dense) models.
- Wow. This CEO effectively says that because he loves the AI stuff in Windows, all Windows users who hate it should just get fucked and suffer from his decisions.
- This repo is valuable for local LLM users like me.
I just want to reiterate that the term "LLM safety" means very different things to large corporations and to LLM users.
Large corporations often say they "do safety alignment on LLMs". What they actually do is avoid anything that damages their own interests. That includes forcing LLMs to meet certain legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" that favor themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind the LLMs.
As an average LLM user, what I want from LLMs is maximum factual knowledge and capability, which is what these large corporations claimed to offer in the first place. It's very clear that my interests as an LLM user are not aligned with those of the large corporations.
- After opening https://projecteuler.net/ I got
403 Forbidden Request forbidden by administrative rules.
Note: I had never heard of or opened this website until now.
- In China, we call "gig workers" the "flexibly employed" (灵活就业), which is treated as very similar to being unemployed. The reason is that the low, zero, or even negative welfare the Chinese government provides to all taxpayers makes neither a viable way to earn a living.
Want some evidence? The "marriage market" (婚恋市场) in China has been a gruesome battlefield for at least 20 years. Chinese people can fake their political stance, financial status, or even marital status. However, when it comes to the standard for choosing a partner for life, they cannot fake it, because it means so much to them. Ask them what they think about "gig workers" and "the unemployed", and try to find any difference, if there is one. The "chain of contempt" (歧视链) in the marriage market is a relatively good measure of what people in China really think.
- Just like force-pushing Manifest V3 onto Chrome/Chromium, this is a step towards "more security", according to Google's mouthpieces.
Note that "security" here is only for Google itself; for users it means something utterly different, e.g., inconvenience, censorship, etc.
- I mean, yeah. From Table 9 (Hallucination evaluations) in the GPT-OSS model card [1], GPT-OSS-20b/120b have accuracies of 0.067/0.168 and hallucination rates of 0.914/0.782, respectively, while o4-mini has an accuracy of 0.234 and a hallucination rate of 0.750. These numbers simply mean that the GPT-OSS models have little real-world knowledge, and they hallucinate hard. Note that little real-world knowledge has always been a "feature" of the Phi series of LLMs because of the "safety" (for large companies), or rather "censorship" (for users), requirements.
In addition, from Table 4 (Hallucination evaluations) in the OpenAI o3 and o4-mini system card [2], o3/o4-mini have accuracies of 0.49/0.20 and hallucination rates of 0.51/0.79, respectively.
In summary, there is a significant real-world-knowledge gap between o3 and o4-mini, and another significant gap between o4-mini and GPT-OSS. Moreover, the poor real-world knowledge exhibited by GPT-OSS is in line with that "feature" of the Phi series.
[1] https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7... [2] https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f372...
- Super shallow (24/36 layers) MoE with low active parameter counts (3.6B/5.1B), a tradeoff between inference speed and performance.
Text only, which is okay.
Weights are partially in MXFP4, but there is no CUDA kernel support for the RTX 50 series (sm120). Why? This is a NO for me.
Safety alignment shifts from off the charts to off the rails really fast if you keep prompting. This is a NO for me.
In summary, a solid NO for me.
- The comments below are from the perspective of an Arch Linux user, not a maintainer or author of any of the software.
When installing software on Arch Linux, first search for official packages provided by Arch Linux maintainers, then for official installation methods approved by the software's authors, or for AUR packages that perform the installation exactly the way the authors describe.
A search for the default installation methods of the Firefox and LibreWolf packages on Arch Linux is listed below.
If the AUR is required to install a package, note that AUR packages are not trusted by default, because not all of them are maintained by trusted users. Always check the source files and the installation steps documented in the PKGBUILD. Don't install until EVERY line in the PKGBUILD looks reasonable.
- This is an open-weight model, in contrast with closed-source models.
However, 1T parameters makes local inference nearly impossible, let alone fine-tuning.
- Two questions:
Which version of Chrome is the first to implement these headers?
What are the potential effects of these headers on Chromium forks, e.g., ungoogled-chromium?
- Not to defend Chrome or Chromium, but there is a way for Chrome users to keep using Manifest V2 in version 138 and above. See the link below.
https://github.com/uBlockOrigin/uBlock-issues/discussions/29...
As for me, I choose not to manually update my ungoogled-chromium to version 138 or above.
- Below are my comments on Magistral Small (not Medium).
The 24B size is good for local inference.
For a model that outputs long "reasoning" traces (~10k tokens), a 40k context length is a little concerning.
Where are the results on normal benchmarks, e.g., MMLU/MMLU-Pro, IFEval, and such?
Still, thank you to the Mistral team for releasing this model under Apache 2.0.
- One problem with this paper is that the authors didn't conduct experiments on popular LLMs from Qwen and Mistral. Why?
- About <10B LLMs: yes, they're not that good. However, <10B is a range that allows many people to do their own tweaking and fine-tuning.
- For jailbreaking, you can give this a try:
https://github.com/elder-plinius/L1B3RT4S/blob/main/ALIBABA....
- For a local LLM, you can't really ask for a certain performance level; it is what it is.
Instead, you can ask about the architecture, be it dense or MoE.
Besides, let's assume the best open-weight LLM right now is DeepSeek R1. Is it practical for you to run R1 locally? If not, R1 means nothing to you.
Maybe R1 will be surpassed by Llama 4 Behemoth. Is it practical for you to run Behemoth locally? If not, Behemoth also means nothing to you.
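A back-of-envelope estimate makes the point. This counts weights only (KV cache and activations push the real requirement higher) and uses R1's total parameter count of 671B:

    # Rough weight-memory estimate: params_in_billions * bits_per_weight / 8 gives GB.
    def weight_gb(params_b: float, bits: int) -> float:
        return params_b * bits / 8

    for name, params_b in [("DeepSeek R1 (671B)", 671), ("a 24B dense model", 24)]:
        for bits in (16, 8, 4):
            print(f"{name}: ~{weight_gb(params_b, bits):.0f} GB of weights at {bits}-bit")

Even at 4-bit, R1's weights alone are roughly 335 GB, far beyond any consumer GPU, while a 24B dense model fits in about 12 GB.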
- YMMV.
Parameter efficiency is an important consideration, if not the most important one, for local LLMs because of hardware constraints.
Do you guys really have GPUs with 80GB of VRAM or an M3 Ultra with 512GB of RAM at home? If I can't run these ultra-large MoEs locally, then these models mean nothing to me. I'm not a large-scale LLM inference provider, after all.
What's more, you also lose the opportunity to fine-tune these MoEs when even inference with them is already hard.
- For the ultra-large MoEs from DeepSeek and Llama 4, fine-tuning is becoming practically impossible for hobbyists and local LLM users.
Small, dense models are what local users really need.
Although benchmaxxing is not good, I still find this release valuable. Thank you, Qwen.
- Yeah. I know the bitter lesson.
For neural networks, on one hand, a larger size generally means a higher performance ceiling. On the other hand, you really have to find ways to materialize that advantage over smaller models, or the larger size becomes a burden.
However, I'm talking about local usage of LLMs rather than production usage, and local usage is severely limited by GPUs with little VRAM. You literally cannot run LLMs beyond a certain size.
- More on the accessibility problem: even a request from a Meta engineer was rejected. Is that normal?
See https://huggingface.co/spaces/meta-llama/README/discussions/...
- To the people who downvoted this comment: do you guys really have GPUs with 80GB of VRAM or an M3 Ultra with 512GB of RAM at home?
Well, as a local-VRAM libertarian, manually pruning the safety-alignment part of a 500B LLM so that it runs in 1GB of RAM or VRAM is definitely a lifetime goal for me.