https://hnup.date/
- 4 points
- 3 points
- 4 points
- Sorry, just saw this.
I absolutely agree, but it's really stubborn with the flowery language. I tried adding things like "DO NOT USE EMPTY PHRASES LIKE 'EVER-EVOLVING TECH LANDSCAPE'!!!!!" to the prompt, but it just can't resist.
I want to give the whole system an overhaul; maybe newer models are better at this. Or maybe add a second LLM pass to de-flowerize (lol) the language.
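A minimal sketch of the cheapest version of that second pass: instead of another LLM call, just strip a blocklist of known filler phrases from the generated text. The phrase list here is hypothetical; a real second LLM pass would rewrite the sentence rather than cut it.

```python
import re

# Hypothetical blocklist of "empty phrases" the model keeps producing;
# extend it as new filler shows up in generated summaries.
FILLER_PHRASES = [
    "ever-evolving tech landscape",
    "in today's fast-paced world",
    "delve into",
    "game-changer",
]

def deflowerize(text: str) -> str:
    """Crude de-flowerize pass: delete known filler phrases, tidy whitespace."""
    for phrase in FILLER_PHRASES:
        # Case-insensitive removal of the phrase itself.
        text = re.sub(re.escape(phrase), "", text, flags=re.IGNORECASE)
    # Collapse doubled spaces and spaces left dangling before punctuation.
    text = re.sub(r"\s{2,}", " ", text)
    text = re.sub(r"\s+([.,;:])", r"\1", text)
    return text.strip()
```

It won't produce graceful sentences (cutting leaves stumps), but it's deterministic and never "can't resist" the way a prompt-only fix does.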
- Impressive, might use this for https://hnup.date
- Very nice!
> I have a mechanism to quickly delete problem submissions.
Did you build a male genitalia swastika classifier like the fish guy? (What a sentence)
- I'm building an app for language learning with YouTube. I realized that YouTube probably has the largest collection of spoken language that has ever existed, so I wanted to make it accessible, especially on mobile.
I'm focusing on Chinese (Mandarin) right now, because that's what I've been learning, and the language learning community on Reddit likes it too. But other languages are also available.
Link: https://lingolingo.app
- I tried it on an M1 Pro MBP using Docker. It's quite slow (no MPS) and there are no timestamps in the resulting transcript. But the basics are there. Truncated output:
    Fetching video metadata...
    Downloading from YouTube...
    Generating transcript using medium model...
    === System Information ===
    CPU Cores: 10
    CPU Threads: 10
    Memory: 15.8GB
    PyTorch version: 2.7.1+cpu
    PyTorch CUDA available: False
    MPS available: False
    MPS built: False
    Falling back to CPU only
    Model stored in: /home/app/.cache/whisper
    Loading medium model into CPU...
    100%|| 1.42G/1.42G [02:05<00:00, 12.2MiB/s]
    Model loaded, transcribing...
    Model size: 1457.2MB
    Transcription completed in 468.70 seconds
    === Video Metadata ===
    Title: 厨师长教你:“酱油炒饭”的家常做法,里面满满的小技巧,包你学会炒饭的最香做法,粒粒分明!
    Channel: Chef Wang 美食作家王刚
    Upload Date: 20190918
    Duration: 5:41
    URL: https://www.youtube.com/watch?v=1Q-5eIBfBDQ
    === Transcript ===
    哈喽大家好我是王刚本期视频我跟大家分享...

- So you're saying that they should analyze both audio and video to increase the quality of the captions, if the video has hard-coded captions? I guess that's possible, just a question of effort vs. payoff.
Inaccurate auto-captions for videos with hard-coded captions probably aren't a big enough pain to warrant big investments?
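On the missing timestamps: if the tool uses openai-whisper, `transcribe()` already returns per-segment `start`/`end` times alongside the plain `text` field, so the timestamps could be rendered with a small formatter. A sketch, assuming the standard segment dicts:

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as H:MM:SS for a transcript line."""
    s = int(seconds)
    return f"{s // 3600}:{(s % 3600) // 60:02d}:{s % 60:02d}"

def timestamped_transcript(segments) -> str:
    """Render Whisper-style segments ({'start', 'end', 'text'}) as timestamped lines.

    openai-whisper's transcribe() result carries these segments; the log
    above apparently only prints the concatenated 'text'.
    """
    lines = []
    for seg in segments:
        lines.append(f"[{fmt_ts(seg['start'])} -> {fmt_ts(seg['end'])}] {seg['text'].strip()}")
    return "\n".join(lines)
```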
- I'm working on an app that's based around youtube videos for language learning. I had to solve the same problem of youtube automatically changing the audio track to match the device locale.
Even thought about making a spin-off app with only the no-translate feature, one that simply always uses the original title and audio. I guess ReVanced can do this too, but maybe there are enough people who don't use ReVanced, or don't know about this feature. Thoughts?
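For the audio side, the selection logic can be done client-side over a yt-dlp format list. A sketch, with the caveat that the marking is an assumption: for multi-audio YouTube videos, yt-dlp exposes a per-track `language` and (in my experience) tags the source track with "original" in `format_note` — verify against real format dumps before relying on it.

```python
def pick_original_audio(formats):
    """From a yt-dlp 'formats' list, prefer the original-language audio track.

    Assumption: dubbed tracks and the source track are distinguished by
    'original' appearing in 'format_note'. Falls back to the first
    audio-only format when no track is marked.
    """
    audio = [f for f in formats
             if f.get("acodec") not in (None, "none")
             and f.get("vcodec") in (None, "none")]
    for f in audio:
        if "original" in (f.get("format_note") or "").lower():
            return f
    return audio[0] if audio else None
```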
- That's an interesting space to explore! I'm wondering about the baseline in the benchmarks. Which prompts did you use for those? I'm asking because some of the resulting prompts seem fairly generic, and I'm wondering if you could just blanket add them to each prompt and also see an improvement. Things like "Identify the question (what are you trying to find?)".
In the same vein, wouldn't it be interesting to measure which part of the prompt most contributed to better solving the problem? Surely some parts will be just noise and can be trimmed away.
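The trimming idea is cheap to test with leave-one-out ablation: drop one prompt line at a time and re-run the benchmark. A sketch, where `score` stands in for whatever evaluation the benchmark already uses:

```python
def ablate_prompt(prompt_lines, score):
    """Leave-one-out ablation: drop each line and measure the score change.

    'score' is any callable mapping a prompt string to a benchmark number
    (higher = better), e.g. accuracy over a problem set. Lines whose
    removal barely moves the score are candidates for trimming as noise.
    """
    baseline = score("\n".join(prompt_lines))
    deltas = {}
    for i, line in enumerate(prompt_lines):
        trimmed = "\n".join(prompt_lines[:i] + prompt_lines[i + 1:])
        deltas[line] = baseline - score(trimmed)  # positive = line helped
    return baseline, deltas
```

It's N+1 benchmark runs for N prompt lines, so it only pays off if the benchmark is cheap, but it would directly answer which parts are noise.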
Also wondering what this does, since the model probably won't (can't?) actually read the problem multiple times:
> Read the problem carefully (multiple times).

- I was curious if the sensor would pick up other things like trees or other cyclists, but it seems like they accounted for that:
> We then log a sensor events [sic] if the majority of cells in the sensor frame agree to the same value within a threshold parameter [...]. This ensures that sensor events are only logged when large objects like cars block the sensor’s field-of-view , i.e., one or more small objects like branches or distance pedestrians in the sensor’s field-of-view will not trigger this condition. While there is no guarantee that this approach strictly identifies cars, we empirically saw during testing that passing cyclists and pedestrians rarely satisfied this condition at the typical passing distance due to the wide field-of-view of the VL53L8.
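The condition in that quote is simple to state in code: an event is logged only when most of the zones in one frame report roughly the same distance. A sketch for an 8x8 (64-zone) VL53L8-style frame; the threshold and majority values here are illustrative, not taken from the paper.

```python
from statistics import median

def majority_agrees(frame, threshold_mm=150, majority=0.5):
    """Check the paper's logging condition on one 64-zone depth frame.

    'frame' is a flat list of per-zone distances (mm). We log an event only
    when more than 'majority' of zones agree within 'threshold_mm' of the
    frame median, i.e. a large object like a car fills the field of view;
    scattered small objects (branches, distant pedestrians) won't qualify.
    """
    mid = median(frame)
    agreeing = sum(1 for d in frame if abs(d - mid) <= threshold_mm)
    return agreeing / len(frame) > majority
```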
Also interesting that it's quite cheap to build:
> The whole system can cost less than $25 [...]
From the paper https://dl.acm.org/doi/10.1145/3706598.3713325
- 97 points
- Demo of the pronunciation feature (currently only in the extension): https://youtu.be/d42i4httuao
- 1 point
- Saw this on HN a while ago [1], really eye-opening: https://www.calcalistech.com/ctechnews/article/b1a1jn00hc
> The first sales come from the loyal CISOs who work with the fund.
> This "loyalty program" - which encourages deepening the relationship between the CISO and a party other than his employer - is seen by many in the industry as a red line crossed by Ra'anan and Cyberstarts.
> Cyberstarts vehemently denies [...] and claims that CISOs were never remunerated for purchasing the products of the portfolio companies.
https://github.com/PyroMikeGit/SuperKaizoIronMON