Is there a figure somewhere on how many TB of images this will produce per day when running in automated sky survey mode?
> The vast archive, growing by 20 terabytes each night, will after 1 year contain more optical astronomy data than that produced by all previous telescopes combined.
73 PB over the full runtime of the survey (20 TB/night × ~365 nights × the 10-year survey works out to roughly 73 PB). That's a nice new datacenter filled to the brim with images.
Storage densities these days are kinda amazing, it's not that much of a datacenter. Assuming you chunk it with triple redundancy, that's 220k TB raw. 10k 22 TB disks, you put them in one of those 4U 50 disk storage pods. 200 pods, 10 of those in a rack with some space left for a switch and power, so that's only 20 racks.
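For anyone who wants to check that arithmetic, here's a throwaway sketch (the disk/pod/rack sizes are the parent's assumptions, not Rubin's actual hardware):

    # Back-of-envelope rack count (assumptions: 3x redundancy,
    # 22 TB disks, 50-disk 4U pods, 10 pods per rack)
    archive_tb = 73_000            # 73 PB survey archive
    raw_tb = archive_tb * 3        # triple redundancy -> 219,000 TB
    disks = raw_tb / 22            # ~9,955 disks, call it 10k
    pods = disks / 50              # ~200 pods
    racks = pods / 10              # ~20 racks
    print(f"{disks:,.0f} disks -> {pods:.0f} pods -> {racks:.0f} racks")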
I found this PDF presentation with lots of great technical details about data management and a devops/infra-oriented view of this telescope: https://ci-compass.org/assets/602137/2025jan23_cicompass_rub...
Worth a read for the devops guys around here!
- about 20 TB per day, around 100 PB expected for the whole survey
- 0.5 PB Ceph cluster for local data
- workloads on a 20-node Kubernetes cluster, deployed via ArgoCD
- physical infra managed with Puppet/Ansible
- 100 Gb/s (+40 Gb/s backup) fiber connection to a US-based datacenter for further processing (quick transfer math below)
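Quick sanity check on that link sizing (my arithmetic, assuming full line rate and ignoring protocol overhead):

    # How long does one night's ~20 TB take over the 100 Gb/s link?
    bits_per_night = 20e12 * 8            # 20 TB in bits (decimal units)
    link_bps = 100e9                      # 100 Gb/s at full line rate
    seconds = bits_per_night / link_bps   # 1,600 s
    print(f"~{seconds / 60:.0f} minutes per night at line rate")

So a night's data fits through the pipe in well under an hour, leaving plenty of headroom.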
I wonder if they could reduce the data size at rest by using specialized compression techniques. You could probably build an averaged "model" of the sky as observed by the telescope (accounting for stellar parallax and bright planets) and store only compressed diffs, not full images.
But I guess, since storage is relatively cheap, it's simply not worth bothering with such complexity.
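A minimal sketch of the idea, with synthetic data and zlib as a stand-in codec (real pipelines use FITS tile compression along the lines of fpack/Rice; none of this is the actual Rubin pipeline):

    # Synthetic "patch of sky": a static star field plus per-exposure
    # photon noise, 10 visits. Store a median sky model once, then
    # compress only the residuals of each visit against it.
    import numpy as np
    import zlib

    rng = np.random.default_rng(0)
    sky = (1000 * rng.random((512, 512))).astype(np.int32)   # static field
    visits = np.stack([sky + rng.poisson(20, sky.shape) for _ in range(10)])

    model = np.median(visits, axis=0).astype(np.int32)       # "sky model"
    residuals = visits - model                               # small values

    full = sum(len(zlib.compress(v.tobytes(), 9)) for v in visits)
    diff = len(zlib.compress(model.tobytes(), 9)) + \
           sum(len(zlib.compress(r.tobytes(), 9)) for r in residuals)
    print(f"full frames: {full:,} B vs model+residuals: {diff:,} B")

The residuals are mostly small, near-zero values, so they compress much better than the full frames.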
There's quite a bit of black out there. That should compress easily.
The usual lossless image compression algorithms are a given. I am talking about compressing it further, since the telescope observes the same (or largely overlapping) patches of the sky, and the most significant signal is stars, which are more or less "constant". At the very least, they could probably use lossless "animation" compression algorithms like APNG or FLIF for consecutive images of the same sky patch.
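The APNG-style variant of the sketch above: predict each frame from the previous one rather than from a fixed model (same synthetic setup, still just an illustration):

    # Delta each frame against the previous one (zlib as a stand-in codec)
    import numpy as np
    import zlib

    rng = np.random.default_rng(1)
    sky = (1000 * rng.random((512, 512))).astype(np.int32)
    frames = [sky + rng.poisson(20, sky.shape) for _ in range(10)]

    independent = sum(len(zlib.compress(f.tobytes(), 9)) for f in frames)
    deltas = [frames[0]] + [b - a for a, b in zip(frames, frames[1:])]
    predicted = sum(len(zlib.compress(d.tobytes(), 9)) for d in deltas)
    print(f"independent: {independent:,} B vs keyframe+deltas: {predicted:,} B")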
actually the telescope devops guys were hiring a couple years ago on HN: https://www.hackerneue.com/item?id=38101085 :-D
Insanity - love it
If you think this is insanity, I encourage you to look up the expected data to come out of the SKA. Even after several processing steps they expect several hundred PB/year (the raw data, which is not being archived, is several orders of magnitude more). That is only SKA-Low, I think; for SKA-Mid we are talking exabytes/year. I recall their chief scientist saying that once they are operational they will process more data than Google and Facebook combined.
Yeppers: https://en.wikipedia.org/wiki/Square_Kilometre_Array
In-page search for "data challenges". Phew, that's a lot of data.