Is there a figure somewhere on how many TB of images this will produce per day when running in automated sky survey mode?
> The vast archive, growing by 20 terabytes each night, will after 1 year contain more optical astronomy data than that produced by all previous telescopes combined.
73 PB over the full runtime of the survey (20 TB/night × ~365 nights × the 10-year survey works out to roughly 73 PB). That's a nice new datacenter filled to the brim with images.
Storage densities these days are kinda amazing, it's not that much of a datacenter. Assuming you chunk it with triple redundancy, that's 220k TB raw. 10k 22 TB disks, you put them in one of those 4U 50 disk storage pods. 200 pods, 10 of those in a rack with some space left for a switch and power, so that's only 20 racks.
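For anyone who wants to check that arithmetic, here's a throwaway sketch (the disk/pod/rack sizes are the parent's assumptions, not Rubin's actual hardware):

    # Back-of-envelope rack count (assumptions: 3x redundancy,
    # 22 TB disks, 50-disk 4U pods, 10 pods per rack)
    archive_tb = 73_000            # 73 PB survey archive
    raw_tb = archive_tb * 3        # triple redundancy -> 219,000 TB
    disks = raw_tb / 22            # ~9,955 disks, call it 10k
    pods = disks / 50              # ~200 pods
    racks = pods / 10              # ~20 racks
    print(f"{disks:,.0f} disks -> {pods:.0f} pods -> {racks:.0f} racks")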
I found this PDF presentation with lots of great technical details about data management and a devops/infra-oriented view of this telescope: https://ci-compass.org/assets/602137/2025jan23_cicompass_rub...
Worth a read for the devops guys around here!
- about 20 TB per day, around 100 PB expected for the whole survey
- 0.5 PB Ceph cluster for local data
- workloads on a 20-node Kubernetes cluster, deployed via ArgoCD
- physical infra managed with Puppet/Ansible
- 100 Gb/s (+40 Gb/s backup) fiber connection to a US-based datacenter for further processing (quick transfer math below)
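Quick sanity check on that link sizing (my arithmetic, assuming full line rate and ignoring protocol overhead):

    # How long does one night's ~20 TB take over the 100 Gb/s link?
    bits_per_night = 20e12 * 8            # 20 TB in bits (decimal units)
    link_bps = 100e9                      # 100 Gb/s at full line rate
    seconds = bits_per_night / link_bps   # 1,600 s
    print(f"~{seconds / 60:.0f} minutes per night at line rate")

So a night's data fits through the pipe in well under an hour, leaving plenty of headroom.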
I wonder if they could reduce the data size at rest by using specialized compression techniques. You could probably build an averaged "model" of the sky as observed by the telescope (accounting for stellar parallax and bright planets) and store only compressed diffs, not full images.
But I guess, since storage is relatively cheap, it's simply not worth bothering with such complexity.
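A minimal sketch of the idea, with synthetic data and zlib as a stand-in codec (real pipelines use FITS tile compression along the lines of fpack/Rice; none of this is the actual Rubin pipeline):

    # Synthetic "patch of sky": a static star field plus per-exposure
    # photon noise, 10 visits. Store a median sky model once, then
    # compress only the residuals of each visit against it.
    import numpy as np
    import zlib

    rng = np.random.default_rng(0)
    sky = (1000 * rng.random((512, 512))).astype(np.int32)   # static field
    visits = np.stack([sky + rng.poisson(20, sky.shape) for _ in range(10)])

    model = np.median(visits, axis=0).astype(np.int32)       # "sky model"
    residuals = visits - model                               # small values

    full = sum(len(zlib.compress(v.tobytes(), 9)) for v in visits)
    diff = len(zlib.compress(model.tobytes(), 9)) + \
           sum(len(zlib.compress(r.tobytes(), 9)) for r in residuals)
    print(f"full frames: {full:,} B vs model+residuals: {diff:,} B")

The residuals are mostly small, near-zero values, so they compress much better than the full frames.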
There's quite a bit of black out there. That should compress easily.
The usual lossless image compression algorithms are a given. I am talking about compressing it further, since the telescope observes the same (or largely overlapping) patches of the sky, and the most significant signal is stars, which are more or less "constant". At the very least, they could probably use lossless "animation" compression algorithms like APNG or FLIF for consecutive images of the same sky patch.
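The APNG-style variant of the sketch above: predict each frame from the previous one rather than from a fixed model (same synthetic setup, still just an illustration):

    # Delta each frame against the previous one (zlib as a stand-in codec)
    import numpy as np
    import zlib

    rng = np.random.default_rng(1)
    sky = (1000 * rng.random((512, 512))).astype(np.int32)
    frames = [sky + rng.poisson(20, sky.shape) for _ in range(10)]

    independent = sum(len(zlib.compress(f.tobytes(), 9)) for f in frames)
    deltas = [frames[0]] + [b - a for a, b in zip(frames, frames[1:])]
    predicted = sum(len(zlib.compress(d.tobytes(), 9)) for d in deltas)
    print(f"independent: {independent:,} B vs keyframe+deltas: {predicted:,} B")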
actually the telescope devops guys were hiring a couple years ago on HN: https://www.hackerneue.com/item?id=38101085 :-D
Insanity - love it
If you think this is insanity, I encourage you to look up the expected data to come out of the SKA. Even after several processing steps they expect several hundred PB/year (the raw data, which is not being archived, is several orders of magnitude more). That is only SKA-Low, I think; for SKA-Mid we are talking exabytes/year. I recall their chief scientist saying that once they are operational they will process more data than Google and Facebook combined.
Yeppers: https://en.wikipedia.org/wiki/Square_Kilometre_Array
In-page search for "data challenges". Phew, that's a lot of data.