hleath [at] archil.com
- This is a super interesting product, guys. I get that agents aren't great for everything right now, but I'd expect that they'll continue to improve over time (like everything in the LLM space).
How do you see the product evolving as agents become better and better?
- Shot you an email about how we can potentially help you with this.
- The root cause here is just that managing any kind of storage service is inherently painful. The property of "not losing data" means you're effectively required to always be doing something to keep it healthy.
- I believe this is also changing with newer instance types that let you adjust how much of the NIC's throughput is dedicated to EBS vs. general network traffic (with the intention, I'm sure, that people would want more EBS throughput than the default).
- I’d be happy to chat more about your needs and try to help recommend a path forward. Feel free to shoot me an email at the address in my profile.
- > As a secondary, I wonder if it's possible to actively use a SQLite interface against a database file on S3, assuming a single server/instance is the actual active connection.
You could achieve this today using one of the many adapters that turn S3 into a file system, without needing to wait for any SQLite buy-in.
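A minimal sketch of the single-writer case, assuming the bucket is already FUSE-mounted (the /mnt/s3 path and the schema are just for illustration):

    import sqlite3

    # Assumes the S3 bucket has been mounted at /mnt/s3 by an adapter like s3fs.
    # These mounts generally don't support cross-host file locking, so this only
    # works if a single server/instance holds the only active connection.
    db = sqlite3.connect("/mnt/s3/app.db")
    db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("greeting", "hello"))
    db.commit()  # each commit round-trips pages through the FUSE layer to S3
    print(db.execute("SELECT v FROM kv WHERE k = ?", ("greeting",)).fetchone())
    db.close()

One caveat: journaling modes that rely on shared memory (like WAL) tend not to work over these mounts, so you'd want to stick with the default rollback journal.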
- S3 Mountpoint exposes a POSIX-like file system abstraction for you to use with your file-based applications. Foyer appears to be a library that helps your application coordinate access to S3 (with a cache), for applications that don't need files and whose code you can change.
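To make that concrete, here's a rough sketch of the library-side pattern: a generic read-through cache in front of S3 (this illustrates the idea, not Foyer's actual API, and the bucket/key names are made up):

    import boto3

    s3 = boto3.client("s3")
    _cache: dict[tuple[str, str], bytes] = {}  # in-memory; a library like Foyer can also spill to disk

    def cached_get(bucket: str, key: str) -> bytes:
        # Read-through: serve from the local cache if present, else fetch from S3.
        if (bucket, key) not in _cache:
            resp = s3.get_object(Bucket=bucket, Key=key)
            _cache[(bucket, key)] = resp["Body"].read()
        return _cache[(bucket, key)]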
- Storage Gateway is an appliance that you connect multiple instances to, while this appears to be a library that you use in your program to coordinate caching for that process.
- These are, effectively, different use cases. You want to use (and pay for) Express One Zone when you need the same object reused repeatedly from multiple instances, while it looks like this on-disk or in-memory cache is for when you want the same file reused repeatedly from the same instance.
- Yes, definitely. S3 has a time to first byte of 50-150ms (depending on how lucky you are). If you're serving from memory, that goes to ~0, and if you're serving from disk, it goes to 0.2-1ms.
It will depend on your needs though, since some use cases won't want to give up S3's ability to serve arbitrary amounts of throughput.
- Woah buddy, I worked with Andy for years and this is not my experience. Moving a large product like S3 around is really, really difficult, and I've always thought highly of Andy's ability to: (a) predict where he thought the product should go, (b) come up with novel ways of getting there, and (c) trim the product down to get something into the hands of customers.
Also, did you create this account for the express purpose of bashing Andy? That's not cool.
- Some quick questions came up in the last post that I wanted to go ahead and address:
How are you different than existing products like S3 Mountpoint, S3FS, ZeroFS, ObjectiveFS, JuiceFS, and cunoFS?
Archil is designed to be a general-purpose storage system that replaces networked block storage like EBS or Hyperdisk with something that scales infinitely, can be shared across multiple instances, and synchronizes to S3. Existing adapters that turn S3 into a file system are either not POSIX-compliant (such as Mountpoint for S3, S3FS, or Goofys), do not write data to the S3 bucket in its native format (such as JuiceFS or ObjectiveFS – preventing use of that data directly from S3), or are not designed for a fully-managed, one-click setup (such as cunoFS). We have massive respect for the folks who build these tools, and we’re excited that the data market is large enough for all of us to see success by picking different tradeoffs.
What regions can I launch an Archil disk in?
We’re live in 3 regions in AWS (us-east-1, us-west-2, and eu-west-1) and 1 region in GCP (us-central1). Today, we’re also able to deploy directly into on-premises environments and smaller GPU clouds. Reach out if you’re interested in an on-premises deployment (hleath [at] archil.com).
Can I mount Archil from a Kubernetes cluster?
Yes! We have a CSI driver that you can use to get ReadWriteOnce and ReadWriteMany volumes into your Kubernetes cluster.
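For instance, requesting a shared volume looks like any other claim. Here's a sketch using the Kubernetes Python client (the storageClassName is a placeholder for illustration, not necessarily what the driver registers):

    from kubernetes import client, config

    config.load_kube_config()

    # A ReadWriteMany claim, so pods on multiple nodes can mount the same disk.
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "shared-data"},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "archil",  # placeholder name for illustration
            "resources": {"requests": {"storage": "100Gi"}},
        },
    }
    client.CoreV1Api().create_namespaced_persistent_volume_claim("default", pvc)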
What performance benchmarks can you share?
We generally don’t publish specific performance benchmarks, since they are easy to over-index on and often don’t reflect how real-world applications run on a storage system. In general, Archil disks provide ~1ms latency for hot data and can, by default, scale up to 10 GiB/s and tens of thousands of IOPS. Contact me at hleath [at] archil.com if you have needs that exceed these numbers.
What happens if your caching layer goes down before a write is synchronized to S3?
Our caching layer is, itself, highly durable (~5 9s). This means that once a write is accepted into our layer, there is no individual-component failure (such as an instance or an AZ) that would cause us to lose data.
What are you planning next for Archil?
By moving away from NFS and using our new, custom protocol, we have a great foundation for the performance work that we’re looking to accomplish in the next 6 months. In the short-term, we plan to launch: one-click Lustre-like scale-out performance (run hundreds of GiB/s of throughput and millions of IOPS without provisioning), the ability to synchronize data from non-object storage sources (such as HuggingFace), and the ability to use multiple data sources on a single disk.
How can I learn more about how the new protocol works?
We’re planning on publishing a bunch more about the protocol in the coming weeks. Stay tuned!
- Basically, we are building this at Archil (https://archil.com). The reason these things are generally super expensive is that they’re incredibly hard to build.
- My (limited) understanding was that the industry previously knew that it was unsafe to share GPUs between tenants, which is why the major cloud providers only sell dedicated GPUs.
- This is actually not the case. The TLS stream ensures that the packets transferred between your machine and S3 are not corrupted, but that doesn't protect against bit-flips which could (though, obviously, shouldn't) occur from within S3 itself. The benefit of an end-to-end checksum like this is that the S3 system can store it directly next to the data, validate it when it reads the data back (making sure that nothing has changed since your original PutObject), and then give it back to you on request (so that you can also validate it in your client). It's the only way for your client to have bullet-proof certainty of integrity the entire time that the data is in the system.
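You can see the end-to-end flow with boto3's additional-checksums support (the bucket and key names here are made up):

    import base64, hashlib
    import boto3

    s3 = boto3.client("s3")
    data = b"important bytes"

    # Compute the checksum client-side and send it with the upload. S3 verifies
    # it on receipt, stores it alongside the object, and re-validates on reads.
    digest = base64.b64encode(hashlib.sha256(data).digest()).decode()
    s3.put_object(Bucket="my-bucket", Key="my-key", Body=data, ChecksumSHA256=digest)

    # On download, ask for the stored checksum back and re-verify it ourselves,
    # so the client has integrity coverage for the object's entire lifetime.
    resp = s3.get_object(Bucket="my-bucket", Key="my-key", ChecksumMode="ENABLED")
    body = resp["Body"].read()
    assert resp["ChecksumSHA256"] == digest
    assert base64.b64encode(hashlib.sha256(body).digest()).decode() == digest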
- Thanks! Under the hood, when you mount an Archil volume, you connect to a fleet of instances that we're managing with SSD drives attached, which cache reads and writes before hitting the underlying data in your S3 bucket.
- Hey, I'm Hunter -- the founder of Archil. I'll be around in the comments to answer any questions that people have about the platform, or how things have changed since the Fall.
This actually makes a ton of sense to me in lots of LLM contexts (e.g., we're starting to prefer having LLMs write one-off scripts to make API calls rather than pointing them at problems and having them try directly).
Thanks!