avaer
1,829 karma
https://x.com/aitheologian · Upstreet, Moemate, Webaverse, M3-org, ex Magic Leap, Microsoft, Webflow, Supermedium

  1. > Ads provide more insights to user real needs than anything else

    How?

    You can get insights into user behavior without ads, and I'm sure Apple is doing that already.

    Doesn't making good products that people want give more insight into user needs? Who wants ads?

  2. If someone is socializing in VRChat, it follows that they are able to be social, so the claim is a bit of a non sequitur.

    Would it be more accurate to say you have a categorical disdain for the way they are socializing? Why do you think that is (other than the obvious stuff, which seems to be more an anonymous internet thing than anything particular to VRChat)?

    I'm genuinely curious because I see this attitude a lot and I don't understand it.

  3. The best I've seen so far is Marble from World Labs, though that gives you a full 360 environment and takes several minutes to do so.
  4. It's very refreshing to see political news be about how someone misplaces their letters.
  5. It makes your picture 3D. The "photorealistic" part just means "it does this better than the other approaches".
  6. Right.

    I just want to emphasize that this is not a NeRF, where the model magically produces an image from an angle and, when you ask "ok, but how did you get this?", it throws up its hands and says "I dunno, I ran some math and I got this image" :D.

  7. There's nothing "hidden" about the 3D representation. It's a point cloud (in meters) with colors, plus a guess at the "camera" that produced it.

    (I am oversimplifying; a rough sketch of the shape of the data is below.)
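
    To make that concrete, here's a minimal sketch of what such a representation looks like; the field names are my own invention, not the model's actual output schema:

    ```ts
    // Explicit geometry: points in metric space with colors, plus an
    // estimated pinhole camera. Hypothetical names for illustration.
    type PointCloud = {
      positions: Float32Array; // xyz triplets, in meters
      colors: Uint8Array;      // rgb triplets, 0-255
    };

    type CameraGuess = {
      fx: number; fy: number; // focal lengths, in pixels
      cx: number; cy: number; // principal point, in pixels
    };

    // Everything is inspectable: e.g. the number of points is just
    // positions.length / 3, with no network required to "decode" it.
    function pointCount(cloud: PointCloud): number {
      return cloud.positions.length / 3;
    }
    ```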

  8. Is there a link to some sample Gaussian splat files from this model? I couldn't find one.

    Without that, it's hard to tell how cherry-picked the NVS video samples are.

    EDIT: I did it myself, if anyone wants to check out the result (caveat, n=1): https://github.com/avaer/ml-sharp-example
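
    If you want to poke at the output programmatically, here's a minimal reader sketch; it assumes the Gaussians have been converted to the compact 32-byte-per-splat .splat layout (as used by antimatter15/splat), which may differ from the model's native output container:

    ```ts
    // Reads a .splat file: per splat, 3 float32 position, 3 float32 scale,
    // 4 uint8 RGBA, 4 uint8 quaternion components (biased around 128).
    import { readFileSync } from "node:fs";

    const BYTES_PER_SPLAT = 32;

    function readSplats(path: string) {
      const buf = readFileSync(path);
      const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
      const count = Math.floor(buf.byteLength / BYTES_PER_SPLAT);
      const splats = [];
      for (let i = 0; i < count; i++) {
        const o = i * BYTES_PER_SPLAT;
        splats.push({
          position: [view.getFloat32(o, true), view.getFloat32(o + 4, true), view.getFloat32(o + 8, true)],
          scale: [view.getFloat32(o + 12, true), view.getFloat32(o + 16, true), view.getFloat32(o + 20, true)],
          rgba: [buf[o + 24], buf[o + 25], buf[o + 26], buf[o + 27]],
          rotation: [buf[o + 28], buf[o + 29], buf[o + 30], buf[o + 31]].map((b) => (b - 128) / 128),
        });
      }
      return splats;
    }

    console.log(readSplats("scene.splat").length, "splats"); // quick sanity check
    ```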

  9. [Notwithstanding Poe's Law] I chuckled at the satire, but you could just as easily use AI to make the code suck more to pass the filters.

    I don't know the solution to this; there's a similar problem in education, where students generate their homework with AI and teachers grade it with AI, and at some point you have to question whether the institution makes sense anymore.

  10. Does anyone have any prior art for an MCP server "message bus" with an agent framework like Mastra?

    E.g. suppose I want my agent to operate as a Discord bot listening on a channel via an MCP server subscribed to the messages, i.e. the MCP server itself drives the loop, not the framework, with the agent doing the processing.

    I can see how this could be implemented using MCP resource pubsub, with the plugin and agent both aware of this protocol and how to pump the message bus loop (sketched below), but I'd rather not reinvent it.

    Is there a standard way of doing this already? Is it considered user logic that's "out of scope" for the MCP specification?

    EDIT: added an example here https://github.com/avaer/mcp-message-bus
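
    For concreteness, here's roughly what the consumer side of that pubsub approach might look like; this assumes the TypeScript MCP SDK, and the resource URI, server command, and runAgent() are all placeholders:

    ```ts
    // The server drives the loop: it pushes resource-updated notifications,
    // and we pump the agent once per notification.
    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
    import { ResourceUpdatedNotificationSchema } from "@modelcontextprotocol/sdk/types.js";

    declare function runAgent(input: unknown): Promise<void>; // your framework's (e.g. Mastra's) agent invocation

    const channel = "discord://channel/general"; // hypothetical resource URI

    const client = new Client({ name: "bus-consumer", version: "0.0.1" });
    await client.connect(new StdioClientTransport({ command: "node", args: ["discord-mcp.js"] }));

    client.setNotificationHandler(ResourceUpdatedNotificationSchema, async (n) => {
      if (n.params.uri !== channel) return;
      const { contents } = await client.readResource({ uri: channel }); // fetch the new messages
      await runAgent(contents); // the agent processes; the server keeps driving
    });

    await client.subscribeResource({ uri: channel });
    ```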

  11. At what point did we stop calling them phones?
  12. This looks like a fine-tune of the classic zero123 (https://github.com/cvlab-columbia/zero123). I’m excited to check out the quality improvements.

    Though 3D model synthesis is one use case, I found the less-advertised base reprojection model to be more useful for gamedev at the moment. You can generate a multiview spritesheet from an image, and it’s fast enough for synthesis during a gameplay session (the runtime side is sketched below). I couldn’t get a good quality/time balance doing the same with the 3D models, and the lack of mesh rigging or animation, combined with imperfections in a fully 3D model, tends to break the suspension of disbelief compared to what players are used to for full 3D. I’m sure this will change as the tech develops and we layer more AI on top (automatic animation synthesis is an active research area).

    If you’re interested in this you might also want to check out deforum (https://github.com/deforum-art/deforum-stable-diffusion) which provides even more powerful camera controls on top of stable diffusion designed for full scenes rather than single objects.
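
    On the runtime side, consuming such a spritesheet mostly comes down to picking the view whose angle best matches the camera; a minimal sketch, assuming N pre-generated views evenly spaced around the object:

    ```ts
    // Map the camera's yaw relative to the object to a spritesheet column.
    const VIEWS = 8; // e.g. one pre-generated view every 45 degrees

    function spriteColumn(cameraYaw: number, objectYaw: number): number {
      const tau = 2 * Math.PI;
      const rel = (((cameraYaw - objectYaw) % tau) + tau) % tau; // normalize to [0, tau)
      return Math.round(rel / (tau / VIEWS)) % VIEWS;
    }
    ```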

  13. How does one apply for a job with the internal A16Z teams experimenting with this?
  14. I ran this for most of today in the background and I have some thoughts:

    The quality is good and it's giving you all of the maps (as well as the .blends!). It seems great for its stated goal of generating ground truth for training.

    However, it's very slow/CPU-bound (go get lunch), so in its current state it probably doesn't make sense for applications with a user waiting at the computer.

    Additionally, the .blend files are so unoptimized that you can't even edit them on a laptop with texturing on, and a single run of the larger generations will OOM a reasonably beefy server. To be fair, these warnings are in the documentation.

    With some optimization of the output you could probably do some cool things with the resulting assets, but I would agree with the authors that the best use case is where you need a full image set (diffuse, depth, segmentation) for training and can run this for a week on a cluster.

    To hype this up as No Man's Sky is a stretch (NMS is a marvel in its own right, but has a completely different set of tradeoffs).

    EDIT: Although there are configuration files you can use to create your own "biomes", there is no easy way to control this with an LLM. You might be able to hack GPT-4 functions to get the right format accepted (roughly sketched below), but I wouldn't expect great results from that technique.
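
    The function-calling hack would look something like this; the emit_biome schema is invented for illustration and does not match Infinigen's real config format:

    ```ts
    // Constrain GPT-4 to emit a structured config via a tool schema, then
    // feed the resulting JSON to the generator's config layer (the hard part).
    import OpenAI from "openai";

    const openai = new OpenAI();

    const res = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: "A foggy alpine forest biome" }],
      tools: [{
        type: "function",
        function: {
          name: "emit_biome", // hypothetical schema, not Infinigen's format
          parameters: {
            type: "object",
            properties: {
              terrain: { type: "string" },
              flora: { type: "array", items: { type: "string" } },
              fog_density: { type: "number" },
            },
          },
        },
      }],
      tool_choice: { type: "function", function: { name: "emit_biome" } },
    });

    const call = res.choices[0].message.tool_calls?.[0];
    if (call) console.log(JSON.parse(call.function.arguments));
    ```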

  15. Definitely hard to keep up with the tech, even if you're deep in it.

    I presented a 3D gameplay hack of this at the recent Blockade meetup: https://youtu.be/TfRJeedTeOs

    The metric depth model I used (ZoeDepth) is quite new -- most previous models output inverse relative depth, which scales poorly, especially for artistic worlds (a sketch of why metric depth matters is at the end of this comment).

    But now there is a much better depth model coming from Intel, called Depth Fusion, which they are adding to the Blockade API and also open-sourcing (!)...

    Also worth checking out what's possible with SD ControlNet: https://twitter.com/BlockadeLabs/status/1634578058287132674
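
    Back to the depth point: the reason metric depth matters for gameplay is that, with depth in meters and a pinhole camera model, you can unproject every pixel directly into world space at the right scale. A minimal sketch, assuming known (or estimated) intrinsics:

    ```ts
    // Unproject a metric depth map into camera-space xyz points (meters).
    function unproject(
      depth: Float32Array, width: number, height: number,
      fx: number, fy: number, cx: number, cy: number,
    ): Float32Array {
      const points = new Float32Array(width * height * 3);
      for (let v = 0; v < height; v++) {
        for (let u = 0; u < width; u++) {
          const i = v * width + u;
          const z = depth[i]; // already meters; inverse relative depth would need rescaling first
          points[i * 3] = ((u - cx) * z) / fx;
          points[i * 3 + 1] = ((v - cy) * z) / fy;
          points[i * 3 + 2] = z;
        }
      }
      return points;
    }
    ```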

  16. For a couple of years I've been compiling an (admittedly worse) fuzzy-parsed version of this with GPT-3 (and friends) to generate and play out executable JavaScript lore in an RPG engine (the fuzzy-parse step is sketched at the end of this comment).

    So, I am very surprised I'm only hearing about Inform now -- clearly not hanging out in the right circles.

    Does anyone have any recommendations for keeping abreast of projects like this so we can leverage the best of open source and not reinvent the wheel?

    I just found out about NarraScope and I hope to attend the next one. Are there other things I should check out? The Twitter firehose doesn't work too well...
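
    For reference, the "fuzzy parse" step is roughly this; real lore execution would want a proper sandbox rather than new Function:

    ```ts
    // Pull the first fenced code block out of a model completion and turn it
    // into a callable; fall back to treating the whole completion as code.
    function extractLore(completion: string): () => unknown {
      const fence = "`".repeat(3);
      const re = new RegExp(fence + "(?:js|javascript)?\\n([\\s\\S]*?)" + fence);
      const match = completion.match(re);
      const src = match ? match[1] : completion;
      return new Function(src) as () => unknown;
    }
    ```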
