mfrye0
Joined 461 karma
michael [at] savvyiq [dot] ai

  1. I was looking for a version of a proxy that could maximize throughput to each LLM based on its limits, basically max requests and input/output tokens per second.

    I couldn't find anything, so I rolled a version together based on Redis and job queues. It works decently well, but I'd prefer to use something better if it exists.

    Does anyone know of something like this that isn't completely over-engineered / abstracted?
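
For anyone wondering what the core of such a limiter looks like, one common approach is a pair of token buckets per model: one for requests and one for tokens. The sketch below is a single-process illustration only. The class names, limits, and injected clock are assumptions, and it is not the Redis / job queue version described above.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills continuously at `rate` units per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.level = capacity
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.level = min(self.capacity, self.level + (now - self.updated) * self.rate)
        self.updated = now

    def wait_time(self, amount: float) -> float:
        """Seconds until `amount` units are available (0.0 if available now)."""
        if amount > self.capacity:
            raise ValueError("amount exceeds bucket capacity and can never be served")
        self._refill()
        if amount <= self.level:
            return 0.0
        return (amount - self.level) / self.rate

    def take(self, amount: float) -> None:
        self._refill()
        self.level -= amount


class ModelLimiter:
    """Hypothetical per-model limiter: one bucket for requests, one for tokens."""

    def __init__(self, requests_per_sec: float, tokens_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.requests = TokenBucket(requests_per_sec, requests_per_sec, clock)
        self.tokens = TokenBucket(tokens_per_sec, tokens_per_sec, clock)

    def try_acquire(self, estimated_tokens: float) -> float:
        """Return 0.0 and reserve capacity if the call may go out now,
        otherwise return how many seconds the caller should wait."""
        wait = max(self.requests.wait_time(1),
                   self.tokens.wait_time(estimated_tokens))
        if wait == 0.0:
            self.requests.take(1)
            self.tokens.take(estimated_tokens)
        return wait
```

A worker would call `try_acquire` with its token estimate before each request and sleep or re-queue the job for the returned number of seconds. Sharing the buckets across workers would need the state moved into something like Redis, which this sketch does not attempt.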

  2. Thanks for this. That makes sense.
  3. I've been keeping an eye on this space for a while as it matures a bit further. There have been a number of startups that have popped up around this - apart from Temporal and DBOS, Hatchet.run looked interesting.

    I've been using BullMQ for a while with distributed workers across K8s and have hacked together what I need, but a lightweight DAG of some sort on Postgres would be great.

    I took a brief look at your docs. What would you say is the main difference between yours and some of the other options? Just the simplicity of it being a single SQL file and an SDK wrapper? Sorry if the docs answer this already - trying to take a quick look between work.

  4. I was going to say the same. We're using binary vectors in prod as well. Makes a huge difference in the indexes. This wasn't mentioned once in the article.
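
For readers unfamiliar with the idea, here is a minimal sketch of binary quantization: keep only the sign of each embedding dimension, pack the bits, and compare by Hamming distance. The function names and toy vectors are illustrative assumptions and say nothing about how any particular vector index implements it.

```python
from typing import Sequence


def binarize(vector: Sequence[float]) -> int:
    """Pack the sign of each dimension into one bit of an int (1 if > 0)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of dimensions whose sign bits differ."""
    return bin(a ^ b).count("1")


# Toy 4-dimensional "embeddings"; real ones have hundreds of dimensions.
query = binarize([0.12, -0.80, 0.33, 0.05])
close = binarize([0.10, -0.75, 0.40, -0.01])
far = binarize([-0.50, 0.90, -0.20, -0.30])

print(hamming(query, close), hamming(query, far))
```

Each dimension takes 1 bit instead of the 32 bits of a float32, which is where the index size savings come from.
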
  5. Sorry, habit. I've been debating on exposing these publicly, but they're expensive to create. We have a public interactive demo here for now: https://savvyiq.ai/products/entity-hierarchy

    Here's the live mermaid editor version for the Ikea example: https://mermaid.live/edit#pako:eNqNkV9PwjAUxb9KcxPfRrO17E_3Y...

  6. I noticed the template too. Someone mentioned recently that's actually a good risk signal - scammers often use the same site structure across domains.

    On the research, you're absolutely right. It fits that sweet spot where it's just easy / boring / tedious enough to automate with the current generation of LLMs.

  7. Hey, good question. Yeah, we're aware of many companies that can have thousands of subsidiaries.

    We found that going both up the chain and down / sideways in the chain was too much for the agent to handle - they are two distinct operations. The #1 use case was customers trying to understand the ultimate parent, so we decided to focus on that first.

    We have something roughly working on subsidiaries, but it's not ready for prime time yet. It would likely be a separate API.

  8. Thank you! Yeah, I know it's a fine balance between shipping too early and having a buggy product vs too late. I'm shipping as fast as I can now to keep that tight feedback loop going.
  9. Interesting. So in that case, who had the biggest pain point? The sales team from being held up, or the research team under pressure to deliver?

    We actually haven't explored the big consulting firms yet. I know they are often contracted to do this sort of research for companies.

  10. Thanks. I've been heads down on this for some time and don't have the huge network to share this with.

    Regarding the product, I'm open to any and all feedback for your use case. Trying to follow the adage of "if you are not embarrassed by the first version of your product, you’ve launched too late".

    And wow - just read your reddit post. I'm going to look into that for us too.

  11. Interesting. That's actually where we started. We were doing automated research on vendors from a TPRM perspective and looking for data points around organizational security / reputation. Examples - if the company had been hacked before / how they responded, do they have a CISO, nth party vendors, are they SOC2 / FedRAMP certified, etc. Basically, predictors of risk / stability.

    We realized the underlying business graph was the bottleneck though, so that's been our focus for some time. With that in place, we're now coming full circle on the risk research standpoint.

    On your comment about confidence / liability, we're actually having conversations around that now and getting feedback. First step is exposing all the research and evidence directly to build trust, which is what we're doing now for the new corporate hierarchy system.

  12. Good idea! I picked a random California Ikea entity (IKEA US RETAIL LLC) and ran it through the system. Here's the output - current goal is to get to ultimate parent.

    ## Summary

    IKEA US RETAIL LLC is a limited liability company. It is wholly owned by IKEA Holding U.S., Inc., and ultimately controlled by Stichting INGKA Foundation, a Dutch foundation that owns Ingka Group.

    ## Graph

      graph TD
        e2[IKEA Property, Inc.]-->e1[IKEA US RETAIL LLC]
        e3[IKEA Holding U.S., Inc.]-->e1[IKEA US RETAIL LLC]
        e4[Ingka Holding B.V.]-->e3[IKEA Holding U.S., Inc.]
        e4[Ingka Holding B.V.]-->e4[Ingka Holding B.V.]
        e5[Stichting INGKA Foundation]-->|100%, 1982|e4[Ingka Holding B.V.]
    
    This is the permalink to the deep research result: https://savvyiq.ai/playground/entity-hierarchy/siq_31ro4EDce...
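
Since the graph is plain Mermaid edge lines, the chain can also be walked programmatically. The sketch below is an assumption about how a consumer might do that. The regex, the function names, and the choice to ignore self-referencing edges are hypothetical and not part of the API. It parses `parent-->child` edges and returns every top-level ancestor of a starting node.

```python
import re
from collections import defaultdict

MERMAID = """
graph TD
  e2[IKEA Property, Inc.]-->e1[IKEA US RETAIL LLC]
  e3[IKEA Holding U.S., Inc.]-->e1[IKEA US RETAIL LLC]
  e4[Ingka Holding B.V.]-->e3[IKEA Holding U.S., Inc.]
  e4[Ingka Holding B.V.]-->e4[Ingka Holding B.V.]
  e5[Stichting INGKA Foundation]-->|100%, 1982|e4[Ingka Holding B.V.]
"""

# parent[label] --> optional |edge label| child[label]
EDGE = re.compile(r"(\w+)\[(.+?)\]-->(?:\|[^|]*\|)?(\w+)\[(.+?)\]")


def parse(mermaid):
    """Return (id -> name, child id -> set of parent ids)."""
    names, parents = {}, defaultdict(set)
    for line in mermaid.splitlines():
        match = EDGE.search(line)
        if not match:
            continue
        parent_id, parent_name, child_id, child_name = match.groups()
        names[parent_id], names[child_id] = parent_name, child_name
        if parent_id != child_id:  # skip self-referencing edges
            parents[child_id].add(parent_id)
    return names, parents


def top_level_ancestors(start, parents):
    """Walk upward from `start`; collect ancestors that have no parent."""
    seen, stack, roots = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        ups = parents.get(node, set())
        if not ups and node != start:
            roots.add(node)
        stack.extend(ups)
    return roots


names, parents = parse(MERMAID)
roots = sorted(names[r] for r in top_level_ancestors("e1", parents))
print(roots)
```

On this graph the walk ends at both IKEA Property, Inc. and Stichting INGKA Foundation, because the retail entity has two incoming edges. Picking a single ultimate parent would need ownership percentages or similar evidence, which this sketch ignores.
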
  13. If you do this sort of thing often, I'd love to chat further. I'm basically trying to automate this sort of manual research around companies with a library of deep research APIs.

    Had a Show HN last week that seemed to go under the radar: https://www.hackerneue.com/item?id=45671087

    We launched corporate hierarchy research and are working on UBO now. From the corporate hierarchy standpoint, it looks like the Delaware entity fully owns the Estonian entity. Auto-generated mermaid diagram from the deep research:

      graph TD
        e1[BuildJet, Inc.]-->|100%, 2022-12-16|e2[Buildjet OÜ]
  14. Since it's quiet here, figured I'd share what the API actually spits out. Here's MG Motor's ownership chain (this is just the Mermaid diagram field - we return a bunch of other stuff too):

      graph TD
        e2[SAIC MOTOR UK HOLDING CO., LTD.]-->|2005-02-15|e1[MG MOTOR UK LTD]
        e1[MG MOTOR UK LTD]-->|2018|e7[MG Sales Centre Limited]
        e4[SAIC Motor Corporation Limited]-->e2[SAIC MOTOR UK HOLDING CO., LTD.]
        e4[SAIC Motor Corporation Limited]-->e3[SAIC MOTOR INTERNATIONAL UK LTD]
        e5[Shanghai Automotive Industry Corporation Group]-->|62.69%|e4[SAIC Motor Corporation Limited]
        e6[Shanghai State-owned Assets Supervision and Administration Commission Shanghai SASAC]-->e5[Shanghai Automotive Industry Corporation Group]
    
    You can copy/paste that into any Mermaid renderer to see it visually. Pretty wild how a British car brand ends up tracing back to Shanghai's government.

    Happy to run lookups for other companies if anyone's curious what their ownership looks like!

  15. I can confirm on the performance benefits. I wanted to start with uuidv7 for a new DB earlier this year, so I put together a function to use in the meantime. Once the function is available natively, we'll just migrate to use it instead.

    For anyone interested:

      CREATE FUNCTION uuidv7() RETURNS uuid AS $$
        -- Get base random UUID and overlay timestamp
        select encode(
          set_bit(
            set_bit(
              overlay(uuid_send(gen_random_uuid())
                placing substring(int8send((extract(epoch from clock_timestamp())*1000)::bigint) from 3)
                from 1 for 6),
              52, 1),
            -- Set version bits to 0111
            53, 1),
          'hex')::uuid;
      $$ LANGUAGE sql volatile;
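
To make the bit manipulation easier to follow, here is the same layout sketched in Python: start from a random UUIDv4, overwrite the first 6 bytes with the Unix time in milliseconds, and flip the version nibble from 0100 to 0111. The function name and the optional `now_ms` parameter are assumptions for illustration. This mirrors the SQL function rather than any official implementation.

```python
import time
import uuid
from typing import Optional


def uuidv7_sketch(now_ms: Optional[int] = None) -> uuid.UUID:
    """Build a time-ordered UUID by overlaying a timestamp on a UUIDv4."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000

    raw = bytearray(uuid.uuid4().bytes)

    # Bytes 0-5: 48-bit big-endian Unix timestamp in milliseconds
    # (the SQL takes the low 6 bytes of an int8 for the same effect).
    raw[0:6] = now_ms.to_bytes(6, "big")

    # Byte 6 starts as 0x4X for a v4 UUID; setting 0x10 and 0x20 turns the
    # version nibble into 0111, i.e. version 7. The variant bits in byte 8
    # are left as uuid4() set them.
    raw[6] |= 0x30

    return uuid.UUID(bytes=bytes(raw))


example = uuidv7_sketch()
print(example, example.version)
```

Because the timestamp sits in the most significant bytes, values generated in later milliseconds sort after earlier ones, which is the property that helps index locality.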

  16. I'm working on Plaid / Perplexity for business data.

    The basic idea is that integrating business data into a B2B app or AI agent process is a pain. On one side there are web data providers (Clearbit, Apollo, ZoomInfo), then on the other, 150-year-old legacy providers based on government data (D&B, FactSet, Moody's, etc). You'd be surprised to learn how much manual work is still happening - teams of people just manually researching business entities all day.

    At a high level, we're building out a series of composable deep research APIs. It's built on a business graph powered by integrations to global government registrars and a realtime web search index. Our government data index is 265M records so far.

    We're still pretty early and working with enterprise design partners for finance and compliance use cases. Open to any thoughts or feedback.

    https://savvyiq.ai

  17. Adding on to the other comments here about Next.js vs Remix.js.

    We had to choose a framework for our new app last year and were researching the current state of things. Next.js was / is by far the most popular, but it also drew some of the worst feedback and warnings to stay away. Remix isn't perfect, but I appreciate having fewer abstractions and working with simple request / response structures.

    Also, a warning for those hiring for frontend / fullstack roles:

    Over the years when hiring for roles for X frontend framework, we would constantly find "experts" in framework X that would really impress us. Whether it was React, Angular, Vue, Remix, etc. Then after moving forward, we found they didn't know core JS fundamentals and were basically useless beyond the framework.

  18. Great write-up. I worked at a payment processor a while back, so I really appreciate the complexity of the problem you're tackling.

    How are you doing the legal entity resolution? In your example - connecting "AMZN" to "Amazon.com, Inc."

    We happen to be working solely on this problem, as it's ridiculously complicated by itself from a global perspective. We're still in stealth and preparing to share something ourselves on HN shortly, but just curious how you're approaching it since your pipeline is already doing a ton.

  19. Hey George and Alex. This looks awesome. We're working on something similar, but for all of the businesses in the world: https://savvyiq.ai. We're international and have 265M+ entities in the system. We're actually preparing to do our own formal share on HN shortly.

    We're working with enterprise customers now that want to use our system to dedupe all their gnarly business data, ground it to real legal entities, and enrich it with base insights. They are then asking for further data points, more from a risk and due diligence standpoint.

    Product information has come up repeatedly, but as you clearly know, that is a beast in itself that I don't think we'll ever tackle. For context, I helped build out the product data infra at https://www.wiser.com, and I'm not inclined to spend my time categorizing and building the taxonomy for pots, pans, and towels again.

    I'm going to try out the product and am happy to chat further if you think there's an opp to collaborate in some way. My email is in my profile.
