
pamelafox
2,064 karma
Principal Cloud Advocate @ Microsoft, focusing on Python! Formerly @ UC Berkeley CS, Khan Academy, Coursera, Google. More about me at www.pamelafox.org

  1. I'm on the Python advocacy team at Microsoft, so I've been experimenting a bit with the new framework. It works pretty well, and is comparable to LangChain v1 and Pydantic AI, but has tighter integrations with Microsoft-specific technologies. All the frameworks have very similar Agent() interfaces, as well as graph-based approaches (Workflow, LangGraph, Graph).

    I have a repository here with similar examples across all those frameworks: https://github.com/Azure-Samples/python-ai-agent-frameworks-...

    I started comparing their features in more detail in a gist, but it's still a WIP: https://gist.github.com/pamelafox/c6318cb5d367731ce7ec01340e...

    I can flesh that out if it's helpful. I find it fascinating to see where agent frameworks converge and diverge. Generally, the frameworks are converging, which is great for developers, since we can learn a concept in one framework and apply it to another. There are definitely differences, though, once you get into the edge cases and production-level sophistication.
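
    As a rough illustration of that convergence, here's a minimal sketch of the shared Agent() pattern, using Pydantic AI as the example (the model id, prompt, and tool are invented for illustration; the Microsoft Agent Framework and LangChain v1 equivalents are spelled slightly differently but have a similar shape):

        # A minimal sketch of the shared Agent() pattern, using Pydantic AI.
        # Model id, system prompt, and tool are made up for illustration.
        from pydantic_ai import Agent

        agent = Agent(
            "openai:gpt-4o-mini",
            system_prompt="You are a concise assistant.",
        )

        @agent.tool_plain
        def get_weather(city: str) -> str:
            """Hypothetical tool: return a canned forecast for a city."""
            return f"Sunny in {city}"

        result = agent.run_sync("What's the weather in Berkeley?")
        print(result.output)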

  2. Yes, AI Search has a new agentic retrieval feature that includes synthetic query generation: https://techcommunity.microsoft.com/blog/azure-ai-foundry-bl... You can customize the model used and the maximum number of queries to generate, so latency depends on those factors, plus the length of the conversation history passed in. The model is usually gpt-4o, gpt-4.1, or one of their -mini variants, so it's the standard latency for those models. A more recent version of that feature also uses the LLM to dynamically decide which of several indices to query, and executes the searches in parallel.

    That query generation approach does not extract structured data. I do maintain another RAG template for PostgreSQL that uses function calling to turn the query into a structured query, such that I can construct SQL filters dynamically (rough sketch at the end of this comment). Docs here: https://github.com/Azure-Samples/rag-postgres-openai-python/...

    I'll ask the AI Search team about SPLADE; I'm not sure.
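
    To make the structured-query idea above concrete, here's a minimal sketch of the function-calling step, assuming a hypothetical product catalog (the schema, table, and column names are not the template's actual ones):

        import json

        from openai import OpenAI

        client = OpenAI()

        # Hypothetical function schema: pull a structured filter out of the user's question.
        tools = [{
            "type": "function",
            "function": {
                "name": "search_products",
                "description": "Search the product catalog, optionally filtering by price.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Keywords to search for"},
                        "max_price": {"type": "number", "description": "Upper price bound, if any"},
                    },
                    "required": ["query"],
                },
            },
        }]

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "climbing shoes under $50"}],
            tools=tools,
            tool_choice="required",
        )
        args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)

        # Turn the extracted arguments into a parameterized SQL filter (never interpolate directly).
        sql = "SELECT * FROM products WHERE description ILIKE %s"
        params = [f"%{args['query']}%"]
        if "max_price" in args:
            sql += " AND price <= %s"
            params.append(args["max_price"])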

  3. I believe that Azure AI Search currently uses Lucene for BM25, hnswlib for vector search, and the Bing re-ranking model for semantic ranking. (So no, it does not, though the features are similar.)
  4. I know :( But I think vector DBs and vector search got so hyped that people thought you could switch entirely over to them. Lots of APIs and frameworks also used "vector store" as the shorthand for "retrieval data source", which didn't help.

    That's why I write blog posts like https://blog.pamelafox.org/2024/06/vector-search-is-not-enou...

  5. Do you mean that you're using the Copilot indexer for SharePoint docs? https://learn.microsoft.com/en-us/microsoftsearch/semantic-i...

    The AI Search team has been working with the SharePoint team to offer more options, so that devs can get the best of both worlds. They might have some stuff ready for Ignite (mid-November).

  6. At Microsoft, that's all baked into Azure AI Search - hybrid search does BM25, vector search, and re-ranking, just by setting booleans to true (rough sketch at the end of this comment). It also has a new agentic retrieval feature that does the query rewriting and parallel search execution.

    Disclosure: I work at MS and help maintain our most popular open-source RAG template, so I follow the best practices closely: https://github.com/Azure-Samples/azure-search-openai-demo/

    Few developers realize that you need more than just vector search, so I still spend many of my talks emphasizing the FULL retrieval stack for RAG. It's also possible to do it on top of other DBs like Postgres, but it takes more effort.
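
    For a sense of what "setting booleans to true" amounts to, a hybrid + semantic-ranker query with the azure-search-documents Python SDK looks roughly like this (endpoint, index, and field names are placeholders, and exact parameters can vary by SDK version):

        from azure.core.credentials import AzureKeyCredential
        from azure.search.documents import SearchClient
        from azure.search.documents.models import VectorizableTextQuery

        # Endpoint, index, and field names are placeholders.
        search_client = SearchClient(
            endpoint="https://<your-service>.search.windows.net",
            index_name="docs-index",
            credential=AzureKeyCredential("<api-key>"),
        )

        query = "What does my health plan cover?"
        results = search_client.search(
            search_text=query,                        # keyword (BM25) search
            vector_queries=[VectorizableTextQuery(    # vector search, vectorized server-side
                text=query, k_nearest_neighbors=50, fields="embedding",
            )],
            query_type="semantic",                    # turn on the semantic re-ranker
            semantic_configuration_name="default",
            top=5,
        )
        for doc in results:
            print(doc["title"])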

  7. I'd like to know as well, so that I can set up a caterpillar cam.
  8. Love this! Relatedly, does anyone have a suggestion for an outdoor solar-powered web camera that I could point at the critters in my garden? I'd love to stream a MonarchCam or MantisCam some day.
  9. Ooo, bobcats! I live in the Bay Area near Tilden Park, and I spent a while on iNaturalist trying to figure out where the bobcats hang out, as my 6-year-old is very interested in wild cats. I sadly realized that bobcats are usually out in the morning/evening, when we are not in the parks. Still used the bobcat stalking as an excuse to take a walk in Tilden today, though.

    What's your approach to finding the bobcat locations for your shot?

  10. I like this point, for people hiring DevRel:

    "Look in your community. Find users of your product or users of your competitor’s product. "

    I'm a current DevRel-er myself, and someone recently reached out looking to fill a DevRel role. I told them that I wouldn't actually be a good fit for their product (a CLI tool, and I'm not as die-hard of a CLI user as other devs), and suggested they look within their current user community. That's not always possible, especially for new products, but if a tool is sufficiently used, it's really nice to bring in someone who's genuinely used and loved the product before starting the role.

    My hiring history:

    * Google Maps DevRel, 2006-2011: I first used Google Maps in my "summer of mashups", just making all kinds of maps, and even used it in a college research project. By the time I started the role, I knew the API quite well. Still had lots to learn in the GIS space, as I was coming from web dev, but at least I had a lot of project-based knowledge to build on.

    * Microsoft, 2023-present: My experience was with VS Code and GitHub, two products that I used extensively for software dev. Admittedly, I'd never used Azure (only Google App Engine and AWS), so I had to train up on it rapidly. Fortunately, my experience with the other clouds has helped me with the MS cloud.

  11. It was fun! We still see Wave-iness in other products: Google Docs uses the Operational Transformation (OT) algorithm for collaborative editing (or at least it did, last I knew), and non-Google products like Notion, Quip, Slack, and Microsoft's Loop all have some overlap.

    We struggled with having too many audiences for Wave - were we targeting consumer or enterprise? email or docs replacement? Too much at once.

    The APIs were so dang fun though.

  12. Hm, I didn't work on the frontend, but I don't particularly remember griping. GWT had been around for ~5 years at that point, so it wasn't super new: https://en.wikipedia.org/wiki/Google_Web_Toolkit

    I always personally found it a bit odd, as I preferred straight JS myself, but large companies have to pick some sort of framework for websites, and Google already used Java a fair bit.

  13. I was on the Wave team! Our servers didn't have enough capacity; we launched too soon. I was managing the developer-facing server for API testing, and I had to slowly let developers in to avoid overwhelming it.
  14. How do you determine if the tools access private data? Is it based solely on their tool description (which can be faked) or by trying them in a sandboxed environment or by analyzing the code?
  15. I am giving it a go for parenting advice: “My 5-year-old is suddenly very germ conscious. Doesn't want to touch things, always washing hands. Do deep research, is this normal?” https://chatgpt.com/share/68be1dbd-187c-8012-98d7-83f710b12b...

    The results look reasonable? It’s a good start, given how long it takes to hear back from our doctor on questions like this.

  16. Both humans and coding agents have their strengths and weaknesses, but I've been appreciating help from coding agents, especially with languages or frameworks where I have less expertise, and the agent has more "knowledge", either in its weights or in its ability to more quickly ingest documentation.

    One weakness of coding agents is that sometimes all the agent sees is the code, not the outputs. That's why I've been working on agent instructions/tools/MCP servers that empower it with all the same access that I have. For example, this is a custom chat mode for GitHub Copilot in VS Code: https://raw.githubusercontent.com/Azure-Samples/azure-search...

    I give it access to run code, run tests and see the output, run the local server and see the output, and use the Playwright MCP tools on that local server. That gives the agent almost every ability that I have - the only tool that it lacks is the breakpoint debugger, as that is not yet exposed to Copilot. I'm hoping it will be in the future, as it would be very interesting to see how an agent would step through and inspect variables.

    I've had a lot more success when I actively customize the agent's environment, and then I can collaborate more easily with it.

  17. When you describe subagents, are those single-tool agents, or are they multi-tool agents with their own ability to reflect and iterate? (i.e. how many actual LLM calls does a subagent make?)
  18. I ran bulk evaluations on a RAG scenario and wrote up the results - discovered interesting differences (gpt-5 loves lists, smart quotes, and admitting it doesn't know).
  19. I just ran evaluations of gpt-5 for our RAG scenario and was pleasantly surprised at how often it admitted “I don’t know” - more than any model I’ve eval’d before. Our prompt does tell it to say it doesn’t know if context is missing, so that likely helped, but this is the first model to really adhere to that.
  20. We use text-embedding-3-large, with both quantization and MRL reduction, plus oversampling on the search to compensate for the compression.
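
    On the embedding side, that looks roughly like this (the dimension count is just an example, and the int8 step is a toy illustration; the oversampling is configured on the search index, not shown here):

        import numpy as np
        from openai import OpenAI

        client = OpenAI()

        # Request MRL-reduced embeddings via the dimensions parameter
        # (text-embedding-3-large supports Matryoshka-style truncation natively).
        resp = client.embeddings.create(
            model="text-embedding-3-large",
            input=["How do I rotate my API keys?"],
            dimensions=1024,  # example reduction from the default 3072
        )
        vec = np.array(resp.data[0].embedding, dtype=np.float32)

        # Naive scalar (int8) quantization, just to illustrate the compression idea;
        # in practice the search service handles quantization on the index side.
        scale = np.abs(vec).max() / 127.0
        vec_int8 = np.round(vec / scale).astype(np.int8)
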
  21. I am testing out gpt-5-mini for a RAG scenario, and I'm impressed so far.

    I used gpt-5-mini with reasoning_effort="minimal", and that model finally resisted a hallucination that every other model generated.

    Screenshot in post here: https://bsky.app/profile/pamelafox.bsky.social/post/3lvtdyvb...

    I'll run formal evaluations next.
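
    For reference, the call shape I mean is roughly this (the system prompt is a paraphrase of the "say you don't know" instruction, not our actual RAG prompt, and the sources are stand-ins):

        from openai import OpenAI

        client = OpenAI()

        response = client.chat.completions.create(
            model="gpt-5-mini",
            reasoning_effort="minimal",
            messages=[
                {"role": "system", "content": "Answer ONLY from the provided sources. If the answer is not in the sources, say you don't know."},
                {"role": "user", "content": "Sources:\n[doc1] Employee handbook: ...\n\nQuestion: What is the parental leave policy?"},
            ],
        )
        print(response.choices[0].message.content)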

  22. I would argue that we're passing on 5% of life to the machines, not 100%. By the time bedtime has rolled around, my kids have been home for 5 hours - we have already spent hours reading, playing, parkour'ing, role-playing, painting, inventing, slime'ing, etc. We do manage to often tell a story ourselves (last night, we made the kids tell it!), but I am not going to judge a parent (or myself) for deciding to delegate a fraction of creative energy to a machine.

    I was 100% against screens when first having a kid, but now I'm content with kids getting a spectrum of entertainment styles, and for parents to get a break every so often.

  23. Gemini wrote that whole story with a short prompt about a "King Dragon that farts". I assure you that our actual improv'd story is far superior in plot points.

    And yes, I was confused too as to how farting would clear away fog.

  24. Lol, yes, the dragon's torso turned into a man. That man does show up earlier in the story - I think perhaps the model so closely associates dragon stories with stories of men that it just desperately wanted to add one in? The text itself never actually mentions the man/dragon/torso.

    If Gemini added a reflection step to its book-drawing routine, I think the model could easily notice the errors and generate images to correct them - the errors do not seem insurmountable.

    Given that, I'm assuming Amazon is or will soon be filled with decently illustrated somewhat amusing stories.

  25. Lol, I just tried to get it to draw the story about King Dragon farting, but it could not come up with a picture of a dragon farting - it turned it into fire coming from its mouth instead! It's too far outside its training data.

    Link: https://g.co/gemini/share/188609ce3e1f

  26. I think it'd be amazing if I had the energy to make up improv bedtime stories every night. (We have a "King Dragon" improv series happening lately, which involves a lot of farts)

    BUT, I don't always have that energy, and I already spend hours a day reading stories to my kids, so I am okay with them spending some fraction of time hearing stories from robots/screens/etc. (Lately, it's "Hey Google, tell a story" if mommy is too busy to read)

    I hope we never stop paying amazing children's book illustrators though! I have so many books where I marvel at each page and the ingenuity of the illustrative style.

  27. Yep, 20B model, via Ollama: ollama run gpt-oss:20b

    Screenshot here with Ollama running and asitop in another terminal:

    https://bsky.app/profile/pamelafox.bsky.social/post/3lvobol3...
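
    If you'd rather poke at it from Python, Ollama also exposes an OpenAI-compatible endpoint locally, so a sketch like this should work:

        from openai import OpenAI

        # Ollama serves an OpenAI-compatible API locally; the api_key is required
        # by the client but not actually checked.
        client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

        response = client.chat.completions.create(
            model="gpt-oss:20b",
            messages=[{"role": "user", "content": "Say hello in one sentence."}],
        )
        print(response.choices[0].message.content)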

  28. I ran it via Ollama, which I assume runs it in the best way available. Screenshot in my post here: https://bsky.app/profile/pamelafox.bsky.social/post/3lvobol3...

    I'm still wondering why my GPU usage was so low... maybe Ollama isn't optimized for running it yet?

