Rutledge
Joined 176 karma

  1. Scorecard | Founding Engineer, Founding UX Designer, Founding GTM | SF, CA ONSITE | Full-time

    Scorecard is building the leading platform for testing, evaluating, and monitoring AI applications. We help teams ship reliable AI products faster—from prototype to production. Our customers include developers and enterprises building with LLMs who need confidence their AI agents perform as expected.

    We recently raised $3.75M in seed funding from Kindred Ventures, Neo, and angels from OpenAI, Google, and Meta: https://www.businessinsider.com/scorecard-raises-millions-ki...

    See how we're helping enterprises like Thomson Reuters ensure their AI agents are production-ready: https://www.thomsonreuters.com/en-us/posts/innovation/from-t...

    Tech Stack: TypeScript throughout (Next.js, Express, React), PostgreSQL, and agents such as Claude Code and Gemini for code review.

    We're an early-stage, fast-growing team tackling the most pressing problems in the AI reliability space. If you're excited about being a founding team member at a company defining how the industry evaluates and optimizes AI systems, we'd love to hear from you.

    Open Roles:

    - Founding Software Engineer: Build the core platform that helps developers test and evaluate AI agents at scale
    - Founding UX Designer: Design intuitive experiences that make complex AI evaluation accessible to all developers
    - Founding GTM: Help define and execute our go-to-market strategy as we scale with customers

    Learn more and apply: jobs@scorecard.io w/ subject 'HN'

  2. I call them 'CLI agents'!
  3. Here's the image from Wayback: https://web.archive.org/web/20250625051706/https://blog.goog...

    The biggest differences from Claude Code (the current champion):

    1. Generous free tier (60 RPM!)
    2. Open source under Apache (now standard, since OpenAI Codex did the same)

  4. Hi HN, we're excited to launch the first remote MCP server for LLM evaluation, usable from Claude.ai and Cursor. We'd love your thoughts and feedback :)
  5. This initiative is designed to be community-driven, so we're looking forward to your feedback on what agent benchmarking needs exist in your domains. While we're starting with legal AI, we plan to expand to other industries where benchmarks for evaluating AI agents are needed.
  6. Yes, quite helpful. Thanks for explaining; I'll try it out!
  7. The concurrent request handling seems great for our AI eval workloads, where we're waiting on LLM API calls and DB operations. But I'm curious: how does Vercel handle potential noisy-neighbor issues when one request consumes excessive CPU/memory?

    Disclosure: I'm the CEO of Scorecard (an AI eval platform) and a current Vercel customer. Intrigued, since most of our serverless time is spent waiting for model responses, but cautious about 'magic' solutions.

  8. This is great :) Pretty impressive that it was possible in Coda!
  9. New chapter in the AI arms race
  10. +1 on data labeling platform: https://web.archive.org/web/20230403164757/https://feather.o...

    It's been around and in use since 2022. It's a site for SMEs (subject-matter experts) to write code data: https://www.semafor.com/article/01/27/2023/openai-has-hired-...

  11. ChatGPT now learns about users with a RAG system. This is the first step towards an OpenAI assistant: https://help.openai.com/en/articles/8590148-memory-in-chatgp...

This user hasn’t submitted anything.