We are building a benchmark dataset to evaluate AI models on professional document understanding and instruction following within the Engineering & Built Environment domain.
Tasks consist of complex, multi-step requests that draw on real-world workspace files (technical drawings, project specifications, engineering reports), web search, and code execution. Each task is paired with a clearly defined ground-truth output and an objective evaluation rubric. You will be responsible for authoring tasks that test an AI's ability to interpret engineering documentation, follow multi-step instructions, and produce precise, well-structured outputs.
We expect a minimum commitment of 15–20 hours per week.
Ideal Candidates
Ideal candidates have 3+ years of hands-on experience in one or more of the following sub-domains:
* Mechanical engineering
* Civil engineering
* Industrial engineering
* Architecture