State-of-the-art performance, unprecedented secrecy
The GPT-4 Technical Report introduced GPT-4, a large multimodal model that accepts text and image inputs. It reached human-level performance on many professional and academic exams, scoring around the 90th percentile on a simulated bar exam, while revealing almost nothing about its architecture, training data, or size.
GPT-4 accepted both text and image inputs but produced only text output. It was the first model in OpenAI's GPT series to accept images as input.
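On a simulated bar exam, GPT-4 scored around the 90th percentile of test takers, compared with roughly the 10th percentile for GPT-3.5. It also achieved strong scores on the GRE, the SAT, and many AP exams. The USMLE was not part of the report's exam suite, but a separate study by Microsoft researchers found that GPT-4 exceeded the USMLE passing score.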
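The "Technical Report" disclosed little beyond the fact that GPT-4 is a Transformer-based model pre-trained to predict the next token and then fine-tuned with reinforcement learning from human feedback (RLHF). It gave no details on model size, architecture, training data, training compute, hardware, or training method. OpenAI cited the competitive landscape and the safety implications of large-scale models as its reasons.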
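The paper described extensive red-teaming and safety mitigations, including adversarial testing by more than 50 outside experts. External organizations were also involved. For example, the Alignment Research Center (ARC) received early access to evaluate the model's capacity for autonomous replication and power-seeking behavior.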