Andrej Karpathy on AI-Assisted Coding: Lessons from Weeks of Using Claude

7 min read


What Andrej Karpathy’s Claude Coding Experience Reveals: The Reality of Agentic Coding

Andrej Karpathy revealed that in November 2024 his workflow was 80% manual coding plus autocomplete and 20% agentic coding, and that by December the ratio had completely reversed. When someone who led AI at Tesla and was a founding member of OpenAI says publicly that he is coding with Claude, it is more than a passing trend. It signals that the way we write code is fundamentally changing.

What Is Agentic Coding

The difference between traditional AI coding assistants and agentic coding is clear. Autocomplete tools like Copilot predict the next few lines as you type. With agentic coding, you state a goal and the AI explores files, writes code, runs tests, and fixes errors on its own.

# Traditional autocomplete: suggests line by line
def calculate_total(items):
    return sum(item.price for item in items)  # AI suggests this line

# Agentic coding: delegate entire feature implementation
# "Build a shopping cart discount system" → AI autonomously:
# 1. Analyzes existing codebase
# 2. Designs discount logic
# 3. Writes test code
# 4. Fixes errors automatically

This is what Karpathy means by 80% agentic coding. Instead of writing code line by line, you describe what you want to build and supervise the AI’s implementation.
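The loop described above (act, run the tests, feed failures back, repeat) can be sketched in a few lines. This is a toy illustration, not how Claude Code is implemented: `agent_loop`, `run_tests`, and `propose_fix` are hypothetical names, and the "model" here is a stub that removes one bug per call.

```python
from typing import Callable, Optional


def agent_loop(
    run_tests: Callable[[], Optional[str]],
    propose_fix: Callable[[str], None],
    max_iterations: int = 5,
) -> bool:
    """Run tests, pass any failure to the fixer, and repeat until the tests
    pass or the iteration budget is spent. Returns True if tests end green."""
    for _ in range(max_iterations):
        failure = run_tests()  # None means every test passed
        if failure is None:
            return True
        propose_fix(failure)  # hypothetical stand-in for a model call
    return run_tests() is None  # final check after the last fix


# Toy stand-ins: a "codebase" with two bugs and a fixer that removes one per call.
state = {"bugs": 2}


def run_tests() -> Optional[str]:
    return None if state["bugs"] == 0 else f"{state['bugs']} test(s) failing"


def propose_fix(failure: str) -> None:
    state["bugs"] -= 1


print(agent_loop(run_tests, propose_fix))
```

The iteration cap is the design choice worth noticing: when the loop fails to converge, control returns to the human supervising it.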

How Claude Code Works

Claude Code is an agentic coding tool that runs in the terminal. It reads files in your project directory for context, creates and modifies files, and executes shell commands.

# Claude Code session (illustrative, simplified)
$ claude

> Add 2FA to this project's authentication system

Claude: Analyzing project structure...
- Found existing auth logic in src/auth/ directory
- Currently using JWT-based authentication
- Recommending TOTP library for 2FA

[Creating] src/auth/totp.py
[Modifying] src/auth/login.py
[Running] pip install pyotp qrcode

The key is context. Claude works from what it has read of the project, not from an isolated snippet. Rather than just generating generic 2FA code, it produces changes that integrate with your existing codebase.
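For a sense of what a file like `src/auth/totp.py` might contain, here is a minimal standard-library sketch of TOTP (RFC 6238, built on HOTP from RFC 4226). The session above installs `pyotp`; this version avoids third-party dependencies so it runs anywhere. The function names are illustrative assumptions, not output from Claude.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte big-endian counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter derived from Unix time."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side,
    to tolerate clock drift. Each comparison is constant-time."""
    now = time.time() if at is None else at
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, counter + drift), submitted)
        for drift in range(-window, window + 1)
        if counter + drift >= 0
    )
```

A production version would also need rate limiting and replay protection, which are out of scope for this sketch.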

How the Actual Workflow Changes

What does Karpathy’s ratio reversal actually mean? Reconstructing from my experience:

November 2024 (80% Manual):

Feature design (manual)
Code writing (manual + Copilot autocomplete)
Debugging (manual)
Test writing (manual)
Refactoring (manual)

December 2024 (80% Agentic):

Feature design (manual - in natural language)
Code writing (delegated to Claude)
Code review (manual - critical!)
Touch-ups (manual, or re-prompt Claude)
Testing (Claude writes, you verify)

The important point is that code review has become the new core skill. You spend less time writing code yourself, but the ability to judge whether AI-generated code is correct matters more.

Real Limitations of Agentic Coding

To be honest, agentic coding isn’t a silver bullet. Even an expert like Karpathy still does 20% of the work by hand, which suggests as much.

# What agents do well
- Generating boilerplate code
- Implementing standard patterns (CRUD, REST API)
- Writing test cases
- Documentation
- Refactoring

# What agents struggle with
- Complex business logic (requires domain knowledge)
- Performance-critical optimization
- Complex integration with legacy systems
- Security-sensitive code (verification required)

In production especially, you shouldn’t ship AI-generated code as-is. A real case I saw recently: authentication code generated by Claude was logically correct but compared tokens with ==, which is vulnerable to timing attacks.

# Code generated by Claude (has vulnerability)
def verify_token(user_token, stored_token):
    return user_token == stored_token  # vulnerable to timing attack

# Manually corrected code
import hmac
def verify_token(user_token, stored_token):
    return hmac.compare_digest(user_token, stored_token)  # constant-time comparison
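
One caveat on the fix above: `hmac.compare_digest` accepts `str` arguments only when they are ASCII, and raises `TypeError` otherwise. If tokens can arrive straight from user input, a slightly more defensive variant (assuming tokens are text) encodes both sides to bytes first:

```python
import hmac


def verify_token(user_token: str, stored_token: str) -> bool:
    """Constant-time comparison that also accepts non-ASCII input.

    hmac.compare_digest raises TypeError for str arguments containing
    non-ASCII characters, so both values are encoded to bytes first.
    """
    return hmac.compare_digest(
        user_token.encode("utf-8"), stored_token.encode("utf-8")
    )
```

This is exactly the kind of detail a reviewer has to check, because both versions pass a basic equality test.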

Best Practices: How to Properly Leverage Agentic Coding

Prompt Engineering Is a Coding Skill

The difference in output between good and bad prompts is stark.

# Bad prompt
"Create a login feature"

# Good prompt
"Create a JWT-based login API.
- POST /api/auth/login endpoint
- Email/password validation
- Compare password hash with bcrypt
- Issue access token (15min) + refresh token (7 days)
- Use existing User model from src/models/user.py
- Error responses in RFC 7807 format"
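
The last line of the good prompt is specific enough to check mechanically. As a rough sketch of what an RFC 7807 ("Problem Details for HTTP APIs") error body looks like, here is a framework-agnostic helper. The name `problem_response` and its return shape are hypothetical; adapt them to whatever web framework the project uses.

```python
import json
from typing import Optional


def problem_response(status: int, title: str, detail: str,
                     type_uri: str = "about:blank",
                     instance: Optional[str] = None) -> tuple:
    """Build (status, headers, body) for an RFC 7807 problem details response."""
    body = {"type": type_uri, "title": title, "status": status, "detail": detail}
    if instance is not None:
        body["instance"] = instance
    # RFC 7807 defines a dedicated media type for these bodies.
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(body)


status, headers, body = problem_response(
    401, "Unauthorized", "Email or password is incorrect.",
    instance="/api/auth/login",
)
print(headers["Content-Type"])
print(body)
```

Naming the format in the prompt gives the reviewer something concrete to verify: the field names and the content type either match the RFC or they don't.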

Request in Small Units and Verify

Don’t request an entire feature at once. Break it into modules and verify each one.
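
As an illustration of "small unit, then verify", suppose the agent was asked for a single discount function rather than a whole cart system. The function and the checks below are a hypothetical example (`apply_discount` is not from any real project). The point is that a unit this small can be reviewed and tested in minutes.

```python
from decimal import Decimal


def apply_discount(subtotal: Decimal, percent: Decimal) -> Decimal:
    """Return the subtotal after a percentage discount, rounded to cents."""
    if not Decimal("0") <= percent <= Decimal("100"):
        raise ValueError("percent must be between 0 and 100")
    discounted = subtotal * (Decimal("100") - percent) / Decimal("100")
    return discounted.quantize(Decimal("0.01"))


def check_apply_discount() -> None:
    """Reviewer-written checks: typical case, both boundaries, invalid input."""
    assert apply_discount(Decimal("200.00"), Decimal("10")) == Decimal("180.00")
    assert apply_discount(Decimal("19.99"), Decimal("0")) == Decimal("19.99")
    assert apply_discount(Decimal("19.99"), Decimal("100")) == Decimal("0.00")
    try:
        apply_discount(Decimal("10.00"), Decimal("150"))
    except ValueError:
        pass
    else:
        raise AssertionError("out-of-range percent should be rejected")


check_apply_discount()
```

Once this unit passes, the next request (for example, stacking multiple discounts) builds on code you have already verified.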

Context Management Is Key

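A well-written project description file such as CLAUDE.md, which Claude Code loads into context at the start of a session, gives the agent the conventions and constraints it would otherwise have to guess.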

My Honest Take

Karpathy’s experience won’t apply to every developer. He’s one of the top experts in AI and has the deep technical background to give agents precise instructions.

However, the direction is clear. In 2025, agentic coding will become the standard. If you don’t learn it now, you’ll fall behind. The ability to evaluate and improve AI-generated code will define senior developers more than the ability to type code directly.

That doesn’t mean coding fundamentals become unnecessary. Quite the opposite. To judge whether AI-generated code is right or wrong, you need to understand that code first. Agentic coding doesn’t turn beginners into experts. It just makes experts faster.
