

Series Introduction

  1. Part 1: OpenTelemetry Instrumentation
  2. Part 2: Distributed Tracing Across Microservices (Current)
  3. Part 3: Structured Logging with Correlation IDs
  4. Part 4: Metrics and Alerting with Prometheus/Grafana
  5. Part 5: Debugging Production Issues with Observability Data

What is Distributed Tracing?

Distributed tracing visualizes the entire path of a request as it passes through multiple services.

User Request
    │
    ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ API Gateway │────▶│Order Service│────▶│Payment Svc  │
│   Span A    │     │   Span B    │     │   Span C    │
└─────────────┘     └──────┬──────┘     └─────────────┘
                          │
                          ▼
                   ┌─────────────┐
                   │Inventory Svc│
                   │   Span D    │
                   └─────────────┘

Trace Context Structure

W3C Trace Context Standard

traceparent: 00-{trace-id}-{span-id}-{trace-flags}
tracestate: vendor1=value1,vendor2=value2

Example:

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
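  • version: 00 in the current specification
  • trace-id: 32 hex characters (16 bytes) identifying the entire trace
  • span-id: 16 hex characters (8 bytes) identifying the caller's current span, which becomes the parent of the span created by the receiving service
  • trace-flags: 2 hex characters (8 bits); 01 means the caller sampled this trace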
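
To make the field layout concrete, here is a minimal Kotlin sketch that parses and re-serializes a traceparent value. The names TraceParent and parseTraceParent are made up for this illustration, and the sketch only accepts the version 00 layout. In a real service the OpenTelemetry propagator does this work for you.

```kotlin
// Illustrative only: TraceParent and parseTraceParent are hypothetical helpers,
// not part of the OpenTelemetry API.
data class TraceParent(
    val version: String,
    val traceId: String,
    val spanId: String,
    val flags: Int
) {
    // Bit 0 of trace-flags is the "sampled" flag.
    val sampled: Boolean get() = (flags and 0x01) == 0x01

    // Serialize back to the header value format.
    fun toHeader(): String = "$version-$traceId-$spanId-" + "%02x".format(flags)
}

// version (2 hex) - trace-id (32 hex) - span-id (16 hex) - trace-flags (2 hex)
val TRACEPARENT_PATTERN =
    Regex("^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

// Returns null when the value is malformed, uses a version other than 00
// (a simplification in this sketch), or carries an all-zero id.
fun parseTraceParent(header: String): TraceParent? {
    val match = TRACEPARENT_PATTERN.matchEntire(header.trim()) ?: return null
    val (version, traceId, spanId, flags) = match.destructured
    if (version != "00") return null
    if (traceId.all { it == '0' } || spanId.all { it == '0' }) return null
    return TraceParent(version, traceId, spanId, flags.toInt(16))
}
```

Calling parseTraceParent on the example header above yields a TraceParent whose sampled property is true, and toHeader() turns it back into the same string.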

Practical Distributed Tracing Implementation

Multi-Service Architecture

# docker-compose.yml
version: '3.8'
services:
  api-gateway:
    build: ./api-gateway
    ports:
      - "8080:8080"
    environment:
      - OTEL_SERVICE_NAME=api-gateway
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317

  order-service:
    build: ./order-service
    ports:
      - "8081:8081"
    environment:
      - OTEL_SERVICE_NAME=order-service
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317

  payment-service:
    build: ./payment-service
    ports:
      - "8082:8082"
    environment:
      - OTEL_SERVICE_NAME=payment-service
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317

  inventory-service:
    build: ./inventory-service
    ports:
      - "8083:8083"
    environment:
      - OTEL_SERVICE_NAME=inventory-service
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317

  jaeger:
    image: jaegertracing/all-in-one:1.53
    ports:
      - "16686:16686"
      - "4317:4317"
    environment:
      - COLLECTOR_OTLP_ENABLED=true

API Gateway

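import io.opentelemetry.api.trace.SpanKind
import io.opentelemetry.api.trace.StatusCode
import io.opentelemetry.api.trace.Tracer
import org.springframework.http.ResponseEntity
import org.springframework.web.bind.annotation.PostMapping
import org.springframework.web.bind.annotation.RequestBody
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RestController
import java.net.URI

@RestController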
@RequestMapping("/api")
class GatewayController(
    private val orderServiceClient: OrderServiceClient,
    private val tracer: Tracer
) {
    @PostMapping("/orders")
    fun createOrder(@RequestBody request: CreateOrderRequest): ResponseEntity<OrderResponse> {
        val span = tracer.spanBuilder("gateway.createOrder")
            .setSpanKind(SpanKind.SERVER)
            .setAttribute("http.method", "POST")
            .setAttribute("http.route", "/api/orders")
            .startSpan()

        return try {
            span.makeCurrent().use {
                val order = orderServiceClient.createOrder(request)
                span.setAttribute("order.id", order.id)
                ResponseEntity.created(URI.create("/api/orders/${order.id}")).body(order)
            }
        } catch (e: Exception) {
            span.recordException(e)
            span.setStatus(StatusCode.ERROR)
            throw e
        } finally {
            span.end()
        }
    }
}

Order Service Client (Context Propagation)

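import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.context.Context
import org.springframework.stereotype.Component
import org.springframework.web.reactive.function.client.WebClient

@Component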
class OrderServiceClient(
    private val webClient: WebClient,
    private val openTelemetry: OpenTelemetry
) {
    fun createOrder(request: CreateOrderRequest): OrderResponse {
        return webClient.post()
            .uri("/orders")
            .bodyValue(request)
            .headers { headers ->
                // Inject Trace Context
                openTelemetry.propagators.textMapPropagator.inject(
                    Context.current(),
                    headers
                ) { carrier, key, value ->
                    carrier?.set(key, value)
                }
            }
            .retrieve()
            .bodyToMono(OrderResponse::class.java)
            .block()!!
    }
}

Order Service

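import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.trace.SpanKind
import io.opentelemetry.api.trace.StatusCode
import io.opentelemetry.api.trace.Tracer
import io.opentelemetry.context.Context
import io.opentelemetry.context.propagation.TextMapGetter
import org.springframework.http.HttpHeaders
import org.springframework.http.ResponseEntity
import org.springframework.web.bind.annotation.*

// TextMapGetter declares two methods (keys and get), so it cannot be passed as a lambda.
private object HttpHeadersGetter : TextMapGetter<HttpHeaders> {
    override fun keys(carrier: HttpHeaders): Iterable<String> = carrier.keys
    override fun get(carrier: HttpHeaders?, key: String): String? = carrier?.getFirst(key)
}

@RestController
@RequestMapping("/orders")
class OrderController(
    private val orderService: OrderService,
    private val tracer: Tracer,
    private val openTelemetry: OpenTelemetry
) {
    @PostMapping
    fun createOrder(
        @RequestBody request: CreateOrderRequest,
        @RequestHeader headers: HttpHeaders
    ): ResponseEntity<OrderResponse> {
        // Extract the parent Context from the incoming traceparent/tracestate headers
        val parentContext = openTelemetry.propagators.textMapPropagator
            .extract(Context.current(), headers, HttpHeadersGetter)

        val span = tracer.spanBuilder("order.create")
            .setParent(parentContext)
            .setSpanKind(SpanKind.SERVER)
            .startSpan()

        return try {
            span.makeCurrent().use {
                val order = orderService.createOrder(request)
                ResponseEntity.ok(OrderResponse(order))
            }
        } catch (e: Exception) {
            // Mark the span as failed so the error is visible in the trace
            span.recordException(e)
            span.setStatus(StatusCode.ERROR)
            throw e
        } finally {
            span.end()
        }
    }
}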

@Service
class OrderService(
    private val orderRepository: OrderRepository,
    private val paymentClient: PaymentClient,
    private val inventoryClient: InventoryClient,
    private val tracer: Tracer
) {
    @Transactional
    fun createOrder(request: CreateOrderRequest): Order {
        // Check inventory
        val inventorySpan = tracer.spanBuilder("order.checkInventory")
            .setSpanKind(SpanKind.CLIENT)
            .startSpan()

        try {
            inventorySpan.makeCurrent().use {
                inventoryClient.checkAndReserve(request.items)
            }
        } finally {
            inventorySpan.end()
        }

        // Save order
        val saveSpan = tracer.spanBuilder("order.save")
            .setAttribute("db.system", "postgresql")
            .startSpan()

        val order = try {
            saveSpan.makeCurrent().use {
                orderRepository.save(Order.create(request))
            }
        } finally {
            saveSpan.end()
        }

        // Process payment
        val paymentSpan = tracer.spanBuilder("order.processPayment")
            .setSpanKind(SpanKind.CLIENT)
            .startSpan()

        try {
            paymentSpan.makeCurrent().use {
                paymentClient.charge(order.customerId, order.totalAmount)
            }
        } finally {
            paymentSpan.end()
        }

        return order
    }
}

Span Hierarchy

Parent-Child Relationships

Trace: abc123
│
├── Span A: gateway.createOrder (Root Span)
│   │
│   └── Span B: order.create (Child of A)
│       │
│       ├── Span C: order.checkInventory (Child of B)
│       │   │
│       │   └── Span E: inventory.reserve (Child of C)
│       │
│       ├── Span D: order.save (Child of B)
│       │
│       └── Span F: order.processPayment (Child of B)
│           │
│           └── Span G: payment.charge (Child of F)
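
A tracing backend rebuilds this tree from flat span records: each span carries its own id plus the id of its parent, and the root span has no parent. The sketch below shows that reconstruction under simplified assumptions. SpanRecord, renderTree, and the single-letter span ids are hypothetical and exist only for this illustration.

```kotlin
// Hypothetical, simplified span record; real spans also carry timestamps,
// attributes, status, and a trace id.
data class SpanRecord(val spanId: String, val parentSpanId: String?, val name: String)

// Render spans as an indented tree by following parentSpanId references.
fun renderTree(spans: List<SpanRecord>): List<String> {
    val childrenByParent = spans.groupBy { it.parentSpanId }
    val lines = mutableListOf<String>()

    fun visit(span: SpanRecord, depth: Int) {
        lines += "  ".repeat(depth) + span.name
        childrenByParent[span.spanId].orEmpty().forEach { visit(it, depth + 1) }
    }

    // Root spans are the ones without a parent.
    childrenByParent[null].orEmpty().forEach { visit(it, 0) }
    return lines
}

// The spans from the diagram above, in arbitrary arrival order.
val sampleTrace = listOf(
    SpanRecord("a", null, "gateway.createOrder"),
    SpanRecord("b", "a", "order.create"),
    SpanRecord("c", "b", "order.checkInventory"),
    SpanRecord("e", "c", "inventory.reserve"),
    SpanRecord("d", "b", "order.save"),
    SpanRecord("f", "b", "order.processPayment"),
    SpanRecord("g", "f", "payment.charge")
)
```

Printing renderTree(sampleTrace) line by line gives the same nesting as the diagram, with gateway.createOrder at the top and payment.charge indented three levels.
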
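
Span Links

A link connects spans that are related but not in a parent-child relationship, such as the items processed in a batch:

@Service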
class BatchOrderProcessor(
    private val tracer: Tracer
) {
    fun processBatch(orders: List<Order>) {
        val batchSpan = tracer.spanBuilder("batch.process")
            .startSpan()

        try {
            batchSpan.makeCurrent().use {
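                orders.parallelStream().forEach { order ->
                    // The current Context is thread-local, so only some worker threads would
                    // see batchSpan as parent. Start each order span as its own root and
                    // relate it to the batch through a link instead.
                    val orderSpan = tracer.spanBuilder("batch.processOrder")
                        .setNoParent()
                        .addLink(batchSpan.spanContext)  // Connect with link
                        .setAttribute("order.id", order.id)
                        .startSpan()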

                    try {
                        orderSpan.makeCurrent().use {
                            processOrder(order)
                        }
                    } finally {
                        orderSpan.end()
                    }
                }
            }
        } finally {
            batchSpan.end()
        }
    }
}

Sampling Strategies

Head-based Sampling

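The sampling decision is made when the root span is created, before the outcome of the request is known. Downstream services follow that decision through the sampled flag in traceparent: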

@Configuration
class SamplingConfig {

    @Bean
    fun sdkTracerProvider(): SdkTracerProvider {
        return SdkTracerProvider.builder()
            .setSampler(
                Sampler.parentBased(
                    Sampler.traceIdRatioBased(0.1)  // 10% sampling
                )
            )
            .build()
    }
}
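
Ratio-based sampling is deterministic per trace: the decision is derived from the trace-id itself, so every service configured with the same ratio reaches the same answer for a given trace. The following is a simplified illustration of that idea, not the SDK's actual source. The function name shouldSample is an assumption made for this sketch.

```kotlin
// Simplified illustration of trace-id ratio sampling; shouldSample is a
// hypothetical helper, not an OpenTelemetry API.
fun shouldSample(traceId: String, ratio: Double): Boolean {
    require(traceId.length == 32) { "trace-id must be 32 hex characters" }
    require(ratio in 0.0..1.0) { "ratio must be between 0.0 and 1.0" }
    if (ratio == 0.0) return false
    if (ratio == 1.0) return true

    // Treat the low 64 bits of the trace-id as a pseudo-random number.
    val randomPart = traceId.substring(16).toULong(16).toLong()
    // Clear the sign bit so the value falls in 0..Long.MAX_VALUE.
    val positive = randomPart and Long.MAX_VALUE
    // Keep the trace when the value lands in the lowest `ratio` share of that range.
    val upperBound = (ratio * Long.MAX_VALUE).toLong()
    return positive < upperBound
}
```

Because the result depends only on the trace-id and the ratio, raising the ratio never drops a trace that a lower ratio would have kept.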

Tail-based Sampling (OTel Collector)

The sampling decision is made after the trace has completed, so it can take errors and latency into account:

# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-traces-policy
        type: latency
        latency:
          threshold_ms: 1000
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/jaeger]
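
Conceptually, the three policies above are combined with OR: a trace is kept if it contains an error, or ran longer than the threshold, or falls into the random 10%. The sketch below models that decision in plain Kotlin. It is not Collector code, and CompletedTrace, the policy values, and keep are hypothetical names. The hash-based probabilistic policy is only a stand-in for the Collector's own sampling logic.

```kotlin
// Hypothetical summary of a finished trace, as seen once decision_wait has elapsed.
data class CompletedTrace(val traceId: String, val hasError: Boolean, val durationMs: Long)

// Each policy votes on whether a trace should be kept.
val errorsPolicy: (CompletedTrace) -> Boolean = { it.hasError }
val slowTracesPolicy: (CompletedTrace) -> Boolean = { it.durationMs > 1000 }

// Stand-in for the probabilistic policy: keeps roughly `percentage` percent of trace ids.
fun probabilisticPolicy(percentage: Int): (CompletedTrace) -> Boolean = { trace ->
    Math.floorMod(trace.traceId.hashCode(), 100) < percentage
}

// A trace is kept when at least one policy votes to keep it.
fun keep(trace: CompletedTrace, policies: List<(CompletedTrace) -> Boolean>): Boolean =
    policies.any { policy -> policy(trace) }
```

With listOf(errorsPolicy, slowTracesPolicy, probabilisticPolicy(10)), a fast, successful trace survives only if it lands in the probabilistic share, while every failed or slow trace is kept.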

Using Jaeger UI

Trace Search

Filter traces by service, operation, and minimum duration, for example:

service=order-service operation=order.create minDuration=100ms

Service Dependency Graph

You can visualize service dependencies through the System Architecture tab in Jaeger UI.

Performance Analysis

  • Critical Path analysis
  • Time comparison between Spans
  • Bottleneck identification
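
One simple way to reason about bottlenecks is self time: a span's duration minus the time spent in its direct children. The sketch below computes it for a handful of spans. TimedSpan, selfTimes, and bottleneck are hypothetical names, and the calculation assumes that sibling spans do not overlap in time.

```kotlin
// Hypothetical timing record; times are milliseconds from the start of the trace.
data class TimedSpan(
    val id: String,
    val parentId: String?,
    val name: String,
    val startMs: Long,
    val endMs: Long
) {
    val durationMs: Long get() = endMs - startMs
}

// Self time = a span's duration minus the total duration of its direct children.
// Assumes sibling spans run one after another (no overlap).
fun selfTimes(spans: List<TimedSpan>): Map<String, Long> {
    val childTime = spans
        .filter { it.parentId != null }
        .groupBy { it.parentId!! }
        .mapValues { (_, children) -> children.sumOf { it.durationMs } }
    return spans.associate { it.name to it.durationMs - (childTime[it.id] ?: 0L) }
}

// The span with the largest self time is the first place to look.
fun bottleneck(spans: List<TimedSpan>): String? =
    selfTimes(spans).maxByOrNull { it.value }?.key
```

For example, if order.create takes 500 ms and its children order.checkInventory, order.save, and order.processPayment take 100 ms, 30 ms, and 320 ms, the parent's self time is only 50 ms and order.processPayment is reported as the bottleneck.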

Span Attributes Best Practices

Semantic Conventions

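// HTTP related
// Note: SemanticAttributes uses the older attribute names; newer semantic-convention
// releases rename several of them (for example, http.method became http.request.method).
span.setAttribute(SemanticAttributes.HTTP_METHOD, "POST")
// http.url expects a full URL, so the path template goes in http.route
span.setAttribute(SemanticAttributes.HTTP_ROUTE, "/api/orders")
span.setAttribute(SemanticAttributes.HTTP_STATUS_CODE, 200)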

// Database related
span.setAttribute(SemanticAttributes.DB_SYSTEM, "postgresql")
span.setAttribute(SemanticAttributes.DB_OPERATION, "SELECT")
span.setAttribute(SemanticAttributes.DB_STATEMENT, "SELECT * FROM orders WHERE id = ?")

// Messaging related
span.setAttribute(SemanticAttributes.MESSAGING_SYSTEM, "kafka")
span.setAttribute(SemanticAttributes.MESSAGING_DESTINATION, "order-events")
span.setAttribute(SemanticAttributes.MESSAGING_OPERATION, "publish")

Custom Attributes

// Business context
span.setAttribute("order.id", orderId)
span.setAttribute("customer.tier", "premium")
span.setAttribute("order.item_count", items.size.toLong())
span.setAttribute("order.total_amount", totalAmount.toDouble())

Error Tracking

try {
    processOrder(order)
} catch (e: PaymentException) {
    span.setStatus(StatusCode.ERROR, "Payment processing failed")
    span.recordException(e, Attributes.builder()
        .put("exception.escaped", false)
        .put("payment.error_code", e.errorCode)
        .build()
    )
    throw e
}

Summary

Key aspects of distributed tracing:

Item            Description
--------------  ------------------------------------------------------
Trace Context   W3C standard for context propagation between services
Span Hierarchy  Parent-child relationships express request flow
Sampling        Head/Tail based for cost optimization
Attributes      Follow Semantic Conventions

In the next post, we’ll cover structured logging and Correlation IDs.
