Problem
A microservice application running in Kubernetes with the Istio service mesh encountered a connectivity issue between services. Specifically:
- Requests to existing resources (/resources/1) failed
- Requests to non-existing resources (/resources/133) worked correctly
- Error message: “upstream connect error or disconnect/reset before headers. reset reason: connection termination”
Solution
After thorough troubleshooting, the root cause was identified in the gateway application’s code:
- The gateway application was directly returning the ResponseEntity from an external API call without modification:
@GetMapping("/some/path")
public ResponseEntity<Response> getResponse(@RequestParam(name = "id", required = false) Integer id) {
    return externalApiClient.get(id); // Directly returning ResponseEntity from Feign client
}
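- A ResponseEntity carries the upstream status and headers as well as the body, so returning it unchanged passed the external API's headers through to the caller. This corrupted the response headers expected by the Istio sidecar proxy (Envoy), which then terminated the connection.
- The fix was to build a fresh ResponseEntity that reuses only the body of the external response: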
@GetMapping("/some/path")
public ResponseEntity<Response> getResponse(@RequestParam(name = "id", required = false) Integer id) {
    return ResponseEntity.ok(externalApiClient.get(id).getBody()); // Creating a fresh ResponseEntity
}
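The fix above discards every upstream header. If some of them need to reach the caller, a middle ground is to copy the headers while dropping those that describe a single connection or the framing of a single message. The sketch below uses plain Java collections so it can run without Spring. The class UpstreamHeaderFilter, the method safeToForward, and the choice of dropped headers are assumptions made for illustration: the write-up does not say which header caused the failure.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the original gateway code.
final class UpstreamHeaderFilter {

    // Connection-level and message-framing headers. Assumption: these are the
    // ones worth dropping; the server writing the new response sets its own.
    private static final Set<String> DROPPED = Set.of(
            "connection", "keep-alive", "proxy-authenticate",
            "proxy-authorization", "te", "trailer",
            "transfer-encoding", "upgrade", "content-length");

    private UpstreamHeaderFilter() {}

    /** Returns a copy of the upstream headers without the dropped ones. */
    static Map<String, List<String>> safeToForward(Map<String, List<String>> upstream) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : upstream.entrySet()) {
            String name = e.getKey();
            // Header names are case-insensitive, so compare in lower case.
            if (name != null && !DROPPED.contains(name.toLowerCase(Locale.ROOT))) {
                result.put(name, List.copyOf(e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> upstream = new LinkedHashMap<>();
        upstream.put("Content-Type", List.of("application/json"));
        upstream.put("Transfer-Encoding", List.of("chunked"));
        upstream.put("ETag", List.of("\"abc\""));
        // Prints the names of the headers that survive the filter.
        System.out.println(safeToForward(upstream).keySet());
    }
}
```

In the gateway, the filtered headers would be set on the newly built ResponseEntity together with the body, so the status line and framing headers still come from the gateway itself.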
Key points:
- Returning an external API's response to the caller unchanged can break service mesh components such as the Envoy sidecar
- Building a fresh ResponseEntity from only the body of the external response resolves the header corruption
- Thorough network sniffing and code inspection were crucial in identifying the problem
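When inspecting a captured response, one thing worth checking is whether the framing headers contradict each other, since HTTP does not allow a sender to include Content-Length in a message that also has Transfer-Encoding. The following is a minimal sketch of such a check over headers copied from a capture. The class FramingHeaderCheck and the method findProblems are hypothetical names, and the assumption that framing headers are the place to look is not confirmed by the write-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical diagnostic, not part of the original troubleshooting.
final class FramingHeaderCheck {

    private FramingHeaderCheck() {}

    /** Lists framing oddities found in a captured set of response headers. */
    static List<String> findProblems(Map<String, List<String>> headers) {
        int contentLength = 0;
        int transferEncoding = 0;
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            if (e.getKey() == null) {
                continue; // some HTTP clients expose the status line under a null key
            }
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (name.equals("content-length")) {
                contentLength += e.getValue().size();
            } else if (name.equals("transfer-encoding")) {
                transferEncoding += e.getValue().size();
            }
        }
        List<String> problems = new ArrayList<>();
        if (contentLength > 0 && transferEncoding > 0) {
            problems.add("both Content-Length and Transfer-Encoding are present");
        }
        // Repeated framing headers are not always invalid, but deserve a closer look.
        if (contentLength > 1) {
            problems.add("Content-Length appears " + contentLength + " times");
        }
        if (transferEncoding > 1) {
            problems.add("Transfer-Encoding appears " + transferEncoding + " times");
        }
        return problems;
    }
}
```

An empty result only means that these particular checks passed; it does not show that the response is well-formed in every other respect.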
Best Practices
- Do not return the response of an external request to the caller unchanged
- Build a new response from the external API's body rather than reusing its status and headers
- Ensure observability in microservice applications, so that failures like this one can be traced to the service that causes them
- Be aware that the same issue can occur in languages other than Java
This solution resolved the upstream connect error and ensured proper communication between services in the Istio service mesh environment.