Build AI-Generated Code Your Users Can Rely On

fault is an Open-Source Rust-powered CLI and AI-MCP server that lets engineers and AI agents team up to ship production-ready code.

Plug fault into your favorite AI-Editor

Score | Review | Change | Measure | Monitor

fault agent tool fault_score_reliability

                        
{
  "explanation": "The function has no error handling, timeout, retry or fallback logic, making it prone to upstream failures causing unhandled exceptions.",
  "score": 0.2
}

fault agent code-review

                        Here’s a small, focused refactor of your `/` handler that will immediately improve resilience, observability and error‐handling without drastically changing your app:


1. **Single, long-lived HTTP client**  
   Re-using an `httpx.Client` gives you connection pooling and lets you centrally configure timeouts.

2. **Per-request timeouts**  
   By setting `timeout=httpx.Timeout(connect=5, read=10)` we fail fast on network issues.

3. **Retries with exponential backoff**  
   The [`backoff`](https://pypi.org/project/backoff/) library wraps `_fetch_todo()` so that transient errors and 5xx status codes automatically retry (up to 5× or ~30 s total).

4. **Error propagation & HTTPExceptions**  
   We catch `RequestError` (network) vs `HTTPStatusError` (non-2xx) and translate them into appropriate 5xx/4xx FastAPI responses.

5. **Logging**  
   Warnings on upstream 4xxs and errors on network failures let you trace the failure mode in your logs.

## Next steps

- **Metrics & tracing**:  
  Plug in Prometheus counters or OpenTelemetry spans around `_fetch_todo` to observe retry counts, latencies, and success/fail rates.  
- **Circuit breaker**:  
  If your upstream is flapping, consider a circuit-breaker pattern (e.g. [`pybreaker`](https://pypi.org/project/pybreaker/)) to avoid overwhelming it.  
- **Async support**:  
  If you expect high concurrency, convert to `httpx.AsyncClient` and make `index` an `async def`.  

These minimal changes will immediately make your endpoint far more reliable under network glitches or upstream outages—yet remain straightforward to slot into your existing `app.py`.

fault agent tool fault_make_reliability_and_perf_changes

                        --- app.py
+++ app.py
@@ -1,15 +1,58 @@
 import os
+import logging
 
 import httpx
-from fastapi import FastAPI
+from fastapi import FastAPI, HTTPException
 
 UPSTREAM_URL = os.getenv("UPSTREAM_URL", "https://jsonplaceholder.typicode.com")
+REQUEST_TIMEOUT = float(os.getenv("REQUEST_TIMEOUT", 5.0))
+MAX_RETRIES = int(os.getenv("MAX_RETRIES", 3))
+BACKOFF_FACTOR = float(os.getenv("BACKOFF_FACTOR", 0.5))
+RETRY_STATUS_CODES = [502, 503, 504]
 
 app = FastAPI()
+logger = logging.getLogger("uvicorn.error")
 
+transport = httpx.RetryTransport(
+    retries=MAX_RETRIES,
+    backoff_factor=BACKOFF_FACTOR,
+    status_forcelist=RETRY_STATUS_CODES,
+    allowed_methods=["GET"],
+)
 
+@app.on_event("startup")
+async def startup_event():
+    '''
+    Initialize the HTTPX client with retry and timeout settings for resilience.
+    '''
+    app.state.client = httpx.AsyncClient(
+        base_url=UPSTREAM_URL,
+        timeout=httpx.Timeout(REQUEST_TIMEOUT),
+        transport=transport,
+        headers={"Host": "jsonplaceholder.typicode.com"},
+    )
+
+@app.on_event("shutdown")
+async def shutdown_event():
+    '''
+    Close the HTTPX client on application shutdown to release resources.
+    '''
+    await app.state.client.aclose()
+
 @app.get("/")
-def index():
-    return httpx.get(f"{UPSTREAM_URL}/todos/1", headers={
-        "Host": "jsonplaceholder.typicode.com"
-    }).json()
+async def index():
+    '''
+    Retrieves a todo item from the upstream service.
+    Implements retry logic, timeouts, and error handling for resilience.
+    '''
+    try:
+        response = await app.state.client.get("/todos/1")
+        response.raise_for_status()
+    except httpx.RequestError as exc:
+        logger.error(f"Request error: {exc}")
+        raise HTTPException(status_code=503, detail="Upstream service unavailable")
+    except httpx.HTTPStatusError as exc:
+        logger.error(f"Invalid response {exc.response.status_code}: {exc}")
+        raise HTTPException(status_code=502, detail="Upstream returned an error")
+
+    return response.json()

fault agent tool fault_suggest_service_level_objectives_slo

                        [
  {
    "type": "latency",
    "title": "P95 Latency for /todos/1 API",
    "objective": 95.0,
    "explanation": "95th percentile of request latency must stay under threshold to ensure responsiveness",
    "threshold": 0.3,
    "unit": "s",
    "window": "7d",
    "sli": {
      "prometheus": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{handler=\"/todos/1\"}[5m])) by (le)) * 1000",
      "gcp/cloudrun": {
        "displayName": "95% - P95 Latency - Calendar day",
        "goal": 0.95,
        "calendarPeriod": "DAY",
        "serviceLevelIndicator": {
          "windowsBased": {
            "windowPeriod": "300s",
            "goodTotalRatioThreshold": {
              "basicSliPerformance": {
                "latency": {
                  "threshold": "0.3s"
                }
              },
              "threshold": 0.95
            }
          }
        }
      }
    }
  },
  {
    "type": "availability",
    "title": "2xx Success Rate for /todos/1 API",
    "objective": 99.9,
    "explanation": "Percentage of requests returning HTTP 2xx status codes",
    "threshold": 99.9,
    "unit": "",
    "window": "7d",
    "sli": {
      "prometheus": "sum(rate(http_requests_total{handler=\"/todos/1\",status=~\"2..\"}[5m]))/sum(rate(http_requests_total{handler=\"/todos/1\"}[5m]))*100",
      "gcp/cloudrun": {
        "displayName": "99.9% - Availability - Calendar day",
        "goal": 0.999,
        "calendarPeriod": "DAY",
        "serviceLevelIndicator": {
          "windowsBased": {
            "windowPeriod": "300s",
            "goodTotalRatioThreshold": {
              "threshold": 0.999
            }
          }
        }
      }
    }
  },
  {
    "type": "error",
    "title": "5xx Error Rate for /todos/1 API",
    "objective": 1.0,
    "explanation": "Percentage of requests resulting in HTTP 5xx errors must remain low",
    "threshold": 1.0,
    "unit": "%",
    "window": "7d",
    "sli": {
      "prometheus": "sum(rate(http_requests_total{handler=\"/todos/1\",status=~\"5..\"}[5m]))/sum(rate(http_requests_total{handler=\"/todos/1\"}[5m]))*100",
      "gcp/cloudrun": {
        "displayName": "1% - 5xx Error Rate - Calendar day",
        "goal": 0.01,
        "calendarPeriod": "DAY",
        "serviceLevelIndicator": {
          "windowsBased": {
            "windowPeriod": "300s",
            "goodTotalRatioThreshold": {
              "threshold": 0.01
            }
          }
        }
      }
    }
  }
]

fault agent tool fault_run_latency_impact_scenario

                        # Report


## Scenario: Evaluating runtime performance of http://localhost:9090  (items: 1)

### 🎯 `GET` http://localhost:9090 | Passed

**Call**:

- Method: `GET`
- Timeout: 10000ms
- Headers: -
- Body?: No

**Strategy**: load for 10s with 1 clients @ 3 RPS

**Faults Applied**:

| Type | Timeline | Description |
|------|----------|-------------|
| latency | 0% `xxxxxxxxxx` 100% | Latency: ➡️🖧, Per Read/Write Op.: false, 
  Distribution: normal, Mean: 300.00 ms, Stddev: 0.00 ms |


**Run Overview**:

Num. Requests: 29
Num. Errors: 0 (0.0%)
Min. Response Time: 334.14
Max Response Time: 462.50
Mean Latency (ms): 339.96
Total Time: 10 seconds and 214 ms

| Latency Percentile | Latency (ms) | Num. Requests (% of total) |
|------------|--------------|-----------|
| p25 | 335.78 | 8 (27.6%) |
| p50 | 339.96 | 15 (51.7%) |
| p75 | 342.26 | 23 (79.3%) |
| p95 | 406.86 | 29 (100.0%) |
| p99 | 462.50 | 29 (100.0%) |

| SLO       | Pass? | Objective | Margin | Num. Requests Over Threshold (% of total) |
|-----------|-------|-----------|--------|--------------------------|
| 99% @ 350ms | ❌ | 99% < 350ms | Above by 112.5ms | 2 (6.9%) |
| 95% @ 200ms | ❌ | 95% < 200ms | Above by 206.9ms | 29 (100.0%) |

fault is an Open-Source Rust-powered CLI and AI-MCP server that lets engineers and AI agents team up to ship production-ready code.

Plug fault into your favorite AI-Editor

AI-Code Generators Need Guidance

We believe software engineers experience has never been more necessary.

Lack of operational context

Refactoring pain is real

Modern Features for Hybrid Engineering Teams

AI is an evolution in the engineering industry journey. Not a rupture from its past.

Ready?

Download fault and build production-grade applications now.

Frequently Asked Questions

Everything you need to know about bringing fault into your organization.

fault