High Availability with Two Data Centers

IBM API Connect: Architecture

Ravi Ramnarayan
© IBM v1.81 2023-10-19

Goals

Prologue

The data centers could be on premises or vendor sites. Ideally, the data centers should be in different locations with separate lines for power and communication.

Tabled for a later discussion
Modern cloud vendors can provide a variation which might be less expensive. In Production deployments, each APIC subsystem comprises three pods. Within a single OCP cluster, each subsystem's three pods could run in different regions (vendor nomenclature varies), preserving a quorum of two active pods even if one region fails.

APIC Deployment Architecture

We compare two APIC deployment architectures using the metrics Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Please see Planning your deployment topology.

Recovery Point Objective (RPO)
The RPO measures how far back in time the recovery point is, and therefore how much new data is lost. An RPO of zero would assert that no data is lost, but such a solution is often a compromise against the cost and performance of the system.

1. Two Data Center Deployment (2DCDR)

[Figure: APIC Two Data Center Deployment (2DCDR)]

Pro

  • High Availability (HA) for API traffic
    With adequate capacity, the DataPower cluster in a single DC can handle the full load if the other DC fails; that is, each DC must be sized for the full peak load, not half of it.
  • RPO for API Products
    Near zero RPO for published API Products.
  • RPO for Consumer Subscriptions
    Near zero RPO for Consumer Subscriptions. If the business requires near zero RPO, the Two data center deployment strategy on Kubernetes and OpenShift is the appropriate choice.
  • RTO Failover
    If the active APIC cluster fails, APIC can be activated on the other cluster within a short period. The following APIC functions cannot be performed until the failover completes:
    • Consumer Organizations cannot initiate new Subscriptions
    • Provider Organizations cannot publish or update API Products

Contra

  • Maintenance
    Typically, APIC upgrades occur two to four times a year. With 2DCDR, each upgrade carries the burden of increased time and effort because the APIC subsystems in both data centers must be upgraded.
  • RTO to restore 2DCDR requires Disaster Recovery (DR)
    The goal of DR is to restore 2DCDR, which is different from RTO Failover. DR processes need to be aware of the active APIC site, adding complexity.
  • Costs
    Software license and operational costs might be higher than the other architectures. Please verify specifics for your installations.

2. APIC with DataPower HA in DC2

[Figure: APIC with DataPower HA in DC2]

Pro

  • High Availability (HA) for API traffic
    Same HA level as the 2DCDR architecture.
  • RPO for API Products
    It is possible to achieve near zero RPO by taking a Management backup soon after publishing API Products.
  • Maintenance
    • Upgrade of APIC & DataPower in DC1 using the top level APIConnectCluster Custom Resource (CR) requires less effort than upgrading APIC subsystems of the 2DCDR deployment
    • DataPower subsystem in DC2 can be upgraded in a separate change window
      Starting with APIC v10.0.5.x, all fix packs (fourth position) of APIC will be compatible with DataPower v10.5.0.x. This flexibility allows a relaxed schedule for upgrading DC2.
  • Disaster Recovery (DR)
    The process is less complex than DR for 2DCDR and should be quicker to complete.
  • Costs
    Software license and operational costs might be lower than for 2DCDR. Please verify specifics for your installations.

Contra

  • RPO for Consumer Subscriptions depends on the frequency of backups
    Backups can be captured frequently, but the RPO will not be comparable to 2DCDR's near zero RPO for Consumer Subscriptions. For example, hourly Management backups imply a worst-case RPO of roughly one hour for new Subscriptions.
  • RTO depends on DR
    The recovery time can be reduced with automation and DR drills. However, it is likely to take longer than the RTO Failover enabled by 2DCDR. The following APIC functions cannot be performed until DR restores normal service:
    • Consumer Organizations cannot initiate new Subscriptions
    • Provider Organizations cannot publish or update API Products

Deploy APIC with DataPower HA in DC2

Note: The steps below assume a connected OCP. Adjust as needed for air-gapped OCP.

Install APIC in DC1

Start with Installing API Connect and follow the trail to Installing with the top-level CR on OpenShift.
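
Once the installation completes, a quick sanity check from the command line confirms that the top-level CR and its pods are healthy before you move on to DC2. A minimal sketch, assuming APIC was installed in a namespace named apic (substitute your own namespace):

    # Verify the top-level APIConnectCluster CR reports Ready and the subsystem pods are running
    oc get apiconnectcluster -n apic
    oc get pods -n apic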

Install DataPower in DC2

Installing with subsystem CRs in different namespaces or environments provides the overview. We will install the DataPower subsystem in DC2.

  • OCP in DC2 needs preparations such as the pull secret and the IBM Operator Catalog source (a CatalogSource sketch follows this list).
  • Install the IBM API Connect operator in apigw2 namespace in DC2
    This is the same operator used to install APIC in DC1.
  • The versions of the operators in DC1 & DC2 must be the same
  • Make sure the DC2 cluster has the same certificates as APIC in DC1
    See Common Issuers & Gateway Secrets.
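
The IBM Operator Catalog source mentioned above is typically created with a CatalogSource manifest along these lines; the catalog image shown is the commonly published one, so verify it against the IBM documentation for your release:

    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: ibm-operator-catalog
      namespace: openshift-marketplace
    spec:
      displayName: IBM Operator Catalog
      publisher: IBM
      sourceType: grpc
      image: icr.io/cpopen/ibm-operator-catalog:latest
      updateStrategy:
        registryPoll:
          interval: 45m

Apply it with oc apply -f and confirm the catalog pod starts in openshift-marketplace.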

Note: DataPower in DC2 could run on OCP or Kubernetes (K8s). This document details the steps for DataPower on OCP. VMware and Linux are also feasible platforms, but DataPower configuration and maintenance operations on them are more labor intensive than on OCP/K8s.

Namespace & Operators

  • Create a namespace, for example, apigw2

    Note: Follow your naming conventions. apigw2 is just an example.

  • Install the IBM API Connect operator in apigw2
    • Subscribe to the same channel as the parent APIC in DC1 (a Subscription sketch follows this list)
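
A minimal Subscription sketch for the operator install above; the channel value is a placeholder and must match the channel used by the APIC operator in DC1:

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: ibm-apiconnect
      namespace: apigw2
    spec:
      channel: <same-channel-as-DC1>
      name: ibm-apiconnect
      source: ibm-operator-catalog
      sourceNamespace: openshift-marketplace

Note: Installing in a single namespace also requires an OperatorGroup that targets apigw2; the OCP console OperatorHub install creates one for you.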

Common Issuers & Gateway Secrets

Follow steps in Installing the Gateway subsystem section Before you begin.

  • Clone ingress-ca from the primary OCP in DC1
    Installing the Gateway subsystem section Before you begin Step 1. Detailed instructions are in Extracting the Management ingress-ca certificates; a brief sketch also appears at the end of this section.

    • Apply the extracted ingress-ca to the DC2 Project apigw2
      oc apply -f <your-ingress-ca-extract.yaml> -n apigw2
    • Get the complete name of ingress-ca
      oc get secret | grep ingress-ca    
      apis-minimum-ingress-ca            kubernetes.io/tls         3      16m    
      

      Note: The secret name in your installation will be different.

  • Define Common issuers and Gateway secrets
    Obtain YAML from Installing the Gateway subsystem section Before you begin Step 2.

    • Edit the YAML and set the Issuer spec.ca.secretName to the value above
      See example 230-common-issuer-and-gateway-certs.yaml which sets spec.ca.secretName to apis-minimum-ingress-ca.

      apiVersion: cert-manager.io/v1
      kind: Issuer
      metadata:
        name: ingress-issuer
        labels: {
          app.kubernetes.io/instance: "management",
          app.kubernetes.io/managed-by: "ibm-apiconnect",
          app.kubernetes.io/name: "ingress-issuer"
        }
      spec:
        ca:
          secretName: apis-minimum-ingress-ca
      

    • Apply your YAML to Project apigw2 in DC2

      oc apply -f <your-common-issuer-and-gateway-certs.yaml> -n apigw2  
      

    • Confirm issuers were created and ready for use

      oc get issuers -n apigw2    
      NAME                 READY   AGE    
      ingress-issuer       True    13m    
      selfsigning-issuer   True    13m    
      

    • Confirm the creation of gateway secrets

      oc get secrets -n apigw2 | grep gateway    
      gateway-peering                    kubernetes.io/tls         3      24s  
      gateway-service                    kubernetes.io/tls         3      24s  
      

      Note: "3" in the third column indicates three components in the TLS (good).

Deploy DataPower Gateway in DC2

  • Create Gateway admin secret
    You could use the same password as in the DC1 OCP or assign a different value. Create the secret in DC2 namespace apigw2:

    oc -n apigw2 create secret generic <gw_admin_secret_name> \
      --from-literal=password=<gw_admin_pswd>  
    

  • Define the APIGW Gateway
    Copy the YAML from Installing the Gateway subsystem section Procedure Step 2. Use values appropriate to your installation. We recommend following the naming conventions used in DC1.

    • Fields prefixed with $ are placeholders; replace them with values for your installation
    • metadata.name (optional, default value is okay)
    • metadata.labels.app.kubernetes.io/name (optional, default value is okay)
    • spec.adminUser.secretName should be <gw_admin_secret_name>, which you created in the previous step

See Example Values in 250-apigateway_cr.yaml.

Note: Example values are NOT shell script substitutions. You should edit the file manually.

  • Apply your Gateway CR to Project apigw2 in DC2
    oc apply -f <your-apigateway_cr.yaml> -n apigw2
    It takes about five minutes to create the GatewayCluster.

  • Is the gateway running?
    oc get GatewayCluster -n apigw2

  • Is the Gateway Management endpoint active?
    curl -k https://<spec.gatewayManagerEndpoint.hosts.name>/health
    should return:
    {"status":"ok"}

  • Register DC2 Gateway Service in the parent DC1 APIC
    Obtain the URLs for the endpoints from the OCP Routes in project apigw2 (see the command below).
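
To list those endpoint URLs, assuming the routes were created in apigw2 by the Gateway CR:

    oc get routes -n apigw2

Use the gateway manager (management) endpoint and the API gateway endpoint hosts from the route list when registering the Gateway Service, typically in the Cloud Manager UI of the DC1 APIC.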

Deploy Analytics in DC2

Note: Optional. Analytics is not needed in DC2, unless the API traffic is extremely high. You could install Analytics at a later date, if needed.

Deploy the Analytics subsystem in DC2 within the same namespace as the DataPower Gateway. The steps in Installing the Analytics subsystem are similar to Installing the Gateway subsystem.

  • Common Issuers
    Nothing to do. Been there, done that in Common Issuers & Gateway Secrets. Skip over section Before you begin in Installing the Analytics subsystem.

  • Obtain YAML from section Procedure Step 1 of Installing the Analytics subsystem.

  • Ensure clientSubjectDN in DC2 Analytics matches the Subject common name of DC1 APIC CMC Analytics ingestion keystore.

    • In APIC 10.0.5.1, the Analytics ingestion keystore Subject common name is a7s-ing-client
    • Modify spec.ingestion.clientSubjectDN to match (a fragment appears after this list)
      The sample file 280-analytics_cr.yaml contains the correction.
  • Create the Analytics subsystem in DC2
    oc apply -f <your-analytics_cr.yaml> -n apigw2

  • Is the Analytics subsystem running?
    oc get AnalyticsCluster -n apigw2

  • Register DC2 Analytics in the parent DC1 APIC

  • Associate DC2 Analytics with the DC2 DataPower Service
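
The clientSubjectDN adjustment above amounts to a small fragment of the Analytics CR; a sketch, using the APIC 10.0.5.1 common name noted earlier:

    spec:
      ingestion:
        clientSubjectDN: CN=a7s-ing-client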

Troubleshoot

If you encounter a 403 Forbidden error while registering the Analytics Service, the cause is likely to be a mismatch of clientSubjectDN.

  • Find the pod named mtls-gw

    $ oc get pod -n apigw2 | grep mtls
    analytics-mtls-gw-6bbc4cbfcc-gchhc                       1/1     Running     0             17m
    

  • Do the logs show mismatched clientSubjectDN?

      $ oc logs analytics-mtls-gw-6bbc4cbfcc-gchhc | grep 403 -A2 -B1  
    

      10.254.20.1 - - [05/Dec/2022:23:08:36 +0000] "GET /_healthz HTTP/1.1" 200 2 "-" "kube-probe/1.23"  
      2022/12/05 23:08:36 [warn] 12#12: *2106 [lua] access_by_lua(nginx.conf:56):7: rejected request because certificate subject 'CN=a7s-ing-client' did not match expected 'CN=a7s-ing-client,O=cert-manager', client: 10.254.12.1, server: ai.apps.fainted.myco.com, request: "POST /analytics-service-registration-create?admin=true&management=true HTTP/1.1", host: "ai.apps.fainted.myco.com"  
      10.254.12.1 - - [05/Dec/2022:23:08:36 +0000] "POST /analytics-service-registration-create?admin=true&management=true HTTP/1.1" 403 159 "-" "axios/0.26.1"  
      10.254.20.1 - - [05/Dec/2022:23:08:46 +0000] "GET /_healthz HTTP/1.1" 200 2 "-" "kube-probe/1.23"  
    
  • Fix the problem
    Modify the clientSubjectDN in the DC2 AnalyticsCluster YAML and re-apply it; one way is sketched below.
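
One way to apply the fix without rebuilding the CR, sketched with a placeholder CR name:

    # Patch the AnalyticsCluster in place; <analytics-cr-name> is the name used in your analytics_cr.yaml
    oc -n apigw2 patch AnalyticsCluster <analytics-cr-name> --type merge \
      -p '{"spec":{"ingestion":{"clientSubjectDN":"CN=a7s-ing-client"}}}'

Alternatively, correct your analytics_cr.yaml and re-apply it with oc apply; the operator reconciles the change.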

References