Resolving DNS issues in a Kubernetes cluster

Okay. I guess I should message the OpenWhisk folks then. But do you have any ideas on how to debug OpenWhisk?

I have this.

```
NAMESPACE     NAME                                       READY   STATUS      RESTARTS   AGE
default       dnsutils                                   1/1     Running     0          22m
kube-system   calico-kube-controllers-85b5b5888d-4t6gc   1/1     Running     0          3h8m
kube-system   calico-node-2j65n                          1/1     Running     0          3h8m
kube-system   calico-node-7hrcm                          1/1     Running     0          3h8m
kube-system   coredns-64897985d-fp4sv                    1/1     Running     0          3h9m
kube-system   coredns-64897985d-tj48n                    1/1     Running     0          3h9m
kube-system   etcd-node                                  1/1     Running     0          3h9m
kube-system   kube-apiserver-node                        1/1     Running     0          3h9m
kube-system   kube-controller-manager-node               1/1     Running     0          3h9m
kube-system   kube-proxy-5bgk7                           1/1     Running     0          3h8m
kube-system   kube-proxy-qqxt7                           1/1     Running     0          3h9m
kube-system   kube-scheduler-node                        1/1     Running     0          3h9m
openwhisk     owdev-alarmprovider-5fd46859bd-s2qs6       0/1     Init:0/1    0          17m
openwhisk     owdev-apigateway-66486c84cd-hfk8j          1/1     Running     0          17m
openwhisk     owdev-controller-0                         0/1     Init:0/2    0          17m
openwhisk     owdev-couchdb-696d7db9bb-4bv7k             1/1     Running     0          17m
openwhisk     owdev-gen-certs-qrvst                      0/1     Completed   0          17m
openwhisk     owdev-init-couchdb-7xf8d                   0/1     Error       0          17m
openwhisk     owdev-init-couchdb-7zcdl                   0/1     Error       0          15m
openwhisk     owdev-init-couchdb-cb4hb                   0/1     Error       0          15m
openwhisk     owdev-init-couchdb-pzz89                   0/1     Error       0          14m
openwhisk     owdev-install-packages-7924t               0/1     Init:0/1    0          17m
openwhisk     owdev-invoker-0                            0/1     Init:0/1    0          17m
openwhisk     owdev-kafka-0                              0/1     Init:0/1    0          17m
openwhisk     owdev-kafkaprovider-c74cb9956-mmsks        0/1     Init:0/1    0          17m
openwhisk     owdev-nginx-7b4b46485-2k7qj                0/1     Init:0/1    0          17m
openwhisk     owdev-redis-685cd564d8-g5n6q               1/1     Running     0          17m
openwhisk     owdev-wskadmin                             1/1     Running     0          17m
openwhisk     owdev-zookeeper-0                          1/1     Running     0          17m
```
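
(That listing presumably came from `kubectl get pods -A`; to zero in on just the broken pods, something like this filters out the ones whose phase is Running:)

```
# Hypothetical filter: shows pods in phase Pending/Failed/Succeeded
# (covers the Init, Error, and Completed entries above)
kubectl get pods -A --field-selector=status.phase!=Running
```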

And this

```
Cloning into '/openwhisk'...
fatal: unable to access 'https://github.com/apache/openwhisk/': Could not resolve host: github.com
```

Those were the CouchDB init containers. The CouchDB pod itself, however, is running.
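
(For the record, those errors came from the failed `owdev-init-couchdb` pods; pulling them again, plus a quick DNS sanity check from the dnsutils pod, would look something like this:)

```
# Logs from one of the failed init-couchdb pods (pod name from the listing above)
kubectl logs owdev-init-couchdb-7xf8d -n openwhisk

# DNS sanity check from inside the cluster using the dnsutils pod
kubectl exec dnsutils -- nslookup github.com
```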

What about the pods stuck in the Init state? Did you try describing one of them to identify what the issue is?

Let me try. I have this from the controller.

```
Defaulted container "controller" out of: controller, wait-for-kafka (init), wait-for-couchdb (init)
error: unable to upgrade connection: container not found ("controller")
```
It's waiting for CouchDB and Kafka.
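
(That error just means the main `controller` container doesn't exist yet while the init containers are still running. To see what an init container is doing, target it explicitly with `-c`, roughly:)

```
# Read the init containers' logs directly (container names from the message above)
kubectl logs owdev-controller-0 -n openwhisk -c wait-for-couchdb
kubectl logs owdev-controller-0 -n openwhisk -c wait-for-kafka
```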

There are too many services; it won’t be easy debugging them over Slack.

Not sure what we can do about that :sweat_smile:

So anyway, the output of `kubectl describe pods owdev-controller-0 -n openwhisk`, for example, is pretty long. Not sure where to look exactly. Here’s part of it.

```
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  23m   default-scheduler  Successfully assigned openwhisk/owdev-controller-0 to cloud2.t6.ch-geni-net.geni.case.edu
  Normal  Pulled     22m   kubelet            Container image "openwhisk/ow-utils:3e6138d" already present on machine
  Normal  Created    22m   kubelet            Created container wait-for-kafka
  Normal  Started    22m   kubelet            Started container wait-for-kafka
```

Maybe it’s waiting for Kafka?
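
(One way to confirm which init containers have actually finished, hypothetically:)

```
# Print each init container's name and whether it has completed
kubectl get pod owdev-controller-0 -n openwhisk \
  -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.ready}{"\n"}{end}'
```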

Full output: https://pastebin.com/MCw8y8QL

…Kafka is stuck in Init. What’s next?

Kafka is waiting for ZooKeeper. Let me keep tracing.

```
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  30m   default-scheduler  Successfully assigned openwhisk/owdev-kafka-0 to cloud2.t6.ch-geni-net.geni.case.edu
  Normal  Pulled     30m   kubelet            Container image "busybox:latest" already present on machine
  Normal  Created    29m   kubelet            Created container wait-for-zookeeper
  Normal  Started    29m   kubelet            Started container wait-for-zookeeper
```
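
(The busybox `wait-for-zookeeper` init container presumably just polls the ZooKeeper service in a loop; its logs should show what it’s blocked on:)

```
kubectl logs owdev-kafka-0 -n openwhisk -c wait-for-zookeeper
```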

ZooKeeper has a warning, I guess.

```
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  31m                default-scheduler  Successfully assigned openwhisk/owdev-zookeeper-0 to cloud2.t6.ch-geni-net.geni.case.edu
  Normal   Pulled     31m                kubelet            Container image "zookeeper:3.4" already present on machine
  Normal   Created    30m                kubelet            Created container zookeeper
  Normal   Started    30m                kubelet            Started container zookeeper
  Warning  Unhealthy  11s (x7 over 30m)  kubelet            Readiness probe failed: command "/bin/bash -c echo ruok | nc -w 1 localhost 2181 | grep imok" timed out
```
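
(You can run the same four-letter-word health check the probe uses by hand; a healthy ZooKeeper replies `imok`:)

```
kubectl exec owdev-zookeeper-0 -n openwhisk -- bash -c 'echo ruok | nc -w 1 localhost 2181'
```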
```
$ kubectl logs owdev-zookeeper-0 -n openwhisk
0
tickTime=2000
clientPort=2181
initLimit=5
syncLimit=2
dataDir=/data
dataLogDir=/datalog
server.0=owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local:2888:3888
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.server.quorum.QuorumPeerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
```
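
(Note that the `server.0` entry resolves through cluster DNS; if that FQDN doesn’t resolve, the ensemble can’t form. A check from the dnsutils pod, roughly:)

```
kubectl exec dnsutils -- nslookup owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local
```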

Not sure. I’m out of ideas; I guess I’ll take it to OpenWhisk’s support. You’re telling me the connections are fine because we tested them with the dnsutils pod.

I believe getting in touch with the OpenWhisk team is the better option.

Mmhm, okay. Thanks :grin:

For future reference: I won’t be tracing this issue anymore. I was only trying to get OpenWhisk on Kubernetes to work because I was having issues with standalone OpenWhisk, and I’ve since fixed those standalone issues. So unfortunately, if someone is facing this same issue, I don’t have a solution for it.