Resolving DNS issues in a Kubernetes cluster

Okay. I guess I should message the OpenWhisk folks then. But do you have any ideas on how to debug OpenWhisk?

I have this.

```
NAMESPACE     NAME                                       READY   STATUS      RESTARTS   AGE
default       dnsutils                                   1/1     Running     0          22m
kube-system   calico-kube-controllers-85b5b5888d-4t6gc   1/1     Running     0          3h8m
kube-system   calico-node-2j65n                          1/1     Running     0          3h8m
kube-system   calico-node-7hrcm                          1/1     Running     0          3h8m
kube-system   coredns-64897985d-fp4sv                    1/1     Running     0          3h9m
kube-system   coredns-64897985d-tj48n                    1/1     Running     0          3h9m
kube-system   etcd-node                                  1/1     Running     0          3h9m
kube-system   kube-apiserver-node                        1/1     Running     0          3h9m
kube-system   kube-controller-manager-node               1/1     Running     0          3h9m
kube-system   kube-proxy-5bgk7                           1/1     Running     0          3h8m
kube-system   kube-proxy-qqxt7                           1/1     Running     0          3h9m
kube-system   kube-scheduler-node                        1/1     Running     0          3h9m
openwhisk     owdev-alarmprovider-5fd46859bd-s2qs6       0/1     Init:0/1    0          17m
openwhisk     owdev-apigateway-66486c84cd-hfk8j          1/1     Running     0          17m
openwhisk     owdev-controller-0                         0/1     Init:0/2    0          17m
openwhisk     owdev-couchdb-696d7db9bb-4bv7k             1/1     Running     0          17m
openwhisk     owdev-gen-certs-qrvst                      0/1     Completed   0          17m
openwhisk     owdev-init-couchdb-7xf8d                   0/1     Error       0          17m
openwhisk     owdev-init-couchdb-7zcdl                   0/1     Error       0          15m
openwhisk     owdev-init-couchdb-cb4hb                   0/1     Error       0          15m
openwhisk     owdev-init-couchdb-pzz89                   0/1     Error       0          14m
openwhisk     owdev-install-packages-7924t               0/1     Init:0/1    0          17m
openwhisk     owdev-invoker-0                            0/1     Init:0/1    0          17m
openwhisk     owdev-kafka-0                              0/1     Init:0/1    0          17m
openwhisk     owdev-kafkaprovider-c74cb9956-mmsks        0/1     Init:0/1    0          17m
openwhisk     owdev-nginx-7b4b46485-2k7qj                0/1     Init:0/1    0          17m
openwhisk     owdev-redis-685cd564d8-g5n6q               1/1     Running     0          17m
openwhisk     owdev-wskadmin                             1/1     Running     0          17m
openwhisk     owdev-zookeeper-0                          1/1     Running     0          17m
```
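
(That listing presumably came from `kubectl get pods -A`; to zero in on just the broken pods, something like this filters out the ones whose phase is Running:)

```
# Hypothetical filter: shows pods in phase Pending/Failed/Succeeded
# (covers the Init, Error, and Completed entries above)
kubectl get pods -A --field-selector=status.phase!=Running
```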

And this

```
Cloning into '/openwhisk'...
fatal: unable to access 'https://github.com/apache/openwhisk/': Could not resolve host: github.com
```

Those were the CouchDB init containers. The CouchDB pod itself, however, is running.
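
(For the record, those errors came from the failed `owdev-init-couchdb` pods; pulling them again, plus a quick DNS sanity check from the dnsutils pod, would look something like this:)

```
# Logs from one of the failed init-couchdb pods (pod name from the listing above)
kubectl logs owdev-init-couchdb-7xf8d -n openwhisk

# DNS sanity check from inside the cluster using the dnsutils pod
kubectl exec dnsutils -- nslookup github.com
```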

What about the pods stuck in the Init state? Did you try describing one of them to identify what the issue is?

Let me try. I have this from the controller.

```
Defaulted container "controller" out of: controller, wait-for-kafka (init), wait-for-couchdb (init)
error: unable to upgrade connection: container not found ("controller")
```
It's waiting for CouchDB and Kafka.
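
(That error just means the main `controller` container doesn't exist yet while the init containers are still running. To see what an init container is doing, target it explicitly with `-c`, roughly:)

```
# Read the init containers' logs directly (container names from the message above)
kubectl logs owdev-controller-0 -n openwhisk -c wait-for-couchdb
kubectl logs owdev-controller-0 -n openwhisk -c wait-for-kafka
```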

There are too many services; it won’t be easy debugging them over Slack.

Not sure what we can do about that :sweat_smile:

So anyway, the output of `kubectl describe pods owdev-controller-0 -n openwhisk`, for example, is pretty long. Not sure where to look exactly. Here’s part of it.

```
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  23m   default-scheduler  Successfully assigned openwhisk/owdev-controller-0 to cloud2.t6.ch-geni-net.geni.case.edu
  Normal  Pulled     22m   kubelet            Container image "openwhisk/ow-utils:3e6138d" already present on machine
  Normal  Created    22m   kubelet            Created container wait-for-kafka
  Normal  Started    22m   kubelet            Started container wait-for-kafka
```

Maybe it’s waiting for Kafka?
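
(One way to confirm which init containers have actually finished, hypothetically:)

```
# Print each init container's name and whether it has completed
kubectl get pod owdev-controller-0 -n openwhisk \
  -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.ready}{"\n"}{end}'
```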

Full output: https://pastebin.com/MCw8y8QL

…Kafka is stuck in Init. What’s next?

Kafka is waiting for ZooKeeper. Let me keep tracing.

```
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  30m   default-scheduler  Successfully assigned openwhisk/owdev-kafka-0 to cloud2.t6.ch-geni-net.geni.case.edu
  Normal  Pulled     30m   kubelet            Container image "busybox:latest" already present on machine
  Normal  Created    29m   kubelet            Created container wait-for-zookeeper
  Normal  Started    29m   kubelet            Started container wait-for-zookeeper
```
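
(The busybox `wait-for-zookeeper` init container presumably just polls the ZooKeeper service in a loop; its logs should show what it’s blocked on:)

```
kubectl logs owdev-kafka-0 -n openwhisk -c wait-for-zookeeper
```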

ZooKeeper has a warning, I guess.

```
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  31m                default-scheduler  Successfully assigned openwhisk/owdev-zookeeper-0 to cloud2.t6.ch-geni-net.geni.case.edu
  Normal   Pulled     31m                kubelet            Container image "zookeeper:3.4" already present on machine
  Normal   Created    30m                kubelet            Created container zookeeper
  Normal   Started    30m                kubelet            Started container zookeeper
  Warning  Unhealthy  11s (x7 over 30m)  kubelet            Readiness probe failed: command "/bin/bash -c echo ruok | nc -w 1 localhost 2181 | grep imok" timed out
```
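
(You can run the same four-letter-word health check the probe uses by hand; a healthy ZooKeeper replies `imok`:)

```
kubectl exec owdev-zookeeper-0 -n openwhisk -- bash -c 'echo ruok | nc -w 1 localhost 2181'
```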
```
$ kubectl logs owdev-zookeeper-0 -n openwhisk
0
tickTime=2000
clientPort=2181
initLimit=5
syncLimit=2
dataDir=/data
dataLogDir=/datalog
server.0=owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local:2888:3888
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.server.quorum.QuorumPeerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
```
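
(Note that the `server.0` entry resolves through cluster DNS; if that FQDN doesn’t resolve, the ensemble can’t form. A check from the dnsutils pod, roughly:)

```
kubectl exec dnsutils -- nslookup owdev-zookeeper-0.owdev-zookeeper.openwhisk.svc.cluster.local
```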

Not sure. I’m out of ideas; I guess I’ll take it to OpenWhisk’s support. You’re telling me the connections are fine because we tested them with the dnsutils pod.

I believe getting in touch with the OpenWhisk team is the better option.

Mmhm, okay. Thanks :grin:

For future reference: I won’t be tracing this issue anymore. I was only trying to get OpenWhisk on Kubernetes to work because I was having issues with standalone OpenWhisk, and I’ve since fixed those standalone issues. So unfortunately, if someone is facing this same issue, I don’t have a solution for it.