Pipeline Studio service is not running

Problem

Validating a pipeline fails with the following error:

No discoverable found for request POST /v3/namespaces/system/apps/pipeline/services/studio/methods/v1/contexts/default/validations/stage HTTP/1.1

Symptom(s)

The Pipeline Studio status is red on the System Admin > Management page.

Solution(s)

Pipeline Studio errors

Check the Pipeline Studio logs for errors. If you find anything suspicious, send both the Pipeline Studio and App Fabric logs to the CDF team for debugging.

Remediation:

Restart the Pipeline Studio service

Navigate to System Admin > Configuration > Make HTTP calls.
Make a POST call to namespaces/system/apps/pipeline/services/studio/start.
Monitor the Pipeline Studio status on the System Admin page. Once it turns green, you should be able to validate your pipeline.
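If you prefer to script the restart, the same POST can be sent from any machine that can reach the CDAP router. The Python sketch below only builds the request; sending it is a separate call. The router base URL (assumed to already include the /v3 prefix) and the optional bearer token are assumptions that depend on how your instance is exposed and secured; neither comes from this article.

```python
import urllib.request
from typing import Optional

# Hypothetical router base URL (includes the /v3 prefix); adjust for your instance.
ROUTER_BASE = "http://cdap-router.example.internal:11015/v3"


def service_action_request(
    base: str,
    namespace: str,
    app: str,
    service: str,
    action: str,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts or stops a CDAP service."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{base.rstrip('/')}/namespaces/{namespace}/apps/{app}/services/{service}/{action}"
    headers = {}
    if token:
        # Assumption: the endpoint accepts an OAuth 2.0 bearer token.
        headers["Authorization"] = f"Bearer {token}"
    # An empty body together with method="POST" makes urllib issue a POST.
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def send(req: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code (urllib raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    req = service_action_request(ROUTER_BASE, "system", "pipeline", "studio", "start")
    print(req.get_method(), req.full_url)
```

A successful response only means the start request was accepted, not that the service came up, so keep watching the status indicator after calling send().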

Pipeline Studio service is not running

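Log lines like the following in the App Fabric logs indicate that the Pipeline Studio service failed to start. App Fabric begins monitoring the run with a 300-second window, and the termination is logged 5 minutes later, when that window has expired: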

2020-06-17 12:18:10,075 - DEBUG [program.status:i.c.c.k.r.KubeTwillRunnerService@223] - Monitoring application service.system.pipeline.studio with run 4627ebab-8437-4cf1-ac4e-2b6496aae72c starts in 300 SECONDS
[...]
2020-06-17 12:23:10,147 - INFO  [OkHttp https://10.114.64.33/api/v1/namespaces/default/configmaps/cdap-dfusion-sl-edwea-master-dev-service-system-pipeline-studio-4627ebab-8437-4cf1-ac4e-2b6496aae72c:i.c.c.i.a.r.d.AbstractTwillProgramController@77] - Twill program terminated: program_run:system.pipeline.-SNAPSHOT.service.studio.9832b5d7-b094-11ea-b4ba-da2f27192df4, twill runId: 4627ebab-8437-4cf1-ac4e-2b6496aae72c
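To search a long App Fabric log for this pattern, the sketch below pairs each "Monitoring application ..." line with the matching "Twill program terminated" line by twill run ID and reports runs that ended only after their monitoring window ran out. It assumes every relevant line starts with a timestamp in the format shown and that the message wording matches the two lines quoted here; other CDAP versions may log differently.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

# Regexes modeled on the two App Fabric log lines quoted in this article.
MONITOR_RE = re.compile(
    r"Monitoring application (\S+) with run (\S+) starts in (\d+) SECONDS"
)
TERMINATED_RE = re.compile(r"Twill program terminated: .*twill runId: (\S+)")
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # e.g. 2020-06-17 12:18:10,075


def line_time(line: str) -> datetime:
    """Parse the 23-character timestamp at the start of a log line."""
    return datetime.strptime(line[:23], TS_FORMAT)


def timed_out_runs(
    log_text: str, application: str = "service.system.pipeline.studio"
) -> List[Tuple[str, float]]:
    """Return (twill run ID, seconds elapsed) for runs terminated after the monitoring window."""
    started: Dict[str, Tuple[datetime, int]] = {}
    results: List[Tuple[str, float]] = []
    for line in log_text.splitlines():
        monitor = MONITOR_RE.search(line)
        if monitor and monitor.group(1) == application:
            # Remember when monitoring began and how long the window is.
            started[monitor.group(2)] = (line_time(line), int(monitor.group(3)))
            continue
        terminated = TERMINATED_RE.search(line)
        if terminated and terminated.group(1) in started:
            start, window = started.pop(terminated.group(1))
            elapsed = (line_time(line) - start).total_seconds()
            if elapsed >= window:
                results.append((terminated.group(1), elapsed))
    return results
```

Applied to the two lines above, this reports run 4627ebab-8437-4cf1-ac4e-2b6496aae72c with about 300.07 seconds elapsed. A run that is stopped deliberately before the window expires is not reported.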

Follow the instructions here to connect to the GKE cluster in your tenant project.

Check pod status

Run kubectl get pods.

Check whether:

any pods are not in the RUNNING state.
the App Fabric pod is running and has not restarted several times (see the RESTARTS column).
the Pipeline Studio pod is running.
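A minimal Python sketch of these checks, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE). The restart threshold is a hypothetical stand-in for "several times", and the pod names in the usage example are made up; real names depend on your instance.

```python
from typing import List, Tuple

# Hypothetical threshold standing in for "restarted several times".
MAX_RESTARTS = 3


def problem_pods(get_pods_output: str, max_restarts: int = MAX_RESTARTS) -> List[Tuple[str, str, int]]:
    """Return (name, status, restarts) for pods that are not Running or restart too often."""
    problems = []
    for line in get_pods_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "NAME":  # skip the header row
            continue
        name, status = fields[0], fields[2]
        # Newer kubectl prints e.g. "3 (2m ago)"; the count is still the 4th field.
        restarts = int(fields[3])
        if status != "Running" or restarts >= max_restarts:
            problems.append((name, status, restarts))
    return problems


def has_running_pod(get_pods_output: str, name_fragment: str) -> bool:
    """True if some pod whose name contains name_fragment has STATUS Running."""
    for line in get_pods_output.strip().splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] != "NAME" and name_fragment in fields[0]:
            if fields[2] == "Running":
                return True
    return False


if __name__ == "__main__":
    # Hypothetical output; pod names depend on your instance.
    sample = "\n".join([
        "NAME                                             READY   STATUS    RESTARTS   AGE",
        "cdap-demo-appfabric-0                            1/1     Running   5          1h",
        "cdap-demo-service-system-pipeline-studio-abcde   0/1     Pending   0          2m",
    ])
    print(problem_pods(sample))
    print(has_running_pod(sample, "pipeline-studio"))
```

Both checks work on plain text, so you can paste saved kubectl get pods output into them as well as feed them live output.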

Pipeline Studio pod is not running

If the Pipeline Studio pod is in the PENDING state, run kubectl describe pod/<pipeline-studio-pod-id> to see why it is not running. The most likely cause is a pod scheduling issue, indicated by events like the following:

Events:
  Type     Reason             Age                From                Message
  ----     ------             ----               ----                -------
  Warning  FailedScheduling   29s (x2 over 29s)  default-scheduler   0/6 nodes are available: 6 Insufficient cpu.
  Normal   NotTriggerScaleUp  9s (x3 over 29s)   cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added):
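When the Events table is long, the sketch below extracts the FailedScheduling messages. It assumes, as in the output above, that columns are separated by at least two spaces; the Age column itself contains single spaces, which is why each row is matched with a regular expression instead of being split on whitespace.

```python
import re
from typing import List, Tuple

# Event rows separate columns with two or more spaces, while the Age column
# ("29s (x2 over 29s)") contains single spaces, so rows are matched with a
# regex instead of being split on every whitespace run.
EVENT_RE = re.compile(r"^\s*(Normal|Warning)\s+(\S+)\s+(.+?)\s{2,}(\S+)\s{2,}(.*)$")


def parse_events(describe_output: str) -> List[Tuple[str, str, str, str]]:
    """Return (type, reason, from, message) for each row of the Events table."""
    rows = []
    for line in describe_output.splitlines():
        match = EVENT_RE.match(line)
        if match:
            kind, reason, _age, source, message = match.groups()
            rows.append((kind, reason, source, message))
    return rows


def scheduling_failures(describe_output: str) -> List[str]:
    """Messages of FailedScheduling events, e.g. '0/6 nodes are available: ...'."""
    return [
        message
        for _kind, reason, _source, message in parse_events(describe_output)
        if reason == "FailedScheduling"
    ]
```

For the events shown above, scheduling_failures returns the single message "0/6 nodes are available: 6 Insufficient cpu.", which points at a CPU shortage rather than at CDAP itself.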

Remediation:

Delete all CDAP pods so that Kubernetes recreates and reschedules them: kubectl delete pods -l cdap.instance

When the command returns, run kubectl get pods to confirm that all pods eventually transition to the RUNNING state.
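Instead of re-running kubectl get pods by hand, you can poll until every pod is back. This Python sketch assumes kubectl is on the PATH and pointed at the tenant cluster, and it uses the same cdap.instance label selector as the delete command. It reads status.phase from the JSON output and treats pods that carry a deletionTimestamp as not yet replaced, because a pod that is being deleted can still report the Running phase. The 600-second timeout and 10-second interval are arbitrary choices.

```python
import json
import subprocess
import time
from typing import Any, Callable, Dict, List


def pods_not_running(items: List[Dict[str, Any]]) -> List[str]:
    """Names of pods that are not Running, or are being deleted (those can still report Running)."""
    return [
        pod["metadata"]["name"]
        for pod in items
        if pod.get("status", {}).get("phase") != "Running"
        or pod["metadata"].get("deletionTimestamp")
    ]


def kubectl_pods_json() -> str:
    """Fetch CDAP pods as JSON, using the same label selector as the delete command."""
    return subprocess.run(
        ["kubectl", "get", "pods", "-l", "cdap.instance", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def wait_until_running(
    fetch: Callable[[], str] = kubectl_pods_json,
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until at least one pod exists and all are Running; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        items = json.loads(fetch()).get("items", [])
        if items and not pods_not_running(items):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Call wait_until_running() after kubectl delete pods returns. The fetch and sleep parameters exist so the loop can be exercised without a cluster. Note that the Running phase does not guarantee the containers are ready; the READY column of kubectl get pods is still worth a look.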

Orphaned pod

If the Pipeline Studio pod stays in the PENDING state for 5 minutes and is then deleted, the cluster does not have enough resources to schedule it.

One possible cause is an orphaned pod that is still consuming cluster resources. Check whether any service has more than one pod instance.

In this example, there are 2 Dataprep pods:

cdap-dfusion-sl-edwea-master-dev-service-system-dataprep-s2jv22   1/1     Running   0          25m
cdap-dfusion-sl-edwea-master-dev-service-system-dataprep-sb94f4   1/1     Running   0          24m
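To spot duplicates like these in a long pod list, the sketch below groups pod names by everything before the last hyphen and reports groups with more than one pod. It assumes the naming pattern shown in this example (a service prefix plus one random suffix). A service that is intentionally scaled to several replicas would also be reported, so treat the output as candidates to inspect, not as confirmed orphans.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_pods(get_pods_output: str) -> Dict[str, List[str]]:
    """Group pods by name minus the last '-suffix'; return only groups with more than one pod."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in get_pods_output.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":  # skip blank lines and the header row
            continue
        name = fields[0]
        prefix = name.rpartition("-")[0] or name
        groups[prefix].append(name)
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    sample = "\n".join([
        "cdap-dfusion-sl-edwea-master-dev-service-system-dataprep-s2jv22   1/1     Running   0          25m",
        "cdap-dfusion-sl-edwea-master-dev-service-system-dataprep-sb94f4   1/1     Running   0          24m",
    ])
    print(duplicate_pods(sample))
```

For the two Dataprep pods above, the result is a single group keyed by cdap-dfusion-sl-edwea-master-dev-service-system-dataprep.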

Remediation:

Stop the Dataprep service.
- Get the router service name from the output of kubectl get services
- Stop the Dataprep service: kubectl exec -it <app-fabric-pod-name> -- curl -X POST http://<router-service-name>:11015/v3/namespaces/system/apps/dataprep/services/service/stop
Delete the Dataprep deployment to remove the orphaned Dataprep pod.
- Get the Dataprep deployment name from kubectl get deployment
- Delete the deployment using kubectl delete deployment/<dataprep-deployment-name>
Verify that Dataprep pods don’t show up in kubectl get pods.
Start the Dataprep service: kubectl exec -it <app-fabric-pod-name> -- curl -X POST http://<router-service-name>:11015/v3/namespaces/system/apps/dataprep/services/service/start
Start the Pipeline Studio service: kubectl exec -it <app-fabric-pod-name> -- curl -X POST http://<router-service-name>:11015/v3/namespaces/system/apps/pipeline/services/studio/start
Verify that all pods and services are up by running kubectl get pods and kubectl get services.
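The remediation steps above can be collected into a dry run that prints each command for review before you run it. In this Python sketch, the name-lookup helper is meant to be used with a fragment such as "router" or "dataprep"; those substrings and the sample names are assumptions about your instance's naming. The -it flags are dropped because a script has no terminal to attach to, and the check that Dataprep pods are gone (between deleting the deployment and starting the services) is left manual.

```python
import shlex
from typing import List

ROUTER_PORT = 11015  # router port used in the commands in this article


def first_name_containing(listing: str, fragment: str) -> str:
    """Return the NAME column of the first row of kubectl get output that contains fragment."""
    for line in listing.strip().splitlines():
        fields = line.split()
        if fields and fields[0] != "NAME" and fragment in fields[0]:
            return fields[0]
    raise LookupError(f"no name contains {fragment!r}")


def router_call(app_fabric_pod: str, router_service: str, app: str, service: str, action: str) -> List[str]:
    """kubectl exec argv that POSTs a lifecycle action to the CDAP router from the App Fabric pod."""
    url = (
        f"http://{router_service}:{ROUTER_PORT}"
        f"/v3/namespaces/system/apps/{app}/services/{service}/{action}"
    )
    # -it is omitted because a script has no terminal to attach to.
    return ["kubectl", "exec", app_fabric_pod, "--", "curl", "-X", "POST", url]


def remediation_plan(app_fabric_pod: str, router_service: str, dataprep_deployment: str) -> List[List[str]]:
    """Commands for the orphaned-Dataprep remediation, in order."""
    return [
        router_call(app_fabric_pod, router_service, "dataprep", "service", "stop"),
        ["kubectl", "delete", f"deployment/{dataprep_deployment}"],
        # Manual check here: Dataprep pods should no longer appear in kubectl get pods.
        router_call(app_fabric_pod, router_service, "dataprep", "service", "start"),
        router_call(app_fabric_pod, router_service, "pipeline", "studio", "start"),
    ]


if __name__ == "__main__":
    # Hypothetical names; look the real ones up with kubectl get pods,
    # kubectl get services and kubectl get deployment.
    for argv in remediation_plan("cdap-demo-appfabric-0", "cdap-demo-router", "cdap-demo-dataprep"):
        print(shlex.join(argv))
```

Printing rather than executing keeps a person in the loop for the destructive kubectl delete step; each printed line can be pasted into a shell as-is.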