Pipeline studio service is not running
Problem
The following error is seen when validating pipelines:
No discoverable found for request POST /v3/namespaces/system/apps/pipeline/services/studio/methods/v1/contexts/default/validations/stage HTTP/1.1
Symptom(s)
Pipeline Studio status is red (System Admin > Management).
Solution(s)
Pipeline Studio errors
Look for errors in the Pipeline Studio logs. If you see something suspicious, send Pipeline Studio and App Fabric logs to the CDF team for debugging.
Remediation:
Restart the Pipeline Studio service:
Navigate to System Admin > Configuration > Make HTTP calls.
Make a POST call to
namespaces/system/apps/pipeline/services/studio/start
Monitor the Pipeline Studio status on the System Admin page. You should be able to validate your pipeline when it turns green.
Pipeline studio service is not running
Log lines like the following in App Fabric logs indicate that Pipeline Studio service failed to start:
2020-06-17 12:18:10,075 - DEBUG [program.status:i.c.c.k.r.KubeTwillRunnerService@223] - Monitoring application service.system.pipeline.studio with run 4627ebab-8437-4cf1-ac4e-2b6496aae72c starts in 300 SECONDS
[...]
2020-06-17 12:23:10,147 - INFO [OkHttp https://10.114.64.33/api/v1/namespaces/default/configmaps/cdap-dfusion-sl-edwea-master-dev-service-system-pipeline-studio-4627ebab-8437-4cf1-ac4e-2b6496aae72c:i.c.c.i.a.r.d.AbstractTwillProgramController@77] - Twill program terminated: program_run:system.pipeline.-SNAPSHOT.service.studio.9832b5d7-b094-11ea-b4ba-da2f27192df4, twill runId: 4627ebab-8437-4cf1-ac4e-2b6496aae72c
Follow the instructions here to connect to the GKE cluster in your tenant project.
Check pod status
Run kubectl get pods.
Check if:
any pods are not in RUNNING state.
App Fabric pod is running and hasn’t restarted several times.
Pipeline Studio pod is running.
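The checks above can be sketched as a small filter over the pod listing. This is only an illustration: flag_unhealthy_pods is a hypothetical helper name, the restart threshold of 3 is an arbitrary assumption, and the column layout (NAME READY STATUS RESTARTS AGE) is assumed to match what kubectl get pods --no-headers prints. Note that kubectl reports the status as "Running", not "RUNNING".

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints pods that are not Running or that restarted more than N times.
# Assumed columns: NAME READY STATUS RESTARTS AGE
flag_unhealthy_pods() {
  max_restarts="${1:-3}"   # threshold is an assumption, adjust as needed
  awk -v max="$max_restarts" '
    $3 != "Running" { print $1 ": status " $3; next }
    ($4 + 0) > max  { print $1 ": " ($4 + 0) " restarts" }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods --no-headers | flag_unhealthy_pods 3
```

An empty result means every listed pod is Running and under the restart threshold; the App Fabric and Pipeline Studio pods should still be looked up by name in the full listing.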
Pipeline Studio pod is not running
If the Pipeline Studio pod is in PENDING state, run kubectl describe pod/<pipeline-studio-pod-id> to find out why it’s not running. The most likely cause is a pod scheduling issue, indicated by events like the following:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 29s (x2 over 29s) default-scheduler 0/6 nodes are available: 6 Insufficient cpu.
Normal NotTriggerScaleUp 9s (x3 over 29s) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added):
Remediation:
Delete all CDAP pods so that Kubernetes reschedules them:
kubectl delete pods -l cdap.instance
When the command returns, run kubectl get pods to make sure that all pods eventually transition to RUNNING state.
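Rather than re-running the command by hand, the wait can be scripted. The following is a minimal sketch, not a prescribed procedure: all_pods_running is a hypothetical helper, the 10-second polling interval is an assumption, and the STATUS column is assumed to be the third field of kubectl get pods --no-headers.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin.
# Exits 0 only if at least one pod is listed and every pod is Running.
all_pods_running() {
  awk '$3 != "Running" { bad = 1 }
       END { exit (bad || NR == 0) ? 1 : 0 }'
}

# Usage against a live cluster (requires kubectl access); interrupt with
# Ctrl-C if the pods never settle:
#   until kubectl get pods --no-headers | all_pods_running; do sleep 10; done
```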
Orphaned pod
If the Pipeline Studio pod gets stuck in PENDING state for 5 minutes and then gets deleted, that means the cluster doesn’t have enough resources to schedule the pod.
An orphaned pod may be holding those resources. Check whether more than one instance of any pod is running.
In this example, there are 2 Dataprep pods:
cdap-dfusion-sl-edwea-master-dev-service-system-dataprep-s2jv22 1/1 Running 0 25m
cdap-dfusion-sl-edwea-master-dev-service-system-dataprep-sb94f4 1/1 Running 0 24m
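Spotting duplicates by eye is error-prone in a long listing, so a rough heuristic can help. The sketch below assumes pod names have the shape shown in the example above, where two instances of the same service differ only in the final hyphen-separated suffix; find_duplicate_pods is a hypothetical helper name. Pods whose names differ earlier than the last segment will not be grouped, so treat an empty result as inconclusive.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin,
# strips the last "-suffix" from each pod name, and prints every prefix that
# occurs more than once, followed by its count.
find_duplicate_pods() {
  awk '{ name = $1; sub(/-[^-]+$/, "", name); count[name]++ }
       END { for (n in count) if (count[n] > 1) print n, count[n] }' | sort
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods --no-headers | find_duplicate_pods
```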
Remediation:
Stop the Dataprep service.
Get the router service name from
kubectl get services
Stop the Dataprep service using
kubectl exec -it <app-fabric-pod-name> -- curl -X POST http://<router-service-name>:11015/v3/namespaces/system/apps/dataprep/services/service/stop
Delete Dataprep deployment to remove the orphaned Dataprep pod.
Get Dataprep deployment name from
kubectl get deployment
Delete the deployment using
kubectl delete deployment/<dataprep-deployment-name>
Verify that Dataprep pods no longer show up in
kubectl get pods
Start the Dataprep service:
kubectl exec -it <app-fabric-pod-name> -- curl -X POST http://<router-service-name>:11015/v3/namespaces/system/apps/dataprep/services/service/start
Start the Pipeline Studio service:
kubectl exec -it <app-fabric-pod-name> -- curl -X POST http://<router-service-name>:11015/v3/namespaces/system/apps/pipeline/services/studio/start
Verify that all pods and services are up by checking the output of:
kubectl get pods
kubectl get services