Thursday, March 15, 2018

Spark2 Can't create directory errors

18/03/15 11:00:27 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1521132642246_0008_01_000007 on host: host123. Exit status: -1000. Diagnostics: Application application_1521132642246_0008 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is reporting
main : requested yarn user is reporting
Can't create directory /cdh/0/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/1/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/10/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/11/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/12/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/13/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/14/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/15/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/16/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/17/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/18/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/19/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/2/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/20/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/21/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/22/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/23/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/3/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/4/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/5/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/6/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/7/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/8/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/9/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Did not create any app directories


18/03/15 11:00:27 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1521132642246_0008_01_000009 on host: host123. Exit status: -1000. Diagnostics: Application application_1521132642246_0008 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is reporting
main : requested yarn user is reporting
Can't create directory /cdh/0/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/1/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/10/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/11/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/12/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/13/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/14/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/15/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/16/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied
Can't create directory /cdh/17/yarn/nm/usercache/reporting/appcache/application_1521132642246_0008 - Permission denied


I have seen these types of Spark2 errors since Kerberizing my cluster. They were usually not service-impacting; Spark2 seemed to just retry the container on a different node. If the job was big, the odds of a fatal failure increased, probably because it hit a limit on the number of failed containers or something similar. (Just guessing there.)

I found out that leaving these directories in place when they were created before Kerberization can cause this issue once the cluster is Kerberized. I have since deleted them en masse and have not seen anything like this since.
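Before deleting anything, it may help to confirm which leftover directories look stale. The sketch below rests on an assumption the logs above do not confirm: that the "Permission denied" comes from usercache entries still owned by some account other than the job user. The function name list_foreign_owned and the mount points in the example are hypothetical; the directory layout is the one shown in the log.

```shell
# Sketch: list usercache entries NOT owned by the expected user.
# Assumption: wrong ownership is what triggers "Permission denied".
# Arguments: the job user, then one or more mount points (hypothetical values).
list_foreign_owned() {
    local user="$1"; shift
    local mount dir
    for mount in "$@"; do
        dir="${mount}/yarn/nm/usercache/${user}"
        # Skip mount points that have no usercache for this user.
        [ -d "$dir" ] || continue
        # Print the usercache dir, appcache, and app dirs owned by someone else.
        find "$dir" -maxdepth 2 ! -user "$user" -print
    done
}

# Example (hypothetical mount points):
# list_foreign_owned reporting /cdh/0 /cdh/1
```

If this prints nothing on a failing node, the ownership assumption is probably wrong for that node and the cause is something else.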

Run this on all nodes (reporting is the user from the logs above):

rm -rf [your_mount_points]/yarn/nm/usercache/reporting/appcache/*
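With two dozen mount points per node, a small wrapper around that rm can make the cleanup less error-prone. This is only a sketch of how I would script it, not something taken from the cluster above: the function name clean_appcache, the DRY_RUN variable, and the example mount points are all made up for illustration. It defaults to a dry run that only prints what would be removed.

```shell
# Sketch: clear leftover appcache entries for one user on this node.
# Assumptions (not from the original post): mount points are passed as
# arguments, and DRY_RUN=1 (the default) only prints what would be removed.
clean_appcache() {
    local user="$1"; shift
    local mount dir entry
    for mount in "$@"; do
        dir="${mount}/yarn/nm/usercache/${user}/appcache"
        # Skip mount points that have no appcache for this user.
        [ -d "$dir" ] || continue
        for entry in "$dir"/*; do
            # The glob stays literal when the directory is empty; skip it.
            [ -e "$entry" ] || continue
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "would remove: $entry"
            else
                rm -rf -- "$entry"
            fi
        done
    done
}

# Example (hypothetical mount points): review the dry run, then delete.
# clean_appcache reporting /cdh/0 /cdh/1
# DRY_RUN=0 clean_appcache reporting /cdh/0 /cdh/1
```

Like the one-liner, this removes only the contents of appcache and leaves the appcache directory itself in place. It is safest to run it while no containers for that user are active on the node.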

Hope that helps somebody!
