This post covers an error hit while reading data from a Hudi table with Spark: "Unable to infer schema for Parquet. It must be specified manually." Below are the error, the cause, and the fix, shared for reference.

When reading back data that had been written to Hudi, Spark threw the following error:

10131 [dispatcher-event-loop-4] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7722 bytes)
10140 [Executor task launch worker for task 0] INFO org.apache.spark.executor.Executor - Running task 0.0 in stage 0.0 (TID 0)
10247 [Executor task launch worker for task 0] INFO org.apache.spark.executor.Executor - Finished task 0.0 in stage 0.0 (TID 0). 708 bytes result sent to driver
10257 [task-result-getter-0] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 0.0 (TID 0) in 143 ms on localhost (executor driver) (1/1)
10260 [task-result-getter-0] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Removed TaskSet 0.0, whose tasks have all completed, from pool
10266 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - ResultStage 0 (resolveRelation at DefaultSource.scala:78) finished in 0.487 s
10271 [main] INFO org.apache.spark.scheduler.DAGScheduler - Job 0 finished: resolveRelation at DefaultSource.scala:78, took 0.525224 s
Exception in thread "main" org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:184)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:78)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:47)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at sparkAndHudiToHive.SparkOnHudiToHiveExample$.SelectHudi(SparkOnHudiToHiveExample.scala:152)
    at sparkAndHudiToHive.SparkOnHudiToHiveExample$.main(SparkOnHudiToHiveExample.scala:36)
    at sparkAndHudiToHive.SparkOnHudiToHiveExample.main(SparkOnHudiToHiveExample.scala)
10279 [Thread-1] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
10286 [Thread-1] INFO org.spark_project.jetty.server.AbstractConnector - Stopped Spark@62417a16{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
10288 [Thread-1] INFO org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://windows:4040
10298 [dispatcher-event-loop-5] INFO org.apache.spark.MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
10317 [Thread-1] INFO org.apache.spark.storage.memory.MemoryStore - MemoryStore cleared
10318 [Thread-1] INFO org.apache.spark.storage.BlockManager - BlockManager stopped
10324 [Thread-1] INFO org.apache.spark.storage.BlockManagerMaster - BlockManagerMaster stopped
10326 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
10335 [Thread-1] INFO org.apache.spark.SparkContext - Successfully stopped SparkContext
10336 [Thread-1] INFO org.apache.spark.util.ShutdownHookManager - Shutdown hook called
10336 [Thread-1] INFO org.apache.spark.util.ShutdownHookManager - Deleting directory C:\Users\10437\AppData\Local\Temp\spark-834fcdc9-2e63-4918-bee5-dcc9e6793015

Process finished with exit code 1

It turned out that my code did not spell out the Hudi storage path fully. With Hudi, the Parquet data files sit inside partition sub-directories, while the base path itself mainly holds the .hoodie metadata directory, so Spark's Parquet schema inference finds nothing to read when pointed at the base path alone.

Remember: the table path passed to load() needs the glob "/*/*".
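Before changing the read path, it can help to confirm the directory layout that causes the failure. The following is a minimal sketch (not from the original post) that lists the base path with the Hadoop FileSystem API; the path is the one from my code, and the expectation of a single partition level is an assumption:

import org.apache.hadoop.fs.{FileSystem, Path}

// List the Hudi base path: typically you only see the ".hoodie" metadata directory
// plus one sub-directory per partition; the Parquet files live inside those
// partition directories, not directly under the base path.
// (Assumes the table lives on the FileSystem configured for this SparkSession.)
val fs = FileSystem.get(sparkSession.sparkContext.hadoopConfiguration)
fs.listStatus(new Path("/hudi/insertHDFS"))
  .foreach(status => println(status.getPath))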

package sparkAndHudiToHive

import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.{HoodieHBaseIndexConfig, HoodieIndexConfig, HoodieWriteConfig}
import org.apache.hudi.index.HoodieIndex
import org.apache.spark.sql.{DataFrame, Dataset, Row, SaveMode, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * @ClassName SparkOnHudiToHiveExample
 * @Description:
 * @Author 庄
 * @Date 2021/1/18
 * @Version V1.0
 **/
object SparkOnHudiToHiveExample {

  def main(args: Array[String]): Unit = {
    val sc: SparkConf = new SparkConf()
      .setMaster("local[*]")
      .setAppName(this.getClass.getName)
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sparkSession: SparkSession = SparkSession.builder().config(sc).getOrCreate()

    // Write the initial data into Hudi
    //InsertHudi(sparkSession)
    // Read plain local data and write it into Hive
    //InsertHive(sparkSession)
    // Query the data that was written to Hudi
    SelectHudi(sparkSession)

    def SelectHudi(sparkSession: SparkSession) = {
      val df: DataFrame = sparkSession.read.format("org.apache.hudi")
        // This is what I had at first:
        //.load("/hudi/insertHDFS/")
        // This is the correct way:
        .load("/hudi/insertHDFS/*/*")
      df.show()
    }
  }
}
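For context, here is a minimal sketch of what the write side could look like for this layout. My InsertHudi method is not shown in the post, so the columns, the record key "id", the precombine field "ts", and the single partition column "dt" below are illustrative assumptions, not the original code:

import org.apache.spark.sql.{SaveMode, SparkSession}

def insertHudiSketch(spark: SparkSession): Unit = {
  // Illustrative data: "dt" is the partition column, "id" the record key, "ts" the precombine field
  val df = spark.createDataFrame(Seq(
    ("1", "alice", "2021-01-18", 1610928000L),
    ("2", "bob",   "2021-01-19", 1611014400L)
  )).toDF("id", "name", "dt", "ts")

  df.write.format("org.apache.hudi")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .option("hoodie.datasource.write.partitionpath.field", "dt") // one partition level
    .option("hoodie.table.name", "insertHDFS")
    .mode(SaveMode.Append)
    .save("/hudi/insertHDFS")
}

With one partition level, the files land under /hudi/insertHDFS/<dt>/..., so the read path needs two glob segments: one for the partition directory and one for the data files inside it, hence "/*/*". For a deeper partition path, add one "/*" per extra level; newer Hudi releases can typically load the base path directly without the glob.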

 
