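While learning how to read data from Hadoop today, I wrote a simple method, but it threw an error when I tested it.
Here is the code, which reads a file from Hadoop and writes it to the local disk: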
package hdfs;

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

import org.apache.hadoop.io.IOUtils;

public class HDFS {
    public static void main(String[] args) throws Exception {
        InputStream inputStream = null;
        FileWriter writer = null;
        try {
            // No handler for the hdfs scheme has been registered at this point
            URL url = new URL("hdfs://localhost:9000/input.txt");
            inputStream = url.openStream();
            writer = new FileWriter("/home/wxl/桌面/tmp.txt");
            InputStreamReader reader = new InputStreamReader(inputStream);
            BufferedReader bufferedReader = new BufferedReader(reader);
            String line = null;
            while ((line = bufferedReader.readLine()) != null) {
                // readLine() strips the line terminator, so write it back
                writer.write(line);
                writer.write(System.lineSeparator());
            }
        } finally {
            IOUtils.closeStream(inputStream);
            if (writer != null) {
                writer.close();
            }
        }
    }
}
After some searching, I found this explanation in "Hadoop: The Definitive Guide":
"There’s a little bit more work required to make Java recognize Hadoop’s hdfs URL scheme. This is achieved by calling the setURLStreamHandlerFactory method on URL with an instance of FsUrlStreamHandlerFactory . This method can be called only once per JVM, so it is typically executed in a static block."
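In other words, Java does not recognize Hadoop's hdfs URL scheme out of the box. The fix is to call URL.setURLStreamHandlerFactory with an instance of FsUrlStreamHandlerFactory. Because that method can be called only once per JVM, the call is usually placed in a static block.
So I added a static block to the class: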
static {
    // This method can be called at most once in a given JVM.
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}
The code then became:
package hdfs;

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class HDFS {
    static {
        // This method can be called at most once in a given JVM.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream inputStream = null;
        FileWriter writer = null;
        try {
            URL url = new URL("hdfs://localhost:9000/input.txt");
            inputStream = url.openStream();
            writer = new FileWriter("/home/wxl/桌面/tmp.txt");
            InputStreamReader reader = new InputStreamReader(inputStream);
            BufferedReader bufferedReader = new BufferedReader(reader);
            String line = null;
            while ((line = bufferedReader.readLine()) != null) {
                // readLine() strips the line terminator, so write it back
                writer.write(line);
                writer.write(System.lineSeparator());
            }
        } finally {
            IOUtils.closeStream(inputStream);
            if (writer != null) {
                writer.close();
            }
        }
    }
}
OK, the error is gone and the program runs successfully.
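To see the scheme registration and the once-per-JVM rule in isolation, here is a minimal sketch that needs no Hadoop at all. It is only an illustration: the "demo" scheme, the UrlSchemeDemo class, and the DemoHandlerFactory are all made up for this example and stand in for hdfs and FsUrlStreamHandlerFactory. It relies only on the standard java.net API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

public class UrlSchemeDemo {

    /** Hypothetical factory that only knows the made-up "demo" scheme. */
    static class DemoHandlerFactory implements URLStreamHandlerFactory {
        @Override
        public URLStreamHandler createURLStreamHandler(String protocol) {
            if (!"demo".equals(protocol)) {
                return null; // let the JVM fall back to its built-in handlers
            }
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL u) {
                    return new URLConnection(u) {
                        @Override
                        public void connect() { }

                        @Override
                        public InputStream getInputStream() {
                            // Fixed payload; a real handler would fetch data here
                            byte[] data = "hello from demo".getBytes(StandardCharsets.UTF_8);
                            return new ByteArrayInputStream(data);
                        }
                    };
                }
            };
        }
    }

    /** Returns true if java.net.URL accepts the given URL string. */
    static boolean canParse(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false; // e.g. "unknown protocol"
        }
    }

    /** Tries to install the factory; returns false if one is already set. */
    static boolean register() {
        try {
            URL.setURLStreamHandlerFactory(new DemoHandlerFactory());
            return true;
        } catch (Error e) {
            // The JDK throws java.lang.Error when a factory is already defined
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canParse("demo://host/file.txt")); // false: scheme unknown
        System.out.println(register());                       // true: first registration
        System.out.println(canParse("demo://host/file.txt")); // true: scheme now known
        System.out.println(register());                       // false: second call rejected
    }
}
```

The second register() call failing is the reason the Hadoop version puts the call in a static block: the block runs once when the class is loaded, so the factory cannot be installed twice by accident.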