使用POI和SAX读取Excel
Published on: | Views: 139最近做了一个excel导入功能,在线上使用时崩溃了,看了一下,数据量也不大,就几W条, 也没有具体的错误信息,猜测可能是内存崩溃了。 原来的代码如下:
Workbook workbook = new XSSFWorkbook(inputStream);
Sheet sheet = workbook.getSheetAt(0);
int i = 0;
for (Row row : sheet) {
//...
}
}
根据官网的解释, 这样写的话,会先把整个excel文件加载解析到内存中,所以会占用比较多的内存。 如果excel文件太大,可以使用SAX事件来解析,具体的文档参考 : https://poi.apache.org/components/spreadsheet/how-to.html#xssf_sax_api
然后文档里有个示例代码,搬过来就可以了。
初看示例代码有点迷茫,所以可以先把整个excel解析后的xml打印出来,然后对照阅读就没有什么问题了。
原始excel如下:
解析出来的xml数据如下:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
xmlns:x14="http://schemas.microsoft.com/office/spreadsheetml/2009/9/main"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:etc="http://www.wps.cn/officeDocument/2017/etCustomData">
<sheetPr/>
<dimension ref="A1:F5"/>
<sheetViews>
<sheetView tabSelected="1" workbookViewId="0">
<selection activeCell="D6" sqref="D6"/>
</sheetView>
</sheetViews>
<sheetFormatPr defaultColWidth="9" defaultRowHeight="15" outlineLevelRow="4" outlineLevelCol="5"/>
<cols>
<col min="1" max="2" width="21.5583333333333" style="1" customWidth="1"/>
<col min="3" max="3" width="12.775" style="1" customWidth="1"/>
<col min="4" max="4" width="40.1083333333333" style="1" customWidth="1"/>
<col min="5" max="5" width="5.55833333333333" style="1" customWidth="1"/>
<col min="6" max="6" width="19.8833333333333" style="1" customWidth="1"/>
<col min="7" max="16384" width="8.88333333333333" style="1"/>
</cols>
<sheetData>
<row r="1" spans="1:6">
<c r="A1" s="1" t="s">
<v>0</v>
</c>
<c r="B1" s="1" t="s">
<v>1</v>
</c>
<c r="C1" s="1" t="s">
<v>2</v>
</c>
<c r="D1" s="1" t="s">
<v>3</v>
</c>
<c r="E1" s="1" t="s">
<v>4</v>
</c>
<c r="F1" s="1" t="s">
<v>5</v>
</c>
</row>
<row r="2" spans="1:6">
<c r="A2" s="1" t="s">
<v>0</v>
</c>
<c r="B2" s="1" t="s">
<v>1</v>
</c>
<c r="C2" s="1" t="s">
<v>6</v>
</c>
<c r="D2" s="1" t="s">
<v>7</v>
</c>
<c r="E2" s="1" t="s">
<v>4</v>
</c>
<c r="F2" s="1" t="s">
<v>8</v>
</c>
</row>
//....省略
</sheetData>
<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
<headerFooter/>
</worksheet>
处理流程
示例数据:
<row r="1" spans="1:6">
<c r="A1" s="1" t="s">
<v>0</v>
</c>
...
</row>
其中: row表示元素为行 c表示元素为列 v表示元素为值,它的数据是一个偏移量 r是列号或者行号 spans表示行数据的跨度 s是? t是数据类型,值为"s"表示字符串
事件处理流程就是这样的:
1. row元素开始
2. c元素开始
3. v元素开始
4. v元素结束
5. c元素结束
6. row元素结束
相关代码中的处理流程:
1. startElement <row r="1" spans="1:6">
2. startElement <c r="A1" s="1" t="s">
3. startElement <v>
4. characters 0
5. endElement </v>
6. endElement </c>
7. endElement </row>
封装
为了简便处理导出,写了一个简单的封装类:
public class ExcelImporter {
/**
* @param inputStream 输入流,处理结束后自动关闭
* @param contentHandler 内容处理器
* @param rId Normally it's of the form rId# or rSheet#, example rId1
*/
public static void importFromInputStream(InputStream inputStream,
ExcelSheetHandler contentHandler,
String rId) {
try {
OPCPackage pkg = OPCPackage.open(inputStream);
XSSFReader xssfReader = new XSSFReader(pkg);
SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();
XMLReader parser = XMLHelper.newXMLReader();
contentHandler.init(sharedStringsTable);
parser.setContentHandler(contentHandler);
InputStream sheet = xssfReader.getSheet(rId);
InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
public abstract class ExcelSheetHandler<T> extends DefaultHandler {
private SharedStringsTable sst;
private StringBuilder lastContents = new StringBuilder();
private StringBuilder lastColumnLabel = new StringBuilder();
private T dto;
private boolean nextIsString;
private Integer rowIndex = 0;
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
if (rowIndex < getSkipHeaderRowCount()) {
return;
}
if (name.equals("row")) {
//行开始
if (dto == null) {
dto = createData();
}
clearData(dto);
}
// c => cell
if (name.equals("c")) {
lastColumnLabel.setLength(0);
lastColumnLabel.append(attributes.getValue("r"));
// Figure out if the value is an index in the SST
String cellType = attributes.getValue("t");
nextIsString = cellType != null && cellType.equals("s");
}
// Clear contents cache
lastContents.setLength(0);
}
public void endElement(String uri, String localName, String name)
throws SAXException {
if (rowIndex < getSkipHeaderRowCount()) {
if (name.equals("row")) {
rowIndex++;
}
return;
}
if (name.equals("row")) {
onRowEnd(dto, rowIndex);
rowIndex++;
}
// Process the last contents as required.
// Do now, as characters() may be called more than once
if (nextIsString) {
int idx = Integer.parseInt(lastContents.toString());
lastContents.setLength(0);
lastContents.append(sst.getItemAt(idx).getString());
nextIsString = false;
}
// v => contents of a cell
// Output after we've seen the string contents
if (name.equals("v")) {
onColEnd(dto, rowIndex, lastColumnLabel.toString(), lastContents.toString());
}
}
public void characters(char[] ch, int start, int length) {
lastContents.append(ch, start, length);
}
public void init(SharedStringsTable sst) {
this.sst = sst;
}
protected int getSkipHeaderRowCount() {
return 0;
}
protected abstract T createData();
protected abstract void clearData(T dto);
protected abstract void onRowEnd(T dto, int row);
protected abstract void onColEnd(T dto, int row, String label, String value);
}