Apache OpenOffice (AOO) Bugzilla – Issue 15138
Text Data source has duplicated columns due to spaces in header record
Last modified: 2006-05-31 14:29:06 UTC
Hi, I opened a text file which was defined as a data source. To my surprise, extra columns appeared to the right, and they repeated the data of previous columns. Further investigation showed this to be a result of extra spaces in the header record when space is used as the field delimiter. To reproduce: Create a simple text file. In the first record type 'Name' without the quotes, which will be the field name. In the second record type 'Kelvin', which will be the data. Set this up as a text data source with space as the field delimiter. View the text data source in the data source window. Everything works as expected. Now go back and add extra spaces to the header record. That is 'Name '. Now view the text data source again and you will find 'Kelvin' appears in additional columns. This was unexpected behaviour. A space is often used as a field delimiter when there are multiple columns of dates, codes or numbers. If spaces inadvertently appear in the header record, I don't see any problem with extra blank columns. I do however see a problem with an earlier column being repeated, as this makes it appear there is data in those columns when in fact there is no data. Thanks Kelvin
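The effect of the trailing spaces can be illustrated outside OpenOffice with a minimal Python sketch. It is not the AOO text driver; `split_record` is a hypothetical helper, and the header with two trailing spaces is an assumed example of "extra spaces". It only shows what a strict split on every single space does to the header record.

```python
def split_record(line, sep=" "):
    """Split one record on every occurrence of sep, without collapsing runs."""
    return line.rstrip("\r\n").split(sep)

# Header without trailing spaces: one field name.
print(split_record("Name\n"))    # ['Name']
# Assumed header with two trailing spaces: two extra, empty field names.
print(split_record("Name  \n"))  # ['Name', '', '']
# The data record still carries a single value.
print(split_record("Kelvin\n"))  # ['Kelvin']
```

Under this reading the header defines three columns while the data record supplies only one value, which is the situation in which the repeated 'Kelvin' was observed.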
Kelvin, thanks for reporting this. For simplicity in reproducing this, could you please attach a sample text file? Additionally, can you please tell us the setup of your text data source? Especially the settings on the "Text" tab page in the data source administration dialog would be interesting.
Created attachment 6589 [details] Simple text file with data. Header has extra blanks after field name
Hi, I've added the sample file 'DataSource.txt' as requested for ease of duplication. What is important about this file is the header record contains extra space characters. To test this create a data source with the Database Type of Text. Change the Field separator to {space}. No other settings need to be changed. Then view using the data source window. Hope this helps. Kelvin
thanks for the file, Kelvin. confirming, targeting, and re-assigning. We'll have to think about the implications when changing this. Basically, it would mean that we accept <n> whitespaces as field separator, not only 1, as we do currently. However, this may raise the problem of discrepancies between the header line and the data lines - two spaces in a data line then could mean "2 separator characters", or "separator, empty content, separator".
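The ambiguity described here can be sketched in a few lines of Python. The sample line is made up for illustration, and neither reading is claimed to be what the text driver does.

```python
import re

line = "2006-05-31  42"  # hypothetical data line with two spaces between the values

# Strict reading: every space is a separator, so an empty field sits in between.
strict = line.split(" ")
# Lenient reading: a run of spaces counts as one separator.
lenient = re.split(r" +", line)

print(strict)   # ['2006-05-31', '', '42']
print(lenient)  # ['2006-05-31', '42']
```

The same bytes yield three fields under one rule and two under the other, so whichever rule is chosen has to be applied identically to the header line and the data lines.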
Hi Frank, I wouldn't change the handling of accepting one blank space as the field separator to <n> white spaces without further thought. It may simply be that the data exists but the header text is a space, and IMHO that should be valid. If there are extra blank spaces in the header, then simply create the extra blank columns if there is no data in the data record. The header record is the driving record. The header record currently determines the number of columns, and I believe this to be correct. If there are extra columns in the detail records that do not exist in the header record, then these are currently ignored, which IMHO is correct. If there are extra spaces or valid field names in the header record and insufficient data in the detail records, then the last field value is currently copied to all remaining columns. If there is insufficient data in the detail record, the remaining columns should just be empty, null or blank depending on the data source. The data should not be copied across to other columns. The only issue, as I have said, is that the data in the columns should be set to a variation of blank for the given data source if no data actually exists. I did not realise when I reported the problem that it also applies if there are more field names in the header record than there is data in the detail records. To duplicate this, just add additional field names to the header record but don't add any data to the detail records. I hope this explains it a little further. As for the suggestion of providing the option to handle <n> white spaces as a single delimiter, which is done in other modules, this could possibly be a good additional option. I have noticed people importing text files wanting to get rid of extra spaces between data items. This would also be a valid approach to using the text file as a data source. So using the repeated delimiter may be a handy option. Kelvin
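The header-driven behaviour requested here can be sketched as follows. This is a minimal Python illustration under stated assumptions: `align_to_header` is a hypothetical helper, and `None` stands in for whatever "empty, null or blank" means for the given data source.

```python
def align_to_header(header_fields, data_fields):
    """Return a row with exactly len(header_fields) values.

    Missing values become None; values beyond the header width are dropped.
    Nothing is copied from an earlier column.
    """
    width = len(header_fields)
    row = list(data_fields[:width])
    row.extend([None] * (width - len(row)))
    return row

header = "Name  ".split(" ")  # ['Name', '', ''] when every space is a separator
print(align_to_header(header, ["Kelvin"]))             # ['Kelvin', None, None]
print(align_to_header(["Name"], ["Kelvin", "extra"]))  # ['Kelvin']
```

The first call covers a header that is wider than the data record; the second covers a data record that is wider than the header.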
Fixed in CWS oj4. Now every column even if there is no content between the separator, will get a name (default). If there is need to filter the columns which are empty, the user has to define a query which doesn't contain these columns. The value from the last "valid" column will no longer be duplicated. Best regards, Ocke
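As a rough illustration of giving every column a name, the following Python sketch fills blank header positions with a generated default. The `Column<n>` pattern and the `name_columns` helper are assumptions made for this example; the comment above does not say what the actual default names look like.

```python
def name_columns(header_fields, prefix="Column"):
    """Give every header position a name; blank ones get prefix + 1-based position."""
    return [
        field if field.strip() else f"{prefix}{index}"
        for index, field in enumerate(header_fields, start=1)
    ]

print(name_columns(["Name", "", ""]))  # ['Name', 'Column2', 'Column3']
```

With named placeholder columns, a query that selects only the wanted columns can hide the empty ones, as suggested above.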
Hi, I was just checking issues which I have raised. I'm not certain if CWS oj4 means this is fixed in OO1.1BetaRC, but given it is still not 100% I thought I would open the issue again. I performed two tests. Test 1: Two extra blanks after the field heading in the first row. Expected result: There should be one extra column produced, as one space is the separator and the other is the non-specified field. Actual result: Two extra columns produced, which is unexpected. Test file attached. Test 2: Missing data in records. Expected result: No data should appear in the columns when data is missing. (Alternatively, an error could be raised that there is insufficient data. However, this will not be known until all records are read and is thus probably not a viable alternative.) Actual result: Data from the last non-empty column is repeated in further columns. Note: In this test there is extra data in the last record. Given the first record will normally determine the number of columns, I feel it is quite valid for excess data to be truncated if there are insufficient header fields. Just my opinion. Please feel free to close this issue again if this has been fixed after OO1.1BetaRC. Please also note again that I do not consider these to be major bugs. If the Hope this helps Kelvin
Created attachment 7536 [details] Two blanks after headers
Created attachment 7537 [details] Missing and extra data
Hi Kelvin, thanks for the additional test cases. As you might not have seen, the target milestone for this issue is set to OOo 2.0, which means that this one will be fixed after OOo 1.1RC. CWS only means child workspace; in this workspace our QA will test all fixes included, so that the main trunk will not be trashed :-) Best regards, Ocke
Send to QA
set to fixed
set to fixed
verified in OJ4
works in oo1.1
Hi, I wouldn't consider this fixed. Please see attached screen dump. The field separator in this example is a space as mentioned in the earlier comments. I simply downloaded the file I first submitted. Set up the datasource using space as the field separator. Still looks like repeated fields to me. It also looks like an extra blank column as well so I think things have gotten worse. This is not a major issue and it is tricky to work out all the combinations. A space is not usually used as a field separator in my experience. It would also be unusual for the heading record to contain extra spaces. Anyway, I hope the feedback is of use. Kelvin
Created attachment 12156 [details] Screen dump showing duplicated data and now additional blank column
The bug was fixed on the SRC680 branch only (as also indicated by the target), though I don't know how clu's comment "works in oo1.1" came in then. I suppose it was a mistake. Kelvin, which version did you try? A recent developer snapshot, or OOo 1.1?
Hi, My test was done in OOo1.1.0. I was testing it again as a result of clu's works in OO1.1 comment. No problem if this is being fixed for OOo2.0. I'll test it again then:-) Kelvin
CLU: sorry, Kelvin - this confusion was my mistake, as fs has written - the fix will hopefully come out with the next available version - thx for reopening this issue CLU->protocol: fix was checked in main trunk (680 - internal version) after integrating the cws
change subcomponent to 'none'
fixed in oj4
oj4 integrated in 680m4 - OOo developer version 680m30 is available, so the fix is out