15138 – Text Data source has duplicated columns due to spaces in header record

Status:	CLOSED FIXED

Alias:	None

Product:	Base
Classification:	Application
Component:	code
Version:	OOo 1.1 RC
Hardware:	PC Windows XP

Importance:	P3 Trivial
Target Milestone:	OOo 2.0
Assignee:	christoph.lukasiak
QA Contact:	issues@dba

Reported:	2003-06-01 00:53 UTC by kelvine
Modified:	2006-05-31 14:29 UTC (History)
CC List:	1 user

Issue Type:	DEFECT

Attachments
Simple text file with data. Header has extra blanks after field name (14 bytes, text/plain) 2003-06-02 10:10 UTC, kelvine
Two blanks after headers (18 bytes, text/plain) 2003-07-10 16:15 UTC, kelvine
Missing and extra data (41 bytes, text/plain) 2003-07-10 16:16 UTC, kelvine
Screen dump showing duplicated data and now additional blank column (15.96 KB, image/png) 2003-12-29 02:24 UTC, kelvine

Description kelvine 2003-06-01 00:53:21 UTC

Hi,

I opened a text file which was defined as a data source.

To my surprise it had repeated data to the right: there were extra columns 
which repeated the values of previous columns.

Further investigation showed this as being a result of extra spaces in the 
header record and spaces were used as the field delimiter.

To reproduce what I mean:

Create a simple text file.
In the first record, type 'Name' (without the quotes); this will be the field 
name.
In the second record, type 'Kelvin'; this will be the data.
Set this up as a text data source with space as the field delimiter.
View the text data source in the data source window.
Everything works as expected.

Now go back and add extra spaces to the header record.  That is 'Name    '.
Now view the text data source again and you will find 'Kelvin' appears in 
additional columns.
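
The symptom can be modelled with a small reader that splits each record on every single space and, when a data record runs out of values, keeps reusing the last value it read. This is a hypothetical Python sketch written only to mimic the reported behaviour; the function name `parse_naive` is illustrative, and the actual OOo text driver code is not part of this issue.

```python
from typing import List, Optional, Tuple


def parse_naive(lines: List[str], sep: str = " ") -> Tuple[List[str], List[List[Optional[str]]]]:
    """Hypothetical reconstruction of the reported symptom, not OOo driver code."""
    # Each space ends a field, so 'Name' plus 4 trailing spaces gives 5 column names.
    header = lines[0].split(sep)
    rows = []
    for line in lines[1:]:
        values = line.split(sep)
        row: List[Optional[str]] = []
        last: Optional[str] = None
        for i in range(len(header)):
            if i < len(values):
                last = values[i]
            # The defect: a missing value reuses the last value that was read.
            row.append(last)
        rows.append(row)
    return header, rows


header, rows = parse_naive(["Name    ", "Kelvin"])
print(header)  # ['Name', '', '', '', '']
print(rows)    # [['Kelvin', 'Kelvin', 'Kelvin', 'Kelvin', 'Kelvin']]
```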

This was unexpected behaviour.  A space is often used as a field delimiter 
when there are multiple columns of dates, codes or numbers.

If spaces inadvertently appear in the header record, I don't see any problem 
with extra blank columns.

I do however see a problem with an earlier column being repeated as this makes 
it appear there is data in those columns when in fact there is no data.

Thanks

Kelvin

Comment 1 Frank Schönheit 2003-06-02 08:36:08 UTC

Kelvin, thanks for reporting this. For simplicity in reproducing this,
could you please attach a sample text file?
Additionally, can you please tell us the setup of your text data
source? Especially the settings on the "Text" tab page in the data
source administration dialog would be interesting.

Comment 2 kelvine 2003-06-02 10:10:15 UTC

Created attachment 6589
Simple text file with data. Header has extra blanks after field name

Comment 3 kelvine 2003-06-02 10:15:24 UTC

Hi,

I've added the sample file 'DataSource.txt' as requested for ease of 
duplication.

What is important about this file is the header record contains 
extra space characters.

To test this create a data source with the Database Type of Text.
Change the Field separator to {space}.  No other settings need to be 
changed.

Then view using the data source window.

Hope this helps.

Kelvin

Comment 4 Frank Schönheit 2003-06-02 10:37:56 UTC

thanks for the file, Kelvin.
confirming, targeting, and re-assigning.

We'll have to think about the implications when changing this.
Basically, it would mean that we accept a run of <n> whitespace characters
as a field separator, not only a single one, as we do currently. However,
this may raise the problem of discrepancies between the header line and
the data lines: two spaces in a data line could then mean either "a single
separator made up of 2 characters" or "separator, empty content, separator".
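
The ambiguity can be made concrete with a header and a data line that differ only in how the gap is read. The field names and values below are hypothetical sample data; the two readings correspond to splitting on each single space versus treating a run of whitespace as one separator (Python's `str.split()` with no argument).

```python
header = "Name Middle Age"
data = "Smith  42"  # two spaces: an empty Middle field, or just a wider gap?

# Reading 1: every single space is a separator, so "  " encloses an empty field.
strict = dict(zip(header.split(" "), data.split(" ")))
# Reading 2: a run of whitespace counts as one separator.
collapsed = dict(zip(header.split(), data.split()))

print(strict)     # {'Name': 'Smith', 'Middle': '', 'Age': '42'}
print(collapsed)  # {'Name': 'Smith', 'Middle': '42'}
```

Under the second reading the value 42 lands in the wrong column, which is the kind of discrepancy between header and data lines described above.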

Comment 5 kelvine 2003-06-02 11:26:49 UTC

Hi Frank,

I wouldn't change the handling of accepting one blank space as the 
field separator to <n> white spaces without further thought.

It may simply be that the data exists but the header text is a space, 
and IMHO that should be valid.

If there are extra blank spaces in the header then simply create the 
extra blank columns if there is no data in the data record.

The header record is the driving record. The header record currently 
determines the number of columns, and I believe this to be 
correct.

If there are extra columns in the detail records that do not exist 
in the header record, then these are currently ignored which IMHO is 
correct.

If there are extra spaces or valid field names in the header record 
and insufficient data in the detail records, then currently the last 
field value is copied to all remaining columns.

If there is insufficient data in the detail record the remaining 
columns should just be empty, null or blank depending on the data 
source.  The data should not be copied across to other columns.
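
The proposal above (the header record fixes the column count, missing values stay empty, surplus values are dropped) can be sketched as follows. This is an illustrative Python sketch, not OOo code: `parse_padded` and the sample records are hypothetical, and `None` stands in for whatever "empty, null or blank" means for a given data source.

```python
from typing import List, Optional, Tuple


def parse_padded(lines: List[str], sep: str = " ") -> Tuple[List[str], List[List[Optional[str]]]]:
    """Header-driven parsing: the header record fixes the number of columns."""
    header = lines[0].split(sep)
    width = len(header)
    rows: List[List[Optional[str]]] = []
    for line in lines[1:]:
        # Surplus values beyond the header's columns are ignored (truncated).
        values: List[Optional[str]] = list(line.split(sep)[:width])
        # Missing values become None instead of copies of the previous column.
        values.extend([None] * (width - len(values)))
        rows.append(values)
    return header, rows


header, rows = parse_padded(["Name Code Date", "Kelvin", "Frank X 2003 extra"])
print(rows)  # [['Kelvin', None, None], ['Frank', 'X', '2003']]
```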

The only issue as I have said, is the data in the columns should be 
set to a variation of blank for the given data source if no data 
actually exists.

I did not realise when I reported the problem that it also applies 
if there are more field names in the header record than there are 
values in the detail records.

To duplicate this just add additional field names to the header 
record but don't add any data to the detail records.

I hope this explains it a little further. 

As for your suggestion of providing an option to treat <n> white 
spaces as a single delimiter, which is done in other modules, this 
could be a good additional option.

I have noticed people importing text files wanting to get rid of 
extra spaces between data items.  This would also be a valid 
approach to using the text file as a data source. So using the 
repeated delimiter may be a handy option.
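
Such a "repeated delimiter" option could look roughly like the flag below. This is a hypothetical Python sketch; `split_fields` and `merge_repeated` are illustrative names, not an OOo setting, and in this sketch leading and trailing separators are discarded along with the empty fields between consecutive separators.

```python
from typing import List


def split_fields(line: str, sep: str = " ", merge_repeated: bool = False) -> List[str]:
    """Split one record; optionally treat a run of separators as a single one."""
    fields = line.split(sep)
    if merge_repeated:
        # Dropping empty fields merges consecutive separators. In this sketch it
        # also discards empty fields caused by leading or trailing separators.
        fields = [f for f in fields if f != ""]
    return fields


print(split_fields("Name    "))                          # ['Name', '', '', '', '']
print(split_fields("Name    ", merge_repeated=True))     # ['Name']
print(split_fields("12  34   56", merge_repeated=True))  # ['12', '34', '56']
```

With the flag on, a genuinely empty field in the middle of a record can no longer be expressed, which is why it fits better as an opt-in than as the default behaviour.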

Kelvin

Comment 6 ocke.janssen 2003-07-02 14:40:02 UTC

Fixed in CWS oj4. Now every column, even if there is no content between
the separators, gets a (default) name. If the empty columns need to be
filtered out, the user has to define a query which
doesn't contain these columns.
The value from the last "valid" column will no longer be duplicated.
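
The behaviour described in this fix (every column gets a name even when its header field is empty, and the last valid value is no longer repeated) might look roughly like the sketch below. The default naming scheme `Column<n>` is purely an assumption for illustration: the comment only says such columns get a default name, not what it is. `read_text_source` is likewise a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


def read_text_source(
    lines: List[str], sep: str = " ", default_prefix: str = "Column"
) -> Tuple[List[str], List[Dict[str, Optional[str]]]]:
    """Name every column, including empty header fields, and never copy values."""
    raw = lines[0].split(sep)
    # Hypothetical default name: prefix plus the 1-based column position.
    names = [f if f else f"{default_prefix}{i}" for i, f in enumerate(raw, start=1)]
    rows = []
    for line in lines[1:]:
        values = line.split(sep)
        # A missing value becomes None rather than the previous column's value.
        rows.append(
            {name: (values[i] if i < len(values) else None) for i, name in enumerate(names)}
        )
    return names, rows


names, rows = read_text_source(["Name    ", "Kelvin"])
print(names)  # ['Name', 'Column2', 'Column3', 'Column4', 'Column5']
print(rows[0]["Name"], rows[0]["Column2"])  # Kelvin None
```

Hiding the empty columns is then left to a query that selects only the wanted ones, as the comment suggests.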

Best regards,

Ocke

Comment 7 kelvine 2003-07-10 16:14:35 UTC

Hi,

I was just checking issues which I have raised. I'm not certain if 
CWS oj4 means this is fixed in OO1.1BetaRC, but given it is still 
not 100% I thought I would open the issue again.

I performed two tests.

Test 1: - Two extra blanks after field heading in first row.

Expected result: There should be one extra column produced, as one 
space is the separator and the other is the non-specified field.

Actual result: Two extra columns produced which is unexpected.  Test 
file attached.

Test 2: - Missing data in records

Expected result: No data should appear in the columns when data is 
missing. (Alternatively an error could be raised that there is 
insufficient data. However this will not be known until all records 
are read and is thus probably not a viable alternative.)

Actual result: Data from last non empty column is repeated in 
further columns.  

Note: In this test there is extra data in the last record.  Given 
the first record will normally determine the number of columns, I 
feel it is quite valid if there are insufficient header fields for 
excess data to be truncated.  Just my opinion.

Please feel free to close this issue again if this has been fixed 
after OO1.1BetaRC

Please also note again that I do not consider these to be major 
bugs.  If the 

Hope this helps

Kelvin

Comment 8 kelvine 2003-07-10 16:15:32 UTC

Created attachment 7536
Two blanks after headers

Comment 9 kelvine 2003-07-10 16:16:21 UTC

Created attachment 7537 [details]
Missing and extra data

Comment 10 ocke.janssen 2003-07-11 06:50:29 UTC

Hi Kelvin,

thanks for the additional test cases. 
As you might not have seen, the target milestone for this issue is set
to OOo 2.0, which means that this one will be fixed after OOo 1.1RC.
CWS just means Child Work Space; in this workspace our QA tests all
included fixes, so that the main trunk will not be trashed :-)

Best regards,

Ocke

Comment 11 ocke.janssen 2003-08-25 10:57:34 UTC

Send to QA

Comment 12 marc.neumann 2003-09-03 12:12:16 UTC

set to fixed

Comment 13 marc.neumann 2003-09-03 12:12:36 UTC

set to fixed

Comment 14 marc.neumann 2003-09-03 12:12:58 UTC

verified in OJ4

Comment 15 christoph.lukasiak 2003-10-07 11:26:08 UTC

works in oo1.1

Comment 16 kelvine 2003-12-29 02:22:12 UTC

Hi,

I wouldn't consider this fixed.

Please see attached screen dump.

The field separator in this example is a space as mentioned in the earlier 
comments. I simply downloaded the file I first submitted. Set up the 
datasource using space as the field separator.

Still looks like repeated fields to me. It also looks like an extra blank 
column as well so I think things have gotten worse.

This is not a major issue and it is tricky to work out all the combinations. A 
space is not usually used as a field separator in my experience. It would also 
be unusual for the heading record to contain extra spaces.

Anyway, I hope the feedback is of use.

Kelvin

Comment 17 kelvine 2003-12-29 02:24:38 UTC

Created attachment 12156
Screen dump showing duplicated data and now additional blank column

Comment 18 Frank Schönheit 2004-01-05 09:13:12 UTC

The bug was fixed on the SRC680 branch only (as also indicated by the target),
though I don't know how clu's comment "works in oo1.1" came in then. I suppose
it was a mistake.
Kelvin, which version did you try? A recent developer snapshot, or OOo 1.1?

Comment 19 kelvine 2004-01-05 09:41:02 UTC

Hi,

My test was done in OOo1.1.0.

I was testing it again as a result of clu's "works in OO1.1" comment.

No problem if this is being fixed for OOo2.0. I'll test it again then:-)

Kelvin

Comment 20 christoph.lukasiak 2004-01-05 10:11:29 UTC

CLU: sorry, kelvin - this confusion was my mistake, as fs wrote - the fix
will hopefully come out with the next available version - thx for reopening this issue

CLU->protocol: fix was checked in main trunk (680 - internal version) after
integrating the cws

Comment 21 hans_werner67 2004-02-02 12:17:57 UTC

change subcomponent to 'none'

Comment 22 christoph.lukasiak 2004-03-23 10:32:46 UTC

fixed in oj4

Comment 23 christoph.lukasiak 2004-03-23 10:34:29 UTC

oj4 was integrated in 680m4; OOo developer version 680m30 is available, so the
fix is out