
Nationality Field

Posted: Sun Mar 27, 2011 11:17 am
by gardano
I'm parsing all nationalities in order to allow filtering by nationality, and it seems to me that this field could use some clean-up (unless I'm not understanding the current uses of this field). Here's what I get (full list at the end of this post):

As you can see, in many instances a nationality appears in several forms:
* Nationality
* Nationality + 'people'
* Nationality [pluralized]

It certainly would make my life easier if each nationality were presented as either the proper name of the country/region ("Germany") or the name as adjective ("German").

Comments?

'Albanian'
'Algerians'
'American people'
'American'
'Americans'
'Argentinian'
'Argentinians'
'Armenians'
'Australian people'
'Australian'
'Australians'
'Austrian'
'Austrians'
'Basques'
'Belgian'
'Belgians'
'Bosnian'
'Brazilian'
'Brazilians'
'British'
'Britishs'
'Bulgarians'
'Canadian'
'Canadians'
'Catalan'
'Catalans'
'Chilean'
'Chileans'
'Chineses'
'Colombians'
'Croatian'
'Croatians'
'Cubans'
'Czech people'
'Czech'
'Czechs'
'Danish people'
'Danish'
'Danishs'
'Dutch people'
'Dutch'
'Dutchs'
'Dutchs'
'Ecuadorian'
'Egyptians'
'English people'
'English'
'Englishs '
'Englishs'
'Estonians'
'Filipinos'
'Finnish'
'Finnishs'
'French People'
'French people'
'French'
'Frenchs'
'Georgians'
'German people'
'German'
'Germans'
'Germans'
'Greeks'
'Guatemalan'
'Haitians'
'Hawaiians'
'Hungarian people'
'Hungarian'
'Hungarians'
'Icelandics'
'Indian'
'Indians'
'Iranian'
'Irish'
'Irishs'
'Israelis'
'Italian people'
'Italian'
'Italians'
'Japanese'
'Japaneses'
'Latvian'
'Latvians'
'Liechtensteinian'
'Lithuanian'
'Lithuanians'
'Malteses'
'Mexican'
'Mexicans'
'Monegasques'
'Norwegian'
'Norwegians'
'Paraguayans'
'Peruvians'
'Polish people'
'Polish'
'Polishs'
'Portuguese'
'Portugueses'
'Puerto Ricans'
'Romanian'
'Romanians'
'Russian'
'Russians'
'Scottish'
'Scottishs'
'Serbian'
'Serbians'
'Slovakians'
'Slovenian'
'Slovenians'
'Soviet'
'Soviets'
'Spanish'
'Spanishs'
'Swedish'
'Swedishs'
'Swiss people'
'Swiss'
'Swisss'
'Turkishs'
'Ukrainian'
'Ukrainians'
'Uruguayans'
'Venezuelan'
'Venezuelans'
'Welsh'
'Welshs'

Re: Nationality Field

Posted: Sun Mar 27, 2011 4:22 pm
by KGill
That's very strange - I've never seen anything like many of the examples you gave. There should be one of three forms given: the old one ('Austrian composers'), the new correct one ('Austrian'), and an alternate form that appears on only a few pages ('Austrian people'). There shouldn't be any pages whatsoever with (to continue the example) 'Austrians' or 'Austria'. Could you give a few specific examples of composer pages with one of those forms?

Re: Nationality Field

Posted: Sun Mar 27, 2011 4:34 pm
by gardano
KGill wrote: That's very strange - I've never seen anything like many of the examples you gave. There should be one of three forms given: the old one ('Austrian composers'), the new correct one ('Austrian'), and an alternate form that appears on only a few pages ('Austrian people'). There shouldn't be any pages whatsoever with (to continue the example) 'Austrians' or 'Austria'. Could you give a few specific examples of composer pages with one of those forms?
I'll look. But I'm just looking at my parser's output -- what it's gotten when importing the data. As I do searches on the data, I'm not getting hits back for, say, "Frenchs", or "Germans". As you can imagine, doing a full import is a heavy and expensive operation. Next time I do so, I'll see where those items are coming from. Seems they are appearing in a nationality field somewhere, but I have yet to discover where.

I'll let you know.

Re: Nationality Field

Posted: Sun Mar 27, 2011 5:27 pm
by gardano
I see what the problem is. I had a bug in my parsing code.

The list actually looks like this:

'Albanian'
'Algerian composers'
'American composers'
'American people'
'American'
'Argentinian composers'
'Argentinian'
'Armenian composers'
'Australian composers'
'Australian people'
'Australian'
'Austrian composers'
'Austrian'
'Basque composers'
'Belgian composers'
'Belgian'
'Bosnian'
'Brazilian composers'
'Brazilian'
'British composers'
'British'
'Bulgarian composers'
'Canadian composers'
'Canadian'
'Catalan composers'
'Catalan'
'Chilean composers'
'Chilean'
'Chinese composers'
'Colombian composers'
'Croatian composers'
'Croatian'
'Cuban composers'
'Czech composers'
'Czech people'
'Czech'
'Danish composers'
'Danish people'
'Danish'
'Dutch composers'
'Dutch people'
'Dutch'
'Ecuadorian'
'Egyptian composers'
'English composers'
'English people'
'English'
'Estonian composers'
'Filipino composers'
'Finnish composers'
'Finnish'
'French composer'
'French composers'
'French People'
'French people'
'French'
'Georgian composers'
'German composers'
'German people'
'German'
'Greek composers'
'Guatemalan'
'Haitian composers'
'Hawaiian composers'
'Hungarian composers'
'Hungarian people'
'Hungarian'
'Icelandic composers'
'Indian composers'
'Indian'
'Iranian'
'Irish composers'
'Irish'
'Israeli composers'
'Italian composers'
'Italian people'
'Italian'
'Japanese composers'
'Japanese'
'Latvian composers'
'Latvian'
'Liechtensteinian'
'Lithuanian composers'
'Lithuanian'
'Maltese composers'
'Mexican composers'
'Mexican'
'Monegasque composers'
'Norwegian composers'
'Norwegian'
'Paraguayan composers'
'Peruvian composers'
'Polish composers'
'Polish people'
'Polish'
'Portuguese composers'
'Portuguese'
'Puerto Rican composers'
'Romanian composers'
'Romanian'
'Russian composers'
'Russian'
'Scottish composers'
'Scottish'
'Serbian composers'
'Serbian'
'Slovakian composers'
'Slovenian composers'
'Slovenian'
'Soviet composers'
'Soviet'
'Spanish composers'
'Spanish'
'Swedish composers'
'Swedish'
'Swiss composers'
'Swiss people'
'Swiss'
'Turkish composers'
'Ukrainian composers'
'Ukrainian'
'Uruguayan composers'
'Venezuelan composers'
'Venezuelan'
'Welsh composers'
'Welsh'

Re: Nationality Field

Posted: Sun Mar 27, 2011 11:08 pm
by pml
Hi Gardano,

The Nationality field has changed gradually over time as the IMSLP categories expanded from comprising only composers to their current state, which includes performers, writers, arrangers, etc. With over 5,500 such pages, it requires some effort to overhaul them for consistency. The currently preferred approach is that the field should contain only the adjectival form, e.g.

|Nationality=Italian

but given the way the fte template is implemented, any subsequent words such as “composers” are stripped away when the nationality is added to the appropriate category, e.g. Category:Italian people.

Cheers, PML

Re: Nationality Field

Posted: Mon Mar 28, 2011 1:19 am
by gardano
pml wrote: but given the way the fte template is implemented, any subsequent words such as “composers” are stripped away when the nationality is added to the appropriate category, e.g. Category:Italian people.

Cheers, PML
OK, thanks for the answer. So, to be sure I understand you: when someone now adds the category "Category:Italian people", is ' people' stripped away too, or does the implementation only look for "composer[s]"?

Thanks,
Gardano

Re: Nationality Field

Posted: Mon Mar 28, 2011 4:21 am
by pml
Hi Gardano,

Essentially, the fte template could be fed any of the following lines, and in each case the page would be added to Category:Italian people:

|Nationality=Italian composers
|Nationality=Italian people
|Nationality=Italian aardvarks
|Nationality=Italian
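
For anyone parsing these fields, the behaviour described above can be imitated with a few lines of Python. This is only a sketch of the observable result, not the template's actual code; the function name `category_from_first_word` is made up for illustration, and it assumes the field value is never empty.

```python
def category_from_first_word(nationality_value: str) -> str:
    """Hypothetical stand-in: keep only the first word of the field value.

    Assumes nationality_value contains at least one word.
    """
    first_word = nationality_value.split()[0]
    return f"Category:{first_word} people"


# The four field values listed above all land in the same category.
for value in ["Italian composers", "Italian people", "Italian aardvarks", "Italian"]:
    print(f"|Nationality={value} -> {category_from_first_word(value)}")
```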

Cheers, Philip

Re: Nationality Field

Posted: Mon Mar 28, 2011 2:33 pm
by gardano
Thanks for the answer.

One last question. So is it safe to say that the first word after the "|Nationality=" becomes the nationality field value? I'm asking because of nations like "New Zealand", which wouldn't fit into that assumption...

Re: Nationality Field

Posted: Mon Mar 28, 2011 7:40 pm
by KGill
Wow, it's a good thing you brought that up :o I only now noticed that two-word nationalities (e.g., Puerto Rican) are parsed incorrectly, i.e. they are put into a category based on just the first word. (In this example, there are two victims of this.) I had previously assumed that the code simply removed the last word, as I recall it working correctly almost two months ago when the changes to the system were made. Feldmahler, would it be possible to fix this?
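
To make the difference concrete, here is a small Python sketch comparing the two rules mentioned in this post: keeping only the first word, and removing the last word. Both functions are hypothetical illustrations, not the site's code. Note that blindly removing the last word would only be right when a trailing word such as 'composers' is actually present.

```python
def keep_first_word(value: str) -> str:
    """Rule that mangles two-word nationalities (assumes a non-empty value)."""
    return value.split()[0]


def drop_last_word(value: str) -> str:
    """Rule that removes the final word, unless there is only one word."""
    words = value.split()
    return " ".join(words[:-1]) if len(words) > 1 else value


for value in ["Puerto Rican composers", "Puerto Rican", "Italian composers", "Italian"]:
    print(f"{value!r}: first word = {keep_first_word(value)!r}, "
          f"last word dropped = {drop_last_word(value)!r}")
```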

Re: Nationality Field

Posted: Mon Mar 28, 2011 7:48 pm
by gardano
Oh, I like that solution best of all -- just strip out the last word, rather than any other involved logic. And I'm glad our discussion aired issues that need fixing. Fixing is a good thing! :¬)

Re: Nationality Field

Posted: Tue Mar 29, 2011 12:13 am
by KGill
OK, Feldmahler has happily come up with a fix for this - all the nationality categories should now work correctly. The caveat is that one now cannot insert anything other than 'composers' after the nationality (preferably, of course, there shouldn't be anything after it), or else it will break. I've removed all the 'people' that appear in the field across the site, so as of right now everything should be fixed.

Re: Nationality Field

Posted: Tue Mar 29, 2011 1:07 am
by pml
So ignore my second posting in this thread – it is now out-of-date after Feldmahler's change – but my original reply stands! PML

Re: Nationality Field

Posted: Tue Mar 29, 2011 1:18 am
by gardano
Awesome!

It'd be much easier to just look for the string " composers" rather than composers+people, etc, or to do error-prone (on my part) string splitting.
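
Roughly what I have in mind on the parsing side, as a minimal Python sketch. The names `normalize_nationality` and `raw_values` are just illustrative, and it assumes the only suffix left in the data is the plural " composers", as described above.

```python
SUFFIX = " composers"  # assumption: the only trailing text still allowed in the field


def normalize_nationality(raw: str) -> str:
    """Trim whitespace and remove a trailing ' composers', if present."""
    value = raw.strip()
    if value.endswith(SUFFIX):
        value = value[: -len(SUFFIX)]
    return value


# A few sample values taken from the corrected list earlier in the thread.
raw_values = ["French composers", "French", "Puerto Rican composers",
              "Italian", "Italian composers"]
unique_nationalities = sorted({normalize_nationality(v) for v in raw_values})
print(unique_nationalities)  # ['French', 'Italian', 'Puerto Rican']
```

No splitting on spaces is involved, so two-word nationalities come through intact.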

Thanks folks for your really fine quick work!