Australia’s ‘sanitised’ history powering AI


Joseph Brookes
Senior Reporter

Wikipedia pages for Australian cities, national parks and landmarks present a “sanitised” neo-colonial image of the nation, according to new research, raising concerns that artificial intelligence is amplifying the view and insulating it from scrutiny.

The UTS research to be released on Tuesday reveals how Wikipedia’s Australian place entries have become a battleground of ideologies, while the articles and volunteer editing efforts have skewed towards a “white settler perspective” and highly populated areas.

The researchers found instances of “ideologically motivated” editors removing Indigenous place names, while Indigenous history was often reflected with a passive voice or omitted altogether from entries.

New research shows Wikipedia and the choices its volunteer editors make can lead to absences, omissions and sanitised views in articles about Australia. Image: UTS/Rosa Alice

The first-of-its-kind UTS research is based on analysis of 35,000 entries and some of the people responsible for more than half a million edits to them. It lays out key weaknesses in the 23-year-old online encyclopedia, which anyone can write and edit.
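
As an illustration of the kind of public data such a study draws on, the sketch below pulls the edit history of a single Australian place article from Wikipedia’s public MediaWiki Action API. It is a minimal example only, assuming Python with the requests library; the article title and fields requested are illustrative and not the researchers’ actual pipeline.

```python
# Illustrative sketch: fetch the public edit history of one Wikipedia article
# via the MediaWiki Action API. The title below is an example, not the
# study's dataset.
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_revisions(title, limit=500):
    """Return revision metadata (user, timestamp, comment, sha1) for one article."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment|sha1",
        "rvdir": "newer",            # oldest revisions first
        "rvlimit": min(limit, 500),  # per-request cap for anonymous clients
    }
    revisions = []
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        page = data["query"]["pages"][0]
        revisions.extend(page.get("revisions", []))
        if "continue" not in data or len(revisions) >= limit:
            break
        params.update(data["continue"])  # follow API pagination
    return revisions[:limit]

revs = fetch_revisions("Katoomba, New South Wales")
print(len(revs), "edits by", len({r.get("user") for r in revs}), "editors")
```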

It shows that the community-driven approach, which allows Wikipedia to eschew commercial models and influence, still produces absences, omissions and sanitised views in articles about Australia’s places.

According to the ARC-funded study, the entries are mostly organised around the cities, towns and administrative divisions founded by European settlers. Ecological and First Nations perspectives must “fight or negotiate to find room within this nationalist European structure”.

Some of the mostly volunteer Wikipedia editors are pushing for greater inclusion of marginalised aspects of Australian places. But others are pushing back based on their own ideologies, while some of the most experienced editors steer away from the issues altogether to avoid conflict.

“What you find is that a lot of reasonable editors are not editing articles about place when they’re potentially contentious, and they’re not trying to add the Indigenous place names, for example,” lead researcher Associate Professor of Digital and Social Media Heather Ford told InnovationAus.com.

“The articles as they stand, they end up being quite sanitised.”

Some of the most contentious topics include whether horses in Barmah National Park should be described as “wild” or “feral”, while almost half the edits to the Howard Springs entry in the Northern Territory have been reverted because of attempts to label its COVID-19 quarantine facility a “concentration camp”.
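
For context on how a revert figure like that might be estimated from raw edit history, the rough heuristic below counts an edit as a revert when it restores content whose hash has already appeared in the history, or when its edit summary uses typical revert wording. This is an illustrative assumption of this article, not the study’s published method; the field names match MediaWiki revision metadata such as the output of the fetch_revisions sketch above.

```python
# Heuristic revert detection over MediaWiki revision metadata.
# An edit counts as a revert if its SHA-1 matches an earlier revision's
# content hash, or if the edit summary contains common revert wording.
REVERT_WORDS = ("revert", "rv ", "undid", "undo")

def estimate_revert_rate(revisions):
    """Fraction of edits that restore earlier content or are labelled as reverts.

    `revisions` is a list of dicts with 'sha1' and 'comment' keys,
    ordered oldest to newest.
    """
    seen_hashes = set()
    reverts = 0
    for rev in revisions:
        sha1 = rev.get("sha1")
        comment = (rev.get("comment") or "").lower()
        if sha1 in seen_hashes or any(word in comment for word in REVERT_WORDS):
            reverts += 1
        if sha1:
            seen_hashes.add(sha1)
    return reverts / len(revisions) if revisions else 0.0

# Toy example: the third edit restores the first version's content.
history = [
    {"sha1": "aaa", "comment": "add Indigenous place name"},
    {"sha1": "bbb", "comment": "change wording of opening section"},
    {"sha1": "aaa", "comment": "restore previous wording"},
]
print(estimate_revert_rate(history))  # prints 0.333...
```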

Even a seemingly non-contentious entry like Katoomba’s, which has barely any reverts among almost 700 edits, omits crucial facts.

The Katoomba entry does not mention the forcible removal of the Darug and Gundungurra peoples in 1957, despite noting the town’s official declaration as an Aboriginal Place.

Associate Professor Ford says the incomplete or inaccurate place entries in Wikipedia have consequences for shared knowledge and for the growing set of digital tools that rely on its public data, such as knowledge graphs and artificial intelligence.

Wikipedia is a foundational source for large language models, ranking as the second-most-used website in Google’s AI training data set. All of Wikipedia’s English-language pages feed into OpenAI’s GPT-3.

Wikipedia entries are also frequently returned in Google search results or in responses from voice assistants like Amazon’s Alexa or Apple’s Siri.

“What Wikipedia presents is often mirrored in these kind of secondary sources that are taught using these massive data sets. Wikipedia is pretty unique in that sense,” Associate Professor Ford said.

The growing reliance on these tools, which are concentrated in the hands of a handful of mostly US companies and often obscure or misrepresent their sources, is a “huge concern” for Associate Professor Ford, who is now researching the relationship between emerging technology and knowledge.

“When we teach our students, we say, ‘this [answer] is just the beginning’. Then you have to go and interrogate the sources,” she tells InnovationAus.com.

“And we don’t even have the tools for doing that anymore because if we wanted to look for the sources, we can’t even do that because they’re just not part of the ways in which these tools are being presented.

“The end result has been a growing reliance on these tools, and this idea that somehow they are omniscient. And they really aren’t. They’re filled with a lot of errors and biased data sets. We should, really, should not be relying on them.”

Associate Professor Ford said Australia must develop its own large language models to ensure they reflect national priorities and culture.

“We really can’t rely on these [existing] applications that are really subjectively developed by such a small handful of people, usually in the US and elsewhere.”

Other experts have also backed the development of sovereign AI systems, with one estimating it would require around $100 million in compute costs. One proposal would fund the sovereign systems through a levy on the foreign firms that dominate the generative AI market, in large part by drawing on public data.

Do you know more? Contact James Riley via Email.
