Looks like a good software for plagiarism

To be clear, I support that. For me it is fair to say what the source is, but it is not necessary. All of us owe a debt to the history of humankind and to an enormous number of humans, apes, mammals… who did something before us and thanks to whose work we are able, for example, to eat or to use the Internet. The only reason I do not release my works into the public domain is that I like the viral behavior of the GPL, GFDL and CC-BY-SA. And I certainly would not ask anyone to delete my text (changed or not) that was copied or modified outside such terms, provided it is done by some (not so bad 😉) person, group or small business.

I am sure that I am one of many bloggers who started a blog and who very often look at the statistics. From time to time I am frustrated because a lot of views come from search engine terms like “slimvirgin mi5”, and because of that I have not yet finished the whole investigation (even though I have had a conclusion for a long time). I don’t want to give material to the sensationalist press.

But this text is not about that…

WordPress has a feature called “Incoming Links”. I am not sure what the difference is between this and “Referrers”, but whatever. Until yesterday I had only one “incoming link” (from ivanlanin.wordpress.com). Today I saw one more.

Please compare my text and this one. At first glance I was very confused. It looks like my text, but it isn’t. My first thought (while I was just skimming the text) was that it was based on my text because the author thinks much as I do. But when I read it more carefully, it became obvious to me that it is my text run through some program which changes texts a little, but not a lot (while producing some newly generated nonsense, of course).

This event shows that it is quite possible to make a program which changes a text enough that it is no longer possible to say whose text it is. I compared the two texts with my tool for measuring how close two texts are. This (simple) tool could not conclude that “it is possible that the same person wrote the article” (which is the best it is able to do), only that “it is possible that it is written in the same language and that it is about the same things”. (If you have a tool for comparing two articles, I would like to hear your results!)
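For readers who want to try such a comparison themselves, here is a minimal sketch in Python. It is not the tool mentioned above, whose method is not described here. It assumes two common measures, the cosine similarity of word counts and the overlap of three-word sequences. The function names and the two thresholds are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity of the two word-count vectors (0.0 to 1.0)."""
    ca, cb = Counter(words(a)), Counter(words(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shingle_overlap(a, b, n=3):
    """Jaccard overlap of the sets of n-word sequences in each text."""
    def shingles(text):
        w = words(text)
        return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}

    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def verdict(a, b, same_author=0.5, same_topic=0.3):
    """Map the two scores to a rough verdict; both thresholds are arbitrary."""
    if shingle_overlap(a, b) >= same_author:
        return "it is possible that the same person wrote the article"
    if cosine_similarity(a, b) >= same_topic:
        return "same language and about the same things"
    return "no obvious relation"


original = "I will bring the document and cover it well"
altered = "I ordain carry the enter and course it come up"
print(verdict(original, altered))
```

Word swapping breaks every three-word sequence, so the phrase overlap drops to zero, while the shared small words keep the word-count similarity fairly high. That is one plausible reason a simple tool would report only "same language, same things" for a text altered this way.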

I have to say that I am very happy to see that programs which change articles a little have started to exist. (Of course, this is just the beginning of the road and this tool is far from perfect, because I am almost sure that I would be able to prove that this article is a modified version of my article.) This means that knowledge will be able to spread in better and more intelligent ways.

BTW, the guy who did it followed the terms of CC-BY-SA almost completely: he gave a link back.


~ by millosh on September 6, 2007.

6 Responses to “Looks like a good software for plagiarism”

  1. Looks like you have been splogged.

  2. 🙂 Look at the site. It *is* obvious that the intention is to get a better Google rank, but it is not a classic type of splog. I have been splogged with other, more obvious methods, but this one is quite interesting.

    But the point is not what was done to me, but what the method is. The rudiments of some future intelligent text changing are very clear.

    And, of course, I didn’t expect this kind of achievement to come from some big publishing company 😉

  3. Copying an article from a random wiki and slightly distorting it so that Google won’t notice seems like a rather obvious method to me.

    I would guess they machine-translated a few words to some language and back; sometimes these are replaced with more or less synonyms (bring – carry, will – ordain, like – desire), others don’t make sense in English, but would make sense as translation artifacts (well – come up, cover – course, document – enter). At any rate, I would not call it intelligent.

  4. Yes, you are right about the use of a translation engine. I realize that now.

    However, I am much more excited about the fact that people have started to do such things than about the not so perfect software. The methods needed to make better software for this purpose are known to me, which means that they are not so obscure.

    But I will not be the person who makes such software. This will surely be done by sploggers (or other kinds of spammers). And for the first time I see their (the spammers’) contribution to the community, no matter whether it is intentional or not.

    It will not be directly useful for Wikipedia, but it may have big consequences for the accessibility of knowledge. In its developed form, one splogger may make a very relevant site about some particular field, and it may be used instead of some inaccessible papers.

    Also, this guy is doing a good job. I would like to find his site via a search engine (instead of one without any useful information, as a lot of corporate sites are) because I would be able to read the original document, too.

  5. Emacs has had for years:
    dissociated-press is an interactive autoloaded Lisp function in `dissociate’.
    (dissociated-press &optional ARG)

    Dissociate the text of the current buffer.

  6. I have now seen what that is about (Dissociated-Press; the Wikipedia article is a copy of Raymond’s article) and I tested it.

    I was not thinking of software which makes “potentially humorous garbage”, but of software which produces a useful text that statistically may not be connected with the original text.
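The word swapping guessed at in comment 3 can be imitated with a few lines of Python. This is only a sketch. The site in question probably used a machine translation round trip, while the code below uses a fixed lookup table built from the word pairs quoted in that comment. The direction of each pair is an assumption, and the names `SWAPS` and `distort` are made up for this example.

```python
import re

# Word pairs quoted in comment 3; which side is the original is assumed.
SWAPS = {
    "bring": "carry",
    "will": "ordain",
    "like": "desire",
    "well": "come up",
    "cover": "course",
    "document": "enter",
}


def distort(text, swaps=SWAPS):
    """Replace each whole word found in `swaps`, keeping a leading capital."""
    def replace(match):
        word = match.group(0)
        new = swaps.get(word.lower())
        if new is None:
            return word  # word is not in the table, leave it alone
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", replace, text)


print(distort("I will bring the document."))
```

A table like this changes single words without looking at context, which is why the result is closer to the “newly generated nonsense” from the post than to intelligent rewriting.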
