data-monkey · 5 years
It’s not Yuletide, interestingly enough.  I thought that would be a big effect but it actually isn’t, at least with the way I’m counting fandoms.
Short version: I wanted to make sure I got all the stuff in the big fandoms like Marvel, so I put every fandom in the highest metafandom tag it was part of.  So, for example, fic for the Justice League movie goes into the DCU bucket, but so does stuff for Batgirl, which was a Yuletide fandom. More importantly for Yuletide, I think, that means that, like, all the rare historical RPF gets dumped into the general RPF bucket, and fics for individual books may get dumped into an author’s works tag. This is a great way to understand big fandoms, but a terrible way to understand tiny fandoms.
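Here is a rough sketch of that bucketing step. It assumes each fandom has at most one parent metatag, which is a simplification, since a fandom on the Archive can sit under more than one. The `parent_of` mapping and the tag names in it are made up for illustration, and this is not the code from the notebook.

```python
def top_level_fandom(fandom, parent_of):
    """Follow parent links until reaching a fandom with no recorded parent."""
    seen = set()  # guards against accidental cycles in the parent links
    while fandom in parent_of and fandom not in seen:
        seen.add(fandom)
        fandom = parent_of[fandom]
    return fandom


# Hypothetical parent links, for illustration only.
parent_of = {
    "Justice League (2017)": "DCU",
    "Batgirl (Comics)": "Batman - All Media Types",
    "Batman - All Media Types": "DCU",
}

# Every fandom lands in the bucket of its highest metafandom.
buckets = {name: top_level_fandom(name, parent_of) for name in parent_of}
```

With links like these, a small fandom disappears into its parent bucket, which is why this approach is hard on tiny fandoms.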
Once I’ve done that bucketizing, only 20 of the 5000* single-work fandoms are due to Yuletide.  I went spot-checking through those 5000 fandoms and I saw vids and giant multi-fandom crossovers in addition to regular fics in rare fandoms.  I checked about 15 things and didn’t see any that were for specific challenges, so the rate of single-work top-level fandoms that are due to challenges is probably less than ~25%--not nearly enough to explain why the distribution looks like it does.
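For a sense of how much a spot check like that can say, here is a minimal sketch of the upper bound you get from seeing zero hits in a sample. It assumes the items checked were a random sample, which an informal spot check may not be, and the 95% confidence level is an arbitrary choice. The function name is made up for this example.

```python
def upper_bound_zero_hits(n_checked, confidence=0.95):
    """Largest true rate still consistent with seeing 0 hits in n_checked random draws.

    If the true rate is p, the chance of seeing no hits is (1 - p) ** n_checked.
    Setting that equal to 1 - confidence and solving for p gives the bound.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_checked)


bound = upper_bound_zero_hits(15)
print(f"{bound:.1%}")
```

For 15 checks this comes out a bit under 20%, which is in line with the rough "less than ~25%" figure.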
If you check out the Yuletide fandoms page, 2865 of the 7325 fandoms have only a single work in the Yuletide collection. But if I spot-check those fandoms, most of them have a number of non-Yuletide works as well. 
I do think Yuletide is great at promoting rare fandoms, to be clear! It’s just that you can’t see it very well with the way this particular project handled fandom tags.  (And as I say in a later section of the original post, I think it’s especially good at steering readers to those fics.)
*I messed up the vertical axis on the original plot. It’s right relatively speaking--something twice as big is still twice as big--but the absolute scale is wrong; there are about 5,000 fandoms with only 1 work, not 14,000. Working on a fix.
AO3 stats project: fandoms
Next up: a more detailed analysis of fandoms, the engine of fan works everywhere.
The Data | Basic Questions | Fandoms | Tags | Correlations | Kudos | Fun Stuff
Thanks to @eloiserummaging for beta reading these posts; any remaining errors are my own.  A Python notebook showing the code I used to make these plots can be found here.
What are the top fandoms on the AO3?
Tumblr media
I pulled this data directly from the Archive fandoms pages in mid-March, just to make sure I was comparing work counts on the same day. And, as it happens, I checked about 3 days after BTS pipped Star Wars to become the 10th-biggest fandom on AO3! You may note that there’s significant overlap between some of these fandoms–K-pop and BTS, Marvel and Avengers–but they are classified as different fandoms so I’m preserving that here. (In a technical sense, while there’s significant overlap between Marvel and the Avengers, Marvel has some works Avengers doesn’t and vice versa.) Edit 4/25: in fact, I had a data processing failure and BTS should have been a subfandom of K-pop all along! I’m leaving the plots for now, but worth keeping in mind.
These fandoms aren’t of equal popularity over time:
Tumblr media
(The heights of the curves are relative within each fandom but not comparable between fandoms, by the way. The BTS work count is about ⅓ of the Marvel work count, for example, but its curve looks taller because a higher fraction of those works were posted in recent years. Basically, all the colored blocks have the same area, so the ones popular over a short time are also taller.)
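The equal-area effect is easy to reproduce with made-up numbers. This sketch scales each fandom's counts so they sum to 1. Both series are invented for illustration, and the real plots may normalize differently.

```python
def normalize_area(counts):
    """Scale a series of per-period counts so that it sums to 1."""
    total = sum(counts)
    return [c / total for c in counts]


# Invented counts per period: a big steady fandom and a smaller recent one.
steady = [100] * 10            # 1000 works spread evenly
recent = [0] * 8 + [150, 150]  # 300 works, all in the last two periods

steady_curve = normalize_area(steady)
recent_curve = normalize_area(recent)
```

The smaller fandom peaks at 0.5 while the bigger one never rises above 0.1, even though it has more than three times as many works.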
RPF and Supernatural are nearly-constant juggernauts, while Marvel rises and falls with movie releases, and K-pop has exploded in the last few years. You can also see release dates of Sherlock series reflected in the Sherlock Holmes tag, and Fantastic Beasts in the Harry Potter tag. (And in the old version of this where Star Wars was the 10th biggest fandom, you could REALLY see The Force Awakens.) Marvel has the biggest single day for any fandom–on Dec 24, 2015, there were (at least) 452 Marvel works posted! In fact, we can look at Marvel in more detail. Here’s Marvel posting rates over time, with the MCU movie release dates overplotted:
Tumblr media
Wow–guess we all hated Civil War, lol. In fact, that dip is so big that you can see it on the Archive-wide stats from the previous post–other fandoms had a small dip there, but nothing like Marvel, so it drives most of the decrease you see in mid-2016.
Here’s a fun comparison: the top 10 fandoms by number of works; by total number of hits on all works; and by median hit count per work, for fandoms with at least 1,000 works. Another way to think of this table is: most popular with creators; most popular with readers; and highest reader-to-creator ratio. For an apples-to-apples comparison, I’m using the number of works in my dataset and not the Archive counts, so this top-fandoms-by-works list is a little different from the plot above.
Tumblr media
The total works/total hits lists are not that different, though there’s some obvious order reshuffling. The top fandoms by median hit count list is really different, though, with only Teen Wolf on there from among the top fandoms by hits or number of works. I can think of two explanations for why those fandoms in particular: either they’ve got massively better fic than other fandoms (hard to know why that would be), or there’s a big unmet desire for fic in those fandoms. Maybe a place to write, if you’re looking for lots of approbation. :)
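The three rankings in that table can be sketched as follows. This assumes the data is a list of `(fandom, hits)` pairs, one per work, and that the 1,000-work minimum applies only to the median ranking. Both are assumptions, and the function name is made up for this example.

```python
from collections import defaultdict
from statistics import median


def rank_fandoms(works, min_works=1000, top_n=10):
    """Rank fandoms by work count, by total hits, and by median hits per work."""
    hits_by_fandom = defaultdict(list)
    for fandom, hits in works:
        hits_by_fandom[fandom].append(hits)

    def top(score, pool):
        return sorted(pool, key=score, reverse=True)[:top_n]

    all_fandoms = list(hits_by_fandom)
    # Assumption: only the median ranking is restricted to larger fandoms.
    big_enough = [f for f in all_fandoms if len(hits_by_fandom[f]) >= min_works]
    return {
        "by_works": top(lambda f: len(hits_by_fandom[f]), all_fandoms),
        "by_total_hits": top(lambda f: sum(hits_by_fandom[f]), all_fandoms),
        "by_median_hits": top(lambda f: median(hits_by_fandom[f]), big_enough),
    }


# Tiny invented dataset, with the minimum lowered so the example does something.
works = [("A", 10)] * 3 + [("B", 100)] * 2 + [("C", 500)]
rankings = rank_fandoms(works, min_works=2)
```

Even in this toy data the three lists disagree: A has the most works, C has the most total hits, and B has the highest median among fandoms with enough works to count.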
Do fandoms produce works of the same length?
Tumblr media
Kind of surprisingly: no. Those are big differences: the median BTS fic is 70% longer than the median Sherlock or Supernatural fic! Also note how very small these values are. 50% of all the works in Sherlock fandom are under 1705 words. You can also see that in the wordcount histograms in the last post, of course.
A couple of other questions: how many works are there in a typical fandom?
Tumblr media
The most common number is 1! That’s very surprising to me.
I was also curious about how per-work hit counts relate to the number of works in a fandom. Naively, I would think that having more works in a fandom would increase hit counts: a person who reads a fic about fandom X is likely to want to read more fics about fandom X, so you build a self-sustaining readership if there are lots of fics to choose from. Also, since work creators are a subset of work readers, in general, what writers choose to write in is probably a good proxy for what readers are interested in reading; more fics means more people interested means more readers.
Here’s the actual relationship between number of works and median hit count:
Tumblr media
It’s kind of noisy (meaning the points move around a lot), but for fandoms with more than ~5 works, we do see that more works means more hits. The increase actually stops around 1000 works, which I should have predicted above. (I’ve cut off the graph because it’s very noisy above 10,000 works, but the flattening continues.) Apparently, that’s about the point where you have more works in a fandom than even a devoted reader could read. If you have 10 works, or 20 works, then every possible reader can read everything, so more interest means more hits. But once you have more works than people can read, then, basically, adding readers and adding creators cancel each other out in the average hits per work.
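That saturation argument can be written down as a toy model. It's only a sketch: it assumes the number of readers grows in proportion to the number of works, and that each reader reads every work up to a fixed capacity. The parameter values are invented, with the capacity set to 1000 only to match where the plot flattens.

```python
def hits_per_work(n_works, readers_per_work=2.0, reading_capacity=1000):
    """Average hits per work in a toy model of a fandom.

    Readers scale with the number of works, and each reader reads
    everything available until they run out of capacity.
    """
    readers = readers_per_work * n_works
    works_read_each = min(n_works, reading_capacity)
    total_hits = readers * works_read_each
    return total_hits / n_works


for n in (10, 100, 1000, 5000):
    print(n, hits_per_work(n))
```

In this model hits per work rise with fandom size until the reading capacity is reached, then stay flat, because each added work brings new readers but also spreads the same reading time more thinly.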
Also kind of interesting is that things with <5 works seem to have more hits on average. I suspect this is because of Yuletide, which steers people to rare fandoms they might not read on their own.
Up next: tags.
data-monkey · 5 years
Yup, this plot is already doing that!
Technically, what it’s doing is picking some evenly distributed sample points along the horizontal axis, and then counting up nearby data points to figure out how many are near that particular point and how near they are to it, which makes the resulting graph smoother.  (It’s a kernel density estimation, to be technical about it.) And then it’s working in logarithmic (multiplicative) space on the x-axis, which means that as the values get bigger, what counts as “nearby” gets more expansive, too.  This graph has a point that is most dependent on the fandoms with a number of works between 8.8 and 9.8, then one for 9.8 to 10.9, then 10.9 to 12.1 (getting bigger)...by the time you get up to 2345, it’s actually adding up everything between roughly 2150 and 2400 works!
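For the curious, here is a minimal sketch of a kernel density estimate done in log space. The Gaussian kernel and the bandwidth value are assumptions, since the actual plot may use a different kernel or pick its bandwidth automatically. The function names and the sample data are made up for this example.

```python
import math


def log_spaced(start, stop, n):
    """n sample points evenly spaced in log10 between start and stop (n >= 2)."""
    step = (math.log10(stop) - math.log10(start)) / (n - 1)
    return [10 ** (math.log10(start) + i * step) for i in range(n)]


def log_kde(values, sample_points, bandwidth=0.05):
    """Gaussian kernel density estimate, computed on log10 of the values.

    Each data point contributes most to sample points that are close to it
    in log space, so "nearby" covers a wider range of raw values as the
    values get bigger. The result is a density per unit of log10(value).
    """
    logs = [math.log10(v) for v in values]
    norm = len(logs) * bandwidth * math.sqrt(2 * math.pi)
    density = []
    for x in sample_points:
        lx = math.log10(x)
        total = sum(math.exp(-0.5 * ((lx - l) / bandwidth) ** 2) for l in logs)
        density.append(total / norm)
    return density


# Invented works-per-fandom counts: lots of 1-work fandoms, a few with 10.
works_per_fandom = [1] * 50 + [10] * 5
points = log_spaced(1, 100, 3)  # roughly 1, 10, 100
density = log_kde(works_per_fandom, points)
```

Because the output is a density and not a raw count, turning the vertical axis back into "number of fandoms" takes an extra scaling step.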
Also, thanks for pointing this out, because it made me realize the vertical axis on these plots has the wrong scale because [insert boring technical explanation here].  There are only about 5,000 fandoms with only 1 work, rather than 14,000. I’ll have a think about how to fix this [insert boring explanation about why this is hard].
data-monkey · 5 years
Oh, interesting, thank you for the information!
AO3 stats project: fun stuff
RANDOM FUN QUESTIONS I can ask of the AO3 data, coming right up!
data-monkey · 5 years
That’s an artifact of how this was plotted, unfortunately.  When I make a plot like this, it’s actually binning up the data: showing the average over a period of, say, 4 weeks, rather than every day. It smooths things out and makes it easier to read.
When I was trying to get this plot to look good, every size of bin I tried ended up with what looks like a spike at the start of 2016.  Here’s what’s happening: there was the expected end-of-2015/beginning-of-2016 spike from all the winter break fests like Yuletide (you can see a little one a year later in 2017, too).  And then right after that, basically, was the big dip in the Marvel fandom that I discussed in a previous post, which dropped the posting rate for the whole AO3.  In the other plots like this, that spike and the dip after it were in the same bin, which averaged them out against each other.  In this one, no matter what I did, any bin size that made the rest of the plot look good split the bump and the dip into two adjacent bins, which makes the winter break bump look bigger.  You’re not the only one who noticed this, so in retrospect I should have probably just made the rest of the plot look worse to avoid confusion!
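Here is a small sketch of that binning artifact with invented numbers: a flat baseline, a two-week bump, and then a two-week dip of the same size. The daily counts, the bin size, and the `bin_means` helper are all made up for illustration.

```python
def bin_means(daily, bin_size, offset=0):
    """Average consecutive runs of bin_size days, starting at index offset.

    Any leftover days that do not fill a whole bin are dropped.
    """
    trimmed = daily[offset:]
    return [
        sum(trimmed[i:i + bin_size]) / bin_size
        for i in range(0, len(trimmed) - bin_size + 1, bin_size)
    ]


# Invented daily posting counts: baseline, bump, dip, baseline.
daily = [100] * 28 + [160] * 14 + [40] * 14 + [100] * 28

same_bin = bin_means(daily, 28)              # bump and dip share one bin
split_bins = bin_means(daily, 28, offset=14)  # bump and dip land in different bins
```

When the bump and the dip share a bin they cancel and the series looks flat at 100. When the bin edge falls between them, the same data shows a jump to 130 followed by a drop to 70.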
(Also, I feel like I need to apologize for putting this giant post on everybody’s timeline. I edited the post on mobile and it got rid of my “Read More” tag without my noticing. Sorry about that!)
AO3 stats project: tags
In this post, we’ll discuss tags on the Archive of Our Own! Please note: because the works on the Archive include explicit material, some of the tags discussed in this post may not be appropriate for your workplace.
The Data | Basic Questions | Fandoms | Tags | Correlations | Kudos | Fun Stuff
Thanks to @eloiserummaging for beta reading these posts; any remaining errors are my own.  A Python notebook showing the code I used to make these plots can be found here.
The Archive of Our Own has one of the best tagging systems around. You can read more about it here, here, here, or here. For our purposes, the important part is that users can tag their works however they want, and then a group of people called “tag wranglers” sort those tags, either adding them as synonyms of existing tags or creating new canonical versions for them. What I’ll be showing here is the “canonical” version of the tags. For example, a work tagged “flufffffff” or “so fluffy!” would have those two tags assigned to the canonical tag “Fluff”, so I will consider both of those tags as being “Fluff” to get the most accurate count.
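Counting by canonical tag can be sketched like this. The `canonical_of` mapping stands in for the wranglers' synonym table and is made up for this example. Counting a work only once when it carries two synonyms of the same canonical tag is an assumption about how the tally should work.

```python
from collections import Counter


def count_canonical(work_tags, canonical_of):
    """Count how many works use each canonical tag.

    Tags without an entry in canonical_of are treated as already canonical.
    """
    counts = Counter()
    for tags in work_tags:
        # A set, so two synonyms on one work count once for that work.
        canonical = {canonical_of.get(tag, tag) for tag in tags}
        counts.update(canonical)
    return counts


# Hypothetical synonym table and works, for illustration only.
canonical_of = {"flufffffff": "Fluff", "so fluffy!": "Fluff"}
work_tags = [["flufffffff", "so fluffy!"], ["Fluff", "Angst"]]
counts = count_canonical(work_tags, canonical_of)
```

In this example "Fluff" is counted for two works, even though no single spelling of it appears more than once.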
The other important thing about AO3 tags is that they come in four flavors. The first one is “warnings”, the content warnings required by the Archive (plus the default tag indicating you’re abstaining from the warnings system). The second flavor is “Characters”, tags describing the characters in the work. The third is “Relationships”, tags describing the platonic or romantic relationships depicted in the work–typically, “X/Y” indicates a romantic and/or sexual relationship between characters X and Y, while “X&Y” means a platonic relationship, although this usage isn’t universal and isn’t enforced. The final category is “freeform”, aka everything else.
Again, the tagging system is freeform and optional. In particular, I’ll note that “character” tags and “relationship” tags don’t necessarily imply each other: you can have a work tagged “Sherlock Holmes/John Watson” that only features Mycroft Holmes, or that features John and Sherlock but doesn’t tag them as characters, only as the relationship. So remember that–while it’s pretty good on average, because people tag their works so readers/viewers can find them–the number of uses of a character tag isn’t the same as the number of works that feature that character, for example.
Okay! So what are the most popular freeform tags on the Archive? If you read a lot of fanfiction, I doubt you will be surprised by anything on this list. Left column is the top 15 tags by number of uses, while right column is the top 15 tags by the cumulative hit count on every work tagged with that tag.
Tumblr media
Are these tags consistently popular over time? For reasons of space, I’ll just plot the top 10 by number of works:
Tumblr media
If you look back at the works vs time plot in the second post, you’ll see that yes, the shape of these trends is similar to the total number of works, so trends in fannish tastes haven’t changed much over the time the AO3 has been in existence. (These show a little more bumpiness because there are fewer works in each plot.) Some of these have gained a little more recent popularity vs earlier works–smut, fluff, and the two specific alternate universes are a little more weighted towards later times, while humor and general AUs are falling a little behind–but the differences aren’t as large as we saw for fandom trends in the previous post.
I’m sure you’re curious about characters and relationships. Here are the top character tags, omitting the catchall character tags of “Original Character(s)”, “Original Male Character(s)”, “Original Female Character(s)”, and “Reader” (all of which would otherwise appear in the top 15). Also, remember this is missing some of the data from 2018 and 2019, as described in the first post, so BTS characters should probably be higher:
Tumblr media
And here are the top relationship tags (again, excluding the catchall “Minor or Background Relationship(s)”):
Tumblr media
And in particular, here are the top characters of color (excluding works with fictionalized race/ethnicity power systems–um, more than modern-day Western society’s power systems are made up–and characters from Voltron Legendary Defender, since I wasn’t able to find enough information on them):
Park Jimin (BTS)
Min Yoongi | Suga
Jeon Jungkook
Kim Taehyung | V
Kim Namjoon | RM
Jung Hoseok | J-Hope
Kim Seokjin | Jin
Zayn Malik
Katsuki Yuuri
Sam Wilson (Marvel)
Magnus Bane
Nick Fury
Midoriya Izuku
Bakugou Katsuki
Erica Reyes
And here are the top relationships that are not M/M:
Evil Queen | Regina Mills/Emma Swan
Oliver Queen/Felicity Smoak
Bellamy Blake/Clarke Griffin
Clarke Griffin/Lexa
Clint Barton/Natasha Romanov
Pepper Potts/Tony Stark
Captain Hook | Killian Jones/Emma Swan
Kylo Ren/Rey
Hermione Granger/Ron Weasley
Kara Danvers/Lena Luthor
Sherlock Holmes/Molly Hooper
Belle/Rumplestiltskin | Mr. Gold
Allison Argent/Scott McCall
James Potter/Lily Evans Potter
Hermione Granger/Draco Malfoy
Here are the top ten freeform tags for the top ten fandoms. Different fandoms seem to produce different kinds of fanworks–which you’d expect, based on the variety in the source material.
Tumblr media
AU = alternate universe, AU - CD = alternate universe - canon divergence, AU - C/U = alternate universe - college/university, AU - HS = alternate universe - high school, BJs = blow jobs, ER = established relationship, H/C = hurt/comfort, PWP = plot what plot/porn without plot, RPF = real person fiction, SPN = Supernatural.
Finally, for fun, here’s the top 200 tags of all kinds, sorted against each other. You can find a lot of fun things on this list. Some of my favorites:
Supernatural is so big, and so focused on so few characters, that Dean Winchester is the sixth most popular tag on the entire AO3.
Clint Barton is way higher than I would have expected.
Sherlock Holmes is slightly less popular than anal sex.
Original female characters are more popular than anal sex.
Similarly, cuddling is more popular than A/B/O.
Harry Styles is less popular than 3/7ths of BTS (at least as of sometime in 2018); Louis Tomlinson barely tops Draco Malfoy.
Alcohol comes between Katsuki Yuuri and Viktor Nikiforov.
Leonard McCoy is below songfic. Please join me in picturing how pissed off he’d be.
The one-two punch of “Spanking” and “I’m Sorry” is pretty amusing.
If I had put up the top 201 tags, 200 and 201 would have been “Flirting” and “Murder”, so Hannibal is almost on this list.
Fluff
Angst
Alternate Universe
Romance
Hurt/Comfort
Dean Winchester
Humor
Established Relationship
Smut
Sam Winchester
Steve Rogers
Alternate Universe - Canon Divergence
Alternate Universe - Modern Setting
Tony Stark
Friendship
Original Female Character(s)
Anal Sex
Drabble
Original Characters
Sherlock Holmes
Fluff And Angst
Plot What Plot/Porn Without Plot
Castiel/Dean Winchester
One Shot
John Watson
Stiles Stilinski
Castiel
Harry Potter
Oral Sex
James “Bucky” Barnes
Natasha Romanov
Blow Jobs
Clint Barton
Emotional Hurt/Comfort
Drama
Derek Hale
Reader
Slow Burn
Original Male Character(s)
Sherlock Holmes/John Watson
First Time
Alternate Universe - College/University
Kissing
Derek Hale/Stiles Stilinski
First Kiss
Angst With A Happy Ending
Light Angst
Violence
Family
Park Jimin (BTS)
Min Yoongi | Suga
Jeon Jungkook
Crossover
Harry Styles
Crack
Friends To Lovers
Love
Fluff And Smut
Kim Taehyung | V
Louis Tomlinson
Other Additional Tags To Be Added
Draco Malfoy
Alternate Universe - High School
Explicit Sexual Content
Masturbation
Hermione Granger
Pining
Bruce Banner
Anal Fingering
Kim Namjoon | RM
Canon Compliant
Thor (Marvel)
Domestic Fluff
Jung Hoseok | J-Hope
Keith (Voltron)
Kim Seokjin | Jin
Depression
Character Death
Sexual Content
Happy Ending
Harry Styles/Louis Tomlinson
Canon-Typical Violence
James “Bucky” Barnes/Steve Rogers
Lance (Voltron)
Cuddling & Snuggling
Alpha/Beta/Omega Dynamics
Dirty Talk
Post-Traumatic Stress Disorder - PTSD
Post-Canon
Loki (Marvel)
Christmas
Hand Jobs
Scott McCall
Niall Horan
Sex
Mycroft Holmes
Blood
Shiro (Voltron)
Liam Payne
Rimming
Rough Sex
Zayn Malik
Cute
Original Character(s)
Original Character
Castiel (Supernatural)
Dubious Consent
Phil Coulson
Severus Snape
Ron Weasley
Character Study
Mpreg
Lydia Martin
Explicit Language
Slash
Grief/Mourning
Polyamory
Future Fic
Draco Malfoy/Harry Potter
Minor Character Death
Greg Lestrade
Steve Rogers/Tony Stark
Peter Parker
Mutual Pining
Swearing
Eventual Smut
Sirius Black
Dom/Sub
Pre-Slash
Sad
Love Confessions
Unrequited Love
Alternate Universe - Soulmates
Remus Lupin
Dean Winchester/Sam Winchester
Falling In Love
Jealousy
Spanking
I’m Sorry
Pepper Potts
Death
Keith/Lance (Voltron)
Spoilers
James T. Kirk
Hunk (Voltron)
Sans (Undertale)
Emma Swan
Gabriel (Supernatural)
Fluff And Humor
Magic
Torture
Alternate Universe - Human
Bruce Wayne
Isaac Lahey
Levi (Shingeki No Kyojin)
Eren Yeager
Clarke Griffin
Self-Harm
Slow Build
Victor Nikiforov
Alcohol
Katsuki Yuuri
Suicidal Thoughts
Implied Sexual Content
BDSM
Nightmares
Canonical Character Death
Action/Adventure
Sam Wilson (Marvel)
Developing Relationship
Adrien Agreste | Chat Noir/Marinette Dupain-Cheng | Ladybug
Bondage
Friendship/Love
Tooth-Rotting Fluff
Dark
Sheriff Stilinski
Mental Health Issues
Allison Argent
Reader-Insert
Slice Of Life
Allura (Voltron)
Kurt Hummel
Getting Together
Kidnapping
Katsuki Yuuri/Victor Nikiforov
Dick Grayson
Merlin (Merlin)
Panic Attacks
Heavy Angst
Comfort
Alec Lightwood
Pre-Canon
Ficlet
Kid Fic
Adrien Agreste | Chat Noir
Implied/Referenced Character Death
Songfic
Leonard McCoy
First Meetings
Flirting
data-monkey · 5 years
AO3 stats project: fun stuff
RANDOM FUN QUESTIONS I can ask of the AO3 data, coming right up!
The Data | Basic Questions | Fandoms | Tags | Correlations | Kudos | Fun Stuff
Thanks to @eloiserummaging for beta reading these posts; any remaining errors are my own.  A Python notebook showing the code I used to make these plots can be found here.
All right. You should probably read the earlier posts in this series if you want to really understand what's happening. This is basically my "fun stuff" post: all the questions that aren't grand, sweeping statements about what's on the Archive, but rather my own personal quirky interests. :)
First up: Drabbles! My most pointless hill to die on is that a drabble is exactly 100 words. Do creators on the Archive agree?
Tumblr media
KIND OF! There's definitely a spike of things at 100 words. It’s actually big enough that you can see it on the general word count histogram from the basic demographics post. There's also a sense of "drabble is a short fic" (what I would call a "ficlet") of approximately 520 words, give or take...a lot. Okay, fine. (I'm still right.)
I looooove AUs. What are the most popular AUs, and how have they changed over time?
Tumblr media
As with the previous plots I’ve shown like this, the heights of the curves are relative within each tag but not comparable between tags. Soulmate AUs are way less popular than modern setting AUs, but their curve looks taller because a higher fraction of works with that tag were posted in a short span of time.
Things to note: The biggest AUs (modern setting, canon divergence, college/university, and high school) all approximately trace the growth of fics on AO3 over time. I thought A/B/O would be the most recently popular AU type, but actually it looks like soulmates is even more skewed to recent times than A/B/O, which I didn’t expect! Human AUs have gotten less popular lately (I think this was really popular in Teen Wolf, so as the rate of Teen Wolf works has slowed, the number of human AUs has also slowed). I’m not sure what fandom liked canon AUs, or even what that is, but it was really popular in 2012-2013 nevertheless.
I was also curious about crossovers. Like I mentioned briefly in the fandoms post, there are things that are in two different fandom categories that we probably wouldn’t think of as an actual crossover: an Avengers fic can be in both Avengers and Marvel, but most readers would recognize that as the same universe. I made a list of fandoms that appear together on works, weeded out the ones that look like same-universe crossovers, and came up with this list:
Tumblr media
DCU and Marvel should have been predictable, I think. A little more surprising is that SuperWhoLock actually did have an effect on the number of crossovers, by a lot--you can see all pairs of those three fandoms on this list. (If a work had three fandoms A, B, and C, that counted once as a crossover for A and B, once for A and C, and once for B and C.) Harry Potter is a common crossover too, which I expected, as I’ve read a Harry Potter AU in every fandom I’ve been in that wasn’t Harry Potter.
But this graph has my single favorite piece of information in this entire analysis: there are a ton of Guardians of Childhood & Hiccup Series crossovers. If you’re not familiar, those are two canon sources made into Dreamworks animated movies: Rise of the Guardians and How To Train Your Dragon. I had no idea there was a large fanfic fandom for those, let alone that there would be thousands of works that cross them over!! Isn’t that awesome?
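The pair-counting rule (a work with fandoms A, B, and C counts once each for A and B, A and C, and B and C) can be sketched like this. The `same_universe` set is a stand-in for the hand-made list of pairs that were weeded out, and the example works are invented.

```python
from collections import Counter
from itertools import combinations


def count_crossover_pairs(works_fandoms, same_universe=frozenset()):
    """Count how often each pair of fandoms appears together on a work.

    same_universe holds frozensets of fandom pairs that should not count
    as real crossovers.
    """
    pair_counts = Counter()
    for fandoms in works_fandoms:
        # Sorting makes (A, B) and (B, A) land on the same key.
        for pair in combinations(sorted(set(fandoms)), 2):
            if frozenset(pair) not in same_universe:
                pair_counts[pair] += 1
    return pair_counts


# Invented works, for illustration only.
works_fandoms = [
    ["Supernatural", "Doctor Who", "Sherlock"],
    ["Marvel", "Avengers"],
    ["DCU", "Marvel"],
]
same_universe = {frozenset({"Marvel", "Avengers"})}
pair_counts = count_crossover_pairs(works_fandoms, same_universe)
```

The three-fandom work contributes three pairs, and the Marvel and Avengers work contributes none.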
I used to be in Harry Potter fandom, so here's some fun with different Harry Potter ships. The top ten ships are:
Tumblr media
Okay. I could have predicted...some of those. Note that this doesn’t mean those are the top main pairings: I expect that James/Lily is a common side pairing for Sirius/Remus, and that Hermione/Ron appears a lot in Draco/Harry fics. Draco/Harry is much more popular than my old stomping grounds of Harry/Snape, which I think was also true at the time, although the works represented on the Archive probably don’t include most of what I was reading in 2003.
Actually, we can check my point about secondary pairings. Here’s a correlation matrix for those top ten ships (remember, green means they’re correlated--appear as tags on the same work more often than you’d expect based on chance--and pink means they’re anti-correlated, or appear together less often than you’d expect):
Tumblr media
Yeah, absolutely, some of these are likely secondary pairings for popular ships. Actually, there’s one obvious set of pairings I missed--the canon pairings, Harry/Ginny and Hermione/Ron, which are EXTREMELY correlated. You do also see some Hermione/Ron with Draco/Harry (it’s actually a big correlation despite the light color, because the scale has to account for the five HUNDRED percent relative likelihood of Harry/Ginny and Hermione/Ron). And, yep, James/Lily and Sirius/Remus are together. But apparently there’s also some Harry/Ginny with Hermione/Draco (not sure I would have picked that--or, actually, that Hermione/Draco would be this high on the list) and, more obviously, Harry/Ginny with Scorpius/Albus Severus.
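One way to get a "relative likelihood" for a pair of tags is to compare how often they actually appear together with how often they would if they were independent. This is a sketch of that idea and an assumption about the measure; the notebook may compute the matrix differently. The tag names and works below are placeholders.

```python
def relative_likelihood(works_tags, tag_a, tag_b):
    """Observed co-occurrence rate divided by the rate expected by chance.

    1.0 means the tags appear together exactly as often as chance predicts,
    above 1.0 means correlated, below 1.0 means anti-correlated.
    Both tags must appear on at least one work.
    """
    n = len(works_tags)
    n_a = sum(tag_a in tags for tags in works_tags)
    n_b = sum(tag_b in tags for tags in works_tags)
    n_ab = sum(tag_a in tags and tag_b in tags for tags in works_tags)
    # P(A and B) / (P(A) * P(B)), written with counts.
    return n_ab * n / (n_a * n_b)


# Invented works: tags "A" and "B" always travel together, "C" never joins them.
works_tags = [{"A", "B"}] * 2 + [{"C"}] * 8
together = relative_likelihood(works_tags, "A", "B")
apart = relative_likelihood(works_tags, "A", "C")
```

In this toy data A and B come out at 5.0, meaning they co-occur five times as often as chance would suggest, and A and C come out at 0.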
There's a genre of fics that I see sometimes when browsing: the "Reader/somebody" genre, where you're explicitly supposed to insert yourself into the fic. (Some of these have Y/N scattered throughout the fic, for "your name", for example.) Who are the top characters that get paired with "Reader"?
Tumblr media
Those aren't the same as the top character tags. So this kind of thing is serving a sub-audience of AO3 readers, different in certain ways from the typical reader, I guess. I don’t think I would have been able to predict this list in any way at all, either. I really like that fandom can encompass so many things that are different from what I read!
Finally: older fandoms on AO3. I asked my fandom friends to come up with what they thought of as "classic" fanfiction fandoms. Then I excluded anything with new (major) canon later than Nov. 15, 2009, when the Archive opened for beta, and anything with fewer than 1,000 works. Here's how those fandoms are faring on AO3.
Tumblr media
I LOVE THAT THEY ARE STILL AROUND. I love fandom. That's all. Thanks for reading.
data-monkey · 5 years
AO3 stats project: kudos
Up next: what kinds of works get kudos?
The Data | Basic Questions | Fandoms | Tags | Correlations | Kudos | Fun Stuff
Thanks to @eloiserummaging for beta reading these posts; any remaining errors are my own.  A Python notebook showing the code I used to make these plots can be found here.
I want to be really specific before we get into this: I don't mean "which works are better." There's a perception I see sometimes in fandom that more kudos = better material (e.g. when you're sorting on the works page). That may be true within narrow categories, but kudos also relate to things like "was this emotionally moving vs technically skilled or both or neither" and "what would I be embarrassed to be seen liking" (since the usernames of people who leave kudos appear at the bottom of the page). So exactly why some things have more kudos than others is outside the scope of this post.
The first thing to look at is how the number of kudos scales with the number of hits. Obviously, if more people look at your work, more people are available to leave kudos. But we wouldn't expect the same ratio at low hit counts and high hit counts. That is, if 10 of the first 100 people who looked at your work left kudos on it, we would expect fewer than 10 of the next 100 to do it. That's basically because the top fans--of the author, of the genre, whatever--are more likely to read it right away when it's new, while people who read it later may not fit into the target audience quite as well. Even if a lot of them like it, statistically, you might expect that fewer of them will hit the kudos button. Also, because you can only leave kudos once, repeat visits will increase the hit counts without increasing the kudos counts; that will matter for works that people like to reread, or for works that were posted one chapter at a time.
So here's kudos vs hits. The blue-colored region is showing where the most works are: low hits and low kudos, like we already knew. The heavy purple line is the average trend. If you look very closely at the low end, you can see it bends a little bit: the average kudos-to-hits ratio is higher for low-hit-count things, like we predicted.
Tumblr media
In fact, we can just plot that (average) ratio:
Tumblr media
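For anyone who wants to reproduce a curve like this on their own data, here is a minimal sketch of one way to get an average kudos-to-hits ratio as a function of hit count. It is not the notebook code: the logarithmic binning, the function name `mean_ratio_by_hit_bin`, and the toy numbers are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def mean_ratio_by_hit_bin(works, bins_per_decade=2):
    """Average kudos/hits inside logarithmic hit-count bins.

    `works` is an iterable of (hits, kudos) pairs. Works with zero hits are
    skipped because the ratio is undefined for them.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hits, kudos in works:
        if hits <= 0:
            continue
        bin_index = math.floor(math.log10(hits) * bins_per_decade)
        sums[bin_index] += kudos / hits
        counts[bin_index] += 1
    # Label each bin by its lower edge, expressed in hits.
    return {10 ** (b / bins_per_decade): sums[b] / counts[b] for b in sorted(sums)}

# Invented (hits, kudos) pairs, purely illustrative.
toy_works = [(10, 2), (20, 3), (100, 8), (200, 12), (1000, 50), (0, 0)]
ratios = mean_ratio_by_hit_bin(toy_works, bins_per_decade=1)
ratio_values = [ratios[edge] for edge in sorted(ratios)]
```

In the toy data the ratio falls as the hit count rises, which is the same shape as the bend described above. Averaging per-work ratios (as here) and dividing total kudos by total hits in a bin are different choices and can give different curves.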
We can do the same for bookmarks and comments. (I’m just going to show the ratio from now on, because it’s easier to interpret.)
Tumblr media Tumblr media
Those are pretty similar, with fewer overall numbers, and slightly more comments than bookmarks. I’m not sure what to make of the fact that bookmark & comment ratios increase with the number of hits. I don’t really use either bookmarks or comments, so maybe I’m missing something of the essential psychology there.
Okay. That’s averaged over all works on AO3. Do things change if we subdivide the works into categories?
First up: has the number of kudos per fic stayed the same over time? The first few years of the Archive didn't have the kudos feature, so we would expect the oldest works to be very low, but did things change after that?
Tumblr media
Yes. I was a little surprised by this, but it does look robust: Works seem to get peak kudos (adjusted for hit counts) in 2016 and 2017, with less both before and after. I think there are a few possible explanations for this besides "people got more likely to leave kudos and then less likely to leave kudos." One would be a difference in other kinds of reader behavior: for example, repeat visits to the same work will drive down the kudos-to-hit-count ratio, because you can only leave kudos once; or, if people click on a work to mark it for later, that can register as another hit without actually being another human reader. Another would be a difference in reader populations: maybe the same people who have always left kudos are still leaving kudos, but the AO3 is now big enough that we also get readers who aren’t in fandom social circles and are less likely to hit that button.  Also, if works in progress get a lower kudos/hits ratio, then more recent stories with a higher WiP rate might have a lower average--although, as I’ll show below, I don’t think it’s large enough to explain this discrepancy.
How about word count?
Tumblr media
By the way: the labels have the same color as the lines they go with. I’m using an algorithm that tries to place the labels so they don’t overlap, which sometimes means the labels aren’t that close to the line they go with--for example, here, the 3,000-10,000 word line is actually mostly covered by its label, and there will be some cases below where the labels are pretty far off. Just match the colors and you should be able to figure out which is which!
Anyway, peak kudos occur around the 3,000-30,000 word count. Lower for the very longest and very shortest things. My anecdotal experience is that very long works are most likely to be ficlet collections or otherwise works in progress, and so you might expect more repeat visitors to those works as authors add new chapters. Actually, let's check on works in progress in general. Here's complete works and works in progress:
Tumblr media
You do see signs that having a WiP gets you fewer kudos per hit. Again, this is almost certainly because the same readers come back for successive chapters but can't leave multiple kudos. The differences aren’t very large, though--look at the y-axis.  The different years range anywhere from 0.03 to 0.08 at a low hit count, but the same range here is like 0.060 to 0.065!
Ratings?
Tumblr media
Here’s an example where the labels are pretty far off the lines--“Not Rated” is the green line that ends under the second “e” of “General Audiences.” Teen and gen are a little higher, explicit a little lower--I would guess some combination of more repeat visits to explicit fics and possible embarrassment at leaving kudos on fics with certain kinds of content. (Also, higher ratings are somewhat more likely to have warnings on them, which may change the emotional calculus that goes into whether you hit the kudos button.) But again, these differences aren’t that big--nowhere near the differences for years or word counts, for example, if you look at the range the vertical axis is spanning.
How about pairing type?
Tumblr media
As a reminder, those pairing types are gen (no romantic or sexual relationship); male/male, male/female, and female/female; then “multi” meaning either multiple types of pairings, or a relationship with more than two people; and “other” meaning everything that doesn’t fit neatly into those categories. I could have seen this plot going either way--M/M is the most popular so people like it, or other things are less popular and so readers reward them more. Looks like it’s the former. Not sure what to make of “other” and “multi” being the lowest.
Are fandoms different?
Tumblr media
Indeed they are. (I don’t think this is entirely explained by differing kudos rates & fandom popularity over time, although that's probably some of it.) How about tags?
Tumblr media
I feel like there's a paper somewhere in this plot alone...
One last note. As I mentioned, people tend to view kudos as a measure of quality. I have no way to measure actual quality of works. But I do have one way to think about it: Creators are likely to produce similar-quality work. (Not always, as fans of many published authors can tell you. But often.) So how much do individual authors' kudos counts vary? As I mentioned in the first post, I have a policy of not showing data for individual authors here. But I can show my own, since I can give informed consent as to this usage of my data. :) So here's the average kudos vs hit count line, with my fics shown as dots around the line.
Tumblr media
I would definitely say my works have variable quality (particularly because I included all my own backdated works, meaning my fics span more than 15 years of my life--and I hope I'm a better writer now than I was when I was 18!). But you can see that the kudos to hit count ratios are ALL OVER the place. Twice the value of the line, 1/10th the value of the line--really varying, right?
So here's how I made the plot I'm about to show you. For every creator in the database, I look at their kudos vs hit counts for all their fics, and I measure how far away from the average line they are (and whether they're above it or below it). And then I measure what's called the “standard deviation” of those distances--basically, asking how much those numbers vary. A really low value means all of an author’s works are the same--and that could be "every work has only 10% of the average number of kudos" or "every work has 1000% the average number of kudos"; doesn't matter, it just means they're consistent. On the other hand, a high value would mean, well, somebody like me: with some works that get a lot of kudos, and some that don’t. Here's the distribution of that standard deviation, for all authors with more than one fic and at least 1000 total hits on all their fics:
Tumblr media
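The per-author measurement described above can be sketched roughly like this. Two things here are assumptions, not details from the post: the "distance" from the line is taken as log10(actual kudos / expected kudos), and the spread is the sample standard deviation. `expected_kudos` is a hypothetical stand-in for the average trend line, and the author names and numbers are made up.

```python
import math
import statistics
from collections import defaultdict

def author_scatter(works, expected_kudos, min_total_hits=1000):
    """Standard deviation of each author's offsets from the average trend.

    `works` holds (author, hits, kudos) tuples. `expected_kudos` maps a hit
    count to the average kudos at that hit count (the line in the plot).
    The offset log10(kudos / expected) is an ASSUMED distance measure.
    """
    offsets = defaultdict(list)
    total_hits = defaultdict(int)
    for author, hits, kudos in works:
        total_hits[author] += hits
        expected = expected_kudos(hits)
        if kudos > 0 and expected > 0:
            offsets[author].append(math.log10(kudos / expected))
    # Keep authors with more than one usable fic and enough total hits.
    return {
        author: statistics.stdev(values)
        for author, values in offsets.items()
        if len(values) > 1 and total_hits[author] >= min_total_hits
    }

def toy_trend(hits):
    """Hypothetical trend line: a flat 5% kudos-to-hits ratio."""
    return 0.05 * hits

toy_works = [
    ("steady", 1000, 100), ("steady", 2000, 200),  # both twice the line
    ("varied", 1000, 100), ("varied", 1000, 5),    # twice the line, then a tenth
    ("tiny", 10, 1), ("tiny", 20, 1),              # too few total hits to count
]
scatter = author_scatter(toy_works, toy_trend)
```

The "steady" author comes out with a spread of zero even though every fic sits well above the line, which is the consistency point made above: the number measures variation, not how many kudos someone gets.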
So you can see it does, actually, vary a lot. My particular number, by this count, is 0.44, and you can see that that's a little high but still pretty typical--even though it's a lot of variation if you look at it! So I think this is support for my statement earlier, that you shouldn't think of kudos (or kudos/hits) as a measure of quality, for some definition of quality. It's measuring something--but it's way more complicated than quality.
One more post after this: just some fun questions.
data-monkey · 5 years
AO3 stats project: correlations
Okay. Now for some really fun stuff: correlations between different categories of metadata (like, correlations between ratings and tags, character tags and genre tags, etc). And as with the previous post in this series, some of the content I discuss may not be strictly work-appropriate.
The Data | Basic Questions | Fandoms | Tags | Correlations | Kudos | Fun Stuff
Thanks to @eloiserummaging for beta reading these posts; any remaining errors are my own.  A Python notebook showing the code I used to make these plots can be found here.
All right. We've got a list of top tags now. How do those tags relate to each other? That is, if I have a work labeled "Fluff", does that change how likely it is that that work will also be labeled "Angst"? I'm plotting, here, a matrix that answers that question directly. I can compute how often "Fluff" and "Angst" would appear together if they were just randomly assigned to all 4.3 million works that I collected metadata for. The blocks I'm showing below are colored by whether the actual number of times "Fluff" and "Angst" appear together is greater or lesser than that expectation. (We do it that way, instead of just counting the raw number of connections, because otherwise it looks like everything is correlated with "Fluff", just because there are a lot of works labeled "Fluff" in the data set.) If I pick two tags--say, "Fluff" on the bottom, and "Angst" on the left--I can follow them to where they intersect, and the color of that little block tells me about whether they're correlated. Things that are pink are less likely than you'd expect to appear together, while things that are green are more likely. The diagonal line is always that pale green-grey color because, by definition, "Romance" appears with "Romance" exactly as often as you'd expect, and the graph is symmetric across that diagonal line because it doesn't matter if we take "Fluff" then "Angst" or "Angst" then "Fluff."
Tumblr media
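The comparison to chance can be written down in a few lines. This is a minimal sketch, not the notebook code: it assumes each work is just a set of tags, and it takes the "expected" count for a pair to be (works with tag A) × (works with tag B) / (total works), which is what independent tagging would give. The function name and toy works are made up.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_ratio(works):
    """Observed / expected co-occurrence for every pair of tags.

    `works` is a list of tag sets, one per work. A ratio above 1 means the
    pair shares works more often than independent tagging would predict
    (green in the plots); below 1 means less often (pink).
    """
    n_works = len(works)
    tag_counts = Counter(tag for tags in works for tag in tags)
    pair_counts = Counter(
        pair for tags in works for pair in combinations(sorted(tags), 2)
    )
    ratios = {}
    for a, b in combinations(sorted(tag_counts), 2):
        expected = tag_counts[a] * tag_counts[b] / n_works
        ratios[(a, b)] = pair_counts[(a, b)] / expected
    return ratios

# Four invented works.
toy_works = [
    {"Fluff", "Humor"},
    {"Fluff", "Humor"},
    {"Angst"},
    {"Angst", "Fluff"},
]
ratios = cooccurrence_ratio(toy_works)
```

Dividing by the expectation is what stops a very common tag from looking correlated with everything: "Fluff" is on three of the four toy works, but it only comes out above 1 with "Humor".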
So one really interesting thing is that this plot is mostly green (so things are correlated). Not only do these tags appear a lot, but they appear together more than you’d expect, and even when they’re anti-correlated--that is, when being labeled one makes you less likely to be labeled another--it’s not by very much. The strongest correlation is between “Romance” and “Humor”, so I guess the rom com is alive and well! Angst and hurt/comfort also appear together a lot, which I suppose makes sense.
We can make correlation plots like this for other things. How about pairing types?
Tumblr media
Per the AO3, “Multi” means “more than one kind of relationship, or a relationship with multiple partners”, and “Other” means “everything not covered by the other labels”. The structure of this is kind of interesting, too. Remember, pink means things are anti-correlated (appear together less often than expected) and green means they’re correlated (appear together MORE often than expected), so M/M is the Lone Ranger of pairing types, making all other pairing types less likely if it's included. Even more than Gen, which you’d expect to exclude other categories!
Now for the REALLY fun stuff: how do all of these things correlate with other things? Here's an obvious one: ratings vs tags. No more symmetry, because we're showing different things on the two axes.
Tumblr media
Not too surprised by this: “Smut” has a really strong relationship with rating, because most works of erotica deserve the higher ratings. The other tags are much less correlated with rating; Fluff and Humor incline to lower ratings, Established Relationship to higher ratings, and the others are kind of in the middle.
Do ratings and tags correlate with pairing type?
Tumblr media
Hmm. Interesting. Romance is way more likely to be F/M than you’d expect. (Do we not write as many M/M romances, or do we just call them something else? A couple of friends also pointed out to me that this might mean “romance” as in “the publication genre of romance” not as in “romantic plotlines generally”, which makes sense and would make them more F/M-heavy given publication trends.) Established relationship is very skewed to M/M. Gen anti-correlates with most of the tags you’d expect. Apparently only single-pairing romantic relationships can be fluffy. F/F is neither as funny nor as angsty as chance would indicate.
Tumblr media
I think this pattern can be explained this way: Gen is way more likely to be a low rating, and the trend you see for most other things is just that we’re comparing to the average--if Gen is way more likely to be rated General Audiences than is typical, then the other pairing types have to be slightly less General Audiences than you’d expect to make up for it. (This argument doesn’t necessarily apply to the other correlations I was showing, because in those plots there are a bunch of tags I’m not showing and because non-ratings tags can appear together.)
How about the top relationship tags--do they correlate with anything?
Tumblr media
Huh. Well, most of these ships have preferentially high ratings--I think that’s the same effect as in the rating and pairing correlation: things without a romantic/sexual relationship have lower ratings, so on average the works containing relationships will have a higher rating. That’s not universal--look at Magnus/Alec, for example--but it’s common. The other two obvious things here are 1) Dean/Sam really skews to high ratings, 2) apparently Harry/Louis fans reject the rating system.
Tumblr media
Okay, that’s...less interesting than I was expecting. Lots of Harry/Louis smut, lots of Keith/Lance modern AUs. The most likely established relationship is Derek/Stiles. Magnus/Alec and Yuuri/Victor are the fluffiest. Not much romance, except for Draco/Harry, and that pairing also has an unusual amount of humor.
What about character tags?
Tumblr media
Hmm. Looks like there are a lot of teen-rated Marvel works, and Supernatural leans towards the higher ratings (which we already knew).
Tumblr media
This basically just repeats stuff we already noticed in the relationships plot, I think. One thing I didn’t notice up there is that John and Sherlock are not that likely to be tagged in Established Relationship works, which is kind of interesting as they’re long-term partners (not necessarily romantic partners) in most versions of the canon.
How about characters and pairing type...are there characters that appear more often in one kind of pairing than you'd expect based on randomness? (Note that all these characters appear most in M/M stories, because those are by far the most common--this is just asking a relative question about how much they appear in other kinds of stories.)
Tumblr media
Mostly not super interesting, I have to say. Steve, Tony, Natasha, and Harry Potter are all more likely than usual to appear in poly relationships or in F/M stories, apparently, and Stiles and Castiel are less likely than usual to appear in gen works.
Finally, a really fun thing (that I have to link to an external site to do, because tumblr doesn’t like javascript in posts). Instead of just looking at the top 10 tags, here are the top 100 tags portrayed as dots, arranged in a connected graph: the dots represent tags (with the popularity of the tag represented by its size), and they’re connected by lines whose thickness indicates how often the two things appear together, relative to chance. You can also use this kind of setup to work out sets of interconnected tags that are more closely tied to each other than to the other tags. I colored those sets with different colors, so you can identify them. Hovering your mouse pointer over a dot should tell you which tag it is.
Here are the blocks of tags that the algorithm found:
Alternate Universe, Alternate Universe - College/University, Alternate Universe - High School, Alternate Universe - Human, Alternate Universe - Modern Setting, Alternate Universe - Soulmates, Christmas, Crack, Cute, Domestic Fluff, Fluff, Fluff And Humor, Humor, Light Angst, One Shot, Romance, Tooth-Rotting Fluff
Angst, Blood, Canon-Typical Violence, Canonical Character Death, Character Death, Dark, Death, Depression, Emotional Hurt/Comfort, Grief/Mourning, Hurt/Comfort, I'm Sorry, Magic, Minor Character Death, Nightmares, Post-Traumatic Stress Disorder - PTSD, Sad, Self-Harm, Suicidal Thoughts, Torture, Violence
Action/Adventure, Alternate Universe - Canon Divergence, Canon Compliant, Character Study, Crossover, Drabble, Drama, Family, Friendship, Future Fic, Post-Canon, Pre-Slash, Spoilers
Alcohol, Alpha/Beta/Omega Dynamics, Established Relationship, Explicit Language, Explicit Sexual Content, First Time, Fluff And Smut, Jealousy, Kissing, Mpreg, Polyamory, Sex, Sexual Content, Slash, Smut
Anal Fingering, Anal Sex, Bdsm, Blow Jobs, Bondage, Dirty Talk, Dom/Sub, Dubious Consent, Hand Jobs, Masturbation, Oral Sex, Plot What Plot/Porn Without Plot, Rimming, Rough Sex, Spanking
Angst With A Happy Ending, Developing Relationship, Eventual Smut, Falling In Love, First Kiss, Fluff And Angst, Friends To Lovers, Friendship/Love, Happy Ending, Implied Sexual Content, Love, Love Confessions, Mutual Pining, Other Additional Tags To Be Added, Pining, Slow Build, Slow Burn, Swearing, Unrequited Love
I love this! To me, those sets look like: fluffy plot-based tags, violence and disturbing content tags, more action-oriented plot-based tags, less explicit vanilla-ish erotica tags, more explicit or kinky erotica tags, and romance. That’s so cool. (Not everything makes sense--why is Drabble where it is?--but still, cool.)
Or in other words: some numerical routines correctly identified the porn. :)
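The post doesn't say which grouping algorithm produced those blocks, so the sketch below is a much simpler stand-in, not a reconstruction: keep only the links whose observed/expected ratio clears a cutoff, then treat each connected cluster of tags as a group. The cutoff value, the function name, and the toy ratios are all invented for illustration.

```python
from collections import deque

def tag_groups(pair_ratios, threshold=2.0):
    """Group tags by connected components of the strong-link graph.

    `pair_ratios` maps (tag_a, tag_b) to an observed/expected co-occurrence
    ratio. Only links at or above `threshold` (an arbitrary, assumed cutoff)
    are kept; tags joined by a chain of kept links land in the same group.
    """
    neighbours = {}
    for (a, b), ratio in pair_ratios.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if ratio >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first walk over the kept links
            tag = queue.popleft()
            members.append(tag)
            for other in sorted(neighbours[tag] - seen):
                seen.add(other)
                queue.append(other)
        groups.append(sorted(members))
    return groups

# Invented ratios, not values from the dataset.
toy_ratios = {
    ("Fluff", "Domestic Fluff"): 3.1,
    ("Angst", "Grief/Mourning"): 4.0,
    ("Fluff", "Angst"): 0.9,
    ("Domestic Fluff", "Grief/Mourning"): 0.4,
}
groups = tag_groups(toy_ratios)
```

A hard cutoff like this is crude: one strong link between two clusters merges them completely. Proper community-detection methods weigh all the links at once, which is probably closer to what a graph-plotting tool would do.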
Up next: what gets kudos?
data-monkey · 5 years
AO3 stats project: tags
In this post, we'll discuss tags on the Archive of Our Own! Please note: because the works on the Archive include explicit material, some of the tags discussed in this post may not be appropriate for your workplace.
The Data | Basic Questions | Fandoms | Tags | Correlations | Kudos | Fun Stuff
Thanks to @eloiserummaging for beta reading these posts; any remaining errors are my own.  A Python notebook showing the code I used to make these plots can be found here.
The Archive of Our Own has one of the best tagging systems around. You can read more about it here, here, here, or here. For our purposes, the important part is that users can tag their works however they want, and then a group of people called "tag wranglers" sort those tags, either adding them as synonyms of existing tags or creating new canonical versions for them. What I'll be showing here is the "canonical" version of the tags. For example, a work tagged "flufffffff" or "so fluffy!" would have those two tags assigned to the canonical tag "Fluff", so I will consider both of those tags as being "Fluff" to get the most accurate count.
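As a rough illustration of what counting by canonical tag means, here is a minimal sketch. The synonym table is hypothetical (on the real Archive that mapping is maintained by the tag wranglers, and how it was obtained for this project isn't covered here), and counting each work at most once per canonical tag is my assumption.

```python
from collections import Counter

# Hypothetical synonym table, keyed by lower-cased user tag.
CANONICAL = {
    "flufffffff": "Fluff",
    "so fluffy!": "Fluff",
    "fluff": "Fluff",
}

def count_canonical(works, canonical=CANONICAL):
    """Count works per canonical tag, at most once per work per tag."""
    counts = Counter()
    for tags in works:
        # Unknown tags are kept as written; known synonyms are merged.
        resolved = {canonical.get(tag.lower(), tag) for tag in tags}
        counts.update(resolved)
    return counts

toy_works = [
    ["flufffffff", "so fluffy!"],  # two synonyms, still one "Fluff" work
    ["Fluff", "Angst"],
    ["Angst"],
]
tag_counts = count_canonical(toy_works)
```

Resolving to a set before counting is what keeps the first toy work from being counted as two uses of "Fluff".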
The other important thing about AO3 tags is that they come in four flavors. The first one is "warnings", the content warnings required by the Archive (plus the default tag indicating you're abstaining from the warnings system). The second flavor is "Characters", tags describing the characters in the work. The third is "Relationships", tags describing the platonic or romantic relationships depicted in the work--typically, "X/Y" indicates a romantic and/or sexual relationship between characters X and Y, while "X&Y" means a platonic relationship, although this usage isn't universal and isn't enforced. The final category is "freeform", aka everything else.
Again, the tagging system is freeform and optional. In particular, I'll note that "character" tags and "relationship" tags don't necessarily imply each other: you can have a work tagged "Sherlock Holmes/John Watson" that only features Mycroft Holmes, or that features John and Sherlock but doesn't tag them as characters, only as the relationship. So remember that--while it's pretty good on average, because people tag their works so readers/viewers can find them--the number of uses of a character tag isn't the same as the number of works that feature that character, for example.
Okay! So what are the most popular freeform tags on the Archive? If you read a lot of fanfiction, I doubt you will be surprised by anything on this list. Left column is the top 15 tags by number of uses, while right column is the top 15 tags by the cumulative hit count on every work tagged with that tag.
Tumblr media
Are these tags consistently popular over time? For reasons of space, I’ll just plot the top 10 by number of works:
Tumblr media
If you look back at the works vs time plot in the second post, you'll see that yes, the shape of these trends is similar to the total number of works, so trends in fannish tastes haven’t changed much over the time the AO3 has been in existence. (These show a little more bumpiness because there are fewer works in each plot.) Some of these have gained a little more recent popularity vs earlier works--smut, fluff, and the two specific alternate universes are a little more weighted towards later times, while humor and general AUs are falling a little behind--but the differences aren’t as large as we saw for fandom trends in the previous post.
I'm sure you're curious about characters and relationships. Here are the top character tags, omitting the catchall character tags of “Original Character(s)”, “Original Male Character(s)”, “Original Female Character(s)”, and “Reader” (all of which would otherwise appear in the top 15). Also, remember this is missing some of the data from 2018 and 2019, as described in the first post, so BTS characters should probably be higher:
Tumblr media
And here are the top relationship tags (again, excluding the catchall “Minor or Background Relationship(s)”):
Tumblr media
And in particular, here are the top characters of color (excluding works with fictionalized race/ethnicity power systems--um, more than modern-day Western society’s power systems are made up--and characters from Voltron Legendary Defender, since I wasn’t able to find enough information on them):
Park Jimin (BTS)
Min Yoongi | Suga
Jeon Jungkook
Kim Taehyung | V
Kim Namjoon | RM
Jung Hoseok | J-Hope
Kim Seokjin | Jin
Zayn Malik
Katsuki Yuuri
Sam Wilson (Marvel)
Magnus Bane
Nick Fury
Midoriya Izuku
Bakugou Katsuki
Erica Reyes
And here are the top relationships that are not M/M:
Evil Queen | Regina Mills/Emma Swan
Oliver Queen/Felicity Smoak
Bellamy Blake/Clarke Griffin
Clarke Griffin/Lexa
Clint Barton/Natasha Romanov
Pepper Potts/Tony Stark
Captain Hook | Killian Jones/Emma Swan
Kylo Ren/Rey
Hermione Granger/Ron Weasley
Kara Danvers/Lena Luthor
Sherlock Holmes/Molly Hooper
Belle/Rumplestiltskin | Mr. Gold
Allison Argent/Scott McCall
James Potter/Lily Evans Potter
Hermione Granger/Draco Malfoy
Here are the top ten freeform tags for the top ten fandoms. Different fandoms seem to produce different kinds of fanworks--which you'd expect, based on the variety in the source material.
Tumblr media
AU = alternate universe, AU - CD = Alternate universe - canon divergence, AU - C/U = alternate universe - college/university, AU - HS = alternate universe - high school, BJs = blow jobs, ER = established relationship, H/C = hurt/comfort, PWP = plot what plot/porn without plot, RPF = real person fiction, SPN = supernatural.
Finally, for fun, here's the top 200 tags of all kinds, sorted against each other. You can find a lot of fun things on this list. Some of my favorites:
Supernatural is so big, and so focused on so few characters, that Dean Winchester is the sixth most popular tag on the entire AO3.
Clint Barton is way higher than I would have expected.
Sherlock Holmes is slightly less popular than anal sex.
Original female characters are more popular than anal sex.
Similarly, cuddling is more popular than A/B/O.
Harry Styles is less popular than 3/7ths of BTS (at least as of sometime in 2018); Louis Tomlinson barely tops Draco Malfoy.
Alcohol comes between Katsuki Yuuri and Victor Nikiforov.
Leonard McCoy is below songfic. Please join me in picturing how pissed off he’d be.
The one-two punch of “Spanking” and “I’m Sorry” is pretty amusing.
If I had put up the top 201 tags, 200 and 201 would have been “Flirting” and “Murder”, so Hannibal is almost on this list.
Fluff
Angst
Alternate Universe
Romance
Hurt/Comfort
Dean Winchester
Humor
Established Relationship
Smut
Sam Winchester
Steve Rogers
Alternate Universe - Canon Divergence
Alternate Universe - Modern Setting
Tony Stark
Friendship
Original Female Character(s)
Anal Sex
Drabble
Original Characters
Sherlock Holmes
Fluff And Angst
Plot What Plot/Porn Without Plot
Castiel/Dean Winchester
One Shot
John Watson
Stiles Stilinski
Castiel
Harry Potter
Oral Sex
James "Bucky" Barnes
Natasha Romanov
Blow Jobs
Clint Barton
Emotional Hurt/Comfort
Drama
Derek Hale
Reader
Slow Burn
Original Male Character(s)
Sherlock Holmes/John Watson
First Time
Alternate Universe - College/University
Kissing
Derek Hale/Stiles Stilinski
First Kiss
Angst With A Happy Ending
Light Angst
Violence
Family
Park Jimin (BTS)
Min Yoongi | Suga
Jeon Jungkook
Crossover
Harry Styles
Crack
Friends To Lovers
Love
Fluff And Smut
Kim Taehyung | V
Louis Tomlinson
Other Additional Tags To Be Added
Draco Malfoy
Alternate Universe - High School
Explicit Sexual Content
Masturbation
Hermione Granger
Pining
Bruce Banner
Anal Fingering
Kim Namjoon | RM
Canon Compliant
Thor (Marvel)
Domestic Fluff
Jung Hoseok | J-Hope
Keith (Voltron)
Kim Seokjin | Jin
Depression
Character Death
Sexual Content
Happy Ending
Harry Styles/Louis Tomlinson
Canon-Typical Violence
James "Bucky" Barnes/Steve Rogers
Lance (Voltron)
Cuddling & Snuggling
Alpha/Beta/Omega Dynamics
Dirty Talk
Post-Traumatic Stress Disorder - PTSD
Post-Canon
Loki (Marvel)
Christmas
Hand Jobs
Scott McCall
Niall Horan
Sex
Mycroft Holmes
Blood
Shiro (Voltron)
Liam Payne
Rimming
Rough Sex
Zayn Malik
Cute
Original Character(s)
Original Character
Castiel (Supernatural)
Dubious Consent
Phil Coulson
Severus Snape
Ron Weasley
Character Study
Mpreg
Lydia Martin
Explicit Language
Slash
Grief/Mourning
Polyamory
Future Fic
Draco Malfoy/Harry Potter
Minor Character Death
Greg Lestrade
Steve Rogers/Tony Stark
Peter Parker
Mutual Pining
Swearing
Eventual Smut
Sirius Black
Dom/Sub
Pre-Slash
Sad
Love Confessions
Unrequited Love
Alternate Universe - Soulmates
Remus Lupin
Dean Winchester/Sam Winchester
Falling In Love
Jealousy
Spanking
I'm Sorry
Pepper Potts
Death
Keith/Lance (Voltron)
Spoilers
James T. Kirk
Hunk (Voltron)
Sans (Undertale)
Emma Swan
Gabriel (Supernatural)
Fluff And Humor
Magic
Torture
Alternate Universe - Human
Bruce Wayne
Isaac Lahey
Levi (Shingeki No Kyojin)
Eren Yeager
Clarke Griffin
Self-Harm
Slow Build
Victor Nikiforov
Alcohol
Katsuki Yuuri
Suicidal Thoughts
Implied Sexual Content
BDSM
Nightmares
Canonical Character Death
Action/Adventure
Sam Wilson (Marvel)
Developing Relationship
Adrien Agreste | Chat Noir/Marinette Dupain-Cheng | Ladybug
Bondage
Friendship/Love
Tooth-Rotting Fluff
Dark
Sheriff Stilinski
Mental Health Issues
Allison Argent
Reader-Insert
Slice Of Life
Allura (Voltron)
Kurt Hummel
Getting Together
Kidnapping
Katsuki Yuuri/Victor Nikiforov
Dick Grayson
Merlin (Merlin)
Panic Attacks
Heavy Angst
Comfort
Alec Lightwood
Pre-Canon
Ficlet
Kid Fic
Adrien Agreste | Chat Noir
Implied/Referenced Character Death
Songfic
Leonard McCoy
First Meetings
Flirting
data-monkey · 5 years
AO3 stats project: fandoms
Next up: a more detailed analysis of fandoms, the engine of fan works everywhere.
The Data | Basic Questions | Fandoms | Tags | Correlations | Kudos | Fun Stuff
Thanks to @eloiserummaging for beta reading these posts; any remaining errors are my own.  A Python notebook showing the code I used to make these plots can be found here.
What are the top fandoms on the AO3?
Tumblr media
I pulled this data directly from the Archive fandoms pages in mid-March, just to make sure I was comparing work counts on the same day. And, as it happens, I checked about 3 days after BTS pipped Star Wars to become the 10th-biggest fandom on AO3! You may note that there’s significant overlap between some of these fandoms–K-pop and BTS, Marvel and Avengers–but they are classified as different fandoms so I’m preserving that here. (In a technical sense, while there’s significant overlap between Marvel and the Avengers, Marvel has some works Avengers doesn’t and vice versa.) Edit 4/25: in fact, I had a data processing failure and BTS should have been a subfandom of K-pop all along! I'm leaving the plots for now, but worth keeping in mind.
These fandoms aren’t of equal popularity over time:
Tumblr media
(The heights of the curves are relative within each fandom but not correct between fandoms, by the way. The BTS work count is like ⅓ of the Marvel work count, for example, but it looks taller because a higher fraction of those works were posted in recent years. Basically, all the colored blocks have the same area, so the ones popular over a short time are also taller.)
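To see why equal areas make a short-lived fandom look taller, here is a tiny sketch with invented yearly counts. The function name and numbers are made up; the real plot may normalize differently, but the effect is the same.

```python
def unit_area(counts_per_year):
    """Rescale yearly work counts so the curve's total area is 1."""
    total = sum(counts_per_year)
    if total == 0:
        return [0.0 for _ in counts_per_year]
    return [count / total for count in counts_per_year]

# Invented counts: the second fandom has a third as many works overall,
# but nearly all of them were posted in the final year.
long_running = [300, 300, 300, 300]
recent_boom = [0, 0, 100, 300]

print(max(unit_area(long_running)))  # 0.25
print(max(unit_area(recent_boom)))   # 0.75
```

Both fandoms posted 300 works in the final year, but after rescaling the smaller one peaks three times higher.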
RPF and Supernatural are nearly-constant juggernauts, while Marvel rises and falls with movie releases, and K-pop has exploded in the last few years. You can also see release dates of Sherlock series reflected in the Sherlock Holmes tag, and Fantastic Beasts in the Harry Potter tag. (And in the old version of this where Star Wars was the 10th biggest fandom, you could REALLY see The Force Awakens.) Marvel has the biggest single day for any fandom–on Dec 24, 2015, there were (at least) 452 Marvel works posted! In fact, we can look at Marvel in more detail. Here’s Marvel posting rates over time, with the MCU movie release dates overplotted:
Tumblr media
Wow–guess we all hated Civil War, lol. In fact, that dip is so big that you can see it on the Archive-wide stats from the previous post–other fandoms had a small dip there, but nothing like Marvel, so it drives most of the decrease you see in mid-2016.
Here’s a fun comparison: the top 10 fandoms by number of works; by total number of hits on all works; and by median hit count per work, for fandoms with at least 1,000 works. Another way to think of this table is: most popular with creators; most popular with readers; and highest reader-to-creator ratio. For an apples-to-apples comparison, I’m using the number of works in my dataset and not the Archive counts, so this top-fandoms-by-works list is a little different from the plot above.
Tumblr media
The total works/total hits lists are not that different, though there’s some obvious order reshuffling. The top fandoms by median hit count list is really different, though, with only Teen Wolf on there from among the top fandoms by hits or number of works. I can think of two explanations for why those fandoms in particular: either they’ve got massively better fic than other fandoms (hard to know why that would be), or there’s a big unmet desire for fic in those fandoms. Maybe a place to write, if you’re looking for lots of approbation. :)
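For the curious, the three rankings in that table could be computed along these lines. This is a minimal sketch using only the standard library, not the notebook's actual code; the `(fandom, hits)` record layout and the function name are assumptions, and the toy data uses a tiny `min_works` so the example stays small:

```python
from collections import defaultdict
from statistics import median

def rank_fandoms(works, min_works=1000, top_n=10):
    """works: iterable of (fandom, hits) pairs, one per work (assumed layout)."""
    hits_by_fandom = defaultdict(list)
    for fandom, hits in works:
        hits_by_fandom[fandom].append(hits)

    def top(candidates, key):
        return sorted(candidates, key=key, reverse=True)[:top_n]

    # Most popular with creators: number of works.
    by_works = top(hits_by_fandom, lambda f: len(hits_by_fandom[f]))
    # Most popular with readers: total hits over all works.
    by_total_hits = top(hits_by_fandom, lambda f: sum(hits_by_fandom[f]))
    # Highest reader-to-creator ratio: median hits, big-enough fandoms only.
    eligible = [f for f in hits_by_fandom if len(hits_by_fandom[f]) >= min_works]
    by_median_hits = top(eligible, lambda f: median(hits_by_fandom[f]))
    return by_works, by_total_hits, by_median_hits

# Invented toy data: fandom A has many low-hit works, C has one huge hit.
toy_works = [("A", 10)] * 4 + [("B", 100), ("B", 200), ("C", 1000)]
by_works, by_total_hits, by_median_hits = rank_fandoms(toy_works, min_works=2)
print(by_works, by_total_hits, by_median_hits)
```

The minimum-works cutoff matters for the median list: without it, a single-work fandom with one very popular fic would top the ranking.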
Do fandoms produce works of the same length?
Tumblr media
Kind of surprisingly: no. Those are big differences: the median BTS fic is 70% longer than the median Sherlock or Supernatural fic! Also note how very small these values are. 50% of all the works in Sherlock fandom are under 1705 words. You can also see that in the wordcount histograms in the last post, of course.
A couple of other questions: how many works are there in a typical fandom?
Tumblr media
The most common number is 1! That’s very surprising to me.
I was also curious about how per-work hit counts relate to the number of works in a fandom. Naively, I would think that having more works in a fandom would increase hit counts: a person who reads a fic about fandom X is likely to want to read more fics about fandom X, so you build a self-sustaining readership if there are lots of fics to choose from. Also, since work creators are a subset of work readers, in general, what writers choose to write in is probably a good proxy for what readers are interested in reading; more fics means more people interested means more readers.
Here’s the actual relationship between number of works and median hit count:
Tumblr media
It’s kind of noisy (meaning the points move around a lot), but for fandoms with more than ~5 works, we do see that more works means more hits. The increase actually stops around 1000 works, which I should have predicted above. (I’ve cut off the graph because it’s very noisy above 10,000 works, but the flattening continues.) Apparently, that’s about the point where you have more works in a fandom than even a devoted reader could read. If you have 10 works, or 20 works, then every possible reader can read everything, so more interest means more hits. But once you have more works than people can read, then, basically, adding readers and adding creators cancel each other out in the average hits per work.
Also kind of interesting is that things with <5 works seem to have more hits on average. I suspect this is because of Yuletide, which steers people to rare fandoms they might not read on their own.
Up next: tags.
data-monkey · 5 years
AO3 stats project: basic questions
Post 2 in my series of posts on statistics of works posted on the Archive of Our Own. The previous post described how I got the data. This one is basic content questions: broad characteristics of the posted works. 
The Data | Basic Questions | Fandoms | Tags | Correlations | Kudos | Fun Stuff
Thanks to @eloiserummaging for beta reading these posts; any remaining errors are my own.  A Python notebook showing the code I used to make these plots can be found here.
My total dataset includes 4,337,545 works, with a collective hit total of 5,959,490,736 hits, and a collective word count of 28,543,023,393 words. That's equivalent to over 26,000 entire Harry Potter series of words. And the number of hits is an underestimate--anything posted before 2015 or so has the number of hits it had when I first grabbed the data in 2015, missing out on 3-4 years of collecting extra hits.
Let's start by looking at when works on the Archive were posted. (Remember, each work shows up only at the date of its most recent chapter because of how I downloaded the data--so this should underestimate the number of works posted on any given day.) Here's a smoothed graph showing the number of works posted per day over time:
Tumblr media
It increases over time, as you'd expect, since the number of people using the Archive is growing. Cool.
How does the number of works posted vary throughout the year? We have to take out that increasing trend, or we'll always show more works posted in December than January--not because individual authors prefer December, but because more people joined over the course of the year. So I'm going to plot the difference between the number of works posted on an individual day and the number of works that would have been posted if the plot I just showed was a straight increasing line from Nov. 15, 2009 until today. (It isn't, but all the stuff that isn't a straight line is the interesting part!) Here's a heatmap of posting days:
Tumblr media
On light-colored days, there were many more works posted than you would have expected if nobody had any preferences for day of posting. You can see Yuletide pretty clearly (and other winter-break fests), and also Valentine's Day! And, to a lesser extent, Halloween. There seems to be a little less activity in September through November, though that might be an artifact of my simple “linear increase” model. Other than that, no particular days stand out as popular posting days, averaged over the whole Archive.
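The detrending step behind that heatmap can be sketched roughly like this. The post only says the baseline is a straight increasing line starting Nov. 15, 2009; how the slope is chosen is my assumption here (a line through zero at the start date, scaled so the expected total matches the actual total), and the toy counts are invented:

```python
from datetime import date, timedelta

START = date(2009, 11, 15)  # start date named in the post

def linear_residuals(daily_counts, start=START):
    """daily_counts: dict mapping date -> number of works posted that day.

    Returns date -> (actual count minus the count expected from a straight
    line through zero at `start`). The slope choice is an assumption: it is
    set so that the expected counts sum to the actual total.
    """
    days_since_start = {d: (d - start).days for d in daily_counts}
    total_works = sum(daily_counts.values())
    total_days = sum(days_since_start.values())
    slope = total_works / total_days if total_days else 0.0
    return {d: n - slope * days_since_start[d] for d, n in daily_counts.items()}

# Invented toy data: steady growth for three days, then a spike on day 4.
toy_counts = {START + timedelta(days=k): n for k, n in [(1, 1), (2, 2), (3, 3), (4, 10)]}
residuals = linear_residuals(toy_counts)
for day, extra in sorted(residuals.items()):
    print(day, round(extra, 2))
```

Positive values are the "light-colored" days: more works than the straight line predicts. Any real curvature in the growth also ends up in these residuals, which is how a too-simple baseline could produce an apparent autumn lull.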
Do creators post more on certain days of the week? (Unlike the heatmap above, the colors aren’t important in bar charts like this--I’m just trying to make them look pretty. :) )
Tumblr media
Wow, yeah! About 20% more works posted on Sunday than on Friday, the least-popular day of the week for posting.
Is reader behavior the same? Since the default display is reverse chronological order, people are more likely to view works posted on the day they visited the Archive than works posted earlier, so we can look at the total number of hits for fics posted on the different days:
Tumblr media
This is basically the same as the previous chart, except Thursday just edges Friday as the lowest day of the week. That means that, on average, you’ll get the same number of hits regardless of which day you post. In the past, I’ve sometimes tried to post on a day I think lots of readers will be around to read it, but apparently that wasn’t necessary--readers will find my stuff anyway! Useful to know.
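The comparison between the two bar charts boils down to hits per work for each weekday. A small sketch, assuming each work is a `(posting date, hits)` pair (an assumed layout, with invented numbers):

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday_summary(works):
    """works: iterable of (posting_date, hits) pairs (assumed layout).

    Returns weekday name -> (number of works, total hits, hits per work).
    """
    works_per_day = Counter()
    hits_per_day = Counter()
    for posted, hits in works:
        name = DAY_NAMES[posted.weekday()]  # date.weekday(): Monday is 0
        works_per_day[name] += 1
        hits_per_day[name] += hits
    return {
        day: (works_per_day[day], hits_per_day[day],
              hits_per_day[day] / works_per_day[day])
        for day in works_per_day
    }

# Invented toy data: two Sunday works and one Friday work.
toy_works = [
    (date(2019, 3, 17), 120),
    (date(2019, 3, 17), 80),
    (date(2019, 3, 15), 100),
]
summary = weekday_summary(toy_works)
print(summary)
```

In the toy data Sunday has twice the works and twice the total hits, so the hits per work come out the same on both days, which is the pattern the two charts show.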
Works aren't tagged with their work type as one of the default pieces of information (unlike, say, word count), but people who post things other than fanfiction can choose to tag their works. The most common non-fanfiction things are podfic (audiobooks), fanart (art), and fanvids (film/video art): how many works get tagged with those tags?
Tumblr media
Again, that's probably an undercount because not everybody tags, and I may have missed some of the relevant tags. This adds up to between 1 and 2% of the total works count in this dataset.
Not everything on the Archive is in English--here are the top ten languages...
Tumblr media
Very dominated by English. Still, I wouldn’t have guessed some of those languages--Bahasa Indonesia in particular, but I also wouldn’t have put Italian or Polish that high.
How are the works distributed among ratings and categories? (Note that works can have multiple categories, but only one rating.)
Tumblr media Tumblr media
I had no idea the ratings distribution looked like that! But I’m not surprised that M/M is dominating the categories. I suspect that looks different at other fic archives like Wattpad and ff.net--the Archive caters to an older and queerer audience, if I remember my fan studies articles properly. Although it might be conventional wisdom more than actual research that makes me think this, so maybe they would look the same.
Users can post complete works to AO3, or they can update works a chapter at a time (often called a “WiP”, or a work in progress). What fraction of the works on AO3 are currently in progress?
Tumblr media
I would have thought the number of works in progress would be higher than that. But I suppose a number of the completed works are former works in progress, now completed. Does this look different if we restrict to the last year or so?
Tumblr media
The bars are much closer in height--there are more works in progress, as a fraction of everything that was posted, in the last couple of years. So, assuming the desire to post WiPs hasn't grown appreciably over time--assuming about ¼ of the fics that get posted start out as WiPs, in 2010 as well as 2019--then the only way to explain the fact that less than ⅕ of all works are WiPs now is that some older works used to be WiPs but have since been completed. Cool! And good news for people who are reading WiPs--they get finished. :)
Some more basic questions: how long are the works on the Archive?
Tumblr media
Hmm. Okay. So the longest work on the AO3 is over 3 million words. That makes this plot kinda hard to read, because that giant bar on the left includes everything up to about 60,000 words! So I’m going to use a different kind of plot, where I mess with the scale on the bottom so we can see things more clearly--this is called a “log plot” (for logarithm). Now, instead of the vertical lines being every 500,000 words, they indicate factors of 10:
Tumblr media
Ah, now we can see more clearly: the most common wordcount on the Archive is somewhere between 1,000 and 3,000 words, and almost all the works have at least 100 words. That most common wordcount of just a couple of thousand is pretty surprising to me, actually; I thought it would be far longer. But I’m also surprised at just how many 100,000+ word works there are as well.
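If you want to try the power-of-10 binning yourself, here is a coarse sketch. It uses one bin per factor of 10, which is much chunkier than the bins in the plot, and it counts digits instead of taking logarithms to sidestep floating-point rounding at exact powers of 10. The word counts are made up:

```python
from collections import Counter

def decade_histogram(word_counts):
    """Count works per factor of 10.

    Bin k holds works with 10**k <= words < 10**(k + 1).
    Works with fewer than 1 word are skipped (no logarithm for them).
    """
    bins = Counter()
    for words in word_counts:
        if words < 1:
            continue
        # Number of digits minus one equals floor(log10(words)) for integers.
        bins[len(str(int(words))) - 1] += 1
    return dict(sorted(bins.items()))

# Invented word counts, for illustration only.
toy_word_counts = [50, 120, 1500, 1705, 2500, 60000, 3000000]
histogram = decade_histogram(toy_word_counts)
for k, n in histogram.items():
    print(f"{10**k:>9,} to {10**(k + 1) - 1:>10,} words: {n}")
```

On a linear axis all but the last of these would land in the same leftmost bar; binning by factors of 10 spreads them out.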
What about chapter counts?
Tumblr media
That’s fewer chaptered works than I expected, by quite a bit. On the other hand, I was also surprised by how short most works are, so I guess that makes sense: no need to make a 1000-word fic chaptered, in general.
And indeed, as you'd expect, the average number of chapters goes up with word count:
Tumblr media
Though I don't have a pretty plot, I can also tell you that 24% of the works posted on the Archive are a part of a series!
Most of that was about creator behavior. What does reader engagement look like? I’m going to keep using those plots with the power-of-10 x-axis, just so we can see things a little better.
Tumblr media Tumblr media Tumblr media Tumblr media
Lots of things get 10-100 kudos, which is nice for authors. Bookmarks and comments are much less common than kudos, though they track with each other pretty well. Note that more things have 2 comments (the tallest bar between 1 and 10) than 1 comment--I guess that means most authors are better than me at responding to comments. :)
Next up: fandoms!
data-monkey · 5 years
AO3 stats project: data provenance
A few years ago, I was curious about some of the characteristics of works posted on the Archive of Our Own, so I scraped some (lots) of data and went about analyzing it. This series of posts describes that analysis. It’s broken down into 7 posts; this is the first one, describing the data set and how it was collected.
The Data | Basic Questions | Fandoms | Tags | Correlations | Kudos | Fun Stuff
Thanks to @eloiserummaging for beta reading these posts; any remaining errors are my own.  A Python notebook showing the code I used to make these plots can be found here.
In case you’ve wandered in here without knowing about the Archive of Our Own (aka AO3), here’s a brief primer. It’s a fan-run, nonprofit archive for all kinds of “fan works”--that means fanfiction, but also other media such as fanart, podfic (fanfiction audiobooks), fanvids (audiovisual media), meta (fanwork criticism), etc. The AO3 has a ratings system and a warnings system, but they’re both optional in the sense that you can choose an “abstain” option (“No Rating” or “Choose Not To Use Archive Warnings”, respectively). When you do a search or look at an index page, you can generally see the title, fandom, creator, summary, warnings, rating, category (since many fanworks are romantic/sexual in nature, this is basically either “gen” for no romance, or the gender configuration of the main relationship(s)), language, word count, comment count, kudos count (you leave “kudos” for the author instead of hitting a “like” button), and whether the work is part of a series. On many works, but not all, you can also see the hit count--this is turned on by default, but authors can turn it off. And finally, there are also generally a number of freeform tags. The AO3 has one of the best tagging systems out there, and I’ll make use of that in some later posts.
Before we get into this, I wanted to shout out @destinationtoast, who has been on the fan stats beat for years. Some of what I’m going to show in the next few posts will duplicate work she or others have done. I’m still showing it because our data collection methods are different, and so the answer they get and the answer I get may be different. Redoing some of the analysis means that all the graphs I am showing are self-consistent. But you should really go check out @toastystats and her great masterpost of stuff if you’re interested in this topic.
For my analysis, I collected data by scraping AO3 works pages in most fandoms over a period of several years, skipping large subfandoms so I didn’t duplicate information--e.g., I did not download “Stargate: SG-1” data because all of those works are also included in “Stargate - All Media Types”. I attempted to be kind to the server load by leaving lag times between successive pages and taking frequent breaks, and I also have regularly donated to the OTW, more than enough to offset the cost of my downloads for this project. The total amount of data is well under a single day’s load on the servers, and collected over several years, although I would guess that on my peak downloading days I was probably the #1 user by data volume.
Here’s what the actual collection dates look like:
Tumblr media
I grabbed a bunch of stuff in 2015, updated again in 2016, let things lie for 18 months, and then picked back up. So the data set I have contains everything posted in 2017 or earlier that was still available when I collected the data, and most (but not all) of the data posted in 2018 or later. 
To avoid re-downloading, if I went back to update a previously-downloaded fandom, I only went back far enough to get to the newest works I’d seen the previous time I downloaded. That means that anything posted before 2016 or so is “frozen” in this data set--it hasn’t had its hits, kudos, or comments updated, and if other changes were made (orphaning, deletion) that is also not captured. If a work moved up in the sort date (for example, if it had a new chapter added), the new data was used instead of the earlier collected data.
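In code, the update rule looks roughly like the sketch below. It is an illustration of the idea, not my actual scraper: the record fields (`id`, `updated`, `hits`), the ISO date strings, and the "pages" of listings sorted newest first are all assumptions made for the example.

```python
def incremental_update(stored, pages):
    """Merge newer listing records into `stored`, stopping early.

    stored: dict mapping work id -> record dict with an 'updated' ISO date.
    pages:  iterable of listing pages, each a list of records, newest first.
    Returns the number of pages that had to be read.
    """
    newest_seen = max((rec["updated"] for rec in stored.values()), default="")
    pages_read = 0
    for page in pages:
        pages_read += 1
        for rec in page:
            # Stop as soon as we reach something no newer than what we have.
            if rec["updated"] <= newest_seen:
                return pages_read
            # New work, or an old work that moved up: newest data wins.
            stored[rec["id"]] = rec
    return pages_read

# Invented toy data for illustration.
stored = {
    1: {"id": 1, "updated": "2015-06-01", "hits": 10},
    2: {"id": 2, "updated": "2015-05-01", "hits": 5},
}
pages = [
    [{"id": 3, "updated": "2016-02-01", "hits": 7},
     {"id": 2, "updated": "2016-01-15", "hits": 50}],  # work 2 got a new chapter
    [{"id": 1, "updated": "2015-06-01", "hits": 99}],  # already seen: stop here
    [{"id": 9, "updated": "2014-01-01", "hits": 1}],   # never read
]
pages_read = incremental_update(stored, pages)
print(pages_read, stored[1]["hits"], stored[2]["hits"])
```

Work 1 keeps its old hit count even though the listing shows a newer one; that is the "frozen" effect described above. Work 2 is refreshed because its new chapter moved it up the sort order.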
I was not logged in, so no private works appear anywhere in this data set. (So you should probably think of this as “stats of public AO3 works” not “stats of AO3 works”--they will probably be different because some fandoms are more likely to privatize their fic than others.) Because I’m not able to get individual permission from authors to show their data, I’ll only be showing aggregate data--the rough cutoff in my head is that I only show data points that represent >1000 works.
I also wanted to download tag information, because the Archive tagging system allows users to choose any tags they like, and then bundles those tags into synonyms in the backend--allowing somebody to misspell a tag “John Waston” instead of “John Watson”, for example, but preserving that they meant “John Watson” for purposes of searching. I downloaded the tag description page for any tag which appears in the data >=500 times and used that data to unify all synonyms of the most-used tags. Tag pages are not redownloaded once downloaded, so tag updates since May 2015 are mostly not represented here, except for a few tags which crossed my lower-cutoff threshold sometime between May 2015 and 2018.
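A sketch of that synonym unification, under loud assumptions: `tag_pages` is a hypothetical stand-in for the downloaded tag description pages (canonical tag mapped to its listed synonyms), and the threshold is lowered from 500 to 2 so the toy example fits on screen.

```python
from collections import Counter

def unify_tags(works_tags, tag_pages, min_count=500):
    """Fold synonyms into canonical tags, for frequently used tags only.

    works_tags: list of tag lists, one per work.
    tag_pages:  hypothetical dict, canonical tag -> list of its synonyms,
                standing in for the downloaded tag description pages.
    Only pages for tags used at least `min_count` times are consulted.
    """
    raw_counts = Counter(tag for tags in works_tags for tag in tags)
    synonym_of = {}
    for tag, n in raw_counts.items():
        if n >= min_count and tag in tag_pages:
            for synonym in tag_pages[tag]:
                synonym_of[synonym] = tag
    unified = Counter()
    for tag, n in raw_counts.items():
        unified[synonym_of.get(tag, tag)] += n
    return unified

# Invented toy data; the misspelling is the example from the text.
toy_works_tags = [["John Watson"], ["John Watson"], ["John Waston"], ["Rare Tag"]]
toy_tag_pages = {"John Watson": ["John Waston"], "Rare Tag": ["Rare Tagg"]}
unified = unify_tags(toy_works_tags, toy_tag_pages, min_count=2)
print(unified)
```

Tags below the threshold are left exactly as typed, so rare misspellings of rare tags stay separate.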
Overall, I downloaded meta-information for 4,337,545 works. I won’t be sharing this dataset because it includes now-deleted works, but if you have a question that a dataset like this could answer, I can try to answer that question, time allowing.