Managing Data
Successes!
Mirza: S. Thailand cit sci project worked w/ villagers to conserve hornbills. People had been collecting them to supplement income; the project got them to become research assistants instead. Started in 1994 and is ongoing; taught stewardship. Taught people to collect data on diet and breeding, and the data have been published in peer-reviewed journals. Villagers are presenting their work at a conference in May. Success evaluated by the end of poaching; the model has been duplicated in India.
Sesh: wetland waterfowl census, surveying birds in an area with no prior survey. A synchronized count wasn't possible, so they recruited volunteers, some of whom were wealthy and helped sponsor the effort. Called for volunteers in the newspaper and did basic screening. To ensure data quality, one person who had done counts in advance would go with each team, so accuracy could be measured. Prior to the field work, Tamil field guides were developed and training was held. The effort took off, and the wealthy volunteers started the Pearl City Nature Society, still in operation, grown from 15 to 70 people. Data quality was robust in a straight-up comparison; since it was a census, relative abundance was close enough. Data were owned by the organization; volunteers signed indemnity forms on data ownership, and contributors were co-authors. Initially limited by the number of people on board, then organized into teams as crowds grew. 300 lakes in 2 days!
Michiel: youth nature orgs in Netherlands, run by people up to 22 yo, worked well in the past but maybe not so much now. What worked well is that the age range was 14-23, a real transition of expertise occurred, and moved up to leadership as they developed knowledge. Whole organization and everything was run by people under 22. This probably helped to get such strong involvement early on, early leadership roles compared to other organizations. Majority of the best biologists in the country came out of these organizations, so it's a real pipeline. They take it very seriously, getting the data, writing the reports, and distributing to national parks. Had to get permits, but that was all arranged by students, including sponsorship, national conferences. Has fewer participants now, but people took it seriously and were really field biologists/naturalists by 17. Consensus on ID for data. Still have logbooks.
Fei: CLEAR - water quality in 2016, for 6 months. Studied upstream and downstream locations on a river, sampling at 14 locations along the stream every weekend. Approached villagers at locations near the stream, which helped them recognize the value of taking action, so they started to steward the stream differently. After 6 months of data collection, villagers asked about the results and the condition of the river, and recognized their own role in managing the sites. Villagers sent some youths to learn from the team; a 2-day workshop covered a basic (not detailed) assessment of the river with visual review and organism evaluations. When first approached, some villagers didn't see the problems with their construction practices, but afterward they were able to bring the issues to government officials.
Pei Rong: Intertidal Watch, a young program that just started having volunteers last year, so one year of data. A student is comparing cit sci volunteers to "experienced" volunteers within the same quadrats, across 3 or 4 surveys, to assess how the volunteers are doing and how much emphasis/trust to put in the data. They tried to get volunteers to ID to species, but sometimes they can't get quite that far; that's the biggest difference between the cit sci and experienced volunteers. Does it make a difference to ID to species level, or does family level work alright? The data can still be used because they're looking at community changes over time and between sites, so the data will still be good for decision support.
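The species-vs-family question above can be quantified with a simple agreement rate. A minimal sketch, assuming a hypothetical record format with `species` and `family` fields (real Intertidal Watch data will differ):

```python
# Compare volunteer IDs against experienced-volunteer IDs for the same
# quadrat at two taxonomic levels. Record format and example taxa are
# illustrative assumptions, not the project's actual data model.

def agreement_rate(volunteer, expert, level):
    """Fraction of paired records whose IDs match at the given level."""
    matches = sum(1 for v, e in zip(volunteer, expert) if v[level] == e[level])
    return matches / len(volunteer)

volunteer = [{"species": "Nerita albicilla", "family": "Neritidae"},
             {"species": "Nerita undata",    "family": "Neritidae"}]
expert    = [{"species": "Nerita albicilla", "family": "Neritidae"},
             {"species": "Nerita polita",    "family": "Neritidae"}]

print(agreement_rate(volunteer, expert, "species"))  # 0.5
print(agreement_rate(volunteer, expert, "family"))   # 1.0
```

Running both levels side by side shows how much signal survives when volunteers stop at family, which is the practical question for decision support.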
Challenges
NCSU guy - Validation: how to separate signal from noise? Finding more or less well-defined galaxies in GZ, with 3 to 15 viewings per galaxy: 10 seems pretty good, 3 seems like very few. OK if they're weighting by performance, but otherwise sketchy. More replication is less efficient and demands more participation.
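A minimal sketch of the performance-weighting idea mentioned above (this is an assumption about how such weighting could work, not Galaxy Zoo's actual algorithm): each volunteer's vote counts in proportion to a weight derived from past accuracy, so fewer viewings can still yield a trustworthy consensus.

```python
# Performance-weighted consensus over repeated classifications.
# Volunteer weights here are hypothetical past-accuracy scores.
from collections import defaultdict

def weighted_consensus(votes, weights):
    """votes: list of (volunteer_id, label); weights: volunteer_id -> accuracy."""
    totals = defaultdict(float)
    for volunteer, label in votes:
        totals[label] += weights.get(volunteer, 1.0)  # unknown raters get weight 1
    label = max(totals, key=totals.get)
    # return the winning label and its share of total vote weight
    return label, totals[label] / sum(totals.values())

votes = [("a", "spiral"), ("b", "spiral"), ("c", "elliptical")]
weights = {"a": 0.9, "b": 0.8, "c": 0.4}
print(weighted_consensus(votes, weights))  # ('spiral', ~0.81)
```

The returned share doubles as a confidence score, which is one way to decide whether 3 viewings are enough for a given object or whether it should be queued for more.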
Sesh: India Biodiversity Portal; he is a gatekeeper for frogs. Contributors have to upload a photo for verification and can get help with ID suggestions. If 3 people agree, the ID is fixed until corrected. After a while it goes to one of the gatekeepers. Has noticed people will just drop names without giving a rationale, and community consensus can flip an ID the wrong way, requiring manual correction, so gatekeepers are helpful. Others just put "ID please" without attempting an ID themselves; that's OK for beginners, but it can't become a habit.
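The "three agreeing IDs lock the record" rule described above can be sketched as follows (function and field names are hypothetical, not the portal's actual API; the gatekeeper override reflects the manual-correction role Sesh describes):

```python
# Community consensus with a quorum, plus a gatekeeper override.

def resolve_id(suggestions, gatekeeper_id=None, quorum=3):
    """suggestions: species names proposed by the community, in order."""
    if gatekeeper_id is not None:
        return gatekeeper_id, "gatekeeper"   # manual correction always wins
    counts = {}
    for name in suggestions:
        counts[name] = counts.get(name, 0) + 1
        if counts[name] >= quorum:
            return name, "locked"            # fixed until someone corrects it
    return None, "open"                      # still awaiting consensus

print(resolve_id(["R. temporaria", "R. arvalis",
                  "R. temporaria", "R. temporaria"]))
# ('R. temporaria', 'locked')
```

Note the failure mode Sesh raises: if three people drop the same wrong name without rationale, the record locks wrongly, which is exactly why the gatekeeper path exists.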
Goh: depending on the study, cit sci isn't always the best approach. Horseshoe crabs were successful because they're easy to ID. Butterflies are hard: the species are hard to tell apart, and if you overwhelm a non-trained volunteer with a passing interest with 200 species, they give up. Some projects are less accessible.
Aleron: evaluating many papers, you see that validation studies show higher accuracy when projects are more focused, with fewer species. Multi-taxon projects are hard; inverts and marine taxa are difficult. Many papers show that with only a few species, validity can be quite high. Can we use citizen science for a good baseline?
Ryan: Baseline for what -- some data is better than none, or from this we can monitor and understand change? Latter is more preferred. At this point we take snapshots and see how things change.
Michiel: "some data is better than no data" -- might disagree, depending on how people see it. If there's a perception that data exist, the incentive is to act on them and shift money from one side to another, which is not always what you want. If decision-makers think your data is good when it isn't, that's risky. Don't oversell; it sends the wrong signal and the wrong understanding of science. The goal is to design the study, not just let people collect data and see what comes of it.
Pei Rong: but it's about getting different, complementary data, not data for its own sake. Data from different projects vary in quality; keep the assumptions in mind, weigh how much emphasis to put on each data set, and it can still be useful for decision-making.
Sesh: sustaining efforts -- you invest time and money, and it so often happens that a project gets to a point, runs into problems, and dies off. Then the data, whether on an online platform or a private server, just go away. How can people sustain a project, whether online or on the ground? In one project, the government got involved and the data go to the national archives, so they're safe, but who knows what happens after 30 years.
Aleron: the problem is funding? motivations?
Sesh: both -- continuity, meaning ongoing participation and funding.
Fei: sometimes people ask about money, but they don't have any, people may come and go. This is a normal ebb and flow.
Pei Rong: faces these issues all the time and thinks about sustainability all the time. Make it so simple and easy that anyone can do the survey and analyze the data. Seek engagement and buy-in from various groups, including parks managers, to help run surveys. Train volunteers to lead their own groups; if things are simple enough, you can expand. Automating the analysis isn't figured out yet, but can probably be done.
Boyi: volunteers want to be appreciated. Send an e-certificate in appreciation, to keep them on board.
Michiel: what's the balance of what people want? Feeling part of a thing, like member in a project, or self-satisfaction of simply doing it?
Pei Rong: some people like a gift, some people want the data, some people want access to limited-access locations. Sometimes people thank her for the privilege of doing monitoring.
Aleron: a more academic approach: what motivates volunteers? Consolidate a combination of variables -- how long they have been involved, whether they prioritize activities, etc. -- and predict their motivation. Different projects report generally similar motivations, with a few differences. Singaporeans are more out to learn: he sees a difference in his data compared to overseas data, where people are out to be social with each other. Important to know who is participating.
Pei Rong: Important to be a case study for another researcher! Aleron is helping evaluate aspects of her project. Another collaborator is sampling in an overlapping/complementary fashion to see how the results compare.
----
Next steps/questions
Ryan Thomas: from those who shared, what platforms do you use? Particularly in US cases, smartphones are being pushed heavily, especially with GPS, though it's not always the best match.
Mirza: Indonesian ... Society just launched, in its initial state. Amphibi-Reptile-K? "our amphibian reptiles," for species abundance & distribution. Used the iNat app; quite good because data is already in it, can connect to older data, and can use image metadata from a photo, record notes on paper, then enter them online later. Making an app takes a long time, so they used what's already there. It can be translated, but that's not easy because you have to go via iNat and wait a long time. Translation is complex.
Fei: same problem. Using paper and pen, then sharing by phone; upstream there's no reception because of the water catchment and hills. When involving the community, they're not trained to enter the data and don't always give full details. Media coverage made people concerned, and that got them interested. Brought a pro in.
Goh: smartphones for horseshoe crab population monitoring, measuring body size and length. Started with pen and paper to measure and record, but got to the point where there was lots of paper and no one to type it in. Switched to Google Forms on smartphones. Felt it was OK: no need for the GPS function and no photos required, which reduced the smartphone's drawbacks for this use. Also, horseshoe crabs move slowly. Just a straight-up replacement for pen and paper.
Vasa (sp?): Bangalore, cell phone and paper both in use; participants came from all walks of life, so they couldn't require tech or hand any out. GPS Essentials app works well, with 5 m accuracy. The only problem: if the coordinate format isn't set right, people come in with different kinds of data. Mistakes in conversion and manual-entry formatting errors put data points in the wrong location, so they have to filter/review for that. Another app, FrogFind, is specifically for the Western Ghats, with similar apps for trees, birds, butterflies, etc. All sightings can be logged w/o a cell connection, just GPS, and sync when you're back on the network. Describe the ID and there are reviewers to help. Another app, BirdExplorer, linked with eBird, lets you select the name of the bird rather than type it (less error), and select a location to generate a dynamic checklist of what's expected in that place. While mapping lorises in Bangalore, one record came in 1,000 km away; people didn't know lorises even existed there, were excited when they heard about it, and got involved.
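The filter/review step for misplaced points can be sketched as a simple bounding-box check (the Bangalore bounds below are rough and only illustrative). Format-conversion mistakes often land a point hundreds of kilometers away, so even a crude box catches many of them:

```python
# Flag records whose coordinates fall outside the expected survey region.
# Bounds are an approximate, illustrative box around Bangalore.

BLR_BOUNDS = {"lat": (12.7, 13.3), "lon": (77.3, 77.9)}

def flag_outliers(records, bounds=BLR_BOUNDS):
    ok, suspect = [], []
    for rec in records:
        lat_ok = bounds["lat"][0] <= rec["lat"] <= bounds["lat"][1]
        lon_ok = bounds["lon"][0] <= rec["lon"] <= bounds["lon"][1]
        (ok if lat_ok and lon_ok else suspect).append(rec)
    return ok, suspect

records = [{"lat": 12.97, "lon": 77.59},   # plausible local sighting
           {"lat": 21.97, "lon": 77.59}]   # likely a format/conversion slip
ok, suspect = flag_outliers(records)
print(len(ok), len(suspect))  # 1 1
```

Suspect records go to manual review rather than being dropped, since the distant loris record above shows a far-away point can also be a genuine discovery.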
Sesh: data sheets, field notebooks, topo maps with lakes marked out. Very old school. All possible things that could go wrong were worked out. Made them spend 2 hours entering data once they returned from the field, then a press release goes out that night with their names on it. Their lives and agriculture depend on these lakes, so the idea is getting the young people to understand this to get the incentive to protect them. Getting news out to media was one of the best ways to make that happen, they got it and did it, and created their own projects and organizations.
Michiel: Sampling design as an issue for data quality. Sometimes with adequate data, you can get around it. But with too little data, what do you do? A few strategies are out there. Lots of restoration work being done, but no monitoring. No follow up. Often they fail to take. Maybe funding opportunities to complement what people are doing? If people are doing it anyway, try to steer them to adapt so you can get data out of it, background sampling.
Pei Rong: Was doing pen/paper. Has asked SGBio to add a few fields to their app that would support her needs. This year plans to run qual surveys. Just a species list of what's in the area instead of counting. Both meet certain objectives, depends on how you want to use the data, both can make sense, better than not having any data at all. Also collect complementary professional data once a year for cross-checking. Reporting matters -- always some level of uncertainty, important to report it honestly. Also need to communicate the importance of following protocol properly, explain WHY certain things are done this way, they learn more about science and do it better.
Michiel: explaining the protocols is a useful step; people understand sampling better, which is a great contribution to a more informed population. Push to involve people in the data, even if they're not that into it. Incentivizing people with visuals is a good strategy; a feedback loop helps engage them, e.g. health trackers. Most people never look at graphs and most students are allergic to them, so when people without a quant background engage with the data, that's amazing.
Mirza: concern over poaching of critically endangered species. How to deal with this? Location fuzzing, and different views for public vs. internal use. Has to be decided case by case.
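One common form of location fuzzing is snapping public-facing coordinates to a coarse grid while internal views keep the exact point. A minimal sketch (the grid size is an arbitrary example; as noted above, the right coarseness is a case-by-case decision):

```python
# Snap coordinates to a grid cell for public display.
# At 0.1 degrees, one cell of latitude is roughly 11 km.

def fuzz(lat, lon, grid=0.1):
    """Return coordinates snapped to the nearest grid intersection."""
    snap = lambda x: round(round(x / grid) * grid, 6)
    return snap(lat), snap(lon)

exact = (1.23456, 103.98765)   # internal record keeps this
print(fuzz(*exact))            # (1.2, 104.0) -- what the public sees
```

Snapping (rather than adding random jitter) has the advantage that repeated sightings of the same animal don't scatter into a revealing cloud around the true location.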
Patrick: an astro cit sci project online to measure how tightly wound spiral arms are in galaxies. Since it's a website, participants could just click around -- less commitment and accountability than going into the field. Ensuing discussion on how data validation has been done on online platforms.