YDT Blog

In the YDT blog you'll find the latest news about the community, tutorials, helpful resources and much more! React to the news with the emotion stickers and have fun!

[D] Could we repurpose an ML algo that detects lies using sub-dermal bloodflow analysis for COVID-19 diagnosis?

Here is the basis for the technology. Detecting cardiovascular abnormalities might be easier than lie detection. The training set could be readily available if test givers could also obtain video footage. Even if the margin of error were 10%, it could save countless lives if it erred on the safe side.
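
For context, the linked approach builds on remote photoplethysmography (rPPG): subtle frame-to-frame color changes in skin track blood flow. Purely as an illustration of the signal involved (a toy sketch of my own, not the linked system; the function name, band limits and peak-picking are assumptions), a rough pulse estimate from video could look like:

    import numpy as np
    from scipy.signal import butter, filtfilt

    def estimate_bpm(frames, fps):
        # frames: (T, H, W, 3) RGB video of a skin patch; fps: frame rate.
        green = frames[..., 1].reshape(len(frames), -1).mean(axis=1)
        green = green - green.mean()
        # Band-pass 0.7-4 Hz (42-240 bpm), the physiologically plausible range.
        b, a = butter(3, [0.7, 4.0], btype="band", fs=fps)
        pulse = filtfilt(b, a, green)
        # Dominant spectral peak as a crude heart-rate estimate.
        freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
        return 60.0 * freqs[np.abs(np.fft.rfft(pulse)).argmax()]

Anything diagnostic would of course need far more careful signal processing and clinical validation than this.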

submitted by /u/flamingspew

Source: Reddit Machine Learning

[D] Discriminator loss (SDDE) collapsing in an unexpected way.

Hello,

I'm trying to implement this paper: https://www.aclweb.org/anthology/N19-1255/

During the training process, the task is to encode the sentences into embeddings and use those embeddings to create the document embedding.

We then pick a few positive and negative samples with respect to the original document, and with the help of the discriminator we try to predict whether each sample belongs to the document or not.

Now, during training, the loss rapidly falls to a very large negative value. Printing the output shows that the discriminator predicts either all zeros or all ones, regardless of whether a sample is positive or negative.

The task is to minimize the loss, which it actually did; however, the results are not helpful in any way.

Since the output of D is between 0 and 1, I also add a small epsilon (1e-6) inside the log. The loss always converges to about -13.8 ≈ log(eps), so the network does not end up learning anything.
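
This is not the paper's exact loss, but a minimal sketch of the saturation failure described above, plus one generic mitigation (BCEWithLogitsLoss is my substitution, not something the paper prescribes):

    import math
    import torch

    eps = 1e-6
    print(math.log(eps))  # -13.8155...: the floor the loss is hitting

    # A saturated sigmoid makes log(1 - p + eps) bottom out at log(eps),
    # and almost no gradient flows back through the saturated region.
    p = torch.sigmoid(torch.tensor([40.0]))  # p == 1.0 in float32
    print(torch.log(1 - p + eps))            # tensor([-13.8155])

    # Keeping raw logits and using BCEWithLogitsLoss avoids log(0)
    # entirely via the log-sum-exp trick.
    criterion = torch.nn.BCEWithLogitsLoss()
    logits = torch.tensor([40.0])     # raw discriminator score, no sigmoid
    target = torch.tensor([0.0])      # label for a negative sample
    print(criterion(logits, target))  # finite loss with a useful gradient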

So, what was the expected result here? It seems the optimizer did its work. Should I try a different loss function?

[Image: the loss]

Here are some of the implementation details:

The encoder and decoder are exactly the same as in the paper.

Dataset: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

Pre-processing includes removing HTML tags, punctuation and other unwanted characters, and low-frequency words, and finally dropping documents with fewer than 3 sentences (a rough sketch follows this list).

Library: PyTorch

Pre-trained embeddings: none (embeddings are jointly trained)
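
Since a divergence could also hide in the preprocessing, here is a rough sketch of the cleanup steps listed above (my own approximation; the regexes and thresholds are assumptions, not the paper's):

    import re
    from collections import Counter

    def preprocess(docs, min_freq=2, min_sentences=3):
        # Strip HTML tags, lowercase, keep only letters, periods and spaces.
        docs = [re.sub(r"<[^>]+>", " ", d.lower()) for d in docs]
        docs = [re.sub(r"[^a-z.\s]", " ", d) for d in docs]
        # Corpus-wide word counts for the low-frequency filter.
        counts = Counter(w for d in docs for w in d.replace(".", " ").split())
        out = []
        for d in docs:
            sents = [s.split() for s in d.split(".") if s.strip()]
            sents = [[w for w in s if counts[w] >= min_freq] for s in sents]
            sents = [s for s in sents if s]
            if len(sents) >= min_sentences:  # drop documents with < 3 sentences
                out.append(sents)
        return out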

Please help me if I am missing something crucial while implementing the paper.

submitted by /u/Vitrioil

Source: Reddit Machine Learning

[D] Item position in Clustering with Gravitational Emulation Local Search

This is from the paper Efficient clustering in collaborative filtering recommender system: Hybrid method based on genetic algorithm and gravitational emulation local search algorithm.

According to this paragraph from the paper:

The current solution moves in accordance with its current location and its new velocity in the dimension d, and is located in a new location in the dimension d. In fact, the new velocity value is added to the number value of all items in the dth cluster in the current solution

Suppose I have an item set O = {O1, O2, O3, O4, O5}.

Assume I have a solution encoded like this, where C1 is the position in the 1st dimension and C2 is the position in the 2nd dimension. With a velocity value of 1, I want to move the current solution in dimension 1 (C1):

C={C1={O1, O3}, C2={O2, O4, O5}}

How do I move the current solution? The paper mentions adding the velocity value to the number value of all items in the d-th cluster. How do you add 1 to all the values in C1?

I've thought about converting the values of C1 to item indices, so I can add 1 to every item in C1.

But if I add 1 to every item index in C1… (one possible interpretation is sketched below)
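
One possible reading (an assumption on my part, not something the paper confirms) is to encode the solution as a cluster-assignment vector and wrap the shifted cluster indices modulo the number of clusters k, so that the items of the d-th cluster move to another cluster:

    # assignment[i] is the cluster index (0..k-1) of item O(i+1).
    def move_solution(assignment, d, velocity, k):
        step = int(round(velocity))
        return [(c + step) % k if c == d else c for c in assignment]

    # C = {C1 = {O1, O3}, C2 = {O2, O4, O5}} as an assignment vector:
    assignment = [0, 1, 0, 1, 1]  # O1..O5; cluster 0 is C1, cluster 1 is C2
    print(move_solution(assignment, d=0, velocity=1, k=2))
    # prints [1, 1, 1, 1, 1]: O1 and O3 move from C1 to C2

The modulo is what keeps the shifted values valid cluster labels; whether the paper intends this wrap-around is exactly the open question.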

submitted by /u/AndRegaze

Source: Reddit Machine Learning

[D] (Rant) What annoys me the most in a time of Machine Learning hype and the current pandemic.

First, this rant is not aimed at people who really know their stuff and understand the limits of ML and other approaches.

In recent years, too many people have treated machine learning as a sort of silver-bullet solution. The attitude seems to be: "you build a neural network (or whatever other technique sounds cool) and after a bit of time it quickly finds the solution for you". Then they proceed to mention DeepMind's achievements with AlphaZero, MuZero, AlphaGo, AlphaStar and so on.

Some months ago I read here, if I am not mistaken, a nice subthread where some people pointed out that it all depends on how well the domain is modeled.
If the domain model is incomplete, inaccurate or wrong, even the most effective machine learning techniques won't help. Others correctly pointed out that one cannot boast about ML methods if, in the end, the problem is not properly modeled.

The best example to me is the current pandemic. If those methods were really that effective, we could expect quick solutions. Instead, modeling the course of a disease in a human body is so complex that, well, good luck. Surely it will eventually be done, even if only with good approximations, but getting the point across – that the domain has to be properly simulated – to the most hyped people is really hard. And even when the simulation is proper, there is no guarantee that a good solution will be found.

That is really frustrating at times in a discussion. Sometimes one reads "Go is incredibly complex, why shouldn't they achieve a similar goal for real-life problems?", and that shows how much people underestimate reality.

submitted by /u/pier4r

Source: Reddit Machine Learning

Best countries to immigrate to as a Data Scientist – from Iran!

Hey Everyone,

So after a decade of work as an economic analyst, business analyst and data scientist in Iran, I have come to realize that the local economy has zero interest in becoming more data-driven. Data resources are scarce, and something as crucial as EDA is frowned upon out of fear that it may expose financial corruption, contradict government-provided statistics, etc. The most exciting thing that can happen in a data scientist's life here is building recommender systems for e-commerce platforms, the ones that have not gone under as a result of sanctions, anyway.

The situation is so severe that the job-search term "Data Scientist", which returns tons of opportunities in other countries, yields next to nothing in Iran (try this on LinkedIn yourself for an impression of what I am talking about).

I’m 32, B.Sc. in Industrial Mathematics, M.Sc. in Industrial Engineering, (specializing in Technology Foresight and Sustainable Development) both from top technology schools in Iran. I do have a fairly solid track record in quantitative business analysis/development.

This great community has always helped me make informed decisions. I would welcome any advice as to which countries offer better prospects for an aspiring data person. I do realize that I may have to start by interning at a new company overseas and I am humble enough to do just that.

Thank you!

submitted by /u/payamv2

Source: Reddit Data Science

[P] SOLT: A fast, user-friendly, flexible and PyTorch-integrated data augmentation library

Hi,

I have recently made a new release of my data augmentation library SOLT: https://github.com/MIPT-Oulu/solt.

Docs and examples: https://mipt-oulu.github.io/solt/.

I have not really promoted it much before, since I've been really busy with my PhD, but the project is almost 2 years old. Now that it has matured enough, I would like to share it with the community.

Some features:

  • Tight PyTorch integration
  • Speed: thanks to the tight PyTorch integration, we beat other libraries when evaluating full pipelines
  • Full documentation
  • Convenient serialization/deserialization to and from YAML or JSON
  • Native support of geometric transformations for all data types (including keypoints)
  • Focus on code quality (100% codecov)

You can find usage examples in the repo 🙂
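
For a quick flavor, here is a hypothetical sketch in the spirit of the repo's README; the exact class and argument names (Stream, Flip, Rotate, angle_range, ...) are assumptions on my part, so defer to the linked docs:

    import solt
    import solt.transforms as slt

    # Assumed API: compose a stochastic pipeline out of transform objects.
    stream = solt.Stream([
        slt.Flip(axis=1, p=0.5),
        slt.Rotate(angle_range=(-20, 20), p=0.5),
    ])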

submitted by /u/aleksei_tiulpin

Source: Reddit Machine Learning
